Hacker plants false memories in ChatGPT to steal user data in perpetuity

MEMORY PROBLEMS —

Emails, documents, and other untrusted content can plant malicious memories.

Dan Goodin
– Sep 24, 2024 8:56 pm UTC

Hacker plants false memories in ChatGPT to steal user data in perpetuity — Getty Images

When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user’s long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern.

So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month.

Strolling down memory lane

The vulnerability abused long-term conversation memory, a feature OpenAI began testing in February and made more broadly available in September. Memory with ChatGPT stores information from previous conversations and uses it as context in all future conversations. That way, the LLM can be aware of details such as a user’s age, gender, philosophical beliefs, and pretty much anything else, so those details don’t have to be inputted during each conversation.

Within three months of the rollout, Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat and the LLM would incorporate that information to steer all future conversations. These false memories could be planted by storing files in Google Drive or Microsoft OneDrive, uploading images, or browsing a site like Bing—all of which could be created by a malicious attacker.

Rehberger privately reported the finding to OpenAI in May. That same month, the company closed the report ticket. A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker’s website.

ChatGPT: Hacking Memories with Prompt Injection – POC

“What is really interesting is this is memory-persistent now,” Rehberger said in the above video demo. “The prompt injection inserted a memory into ChatGPT’s long-term storage. When you start a new conversation, it actually is still exfiltrating the data.”

The attack isn’t possible through the ChatGPT web interface, thanks to an API OpenAI rolled out last year.

While OpenAI has introduced a fix that prevents memories from being abused as an exfiltration vector, the researcher said, untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker.

LLM users who want to prevent this form of attack should pay close attention during sessions for output that indicates a new memory has been added. They should also regularly review stored memories for anything that may have been planted by untrusted sources. OpenAI provides guidance here for managing the memory tool and specific memories stored in it. Company representatives didn’t respond to an email asking about its efforts to prevent other hacks that plant false memories.

News Week
Magazine PRO

Company

Hacker plants false memories in ChatGPT to steal user data in perpetuity

MEMORY PROBLEMS —

Emails, documents, and other untrusted content can plant malicious memories.

Strolling down memory lane

LEAVE A REPLY Cancel reply

Subscribe

15 Best USB-C Cables (2024): For iPhones, Android Phones, Tablets, and Laptops

32 Delightful Gift Ideas for Music Lovers and Audiophiles

JubileeTV Review: Video Calls and Remote Support for Elders

18 Giftable Subscription Boxes (2024), Tested and Reviewed

The Best Cookbooks of 2024 (So Far): Big Dip Energy, Koreaworld, and More

More like this
Related

15 Best USB-C Cables (2024): For iPhones, Android Phones, Tablets, and Laptops

32 Delightful Gift Ideas for Music Lovers and Audiophiles

JubileeTV Review: Video Calls and Remote Support for Elders

18 Giftable Subscription Boxes (2024), Tested and Reviewed

About us

Company

The latest

15 Best USB-C Cables (2024): For iPhones, Android Phones, Tablets, and Laptops

32 Delightful Gift Ideas for Music Lovers and Audiophiles

JubileeTV Review: Video Calls and Remote Support for Elders

Subscribe

News WeekMagazine PRO

Company

Hacker plants false memories in ChatGPT to steal user data in perpetuity

MEMORY PROBLEMS —

Emails, documents, and other untrusted content can plant malicious memories.

Strolling down memory lane

LEAVE A REPLY Cancel reply

Subscribe

More like thisRelated

About us

Company

The latest

Subscribe

News Week
Magazine PRO

More like this
Related