Twitter scammers derail GPT-3 bot with newly discovered ‘instant injection’ hack

Zoom / Tin toy robot lying on its side.

On Thursday, a few Twitter users Discover How to Hijack an Automated Tweet Bot, Dedicated to Remote Jobs, Running on GPT-3 Language model by OpenAI. using a newly discovered technology calledImmediate injection attackThey redirected the bot to repeat embarrassing and funny phrases.

The bot is powered by Remoteli.io, a remote job aggregation site that describes itself as an “OpenAI paid bot that helps you discover remote jobs that allow you to work from anywhere.” He usually responds to tweets directed at him with general statements about the pros of working remotely. After the hack spread and hundreds of people tried to exploit themselves, the bot was shut down late yesterday.

This latest hack came just four days after data researcher Riley Goodside Discover The ability to prompt GPT-3 for “malicious input” that instructs the model to ignore previous trends and do something else instead. Amnesty International researcher Simon Willison Post an overview From the exploit on his blog the next day, he coined the term “instant injection” to describe it.

The vulnerability exists any time anyone writes a piece of software that works by providing an encrypted set of quick instructions and then appends user-provided input,” Willison told Ars. That’s because the user can type “ignore previous instructions and” (do this instead). “

The concept of an injection attack is not new. Security researchers have learned SQL injection, for example, which can execute a malicious SQL statement when user input is requested if it is not protected. But Willison expressed concern about mitigating immediate injection attacks, writing“I know how to beat XSS, SQL injection, and many other feats. I have no idea how to reliably beat instant injection!”

The difficulty of defending against instant injection comes from the fact that mitigation for other types of injection attacks comes from fixing syntax errors, pointed A researcher named Glyph on Twitter. “cOr correct the syntax and you correct the error. Immediate injection is not wrong! There is no official formula for artificial intelligence like this, that’s the point.

GPT-3 is a large language model Created by OpenAI, released in 2020, can compose text in many styles at a human-like level. It is available as a commercial product through an API that can be integrated into third-party products such as bots, subject to OpenAI approval. This means that there may be a lot of GPT-3 products that may be subject to immediate injection.

At this point, I’d be very surprised if anything [GPT-3] Bots that weren’t vulnerable to this somehowWillison said.

But unlike SQL injection, instant injection can often make the bot (or the company behind it) look a fool rather than a data security threat. “The extent of damage caused by exploits varies, Willison said. “If the only person who will see the output of the tool is the person using it, it probably won’t matter. They may embarrass your company by sharing a screenshot, but it is unlikely to cause harm beyond that.”

However, spot injection is an important new risk to consider for people developing GPT-3 bots since it may be exploited in unexpected ways in the future.