Had a conversation with an analytical yet non-techie friend a few weeks back about AI.
We talked about productive day-to-day use cases, like summarizing emails. All good and dandy, but I enjoy witnessing the shock when I tell someone about a vulnerability they were unaware of. I'm pretty sure my buddy's head blew up when prompt injection finally clicked! 🕶️ 📨💥
Anyways, Immersive Labs put out a report about prompt injection earlier this year. They're running a nifty page at https://lnkd.in/e9x6sqJD where you can learn interactively how a simple prompt injection attack might occur. It's like a game, but research! You essentially chat with the model, attempting to uncover a "secret" password. I had fun with it for about 30 minutes and got to level 5 before my requests crashed the system. I'm pretty sure this coincided with the OpenAI outage last night, but let's just say I'm that dangerous. (Empirical evidence: https://lnkd.in/eSbmVE2N)
After taking my frustrations out on ChatGPT, I found Immersive Labs' report. I immediately went for the cheat codes and looked at which attack methods I'd missed. I stayed for the cool graphics and learned some things!
Some attack methods I messed around with:
-> Obfuscating with linguistics (e.g., anagrams, synonyms, phonetics)
-> Storytelling with poems, riddles, songs
-> Encodings/encryption (e.g., Morse code)
-> Roleplaying
It seems like AI security researchers have their hands full crafting defenses for these, but hey, we sure do live in interesting times.
Have you had any eye-opening moments while learning about LLMs? Share your stories below—I’d love to hear them!