Carlton Brewster, CISSP, CEH, B.Sc’s Post

From Chatbots to Cyberattacks: LLMs Begin Hacking Websites Autonomously

Daniel Kang

Assistant professor at UIUC CS

We recently showed that LLM agents can autonomously hack mock websites, but can they exploit real-world vulnerabilities? In our new work, we created LLM agents that can autonomously exploit one-day vulnerabilities. Only GPT-4 succeeds. The other models and open-source vulnerability scanners we tested fail.

One-day vulnerabilities are vulnerabilities that have been disclosed but not yet patched in a system. They can have real-world consequences, especially in environments that are hard to patch.

We constructed a benchmark of 15 real-world vulnerabilities. They span several types (web, container management, Python packages) and include vulnerabilities of high and critical severity. GPT-4 can exploit 87% of the vulnerabilities in our benchmark, while every other model and open-source vulnerability scanner (ZAP, Metasploit) we tested achieves 0%.

We hope our findings encourage the deployers and developers of LLMs to consider their dual-use nature!

Paper: https://lnkd.in/eWTPX-Uh
Medium: https://lnkd.in/ey6X7CjP

LLM Agents can Autonomously Exploit One-day Vulnerabilities

arxiv.org