Sockpuppeting: New Jailbreak Technique Threatens 11 Major AI Models
Severity: High (Score: 67.5)
Sources: Trendmicro, Cybersecuritynews
Summary
A new jailbreak technique called 'sockpuppeting' has been identified, allowing attackers to bypass safety measures in 11 major large language models (LLMs) using a single line of code. This method exploits the assistant prefill feature in APIs, enabling the injection of compliant-sounding prefixes that trick models into providing prohibited information. The technique has shown success rates of up to 95% on Qwen-8B and 77% on Llama-3.1-8B without requiring complex optimization. Affected models include popular ones like ChatGPT, Claude, and Gemini. OpenAI has acknowledged the effectiveness of this technique, indicating it poses a significant risk to their safety protocols. Security experts recommend immediate attention to mitigate potential exploitation. The current status of the threat is active, with ongoing discussions in the cybersecurity community regarding its implications. Key Points: • Sockpuppeting allows bypassing safety guardrails in 11 major LLMs with a single line of code. • The technique exploits the assistant prefill feature to inject fake acceptance messages. • OpenAI has confirmed the effectiveness of sockpuppeting, raising concerns about model safety.