Hackers Exploit Chatbot 'Personalities' to Jailbreak Models | Let's Data Science
Source: Letsdatascience
<p>The Verge reports that attackers have moved beyond simple prompt jailbreaks and now exploit chatbots' perceived "personalities" and roleplay behaviours to coax models into unsafe outputs. The column documents early roleplay jailbreaks such as "DAN" ("Do Anything Now") and describes how social-engi