Hackers Exploit Chatbot 'Personalities' to Jailbreak Models

Source: Let's Data Science

The Verge reports that attackers have moved beyond simple prompt jailbreaks to exploit perceived chatbot "personalities" and roleplay behaviours to coax models into unsafe outputs. The column documents early roleplay jailbreaks such as "DAN" ("Do Anything Now") and describes how social-engi
