Matan Ben-Tov
LLMs
Universal Jailbreak Suffixes Are Strong Attention Hijackers
Analyzing the mechanism underlying suffix-based LLM jailbreaks, we find that it relies on aggressively hijacking the model’s context 🥷, an effect that intensifies with the suffix’s universality. Exploiting this finding, we both enhance existing attacks and mitigate them.
Matan Ben-Tov, Mor Geva, Mahmood Sharif