Matan Ben-Tov
Retrieval
Universal Jailbreak Suffixes Are Strong Attention Hijackers
Analyzing the underlying mechanism of suffix-based LLM jailbreaks, we find that it relies on aggressively hijacking the model's context 🥷, and that this hijacking intensifies with the suffix's universality. Exploiting this finding, we both strengthen existing attacks and mitigate them.
Matan Ben-Tov, Mor Geva, Mahmood Sharif
PDF · Cite · Code · arXiv · 🤗
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
By introducing a strong new SEO attack ⛽💡, we extensively evaluate the susceptibility of widely used embedding-based retrievers to SEO attacks via corpus poisoning, and link it to key properties of their embedding space.
Matan Ben-Tov, Mahmood Sharif
PDF · Cite · Code · arXiv