Matan Ben-Tov
Adversarial Machine Learning
Universal Jailbreak Suffixes Are Strong Attention Hijackers
Analyzing the mechanism underlying suffix-based LLM jailbreaks, we find that it relies on aggressively hijacking the model's context 🥷, and that this hijacking intensifies with the suffix's universality. Exploiting this, we both enhance and mitigate existing attacks.
Matan Ben-Tov, Mor Geva, Mahmood Sharif
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
By introducing a strong new SEO attack ⛽💡, we extensively evaluate the susceptibility of widely used embedding-based retrievers to SEO attacks via corpus poisoning, linking it to key properties of the embedding space.
Matan Ben-Tov, Mahmood Sharif
CaFA: Cost-aware, Feasible Attacks With Database Constraints Against Neural Tabular Classifiers
We propose an efficient attack against neural tabular classifiers for automatic robustness evaluation, addressing attacker objectives such as feasibility (via incorporation of database constraints) and cost-efficiency.
Matan Ben-Tov, Daniel Deutch, Nave Frost, Mahmood Sharif