Adversarial Machine Learning

Universal Jailbreak Suffixes Are Strong Attention Hijackers

Analyzing the mechanism underlying suffix-based LLM jailbreaks, we find that it relies on aggressively hijacking the model’s context 🥷, and that this hijacking intensifies with the suffix’s universality. Exploiting this insight, we both enhance existing attacks and mitigate them.

Matan Ben-Tov, Mor Geva, Mahmood Sharif

GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search

By introducing a strong new SEO attack ⛽💡, we extensively evaluate the susceptibility of widely used embedding-based retrievers to SEO attacks via corpus poisoning, and link this susceptibility to key properties of the embedding space.

Matan Ben-Tov, Mahmood Sharif

CaFA: Cost-aware, Feasible Attacks With Database Constraints Against Neural Tabular Classifiers

We propose an efficient attack against neural tabular classifiers for automatic robustness evaluation, addressing attacker objectives such as feasibility (via incorporation of database constraints) and cost-efficiency.

Matan Ben-Tov, Daniel Deutch, Nave Frost, Mahmood Sharif
