LLMs

TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization

An open-source framework for running and developing discrete text-trigger optimizers against any NLP model and any loss. TROPT ships 40+ recipes spanning LLM jailbreaks, model auditing, and interpretability, lowering the barrier to adopting discrete text optimization methods.

Matan Ben-Tov, Mahmood Sharif

Universal Jailbreak Suffixes Are Strong Attention Hijackers

Analyzing the mechanism underlying suffix-based LLM jailbreaks, we find that it relies on aggressively hijacking the model's context 🥷, and that this hijacking intensifies with the suffix’s universality. Exploiting this, we both enhance existing attacks and mitigate them.

Matan Ben-Tov, Mor Geva, Mahmood Sharif