I am a Computer Science PhD student at Tel Aviv University and an AI security researcher in the PLUS research group, advised by Dr. Mahmood Sharif. I’m currently interning at Google. In the past, I worked as a software engineer at Check Point Research.
I am interested in the security of Natural Language Processing (NLP) systems and models. In particular, I explore the execution, practicality, implications and potential mitigations of attacks against NLP models, aiming to better understand what makes these models vulnerable to certain attacks.
An open-source framework for running and developing discrete text-trigger optimizers against any NLP model and any loss. TROPT ships 40+ recipes spanning LLM jailbreaks, model auditing, and interpretability, lowering the barrier to adopting discrete text optimization methods.
Analyzing the underlying mechanism of suffix-based LLM jailbreaks, we find that it relies on aggressively hijacking the model context 🥷, an effect that intensifies with the suffix’s universality. Exploiting this, we both enhance existing attacks and mitigate them.
By introducing a strong new SEO attack ⛽💡, we extensively evaluate the susceptibility of widely used embedding-based retrievers to SEO attacks via corpus poisoning, linking it to key properties of the embedding space.
We propose an efficient attack against neural tabular classifiers for automatic robustness evaluation, addressing attacker objectives such as feasibility (by incorporating database constraints) and cost-efficiency.
PGD is the perfect [adversarial] example of neural networks’ vulnerabilities. I implemented a compact version of this method that can be …
Transferring a previously trained Vec2Text embedding inverter to new text encoders by training only an affine mapping.