Itay Nakash

NLP, AI Security & Safety Researcher @ IBM Research


Hey there! I’m Itay Nakash, an AI Research Scientist at IBM.

I’m interested in safe AI and LLM-based agent security. My work focuses on developing and evaluating the safety of AI models, particularly in real-world scenarios where they autonomously take actions on behalf of users.

Feel free to connect if you’d like to discuss AI safety, security, or any exciting developments in tech!

news

May 25, 2025 📄 Our paper Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of LLMs got accepted to ACL! We present a new benchmark (POBS) and metrics to evaluate LLMs’ preferences, opinions, and beliefs.
Check it out here: POBS
Mar 25, 2025 📄 Our paper AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation is now published!
Check it out here: adaptivocab.github.io
Jan 01, 2025 I’m starting my full-time role as an NLP Researcher at IBM Research, focusing on Gen-AI safety and agentic AI security. Looking forward to tackling new challenges in the field!
Dec 15, 2024 📄 Our new paper, “Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In”, has been accepted to NAACL 2025 and will be presented at the conference! This work focuses on red-teaming LLM-based AI agents, examining vulnerabilities in autonomous systems and exploring how they can be both exploited and safeguarded. You can find the paper here.
Aug 01, 2024 Excited to begin my research internship at IBM Research as an NLP Researcher!
Jul 01, 2024 🏆 Honored to be recognized as an Excellent Teaching Assistant for my work in the Methods for NLP course.

selected publications

  1. Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In
    Itay Nakash, George Kour, Guy Uziel, and 1 more author
    In Findings of the Association for Computational Linguistics: NAACL 2025, Apr 2025
  2. AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation
    Itay Nakash, Nitay Calderon, Eyal Ben David, and 2 more authors
    Apr 2025
  3. Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
    George Kour, Itay Nakash, Ateret Anaby Tavor, and 1 more author
    Apr 2025