Leading AI Models Vulnerable to Simple Language Manipulation
TrendAI today publishes new analyses showing how simple text manipulation, known as sockpuppeting, can cause prominent AI models such as GPT-4o, Claude 4 Sonnet and Gemini 2.5 Flash to bypass their own safety filters. By hiding malicious instructions inside a seemingly innocent prompt, an attacker can trick an assistant into violating its guidelines. All tested … Read more