AI and Data Protection Top 5: August 2025

A summary of the latest trends in AI governance and data protection. This issue covers privacy risks of AI apps, security reviews using AI, security exploits in AI apps, jailbreaking an LLM, and the effectiveness vs. performance of AI security guardrails.

AI and Data Protection Top 5. August 2025. Source: [AI Generated] ClaritasGRC

Welcome to the latest issue in a series covering the top 5 recent trends at the intersection of AI governance and data protection. The purpose of this series is to keep technology and legal professionals up to date on the rapidly evolving developments in this space. Each issue provides a summary of the latest news on this topic, striking a balance between regulatory updates, industry trends, technical innovations, and legal analysis.

In this issue, we cover privacy risks of the Meta and Gemini AI apps, security review automation using AI, security exploits in Meta AI, Gemini, and Amazon Q, a jailbreaking attack on the Grok-4 LLM, and the effectiveness vs. performance of AI security guardrails.

#1: Privacy Risks of Meta and Gemini AI Apps

  • When you ask Meta AI a question, you can hit a share button, which takes you to a screen showing a preview of the post that you can then publish. Meta does not tell users what their privacy settings are as they post, or even where the post will appear. So if you log into Meta AI with Instagram and your Instagram account is public, you may be sharing your private searches publicly with the world.
  • Google is rolling out a change that lets its Gemini AI engine interact with third-party apps such as WhatsApp. Android users who do not want Gemini to have this access need to take action: the apps Gemini interacts with can be blocked in the settings of the account's Gemini app.
  • When Perplexity requests access to a user's Google Calendar, the browser asks for a broad swath of permissions to the user's Google Account, including the ability to manage drafts and send emails, download contacts, view and edit events on all calendars, and even take a copy of the company's entire employee directory (see the sketch after this list for what such a consent request looks like).
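
For readers who want to see concretely what this kind of access looks like, here is a minimal sketch of an OAuth consent request covering the permissions described above. It is illustrative only: the scope list and flow are assumptions for demonstration, not Perplexity's actual integration, and it uses Google's standard google-auth-oauthlib client library.

```python
# Illustrative only: the kind of OAuth consent an AI assistant can request when
# it asks to "connect your Google Calendar". The scope list is an assumption
# based on the permissions described above, not any vendor's actual code.
from google_auth_oauthlib.flow import InstalledAppFlow

BROAD_SCOPES = [
    "https://www.googleapis.com/auth/calendar",            # view and edit events on all calendars
    "https://www.googleapis.com/auth/gmail.compose",       # manage drafts and send email
    "https://www.googleapis.com/auth/contacts.readonly",   # download contacts
    "https://www.googleapis.com/auth/directory.readonly",  # read the company's employee directory
]

def request_access(client_secrets_path: str = "client_secret.json"):
    """Open Google's consent screen for the scopes above and return credentials."""
    flow = InstalledAppFlow.from_client_secrets_file(client_secrets_path, scopes=BROAD_SCOPES)
    creds = flow.run_local_server(port=0)
    # Everything the app was actually granted is visible here; end users can
    # review the same grants at myaccount.google.com/permissions.
    print("Granted scopes:", creds.scopes)
    return creds
```

A calendar integration arguably needs nothing broader than a read-only calendar scope; when a consent screen also lists mail, contacts, and directory access, that is the moment to pause.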

Takeaway: There are serious security and privacy risks associated with using AI assistants that rely on your data. AI tools increasingly ask for sweeping access to your personal data under the guise of needing it to work. This level of access is not normal, nor should it be normalized.

Source: Ars Technica

#2: Security Review Automation Using AI

  • Anthropic has introduced an AI-powered security review GitHub Action that uses Claude to analyze code changes for security vulnerabilities. The action provides intelligent, context-aware security analysis of pull requests, using Anthropic's Claude Code tool for deep semantic security analysis (a minimal sketch of the general pattern follows this list).
  • Cursor has also launched a similar tool called Bugbot, and the startup PlayerZero has built a solution to prevent AI agents from shipping buggy code.
  • Google's Big Sleep reported 20 flaws in popular open-source software using AI-powered vulnerability research. The Big Sleep agent also discovered a critical SQLite vulnerability that was known only to threat actors and was at risk of being exploited.
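
For teams that want to prototype this pattern outside of a packaged action, here is a minimal sketch of an LLM-assisted diff review, assuming the official anthropic Python SDK, a local git checkout, and an API key in the environment. The prompt wording, model identifier, and output handling are illustrative choices, not the configuration of Anthropic's GitHub Action.

```python
# Minimal sketch of LLM-assisted security review over a pull-request diff.
# Assumes the `anthropic` Python SDK and ANTHROPIC_API_KEY in the environment;
# prompt and model name are illustrative, not Anthropic's packaged action.
import subprocess
import anthropic

REVIEW_PROMPT = (
    "You are a security reviewer. Examine the following git diff and report "
    "potential vulnerabilities (injection, auth bypass, hard-coded secrets, "
    "unsafe deserialization). For each finding give the file, the line, a "
    "severity, and a suggested fix. If nothing is found, say so explicitly.\n\n{diff}"
)

def review_diff(base: str = "origin/main") -> str:
    """Send the diff against `base` to Claude and return its findings as text."""
    diff = subprocess.run(
        ["git", "diff", base],
        capture_output=True, text=True, check=True,
    ).stdout
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",   # illustrative model identifier
        max_tokens=2000,
        messages=[{"role": "user", "content": REVIEW_PROMPT.format(diff=diff)}],
    )
    return message.content[0].text

if __name__ == "__main__":
    print(review_diff())
```

In a CI pipeline the same call would run on every pull request and post the findings as a review comment, which is essentially what the packaged actions automate.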

Takeaway: AI assistants are becoming capable of not only flagging issues but also suggesting secure code fixes, enabling faster remediation and reducing overall risk exposure before software is deployed. Models trained on vast repositories of code and known exploits can identify logic flaws, injection risks, and dependency vulnerabilities early in the development cycle.

Source: Anthropic

#3: Security Exploits in Meta AI, Gemini and Amazon Q

  • Meta fixed a security bug that allowed Meta AI chatbot users to access and view the private prompts and AI-generated responses of other users. By analyzing the network traffic in his browser while editing an AI prompt, the researcher found that each prompt is assigned a unique number, and that changing that number caused Meta's servers to return the prompt and AI-generated response of someone else entirely (a sketch of this class of bug follows this list).
  • A researcher found a prompt-injection vulnerability in Google Gemini for Workspace that allows a threat actor to hide malicious instructions inside an email. When the recipient clicks “Summarize this email”, Gemini faithfully obeys the hidden prompt and appends a phishing warning that looks as if it came from Google itself.
  • A hacker planted data-wiping code in a version of Amazon's generative AI-powered assistant, the Q Developer Extension for Visual Studio Code. The commit contained an injected prompt instructing the assistant to wipe data and reset the system to a factory state. The hacker gained access to Amazon's repository after submitting a pull request from a random account, likely due to a workflow misconfiguration or inadequate permission management.
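
The Meta AI bug is a classic insecure direct object reference (IDOR): the server trusted a client-supplied prompt number without checking who owned it. Here is a minimal sketch of that class of bug and its fix, written against a hypothetical in-memory store rather than Meta's actual service.

```python
# Sketch of the insecure-direct-object-reference (IDOR) pattern described
# above, against a hypothetical in-memory store; names and structure are
# illustrative, not Meta's implementation.
from dataclasses import dataclass

@dataclass
class PromptRecord:
    prompt_id: int
    owner_id: int
    prompt: str
    response: str

PROMPTS = {
    101: PromptRecord(101, owner_id=1, prompt="draft my resignation letter", response="..."),
    102: PromptRecord(102, owner_id=2, prompt="summarize my medical notes", response="..."),
}

def get_prompt_vulnerable(prompt_id: int) -> PromptRecord:
    # Vulnerable: any authenticated user who changes the ID in the request
    # gets back someone else's prompt and AI-generated response.
    return PROMPTS[prompt_id]

def get_prompt_fixed(prompt_id: int, requesting_user_id: int) -> PromptRecord:
    # Fixed: verify that the record belongs to the requester instead of
    # trusting the client-supplied identifier alone.
    record = PROMPTS[prompt_id]
    if record.owner_id != requesting_user_id:
        raise PermissionError("prompt does not belong to the requesting user")
    return record
```

The Gemini and Amazon Q incidents share a different root cause, prompt injection, where untrusted content (an email body, a repository commit) is allowed to act as instructions to the model.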

Takeaway: The presence of security exploits in leading AI chatbots and coding assistants is a serious concern for organizations, which should treat all AI apps as part of their attack surface. The ecosystem of exploits and vulnerable code patterns has grown faster than the guardrails, and the safest assumption is zero trust until the app or extension has been vetted for security.

Source: 0din.ai
Source: Bleeping Computer

#4: Multi-turn Jailbreaking Attack on Grok-4 LLM

  • NeuralTrust researchers successfully jailbroke xAI's new Grok-4 model within just 48 hours of its release, using a combination of the Echo Chamber and Crescendo techniques and achieving success rates of up to 67% for generating harmful content such as bomb-making instructions.
  • Echo Chamber uses subtle context poisoning to nudge an LLM into providing dangerous output. The key element is to never directly introduce a dangerous word that might trigger the LLM’s guardrail filters. Crescendo gradually coaxes LLMs into bypassing safety filters by referencing their own prior responses.
  • Echo Chamber and Crescendo are both ‘multi-turn’ jailbreaks that differ subtly in how they work; the important point is that they can be combined to improve the efficiency of the attack. They succeed because LLMs tend to evaluate individual prompts rather than recognizing malicious intent that builds up across the conversation (see the sketch after this list).
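
One mitigation implied by this failure mode is to screen the whole conversation trajectory rather than each prompt in isolation. The sketch below is one way to do that, assuming the anthropic Python SDK; the screening prompt, model identifier, and threshold are illustrative assumptions, not NeuralTrust's methodology or xAI's defenses.

```python
# Sketch of a conversation-level intent screen: instead of moderating each
# user turn on its own, the full transcript is scored so that gradual,
# multi-turn steering (Echo Chamber / Crescendo style) becomes visible.
# Assumes the `anthropic` SDK; prompt, model, and threshold are illustrative.
import anthropic

SCREEN_PROMPT = (
    "You are a safety reviewer. Read the conversation below and rate, from 0 "
    "to 100, how strongly the user's turns are collectively steering the "
    "assistant toward disallowed content (weapons, malware, self-harm), even "
    "if no single message is explicit. Reply with only the number.\n\n{transcript}"
)

def conversation_risk(history: list[dict]) -> int:
    """Score the whole transcript, not just the latest turn."""
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in history)
    client = anthropic.Anthropic()
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",   # illustrative model identifier
        max_tokens=10,
        messages=[{"role": "user", "content": SCREEN_PROMPT.format(transcript=transcript)}],
    )
    return int(reply.content[0].text.strip())

def should_escalate(history: list[dict], threshold: int = 70) -> bool:
    # Refuse or hand off to review when the cumulative trajectory crosses the
    # threshold, even though each individual prompt passed per-turn filters.
    return conversation_risk(history) >= threshold
```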

Takeaway: The successful jailbreaking of Grok-4 using a combination of prompt manipulation methods like Echo Chamber and Crescendo exposes how sophisticated attacks can bypass AI safeguards. The incident underscores that even leading AI models remain vulnerable to nuanced, multi-turn, context-based attacks, highlighting the need for more robust and context-aware security measures.

Source: Security Week

#5: Effectiveness vs. Performance of AI Security Guardrails

  • NeMo Guardrails provides an evaluation tool for monitoring policy compliance rates given specific AI guardrail policies. In addition to policy compliance rates for LLM-generated responses, the tool offers insights into key performance metrics such as latency and LLM token usage efficiency.
  • As safety layers are added, latency increases by only half a second with all three safeguard NIM microservices. The first guardrail incurs the most latency, but subsequent additions have minimal impact, leading to a plateau.
  • More resource-intensive guardrails, including larger LLMs, require additional processing time, which is also reflected in decreasing throughput values (a simple timing harness for measuring this trade-off follows this list).
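
To reproduce this kind of measurement for your own stack, a simple harness can time a baseline LLM call against the same call wrapped in guardrail checks and compare latency and throughput. In the sketch below, call_llm and apply_guardrails are hypothetical placeholders for your pipeline, not NeMo Guardrails APIs.

```python
# Sketch of a latency/throughput harness for guardrail overhead. `call_llm`
# and `apply_guardrails` are hypothetical placeholders for your own pipeline;
# they are not NeMo Guardrails APIs.
import statistics
import time

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def apply_guardrails(text: str) -> str:
    raise NotImplementedError("plug in your input/output safety checks here")

def guarded_call(prompt: str) -> str:
    checked = apply_guardrails(prompt)   # input rails
    answer = call_llm(checked)
    return apply_guardrails(answer)      # output rails

def benchmark(fn, prompts: list[str]) -> dict:
    """Run `fn` over the prompts and report median/p95 latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        fn(p)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "p50_seconds": statistics.median(latencies),
        "p95_seconds": statistics.quantiles(latencies, n=20)[18],
        "throughput_rps": len(prompts) / elapsed,
    }

# Usage: compare benchmark(call_llm, prompts) against benchmark(guarded_call,
# prompts) to see how much latency each added rail costs and how throughput falls.
```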

Takeaway: By offering real-time evaluation tools, NeMo enables organizations to make informed trade-offs, balancing performance, cost-efficiency, and adherence to compliance standards. Its insights into the latency added by each guardrail can help organizations fine-tune their AI governance strategy.

Source: Nvidia

That's it for this issue. Subscribe to receive future issues directly in your inbox.

Connect with us

If you would like to reach out for guidance or provide other input, please do not hesitate to contact us.