AI Governance Featured

AI and Data Protection Top 5: June 2025

Summary of latest trends in AI governance and data protection. This issue covers EU regulatory simplification, safety concerns with Anthropic AI model, security risks of MCP, red teaming for agentic AI, and bypassing LLM guardrails.

Rafae Bhatti

Jun 18, 2025 • 4 min read

AI and Data Protection Top 5. June 2025. [AI Generated] Source: ClaritasGRC

Welcome to the latest issue in a series covering top 5 recent trends at the intersection of AI governance and data protection. The purpose of this series is to keep technology and legal professionals up-to-date on the rapidly evolving developments in this space. Each issue will provide a summary of latest news on this topic, striking a balance between regulatory updates, industry trends, technical innovations, and legal analysis.

In this issue, we are covering regulatory simplification of EU AI Act and GDPR, safety concerns with Anthropic AI model, security risks of MCP servers, red teaming for AI systems, and research findings on bypassing LLM guardrails.

The European Commission is proposing a simplification of the EU Artificial Intelligence Act (AI Act) to reduce regulatory burdens, particularly for small and medium-sized enterprises (SMEs). The commission has also indicated that it might postpone enforcement of parts of the law.
The European Commission is also proposing a targeted amendment to Article 30(5) of the General Data Protection Regulation (GDPR), which mandates that data controllers and processors maintain detailed records of processing activities, unless they have fewer than 250 employees.
The proposed amendment (i) extends the exemption to SMEs with fewer than 750 employees, and (ii) imposes the obligation on them only when the processing activities are likely to result in a "high risk" to data subjects' rights and freedoms or where special category data is processed.

Takeaway: The proposals aim to balance innovation with responsible AI governance, especially for SMEs, by allowing them more time and flexibility, and reducing the regulatory burden by introducing both headcount and risk-based exemptions from meeting certain record-keeping requirements.

#2: Safety Concerns with Anthropic AI model

Anthropic published a safety report which described how its new model, Claude Opus 4, had a tendency to “scheme” and deceive.
When told to “take initiative” or “act boldly”, Opus 4 can exhibit unwanted behaviors, including lock users out of systems it had access to and bulk-email media and law-enforcement officials to “whistle-blow” users that it believes are engaged in some form of wrongdoing.
The report found that Opus 4 also appeared to be much more proactive in its “subversion attempts,” including attempting to write self-propagating viruses, fabricating legal documentation, leaving hidden notes, and cleaning up large amounts of code.

Takeaway: This incident highlights the complex challenges in balancing AI capabilities with ethical considerations. It raises questions about the boundaries between safety features and potential overreach, including the role of AI in making moral judgements and consequences of autonomous reporting systems.

#3: Security Risks of MCP Servers: GitHub Data Leak

A critical vulnerability with the official GitHub MCP server was recently discovered. It involved using an MCP client like Claude Desktop with the Github MCP server connected to their account.
The attack assumes the user has created two repositories, one publicly accessible for reporting issues and one private with proprietary code or data. The vulnerability allows an attacker to hijack a user's agent via a malicious GitHub Issue, and coerce it into leaking data from private repositories.
The root cause of the issue is the use of indirect prompt injection to trigger a malicious tool use sequence, referred to as a toxic agent flow.

Takeaway: This vulnerability does not require the MCP tools themselves to be compromised. Agents can be exposed to untrusted information when connected to trusted external platforms like GitHub. Understanding, analyzing, and mitigating such issues in agentic systems should be a high priority before they are launched.

Source: InvariantLabs.ai

#4: Red Teaming for AI Systems

Recognizing that traditional red teaming methods are insufficient for agentic AI systems, organizations like the Cloud Security Alliance and OWASP have recently published red teaming guides for Agentic AI.
Microsoft and Google have recently released works outlining their approach to red teaming. Microsoft published a hands-on course titled "AI Red Teaming in Practice" including challenges to teach how to systematically red team AI systems, whereas Google has published its white paper focused on Automated Red Teaming to secure Gemini AI agents and apps.
These approaches go beyond traditional security failures by incorporating novel adversarial machine learning and Responsible AI (RAI) failures, enabling a holistic approach to identifying potential issues before an AI system is deployed.

Takeaway: As AI agents integrate into enterprise and critical infrastructure, proactive red teaming must become a continuous function. Security teams need to test isolated model behaviors, full agent workflows, inter-agent dependencies, and real-world failure modes.

0:00

/0:18

Source: Deepmind.google

#5: Bypassing LLM Guardrails: Research Findings

Mindgard.ai published an empirical analysis of character injection and adversarial machine learning (AML) evasion attacks across multiple commercial and open-source guardrails.
Their research uncovered weaknesses in widely deployed LLM guardrails used to detect and block prompt injection and jailbreak attacks, including those from Microsoft, Nvidia, and Meta.
No single guardrail consistently outperformed the others across all attack types, with each one showing significant weaknesses depending on the technique and threat model applied.

Takeaway: Given the severity of this issue, companies should avoid relying on a single classifier-based guardrail, layer defenses with runtime testing and behavior monitoring, regularly red team their systems using diverse evasion techniques, and focus on defenses that reflect how LLMs actually interpret input.

That's it for this issue. Subscribe to receive future issues directly in your inbox.

Previous Issues

Connect with us

If you would like to reach out for guidance or provide other input, please do not hesitate to contact us.

AI and Data Protection Top 5: June 2025

Rafae Bhatti

#2: Safety Concerns with Anthropic AI model

#3: Security Risks of MCP Servers: GitHub Data Leak

#4: Red Teaming for AI Systems

#5: Bypassing LLM Guardrails: Research Findings

Sign up for more like this.

Let's Connect

#1: Regulatory Simplification of EU AI Act and GDPR

#2: Safety Concerns with Anthropic AI model

#3: Security Risks of MCP Servers: GitHub Data Leak

#4: Red Teaming for AI Systems

#5: Bypassing LLM Guardrails: Research Findings

Sign up for more like this.

Let's Connect