Threatstealth
AI Security 2026-06-03 13 min read

Shadow AI Risk: How Employees Are Leaking Data Through Unsanctioned AI Tools

Shadow AI is the fastest-growing source of enterprise data leakage. Learn how employees use unsanctioned AI tools, what data is at risk, and how to build an AI usage policy that security teams can actually enforce.


Shadow AI refers to the use of AI tools and services by employees without formal IT approval or security review — analogous to shadow IT, but with a critical difference: AI tools process and potentially retain the data submitted to them. A rogue SaaS subscription might expose configuration data. A rogue AI tool can expose source code, customer records, legal contracts, and financial projections in a single session.

The Scale of the Problem

Enterprise security teams systematically underestimate shadow AI adoption because it is nearly invisible in traditional security telemetry. Traffic to OpenAI, Anthropic, Google, and dozens of smaller AI providers is ordinary TLS-encrypted HTTPS: unless destinations are specifically categorised as AI services, it blends in with all other web traffic. Endpoint detection tools see a web browser making API calls, not an employee submitting their customer database to an external AI service.

Survey data consistently shows that 60–80% of employees in knowledge-work roles regularly use AI tools their employer has not approved. Most enterprise AI governance programmes, by contrast, formally approve fewer than five AI tools for general use. The difference between what is approved and what is actually used is a vast, unmonitored attack surface.

Why Traditional DLP Fails Against Shadow AI

Data Loss Prevention (DLP) tools were designed for a world where sensitive data moved through identifiable channels: email attachments, USB drives, and file-sharing services. Shadow AI breaks the core assumption DLP is built on, because the data leaves through none of them.

Traditional DLP inspects file transfers and known egress vectors. AI interactions are conversational: sensitive data is typed or pasted as plain text in a chat interface, transmitted over TLS to a third-party API, and processed by an external model. The DLP tool sees encrypted HTTPS traffic to a known SaaS domain — the same pattern it sees for Google Docs or Salesforce. There is no file, no attachment, and no network signature that distinguishes a sales rep writing an email from a sales rep submitting their entire customer list for AI analysis.
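Because the sensitive content is plain text inside a request body, a control that wants to catch it has to inspect that text rather than look for files. The sketch below is a minimal illustration of content-level inspection, assuming the prompt text is already available at some control point (for example, a TLS-inspecting proxy or a browser extension; the article does not specify one). The detector names, regular expressions and the `inspect_prompt` function are hypothetical and far smaller than a production DLP rule set.

```python
import re

# Hypothetical, illustrative detectors; real DLP rule sets are much larger and tuned per organisation.
DETECTORS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}

# Runs of 13-19 digits, optionally separated by spaces or dashes (candidate card numbers).
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def inspect_prompt(text: str) -> dict[str, int]:
    """Count sensitive-data matches in a prompt body (plain text, not a file)."""
    findings = {name: len(p.findall(text)) for name, p in DETECTORS.items()}
    cards = 0
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        # The Luhn check filters out most random digit runs such as order IDs.
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            cards += 1
    findings["payment_card"] = cards
    # Report only detectors that actually fired.
    return {k: v for k, v in findings.items() if v}
```

In this sketch a non-empty result would feed a warn or block decision for that submission; it says nothing about which destination is acceptable, which is a separate policy question.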

What Data Leaves the Organisation

Understanding shadow AI risk requires mapping which kinds of sensitive data employees actually submit to AI tools. Security teams that have audited AI usage consistently find the same categories recurring in AI interactions.

Shadow AI Data Leakage Risk by Data Category
| Data category | Common AI use case | Risk level | Regulatory exposure |
|---|---|---|---|
| Source code + credentials | AI coding assistant, code review, debugging | Critical | IP theft, credential exposure |
| Customer PII / CRM data | Email drafting, customer analysis, segmentation | High | GDPR Art. 44 (international transfer), CCPA |
| Financial records | AI summarisation, financial modelling assistance | High | SOX; market abuse if pre-announcement |
| Legal documents / contracts | AI-assisted review, summarisation | High | Attorney-client privilege, NDA exposure |
| HR / personnel data | Performance management, HR communication drafting | High | GDPR, employment law |
| Strategic planning documents | AI-assisted analysis, presentation drafting | Medium | Competitive intelligence risk |
| Internal communications | Email drafting, tone improvement | Medium | Confidentiality, regulatory investigation risk |
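A single AI session often touches several of these categories at once. One way to score such a session is to encode the table as data. This is a minimal sketch, assuming a session is rated at the highest risk level of any category it contains and that its exposures are the union of the per-category exposures; the category keys, `CATEGORY_PROFILE` and `session_profile` are hypothetical names, and treating unmapped categories as High is a fail-closed assumption, not something the table specifies.

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    # Ordered so that max() picks the most severe level.
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Risk levels and exposures transcribed from the table; the keys are hypothetical identifiers.
CATEGORY_PROFILE = {
    "source_code_credentials": (Risk.CRITICAL, {"IP theft", "credential exposure"}),
    "customer_pii": (Risk.HIGH, {"GDPR", "CCPA"}),
    "financial_records": (Risk.HIGH, {"SOX", "market abuse"}),
    "legal_documents": (Risk.HIGH, {"attorney-client privilege", "NDA"}),
    "hr_data": (Risk.HIGH, {"GDPR", "employment law"}),
    "strategic_planning": (Risk.MEDIUM, {"competitive intelligence"}),
    "internal_comms": (Risk.MEDIUM, {"confidentiality", "regulatory investigation"}),
}


def session_profile(categories: set[str]) -> tuple[Optional[Risk], set[str]]:
    """Return the highest risk level and the combined exposures for one AI session."""
    risk: Optional[Risk] = None
    exposures: set[str] = set()
    for category in categories:
        # Assumption: unmapped categories fail closed to HIGH and are marked as unclassified.
        level, exp = CATEGORY_PROFILE.get(category, (Risk.HIGH, {"unclassified"}))
        risk = level if risk is None else max(risk, level)
        exposures |= exp
    return risk, exposures
```

Taking the maximum rather than an average keeps one Critical item, such as a pasted credential, from being diluted by otherwise low-risk content in the same session.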

AI Data Retention: What Happens to Submitted Data

The risk profile of shadow AI varies significantly by provider and configuration. Consumer-tier AI services often use interaction data for model training by default, meaning that sensitive data submitted by employees may be incorporated into training datasets. Enterprise-tier subscriptions usually include data processing agreements (DPAs) that prohibit training on customer data, but employees using free or consumer accounts operate under consumer terms of service.

Enterprise deployments such as Microsoft 365 Copilot and ChatGPT Enterprise state that customer data is not used to train their models by default; the risk is that such protections are assumed rather than verified for every tool and account type in use. Microsoft 365 Copilot raises a different problem: it operates on the organisation's existing M365 permissions, so it can surface anything the requesting user can technically reach. If those permissions are misconfigured or overly broad, an employee who asks Copilot to summarise 'recent emails about Project Alpha' can receive content they were never meant to see and would be unlikely to find on their own.
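Because Copilot acts with the requesting user's permissions, the practical audit question is which sensitive content each user can already reach. The sketch below runs that check over a hypothetical permissions export; the `Document` record, the label names and the functions `reachable` and `overshared` are illustrative assumptions, and in a real tenant the data would have to come from SharePoint and Exchange permission reporting. "Everyone" and "Everyone except external users" are the SharePoint org-wide principals that commonly cause oversharing.

```python
from dataclasses import dataclass, field

# Org-wide SharePoint principals; a sensitive file granted to either is overshared.
ORG_WIDE = {"Everyone", "Everyone except external users"}
# Illustrative sensitivity label names.
SENSITIVE_LABELS = {"Confidential", "Highly Confidential"}


@dataclass
class Document:
    path: str
    label: str
    grants: set[str] = field(default_factory=set)  # principals with read access


def reachable(user_principals: set[str], docs: list[Document]) -> list[Document]:
    """Documents the user can read; an assistant acting as that user can read the same set."""
    return [d for d in docs if d.grants & user_principals]


def overshared(docs: list[Document]) -> list[Document]:
    """Sensitive documents granted to an org-wide principal."""
    return [d for d in docs if d.label in SENSITIVE_LABELS and d.grants & ORG_WIDE]
```

Fixing the grants returned by `overshared` reduces what every user, and therefore every assistant acting for them, can reach, which is usually cheaper than trying to filter assistant output afterwards.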

Building an AI Usage Policy That Works

The failure mode of most enterprise AI usage policies is that they are written as a list of prohibitions — and are consequently ignored by the employees they purport to govern. Effective AI usage policy design starts from the assumption that employees will use AI tools regardless of policy, and works to create a compliant path that is more convenient than the non-compliant alternative.

Technical Controls for Shadow AI Governance

Policy alone is insufficient — technical controls are required to provide visibility and enforcement. Effective shadow AI governance does not require blocking AI tools, which would be both futile and counterproductive. It requires visibility into AI tool usage and the ability to enforce data classification boundaries.
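Enforcing a classification boundary, rather than blocking tools outright, can be expressed as a decision on each submission that compares the data's classification with the ceiling approved for its destination. This is a minimal sketch under several assumptions: a four-level label scheme, an inventory of sanctioned tools with a per-tool ceiling, and a control point that knows whether the user is signed in with an enterprise account (reflecting the consumer-versus-enterprise terms distinction discussed earlier). `Classification`, `SanctionedTool`, `INVENTORY`, `decide` and the domain `assistant.example-ai.com` are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class SanctionedTool:
    domain: str
    # Highest classification approved for this tool under its enterprise terms.
    max_classification: Classification


# Hypothetical inventory; the domain and ceiling are placeholders.
INVENTORY = {
    "assistant.example-ai.com": SanctionedTool("assistant.example-ai.com", Classification.CONFIDENTIAL),
}


def decide(domain: str, enterprise_session: bool, data_class: Classification) -> str:
    """Return 'allow', 'allow_and_log' or 'block' for one submission."""
    tool = INVENTORY.get(domain)
    # A sanctioned tool only counts when the user is signed in to the enterprise tenant;
    # a personal account on the same domain falls back to consumer terms.
    sanctioned = tool is not None and enterprise_session
    ceiling = tool.max_classification if sanctioned else Classification.PUBLIC
    if data_class > ceiling:
        return "block"
    # Unsanctioned use with public data is permitted but recorded for the shadow AI inventory.
    return "allow" if sanctioned else "allow_and_log"
```

The `allow_and_log` outcome is the visibility half of the control: unsanctioned tools stay usable for public data, so employees are not pushed toward workarounds, while the security team still learns which tools are in use.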

Detecting Shadow AI Before It Becomes a Breach

Detection of shadow AI usage requires a different approach from traditional threat detection. The goal is not to detect malicious activity: shadow AI use is almost always an inadvertent policy violation, not intentional data theft. Instead, the detection goal is to identify high-risk AI usage patterns before they result in a regulatory breach notification or IP compromise.

Network-level AI discovery identifies the set of AI services in active use by analysing DNS queries and network flows to known AI provider domains. This provides a shadow AI inventory without requiring endpoint agents or SSL inspection. Once the inventory is known, high-risk usage patterns — large data volumes to consumer AI endpoints, AI API calls from developer environments, AI tool usage from accounts with access to sensitive data classifications — can be flagged for review.
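A minimal sketch of DNS-based discovery, assuming a DNS log exported as whitespace-separated `timestamp client_ip query_name` lines and flow records as `(client, domain, bytes_out)` tuples. Both formats, the 5 MB upload threshold and the function names are assumptions for illustration, and the domain list is a small, non-exhaustive sample of provider domains rather than a maintained feed.

```python
from collections import defaultdict
from typing import Optional

# Small, non-exhaustive sample of AI provider domains (matched as parent suffixes).
AI_DOMAINS = {
    "openai.com": "OpenAI",
    "chatgpt.com": "OpenAI",
    "anthropic.com": "Anthropic",
    "claude.ai": "Anthropic",
    "gemini.google.com": "Google",
    "generativelanguage.googleapis.com": "Google",
}


def match_provider(qname: str) -> Optional[str]:
    """Return the provider for a DNS name, matching the name itself or any parent suffix."""
    labels = qname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        provider = AI_DOMAINS.get(".".join(labels[i:]))
        if provider:
            return provider
    return None


def build_inventory(dns_log_lines: list[str]) -> dict[str, set[str]]:
    """Map each provider to the set of client IPs that resolved its domains."""
    inventory: dict[str, set[str]] = defaultdict(set)
    for line in dns_log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        _, client, qname = parts[:3]
        provider = match_provider(qname)
        if provider:
            inventory[provider].add(client)
    return dict(inventory)


def flag_large_uploads(flows, threshold_bytes=5_000_000):
    """Return (client, domain, bytes_out) flows to AI domains at or above the threshold."""
    return [f for f in flows if match_provider(f[1]) and f[2] >= threshold_bytes]
```

Matching whole parent suffixes rather than substrings keeps look-alike names such as `notopenai.com` out of the inventory, while still catching subdomains like `api.openai.com`.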
