Security & Compliance Standards for AI Systems
AI security begins where ordinary app security stops: the attack can be a dataset, a gradient, or a paragraph that looks harmless. This guide maps that wider threat surface and the controls regulated teams need.
Threat landscape
AI introduces a new class of attacks traditional security cannot see
A well-hardened web application with zero CVEs can still be trivially compromised through its ML model. AI systems extend the classical threat model with four novel attack surfaces: the training data pipeline (poisoning), the model weights (inversion and extraction), the inference endpoint (adversarial inputs and prompt injection), and the supply chain (compromised model artifacts). Each attack surface requires controls that sit outside the scope of a standard OWASP or NIST CSF program. The defining characteristic of AI threats is that the attack payload is data — a carefully crafted input or a subtle corruption in a dataset — not code, making them invisible to firewalls, IDS systems, and static analysis tools.
Data Poisoning
An attacker injects malicious examples into training data to degrade model accuracy or embed a backdoor (trigger pattern → attacker-controlled output). Particularly dangerous in federated learning and models trained on scraped web data. Mitigate with data provenance tracking, statistical anomaly detection on training distributions, and differential privacy.
Model Inversion
An adversary queries a model repeatedly to reconstruct sensitive training data — for example, recovering faces from a facial recognition model's confidence scores. Poses major GDPR risk for models trained on PII. Mitigate with output perturbation, confidence score truncation, and differential privacy during training.
Adversarial Examples
Imperceptible perturbations to inputs that cause catastrophic misclassification. Transferable across model architectures. High-stakes in vision systems (autonomous vehicles, medical imaging) and speech recognition. Mitigate with adversarial training, input preprocessing (feature squeezing, JPEG compression), and ensemble defenses.
Prompt Injection
Malicious instructions embedded in user content or retrieved documents override the system prompt, hijacking the LLM's behavior. Direct injection: user input contains "Ignore previous instructions." Indirect injection: a retrieved web page contains hidden instructions. Mitigate with strict prompt delimiters, input/output guardrails, and privilege separation between system and user context.
Model Extraction
An attacker queries a production model to train a functional clone — stealing IP and bypassing access controls on the original. A stolen clone can then be used for white-box adversarial attacks. Mitigate with query rate limiting, output watermarking, and anomaly detection on query patterns.
Supply Chain Attacks
Compromised pre-trained models, poisoned pip packages, or backdoored model checkpoints distributed via public registries (HuggingFace, PyPI). A trojan in a base model persists through fine-tuning. Mitigate with model provenance verification, cryptographic signing of artifacts, and SBOM (Software Bill of Materials) for AI.
Regulatory landscape
Four compliance frameworks every AI team needs to map
Regulatory pressure on AI systems has converged on four dominant frameworks in 2024–2026: the EU AI Act (binding law), NIST AI RMF (US federal guidance), ISO/IEC 42001 (certifiable management system), and SOC 2 with AI-specific trust service criteria. Each framework has a different scope, legal force, and target audience. For most organizations shipping AI into regulated markets, compliance is a multi-framework exercise — not a single checkbox.
EU AI Act — Risk-tiered regulation (2024)
Binding — Aug 2026Binding EU law classifying AI systems by risk: Unacceptable (banned, e.g., social scoring), High (regulated: healthcare, hiring, critical infrastructure — requires conformity assessment, CE marking, human oversight, explainability, audit logs retained 10 years), Limited (transparency obligations), Minimal (unregulated). GPAI (General Purpose AI) models with >10²⁵ FLOPs training compute face additional systemic risk obligations. Penalties: up to €35M or 7% of global revenue.
NIST AI RMF — Risk management framework (2023)
VoluntaryUS voluntary framework for managing AI risk across four functions: GOVERN (policies, roles, culture), MAP (risk identification and context), MEASURE (risk analysis and evaluation), MANAGE (risk treatment and monitoring). Closely aligned with NIST CSF 2.0. Required for US federal AI procurement under EO 14110. Not legally binding but de facto standard for US government contractors.
ISO/IEC 42001:2023 — AI management system standard
CertifiableCertifiable management system standard for AI (parallel to ISO 27001 for infosec). Requires an AI policy, risk assessment process, impact assessments for high-risk uses, and continual improvement cycle. Provides third-party certification pathway. Particularly valued for B2B enterprise procurement and demonstrating due diligence. Compatible with and maps to EU AI Act requirements.
SOC 2 + AI Trust Services (emerging)
Market-drivenTraditional SOC 2 Trust Service Criteria (Security, Availability, Confidentiality, Processing Integrity, Privacy) extended with AI-specific controls: training data integrity, model versioning and change management, bias monitoring, explainability for automated decisions, and AI incident response. Growing demand from enterprise SaaS customers deploying AI features.
7% revenue
EU AI Act penalty
Certifiable
ISO 42001
4 functions
NIST AI RMF
8 domains
High-risk sectors
Secure development
MLSecOps: embedding security into every stage of the AI lifecycle
Secure AI development requires security controls at every phase of the ML lifecycle — not just at the API perimeter. MLSecOps extends DevSecOps principles to the unique artifacts of AI: datasets, feature stores, model weights, training pipelines, and inference endpoints. The goal is to make security a property of the system, not an audit at the end.
Data Security & Provenance
Phase 1: DataMaintain cryptographic hashes and provenance records for all training datasets. Enforce data access controls with column-level RBAC. Scan ingested data for PII (using tools like Presidio or AWS Comprehend) and apply pseudonymization or anonymization before training. Log all data access for audit trails. Apply differential privacy (ε-DP) for models trained on sensitive data.
Threat Modeling & Red-Teaming
Phase 2: DesignConduct AI-specific threat modeling using STRIDE adapted for ML: Spoofing (adversarial inputs), Tampering (data/model poisoning), Repudiation (missing audit logs), Information disclosure (model inversion), Denial of service (model extraction), Elevation of privilege (prompt injection bypassing access controls). Conduct structured red-teaming sessions before production deployment — especially for LLM features.
Secure Training Pipelines
Phase 3: TrainingRun training workloads in isolated compute environments (VPC, no internet access). Use immutable container images with pinned base layers. Scan all pip dependencies with pip-audit or Safety. Enforce model artifact signing (cosign) and store checksums in a tamper-evident log. Restrict write access to feature stores and model registries to CI/CD service accounts only.
Model Evaluation & Bias Auditing
Phase 4: EvaluationEvaluate models on fairness metrics disaggregated by protected attributes before every release. Use IBM Fairness 360 or Google's What-If Tool for audit. Document evaluation results in a Model Card. For EU AI Act high-risk systems, conduct a formal Fundamental Rights Impact Assessment (FRIA). Log all evaluation results and retain for the duration required by applicable regulation.
Secure Inference & Runtime Guards
Phase 5: DeploymentDeploy inference behind an API gateway with authentication, rate limiting, and query anomaly detection. Apply input validation (schema enforcement, length limits, character filtering for LLMs). Implement output scanning for PII, harmful content, and hallucination indicators. For LLMs: enforce system prompt isolation and use a privilege-separation architecture that prevents user-supplied content from accessing system-level context.
Monitoring & Incident Response
Phase 6: OperationsMonitor for data drift (feature distribution shift signals dataset poisoning), prediction drift (accuracy degradation signals adversarial activity), and anomalous query patterns (high-volume similar queries signals extraction attempts). Define AI-specific incident playbooks: model rollback procedure, customer notification triggers, regulator reporting requirements under EU AI Act Article 62 (serious incident reporting within 15 days).
Privacy-preserving ML
Differential privacy: the mathematical guarantee behind data anonymization
Differential privacy (DP) provides a mathematically rigorous guarantee that no individual's data has a significant effect on the model's output — making it impossible for an adversary to determine with confidence whether any specific individual was in the training set. It is the strongest privacy guarantee available for ML and is required for certain regulated data categories (health records, financial data) under GDPR and HIPAA.
DP-SGD (TensorFlow Privacy)
Differentially private SGD: clip per-sample gradients to norm C, add Gaussian noise N(0, σ²C²I), then average. Implemented in TF Privacy and Opacus (PyTorch). Track privacy budget with the moments accountant or Rényi DP composition.
Local vs. Central DP
Local DP: each user randomizes their data before sending (used by Apple, Google for telemetry). Central DP: trusted curator adds noise to aggregate statistics or model updates (DP-SGD). Central DP offers better utility at same privacy budget; local DP removes need to trust the server.
GDPR & HIPAA alignment
DP provides a technical basis for the GDPR "anonymization" exception (Recital 26) — enabling data processing without consent for anonymized datasets. Under HIPAA, DP-trained models may qualify for the Safe Harbor de-identification standard if ε is sufficiently small and no re-identification risk remains.
For any two adjacent datasets D and D' differing in one record, and any output set S, the mechanism M satisfies (ε,δ)-differential privacy. ε (epsilon) is the privacy budget — smaller ε means stronger privacy. δ is the failure probability. DP-SGD, the standard algorithm for DP training, clips per-sample gradients and adds calibrated Gaussian noise at each step.
Minimum noise multiplier σ for DP-SGD to achieve (ε,δ)-DP over T training steps, where C is the gradient clipping norm. Privacy cost accumulates over training steps via the moments accountant. Practical tradeoff: ε ≈ 1–10 for production health/finance models; accuracy loss is typically 1–5% vs. non-private baseline on large datasets.
LLM-specific controls
Securing LLM applications: guardrails, sandboxing, and access control
LLM-powered applications face a uniquely broad attack surface because the model's input and output are natural language — harder to validate than structured API parameters. The OWASP Top 10 for LLM Applications (2025) identifies prompt injection as the #1 risk, followed by insecure output handling, training data poisoning, model denial of service, and excessive agency (agents with too many permissions). Each risk has specific engineering controls.
Prompt Injection Defense
Separate system and user context with structural delimiters (XML tags, special tokens). Never interpolate raw user input directly into system prompts. Use a privilege-separation model: system context has elevated trust; user context is zero-trust. Apply an input classifier that detects injection patterns before passing to the LLM. Log all system prompt access attempts.
Output Validation & Sandboxing
Never execute LLM-generated code directly. Run in an isolated sandbox (Firecracker microVM, E2B, Docker with seccomp) with no filesystem write access, no network access, and CPU/memory limits. Validate structured outputs (JSON, SQL) against a schema before use. Scan free-text outputs for PII using a dedicated classifier before surfacing to users.
Least-Privilege for Agents
Agentic LLMs should operate with minimal permissions. Apply OAuth scopes to tool access: the research agent gets read-only web search; the database agent gets a read-only replica. Require explicit human approval for write operations (email sending, database mutations, file deletion). Audit every tool call with actor identity, tool name, parameters, and response.
Rate Limiting & DoS Protection
LLM inference is expensive — a single user sending max-context prompts can exhaust a GPU cluster and create availability risk. Apply token-level rate limits (max tokens/min per user), context length caps, and queue depth limits. Detect and block extraction attack patterns (hundreds of structurally similar queries from one IP/user) before they reach the model.
Model Access Control
Treat model weights as sensitive IP: encrypt at rest (AES-256), restrict access to production serving infrastructure via IAM roles, log all weight downloads. For multi-tenant deployments, enforce tenant isolation at the inference layer — no shared KV cache across tenants, isolated VRAM regions for confidential model variants.
Content Moderation Pipeline
Deploy a content moderation layer in front of and behind the LLM: input moderation (harmful request detection), output moderation (PII, CSAM, violence, IP leakage). Use lightweight classifier models (DistilBERT-scale) for latency-sensitive paths. Log moderation decisions with policy version for audit and model improvement.
Implementation roadmap
From framework to artifact: a 12-week compliance implementation plan
Compliance frameworks describe what to achieve — this section maps each requirement to a concrete engineering or organizational artifact you can deliver, version-control, and present to an auditor. The sequence below works for ISO 42001 certification and maps substantially to EU AI Act conformity requirements for high-risk systems.
Weeks 1–2: Inventory & Classification
Weeks 1–2Catalog every AI system in production or development. Classify each by EU AI Act risk tier and NIST AI RMF impact category. Document use case, data inputs, decision type (advisory vs. binding), affected population, and applicable regulations. This inventory is the foundation artifact for every subsequent compliance step.
Weeks 3–4: Policy & Governance Framework
Weeks 3–4Draft an AI Policy (ISO 42001 Clause 5.2): objectives, scope, roles (AI Owner, Data Steward, CISO, DPO). Establish an AI Risk Committee with defined meeting cadence and escalation paths. Define the AI lifecycle process: ideation → data assessment → development → red-teaming → staged rollout → monitoring → decommission.
Weeks 5–6: Risk Assessment & FRIA
Weeks 5–6For each high-risk system, conduct an AI Risk Assessment (likelihood × impact for each threat category) and a Fundamental Rights Impact Assessment (FRIA). Document residual risks and accepted risk register. Map risks to technical controls. For GDPR-sensitive systems, conduct or update the Data Protection Impact Assessment (DPIA).
Weeks 7–8: Technical Controls Implementation
Weeks 7–8Implement the prioritized control set from the risk assessment: data provenance logging, model versioning with cryptographic signing, audit log infrastructure (immutable append-only log, 10-year retention for EU high-risk), input/output validation, bias monitoring dashboards, and explainability APIs. Automate controls in CI/CD where possible.
Weeks 9–10: Model Documentation
Weeks 9–10Produce a Model Card for every production model (intended use, out-of-scope uses, training data summary, evaluation results by subgroup, known limitations, ethical considerations). Produce Technical Documentation per EU AI Act Annex IV: system description, design specifications, validation datasets, performance metrics, and human oversight measures.
Weeks 11–12: Audit Readiness & Incident Response
Weeks 11–12Assemble the compliance dossier: AI inventory, policies, risk assessments, FRIA/DPIA, model cards, technical documentation, audit logs, testing results. Draft AI Incident Response Playbook: detection criteria, severity levels, internal escalation, customer notification triggers, and regulatory notification procedure (EU AI Act Article 62: serious incidents within 15 days). Run a tabletop exercise.
10 years
Audit log retention
15 days
Incident reporting
€20M / 4%
GDPR fines
ε ≤ 10
Privacy budget
Checklist
- Maintain an AI system inventory with EU AI Act risk tier classification and review date.
- Assign an AI Owner for every production system with explicit accountability in the risk register.
- Version-control all model artifacts with cryptographic hashes and sign with cosign or equivalent.
- Retain audit logs for automated decision-making for the legally required period (10 years for EU AI Act high-risk).
- Conduct bias and fairness evaluation on all protected attributes before every model release.
- Implement human override capability for all binding AI decisions affecting individuals.
- Test prompt injection defenses quarterly; document results in a red-team report.
- Run DP-SGD (or equivalent) for any model trained on health, financial, or biometric data.
- File DPIA updates whenever material changes are made to data processing or model behavior.
- Conduct annual third-party security assessment of the ML infrastructure and model serving stack.
Related posts
AI Governance and Regulations: From EU AI Act to ISO 42001
AI governance is the moment the story meets law: models leave the lab and enter a world of risk tiers, audits, and named obligations. This guide maps the major frameworks and what they require teams to actually build.
12 min readOperating AI in Regulated Environments: HIPAA, GDPR, PCI DSS & Beyond
The moment an AI system touches health, payment, or EU personal data, architecture turns into compliance choreography. This guide translates the major regulations into the engineering artifacts and process controls they demand.
18 min readOWASP Top 10 for LLM Apps: Real Attacks, Real Fixes
For LLM apps, the attack often arrives as plain language rather than obviously malicious code. This guide walks through the OWASP risks as real failure stories, then shows the concrete controls that stop them.
16 min read