Security & Compliance Standards for AI Systems
Back to blog
AI SecurityAI Governance

Security & Compliance Standards for AI Systems

AI security begins where ordinary app security stops: the attack can be a dataset, a gradient, or a paragraph that looks harmless. This guide maps that wider threat surface and the controls regulated teams need.

14 min readMarch 18, 2026
SecurityComplianceISO 42001NIST AI RMFEU AI ActMLSecOpsAdversarial ML

Threat landscape

AI introduces a new class of attacks traditional security cannot see

A well-hardened web application with zero CVEs can still be trivially compromised through its ML model. AI systems extend the classical threat model with four novel attack surfaces: the training data pipeline (poisoning), the model weights (inversion and extraction), the inference endpoint (adversarial inputs and prompt injection), and the supply chain (compromised model artifacts). Each attack surface requires controls that sit outside the scope of a standard OWASP or NIST CSF program. The defining characteristic of AI threats is that the attack payload is data — a carefully crafted input or a subtle corruption in a dataset — not code, making them invisible to firewalls, IDS systems, and static analysis tools.

Training-time

Data Poisoning

An attacker injects malicious examples into training data to degrade model accuracy or embed a backdoor (trigger pattern → attacker-controlled output). Particularly dangerous in federated learning and models trained on scraped web data. Mitigate with data provenance tracking, statistical anomaly detection on training distributions, and differential privacy.

Privacy attack

Model Inversion

An adversary queries a model repeatedly to reconstruct sensitive training data — for example, recovering faces from a facial recognition model's confidence scores. Poses major GDPR risk for models trained on PII. Mitigate with output perturbation, confidence score truncation, and differential privacy during training.

Inference-time

Adversarial Examples

Imperceptible perturbations to inputs that cause catastrophic misclassification. Transferable across model architectures. High-stakes in vision systems (autonomous vehicles, medical imaging) and speech recognition. Mitigate with adversarial training, input preprocessing (feature squeezing, JPEG compression), and ensemble defenses.

LLM attack

Prompt Injection

Malicious instructions embedded in user content or retrieved documents override the system prompt, hijacking the LLM's behavior. Direct injection: user input contains "Ignore previous instructions." Indirect injection: a retrieved web page contains hidden instructions. Mitigate with strict prompt delimiters, input/output guardrails, and privilege separation between system and user context.

IP theft

Model Extraction

An attacker queries a production model to train a functional clone — stealing IP and bypassing access controls on the original. A stolen clone can then be used for white-box adversarial attacks. Mitigate with query rate limiting, output watermarking, and anomaly detection on query patterns.

Supply chain

Supply Chain Attacks

Compromised pre-trained models, poisoned pip packages, or backdoored model checkpoints distributed via public registries (HuggingFace, PyPI). A trojan in a base model persists through fine-tuning. Mitigate with model provenance verification, cryptographic signing of artifacts, and SBOM (Software Bill of Materials) for AI.

Regulatory landscape

Four compliance frameworks every AI team needs to map

Regulatory pressure on AI systems has converged on four dominant frameworks in 2024–2026: the EU AI Act (binding law), NIST AI RMF (US federal guidance), ISO/IEC 42001 (certifiable management system), and SOC 2 with AI-specific trust service criteria. Each framework has a different scope, legal force, and target audience. For most organizations shipping AI into regulated markets, compliance is a multi-framework exercise — not a single checkbox.

1

EU AI Act — Risk-tiered regulation (2024)

Binding — Aug 2026

Binding EU law classifying AI systems by risk: Unacceptable (banned, e.g., social scoring), High (regulated: healthcare, hiring, critical infrastructure — requires conformity assessment, CE marking, human oversight, explainability, audit logs retained 10 years), Limited (transparency obligations), Minimal (unregulated). GPAI (General Purpose AI) models with >10²⁵ FLOPs training compute face additional systemic risk obligations. Penalties: up to €35M or 7% of global revenue.

2

NIST AI RMF — Risk management framework (2023)

Voluntary

US voluntary framework for managing AI risk across four functions: GOVERN (policies, roles, culture), MAP (risk identification and context), MEASURE (risk analysis and evaluation), MANAGE (risk treatment and monitoring). Closely aligned with NIST CSF 2.0. Required for US federal AI procurement under EO 14110. Not legally binding but de facto standard for US government contractors.

3

ISO/IEC 42001:2023 — AI management system standard

Certifiable

Certifiable management system standard for AI (parallel to ISO 27001 for infosec). Requires an AI policy, risk assessment process, impact assessments for high-risk uses, and continual improvement cycle. Provides third-party certification pathway. Particularly valued for B2B enterprise procurement and demonstrating due diligence. Compatible with and maps to EU AI Act requirements.

4

SOC 2 + AI Trust Services (emerging)

Market-driven

Traditional SOC 2 Trust Service Criteria (Security, Availability, Confidentiality, Processing Integrity, Privacy) extended with AI-specific controls: training data integrity, model versioning and change management, bias monitoring, explainability for automated decisions, and AI incident response. Growing demand from enterprise SaaS customers deploying AI features.

7% revenue

EU AI Act penalty

Certifiable

ISO 42001

4 functions

NIST AI RMF

8 domains

High-risk sectors

Secure development

MLSecOps: embedding security into every stage of the AI lifecycle

Secure AI development requires security controls at every phase of the ML lifecycle — not just at the API perimeter. MLSecOps extends DevSecOps principles to the unique artifacts of AI: datasets, feature stores, model weights, training pipelines, and inference endpoints. The goal is to make security a property of the system, not an audit at the end.

1

Data Security & Provenance

Phase 1: Data

Maintain cryptographic hashes and provenance records for all training datasets. Enforce data access controls with column-level RBAC. Scan ingested data for PII (using tools like Presidio or AWS Comprehend) and apply pseudonymization or anonymization before training. Log all data access for audit trails. Apply differential privacy (ε-DP) for models trained on sensitive data.

2

Threat Modeling & Red-Teaming

Phase 2: Design

Conduct AI-specific threat modeling using STRIDE adapted for ML: Spoofing (adversarial inputs), Tampering (data/model poisoning), Repudiation (missing audit logs), Information disclosure (model inversion), Denial of service (model extraction), Elevation of privilege (prompt injection bypassing access controls). Conduct structured red-teaming sessions before production deployment — especially for LLM features.

3

Secure Training Pipelines

Phase 3: Training

Run training workloads in isolated compute environments (VPC, no internet access). Use immutable container images with pinned base layers. Scan all pip dependencies with pip-audit or Safety. Enforce model artifact signing (cosign) and store checksums in a tamper-evident log. Restrict write access to feature stores and model registries to CI/CD service accounts only.

4

Model Evaluation & Bias Auditing

Phase 4: Evaluation

Evaluate models on fairness metrics disaggregated by protected attributes before every release. Use IBM Fairness 360 or Google's What-If Tool for audit. Document evaluation results in a Model Card. For EU AI Act high-risk systems, conduct a formal Fundamental Rights Impact Assessment (FRIA). Log all evaluation results and retain for the duration required by applicable regulation.

5

Secure Inference & Runtime Guards

Phase 5: Deployment

Deploy inference behind an API gateway with authentication, rate limiting, and query anomaly detection. Apply input validation (schema enforcement, length limits, character filtering for LLMs). Implement output scanning for PII, harmful content, and hallucination indicators. For LLMs: enforce system prompt isolation and use a privilege-separation architecture that prevents user-supplied content from accessing system-level context.

6

Monitoring & Incident Response

Phase 6: Operations

Monitor for data drift (feature distribution shift signals dataset poisoning), prediction drift (accuracy degradation signals adversarial activity), and anomalous query patterns (high-volume similar queries signals extraction attempts). Define AI-specific incident playbooks: model rollback procedure, customer notification triggers, regulator reporting requirements under EU AI Act Article 62 (serious incident reporting within 15 days).

Privacy-preserving ML

Differential privacy: the mathematical guarantee behind data anonymization

Differential privacy (DP) provides a mathematically rigorous guarantee that no individual's data has a significant effect on the model's output — making it impossible for an adversary to determine with confidence whether any specific individual was in the training set. It is the strongest privacy guarantee available for ML and is required for certain regulated data categories (health records, financial data) under GDPR and HIPAA.

Algorithm

DP-SGD (TensorFlow Privacy)

Differentially private SGD: clip per-sample gradients to norm C, add Gaussian noise N(0, σ²C²I), then average. Implemented in TF Privacy and Opacus (PyTorch). Track privacy budget with the moments accountant or Rényi DP composition.

Architecture choice

Local vs. Central DP

Local DP: each user randomizes their data before sending (used by Apple, Google for telemetry). Central DP: trusted curator adds noise to aggregate statistics or model updates (DP-SGD). Central DP offers better utility at same privacy budget; local DP removes need to trust the server.

Regulatory alignment

GDPR & HIPAA alignment

DP provides a technical basis for the GDPR "anonymization" exception (Recital 26) — enabling data processing without consent for anonymized datasets. Under HIPAA, DP-trained models may qualify for the Safe Harbor de-identification standard if ε is sufficiently small and no re-identification risk remains.

M is (ε,δ)-DP if: Pr[M(D)S]eεPr[M(D)S]+δM \text{ is } (\varepsilon, \delta)\text{-DP if: } \Pr[M(D) \in S] \leq e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta

For any two adjacent datasets D and D' differing in one record, and any output set S, the mechanism M satisfies (ε,δ)-differential privacy. ε (epsilon) is the privacy budget — smaller ε means stronger privacy. δ is the failure probability. DP-SGD, the standard algorithm for DP training, clips per-sample gradients and adds calibrated Gaussian noise at each step.

σC2ln(1.25/δ)εT\sigma \geq \frac{C \cdot \sqrt{2 \ln(1.25/\delta)}}{\varepsilon \cdot \sqrt{T}}

Minimum noise multiplier σ for DP-SGD to achieve (ε,δ)-DP over T training steps, where C is the gradient clipping norm. Privacy cost accumulates over training steps via the moments accountant. Practical tradeoff: ε ≈ 1–10 for production health/finance models; accuracy loss is typically 1–5% vs. non-private baseline on large datasets.

LLM-specific controls

Securing LLM applications: guardrails, sandboxing, and access control

LLM-powered applications face a uniquely broad attack surface because the model's input and output are natural language — harder to validate than structured API parameters. The OWASP Top 10 for LLM Applications (2025) identifies prompt injection as the #1 risk, followed by insecure output handling, training data poisoning, model denial of service, and excessive agency (agents with too many permissions). Each risk has specific engineering controls.

OWASP LLM #1

Prompt Injection Defense

Separate system and user context with structural delimiters (XML tags, special tokens). Never interpolate raw user input directly into system prompts. Use a privilege-separation model: system context has elevated trust; user context is zero-trust. Apply an input classifier that detects injection patterns before passing to the LLM. Log all system prompt access attempts.

OWASP LLM #2

Output Validation & Sandboxing

Never execute LLM-generated code directly. Run in an isolated sandbox (Firecracker microVM, E2B, Docker with seccomp) with no filesystem write access, no network access, and CPU/memory limits. Validate structured outputs (JSON, SQL) against a schema before use. Scan free-text outputs for PII using a dedicated classifier before surfacing to users.

OWASP LLM #8

Least-Privilege for Agents

Agentic LLMs should operate with minimal permissions. Apply OAuth scopes to tool access: the research agent gets read-only web search; the database agent gets a read-only replica. Require explicit human approval for write operations (email sending, database mutations, file deletion). Audit every tool call with actor identity, tool name, parameters, and response.

OWASP LLM #4

Rate Limiting & DoS Protection

LLM inference is expensive — a single user sending max-context prompts can exhaust a GPU cluster and create availability risk. Apply token-level rate limits (max tokens/min per user), context length caps, and queue depth limits. Detect and block extraction attack patterns (hundreds of structurally similar queries from one IP/user) before they reach the model.

Access control

Model Access Control

Treat model weights as sensitive IP: encrypt at rest (AES-256), restrict access to production serving infrastructure via IAM roles, log all weight downloads. For multi-tenant deployments, enforce tenant isolation at the inference layer — no shared KV cache across tenants, isolated VRAM regions for confidential model variants.

Trust & safety

Content Moderation Pipeline

Deploy a content moderation layer in front of and behind the LLM: input moderation (harmful request detection), output moderation (PII, CSAM, violence, IP leakage). Use lightweight classifier models (DistilBERT-scale) for latency-sensitive paths. Log moderation decisions with policy version for audit and model improvement.

Implementation roadmap

From framework to artifact: a 12-week compliance implementation plan

Compliance frameworks describe what to achieve — this section maps each requirement to a concrete engineering or organizational artifact you can deliver, version-control, and present to an auditor. The sequence below works for ISO 42001 certification and maps substantially to EU AI Act conformity requirements for high-risk systems.

1

Weeks 1–2: Inventory & Classification

Weeks 1–2

Catalog every AI system in production or development. Classify each by EU AI Act risk tier and NIST AI RMF impact category. Document use case, data inputs, decision type (advisory vs. binding), affected population, and applicable regulations. This inventory is the foundation artifact for every subsequent compliance step.

2

Weeks 3–4: Policy & Governance Framework

Weeks 3–4

Draft an AI Policy (ISO 42001 Clause 5.2): objectives, scope, roles (AI Owner, Data Steward, CISO, DPO). Establish an AI Risk Committee with defined meeting cadence and escalation paths. Define the AI lifecycle process: ideation → data assessment → development → red-teaming → staged rollout → monitoring → decommission.

3

Weeks 5–6: Risk Assessment & FRIA

Weeks 5–6

For each high-risk system, conduct an AI Risk Assessment (likelihood × impact for each threat category) and a Fundamental Rights Impact Assessment (FRIA). Document residual risks and accepted risk register. Map risks to technical controls. For GDPR-sensitive systems, conduct or update the Data Protection Impact Assessment (DPIA).

4

Weeks 7–8: Technical Controls Implementation

Weeks 7–8

Implement the prioritized control set from the risk assessment: data provenance logging, model versioning with cryptographic signing, audit log infrastructure (immutable append-only log, 10-year retention for EU high-risk), input/output validation, bias monitoring dashboards, and explainability APIs. Automate controls in CI/CD where possible.

5

Weeks 9–10: Model Documentation

Weeks 9–10

Produce a Model Card for every production model (intended use, out-of-scope uses, training data summary, evaluation results by subgroup, known limitations, ethical considerations). Produce Technical Documentation per EU AI Act Annex IV: system description, design specifications, validation datasets, performance metrics, and human oversight measures.

6

Weeks 11–12: Audit Readiness & Incident Response

Weeks 11–12

Assemble the compliance dossier: AI inventory, policies, risk assessments, FRIA/DPIA, model cards, technical documentation, audit logs, testing results. Draft AI Incident Response Playbook: detection criteria, severity levels, internal escalation, customer notification triggers, and regulatory notification procedure (EU AI Act Article 62: serious incidents within 15 days). Run a tabletop exercise.

10 years

Audit log retention

15 days

Incident reporting

€20M / 4%

GDPR fines

ε ≤ 10

Privacy budget

Checklist

  • Maintain an AI system inventory with EU AI Act risk tier classification and review date.
  • Assign an AI Owner for every production system with explicit accountability in the risk register.
  • Version-control all model artifacts with cryptographic hashes and sign with cosign or equivalent.
  • Retain audit logs for automated decision-making for the legally required period (10 years for EU AI Act high-risk).
  • Conduct bias and fairness evaluation on all protected attributes before every model release.
  • Implement human override capability for all binding AI decisions affecting individuals.
  • Test prompt injection defenses quarterly; document results in a red-team report.
  • Run DP-SGD (or equivalent) for any model trained on health, financial, or biometric data.
  • File DPIA updates whenever material changes are made to data processing or model behavior.
  • Conduct annual third-party security assessment of the ML infrastructure and model serving stack.