Back to research overview
Research direction

Security and High-Stakes Deployment

Studying privacy, leakage, adversarial risk, and governance constraints that shape AI systems used in regulated or security-sensitive environments.

SecurityPrivacyDeployment

Current thesis

High-stakes AI deployment is mainly an engineering and governance problem. Model quality matters, but the decisive questions are about threat models, access boundaries, logging, rollback, data stewardship, and the human ability to notice and contain failure.

2026 perspective

In 2026, the serious question is not whether an AI feature can ship. It is whether it can fail safely, be audited under pressure, and remain governable across vendors, updates, and adversarial use.

Human factors lens

Security controls only work when operators can tell what state the system is in, what the safe next action is, and who owns the next decision. Human-factors engineering is part of the defense surface.

Why this matters now

Guidance from CISA, NSA, and partner agencies on deploying AI securely, first released on April 15, 2024, is still being operationalized in real systems, while the international Guidelines for Secure AI System Development keep pressure on teams to treat the full lifecycle as part of the attack surface. NIST's 2025 adversarial machine learning taxonomy further sharpens the threat vocabulary around model evasion, extraction, and poisoning. The practical lesson is straightforward: security cannot be bolted onto a model endpoint after procurement. It has to be designed through architecture, operations, and user-facing defaults from the start.

Current priorities

  • Make threat models explicit across model, data, retrieval, tool, and vendor layers instead of treating security as a generic risk checkbox.
  • Use least privilege, segmentation, and isolated execution for sensitive actions or external integrations.
  • Instrument logs and detection for misuse, drift, prompt-based attacks, and evidence loss.
  • Couple governance artifacts with runtime controls, not policy documents alone.

System design principles

Secure-by-default architectures

Default settings, permissions, and data paths should assume non-expert operators and common attacker behavior.

Lifecycle threat modeling

Reassess risks at design, procurement, deployment, maintenance, and retirement rather than only during launch.

Auditable operations

Keep logs, versioning, and evidence retention in forms incident responders and compliance teams can actually use.

Containment over optimism

Design for partial compromise, rollback, isolation, and fail-safe degradation instead of assuming the model will stay well behaved.

Human factors and review design

  • Alerts should reflect operational consequence, not just anomaly score, so teams can triage under pressure.
  • Approval paths need clear ownership when AI outputs influence regulated, clinical, or security-sensitive actions.
  • Audit trails must be searchable and comprehensible to mixed audiences: engineers, risk owners, investigators, and reviewers.
  • Secure defaults matter because busy users rarely improve risky configurations after deployment.

Evaluation agenda

Adversarial resilience

Probe prompt injection, data poisoning, extraction, model evasion, and tool misuse under realistic attack paths.

Privacy and leakage

Test whether models, logs, and retrieval layers expose sensitive data directly or through inference.

Operational response

Measure detection latency, containment speed, rollback quality, and evidence availability during incidents.

Governance fit

Check whether controls, documentation, and review points satisfy the actual regulatory or institutional workflow.

Open questions

  • How do we evaluate safe failure behavior when the system has tool use, retrieval, and external integrations?
  • What should a minimum viable audit trail contain for regulated or security-sensitive AI workflows?
  • How can teams compare vendor systems when critical security properties are distributed across model, platform, and operations?
  • Which deployment controls genuinely reduce risk, and which ones only create paper compliance?

Signals shaping this direction