Day 1 · Student handout

AI Gateway Architecture Evidence

Day 1 turns a model-centric LLM demo into a system-centric enterprise AI architecture exercise. The learner produces architecture evidence before Day 2 implementation.

June 2026 Published student handout ai-systems-engineering-handbook

Overview

A demo that calls an LLM API is not the same as an enterprise AI system.

Enterprise AI delivery is proven by a system package with architecture, governance, deployment, security, validation, and customer-delivery evidence, not by a model demo alone.

Learning outcomes

Distinguish a model demo, AI application, AI system, and enterprise-deliverable AI system.
Read an HTTP/JSON AI request as a system contract with identity, route, action, resource, environment, and trace fields.
Explain why AI Gateway is the control plane for identity, policy, routing, quota, audit, review, and model-serving boundaries.
Map free text into structured action proposals that still pass schema validation, policy evaluation, tool brokering, and audit.
Design a request lifecycle, component responsibility table, and risk-control map for a public-safe enterprise scenario.

Gateway questions

The first move is to read one AI interaction as a system request, not as a chat transcript. A deliverable architecture must explain who sent the request, what the caller may access, which agent can handle the task, which data and tools are allowed, and which audit record proves the lifecycle later.

Who sent the request?
What can this user access?
Which agent is allowed to handle the task?
Which data sources can the agent retrieve?
Which tools can the agent call?
Which tool calls create side effects?
Which output checks run before the answer returns?
Which actions require human review?
Which audit record proves the request lifecycle later?

HTTP and JSON contract

The gateway treats an HTTP request as one AI task entering the system. The request body is more than a user message; it carries trusted server-side identity, role, requested agent, requested tools, task type, metadata, and trace context.

Client-provided fields are useful hints, but the gateway resolves identity, permission, agent scope, policy, and allowed tools from trusted sources.

HTTP gives a shared boundary for web apps, mobile apps, Slack bots, webhooks, load balancers, security tools, and logs.
JSON gives the gateway inspectable fields for policy, schema validation, tool contracts, and audit.
Status codes distinguish malformed input, missing login, denied access, rate limits, service failures, and successful review states.

Serverless is hosting, not trust

For Day 1, serverless API is treated as a hosting pattern for a trusted gateway handler. It changes the execution model, but it does not remove backend responsibility.

The handler still verifies tokens, resolves permissions, validates schemas, evaluates policy, protects secrets, writes audit events, and returns explicit status or review state.

Serverless API fits short gateway entrypoints, webhooks, audit writes, job creation, and lightweight automation.
Containers, Kubernetes, and managed services fit long-running gateway cores, streaming sessions, memory services, and GPU inference.
Mature enterprise AI systems usually use both.

Free text becomes actions

Human input can remain natural language, but gateway decisions require structured data. An LLM may propose intent, slots, and actions; it must not replace the policy engine.

A useful gateway produces multi-label intent labels, action candidates, risk labels, missing slots, ambiguity signals, and a recommended next step.

Low-risk, high-confidence requests can execute read-only actions.
Low-risk, low-confidence requests should ask one minimal clarification question.
High-risk, high-confidence requests should create a draft or preview before confirmation.
High-risk, low-confidence requests should clarify, deny, or escalate to human review.

Model serving behind the gateway

vLLM and SGLang are model-serving engines in the inference data plane. They load model weights, manage batching, KV cache, streaming, latency, and GPU memory.

The AI Gateway remains the control plane in front of serving: identity, role, permission, quota, policy, retrieval boundaries, guardrails, audit, and review.

vLLM is a strong first tool for general OpenAI-compatible local or private serving.
SGLang is a strong candidate for structured generation, prefix-heavy prompts, and complex LLM workflows.
Neither engine replaces enterprise gateway governance.

Submission packet

The Day 1 artifact is reviewable architecture evidence, not a finished backend. A good packet makes the control boundaries inspectable before implementation starts.

Architecture diagram.
Component responsibility table.
Request lifecycle.
Risk-control map.
Prompt-only governance critique.

Request lifecycle template

Client sends `POST /gateway/requests` with a JSON body.
Gateway route receives the request and calls the handler.
Handler creates `trace_id`.
Gateway authenticates caller.
Gateway resolves trusted identity, role, permissions, and agent scope.
Gateway validates schema and normalizes free text or form hints into actions.
Gateway classifies task risk and evaluates policy.
Gateway selects an agent from registry.
Connector filters data by permission and metadata.
RAG returns allowed source IDs and active document versions.
Model generates response from allowed context.
Tool broker validates schema, permission, timeout, and side effects.
Review-required actions enter human review; denied actions are not executed.
Audit log records trace, policy, sources, tools, guardrail, review, and outcome.
Server returns HTTP status plus JSON response or review status.

Core vocabulary

AI Gateway: Unified AI request entrypoint for routing, policy, data, tools, guardrails, audit, and review.
Policy gate: The decision point that returns allow, deny, or review_required from structured input.
Tool broker: The enforcement point for tool schemas, permissions, side effects, timeouts, approval, and audit.
Model serving engine: The inference layer, such as vLLM or SGLang, that runs model requests efficiently behind the gateway.
Audit log: Lifecycle evidence that records identity, role, policy, source IDs, tool decisions, guardrails, review state, and outcome.

Risk-control map

Prompt injection -> retrieval filter, instruction hierarchy, output guardrail, and red-team test log.
PII leakage -> PII detector, masking, log minimization, and masked audit event.
Tool abuse -> tool broker, schema validation, approval gate, and tool decision log.
Permission bypass -> RBAC, metadata filtering before retrieval, and policy decision log.
Missing audit trail -> trace ID, source IDs, audit schema, and complete audit event.

Worksheet prompts

Fill the HTTP method, route path, authentication signal, input mode, raw message, controlled hints, trusted server-side fields, requested agent, read-only tool, side-effect tool, and audit fields.
Normalize the request into trace, channel, actor, task, requested actions, environment, and policy inputs.
Choose one public-safe scenario such as campus IT helpdesk, bank internal knowledge assistant, medical intake support, or manufacturing audio monitoring.
Write one allow example, one deny example, and one review_required example.

Day 1 submission

AI Gateway architecture diagram.
Component responsibility table.
Request lifecycle with 10-15 steps.
Risk-control map.
One paragraph explaining why prompt-only governance is insufficient.

Next gate

Day 2 uses the Day 1 gateway lifecycle as the control surface for agent registration, tool/data/memory boundaries, policy gates, audit events, and red-team seeds.

Source boundary

The website publishes the student-facing learning path and public-safe summaries. The handbook repo remains the canonical home for worksheets, instructor guides, rubrics, reference answers, handoffs, and detailed source packages.

Canonical source: accelerators/enterprise-ai-architecture-sprint/day-01-ai-gateway/student-handout.md