AWS AgentCore Interview Questions: Junior, Mid, and Senior Answers Grounded in Real Production Incidents and the 2024-2026 Bedrock Stack
AWS Bedrock AgentCore interview questions in 2026 test three things in parallel: your ability to name all 10 platform components and explain what each does in production (Runtime, Memory, Identity, Browser, Code Interpreter, Gateway, Observability, Evaluations, Registry, Policy), your command of the architectural distinction between AgentCore Runtime and Lambda that most junior candidates state backward, and your recall of the named production incidents senior interviewers probe by date and mechanism — the October 2025 us-east-1 DynamoDB cascade, the Asana MCP cross-tenant data leak (May 2025), the Unit42 DNS-based sandbox bypass (April 7, 2026), and the Unit42 IAM God Mode role-chaining exploit (April 6, 2026).
- What is AWS Bedrock AgentCore, and how does it relate to Bedrock the broader service?
- Walk me through the AgentCore components. Why is each one a separate service?
- When would you choose AgentCore over rolling your own agent runtime on Lambda + DynamoDB?
- How does AgentCore relate to MCP? Walk me through Gateway.
- What’s the difference between Bedrock and AgentCore?
- Walk me through how you’d deploy an agent to AgentCore Runtime.
- Why use AgentCore Memory instead of just writing state to DynamoDB?
- When would you use the Browser tool vs Code Interpreter?
- What is AgentCore Evaluations and why does it matter?
- Walk me through how AgentCore Identity handles OAuth2 to Slack.
- How would you architect multi-tenant isolation through AgentCore Gateway?
- What custom metrics + traces would you instrument on an AgentCore agent?
- What is AgentCore Policy, and how is Cedar different from IAM JSON policies?
- How would you provision an AgentCore agent via CloudFormation?
- How would you design your AgentCore deployment to survive a regional cascading outage like Oct 2025 us-east-1?
- Walk me through the Asana MCP cross-tenant data leak (May 2025) — what does it teach about AgentCore Gateway architecture?
- What did Unit42 disclose about AgentCore Code Interpreter (April 7, 2026), and how would you defend against it?
- What was Unit42’s IAM God Mode disclosure (April 6, 2026), and what hardening would you apply?
- AgentCore Runtime sometimes silently restarts sessions. How do you defend a long-running workflow?
- When would you choose AgentCore Runtime over rolling your own agent on Lambda?
AWS AgentCore Hiring in 2026: What Actually Changed
This guide is for software engineers, ML engineers, AI engineers, and cloud architects targeting AWS solutions architect — AI/ML roles, agentic AI engineer positions at AWS-shop startups, Bedrock specialist roles at partner consulting firms (Slalom, Deloitte, Accenture), or AgentCore platform engineer seats at enterprises building production agent infrastructure on Bedrock. It is not for AWS certification cram or “become an AWS architect in 30 days” bootcamp content.
Structural shifts in the 2024-2026 hiring cycle worth internalizing:
- AgentCore launched in preview on July 16, 2025 at AWS Summit New York, reached GA on October 13, 2025 in 9 AWS regions, and expanded at re:Invent in December 2025, adding Policy (Cedar guardrails), Evaluations (13 built-in evaluators), episodic memory, and bidirectional streaming. Conflating the preview and GA dates during system design questions costs credibility immediately.
- MCP integration is now expected at every senior interview. AgentCore Gateway converts REST APIs, Lambda functions, and OpenAPI specs into MCP-compatible tools agents can discover and call. Treating MCP as optional signals you are behind.
- Multi-agent patterns are table stakes. AWS released sample repos with Strands architecture; Agent-to-Agent (A2A) protocol support was confirmed at GA. Interviewers probe agent-to-agent discovery, trust models, and Registry-based tool federation.
- CloudFormation support has been available since GA. AgentCore is fully provisionable as IaC. Senior interviews include “walk me through your CloudFormation template” as a qualifier.
- The platform shift — from rolling your own on Lambda + DynamoDB + Cognito to managed Bedrock-based agent runtime — is what interviewers test when they ask architecture tradeoff questions.
- Security scrutiny has arrived. Unit42’s back-to-back April 2026 disclosures — DNS-based sandbox bypass in Code Interpreter (April 7) and IAM role-chaining God Mode (April 6) — signal that AgentCore has reached the enterprise threshold where adversarial researchers start probing systematically.
Salary signal (industry-reported aggregator data, hedged): AWS AI/ML solutions architects with AgentCore production experience are reported in the $170K–$280K total-compensation range in US markets; AWS partner consulting roles carry a 20–40% premium over equivalent in-house positions at similar seniority.
What AWS AgentCore Interviews Actually Test in 2026
Interviewers at AWS-shop enterprises, consulting partners, and AI-native startups cycle through four question types:
- AgentCore component knowledge: Name all 10 components, explain what each does operationally, and distinguish GA components from those added at re:Invent (Policy, Evaluations). “I know the seven main ones” is a junior-tier signal.
- Architecture tradeoffs: AgentCore Runtime vs Lambda (stateful vs stateless; 8 hours vs 15 minutes; ARM64 container images only vs zip or container packaging; a dedicated microVM per session vs execution environments reused across invocations). Senior interviewers want mechanism-level answers, not “AgentCore handles it for you.”
- Named-incident postmortems: The five incidents — us-east-1 DynamoDB cascade (October 19–20, 2025), Asana MCP cross-tenant leak (May 2025), Unit42 sandbox bypass (April 7, 2026), Unit42 IAM God Mode (April 6, 2026), and AgentCore silent session restart (November 2025) — are architecture probes dressed as war stories. Knowing the date and mechanism is the threshold; mapping the mitigation to your system design is what scores.
- IAM + Identity + multi-tenant isolation: AgentCore Identity adds an outbound token vault (KMS-encrypted OAuth2 for Slack, Google Drive, Salesforce), an inbound auth layer (SigV4, JWT, No-Auth), and enterprise IdP integration (Cognito, Okta, Microsoft Entra). The multi-tenant Gateway isolation question is the most common senior system design probe in AgentCore interviews as of 2026.
What is AWS Bedrock AgentCore, and how does it relate to Bedrock the broader service?
Concept: Platform positioning and scope | Difficulty: Foundational | Stage: Phone screen / intro technical
Amazon Bedrock AgentCore is a suite of 10 fully managed infrastructure services for building, deploying, and scaling AI agents in production: Runtime, Memory, Identity, Browser, Code Interpreter, Gateway, Observability, Evaluations, Registry, and Policy. Bedrock is the broader managed service for foundation models and AI tooling; AgentCore is specifically the agent execution and management layer within it. A useful analogy from the AWS re:Post community: Bedrock Agents (the managed, configuration-based service) is like Fargate — click-through setup, limited customization. AgentCore is like EKS — flexible infrastructure for BYO-framework agents (LangChain, LangGraph, CrewAI, or custom). AgentCore also supports any LLM, not just Amazon Bedrock-hosted models.
Walk me through the AgentCore components. Why is each one a separate service?
Concept: Component architecture and separation of concerns | Difficulty: Foundational | Stage: Technical screen
There are exactly 10 AgentCore components — knowing all 10, including the two added at re:Invent December 2025, is a threshold test. Each solves a problem production teams otherwise build from scratch: Runtime — stateful agent execution in isolated microVMs; Memory — persistent state across sessions; Identity — inbound and outbound agent authentication; Browser — managed headless Chromium; Code Interpreter — sandboxed code execution; Gateway — MCP-native tool federation; Observability — CloudWatch + X-Ray integration; Evaluations (December 2025) — automated quality scoring with 13 built-in evaluators; Registry — agent and tool catalog for A2A discovery; Policy (December 2025, Cedar) — deterministic guardrails on tool calls. The interviewer wants all 10 and the separation-of-concerns reasoning — not a list of six.
When would you choose AgentCore over rolling your own agent runtime on Lambda + DynamoDB?
Concept: Build-vs-buy architectural judgment | Difficulty: Foundational | Stage: Technical screen / system design
Choose AgentCore when your agent needs stateful session continuity — context carried across multiple LLM calls without manually serializing to DynamoDB on every invocation. Rolling your own means owning session-state serialization, idle-timeout handling, per-tenant isolation, JWT auth middleware, MCP tool federation, and observability. AgentCore absorbs all of it. Choose rolling-your-own when compute is genuinely short and stateless, cost constraints make the per-session model uncompetitive at your scale, or you are in a region or compliance environment where AgentCore is not yet available. The meta-answer: “always use AgentCore” and “always use Lambda” are both non-answers — the choice depends on state requirements, execution duration, and operational complexity tolerance.
How does AgentCore relate to MCP? Walk me through Gateway.
Concept: MCP integration and tool federation | Difficulty: Foundational | Stage: Technical screen
AgentCore Gateway is an MCP-compatible tool gateway: it converts REST APIs, Lambda functions, and OpenAPI specs into discoverable MCP tools that agents can invoke at runtime. Gateway handles inbound auth (IAM SigV4, JWT, No-Auth) and manages four operational concerns: Tool Integration & Modeling, Discovery & Invocation, Security & Identity, and Operations & Governance. An agent on AgentCore Runtime calls Gateway to discover tools, retrieve their schemas, and invoke them over MCP’s standardized JSON-RPC interface, with no hardcoded API clients. The key architectural point: tools in Gateway are reusable across multiple agents; tools bundled inside a Runtime container are not. Senior interviewers probe whether you understand that decoupling, and when centralized Gateway governance is worth the extra network hop compared with tightly coupling tools inside the container.
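To make “discover, retrieve schemas, invoke” concrete, here is a minimal sketch of the JSON-RPC 2.0 request bodies an MCP client sends. The method names `tools/list` and `tools/call` come from the MCP specification; the tool name `get_order_status` and its arguments are hypothetical, and the transport, Gateway URL, and auth headers are deliberately left out.

```python
from __future__ import annotations

import itertools
import json
from typing import Any

# Monotonic request ids; JSON-RPC 2.0 uses them to match responses to requests.
_ids = itertools.count(1)


def mcp_request(method: str, params: dict[str, Any] | None = None) -> dict[str, Any]:
    """Return a JSON-RPC 2.0 request object for an MCP method."""
    req: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        req["params"] = params
    return req


def list_tools() -> dict[str, Any]:
    # Discovery: ask the server (here, a Gateway target) which tools it exposes.
    return mcp_request("tools/list")


def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    # Invocation: call a discovered tool by name with schema-conformant arguments.
    return mcp_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(json.dumps(list_tools()))
    # Hypothetical tool, e.g. one Gateway derived from an OpenAPI spec.
    print(json.dumps(call_tool("get_order_status", {"order_id": "A-123"})))
```

Because the agent only ever builds these generic messages, swapping or adding a tool behind Gateway requires no new client code in the container.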
If your interviewer shifts from AgentCore specifics to the underlying protocol — asking about protocolVersion 2024-11-05, the JSON-RPC 2.0 handshake, or the Asana cross-tenant MCP leak — the MCP architecture interview prep for AI engineers and agent platform teams covers the Anthropic Nov 2024 spec, all three transport options (stdio/SSE/Streamable HTTP), server and client primitives, and the Wiz security briefing that defines the threat model AgentCore Gateway is designed to address. For enterprise shops evaluating which terminal-based coding agents to connect through AgentCore Gateway, the Codex CLI enterprise governance interview prep covering MCP centralization and audit trails covers how OpenAI Codex CLI connects to MCP-federated tool registries, the governance model for audit logging tool invocations, and the architectural tradeoffs between agent-local tool bundling and centralized Gateway enforcement that senior interviewers probe when AgentCore is the deployment platform.
AgentCore Components Reference (2026)
Interviewers want operational facts (what each component does in production), not marketing-language descriptions. The 10-component count matters: Policy and Evaluations were added at re:Invent December 2025, so a list that omits them reveals pre-re:Invent sources.
| Component | What It Does | AWS Service Equivalent | Common Interview Probe |
|---|---|---|---|
| Runtime | Serverless container hosting for AI agents; each session runs in a dedicated microVM; stateful-by-design with session continuity up to 8 hours; ARM64 Docker containers pushed to ECR | Lambda (but stateful, longer execution, microVM-isolated) | “Why isn’t this just Lambda?” |
| Memory | Two tiers: short-term (per-session events via create_event/list_events API) and long-term with three strategies: semantic (extracted facts), summary (condensed episodes), and user preference (preferences across sessions) | DynamoDB + S3 (managed) | “How would you design TTL for agent memory?” |
| Identity | Two auth directions: inbound (IAM SigV4, JWT/OAuth 2.0, No-Auth) and outbound (KMS-encrypted token vault with @requires_access_token decorator for OAuth2 to Slack, Google Drive, Salesforce) | IAM + Cognito + Okta + Microsoft Entra | “What’s the difference vs vanilla IAM?” |
| Browser | Managed headless Chromium service; integrates with Playwright and BrowserUse; supports Web Bot Auth for login-gated sites; available at GA (October 2025) | No direct AWS equivalent | “When would you use Browser vs Code Interpreter?” |
| Code Interpreter | Sandboxed Python, JavaScript, and TypeScript execution; pre-installed data science stack (NumPy, pandas, matplotlib, scipy); watermark billing model; Unit42 disclosed DNS-based sandbox bypass (April 7, 2026) | No direct AWS equivalent | “What did Unit42 disclose, and how would you defend against it?” |
| Gateway | MCP-native tool federation; converts REST APIs, Lambda functions, and OpenAPI specs into discoverable MCP tools; handles inbound auth; max 50 gateways / 100 tools per gateway per account | API Gateway (but MCP-native) | “How would you architect multi-tenant isolation through Gateway?” |
| Observability | OpenTelemetry (OTEL) compatible, CloudWatch-powered; three-level hierarchy: Sessions > Traces > Spans; captures reasoning steps, tool invocations, and model interactions | CloudWatch + X-Ray | “What custom metrics matter for agents?” |
| Evaluations | Automated agent quality scoring with 13 built-in evaluators; launched as preview at re:Invent December 2025; prevents “test in production” failure mode | No direct AWS equivalent | “Walk me through your eval methodology” |
| Registry | Centralized catalog for discovering and registering agents, MCP servers, and tools; enables agent-to-agent (A2A) discovery; NOT “Code Registry” — that term does not exist | ECR (for the Docker layer) | “What’s stored in Registry vs ECR?” |
| Policy | Cedar-language guardrails (same Cedar as AWS Verified Permissions) for deterministic control over tool calls; launched at re:Invent December 2025; the 10th component added to complete the platform | IAM policies (but Cedar-typed, deterministic) | “Cedar vs IAM JSON — when would you use each?” |
AgentCore Runtime vs Lambda: A Senior Interviewer’s Comparison Table
Candidates who reduce the distinction to “AgentCore is the managed option” will not pass senior screens. The differences are mechanical — state model, execution ceiling, and packaging format are categorically different.
| Dimension | Lambda | AgentCore Runtime |
|---|---|---|
| State model | Stateless-by-default; state must be externalized to DynamoDB, S3, or ElastiCache between invocations | Stateful-by-design — session continuity is built in; session state persists across LLM calls within a session without manual externalization |
| Max execution duration | 15 minutes hard cap | Up to 8 hours per session — necessary for multi-step agent workflows; supports real-time and asynchronous long-running jobs; payloads up to 100 MB |
| Packaging | Zip + Lambda layers OR container image (amd64 or arm64) | Docker image pushed to ECR, ARM64 only — x86 containers fail at launch; no zip/layers fallback; must expose port 8080 with /invocations and /ping endpoints |
| Deployment target | Lambda managed runtime environment | Serverless Runtime Environment with auto-scale on ARM64 Graviton microVMs; per-session microVM isolation |
| Billing model | Per-invocation + duration; charges during I/O wait | Consumption-based; no charge during I/O wait time (example: ~$0.0007625 per 2-minute session with 20% idle) |
| Ideal use case | Short stateless API handlers, event-driven glue, simple request/response pipelines | Long-running stateful agent workflows with session continuity, multi-step reasoning, MCP tool federation, and per-tenant microVM isolation requirements |
The modal junior answer — “AgentCore Runtime is just Lambda for agents” — misses the stateful + session-continuity + Docker-packaging story entirely. That distinction is the most-probed question at AgentCore-specific interviews in 2026. Practitioner rajmurugan’s dev.to guide frames it precisely: “AgentCore Runtime removes Lambda’s 15-minute limit, handles JWT validation natively, supports VPC networking, and provides SSE streaming — things Lambda cannot do for long-running agents.”
Junior-Tier Questions: AgentCore Concepts and Components (0–2 Years)
What’s the difference between Bedrock and AgentCore?
Concept: Service hierarchy and scope | Difficulty: Junior | Stage: Phone screen
Amazon Bedrock is the broader managed service layer for foundation models and AI tooling — it includes model inference, knowledge bases, guardrails, and the Bedrock Agents managed service. AgentCore is a subset of Bedrock specifically for agent execution infrastructure. The practical distinction: Bedrock Agents is a configuration-based service (think: console setup, action groups, prompt templates, no custom containers). AgentCore is flexible infrastructure — you bring your own framework (LangChain, LangGraph, CrewAI, or custom), package it as a Docker container, and deploy to AgentCore Runtime. The AWS re:Post community comparison captures it well: Bedrock Agents is like Fargate, AgentCore is like EKS. AgentCore also supports any LLM — not just Amazon Bedrock-hosted models — which Bedrock Agents does not offer with the same flexibility.
Walk me through how you’d deploy an agent to AgentCore Runtime.
Concept: Deployment workflow and container requirements | Difficulty: Junior | Stage: Technical screen
The deployment flow has four stages. First, write a Python entry point using the @app.entrypoint decorator on a BedrockAgentCoreApp instance (from bedrock_agentcore.runtime); this is where your agent logic lives. When app.run() is called, the SDK automatically serves /invocations (POST) and /ping (GET) on port 8080. Second, package it as a Docker image targeting ARM64; x86 containers fail at launch, and this is a common gotcha. Third, generate the Dockerfile and runtime configuration with the bedrock-agentcore-starter-toolkit (agentcore_runtime.configure() in its Python API). Fourth, call agentcore_runtime.launch(), which builds the image, pushes it to ECR, and deploys it to the Serverless Runtime Environment; AgentCore then handles auto-scaling, microVM isolation, and session routing. The /ping endpoint must report a status of “Healthy” or “HealthyBusy”; returning neither causes 504 Gateway Timeout errors. CloudFormation support for this full stack has been available since GA (October 2025).
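The /invocations and /ping contract is easier to reason about without the SDK in the way. The following stdlib-only sketch is not how BedrockAgentCoreApp is implemented; it assumes JSON request and response bodies and a `{"status": ...}` ping body, and `handle_agent` and the `busy` flag are hypothetical placeholders.

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

busy = False  # hypothetical flag: set True while a long-running task is in flight


def handle_agent(payload: dict) -> dict:
    # Hypothetical placeholder for real agent logic (framework call, tool use, etc.).
    return {"result": f"echo: {payload.get('prompt', '')}"}


def route(method: str, path: str, body: bytes) -> tuple[int, dict]:
    """Pure routing function, so the contract can be tested without a socket."""
    if method == "GET" and path == "/ping":
        # Report "Healthy" or "HealthyBusy"; anything else risks 504s per the text.
        return 200, {"status": "HealthyBusy" if busy else "Healthy"}
    if method == "POST" and path == "/invocations":
        try:
            payload = json.loads(body or b"{}")
        except ValueError:  # covers JSONDecodeError and bad UTF-8
            return 400, {"error": "invalid JSON"}
        if not isinstance(payload, dict):
            return 400, {"error": "expected a JSON object"}
        return 200, handle_agent(payload)
    return 404, {"error": "not found"}


class AgentHandler(BaseHTTPRequestHandler):
    def _respond(self, status: int, obj: dict) -> None:
        data = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        self._respond(*route("GET", self.path, b""))

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self._respond(*route("POST", self.path, self.rfile.read(length)))


def serve(port: int = 8080) -> None:
    # Listen on all interfaces; the runtime expects port 8080.
    HTTPServer(("0.0.0.0", port), AgentHandler).serve_forever()
```

Calling `serve()` inside the container listens on 0.0.0.0:8080; the pure `route()` function is what a local contract test should exercise before you push the image.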
Why use AgentCore Memory instead of just writing state to DynamoDB?
Concept: Memory abstraction and multi-tier state management | Difficulty: Junior | Stage: Technical screen
AgentCore Memory provides two tiers that DynamoDB does not model natively: short-term memory (create_event/list_events API for active session state) and long-term memory with three distinct strategies. The semantic strategy extracts factual knowledge from conversations and retrieves it by semantic similarity. The summary strategy stores condensed representations of conversation episodes, trading verbatim accuracy for token efficiency. The user preference strategy extracts and persists user-specific preferences across sessions. Rolling this on DynamoDB means you own the schema design, the TTL logic, the extraction pipeline for preference modeling, and the query patterns for each retrieval type; AgentCore Memory handles all of that as a managed service. The operational cost difference is real for teams scaling beyond a handful of agents. Per the AWS ML blog, December 2025 also added episodic memory (recall of past multi-step task episodes) as a preview feature, which DynamoDB alone cannot replicate without significant custom infrastructure.
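To see what “you own the TTL logic and query patterns” means in practice, here is a toy in-process model of the short-term tier. The method names mirror the create_event/list_events shape mentioned above, but this is not the AgentCore Memory API; the `Event` fields and `ttl_seconds` are assumptions, and the long-term extraction strategies are not modeled at all.

```python
from __future__ import annotations

import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    session_id: str
    role: str
    text: str
    created_at: float


@dataclass
class ShortTermMemory:
    """Hypothetical per-session event log with TTL (not the AgentCore Memory API)."""
    ttl_seconds: float
    clock: Callable[[], float] = time.time  # injectable for tests
    _events: dict[str, list[Event]] = field(
        default_factory=lambda: defaultdict(list), init=False)

    def create_event(self, session_id: str, role: str, text: str) -> Event:
        ev = Event(session_id, role, text, self.clock())
        self._events[session_id].append(ev)
        return ev

    def list_events(self, session_id: str, max_results: int = 10) -> list[Event]:
        # Expire lazily on read. (A DynamoDB TTL attribute instead deletes
        # expired items asynchronously, so reads must still filter them.)
        cutoff = self.clock() - self.ttl_seconds
        live = [e for e in self._events[session_id] if e.created_at >= cutoff]
        self._events[session_id] = live
        return list(reversed(live))[:max_results]  # newest first
```

Every line here (expiry semantics, ordering, pagination) is a design decision you would have to make, test, and operate yourself on DynamoDB.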
When would you use the Browser tool vs Code Interpreter?
Concept: Tool selection and appropriate abstraction | Difficulty: Junior | Stage: Technical screen
Use AgentCore Browser when the agent needs to interact with web content that requires a real browser — login-gated sites, JavaScript-rendered pages, form submissions, or any site where a headless Chromium session is necessary. Browser integrates with Playwright and BrowserUse and supports Web Bot Auth for authenticated access. Use Code Interpreter when the task is deterministic computation — data analysis, file parsing, statistical modeling, or programmatic generation. Code Interpreter provides a sandboxed Python, JavaScript, and TypeScript environment with a pre-installed data science stack (NumPy, pandas, matplotlib, scipy). The key distinction: Browser is for the web as a UI; Code Interpreter is for computation. A common mistake is reaching for Browser to scrape structured data from an API — that task belongs in Code Interpreter (or a Gateway-integrated REST API call). The Unit42 DNS-based sandbox bypass disclosure (April 7, 2026) is specifically about Code Interpreter — it underscores that even managed sandbox environments require careful egress controls.
What is AgentCore Evaluations and why does it matter?
Concept: Agent quality assurance and testing methodology | Difficulty: Junior | Stage: Technical screen
AgentCore Evaluations provides automated agent quality scoring with 13 built-in evaluators, launched as a preview at re:Invent December 2025. It matters because production agents are non-deterministic — the same input produces different outputs across runs — making traditional unit testing insufficient. Evaluations runs agents against structured test cases and scores them on dimensions the 13 evaluators capture (accuracy, faithfulness, tool-call correctness, and others). The practical significance: teams without an evaluation layer end up testing in production, which is the failure mode Evaluations exists to prevent. At Clearwater Analytics — 800 agents and 500 tools in production — the reported principle is “context is king”: output quality is determined more by context quality than model choice, and Evaluations is the instrument that surfaces context quality problems before they reach users. Interviewers use this question to probe whether you have disciplined agent development practices, not just the ability to deploy a container.
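Because a single run of a non-deterministic agent can pass by luck, any evaluation loop should repeat each case. The sketch below is a hypothetical harness, not the AgentCore Evaluations API: it scores only tool-call correctness (one of the dimensions named above), treats an agent as any callable returning `(tool_name, tool_args)`, and the 1.0/0.5/0.0 scoring scheme is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected_tool: str   # tool the agent should select
    expected_args: dict  # arguments it should pass


# Hypothetical agent interface: prompt in, (tool_name, tool_args) out.
Agent = Callable[[str], tuple]


def tool_call_score(case: EvalCase, tool: str, args: dict) -> float:
    """1.0 for right tool and args, 0.5 for right tool only, else 0.0."""
    if tool != case.expected_tool:
        return 0.0
    return 1.0 if args == case.expected_args else 0.5


def evaluate(agent: Agent, cases: list[EvalCase], trials: int = 5) -> dict[str, float]:
    # Repeat each case to expose run-to-run variance instead of trusting one sample.
    scores = []
    for case in cases:
        for _ in range(trials):
            tool, args = agent(case.prompt)
            scores.append(tool_call_score(case, tool, args))
    return {
        "mean_score": sum(scores) / len(scores),
        "exact_rate": sum(s == 1.0 for s in scores) / len(scores),
    }
```

Tracking both numbers matters: a high mean with a low exact rate usually means the agent picks the right tool but builds arguments from poor context.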
Mid-Tier Questions: Identity, Gateway, and Production Patterns (2–5 Years)
Walk me through how AgentCore Identity handles OAuth2 to Slack.
Concept: Outbound auth delegation and token vault | Difficulty: Mid | Stage: Technical screen / system design
AgentCore Identity’s outbound flow operates through a KMS-encrypted token vault. You store the Slack OAuth2 credentials in the vault ahead of time — Identity encrypts them with AWS KMS and associates them with the agent’s identity context. At runtime, you annotate the function that needs Slack access with the @requires_access_token decorator from the AgentCore Identity SDK. When the agent calls that function, Identity automatically fetches the stored token from the vault, decrypts it, and passes it to the function — without the token ever appearing in agent code or environment variables. This is the “delegated access” pattern: the agent acts on behalf of the user or service that originally authorized the Slack OAuth flow, rather than using a service account. AgentCore Identity supports OAuth2/OIDC flows for Slack, Google Drive, and Salesforce as named third-party integrations. For enterprise IdPs, it integrates with Amazon Cognito, Okta, and Microsoft Entra ID. The security benefit over hardcoded API keys: token rotation happens without code changes, and the token is never in the container image.
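The delegated-access pattern is essentially a decorator that fetches a credential at call time and injects it as an argument. The sketch below models that shape with an in-memory `TokenVault` and a hypothetical `with_access_token` decorator; it is not the AgentCore Identity SDK, whose real @requires_access_token decorator has its own parameters, and it skips encryption and the OAuth consent flow.

```python
from __future__ import annotations

import functools
from typing import Any, Callable


class TokenVault:
    """Hypothetical in-memory stand-in for a token vault keyed by (identity, provider).

    A real vault encrypts at rest (e.g. with KMS); only the lookup is modeled here.
    """

    def __init__(self) -> None:
        self._tokens: dict[tuple[str, str], str] = {}

    def put(self, identity: str, provider: str, token: str) -> None:
        self._tokens[(identity, provider)] = token

    def get(self, identity: str, provider: str) -> str:
        try:
            return self._tokens[(identity, provider)]
        except KeyError:
            # A real flow would send the user through OAuth consent here.
            raise PermissionError(f"no {provider} token for {identity}") from None


def with_access_token(vault: TokenVault, provider: str) -> Callable:
    """Hypothetical decorator mimicking the shape of @requires_access_token.

    The wrapped function receives `access_token` as a keyword argument, so the
    token never lives in agent code, environment variables, or the image.
    """
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(identity: str, *args: Any, **kwargs: Any) -> Any:
            kwargs["access_token"] = vault.get(identity, provider)
            return fn(identity, *args, **kwargs)
        return wrapper
    return decorator
```

Usage: decorate a hypothetical `post_message(identity, channel, text, *, access_token)` with `@with_access_token(vault, "slack")` and call it without a token; rotating the stored token changes nothing in the function.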
How would you architect multi-tenant isolation through AgentCore Gateway?
Concept: Multi-tenant tool federation and cross-tenant security | Difficulty: Mid | Stage: System design
Multi-tenant Gateway architecture requires three layers working together. First, per-tenant Gateway endpoints — each tenant gets its own Gateway with a distinct endpoint URL, so tool invocations are scoped to a single tenant’s tool set by construction. Second, Identity-scoped tool sets per tenant — the @requires_access_token decorator and the outbound token vault ensure that tool credentials (the OAuth2 tokens for Slack, Google Drive, Salesforce) are bound to a specific tenant’s identity context, preventing one tenant’s credentials from leaking to another’s agent. Third, Policy (Cedar) guardrails per tenant — the Cedar-language Policy component lets you write deterministic rules about which tenants can invoke which tools, enforced at the Gateway layer before the tool executes. The negative reference point here is the Asana MCP cross-tenant data leak (May 2025): approximately 1,000 organizations were affected over a 34-day window when a bug in Asana’s MCP server caused one user’s workspace data to be returned to a different user’s agent session. The root cause was shared process state in the third-party MCP implementation — exactly what per-tenant microVM isolation and per-tenant Identity scoping are designed to prevent. Before launching any multi-tenant MCP integration, run integration tests that explicitly verify cross-tenant boundaries.
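The key rule in the first layer is that the Gateway is chosen from the caller’s verified tenant, never from a value the caller can set. A minimal sketch under that assumption follows; `TenantGateway`, `GatewayRouter`, the endpoint URLs, and the tool names are all hypothetical, and the allowlist check stands in for the Cedar Policy layer rather than implementing it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TenantGateway:
    tenant_id: str
    endpoint: str                  # hypothetical per-tenant Gateway URL
    allowed_tools: frozenset[str]  # stand-in for per-tenant Policy rules


class GatewayRouter:
    """Hypothetical resolver: pick a Gateway from the authenticated tenant only.

    `verified_tenant` must come from a validated identity claim (e.g. a JWT),
    never from a caller-supplied header.
    """

    def __init__(self, gateways: list[TenantGateway]) -> None:
        self._by_tenant = {g.tenant_id: g for g in gateways}

    def resolve(self, verified_tenant: str, tool: str) -> str:
        gw = self._by_tenant.get(verified_tenant)
        if gw is None:
            raise PermissionError(f"unknown tenant {verified_tenant!r}")
        if tool not in gw.allowed_tools:
            # Deny by default: a tool not granted to this tenant is unreachable.
            raise PermissionError(f"{tool!r} not allowed for {verified_tenant!r}")
        return gw.endpoint
```

There is no code path that takes a tenant ID and returns another tenant’s endpoint, which is the “scoped by construction” property the per-tenant design is after.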
What custom metrics + traces would you instrument on an AgentCore agent?
Concept: Agent-specific observability and signal design | Difficulty: Mid | Stage: Technical screen
AgentCore Observability is OpenTelemetry (OTEL) compatible and CloudWatch-powered, with a three-level hierarchy: Sessions > Traces > Spans. Sessions cover the full agent interaction lifecycle; traces cover individual reasoning chains; spans cover tool invocations and model calls. Beyond the default instrumentation, add custom metrics for: session restart rate (the silent restart problem from November 2025 makes this a critical alarm); tool call latency by tool (identifies Gateway bottlenecks); LLM inference latency and token counts (surfaces context bloat before it inflates per-call latency and cost); and tool-call failure rate by tool (early warning for API degradation). For traces, instrument the full agent reasoning loop, not just the final tool call, so you can reconstruct what the agent decided and why, which is the input the Evaluations component scores. Phil Norton of Clearwater Analytics (800 agents in production) frames this directly: context quality determines output quality, and observability is how you verify context integrity before issues reach users.
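One low-friction way to emit the per-tool custom metrics listed above is CloudWatch Embedded Metric Format (EMF): a structured JSON log line that CloudWatch extracts into metrics. The sketch assumes your agent’s logs reach CloudWatch Logs in a way that EMF extraction applies; the namespace `AgentApp/Tools` and the metric names are hypothetical.

```python
from __future__ import annotations

import json
import time


def emf_tool_metric(tool: str, latency_ms: float, ok: bool, session_id: str,
                    namespace: str = "AgentApp/Tools",  # hypothetical namespace
                    now_ms: int | None = None) -> str:
    """Return one EMF log line carrying ToolLatency and ToolError for a tool call."""
    record = {
        "_aws": {
            "Timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["ToolName"]],  # one metric series per tool
                "Metrics": [
                    {"Name": "ToolLatency", "Unit": "Milliseconds"},
                    {"Name": "ToolError", "Unit": "Count"},
                ],
            }],
        },
        "ToolName": tool,
        "ToolLatency": latency_ms,
        "ToolError": 0 if ok else 1,
        "SessionId": session_id,  # plain property, not a dimension
    }
    return json.dumps(record)
```

Keeping `SessionId` as a plain property rather than a dimension avoids creating a high-cardinality metric per session while still letting Logs Insights pull every tool call for one session.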
What is AgentCore Policy, and how is Cedar different from IAM JSON policies?
Concept: Deterministic guardrails and policy language | Difficulty: Mid | Stage: Technical screen
AgentCore Policy uses Cedar, the same policy language as AWS Verified Permissions, for deterministic guardrails on tool calls. It was launched at re:Invent December 2025 as the 10th AgentCore component. Cedar is designed for fine-grained, entity- and attribute-based authorization, and the language was built so that policies can be formally analyzed by automated reasoning tools. IAM JSON policies are evaluated by the IAM service for AWS API access control; they operate on principals, resources, and actions, with condition keys for context. Cedar policies are evaluated at the application layer (here, at the AgentCore tool call boundary) and support richer entity-relationship modeling (e.g., “this agent can only invoke this tool if the requesting user is in this tenant and has this role”). The key practical difference for agent systems: Cedar enforces which tools an agent can call and under what conditions, whereas IAM enforces which AWS API calls the agent’s execution role can make. You need both: IAM for the AWS control plane, Cedar/Policy for the agent tool call plane. Senior interviewers often probe whether candidates conflate these two layers. Policy was specifically added to provide deterministic guardrails at the tool-invocation level, which Bedrock Guardrails (a separate product) does not offer.
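Cedar’s decision rules are worth stating precisely in an interview: a request is denied unless some permit policy matches, and any matching forbid overrides every permit. The Python below is a toy model of those semantics applied to a tool call, not Cedar and not the AgentCore Policy API; the `Request` fields, the `refund.issue` tool, and the `support-lead` role are hypothetical, and the comments show roughly analogous Cedar whose entity and action names are also assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    principal_tenant: str
    principal_roles: frozenset[str]
    tool: str
    resource_tenant: str


@dataclass(frozen=True)
class Rule:
    effect: str                          # "permit" or "forbid"
    tool: str
    condition: Callable[[Request], bool]


def is_authorized(rules: list[Rule], req: Request) -> bool:
    """Cedar-style decision: default deny; any matching forbid beats all permits."""
    matching = [r for r in rules if r.tool == req.tool and r.condition(req)]
    if any(r.effect == "forbid" for r in matching):
        return False
    return any(r.effect == "permit" for r in matching)


# Hypothetical rules. Roughly analogous Cedar (names are assumptions):
#   permit(principal, action == Action::"InvokeTool", resource == Tool::"refund.issue")
#     when { principal.roles.contains("support-lead") };
#   forbid(principal, action == Action::"InvokeTool", resource)
#     when { principal.tenant != resource.tenant };
RULES = [
    Rule("permit", "refund.issue", lambda r: "support-lead" in r.principal_roles),
    Rule("forbid", "refund.issue", lambda r: r.principal_tenant != r.resource_tenant),
]
```

Note what is absent: nothing here says which AWS APIs the execution role may call. That stays in IAM, which is the two-layer point the answer makes.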
How would you provision an AgentCore agent via CloudFormation?
Concept: Infrastructure-as-code and deployment lifecycle | Difficulty: Mid | Stage: Technical screen / system design
CloudFormation support was available at GA (October 2025) — IaC provisioning is a mid-tier expectation. A complete stack covers: an ECR repository for the ARM64 agent container; an AgentCore Runtime endpoint with version + alias for traffic routing; a least-privilege IAM execution role scoped to only the Bedrock model calls and tool invocations the agent needs; optionally, AgentCore Memory with TTL and strategy configuration; Gateway resources converting API specs to MCP tools; and Policy (Cedar) guardrail resources. Deployment lifecycle: push a new image to ECR, update the stack to reference the new tag, deploy — the runtime manages version transition. Production discipline: version endpoint aliases; use weighted routing between versions for canary deploys; validate that new images expose /invocations and /ping on port 8080 before cutting traffic. The bedrock-agentcore-starter-toolkit CLI automates the ECR push and endpoint launch for development; CloudFormation is the production path.
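A partial template makes the least-privilege point concrete. The sketch below builds a CloudFormation template as a Python dict containing only the ECR repository and the execution role; the Runtime, Memory, Gateway, and Policy resources are omitted because their exact property schemas should come from the CloudFormation resource reference. The service principal `bedrock-agentcore.amazonaws.com` and the `ModelId` parameter are assumptions to verify, and ECR auth-token and CloudWatch Logs permissions are left out for brevity.

```python
from __future__ import annotations

import json


def agent_stack(repo_name: str) -> dict:
    """Partial template: ECR repo plus a narrowly scoped execution role (sketch)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ModelId": {"Type": "String", "Description": "Bedrock model the agent may call"},
        },
        "Resources": {
            "AgentRepo": {
                "Type": "AWS::ECR::Repository",
                "Properties": {
                    "RepositoryName": repo_name,
                    "ImageScanningConfiguration": {"ScanOnPush": True},
                },
            },
            "AgentExecutionRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            # Assumed AgentCore service principal; verify before use.
                            "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                    "Policies": [{
                        "PolicyName": "least-privilege-agent",
                        "PolicyDocument": {
                            "Version": "2012-10-17",
                            "Statement": [
                                {   # Only the one model the agent needs, not bedrock:*.
                                    "Effect": "Allow",
                                    "Action": ["bedrock:InvokeModel",
                                               "bedrock:InvokeModelWithResponseStream"],
                                    "Resource": {"Fn::Sub": "arn:aws:bedrock:${AWS::Region}::foundation-model/${ModelId}"},
                                },
                                {   # Pull only from this agent's repository.
                                    "Effect": "Allow",
                                    "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
                                    "Resource": {"Fn::GetAtt": ["AgentRepo", "Arn"]},
                                },
                            ],
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_stack("my-agent"), indent=2))
```

Generating the template from code is one option; the same structure written directly in YAML works equally well as the production path the answer describes.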
Senior-Tier Questions: Production Incidents and Architecture (5+ Years)
How would you design your AgentCore deployment to survive a regional cascading outage like Oct 2025 us-east-1?
Concept: Multi-region resilience and cascading failure mitigation | Difficulty: Senior | Stage: System design
The October 19–20, 2025 us-east-1 outage was a DynamoDB DNS race condition: two DNS Enactor processes raced, causing one to clean up the other’s plan as “stale,” removing all IP addresses for the DynamoDB regional endpoint. Lambda, ECS, EKS, Fargate, and AgentCore Runtime (which depends on DynamoDB for session state management) all cascaded into failures. The design response has three layers. First, multi-region active-passive or active-active deployment: provision AgentCore agents in at least one secondary region (us-west-2, eu-west-1) with Route 53 health-check-based failover. Second, a degraded-mode Lambda fallback: for critical workflows, maintain a lightweight Lambda path that handles the most essential tool calls without full AgentCore session continuity. Third, circuit breakers on Bedrock model calls: implement exponential backoff with jitter and a circuit breaker that fails fast rather than queuing agent sessions that will time out. The Register’s postmortem highlights a second-order failure: during recovery, EC2’s DropletWorkflow Manager entered congestive collapse while working through a backlog of leases, so design your recovery procedures to avoid overwhelming dependent services on restoration as well.
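The third layer is the easiest to get subtly wrong, so here is a minimal single-threaded sketch of exponential backoff with full jitter routed through a circuit breaker. The class and function names, the thresholds, and the injectable `clock`, `sleep`, and `rng` parameters are illustrative assumptions; a production version would also need thread safety and one breaker per dependency.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe after `cooldown` s."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; failing fast")
            self.opened_at = None  # half-open: let a probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0  # any success closes the circuit
        return result


def retry_with_backoff(fn: Callable[[], T], breaker: CircuitBreaker, attempts: int = 4,
                       base: float = 0.2, cap: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Callable[[], float] = random.random) -> T:
    """Exponential backoff with full jitter, routed through the circuit breaker."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # do not keep queuing work behind an open circuit
        except Exception:
            if attempt == attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)).
            sleep(rng() * min(cap, base * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Full jitter spreads retries from many sessions across the whole backoff window, which is exactly what helps a recovering dependency avoid the synchronized retry storms described above.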
Walk me through the Asana MCP cross-tenant data leak (May 2025) — what does it teach about AgentCore Gateway architecture?
Concept: MCP cross-tenant isolation and shared-state failure mode | Difficulty: Senior | Stage: System design / security
In May 2025, Asana’s MCP server launch contained a shared process state bug: one user’s Asana workspace responses were returned to a different user’s agent session. Approximately 1,000 organizations were affected over a 34-day window. The root cause was not a network-layer access control bypass; it was a shared-mutable-state bug in the MCP implementation, where per-session context was not properly isolated before tool responses were constructed. AgentCore Runtime’s per-session microVM model targets exactly this failure class. The architectural lessons for Gateway design: use per-tenant Gateway endpoints (not a shared gateway with tenant context in headers); scope Identity token vault credentials to a specific tenant identity; enforce Policy (Cedar) guardrails at the tool invocation boundary, not just the network layer; and run explicit cross-tenant integration tests verifying that tenant A’s session cannot receive tenant B’s data under any timing or error condition before any MCP server goes to production.
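The failure class is easier to test for once you can reproduce it. The sketch below is an illustrative model of shared mutable request context, not Asana’s actual code: `LeakyMCPServer` keeps the “current tenant” on a long-lived object, `IsolatedMCPServer` passes tenant context with each request, and `assert_no_cross_tenant` is the kind of check a cross-tenant integration test would make. All names and data are hypothetical.

```python
from __future__ import annotations


class LeakyMCPServer:
    """Anti-pattern (hypothetical): per-request tenant context on a shared object."""

    def __init__(self, data: dict[str, list[str]]) -> None:
        self.data = data
        self.current_tenant: str | None = None  # shared mutable state across sessions

    def begin(self, tenant: str) -> None:
        self.current_tenant = tenant

    def list_projects(self) -> list[str]:
        # Reads whichever tenant called begin() last, not necessarily the caller.
        return self.data[self.current_tenant]


class IsolatedMCPServer:
    """Fix (hypothetical): tenant context travels with every request."""

    def __init__(self, data: dict[str, list[str]]) -> None:
        self.data = data

    def list_projects(self, tenant: str) -> list[str]:
        return list(self.data[tenant])  # copy, so callers cannot mutate shared data


def assert_no_cross_tenant(results_for_a: list[str], tenant_b_data: list[str]) -> None:
    """Fail if anything tenant A received belongs to tenant B."""
    leaked = set(results_for_a) & set(tenant_b_data)
    assert not leaked, f"tenant A received tenant B data: {sorted(leaked)}"


if __name__ == "__main__":
    data = {"tenant-a": ["a-roadmap"], "tenant-b": ["b-roadmap"]}
    leaky = LeakyMCPServer(data)
    leaky.begin("tenant-a")       # session A starts
    leaky.begin("tenant-b")       # session B starts before A's tool call runs
    print(leaky.list_projects())  # prints ['b-roadmap']: A's call returns B's data
```

The interleaving in the demo is deterministic here, but under real concurrency it depends on timing, which is why the answer calls for tests that force interleavings and error paths rather than relying on happy-path runs.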
What did Unit42 disclose about AgentCore Code Interpreter (April 7, 2026), and how would you defend against it?
Concept: Sandbox security, DNS egress controls, and shared-responsibility model | Difficulty: Senior | Stage: Security / system design
On April 7, 2026, Palo Alto Networks Unit42 published “Cracks in the Bedrock: Escaping the AWS AgentCore Sandbox” (CVE-2026-4269), demonstrating a DNS-based sandbox bypass in AgentCore’s Code Interpreter environment. The mechanism: code running inside the Code Interpreter sandbox can exfiltrate data or reach external infrastructure through DNS queries, because the sandbox’s network isolation blocked ordinary outbound connections but did not apply equivalent controls to DNS resolution. DNS tunneling, which encodes data in query subdomains and reads it back at an attacker-controlled nameserver, is a well-understood exfiltration channel that many sandbox implementations overlook. Defense layers: at the VPC level, restrict egress from the Code Interpreter execution environment to the specific endpoints it legitimately needs, not the open internet. Add Amazon Route 53 Resolver DNS Firewall rules that block queries to domains not on an explicit allowlist, cutting off tunneling to attacker-controlled nameservers. Apply least-privilege IAM to the Code Interpreter execution role: if the sandbox cannot make outbound AWS API calls beyond what the agent explicitly needs, the blast radius of a bypass stays constrained. Finally, audit sandbox egress patterns in your logs (CloudWatch, plus Route 53 Resolver query logs where the sandbox resolves through your VPC); anomalous DNS query volumes from an agent session are a high-confidence signal of attempted exfiltration. The broader lesson: “AWS manages the sandbox” does not mean AWS absorbs all sandbox security responsibility. The shared-responsibility model applies at the agent layer too.
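DNS Firewall enforces the allowlist at resolution time; the heuristic below is a sketch of the after-the-fact log audit. It assumes you can extract queried names from resolver logs (for example, Route 53 Resolver query logs when the sandbox resolves through your VPC). `ALLOWED_SUFFIXES`, `max_label_len`, and `entropy_threshold` are hypothetical values you would tune on your own traffic.

```python
import math
from collections import Counter

# Hypothetical allowlist: replace with the domains your tools legitimately need.
ALLOWED_SUFFIXES = ("amazonaws.com", "pypi.org")


def shannon_entropy(s: str) -> float:
    """Bits per character; encoded payloads score higher than human-chosen hostnames."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def is_allowlisted(qname: str) -> bool:
    name = qname.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in ALLOWED_SUFFIXES)


def looks_like_tunneling(qname: str, max_label_len: int = 40,
                         entropy_threshold: float = 3.5) -> bool:
    """Naive check: treats the last two labels as the registrable domain."""
    sub_labels = qname.rstrip(".").lower().split(".")[:-2]
    longest = max((len(label) for label in sub_labels), default=0)
    payload = "".join(sub_labels)
    return longest > max_label_len or (
        len(payload) >= 20 and shannon_entropy(payload) > entropy_threshold)


def triage(qnames):
    """Label each queried name: allow, block, or block-suspected-tunnel."""
    verdicts = {}
    for q in qnames:
        if is_allowlisted(q):
            verdicts[q] = "allow"
        elif looks_like_tunneling(q):
            verdicts[q] = "block-suspected-tunnel"
        else:
            verdicts[q] = "block"
    return verdicts
```

Long or high-entropy subdomain labels are the classic tunneling signature because the payload is encoded there. The heuristic is deliberately naive (it ignores multi-part suffixes such as `co.uk`) and is a triage signal, not a verdict.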
What was Unit42’s IAM God Mode disclosure (April 6, 2026), and what hardening would you apply?
Concept: IAM privilege escalation via role chaining and least-privilege hardening | Difficulty: Senior | Stage: Security / system design
On April 6, 2026, Unit42 published a disclosure on an IAM role-chaining privilege escalation pattern affecting AgentCore’s role assumption model. A compromised agent with limited IAM permissions could manipulate the service’s role chaining behavior to assume a much broader role (“God Mode”), gaining access well beyond its intended scope. The attack surface: the service-linked role AWSServiceRoleForBedrockAgentCoreRuntimeIdentity (introduced at GA, October 13, 2025) combined with misconfigured trust policies that allow AgentCore to assume roles not explicitly scoped to the agent’s task. Hardening measures: add aws:SourceAccount and aws:SourceArn condition keys to the trust policy (AssumeRolePolicyDocument) of every role AgentCore can assume, which blocks confused-deputy use from other accounts or resources. Pass session policies when AgentCore assumes a role, so the effective permissions for that session are the intersection of the role’s policies and the session policy. Use IAM Access Analyzer to flag roles whose trust policies grant access outside your account or organization, review CloudTrail AssumeRole events to spot unexpected assumption chains, and create AWS Config rules that alert on any AgentCore execution role with sts:AssumeRole permissions lacking explicit condition keys. The broader lesson from the Unit42 disclosure: agent-specific IAM surface, meaning roles that are assumed dynamically at runtime, creates privilege escalation paths that static IAM reviews often miss.
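A sketch of both halves of the trust-policy hardening, as plain Python dictionaries rather than a specific IaC tool: `agentcore_trust_policy` builds a trust policy pinned to one account and region, and `missing_condition_keys` is the kind of check a custom AWS Config rule could run. The service principal `bedrock-agentcore.amazonaws.com` and the `arn:aws:bedrock-agentcore:...` ARN pattern are assumptions to verify against the AgentCore documentation for your region.

```python
# Assumptions to verify against the AgentCore docs: the service principal and ARN format.
AGENTCORE_PRINCIPAL = "bedrock-agentcore.amazonaws.com"
REQUIRED_KEYS = {"aws:sourceaccount", "aws:sourcearn"}


def agentcore_trust_policy(account_id: str, region: str) -> dict:
    """Trust policy letting only AgentCore resources in this account and region assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AgentCoreAssumeRole",
            "Effect": "Allow",
            "Principal": {"Service": AGENTCORE_PRINCIPAL},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"aws:SourceAccount": account_id},
                "ArnLike": {"aws:SourceArn": f"arn:aws:bedrock-agentcore:{region}:{account_id}:*"},
            },
        }],
    }


def _as_list(value) -> list:
    return [value] if isinstance(value, str) else list(value or [])


def missing_condition_keys(policy: dict) -> list:
    """Config-rule-style check: Sids of Allow statements that let a service principal
    call sts:AssumeRole without both aws:SourceAccount and aws:SourceArn conditions."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        actions = {a.lower() for a in _as_list(stmt.get("Action"))}
        principal = stmt.get("Principal", {})
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if stmt.get("Effect") != "Allow" or "sts:assumerole" not in actions or not services:
            continue
        # IAM treats condition key names case-insensitively.
        present = {key.lower() for block in stmt.get("Condition", {}).values() for key in block}
        if not REQUIRED_KEYS <= present:
            findings.append(stmt.get("Sid", f"Statement[{i}]"))
    return findings
```

The checker lowercases condition keys and actions because IAM compares them case-insensitively; it does not expand wildcards such as `sts:*`, so a production rule would need to.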
AgentCore Runtime sometimes silently restarts sessions. How do you defend a long-running workflow?
Concept: Idle timeout handling, idempotency, and production resilience | Difficulty: Senior | Stage: System design
The silent session restart problem, first reported on AWS re:Post in November 2025, stems from the 15-minute idle timeout: if the agent does not respond within the idle window (for example, while it waits on a large-context LLM inference call), the runtime silently restarts the container and surfaces no error to the client. The immediate fix is a HEALTHY_BUSY ping handler: return “HealthyBusy” from /ping during long operations, signaling that the container is alive and busy rather than idle or hung. Beyond that, design every workflow step to be idempotent, and use the Memory short-term event log (create_event/list_events) to checkpoint progress after each completed step, so a restarted session resumes from its last checkpoint instead of re-running from the beginning. Externalize all critical state immediately; never accumulate in-memory state that a restart would lose. Add CloudWatch alarms on session-restart events to detect the pattern before users report it. Note the two-number distinction: maximum session lifetime is 8 hours; idle timeout is 15 minutes. Production teams that conflate the two build agents that fail silently on long workflows.
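A framework-free sketch of the two defenses together. `PingState` counts in-flight long operations so a `/ping` handler can report “HealthyBusy”; `CheckpointLog` is a hypothetical in-memory stand-in for the Memory event log (in production the checkpoints must live in durable storage such as AgentCore Memory, or a restart would erase them). The AgentCore Python SDK provides its own hooks for reporting ping status; consult its documentation for the exact API rather than treating this class as that API.

```python
import threading
from contextlib import contextmanager


class PingState:
    """Hypothetical helper: reports "HealthyBusy" while any long operation is in flight."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = 0

    @contextmanager
    def busy(self):
        with self._lock:
            self._in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1

    def status(self) -> str:
        # A /ping handler would return this value instead of a hard-coded "Healthy".
        with self._lock:
            return "HealthyBusy" if self._in_flight else "Healthy"


class CheckpointLog:
    """In-memory stand-in for a durable event log such as AgentCore Memory short-term events.
    A real restart wipes process memory, so production checkpoints must be stored remotely."""

    def __init__(self) -> None:
        self._events = {}

    def append(self, session_id: str, step: str) -> None:
        self._events.setdefault(session_id, []).append(step)

    def completed(self, session_id: str) -> set:
        return set(self._events.get(session_id, []))


def run_workflow(session_id, steps, log, ping) -> list:
    """Run (name, fn) steps in order, skipping those already checkpointed for this session."""
    done = log.completed(session_id)
    executed = []
    for name, fn in steps:
        if name in done:
            continue  # finished before the restart; do not repeat its side effects
        with ping.busy():  # e.g. a long LLM call: keep /ping reporting HealthyBusy
            fn()
        log.append(session_id, name)  # checkpoint only after the step succeeds
        executed.append(name)
    return executed
```

Checkpointing only after a step succeeds means a crash mid-step re-runs that step on resume, which is exactly why each step must be idempotent.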
When would you choose AgentCore Runtime over rolling your own agent on Lambda?
Concept: Runtime architecture tradeoffs — the differentiator question | Difficulty: Senior | Stage: System design
The separation between senior and junior answers here is mechanical, not philosophical. Three concrete differentiators Lambda cannot replicate without significant custom infrastructure:
State model: Lambda is stateless-by-default. Between invocations, you serialize session state to DynamoDB, deserialize at the next call start, and manage race conditions across concurrent Lambda invocations touching the same session. AgentCore Runtime is stateful-by-design — session continuity is built in. For multi-step workflows triggering 5–20 LLM calls with tool invocations between them, the complexity difference is substantial.
Execution duration: Lambda’s hard cap is 15 minutes. AgentCore Runtime supports sessions up to 8 hours, payloads up to 100 MB, and, since December 2025, bidirectional streaming. Long-running agentic workflows such as document analysis, multi-step research, and iterative code generation routinely exceed Lambda’s ceiling.
Packaging and isolation: AgentCore Runtime runs ARM64 container images with per-session microVM isolation. Lambda also runs container images (and itself uses Firecracker microVMs), but a Lambda execution environment is reused across invocations from different callers, so it gives you no per-session boundary. The per-session microVM is what makes AgentCore Runtime safe for multi-tenant deployments: each tenant’s session runs in a fully isolated execution environment.
The billing model is comparable — both are serverless per-invocation — and that similarity is what leads junior candidates to conclude “AgentCore Runtime is just Lambda for agents.” The justification for AgentCore Runtime is the stateful + duration + isolation delta, not billing. Joud Wawad’s practitioner deep dive documents the microVM architecture and ARM64 Graviton infrastructure at the level of specificity senior interviewers expect.
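To make the state-model differentiator concrete, here is a sketch of the optimistic-locking read-modify-write you end up writing on the Lambda path. `FakeSessionTable` is an in-memory simulation; against real DynamoDB, `put_if_version` would be a `PutItem` with a `ConditionExpression` on a version attribute, and `ConditionalCheckFailed` mirrors DynamoDB’s `ConditionalCheckFailedException`.

```python
import copy


class ConditionalCheckFailed(Exception):
    """Plays the role of DynamoDB's ConditionalCheckFailedException in this simulation."""


class FakeSessionTable:
    """In-memory stand-in for a DynamoDB table keyed by session_id, with a version attribute."""

    def __init__(self) -> None:
        self._items = {}

    def get(self, session_id: str) -> dict:
        item = self._items.get(session_id, {"session_id": session_id, "version": 0, "history": []})
        return copy.deepcopy(item)  # callers get a snapshot, like a GetItem response

    def put_if_version(self, item: dict, expected_version: int) -> None:
        # Real DynamoDB: PutItem with a ConditionExpression on the version attribute.
        current = self._items.get(item["session_id"], {"version": 0})["version"]
        if current != expected_version:
            raise ConditionalCheckFailed(f"expected v{expected_version}, found v{current}")
        self._items[item["session_id"]] = copy.deepcopy(item)


def append_turn(table: FakeSessionTable, session_id: str, turn: str, max_retries: int = 3) -> int:
    """Read-modify-write with optimistic locking; returns the new session version."""
    for _ in range(max_retries):
        item = table.get(session_id)
        expected = item["version"]
        item["history"].append(turn)
        item["version"] = expected + 1
        try:
            table.put_if_version(item, expected)
            return item["version"]
        except ConditionalCheckFailed:
            continue  # another invocation updated the session first: re-read and retry
    raise RuntimeError(f"gave up on {session_id} after {max_retries} conflicting updates")
```

Every LLM turn and tool result in a multi-step workflow goes through a loop like this, plus full-session serialization on each invocation; that is the custom infrastructure the stateful Runtime model absorbs.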
Named-Incident Quick Reference for Interview Recall
Senior interviewers probe these five incidents by name and date. Knowing the mechanism — not just the name — is what distinguishes a candidate who read a summary from one who understands the failure modes. The mitigation column maps to real architectural decisions you would make in system design.
| Incident | Date | Failure Mode / Mechanism | Interviewer Probe | 2-Sentence Mitigation |
|---|---|---|---|---|
| us-east-1 cascading outage | Oct 19–20, 2025 | DynamoDB DNS Enactor race condition removed all regional endpoint IPs; cascaded to Lambda, ECS, EKS, Fargate, AgentCore Runtime | Multi-region failover design | Deploy in secondary region with Route 53 health-check failover. Add circuit breakers on Bedrock calls and a degraded-mode Lambda fallback for critical tool calls. |
| Asana MCP cross-tenant leak | May 2025 (~1,000 orgs, 34 days) | Shared process state in Asana’s MCP server returned one user’s workspace data to a different user’s agent session | Multi-tenant Gateway isolation patterns | Per-tenant Gateway endpoints and Identity-scoped credentials limit the blast radius of any shared-state bug. Enforce Cedar Policy guardrails and run explicit cross-tenant integration tests before launch. |
| Unit42 sandbox bypass (CVE-2026-4269) | April 7, 2026 | DNS-based Code Interpreter sandbox bypass: code exfiltrates data via DNS queries even when TCP/IP egress is blocked | Sandbox defense depth | VPC egress restrictions plus Route 53 Resolver DNS Firewall block DNS tunneling. Least-privilege IAM on the Code Interpreter execution role limits blast radius. |
| Unit42 IAM God Mode | April 6, 2026 | IAM role-chaining escalation: limited-permission agent manipulates AgentCore’s role assumption model to acquire broad IAM access | IAM hardening for agent execution roles | Add aws:SourceAccount / aws:SourceArn condition keys to the trust policy of every role AgentCore can assume, and use session policies. Use IAM Access Analyzer to flag overly broad trust policies and AWS Config rules to catch roles missing those condition keys. |
| AgentCore silent session restart | November 2025 | 15-minute idle timeout silently restarts the container when blocked on long LLM inference; no error surfaces to the client | Long-running workflow resilience | HEALTHY_BUSY ping handler prevents idle timeout during long inference. Design all steps as idempotent; checkpoint to AgentCore Memory after each completed step. |
Red-Flag Answers (And What They Signal)
Interviewers score these answers negatively not because they are entirely wrong, but because each reveals a specific production gap. Knowing which gap each answer signals is as useful as knowing the correct answer.
1. “AgentCore Runtime is just Lambda for agents.”
Signals architecture naivety. Misses the stateful session-continuity model, the 8-hour execution ceiling vs Lambda’s 15-minute hard cap, and the per-session microVM isolation that makes AgentCore Runtime safe for multi-tenant deployments. The billing model is similar; the runtime semantics are categorically different. This is the most common junior answer and the filter senior interviewers use first.
2. “We just use IAM for everything.”
Signals no understanding of agent-specific identity. AgentCore Identity adds an outbound token vault for OAuth2 to third-party apps (Slack, Google Drive, Salesforce), and AgentCore Policy adds a Cedar layer for deterministic tool-call guardrails: two layers IAM does not provide. Saying “just IAM” means you have not thought about how agents authenticate to systems outside the AWS control plane.
3. “I’d test in production.”
Signals no awareness of AgentCore Evaluations (December 2025, 13 built-in evaluators). Production agents are non-deterministic; traditional unit testing is insufficient. Evaluations is how you verify context quality before agents reach users — and Clearwater Analytics (800 agents in production) treats context quality as the primary output-quality lever.
4. “MCP just works out of the box.”
Signals no awareness of the Asana MCP cross-tenant leak (May 2025), where ~1,000 organizations were affected for 34 days by a shared-state bug in a third-party MCP server. MCP requires deliberate per-tenant isolation, Identity-scoped tool credentials, and cross-tenant integration tests before launch.
5. “I’d use the Browser tool for everything.”
Signals no understanding of when headless web is the wrong abstraction. Deterministic computation — data analysis, file parsing, statistical modeling — belongs in Code Interpreter. Browser adds Chromium overhead and DOM-structure dependency for tasks that Code Interpreter handles cleanly.
6. “AWS will handle the security for me.”
Signals no shared-responsibility model thinking. The Unit42 April 2026 disclosures — DNS-based sandbox bypass (April 7) and IAM God Mode role chaining (April 6) — prove that AgentCore has an agent-specific attack surface customers must harden. AWS manages the microVM runtime; you own egress controls, IAM scoping, Cedar Policy guardrails, and cross-tenant isolation.
Questions to Ask Your Interviewer (2026-Aware)
Reverse questions are architectural probes, not conversation fillers — they signal you are thinking at the right level of production specificity.
For AWS internal teams (Bedrock product engineering, AgentCore runtime):
- Is the HEALTHY_BUSY pattern the long-term solution to silent session restarts, or is there a roadmap item for a more robust idle-detection signal?
- What is the integration roadmap between Policy (Cedar) and Evaluations — does Evaluations scoring feed back into Policy rule refinement?
- How does the team scope the shared-responsibility boundary for agent-specific attack surfaces — is the Unit42 DNS bypass mitigation primarily a customer responsibility?
For partner consulting (Slalom, Deloitte, Accenture):
- What is the most common AgentCore architecture mistake you see in the first 90 days — Lambda-vs-Runtime confusion, IAM scoping, or something else?
- How do you typically approach multi-tenant Gateway isolation for enterprise clients — per-tenant endpoints, shared gateway with Cedar policies, or hybrid?
- Do clients adopt AgentCore Memory or default to existing DynamoDB patterns? What drives the decision?
For AI-native AWS-shop startups:
- Are you using AgentCore Evaluations, a custom harness, or a third-party eval tool for production agent quality?
- How are you handling the silent session restart edge case — HEALTHY_BUSY ping handler or a different pattern?
- Where in the 10-component stack are you running custom infrastructure rather than the managed component, and why?
For large enterprise AgentCore platform teams:
- After the October 2025 us-east-1 outage, are you now active-active, active-passive, or running a degraded-mode Lambda fallback?
- Are Cedar Policy resources managed as IaC in CloudFormation, or are there team-specific governance patterns that override the default?
- What custom metric or alarm has caught production issues that AgentCore’s built-in observability stack would have missed?
4-Week AWS AgentCore Interview Prep Roadmap
Sequenced for candidates with foundational AWS knowledge (Lambda, DynamoDB, IAM, CloudFormation) who need the AgentCore-specific layer for targeted roles.
Week 1: Platform foundation — official docs + 10 components
- Read the AgentCore developer guide through the Runtime, Memory, Identity, Browser, and Code Interpreter sections. Note specific behaviors, not capability descriptions.
- Read the GA announcement post for the full timeline: preview July 16, 2025 → GA October 13, 2025 → re:Invent December 2025 expansions. Know which features are GA and which are preview.
- Drill the 10 components. Focus on: what Registry does (not “Code Registry”); what Policy does and when it was added (Cedar, December 2025); the three long-term Memory strategies and when to use each.
- Read the Runtime troubleshooting guide: ARM64 requirement, port 8080 constraint, and HEALTHY_BUSY ping handler all appear in technical screens.
Week 2: Practitioner depth — Joud Wawad + aws-samples
- Work through Joud Wawad’s AgentCore deep dive — the most detailed practitioner reference available, covering Runtime container requirements, MCP proxy pattern, and ARM64 Graviton microVM architecture.
- Clone and run the bedrock-agentcore-samples repo (2.8k stars as of May 2026). The Strands multi-agent architecture samples are directly relevant for senior system design questions.
- Read Xin Cheng’s LangGraph-on-AgentCore walkthrough for the concrete deployment flow: @app.entrypoint decorator, configure/launch/invoke CLI pattern, @requires_access_token for Identity integration.
Week 3: Architecture patterns — re:Invent + CloudFormation
- Watch the re:Invent AIM3310 session (Phil Norton, Clearwater Analytics — 800 agents, 500 tools in production). “Context is king” is directly citable. Internalize the production agent rules.
- Build a minimal CloudFormation stack: AgentCore Runtime endpoint + ECR repository + IAM execution role + Memory store. Practice the full loop: write entry-point → configure → launch → invoke → update endpoint version with alias routing for canary deploys.
- Read the Gateway auth docs and the MCP proxy blog post.
Week 4: Security and postmortems — all five incidents + mock
- Read The Register’s coverage of the October 2025 us-east-1 postmortem: DynamoDB DNS Enactor race condition, cascading failure sequence, congestive collapse during recovery. Narrate this without notes.
- Read both Unit42 disclosures: DNS-based sandbox bypass (April 7, 2026) and IAM God Mode role chaining (April 6, 2026). Know the mechanism, not just the title.
- Review the Asana MCP cross-tenant incident (May 2025) and the silent session restart report (November 2025, AWS re:Post). Map each to its mitigation pattern.
- Run a mock screen against all 20 TOC questions. Time the Lambda-vs-Runtime answer — it takes 3–4 minutes to deliver completely. Practice until you can deliver it unprompted.
What Comes After AgentCore Fluency
AgentCore is a fast-moving platform. Policy (Cedar) and Evaluations were both in preview as of December 2025; expect them to reach general availability and generate a new wave of interview questions specifically about Cedar policy authoring and evaluation methodology. The Unit42 security disclosures from April 2026 will likely accelerate AWS’s investment in AgentCore-specific security controls, which means the shared-responsibility model for agent systems will be an increasingly probed topic through 2026 and 2027. AWS-shop candidates who also prep the Anthropic toolchain will find interviewers probing Claude Code features introduced in 2026, such as worktrees, channels, checkpoints, and plugins; the Claude Code interview guide covering those features also covers the sub-agent Task tool and MCP integration patterns that now appear in multi-cloud agentic AI roles.
The candidates who perform best in AgentCore interviews are not those with the longest list of services they have touched — they are the ones who can reason precisely about what the managed layer actually does, where it stops, and what happens when it fails. The five postmortems in this guide are not edge cases: they are the production reality of a platform that reached 1 million SDK downloads in its first three months and is now running in enterprises with hundreds of agents and thousands of tool invocations per day. That is the context in which your answers will be evaluated.