Internal Guide · Agent Workflow

Giving our agent a test inbox it can read by itself.

The AI writes SMTP code but has no way to verify an email actually left the building. Mailpit turns that blind spot into a closed loop — a local fake SMTP server the agent talks to, plus an HTTP API it can curl to read its own outbox. Write code, send, verify, iterate. No humans checking Gmail.

01 · FRAMING

The gap, and what fills it.

SMTP is fire-and-forget. When the agent writes smtp.send(...), a successful return means the mail server accepted the handoff — not that the email is readable, correctly formatted, or addressed to the right recipient. We need an observable endpoint on the other side.

Problem

The agent sends into the void.

The LLM guesses a plausible-looking address (test@example.com, user@gmail.com) and calls send(). The call returns. Did the email arrive? Was the subject mangled? Did the HTML render? The agent has no idea.

Either a human babysits an inbox and reports back, or the agent hallucinates its way to "done."

Solution

A fake SMTP server with an API.

Mailpit accepts any mail on port 1025, keeps it in a local, disposable message store, and exposes a REST API on port 8025. The agent sends via its normal SMTP client, then curls the API to read back what it just sent.

One contract, deterministic recipient, fully scriptable — the agent validates itself.
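A minimal sketch of the write side, assuming the app is Python and uses the standard-library `smtplib` (the guide itself does not name a language). The helper names `build_message` and `send_via_mailpit` are hypothetical.

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "localhost"  # assumption: Mailpit is running on this machine
SMTP_PORT = 1025         # the SMTP port described above


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text email. Pure function, no network involved."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_mailpit(msg: EmailMessage) -> dict:
    """Hand the message to the local SMTP listener.

    smtplib returns a dict of refused recipients. An empty dict only means
    the server accepted the handoff, not that the content is correct.
    """
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=5) as smtp:
        return smtp.send_message(msg)
```

The return value of `send_via_mailpit` is exactly the signal the agent should not trust on its own; the read-back through the HTTP API is what confirms the result.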

02 · ARCHITECTURE

How the pieces wire together.

Everything runs on localhost. The agent's CLI drives two surfaces of the same Mailpit container: the SMTP listener for writes, the HTTP API for reads. No outbound network, no auth, no external accounts.

graph LR
  A["AI Agent<br/>(CLI)"] -->|writes / edits| B["App Code<br/>smtp.send(...)"]
  B -->|SMTP :1025| C[("Mailpit Container")]
  C -->|stores in memory| D[["Inbox<br/>(SQLite / RAM)"]]
  A -->|curl GET :8025/api| E{{"HTTP API<br/>list · search · delete"}}
  E -->|reads from| D
  E -.->|asserts arrived| A
  F(["Developer<br/>Web UI :8025"]) -.->|optional observe| D

  classDef agent fill:#1e3a5f,stroke:#1e3a5f,color:#fff,stroke-width:2px
  classDef mailpit fill:#fef3c7,stroke:#b45309,color:#451a03,stroke-width:2px
  classDef verify fill:#ecfccb,stroke:#4d7c0f,color:#1a2e05,stroke-width:2px
  classDef human fill:#f5f1e8,stroke:#6b6456,color:#1c1917,stroke-width:1.5px,stroke-dasharray: 4 3

  class A agent
  class C,D mailpit
  class E verify
  class F human

Why localhost matters. The SMTP port (1025) and HTTP port (8025) are both on the same host as the agent. No DNS, no TLS handshakes, no rate limits. The full send → verify cycle takes milliseconds, which is the whole point — the agent can afford to do it on every change.
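Because both surfaces are plain TCP ports on the same host, the agent can cheaply confirm they are up before starting a cycle. A small sketch using only the standard library; `port_open` and `mailpit_reachable` are hypothetical helper names, and the ports are the ones named in this guide.

```python
import socket


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


def mailpit_reachable(host: str = "localhost",
                      smtp_port: int = 1025,
                      http_port: int = 8025) -> dict:
    """Probe both surfaces the agent depends on."""
    return {
        "smtp": port_open(host, smtp_port),
        "http": port_open(host, http_port),
    }
```

If either value is False, the sensible move is to start the container rather than attempt a send that cannot be verified.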

03 · WORKFLOW

The closed loop, end to end.

Each step is a CLI call the agent can script. Step 4 is where the magic happens: instead of declaring victory when the SMTP server answers 250, the agent proves the email actually materialized.

01 · WRITE

Edit code

Agent modifies SMTP logic, templates, or recipient handling.

02 · RESET

Clear inbox

DELETE all messages so this run starts clean.

03 · SEND

Run the send

Trigger the code path that calls smtp.send().

04 · VERIFY

Curl the API

Assert recipient, subject, and body match expectations.

05 · ITERATE

Fix or ship

Failed? Agent reads the actual message, diagnoses, loops back.
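The five steps above can be sketched as a single function. To keep the sketch independent of any particular mailer or HTTP client, the reset, send, and search operations are passed in as callables. `run_cycle` is a hypothetical name, and the `Subject` key is an assumption about the shape of Mailpit's JSON.

```python
from typing import Callable


def run_cycle(
    reset: Callable[[], None],
    send: Callable[[], None],
    search: Callable[[str], list],
    recipient: str,
    expected_subject: str,
) -> tuple:
    """Run reset -> send -> verify once and return (ok, reason)."""
    reset()                       # step 02: start from an empty inbox
    send()                        # step 03: trigger the code path under test
    messages = search(recipient)  # step 04: read back what actually arrived
    if len(messages) != 1:
        return False, f"expected 1 message for {recipient}, found {len(messages)}"
    subject = messages[0].get("Subject", "")  # assumed key name
    if subject != expected_subject:
        return False, f"subject mismatch: {subject!r}"
    return True, "ok"
```

The `reason` string is what feeds step 05: on failure the agent gets a concrete diagnosis to act on instead of a bare boolean.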

04 · PORTS & SURFACES

Two ports. Three jobs.

One container exposes all three surfaces the workflow needs. The agent uses two of them; the human developer gets the third for eyeballing edge cases.

SMTP Listener

:1025

Where the app code sends. Plain SMTP, no auth required. Point any mailer library at this port instead of SendGrid/SES.

HTTP API

:8025/api

Where the agent reads. REST + JSON. List, search, fetch, delete — everything needed for assertions.

Web UI

:8025

Same port, browser-friendly. For humans to spot-check how emails actually render when debugging a tricky failure.

05 · COMMANDS

What the agent will actually run.

Six commands cover nearly the whole workflow. The read commands return JSON; all six are safe to put in a script or tool definition.

Purpose	Command	What it does
Start Mailpit	`docker run -d --name mailpit -p 1025:1025 -p 8025:8025 axllent/mailpit`	One-time container boot. Persists until explicitly stopped.
Reset inbox	`curl -X DELETE localhost:8025/api/v1/messages`	Wipes all stored messages. Call before each test run.
List all messages	`curl -s localhost:8025/api/v1/messages | jq`	Returns metadata for every email in the inbox.
Search by recipient	`curl -s "localhost:8025/api/v1/search?query=to:inbox@test.local"`	Filter by to/from/subject. The core assertion primitive.
Read one message	`curl -s localhost:8025/api/v1/message/<ID>`	Full headers, HTML + text bodies, attachments — everything.
Health check	`curl -s localhost:8025/api/v1/info`	Confirm Mailpit is running before attempting a send.
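If the agent's tooling is Python rather than shell, the same endpoints can be wrapped with the standard library. This is a sketch, not an official client: the function names are hypothetical, and the `messages` key in the search response is an assumption to check against the running Mailpit version.

```python
import json
import urllib.parse
import urllib.request

API = "http://localhost:8025/api/v1"  # base URL used throughout this guide


def search_url(query: str, base: str = API) -> str:
    """Build the search URL, percent-encoding the query string."""
    return f"{base}/search?{urllib.parse.urlencode({'query': query})}"


def _request(url: str, method: str = "GET") -> bytes:
    """Perform one HTTP call and return the raw response body."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read()


def reset_inbox(base: str = API) -> None:
    """Equivalent of: curl -X DELETE .../messages"""
    _request(f"{base}/messages", method="DELETE")


def search(query: str, base: str = API) -> list:
    """Equivalent of: curl -s ".../search?query=..." """
    payload = json.loads(_request(search_url(query, base)))
    return payload.get("messages", [])  # assumed response key


def read_message(message_id: str, base: str = API) -> dict:
    """Equivalent of: curl -s .../message/<ID>"""
    return json.loads(_request(f"{base}/message/{urllib.parse.quote(message_id)}"))
```

Encoding the query matters once addresses or subjects contain characters such as `@`, `:`, or spaces, which the raw curl examples leave to the shell quoting.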

06 · THE CONTRACT

Tell the agent once, in CLAUDE.md.

Drop this block into your project's CLAUDE.md (or equivalent). The agent reads it on every session — no more guessing recipients, no more skipping verification.

# Email Testing — Mailpit

# SMTP target (no auth, plain connection)
SMTP_HOST=localhost
SMTP_PORT=1025
SMTP_FROM="noreply@dev.local"

# Canonical test recipient — always use this address
TEST_RECIPIENT="inbox@test.local"

# Verification API
MAILPIT_API=http://localhost:8025/api/v1

## Agent protocol for any email-sending change:

  1. Before the send: curl -X DELETE $MAILPIT_API/messages
  2. Execute the code path that sends mail
  3. Verify: curl -s "$MAILPIT_API/search?query=to:$TEST_RECIPIENT"
  4. Fetch the message ID from results, then:
     curl -s $MAILPIT_API/message/<ID> to read full body
  5. Assert: recipient matches, subject matches, body contains
     the expected dynamic fields. Only then report success.

  DO NOT invent recipient addresses. Always use $TEST_RECIPIENT.
  DO NOT declare success on SMTP 250 alone — verify via API.

Why this contract works. The agent now has a deterministic recipient (no more guessing), a required verification step (no more false positives), and a clean-slate protocol (no cross-contamination between test runs). Three ambiguities eliminated in about 20 lines of config.
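Step 5 of the protocol (assert recipient, subject, and body) can be made mechanical. The sketch below assumes the fetched message is a dict with `To` (a list of objects carrying an `Address`), `Subject`, `Text`, and `HTML` keys; those key names are assumptions about Mailpit's JSON, and `check_message` is a hypothetical helper.

```python
def check_message(message: dict, recipient: str, subject: str,
                  required_fragments: list) -> list:
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []

    # Recipient: compare case-insensitively against every To address.
    to_addresses = [entry.get("Address", "").lower()
                    for entry in message.get("To") or []]
    if recipient.lower() not in to_addresses:
        failures.append(f"recipient {recipient} not in {to_addresses}")

    # Subject: exact match.
    if message.get("Subject") != subject:
        failures.append(f"subject was {message.get('Subject')!r}")

    # Body: every expected dynamic field must appear in text or HTML.
    body = (message.get("Text") or "") + (message.get("HTML") or "")
    for fragment in required_fragments:
        if fragment not in body:
            failures.append(f"body is missing {fragment!r}")

    return failures
```

Returning the full list of failures, rather than stopping at the first, gives the agent everything it needs to fix in one iteration.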