AI Agents in Production: Lessons from Building TextPilot

What we learned building a developer SMS API with AI-assisted development — the tools, workflows, and hard-won lessons from shipping real infrastructure.

We built TextPilot — a developer-first SMS API — with significant help from AI coding assistants. Not as a gimmick, but as a genuine force multiplier. Here's what we learned about using AI agents to ship real production infrastructure.

The Stack We Shipped

TextPilot isn't a toy project. It's a production API handling real SMS delivery:

  • API: Cloudflare Workers + Hono + tRPC + D1 (SQLite at the edge)
  • Queue: Cloudflare Queues for reliable message delivery
  • Auth: Better Auth with session cookies + API key authentication
  • SDK: Zero-dependency TypeScript package on npm
  • Dashboard: React + TanStack Router + custom component library
  • Docs: Fumadocs (Next.js, static export)
  • Infra: Pulumi managing AWS IAM/SNS + Cloudflare resources
  • Billing: Stripe integration with checkout and portal sessions

All of this was built with AI assistance. Here's what worked, what didn't, and what we'd do differently.

What Worked

1. AI Excels at Boilerplate-Heavy Tasks

Setting up database schemas, writing migration files, creating tRPC routers, building CRUD endpoints — these are tasks where the pattern is clear but the typing is tedious. AI assistants demolished this work.

Our 9-table database schema with migrations, type definitions, and query builders? Generated in a fraction of the time it would have taken manually. And because the AI could see the full schema, the types were consistent across tables.
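To make the pattern concrete, here is a sketch of the kind of boilerplate involved: a D1 migration string plus a matching row type, with a tiny query builder that keeps the INSERT in sync with the type. Table and column names are illustrative, not TextPilot's actual schema.

```typescript
// Illustrative row type mirroring a hypothetical `messages` table.
interface MessageRow {
  id: string;
  project_id: string;
  to_number: string;
  body: string;
  status: "queued" | "sent" | "delivered" | "failed";
  created_at: number; // unix epoch seconds
}

// Migration SQL kept alongside the type so drift is obvious in review.
const createMessagesTable = `
CREATE TABLE IF NOT EXISTS messages (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  to_number TEXT NOT NULL,
  body TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'queued',
  created_at INTEGER NOT NULL
);`;

// Builds a parameterized INSERT from the row object, so adding a column
// to the type automatically adds it to the statement.
function insertMessage(row: MessageRow): { sql: string; params: unknown[] } {
  const cols = Object.keys(row);
  const placeholders = cols.map(() => "?").join(", ");
  return {
    sql: `INSERT INTO messages (${cols.join(", ")}) VALUES (${placeholders})`,
    params: Object.values(row),
  };
}
```

In D1 the result would be passed to `db.prepare(sql).bind(...params)`; the point is that the schema, the type, and the query all come from one place the AI can see.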

2. Test Generation Is a Superpower

We have 88 tests covering our API. Writing tests is the task developers procrastinate on most, and it's where AI assistance delivers the highest ROI. You describe the behavior, the AI writes the test, and you review.

// AI-generated test that actually caught a bug
it('should reject expired API keys', async () => {
  const key = await createApiKey({ expiresAt: new Date('2020-01-01') })
  const res = await app.request('/v1/messages', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${key.plaintext}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ to: '+1234567890', body: 'test' }),
  })
  expect(res.status).toBe(401)
})

3. Cross-Stack Consistency

When you're building a monorepo with an API, SDK, dashboard, and docs, keeping everything consistent is a challenge. AI assistants that can see your entire codebase maintain consistency across boundaries — the SDK types match the API response shapes, the docs match the actual API behavior, the dashboard calls match the tRPC router definitions.
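The mechanism is simple: a shared types module that both the API handler and the SDK import. The names below are hypothetical, and the SDK call is stubbed rather than making a real fetch, but the shape is the point.

```typescript
// packages/shared — one definition of the wire format (names illustrative).
interface SendMessageRequest {
  to: string;
  body: string;
}

interface SendMessageResponse {
  id: string;
  status: "queued";
}

// apps/api — the handler is typed against the shared response shape.
function handleSend(req: SendMessageRequest): SendMessageResponse {
  const id = "msg_" + Date.now().toString(36);
  return { id, status: "queued" };
}

// packages/sdk — the client promises exactly the same shape, so a change
// to SendMessageResponse breaks both sides at compile time, not at runtime.
async function sdkSend(req: SendMessageRequest): Promise<SendMessageResponse> {
  // In the real SDK this would be a fetch() to the API; stubbed here.
  return handleSend(req);
}
```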

4. Infrastructure as Code

Pulumi code for AWS IAM policies, SNS topics, and Cloudflare Workers configuration is notoriously fiddly. AI assistants that understand both the cloud provider's API and Pulumi's TypeScript SDK can generate correct IAM policies on the first try (most of the time).
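For a sense of what "correct on the first try" means here, this is a sketch of the kind of least-privilege policy document involved, written as a plain object (in Pulumi you would pass it through JSON.stringify into an aws.iam.Policy's policy input). The Sid is a placeholder; note that direct SMS publishing via SNS (no topic) genuinely requires Resource "*".

```typescript
// Minimal IAM policy document allowing SNS SMS publishing.
// Sid is illustrative; direct-to-phone-number SNS Publish has no
// topic ARN to scope to, hence Resource "*".
const snsPublishPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "AllowSmsPublish",
      Effect: "Allow",
      Action: ["sns:Publish"],
      Resource: "*",
    },
  ],
};

// In Pulumi this would look roughly like:
//   new aws.iam.Policy("sms-publish", { policy: JSON.stringify(snsPublishPolicy) })
const policyJson = JSON.stringify(snsPublishPolicy);
```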

What Didn't Work

1. AI Gets Lost in Complex State Management

When we built the dashboard's billing flow — Stripe checkout sessions, plan upgrades, usage tracking, webhook handling — the AI needed significant human guidance. Multi-step flows with external service integration and error handling at each step require architectural thinking that current AI assistants struggle with.

Lesson: Use AI for individual functions and components, but design the flow yourself.

2. Security Requires Human Review

Every piece of authentication, authorization, and encryption code got manually reviewed. AI assistants can write auth middleware, but they sometimes miss edge cases:

  • Token timing attacks
  • Session fixation vulnerabilities
  • CORS misconfigurations
  • API key exposure in error messages

Lesson: Never ship AI-generated security code without thorough human review.
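The first item on that list is a good example of what review catches. Comparing a presented API key with `===` leaks timing information; the fix is to hash and compare in constant time. This sketch uses Node's `crypto.timingSafeEqual` and is a simplification of real middleware, not TextPilot's actual implementation.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Store only the SHA-256 of the key; compare hashes, never plaintext.
function hashKey(plaintext: string): Buffer {
  return createHash("sha256").update(plaintext).digest();
}

function verifyApiKey(presented: string, storedHash: Buffer): boolean {
  const presentedHash = hashKey(presented);
  // Hashing first guarantees equal buffer lengths, so timingSafeEqual
  // never throws, and comparison time does not depend on where the
  // first mismatching byte is.
  return timingSafeEqual(presentedHash, storedHash);
}
```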

3. Carrier-Specific SMS Logic

The SMS space has quirks that aren't well-represented in training data. 10DLC registration flows, carrier filtering rules, opt-out handling requirements, throughput limits by trust score — this domain knowledge needed human research and implementation.

Lesson: AI is weaker in niche domains with limited public documentation.
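One concrete piece of that domain knowledge is opt-out handling: US carriers require honoring keywords like STOP. The keyword sets below follow the commonly cited CTIA list, but treat this as an illustrative classifier, not a compliance reference.

```typescript
// Commonly cited opt-out / opt-in keyword sets (illustrative, not exhaustive).
const OPT_OUT_KEYWORDS = new Set([
  "STOP", "STOPALL", "UNSUBSCRIBE", "CANCEL", "END", "QUIT",
]);
const OPT_IN_KEYWORDS = new Set(["START", "YES", "UNSTOP"]);

type InboundAction = "opt_out" | "opt_in" | "message";

// Classify an inbound SMS body. Keyword matching is on the whole
// trimmed, uppercased message, per the usual carrier convention.
function classifyInbound(body: string): InboundAction {
  const normalized = body.trim().toUpperCase();
  if (OPT_OUT_KEYWORDS.has(normalized)) return "opt_out";
  if (OPT_IN_KEYWORDS.has(normalized)) return "opt_in";
  return "message";
}
```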

The Workflow That Emerged

After several weeks of building, we settled on a workflow:

  1. Human designs the architecture: Database schema, API surface, component hierarchy
  2. AI generates the implementation: Routes, components, tests, migrations
  3. Human reviews and refines: Security, edge cases, UX details
  4. AI handles the chores: Formatting, type fixes, test coverage, documentation

This isn't "AI replacing developers." It's developers with dramatically higher throughput.

Tools That Made It Work

Development

  • Claude Code: Primary AI coding assistant for architecture discussions, code generation, debugging
  • Biome: Fast linting and formatting, so AI output could be cleaned up automatically
  • TypeScript: Strong types meant AI-generated code was caught by the compiler before it hit production

Infrastructure

  • Cloudflare Workers: Zero cold start, global deployment, D1 for database — perfect for an API that needs to be fast everywhere
  • Pulumi: TypeScript IaC meant the AI could reason about infrastructure the same way it reasons about application code
  • GitHub Actions: CI/CD that verified every change before deployment

Communication

  • TextPilot (our own product): SMS delivery infrastructure
  • Resend: Email for transactional messages — similar developer-first philosophy

Advice for Building with AI Assistants

  1. Start with strong types: TypeScript, Zod schemas, database types. The more structure you give the AI, the better its output.

  2. Keep files focused: One concern per file. AI assistants work better with small, focused files than monolithic modules.

  3. Write the tests first: Or at least describe the expected behavior. AI generates much better code when it has a clear target.

  4. Review everything: AI code that passes TypeScript checks and tests can still have subtle bugs. Read every line before it ships.

  5. Use monorepo structure: When the AI can see your API types, SDK interface, and dashboard components in one context, it maintains consistency automatically.

  6. Don't fight the AI's style: If the AI generates code in a slightly different pattern than you'd write, ask yourself if it matters. Often it doesn't. Save your review energy for logic and security.
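Point 1 above in practice: a request schema that narrows unknown input before it reaches business logic. A real codebase would express this with Zod; this dependency-free sketch just shows the shape of the guard, with hypothetical field rules.

```typescript
interface SendRequest {
  to: string;   // E.164 phone number
  body: string; // 1–1600 characters (illustrative limit)
}

// Validate untrusted input and narrow it to SendRequest, or throw.
function parseSendRequest(input: unknown): SendRequest {
  if (typeof input !== "object" || input === null) {
    throw new Error("expected an object");
  }
  const { to, body } = input as Record<string, unknown>;
  if (typeof to !== "string" || !/^\+[1-9]\d{1,14}$/.test(to)) {
    throw new Error("invalid 'to': expected E.164 phone number");
  }
  if (typeof body !== "string" || body.length === 0 || body.length > 1600) {
    throw new Error("invalid 'body': expected 1-1600 characters");
  }
  return { to, body };
}
```

Everything downstream of the guard gets a fully typed value, which is exactly the structure that makes AI-generated handlers reliable.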

The Numbers

Building TextPilot with AI assistance:

  • 9 database tables with migrations and type definitions
  • 88 tests passing across the API
  • Full dashboard with 8+ pages (overview, projects, API keys, messages, billing, registration)
  • TypeScript SDK published to npm
  • 8 documentation pages with code examples
  • Complete IaC managing AWS and Cloudflare resources
  • Stripe billing with 3 plan tiers

Is this faster than building without AI? Significantly. Is it as fast as some AI hype suggests? No. The human-in-the-loop is still critical for architecture, security, and domain expertise.

Conclusion

AI-assisted development isn't magic. It's a tool — like TypeScript, like CI/CD, like component libraries. The developers who get the most out of it are the ones who understand what it's good at (boilerplate, tests, consistency) and what it's bad at (architecture, security, niche domains).

We built TextPilot this way, and we'd do it again. The result is a production API that's well-tested, well-typed, and well-documented — shipped in a fraction of the time it would have taken otherwise.

Try TextPilot →