AI Agents in Production: Lessons from Building TextPilot
What we learned building a developer SMS API with AI-assisted development — the tools, workflows, and hard-won lessons from shipping real infrastructure.
We built TextPilot — a developer-first SMS API — with significant help from AI coding assistants. Not as a gimmick, but as a genuine force multiplier. Here's what we learned about using AI agents to ship real production infrastructure.
The Stack We Shipped
TextPilot isn't a toy project. It's a production API handling real SMS delivery:
- API: Cloudflare Workers + Hono + tRPC + D1 (SQLite at the edge)
- Queue: Cloudflare Queues for reliable message delivery
- Auth: Better Auth with session cookies + API key authentication
- SDK: Zero-dependency TypeScript package on npm
- Dashboard: React + TanStack Router + custom component library
- Docs: Fumadocs (Next.js, static export)
- Infra: Pulumi managing AWS IAM/SNS + Cloudflare resources
- Billing: Stripe integration with checkout and portal sessions
All of this was built with AI assistance. Here's what worked, what didn't, and what we'd do differently.
What Worked
1. AI Excels at Boilerplate-Heavy Tasks
Setting up database schemas, writing migration files, creating tRPC routers, building CRUD endpoints — these are tasks where the pattern is clear but the typing is tedious. AI assistants demolished this work.
Our 9-table database schema with migrations, type definitions, and query builders? Generated in a fraction of the time it would have taken manually. And because the AI could see the full schema, the types were consistent across tables.
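As an illustration of what "consistent types across tables" buys you, here is a minimal, framework-free sketch. The table and field names are hypothetical, not TextPilot's actual schema; the point is that a foreign key reuses the referenced column's type, so the AI (and the compiler) can't let the two drift apart.

```typescript
// Hypothetical sketch: shared column types keep a multi-table schema consistent.
type MessageStatus = 'queued' | 'sent' | 'delivered' | 'failed'

interface Project {
  id: string
  name: string
  createdAt: number // unix epoch ms; stores cleanly in SQLite/D1
}

interface Message {
  id: string
  projectId: Project['id'] // same type as the referenced column
  to: string
  body: string
  status: MessageStatus
  createdAt: number
}

// A tiny query helper in the same pattern the AI can replicate per table
const byProject = (messages: Message[], projectId: Project['id']) =>
  messages.filter((m) => m.projectId === projectId)
```

A real schema would express this through an ORM's table builders, but the underlying discipline is the same: define each type once and derive everything else from it.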
2. Test Generation Is a Superpower
We have 88 tests covering our API. Writing tests is the task developers procrastinate on most, and it's where AI assistance delivers the highest ROI. You describe the behavior, the AI writes the test, and you review.
```ts
// AI-generated test that actually caught a bug
it('should reject expired API keys', async () => {
  const key = await createApiKey({ expiresAt: new Date('2020-01-01') })
  const res = await app.request('/v1/messages', {
    method: 'POST',
    headers: { Authorization: `Bearer ${key.plaintext}` },
    body: JSON.stringify({ to: '+1234567890', body: 'test' }),
  })
  expect(res.status).toBe(401)
})
```

3. Cross-Stack Consistency
When you're building a monorepo with an API, SDK, dashboard, and docs, keeping everything consistent is a challenge. AI assistants that can see your entire codebase maintain consistency across boundaries — the SDK types match the API response shapes, the docs match the actual API behavior, the dashboard calls match the tRPC router definitions.
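A stripped-down sketch of the idea, with hypothetical names: the API handler and the SDK method are typed against one shared interface, so if either side changes shape, the other fails to compile instead of drifting silently.

```typescript
// Hypothetical: one response shape shared by the API, the SDK, and the dashboard.
export interface SendMessageResponse {
  id: string
  status: 'queued'
}

// The API handler returns it...
function handleSend(): SendMessageResponse {
  return { id: 'msg_123', status: 'queued' }
}

// ...and the SDK method is typed against the same interface.
// (Stand-in for an HTTP call; a real SDK would fetch and parse.)
async function sdkSend(): Promise<SendMessageResponse> {
  return handleSend()
}
```

With tRPC the same guarantee comes for free by inferring the client type from the router, but the principle holds for any shared-types setup in a monorepo.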
4. Infrastructure as Code
Pulumi code for AWS IAM policies, SNS topics, and Cloudflare Workers configuration is notoriously fiddly. AI assistants that understand both the cloud provider's API and Pulumi's TypeScript SDK can generate correct IAM policies on the first try (most of the time).
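To make the "fiddly" part concrete, here is a hedged sketch of building an IAM policy document as plain data; in Pulumi you would pass this JSON to a resource such as `aws.iam.Policy`. The ARN and scope are placeholders, not TextPilot's real resources.

```typescript
// Hypothetical sketch: an IAM policy document scoped to publishing
// on a single SNS topic. Policy grammar (Version, Statement, Effect,
// Action, Resource) is what AI assistants must get exactly right.
function snsPublishPolicy(topicArn: string): string {
  return JSON.stringify({
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['sns:Publish'],
        Resource: [topicArn], // least privilege: this topic only
      },
    ],
  })
}
```

Because the document is just typed TypeScript, the AI can reason about it the same way it reasons about application code, which is much of Pulumi's appeal here.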
What Didn't Work
1. AI Gets Lost in Complex State Management
When we built the dashboard's billing flow — Stripe checkout sessions, plan upgrades, usage tracking, webhook handling — the AI needed significant human guidance. Multi-step flows with external service integration and error handling at each step require architectural thinking that current AI assistants struggle with.
Lesson: Use AI for individual functions and components, but design the flow yourself.
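One way to "design the flow yourself" is to pin the legal states and transitions down as types before asking the AI to fill in each step. A minimal sketch, with illustrative state and plan names rather than TextPilot's actual billing implementation:

```typescript
// Hedged sketch: the human designs the state machine; the AI implements
// individual transitions. Invalid transitions fail loudly instead of
// leaving the billing flow in an undefined state.
type BillingState =
  | { step: 'idle' }
  | { step: 'checkout_created'; sessionId: string }
  | { step: 'awaiting_webhook'; sessionId: string }
  | { step: 'active'; plan: 'starter' | 'growth' | 'scale' }

function onCheckoutCreated(state: BillingState, sessionId: string): BillingState {
  if (state.step !== 'idle') {
    throw new Error(`cannot start checkout from "${state.step}"`)
  }
  return { step: 'checkout_created', sessionId }
}
```

With the shape fixed, "write the webhook handler that moves `awaiting_webhook` to `active`" becomes exactly the kind of bounded, single-function task AI handles well.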
2. Security Requires Human Review
Every piece of authentication, authorization, and encryption code got manually reviewed. AI assistants can write auth middleware, but they sometimes miss edge cases:
- Token timing attacks
- Session fixation vulnerabilities
- CORS misconfigurations
- API key exposure in error messages
Lesson: Never ship AI-generated security code without thorough human review.
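The timing-attack item is a good example of an edge case worth checking for in AI output. A minimal sketch of the standard mitigation (using Node's crypto; this is a general pattern, not TextPilot's actual auth code):

```typescript
import { createHash, timingSafeEqual } from 'node:crypto'

// Constant-time API key comparison. A naive `provided === stored` can
// leak how many leading characters match via response timing.
// Hashing both sides first also guarantees equal-length buffers,
// which timingSafeEqual requires.
function keysMatch(provided: string, stored: string): boolean {
  const a = createHash('sha256').update(provided).digest()
  const b = createHash('sha256').update(stored).digest()
  return timingSafeEqual(a, b)
}
```

This is precisely the kind of line a human reviewer should be looking for: AI-generated middleware often compares secrets with `===`, which passes every test and still leaks.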
3. Carrier-Specific SMS Logic
The SMS space has quirks that aren't well-represented in training data. 10DLC registration flows, carrier filtering rules, opt-out handling requirements, throughput limits by trust score — this domain knowledge needed human research and implementation.
Lesson: AI is weaker in niche domains with limited public documentation.
The Workflow That Emerged
After several weeks of building, we settled on a workflow:
- Human designs the architecture: Database schema, API surface, component hierarchy
- AI generates the implementation: Routes, components, tests, migrations
- Human reviews and refines: Security, edge cases, UX details
- AI handles the chores: Formatting, type fixes, test coverage, documentation
This isn't "AI replacing developers." It's developers with dramatically higher throughput.
Tools That Made It Work
Development
- Claude Code: Primary AI coding assistant for architecture discussions, code generation, debugging
- Biome: Fast linting and formatting, so AI output could be cleaned up automatically
- TypeScript: Strong types meant AI-generated code was caught by the compiler before it hit production
Infrastructure
- Cloudflare Workers: Zero cold start, global deployment, D1 for database — perfect for an API that needs to be fast everywhere
- Pulumi: TypeScript IaC meant the AI could reason about infrastructure the same way it reasons about application code
- GitHub Actions: CI/CD that verified every change before deployment
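For readers unfamiliar with the Workers model: everything ultimately reduces to a `fetch` handler that maps a Request to a Response. A framework-free sketch (TextPilot layers Hono and tRPC on top of this same interface; the route here is illustrative):

```typescript
// Hedged sketch of a minimal Workers-style handler.
const worker = {
  async fetch(req: Request): Promise<Response> {
    const url = new URL(req.url)
    if (url.pathname === '/health') {
      // Response.json sets the Content-Type header for you
      return Response.json({ ok: true })
    }
    return new Response('Not found', { status: 404 })
  },
}
```

Because the handler is a plain function over web-standard types, it is also trivially testable without deploying anything, which matters a lot when AI is generating the routes.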
Communication
- TextPilot (our own product): SMS delivery infrastructure
- Resend: Email for transactional messages — similar developer-first philosophy
Advice for Building with AI Assistants
- Start with strong types: TypeScript, Zod schemas, database types. The more structure you give the AI, the better its output.
- Keep files focused: One concern per file. AI assistants work better with small, focused files than monolithic modules.
- Write the tests first: Or at least describe the expected behavior. AI generates much better code when it has a clear target.
- Review everything: AI code that passes TypeScript checks and tests can still have subtle bugs. Read every line before it ships.
- Use monorepo structure: When the AI can see your API types, SDK interface, and dashboard components in one context, it maintains consistency automatically.
- Don't fight the AI's style: If the AI generates code in a slightly different pattern than you'd write, ask yourself if it matters. Often it doesn't. Save your review energy for logic and security.
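On the first point, "strong types" means runtime validation as well as compile-time types. A minimal hand-rolled guard standing in for a Zod schema (field names and the phone-number rule are illustrative):

```typescript
// Hedged sketch: validate untrusted input into a typed value.
// In practice you'd reach for a schema library like Zod; the shape
// of the guarantee is the same either way.
interface SendMessageInput {
  to: string
  body: string
}

function parseSendMessageInput(raw: unknown): SendMessageInput {
  if (typeof raw !== 'object' || raw === null) throw new Error('expected an object')
  const { to, body } = raw as Record<string, unknown>
  // Crude E.164-style check, for illustration only
  if (typeof to !== 'string' || !/^\+\d{7,15}$/.test(to)) throw new Error('invalid "to"')
  if (typeof body !== 'string' || body.length === 0) throw new Error('invalid "body"')
  return { to, body }
}
```

Giving the AI a validator like this at the boundary means everything downstream works with `SendMessageInput`, not `unknown`, and generated code quality improves accordingly.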
The Numbers
Building TextPilot with AI assistance:
- 9 database tables with migrations and type definitions
- 88 tests passing across the API
- Full dashboard with 8+ pages (overview, projects, API keys, messages, billing, registration)
- TypeScript SDK published to npm
- 8 documentation pages with code examples
- Complete IaC managing AWS and Cloudflare resources
- Stripe billing with 3 plan tiers
Is this faster than building without AI? Significantly. Is it as fast as some AI hype suggests? No. The human-in-the-loop is still critical for architecture, security, and domain expertise.
Conclusion
AI-assisted development isn't magic. It's a tool — like TypeScript, like CI/CD, like component libraries. The developers who get the most out of it are the ones who understand what it's good at (boilerplate, tests, consistency) and what it's bad at (architecture, security, niche domains).
We built TextPilot this way, and we'd do it again. The result is a production API that's well-tested, well-typed, and well-documented — shipped in a fraction of the time it would have taken otherwise.