
How I Built an AI Chat Feature With Supabase Edge Functions and OpenRouter

2026-04-01 · 8 min read

The first time it worked — actually worked — I watched a streaming response appear word by word on my phone screen and felt something shift. Not just technical satisfaction. Something closer to witnessing a thought form in real time. A thing I'd built was thinking, and I could watch it happen.

Most AI chat tutorials stop at "call the OpenAI API and display the result." That's the easy part. Real production chat needs auth, streaming, rate limiting, tool execution, error recovery, and a personality that doesn't feel like talking to a corporate FAQ bot. This is how I built all of it for Ocean Drop — a wellness app for couples built on Expo SDK 55 with two AI companions that give personalized guidance based on cycle phases.

Why Supabase Edge Functions Instead of a Standalone Backend?

Supabase Edge Functions are serverless TypeScript functions that run on the Deno runtime — deployed to the edge, close to your users, with no server to manage. If you're already using Supabase for auth and database, they're the natural choice for AI endpoints because your data is right there.

Here's what made the decision easy for Ocean Drop:

  • Auth is built in. The JWT from the Supabase client is already in every request. Verifying it is three lines of code, not a middleware stack.
  • Data proximity. The edge function can read cycle data, calendar events, and user profiles from the same Supabase instance — no cross-service calls.
  • Deno runtime. TypeScript with no build step. Fast cold starts. Standard Web APIs like fetch and ReadableStream work out of the box.
  • Zero infrastructure. No Express server to deploy, no Docker containers to orchestrate, no scaling config to tune. Deploy from the CLI or dashboard.

The alternative was spinning up a standalone Node server — FastAPI, Express, or a Vercel serverless function. All viable. But when your auth, database, and compute live in the same ecosystem, the simplicity compounds. One deployment target. One set of environment variables. One place to debug.

The tradeoff: edge functions are stateless between invocations. In-memory state doesn't persist. I'll come back to why that matters when I talk about rate limiting.

Why OpenRouter Instead of Calling OpenAI Directly?

OpenRouter is a unified API gateway that routes to 400+ models from every major provider — OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and dozens more. One endpoint, one API format, and you switch models by changing a single environment variable.

For a solo developer, this flexibility is everything:

  • Development at zero cost. I built and tested the entire chat feature using free-tier models. During development, the OpenRouter bill was $0/month.
  • Model switching without code changes. I've switched the underlying model three times since launch — from a free model to deepseek/deepseek-chat to the current production model — each time by changing one env var.
  • Cost control at scale. DeepSeek V3 runs at roughly 1/50th the cost of GPT-4 for comparable quality in my use case. OpenRouter takes a 5.5% platform fee on paid models, and it's worth every fraction of a cent for the optionality.

The integration is a standard OpenAI-compatible fetch call with two extra headers — X-Title for your app name and HTTP-Referer for your site URL. OpenRouter uses these for analytics and to prioritize traffic from identified apps.

Compared to calling OpenAI directly: you lose nothing functionally and gain the ability to try any model in the market without rewriting your backend. For a project still finding product-market fit, that's a strategic advantage.

How Does the Edge Function Actually Work?

The chat edge function handles everything between "user taps send" and "response appears on screen." Four layers, each with a specific job.

Layer 1: Auth middleware. A shared authenticateUser() function extracts the JWT from the Authorization header, verifies it against Supabase Auth, and returns a user-scoped client plus an admin client for privileged database operations. The same function is reused across all six edge functions in Ocean Drop — DRY from day one.

// Simplified auth pattern — shared across all edge functions
const { user, supabaseAdmin } = await authenticateUser(req);
// user.id is verified — safe to use for database queries
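
The full helper is project-specific, but the shape looks roughly like this. `extractBearerToken` is a hypothetical name for illustration; the Supabase client calls are shown as comments because they depend on the project's setup:

```typescript
// Pure helper: pull the JWT out of the Authorization header.
// Returns null if the header is missing or not a Bearer token.
function extractBearerToken(header: string | null): string | null {
  if (!header) return null;
  const match = header.match(/^Bearer\s+(.+)$/i);
  return match ? match[1] : null;
}

// Hypothetical usage inside authenticateUser() (supabase-js assumed):
//   const token = extractBearerToken(req.headers.get("Authorization"));
//   const { data: { user }, error } = await supabaseAdmin.auth.getUser(token);
//   if (error || !user) return new Response("Unauthorized", { status: 401 });
```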

Layer 2: Input validation. Before anything touches the LLM, the input gets checked. Message length capped at 5,000 characters. Conversation history limited to the last 50 messages. Total body size capped at 512KB. And rate limiting — 20 requests per minute per user, tracked in an in-memory map.
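
A minimal sketch of those Layer 2 checks, using the limits from above. `sanitizeChatInput` and the `ChatBody` shape are illustrative names, not the actual Ocean Drop code:

```typescript
interface ChatBody {
  message: string;
  history: { role: string; content: string }[];
}

// Reject oversized messages; silently trim history to the last 50 turns.
// Returns null when the request should be rejected outright.
function sanitizeChatInput(body: ChatBody): ChatBody | null {
  if (body.message.length > 5000) return null; // message cap
  return {
    message: body.message,
    history: body.history.slice(-50), // keep only the most recent 50
  };
}
```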

Layer 3: System prompt construction. This is where the personality lives. Based on the user's role, the function builds either the "Drop" prompt (for male partners — supportive bro energy, actionable advice) or the "Marina" prompt (for female users — warm, wise, scientifically grounded). Each prompt is injected with live context: current cycle day, phase name, partner's name, upcoming calendar events. The two-sided architecture that drives this role system — consent flows, RLS policies, gender-aware notifications — is a story in itself.

The system prompts grew to 200+ lines each. That length isn't accidental — short prompts produce generic responses. The specificity of the phase-by-phase guidance, the voice examples, the wellness boundaries — that's what makes the AI feel like a companion rather than a search engine.
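
The real prompts are 200+ lines; this sketch shows only the context-injection mechanics. All names and the persona one-liners are assumptions for illustration:

```typescript
interface PromptContext {
  cycleDay: number;
  phaseName: string;
  partnerName: string;
}

// Pick the persona by role, then splice in live context.
function buildSystemPrompt(role: "male" | "female", ctx: PromptContext): string {
  const persona = role === "male"
    ? "You are Drop: supportive, direct, actionable."            // male partners
    : "You are Marina: warm, wise, scientifically grounded.";    // female users
  return [
    persona,
    `Current cycle day: ${ctx.cycleDay} (${ctx.phaseName} phase).`,
    `Partner's name: ${ctx.partnerName}.`,
    // ...phase-by-phase guidance, voice examples, wellness boundaries
  ].join("\n");
}
```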

Layer 4: The OpenRouter call. A standard fetch to OpenRouter's /chat/completions endpoint with stream: true, a tools array of 8 callable functions, and the assembled message history.

const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
    "HTTP-Referer": siteUrl, // your app's public URL
    "X-Title": "Ocean Drop",
  },
  body: JSON.stringify({
    model: Deno.env.get("OPENROUTER_MODEL"),
    stream: true,
    temperature: 0.7,
    max_tokens: 1024,
    tools: MARINA_TOOLS,
    messages: [systemMessage, ...conversationHistory, userMessage],
  }),
});

Nothing exotic. The power is in the orchestration — auth, validation, context injection, and streaming all wired together in a single serverless function.

What About Streaming? How Do You Get Tokens to the Phone in Real Time?

Streaming is the difference between waiting 8 seconds for a wall of text and watching a response form naturally, word by word. Users notice. It's worth the complexity.

The edge function streams Server-Sent Events (SSE) back to the client. Each chunk follows a simple protocol:

  • Text tokens: data: {"type":"text","content":"..."}\n\n
  • Tool calls: data: {"type":"tool_call","tool_call":{...}}\n\n
  • Stream end: data: [DONE]\n\n

On the client side, it gets platform-specific. The web uses the standard ReadableStream API with getReader() — proper incremental streaming. But React Native on Android and iOS doesn't support ReadableStream on fetch responses. The response just... arrives as a single blob.

The workaround: call response.text() to get the full SSE payload, then parse it line by line, splitting on data: prefixes and rendering each token as if it were arriving in real time. It's not true streaming at the transport level, but the user experience is identical — text appears progressively rather than all at once.

// Platform-aware SSE parsing (simplified)
if (Platform.OS === "web") {
  const reader = response.body.getReader();
  // True streaming — process chunks as they arrive
} else {
  const text = await response.text();
  const lines = text.split("\n").filter(l => l.startsWith("data: "));
  // Parse and render each SSE event sequentially
}

A 55-second abort timeout on the client side catches cases where the LLM hangs or the edge function hits a cold-start delay. Fail gracefully, always.

This is exactly the gap I couldn't find documented anywhere when I built this. Developers in Supabase's GitHub Discussions were asking how to do SSE from Edge Functions, and the answer is simpler than you'd think: the Deno runtime supports standard Web APIs, so you construct a ReadableStream, pipe SSE-formatted data through it, and return it as the response.
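
A sketch of that server-side pattern, assuming the standard Web Streams API available in Deno. `formatSSE` and `sseResponse` are illustrative names; `formatSSE` produces one event in the wire format listed above:

```typescript
// Encode one event in the SSE wire format: `data: <payload>\n\n`.
function formatSSE(event: object | "[DONE]"): string {
  const payload = event === "[DONE]" ? "[DONE]" : JSON.stringify(event);
  return `data: ${payload}\n\n`;
}

// Wrap an async sequence of events in a streaming HTTP response.
function sseResponse(chunks: AsyncIterable<object>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) {
        controller.enqueue(encoder.encode(formatSSE(chunk)));
      }
      controller.enqueue(encoder.encode(formatSSE("[DONE]")));
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```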

How Do AI Tool Calls Work in This System?

The AI in Ocean Drop isn't just a chatbot. It's an agent that can take actions. Say "set a reminder for our anniversary" and it creates a calendar event. Say "my period started today" and it logs the data. Eight tools in total — calendar CRUD, period tracking, cycle queries, and settings management.
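
One illustrative entry from the tools array, in the OpenAI-compatible function-calling format that OpenRouter accepts. The field values are assumptions modeled on the calendar tool described here, not the actual Ocean Drop schema:

```typescript
// Hypothetical calendar tool definition (OpenAI-compatible format).
const createCalendarEventTool = {
  type: "function",
  function: {
    name: "create_calendar_event",
    description: "Create a calendar event for the couple",
    parameters: {
      type: "object",
      properties: {
        title: { type: "string" },
        date: { type: "string", description: "YYYY-MM-DD" },
        event_type: { type: "string", enum: ["date_night", "anniversary", "reminder"] },
      },
      required: ["title", "date"],
    },
  },
};
```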

Here's how it flows:

  1. The LLM decides to call a tool based on the user's message
  2. The tool call is streamed back as a structured SSE event with the function name and arguments
  3. The edge function validates every argument strictly — dates must match YYYY-MM-DD, UUIDs must be valid, event types must be from a known enum
  4. The tool executes against the Supabase database using the admin client
  5. Results are sent back to the LLM as role: "tool" messages
  6. The LLM generates a follow-up response confirming what it did

Safety guardrails: max 5 tool calls per request (prevents runaway loops), all string inputs truncated to safe lengths, and every database operation is scoped to the authenticated user's ID.
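
A minimal sketch of the argument validation described in step 3. The exact checks in Ocean Drop may differ; these regexes illustrate the strict-check idea (note they validate format only, so a date like February 31st would still need a calendar-level check):

```typescript
// Format-only validators for LLM-generated arguments.
const DATE_RE = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidDate(s: string): boolean {
  return DATE_RE.test(s); // rejects hallucinated dates like "2026-13-45"
}

function isValidUuid(s: string): boolean {
  return UUID_RE.test(s); // rejects strings like "not-a-real-uuid"
}
```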

The user sees a ToolCallCard in the UI — a small card showing what happened. "Created Event: Date Night — April 5th." It makes the AI feel competent. It did the thing instead of telling you to do the thing.

This is where the alchemy of building software gets tangible — you're not just writing code that responds. You're writing code that acts. The transmutation from text to real calendar entries, real data, real changes in someone's daily life.

What Went Wrong Along the Way?

Every system looks clean in a blog post. Here's what didn't work:

No rate limiting at first. A bug in the client caused rapid-fire duplicate requests. Ten messages in two seconds. OpenRouter credits evaporated. The in-memory rate limiter went in the same day. It's imperfect — it resets on cold starts because edge functions are stateless — but it catches the 99% case.
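
A sketch of that in-memory limiter (20 requests per minute per user). Names are illustrative; as noted, the map lives only as long as the warm function instance, so it resets on cold starts:

```typescript
const WINDOW_MS = 60_000;   // 1-minute sliding window
const MAX_REQUESTS = 20;    // per user, per window
const hits = new Map<string, number[]>();

// Returns true when the user has exhausted their window.
function isRateLimited(userId: string, now: number = Date.now()): boolean {
  const recent = (hits.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(userId, recent);
    return true;
  }
  recent.push(now);
  hits.set(userId, recent);
  return false;
}
```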

React Native streaming was broken for weeks. I assumed ReadableStream would work on fetch responses in React Native. It doesn't. The SSE parser that splits response.text() into individual events was the fix, and it works well enough that users can't tell the difference.

Tool call validation was too loose. Early versions passed LLM-generated arguments straight to the database. The LLM hallucinated dates like 2026-13-45 and UUIDs like not-a-real-uuid. Every argument now goes through regex validation before hitting the database. Trust, but verify — especially with language models.

The system prompt was too short. Version one was 20 lines. The AI gave responses that could have come from any chatbot. Version two grew to 200+ lines with phase-by-phase guidance, voice examples, wellness boundaries, and explicit instructions on when to use tools. The personality clicked into place only after the prompt got specific enough to constrain the model's defaults.

Cold start timeouts. Edge functions occasionally take 2-3 seconds to cold start, and the LLM call itself can take 10-15 seconds for complex responses. A 45-second timeout on the LLM call plus a 55-second timeout on the client side, with graceful error messages, keeps the experience from feeling broken.

Every one of these bugs taught me something about the gap between how we imagine AI systems work and how they actually behave in production. The code handles the happy path. Production handles everything else.

Would I Build It This Way Again?

Yes. With refinements, not regrets.

Supabase Edge Functions were the right call for this architecture. Auth integration, data proximity, and zero infrastructure overhead make them ideal when you're already in the Supabase ecosystem. If you're building on a different backend, evaluate whether the migration cost is worth the integration benefits.

OpenRouter was the right call for model flexibility. I've switched models three times during development — each time was a single env var change. In a market where new models drop monthly and pricing shifts weekly, that optionality matters.

SSE streaming was worth the platform-specific complexity. Users feel the difference. Text appearing progressively signals that the AI is "thinking," which builds trust in a way that a loading spinner followed by a text block never will.

What I'd change next time: persistent rate limiting backed by a Supabase table instead of in-memory state, more structured error recovery for partial streams, and a tool result format that gives the LLM more context to generate natural follow-up responses.


The full story of Ocean Drop — the philosophy, the design decisions, the two-sided experience — is in Building Ocean Drop: A Wellness App With Soul. And the complete toolchain I use to build as a solo developer lives at The Stack I Use to Build Apps Solo.

If you're building something similar — AI features on Supabase, streaming in React Native, or a mobile app that needs to feel alive — I consult on exactly this kind of architecture.


Want the Full Workflow?

This is one piece of the system I used to ship Ocean Drop in 7 days.

The full playbook has the exact prompts, the pre-build checklist, the security audit workflow — all personalized for your stack via a single Day 0 prompt.

Get the Playbook — $17 →
