Sotto
Voice AI that answers UK restaurant phones.
.NET 10. Groq Llama 4 Scout. Whisper Large v3 Turbo. Deepgram Aura 2. Under 500 milliseconds end to end on the PSTN, with the UK Big 14 allergens enforced as a mandatory conversation state.
A UK restaurant voice AI, documented stage by stage.
The Sotto case study shows how a phone call placed to a UK restaurant becomes a confirmed POS order without a human taking the call. The architecture is a 29-project .NET 10 monorepo with Clean Architecture, 609 tests, and a sub-500-millisecond end-to-end voice budget.
The vertical is UK restaurants specifically. That choice determines the regulatory shape (Big 14 allergens enforced as a conversation state, VAT as integer pence, GDPR per-tenant retention, HMRC 7-year financial retention) and the integrations (Square UK, Toast, Clover, Stripe UK, Uber Direct, Stuart).
The Challenge
Restaurants miss calls and lose orders.
Phone orders still drive a meaningful share of restaurant revenue, yet during peak hours many of those calls go unanswered. Staff are stretched thin, language barriers frustrate callers, and every missed call is a lost order.
Missed calls during peak hours
Staff cannot answer every call when the kitchen is slammed. Callers hang up and order elsewhere.
Language friction in mixed-language areas
UK high streets see Bengali, Urdu, Polish, and English in the same shift. Human staff cannot cover every language.
Front-of-house labour scarcity
Hiring and retaining phone-capable staff is harder than ever. Wages rise while margins shrink.
Latency budget
Sub-500 milliseconds, stage by stage.
Voice AI has zero tolerance for delay: a one-second pause feels like an eternity on a phone call. Sotto's budget is broken down per stage rather than quoted as a round-number marketing figure. The ten stages sum to 470 milliseconds, leaving 30 milliseconds of headroom under the 500-millisecond target.
| Stage | Kind | Budget (ms) |
|---|---|---|
| Twilio PSTN ingress | Carrier | 50 |
| WebSocket ingress (VoiceGateway) | Internal | 5 |
| VAD plus buffer drain | Internal | 5 |
| Mu-law to PCM (8 kHz to 16 kHz) | Internal | 5 |
| Groq Whisper Large v3 Turbo (transcription) | Model | 150 |
| gRPC to orchestrator | Internal | 5 |
| pgvector RAG search | Internal | 15 |
| Groq Llama 4 Scout 17B, time to first token (TTFT) | Model | 50 |
| Deepgram Aura 2, time to first audio (TTFA) | Model | 130 |
| WebSocket egress and PSTN | Carrier | 55 |
| Total | | 470 |
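The mu-law to PCM stage is budgeted at 5 milliseconds. Here is a minimal sketch of what that stage does. It assumes the Twilio leg carries 8 kHz G.711 mu-law bytes. The sketch decodes each byte to a signed 16-bit sample and then doubles the sample rate. The function names and the naive linear-interpolation upsampler are illustrative assumptions, not Sotto's code. A production resampler would normally apply a proper low-pass filter.

```python
# Minimal sketch of the "Mu-law to PCM (8 to 16 kHz)" stage.
# Assumptions: input is 8 kHz G.711 mu-law; the 2x upsampler is naive linear
# interpolation for illustration, not Sotto's actual resampler.
from array import array

MULAW_BIAS = 0x84  # 132, the standard G.711 mu-law bias


def mulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def upsample_2x(samples: list[int]) -> list[int]:
    """Double the sample rate (8 kHz -> 16 kHz) by inserting midpoints."""
    out: list[int] = []
    for i, s in enumerate(samples):
        out.append(s)
        nxt = samples[i + 1] if i + 1 < len(samples) else s  # hold the last sample
        out.append((s + nxt) // 2)
    return out


def mulaw_8k_to_pcm16_16k(payload: bytes) -> bytes:
    """Convert an 8 kHz mu-law payload to 16 kHz signed 16-bit PCM (native byte order)."""
    pcm_8k = [mulaw_to_linear(b) for b in payload]
    return array("h", upsample_2x(pcm_8k)).tobytes()
```

A 20-millisecond frame at 8 kHz is 160 mu-law bytes. At 16 kHz it becomes 320 samples, which is 640 bytes of 16-bit PCM.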
Platform integration
4 platform services, zero reinvention.
Instead of building authentication, authorization, communications, and commerce from scratch, Sotto integrates the shared KaritKarma platform layer, so domain code focuses on voice and ordering.
Wenme
Authentication
OAuth 2.1 plus PKCE with passkeys (WebAuthn / FIDO2). Restaurant owners and staff sign in without passwords.
Darwan
Authorization
Owner, manager, and staff roles scoped per restaurant. Allow / deny decisions with signed audit trails per access attempt.
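This page does not document Darwan's API. The following hypothetical model shows one way to scope owner, manager, and staff roles per restaurant and to emit a signed audit entry for every allow or deny decision. The role ranking, the action names, and HMAC-SHA256 as the signing scheme are all assumptions made for illustration.

```python
# Hypothetical model of per-restaurant role scoping with a signed audit entry.
# Role names come from the page; the rank ordering, the action names, and
# HMAC-SHA256 signing are assumptions, not Darwan's actual API.
import hashlib
import hmac
import json
from dataclasses import dataclass

ROLE_RANK = {"staff": 1, "manager": 2, "owner": 3}
# Minimum role required per action (hypothetical actions).
REQUIRED_ROLE = {"view_orders": "staff", "edit_menu": "manager", "manage_billing": "owner"}


@dataclass(frozen=True)
class Grant:
    user_id: str
    restaurant_id: str
    role: str


def _sign(body: dict, key: bytes) -> str:
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def decide(grants: list[Grant], user_id: str, restaurant_id: str, action: str,
           signing_key: bytes) -> tuple[bool, dict]:
    """Return (allowed, signed audit entry). A grant only counts for its own restaurant."""
    role = next((g.role for g in grants
                 if g.user_id == user_id and g.restaurant_id == restaurant_id), None)
    required = REQUIRED_ROLE.get(action)
    allowed = (role is not None and required is not None
               and ROLE_RANK[role] >= ROLE_RANK[required])
    entry = {"user": user_id, "restaurant": restaurant_id, "action": action,
             "decision": "allow" if allowed else "deny"}
    entry["signature"] = _sign(entry, signing_key)
    return allowed, entry


def verify(entry: dict, signing_key: bytes) -> bool:
    """Check that an audit entry has not been altered since it was signed."""
    body = {k: v for k, v in entry.items() if k != "signature"}
    return hmac.compare_digest(_sign(body, signing_key), entry["signature"])
```

Unknown actions and grants for other restaurants both fall through to deny, so access is closed by default.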
BitsPath
Voice and comms
Carrier-grade PBX for inbound call routing, plus SMS, email, and WhatsApp for order confirmations and real-time alerts.
Loom
Menu and commerce
Menu items, variants, modifiers, and pricing. Real-time sync with Square UK, Toast, and Clover POS systems.
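Orders reach Square UK, Toast, and Clover through POS connectors that sit behind a common IEPosConnector interface. The real members of that C# interface are not listed here. The Python analogue below is a hypothetical sketch: the method name, the OrderLine fields, and the in-memory test double are assumptions. It shows the design idea that domain code depends only on the interface and never on a specific POS.

```python
# Hypothetical Python analogue of a common POS connector interface. Sotto's
# IEPosConnector is C#; the method and field names below are assumptions.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    item_id: str
    quantity: int
    unit_price_pence: int  # integer pence, matching the page's money rule


class PosConnector(ABC):
    """One implementation per POS (Square UK, Toast, Clover)."""

    @abstractmethod
    def submit_order(self, restaurant_id: str, lines: list[OrderLine]) -> str:
        """Write the order to the POS and return the POS-side order id."""


class InMemoryPosConnector(PosConnector):
    """Test double standing in for a real Square, Toast, or Clover connector."""

    def __init__(self) -> None:
        self.orders: dict[str, list[OrderLine]] = {}

    def submit_order(self, restaurant_id: str, lines: list[OrderLine]) -> str:
        order_id = f"{restaurant_id}-{len(self.orders) + 1}"
        self.orders[order_id] = list(lines)
        return order_id


def place_order(connector: PosConnector, restaurant_id: str, lines: list[OrderLine]) -> str:
    """Domain code sees only the interface, so swapping POS vendors needs no changes here."""
    if not lines:
        raise ValueError("an order needs at least one line")
    return connector.submit_order(restaurant_id, lines)
```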
UK regulatory shape
Big 14, integer pence, GDPR on a schedule.
UK rules shape the conversation machine, the money math, and the retention policy. None of this is bolted on; each one is a first-class concern in the codebase.
UK Big 14 allergens, enforced
AllergenCheck is a mandatory state in the conversation machine, not a checkbox. The AI enumerates allergens per item and asks about caller allergies before any order can be confirmed.
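Here is a minimal sketch of what "mandatory state" means structurally. It uses a much smaller state set than Sotto's eight-state machine. Only the AllergenCheck name and the Big 14 list come from this page; the other state names, the transition table, and the class shape are hypothetical. The sketch makes two guarantees:

- Confirmation is reachable only through AllergenCheck.
- That transition is refused until every ordered item's allergens have been read out and the caller has been asked about their own allergies.

```python
# Hypothetical sketch of AllergenCheck as a mandatory conversation state.
# Only AllergenCheck and the UK Big 14 come from the page; the other state
# names, the transition table, and the data shapes are assumptions.
from enum import Enum, auto

BIG_14 = frozenset({
    "celery", "cereals containing gluten", "crustaceans", "eggs", "fish",
    "lupin", "milk", "molluscs", "mustard", "peanuts", "sesame", "soya",
    "sulphur dioxide and sulphites", "tree nuts",
})


class State(Enum):
    TAKING_ORDER = auto()
    ALLERGEN_CHECK = auto()
    CONFIRMED = auto()


# CONFIRMED is reachable only from ALLERGEN_CHECK, so no path skips the check.
TRANSITIONS = {
    State.TAKING_ORDER: {State.ALLERGEN_CHECK},
    State.ALLERGEN_CHECK: {State.TAKING_ORDER, State.CONFIRMED},
    State.CONFIRMED: set(),
}


class Conversation:
    def __init__(self, item_allergens: dict[str, set[str]]) -> None:
        for item, allergens in item_allergens.items():
            if not allergens <= BIG_14:
                raise ValueError(f"{item}: not Big 14 allergens: {allergens - BIG_14}")
        self.item_allergens = item_allergens  # menu item -> its Big 14 allergens
        self.state = State.TAKING_ORDER
        self.items: list[str] = []
        self.announced: set[str] = set()      # items whose allergens were read out
        self.caller_allergies: set[str] | None = None  # None means "not asked yet"

    def add_item(self, item: str) -> None:
        if self.state is not State.TAKING_ORDER:
            raise RuntimeError("items can only be added while taking the order")
        self.items.append(item)

    def announce_allergens(self, item: str) -> set[str]:
        """Read out one ordered item's allergens (allowed only in AllergenCheck)."""
        if self.state is not State.ALLERGEN_CHECK:
            raise RuntimeError("allergens are enumerated in AllergenCheck")
        self.announced.add(item)
        return self.item_allergens[item]

    def record_caller_allergies(self, allergies: set[str]) -> None:
        if self.state is not State.ALLERGEN_CHECK:
            raise RuntimeError("caller allergies are asked in AllergenCheck")
        self.caller_allergies = set(allergies)

    def move(self, target: State) -> None:
        if target not in TRANSITIONS[self.state]:
            raise RuntimeError(f"illegal transition {self.state.name} -> {target.name}")
        if target is State.CONFIRMED and (
            set(self.items) - self.announced or self.caller_allergies is None
        ):
            raise RuntimeError("cannot confirm: AllergenCheck is incomplete")
        self.state = target
```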
VAT in integer pence
VAT rates (Standard 20 percent, Reduced 5 percent, Zero 0 percent) are stored as basis points, and money values as integer pence, so VAT never drifts by a rounding penny.
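Storing the rate as an integer number of basis points (1 bp = 0.01 percent, so 20 percent is 2000) keeps every step in integer arithmetic. The only rounding is one explicit division per line. This is a minimal sketch under three assumptions the page does not state: menu prices are VAT-exclusive, VAT is computed per order line, and half-pennies round up. The names are illustrative.

```python
# Minimal sketch of integer-pence VAT with rates in basis points.
# Assumptions: prices are VAT-exclusive, VAT is computed per line, and
# half-pennies round up; the page does not state Sotto's rounding rule.
from enum import IntEnum


class VatRate(IntEnum):
    STANDARD = 2000  # 20 percent in basis points
    REDUCED = 500    # 5 percent
    ZERO = 0         # 0 percent


def vat_pence(net_pence: int, rate: VatRate) -> int:
    """VAT on a non-negative net amount, rounded half up to a whole penny."""
    if net_pence < 0:
        raise ValueError("net amount must be non-negative")
    return (net_pence * rate + 5_000) // 10_000


def order_totals_pence(lines: list[tuple[int, int, VatRate]]) -> tuple[int, int]:
    """lines are (unit_net_pence, quantity, rate); returns (net, vat) in pence."""
    net = vat = 0
    for unit, qty, rate in lines:
        line_net = unit * qty
        net += line_net
        vat += vat_pence(line_net, rate)
    return net, vat
```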
GDPR on a daily schedule
Per-tenant retention (default 365 days). Daily 02:00 UTC purge. A right-to-erasure request anonymises calls, customers, and transcripts in one transaction. HMRC 7-year financial retention is preserved.
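The daily purge has to reconcile two clocks: the tenant's GDPR retention window and the HMRC floor for financial records. This is a hypothetical sketch of that decision plus the next 02:00 UTC run time. The 365-day default, the schedule, and the 7-year floor come from the page. The "financial" record kind and all function names are assumptions, and the erasure transaction itself is not shown.

```python
# Hypothetical sketch of the daily retention purge decision. The 365-day
# default, the 02:00 UTC schedule, and the 7-year HMRC floor come from the
# page; the record kinds and function names are assumptions.
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 365
HMRC_RETENTION_YEARS = 7


def years_before(moment: datetime, years: int) -> datetime:
    """Same calendar date `years` earlier; 29 February falls back to the 28th."""
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:  # 29 February with no counterpart in the target year
        return moment.replace(year=moment.year - years, day=28)


def is_purgeable(created_at: datetime, kind: str, now: datetime,
                 retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """Personal data expires after the tenant's window; records of kind
    'financial' are kept for at least the HMRC period."""
    cutoff = now - timedelta(days=retention_days)
    if kind == "financial":
        cutoff = min(cutoff, years_before(now, HMRC_RETENTION_YEARS))
    return created_at < cutoff


def next_purge_run(now: datetime) -> datetime:
    """Next daily 02:00 UTC run strictly after `now` (`now` must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    run = now_utc.replace(hour=2, minute=0, second=0, microsecond=0)
    return run if run > now_utc else run + timedelta(days=1)
```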
Sotto vs the alternatives
Phone call. POS write. Done.
Versus a human server with a notepad, a touch-tone IVR, or a chatbot on the website, here is what the architecture does differently.
| Capability | Sotto | Human staff | Touch-tone IVR | Web chatbot |
|---|---|---|---|---|
| Picks up under 500ms | Variable | Web only | ||
| Available 24/7 | ||||
| Natural phone conversation | Text | |||
| Big 14 allergen enforcement | Mandatory state | Hopefully | ||
| Writes to your POS | Square / Toast / Clover | Manual | Limited | |
| Reads full menu accurately | pgvector RAG | Memory | Tone tree | |
| GDPR retention per tenant | Varies |
What ships today
Production-ready, in active UK pilots.
Zero missed calls during pilots
The AI answers every call, 24/7, so pilot restaurants no longer lose orders to unanswered calls during peak hours.
English and Bengali handled mid-call
Llama 4 Scout follows a caller who switches between English and Bengali without restarting the conversation, and Whisper Large v3 Turbo transcribes both languages.
Front-of-house freed for service
Staff focus on hospitality and table service instead of phones during peak hours.
Sub-500 millisecond voice latency
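The stage-by-stage budget holds under load: each call runs on its own gRPC stream, and the streaming token bridge starts Aura 2 audio before the full Llama 4 Scout response is generated.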
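Sotto's streaming token bridge batches LLM tokens at sentence boundaries and ships them to Aura 2 before the full response exists. Here is a minimal sketch of that batching, under one assumed boundary rule: flush when terminal punctuation is followed by a token that starts with whitespace. The rule keeps a price such as "£11.50" from being split, at the cost of waiting one extra token. The names and the `send_to_tts` callback are hypothetical stand-ins for a real Aura 2 streaming client.

```python
# Minimal sketch of a sentence-boundary token bridge. The boundary rule and
# all names are assumptions; send_to_tts stands in for a real TTS client.
from collections.abc import Callable, Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence once the token after its closing punctuation arrives."""
    buffer = ""
    for token in tokens:
        # Flush only when terminal punctuation is followed by whitespace, so
        # "11" "." "50" (as in a price) is not split into two sentences.
        if buffer.rstrip().endswith(SENTENCE_END) and token[:1].isspace():
            yield buffer.strip()
            buffer = ""
        buffer += token
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the LLM stream ends


def bridge(tokens: Iterable[str], send_to_tts: Callable[[str], None]) -> int:
    """Forward sentences to TTS as they complete; returns how many were sent."""
    sent = 0
    for sentence in sentence_chunks(tokens):
        send_to_tts(sentence)  # hypothetical stand-in for an Aura 2 streaming call
        sent += 1
    return sent
```

Because `sentence_chunks` is a generator, the first sentence reaches TTS while later tokens are still arriving. That overlap is what keeps full-response generation time out of the latency budget.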
GDPR by daily schedule
Per-tenant retention with automated 02:00 UTC purge and a right-to-erasure endpoint.
4 KaritKarma platform services
Wenme, Darwan, BitsPath, and Loom, plus Stripe UK, Square / Toast / Clover, Uber Direct, and Stuart.
Frequently asked
Sotto, asked plainly.
- What is the Sotto case study?
- The Sotto case study documents how a UK-focused voice AI takes restaurant phone orders end-to-end. Sotto is built as a .NET 10 monorepo of 29 projects with Clean Architecture and 609 tests. The voice path runs on Groq Llama 4 Scout 17B for reasoning, Whisper Large v3 Turbo for transcription, and Deepgram Aura 2 for speech. End-to-end PSTN-to-PSTN latency stays under 500 milliseconds. The case study covers the latency budget, the eight-state conversation machine (with mandatory AllergenCheck for the UK Big 14), POS integrations (Square UK, Toast, Clover), payments (Stripe UK in integer pence), and per-tenant GDPR retention.
- Is Sotto live in production?
- Sotto is built and tested as a production-ready platform with 609 tests covering the voice pipeline, order flow, POS connectors, and payment links. UK restaurant pilots are active. We do not present pilots as general availability, so Sotto is labelled production-ready with active pilots rather than presented as a multi-customer SaaS. Named-customer references are added only with written permission.
- How does Sotto hit sub-500 millisecond voice latency?
- The latency budget is broken down stage by stage on the product page. Each call runs on its own gRPC stream against a shared ConversationOrchestrator pool, so concurrent calls do not queue. A streaming token bridge batches LLM tokens at sentence boundaries and ships them to Deepgram Aura 2 TTS before the full Llama 4 Scout response is generated. The caller therefore hears the first sentence while the rest is still being produced. That is why the budget counts only time to first token (50 milliseconds) and time to first audio (130 milliseconds) rather than full-response generation time. Whisper Large v3 Turbo on Groq sits near 150 milliseconds for transcription, leaving headroom for the network legs.
- What KaritKarma platform services does Sotto integrate?
- Sotto integrates the shared KaritKarma platform layer rather than rebuilding it. Wenme provides OAuth 2.1 plus PKCE authentication with passkeys for restaurant owners and staff. Darwan provides role-based access with owner, manager, and staff scopes per restaurant. BitsPath provides voice call routing plus SMS, email, and WhatsApp notifications. Loom provides menu and commerce data with item variants, modifiers, and pricing sync. POS connectors for Square UK, Toast, and Clover sit behind a common IEPosConnector interface. Payments run on Stripe UK with checkout sessions, payment links, SMS payment links, Apple Pay, and Google Pay.
- Is Sotto compliant with UK food and data regulations?
- Yes. AllergenCheck is a mandatory state in the conversation machine, so the AI enumerates the Big 14 allergens for every ordered item and asks about caller allergies before any order can be confirmed. VAT is calculated per item in basis points (Standard 20 percent, Zero 0 percent, Reduced 5 percent) and stored as integer pence, so totals never drift by a rounding penny. HMRC 7-year financial retention is preserved. GDPR retention is per-tenant configurable (default 365 days) with a daily 02:00 UTC purge and a right-to-erasure endpoint that anonymises calls, customers, and transcripts in one transaction.
- Where does Sotto run and what is the deployment model?
- Sotto is deployed in a UK region for data-residency reasons. Telephony runs on Twilio PSTN with SIP trunking and webhook routing. The voice plane is internal microservices on .NET 10. The application database is PostgreSQL 18 with pgvector for menu embeddings (bge-large-en-v1.5). Redis 8 handles ephemeral state, and Cloudflare fronts the merchant dashboard. The voice pipeline uses a dedicated-tenant deployment rather than a shared multi-tenant one, so latency and cost stay predictable per restaurant chain.
Explore Sotto
Voice AI that never misses a call.
See how Sotto answers, takes the order, enforces allergens, and writes to Square, Toast, or Clover before the caller hangs up.