Sotto
voice AI that answers UK restaurant phones.

.NET 10. Groq Llama 4 Scout. Whisper Large v3 Turbo. Deepgram Aura 2. Under 500 milliseconds end to end on the PSTN, with the UK Big 14 allergens enforced as a mandatory conversation state.

29 .NET 10 projects
609 tests
<500 ms voice latency
UK regulatory focus
Production-ready with active UK restaurant pilots. Named-customer references published only with written permission.
What is the Sotto case study

A UK restaurant voice AI, documented stage by stage.

The Sotto case study shows how a phone call placed to a UK restaurant becomes a confirmed POS order without a human taking the call. The architecture is a 29-project .NET 10 monorepo with Clean Architecture, 609 tests, and a sub-500-millisecond end-to-end voice budget.

The vertical is UK restaurants specifically. That choice determines the regulatory shape (Big 14 allergens enforced as a conversation state, VAT as integer pence, GDPR per-tenant retention, HMRC 7-year financial retention) and the integrations (Square UK, Toast, Clover, Stripe UK, Uber Direct, Stuart).

The Challenge

Restaurants miss calls and lose orders.

Phone orders still drive a meaningful share of restaurant revenue. During peak hours those calls go unanswered. Staff are stretched thin, language barriers frustrate customers, and every missed call is a lost order.

Missed calls during peak hours

Staff cannot answer every call when the kitchen is slammed. Callers hang up and order elsewhere.

Language friction in mixed-script areas

UK high streets see Bengali, Urdu, Polish, and English in the same shift. Human staff cannot cover every language.

Front-of-house labour scarcity

Hiring and retaining phone-capable staff is harder than ever. Wages rise while margins shrink.

Latency budget

Sub-500 milliseconds, stage by stage.

Voice AI has zero tolerance for delay. A one-second pause feels like an eternity on a phone call. Sotto's budget is broken down per stage rather than quoted as a round-number marketing figure. The sum across all ten stages is 470 milliseconds.

Stage | Kind | ms
Twilio PSTN ingress | Carrier | 50
WebSocket ingress (VoiceGateway) | Internal | 5
VAD plus buffer drain | Internal | 5
Mu-law to PCM (8 to 16 kHz) | Internal | 5
Groq Whisper Large v3 Turbo | Model | 150
gRPC to orchestrator | Internal | 5
pgvector RAG search | Internal | 15
Groq Llama 4 Scout 17B TTFT | Model | 50
Deepgram Aura 2 TTFA | Model | 130
WebSocket egress and PSTN | Carrier | 55
Sum | | 470
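
The arithmetic behind the budget can be checked with a short sketch. The stage names and values are copied from the table above; the 500 ms ceiling constant and the helper names are illustrative assumptions, not taken from Sotto's codebase.

```python
# Per-stage latency budget in milliseconds, copied from the table above.
STAGES_MS = [
    ("Twilio PSTN ingress", "Carrier", 50),
    ("WebSocket ingress (VoiceGateway)", "Internal", 5),
    ("VAD plus buffer drain", "Internal", 5),
    ("Mu-law to PCM (8 to 16 kHz)", "Internal", 5),
    ("Groq Whisper Large v3 Turbo", "Model", 150),
    ("gRPC to orchestrator", "Internal", 5),
    ("pgvector RAG search", "Internal", 15),
    ("Groq Llama 4 Scout 17B TTFT", "Model", 50),
    ("Deepgram Aura 2 TTFA", "Model", 130),
    ("WebSocket egress and PSTN", "Carrier", 55),
]

BUDGET_MS = 500  # assumed ceiling, taken from the "under 500 ms" target


def total_ms(stages):
    """Sum the budget across every stage."""
    return sum(ms for _, _, ms in stages)


def by_kind(stages):
    """Group the budget by stage kind (Carrier, Internal, Model)."""
    totals = {}
    for _, kind, ms in stages:
        totals[kind] = totals.get(kind, 0) + ms
    return totals


if __name__ == "__main__":
    total = total_ms(STAGES_MS)
    print(f"total: {total} ms, headroom: {BUDGET_MS - total} ms")
    print(by_kind(STAGES_MS))
```

Grouped by kind, the three model calls account for 330 ms, the carrier legs for 105 ms, and the internal hops for 35 ms, which leaves 30 ms of headroom under the 500 ms ceiling.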

Platform integration

4 platform services, zero reinvention.

Instead of building auth, authorization, comms, and commerce from scratch, Sotto integrates the shared KaritKarma platform layer. Domain code focuses on voice and ordering.

Wenme

Authentication

OAuth 2.1 plus PKCE with passkeys (WebAuthn / FIDO2). Restaurant owners and staff sign in without passwords.

Darwan

Authorization

Owner, manager, and staff roles scoped per restaurant. Allow / deny decisions with signed audit trails per access attempt.

BitsPath

Voice and comms

Carrier-grade PBX for inbound call routing, plus SMS, email, and WhatsApp for order confirmations and real-time alerts.

Loom

Menu and commerce

Menu items, variants, modifiers, and pricing. Real-time sync with Square UK, Toast, and Clover POS systems.

UK regulatory shape

Big 14, integer pence, GDPR on a schedule.

UK rules shape the conversation machine, the money math, and the retention policy. None of this is bolted on; each one is a first-class concern in the codebase.

UK Big 14 allergens, enforced

AllergenCheck is a mandatory state in the conversation machine, not a checkbox. The AI enumerates allergens per item and asks about caller allergies before any order can confirm.
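
One way to picture a mandatory state is a transition table in which order confirmation is reachable only through the allergen check. The sketch below is a simplified illustration: apart from the allergen check itself, the state names, the `Conversation` class, and its methods are hypothetical and do not reproduce Sotto's actual conversation machine.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, reduced set of states for illustration only.
    GREETING = auto()
    TAKING_ORDER = auto()
    ALLERGEN_CHECK = auto()
    CONFIRM_ORDER = auto()
    DONE = auto()


# Allowed transitions. CONFIRM_ORDER is reachable only from ALLERGEN_CHECK.
TRANSITIONS = {
    State.GREETING: {State.TAKING_ORDER},
    State.TAKING_ORDER: {State.TAKING_ORDER, State.ALLERGEN_CHECK},
    State.ALLERGEN_CHECK: {State.TAKING_ORDER, State.CONFIRM_ORDER},
    State.CONFIRM_ORDER: {State.DONE},
    State.DONE: set(),
}


class Conversation:
    def __init__(self):
        self.state = State.GREETING
        self.allergens_acknowledged = False

    def record_allergen_answer(self):
        """Mark that the caller has answered the allergy question."""
        if self.state is not State.ALLERGEN_CHECK:
            raise ValueError("not in the allergen check state")
        self.allergens_acknowledged = True

    def advance(self, target):
        """Move to target, refusing any path that skips the allergen check."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {target.name}"
            )
        if target is State.CONFIRM_ORDER and not self.allergens_acknowledged:
            raise ValueError("allergen check not completed")
        if target is State.TAKING_ORDER:
            # Changing the basket invalidates an earlier allergen answer.
            self.allergens_acknowledged = False
        self.state = target
```

Because the rule lives in the transition table rather than in a prompt, a caller who adds an item after the check is routed back through it before the order can confirm.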

VAT in integer pence

Standard 20 percent, Zero 0 percent, Reduced 5 percent. Rates are stored as basis points and money values as integer pence, so VAT never drifts by a rounding penny.
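
A minimal sketch of integer-only VAT arithmetic follows. The three rates come from the text; the function names, the assumption that prices are held net of VAT, and the round-half-up rule are illustrative choices rather than Sotto's documented behaviour.

```python
# VAT rates in basis points (1 basis point = 0.01 percent).
VAT_RATE_BP = {"standard": 2000, "reduced": 500, "zero": 0}


def vat_pence(net_pence: int, rate_bp: int) -> int:
    """VAT on a net amount, rounded half up, using integers only.

    Assumes net_pence is non-negative and excludes VAT (an assumption).
    """
    return (net_pence * rate_bp + 5_000) // 10_000


def gross_pence(net_pence: int, rate_bp: int) -> int:
    """Net amount plus VAT, still in integer pence."""
    return net_pence + vat_pence(net_pence, rate_bp)


if __name__ == "__main__":
    for name, rate in VAT_RATE_BP.items():
        print(name, vat_pence(999, rate), gross_pence(999, rate))
```

No floating-point value appears anywhere in the calculation, so the same inputs always produce the same pence total regardless of platform.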

GDPR on a daily schedule

Per-tenant retention (default 365 days). Daily 02:00 UTC purge. Right-to-erasure anonymises calls, customers, and transcripts in one transaction. HMRC 7-year financial retention is preserved.
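
The purge rule can be sketched as a single predicate that a daily job would apply to each record. The 365-day default and the 7-year financial period come from the text; the function names, the `is_financial` flag, and the choice to count the 7 years from the record's creation date are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 365   # per-tenant default from the text
FINANCIAL_RETENTION_YEARS = 7  # HMRC financial retention from the text


def add_years(moment: datetime, years: int) -> datetime:
    """Shift a datetime by whole years, mapping 29 Feb to 28 Feb if needed."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return moment.replace(year=moment.year + years, day=28)


def is_purgeable(created_at: datetime, is_financial: bool, now: datetime,
                 retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """True when a record has outlived its retention period.

    Financial records ignore the tenant setting and are kept for the
    full financial retention period (assumed to start at creation).
    """
    if is_financial:
        return now >= add_years(created_at, FINANCIAL_RETENTION_YEARS)
    return now >= created_at + timedelta(days=retention_days)


if __name__ == "__main__":
    run_at = datetime(2026, 1, 1, 2, 0, tzinfo=timezone.utc)  # 02:00 UTC run
    transcript = datetime(2024, 12, 1, tzinfo=timezone.utc)
    print(is_purgeable(transcript, False, run_at))  # transcript, default policy
    print(is_purgeable(transcript, True, run_at))   # financial record, same age
```

Keeping the financial exemption inside the same predicate means a tenant with a short retention window cannot accidentally purge records that must be held for the full seven years.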

Sotto vs the alternatives

Phone call. POS write. Done.

Versus a human server with a notepad, a touch-tone IVR, or a chatbot on the website, here is what the architecture does differently.

Capability | Sotto | Human staff | Touch-tone IVR | Web chatbot
Picks up under 500ms: Variable; Web only
Available 24/7
Natural phone conversation: Text
Big 14 allergen enforcement: Mandatory state; Hopefully
Writes to your POS: Square / Toast / Clover; Manual; Limited
Reads full menu accurately: pgvector RAG; Memory; Tone tree
GDPR retention per tenant: Varies

What ships today

Production-ready, in active UK pilots.

Zero missed calls during pilots

The AI answers every call, 24/7. No more lost orders during peak hours in pilot restaurants.

EN and BN handled mid-call

Llama 4 Scout handles mixed-language conversation without a restart. Whisper Large v3 Turbo transcribes both.

Front-of-house freed for service

Staff focus on hospitality and table service instead of phones during peak.

Sub-500 millisecond voice latency

Stage-by-stage budget held under load thanks to the streaming token bridge to Aura 2.

GDPR by daily schedule

Per-tenant retention with automated 02:00 UTC purge and a right-to-erasure endpoint.

4 KaritKarma platform services

Wenme, Darwan, BitsPath, Loom. Plus Stripe UK, Square / Toast / Clover, Uber Direct, Stuart.

Frequently asked

Sotto, asked plainly.

What is the Sotto case study?
The Sotto case study documents how a UK-focused voice AI takes restaurant phone orders end-to-end. Sotto is built as a .NET 10 monorepo of 29 projects with Clean Architecture and 609 tests. The voice path runs on Groq Llama 4 Scout 17B for reasoning, Whisper Large v3 Turbo for transcription, and Deepgram Aura 2 for speech. End-to-end PSTN-to-PSTN latency stays under 500 milliseconds. The case study covers the latency budget, the eight-state conversation machine (with mandatory AllergenCheck for the UK Big 14), POS integrations (Square UK, Toast, Clover), payments (Stripe UK in integer pence), and per-tenant GDPR retention.
Is Sotto live in production?
Sotto is built and tested as a production-ready platform with 609 tests covering the voice pipeline, order flow, POS connectors, and payment links. UK restaurant pilots are active. We do not promote pilots to general-availability claims, so we label Sotto as production-ready with active pilots rather than as a multi-customer SaaS roster. Named-customer references are added only with written permission.
How does Sotto hit sub-500 millisecond voice latency?
The latency budget is broken down stage by stage on the product page. Each call runs on its own gRPC stream against a shared ConversationOrchestrator pool, so concurrent calls do not queue. A streaming token bridge batches LLM tokens at sentence boundaries and ships them to Deepgram Aura 2 TTS before the full Llama 4 Scout response is generated, which hides time-to-first-token behind time-to-first-audio. Whisper Large v3 Turbo on Groq sits near 150 milliseconds for transcription, leaving headroom for the network legs.
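
The sentence-boundary batching described above can be sketched as a small generator. This is a simplified illustration under stated assumptions: the function name is hypothetical, tokens are assumed to arrive as plain strings, and the boundary test is a naive punctuation check that would also split on a decimal point such as "4." followed by "50".

```python
import re
from typing import Iterable, Iterator

# A token ends a sentence if it finishes with ., ! or ? (naive rule).
_SENTENCE_END = re.compile(r"[.!?]\s*$")


def batch_at_sentence_boundaries(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they close.

    Each yielded chunk could be handed to a TTS engine while the
    language model is still generating the rest of the reply.
    """
    buffer = []
    for token in tokens:
        buffer.append(token)
        if _SENTENCE_END.search(token):
            yield "".join(buffer).strip()
            buffer.clear()
    # Flush whatever is left when the token stream ends mid-sentence.
    tail = "".join(buffer).strip()
    if tail:
        yield tail


if __name__ == "__main__":
    stream = ["Sure", ".", " One", " chicken", " tikka", "?", " Any", " allergies"]
    for sentence in batch_at_sentence_boundaries(stream):
        print(sentence)
```

Because the generator yields as soon as a sentence closes, speech for the first sentence can start while later tokens are still arriving.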
What KaritKarma platform services does Sotto integrate?
Sotto integrates the shared KaritKarma platform layer rather than rebuilding it. Wenme provides OAuth 2.1 plus PKCE authentication with passkeys for restaurant owners and staff. Darwan provides role-based access with owner, manager, and staff scopes per restaurant. BitsPath provides voice, SMS, and email notifications. Loom provides menu and commerce data with item variants, modifiers, and pricing sync. POS connectors for Square UK, Toast, and Clover sit behind a common IEPosConnector interface. Payments run on Stripe UK with checkout sessions, payment links, SMS payment links, Apple Pay, and Google Pay.
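
The idea of putting several POS systems behind one interface can be sketched as follows. Everything here is hypothetical: the `PosConnector` protocol, the `OrderLine` type, and the in-memory stand-in are illustrative only and do not reproduce the interface named above or any vendor API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OrderLine:
    sku: str
    quantity: int
    unit_price_pence: int  # integer pence, matching the money convention


class PosConnector(Protocol):
    """Hypothetical common interface that each POS adapter would satisfy."""

    def submit_order(self, lines: list[OrderLine]) -> str:
        """Write the order to the POS and return its order reference."""
        ...


class InMemoryPos:
    """Stand-in adapter used here instead of a real POS integration."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.orders: list[list[OrderLine]] = []

    def submit_order(self, lines: list[OrderLine]) -> str:
        self.orders.append(list(lines))
        return f"{self.prefix}-{len(self.orders)}"


def place_order(connector: PosConnector, lines: list[OrderLine]) -> str:
    """Ordering logic depends only on the interface, not on the vendor."""
    if not lines:
        raise ValueError("empty order")
    return connector.submit_order(lines)


if __name__ == "__main__":
    pos = InMemoryPos("demo")
    print(place_order(pos, [OrderLine("garlic-naan", 2, 250)]))
```

With this shape, supporting another POS means writing one more adapter; the ordering code that calls `place_order` stays unchanged.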
Is Sotto compliant with UK food and data regulations?
Yes. AllergenCheck is a mandatory state in the conversation machine, so the AI enumerates the Big 14 allergens for every ordered item and asks about caller allergies before any order can be confirmed. VAT is calculated per item in basis points (Standard 20 percent, Zero 0 percent, Reduced 5 percent) and stored as integer pence, so totals never drift by a rounding penny. HMRC 7-year financial retention is preserved. GDPR retention is per-tenant configurable (default 365 days) with a daily 02:00 UTC purge and a right-to-erasure endpoint that anonymises calls, customers, and transcripts in one transaction.
Where does Sotto run and what is the deployment model?
Sotto is deployed in a UK region for data-residency reasons. Telephony runs on Twilio PSTN with SIP trunking and webhook routing. The voice plane is internal microservices on .NET 10. The application database is PostgreSQL 18 with pgvector for menu embeddings (bge-large-en-v1.5). Redis 8 handles ephemeral state and Cloudflare fronts the merchant dashboard. The deployment model is dedicated tenant rather than shared multi-tenant for the voice pipeline, so latency and cost stay predictable per restaurant chain.

Explore Sotto

Voice AI that never misses a call.

See how Sotto answers, takes the order, enforces allergens, and writes to Square, Toast, or Clover before the caller hangs up.