Sotto is a voice and SMS AI that answers your restaurant's phone in under 500 milliseconds. It takes the order, enforces the UK Big 14 allergens, and writes the ticket to your POS. No missed calls, no hand-typed orders, no walked-away customers at peak.
Caller: Hi, can I get two Margheritas and a Diet Coke, for collection?
Sotto: Two Margheritas, one Diet Coke, for collection. Quick allergen check: does anyone in the order need to avoid gluten, milk, or nuts?
What is Sotto
Sotto answers your existing UK restaurant number, holds a natural conversation with the caller in English or Bengali, captures the order with allergens and modifiers, takes payment by SMS link or pay-on-arrival, and writes the ticket into Square, Toast, or Clover without anyone touching a keyboard.
The voice pipeline is built end-to-end in .NET 10 microservices around a Twilio Media Stream WebSocket. Groq runs Whisper Large v3 Turbo for transcription and Llama 4 Scout for reasoning. Deepgram Aura 2 produces the spoken reply. A streaming token bridge ships sentence-boundary chunks to the TTS before the LLM has finished its full response, which is how the round-trip stays under 500 milliseconds.
Sotto ships as a multi-tenant SaaS at sotto.karitkarma.com with per-restaurant subdomains, Traefik routing, and Let's Encrypt certs per merchant.
Voice pipeline
A caller's patience for dead air is measured in fractions of a second. Sotto's latency budget is published, broken down by stage, and held to under 500ms end-to-end. The streaming token bridge between the LLM and the TTS is the trick that keeps it there.
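The sentence-boundary batching can be illustrated with a short sketch. This is Python for illustration only, not Sotto's .NET code; the function name `sentence_chunks` and the boundary rule (`.`, `!` or `?` followed by whitespace) are assumptions made for the example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: a chunk ends at ., ! or ? followed by whitespace.
# Requiring the whitespace keeps prices such as "4.50" in one piece.
_SENTENCE_END = re.compile(r"[.!?](?=\s)")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break  # no finished sentence yet, wait for more tokens
            end = match.end()
            yield buffer[:end].strip()  # hand this sentence to TTS now
            buffer = buffer[end:]
    tail = buffer.strip()
    if tail:
        yield tail  # flush whatever is left when the LLM stream ends


if __name__ == "__main__":
    llm_tokens = ["Two Marg", "heritas, one Diet Coke. ", "Any", " allergies", "? Thanks"]
    for chunk in sentence_chunks(llm_tokens):
        print(chunk)
```

Because each sentence is yielded as soon as its boundary arrives, the speech stage can start on the first sentence while later tokens are still being generated.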
Network legs (Twilio PSTN in and out) account for around a quarter of the budget. The model legs (STT, LLM time-to-first-token, TTS time-to-first-audio) account for the bulk. Internal hops are negligible.
| Stage | Detail | ms |
|---|---|---|
| Twilio PSTN | Carrier-side | 50 |
| WebSocket ingress | VoiceGateway | 5 |
| VAD + buffer drain | 50 RMS / 200ms guard | 5 |
| μ-law to PCM WAV | 8 to 16 kHz | 5 |
| Groq Whisper STT | whisper-large-v3-turbo | 150 |
| gRPC to orchestrator | Duplex stream | 5 |
| pgvector RAG search | bge-large-en-v1.5 | 15 |
| Groq Llama 4 Scout TTFT | 17B Scout, streaming | 50 |
| Deepgram Aura 2 TTFA | aura-2-luna-en | 130 |
| WebSocket egress + PSTN | Back to caller | 55 |
| End-to-end time-to-first-audio | | 470 |
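The budget arithmetic can be checked in a few lines. The figures are copied from the table; the grouping into network, model, and internal stages follows the paragraph before it, and the variable names are illustrative.

```python
# Stage budgets in milliseconds, copied from the latency table.
BUDGET_MS = {
    "Twilio PSTN": 50,
    "WebSocket ingress": 5,
    "VAD + buffer drain": 5,
    "mu-law to PCM WAV": 5,
    "Groq Whisper STT": 150,
    "gRPC to orchestrator": 5,
    "pgvector RAG search": 15,
    "Groq Llama 4 Scout TTFT": 50,
    "Deepgram Aura 2 TTFA": 130,
    "WebSocket egress + PSTN": 55,
}

# Grouping assumed from the prose: PSTN legs are "network",
# STT / LLM / TTS are "model", everything else is "internal".
NETWORK = {"Twilio PSTN", "WebSocket egress + PSTN"}
MODEL = {"Groq Whisper STT", "Groq Llama 4 Scout TTFT", "Deepgram Aura 2 TTFA"}

total_ms = sum(BUDGET_MS.values())
network_ms = sum(ms for stage, ms in BUDGET_MS.items() if stage in NETWORK)
model_ms = sum(ms for stage, ms in BUDGET_MS.items() if stage in MODEL)
internal_ms = total_ms - network_ms - model_ms

if __name__ == "__main__":
    for label, ms in [("network", network_ms), ("model", model_ms), ("internal", internal_ms)]:
        print(f"{label:<9}{ms:>4} ms  {ms / total_ms:.0%}")
    print(f"{'total':<9}{total_ms:>4} ms")
```

The stages sum to 470 ms: 105 ms on the network legs, 330 ms on the model legs, and 35 ms on everything else.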
Conversation state machine
Sotto uses Stateless v5 for the conversation state machine. Every call walks through a defined set of states: greeting, menu inquiry, order building, allergen check, confirmation, payment, completion. A separate escalation branch handles human handoff.
AllergenCheck is mandatory. The state machine refuses to advance to confirmation until the AI has enumerated the Big 14 for every ordered item and asked the caller about their allergies. Providing allergen information for the 14 regulated allergens is a legal requirement for UK food businesses, and enforcing it is the difference between a useful AI and a liability.
Greeting: Static greeting plays in under 50ms, no LLM wait.
Menu inquiry: RAG against per-tenant menu schema; AI explains items.
Order building: Slot filling with validation, modifiers, upsell triggers.
Allergen check: Big 14 enumerated for every item. Cannot be skipped.
Confirmation: Read-back of items, totals, and customer name.
Payment: Stripe checkout link via SMS, Apple/Google Pay, or pay-on-arrival.
Completion: Order pushed to POS, kitchen ticket fires, daily metrics update.
Escalation: Human-agent handoff with full conversation context.
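A guarded state machine of this shape can be sketched as follows. This is plain Python for illustration, not the Stateless library and not Sotto's code. The transition table, the `Call` class, and its field names are assumptions; only the state names and the rule that confirmation is blocked until the allergen check is complete come from the text above.

```python
from enum import Enum, auto


class State(Enum):
    GREETING = auto()
    MENU_INQUIRY = auto()
    ORDER_BUILDING = auto()
    ALLERGEN_CHECK = auto()
    CONFIRMATION = auto()
    PAYMENT = auto()
    COMPLETION = auto()
    ESCALATION = auto()


# Assumed transition table; the real one is not published here.
ALLOWED = {
    State.GREETING: {State.MENU_INQUIRY, State.ORDER_BUILDING},
    State.MENU_INQUIRY: {State.ORDER_BUILDING},
    State.ORDER_BUILDING: {State.MENU_INQUIRY, State.ALLERGEN_CHECK},
    State.ALLERGEN_CHECK: {State.ORDER_BUILDING, State.CONFIRMATION},
    State.CONFIRMATION: {State.ORDER_BUILDING, State.PAYMENT},
    State.PAYMENT: {State.COMPLETION},
    State.COMPLETION: set(),
    State.ESCALATION: set(),
}
TERMINAL = {State.COMPLETION, State.ESCALATION}


class Call:
    def __init__(self, items):
        self.state = State.GREETING
        self.items = list(items)
        self.enumerated = set()    # items whose allergens were read out
        self.caller_asked = False  # caller was asked about their allergies

    def allergen_check_done(self):
        return self.caller_asked and set(self.items) <= self.enumerated

    def move_to(self, target):
        if target is State.ESCALATION and self.state not in TERMINAL:
            self.state = target  # human handoff is reachable from any live state
            return
        if target not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {target.name} is not allowed")
        if target is State.CONFIRMATION and not self.allergen_check_done():
            raise ValueError("allergen check incomplete")
        self.state = target
```

In this sketch there is no edge from order building straight to confirmation, and the edge out of the allergen check carries a guard, so skipping the check fails in both places.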
UK compliance, built in
Most ordering software treats VAT, allergens, and GDPR as features you configure later. Sotto ships them as default behaviours of the platform itself.
VAT: Standard 20%, zero 0%, reduced 5%, stored as basis points. Every money value is an integer in pence so VAT never drifts a rounding penny.
Allergens: AllergenCheck is a state in the conversation machine, not a checkbox. The AI enumerates allergens for every ordered item and asks about the caller's allergies before the order can confirm.
GDPR: Configurable per-tenant retention (default 365 days). Daily purge at 02:00 UTC. Right-to-erasure anonymises calls, customers, and transcripts in one transaction. HMRC 7-year financial retention is preserved.
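Integer-only VAT arithmetic with rates in basis points might look like the sketch below. The rates come from the text; the function names, the half-up rounding rule, the assumption that stored prices are net of VAT, and the example prices are all illustrative.

```python
BASIS_POINTS = 10_000  # 100% expressed in basis points

# Rates from the text: standard 20%, reduced 5%, zero 0%.
VAT_RATE_BP = {"standard": 2000, "reduced": 500, "zero": 0}


def vat_pence(net_pence: int, rate_bp: int) -> int:
    """VAT on a net amount in pence, rounded half up, using integers only."""
    return (net_pence * rate_bp + BASIS_POINTS // 2) // BASIS_POINTS


def order_totals(lines):
    """lines: iterable of (unit_net_pence, rate_name, quantity).

    Returns (net, vat, gross) in pence. VAT is computed per line here,
    which is an assumption about where rounding happens.
    """
    net = vat = 0
    for unit_net_pence, rate_name, quantity in lines:
        line_net = unit_net_pence * quantity
        net += line_net
        vat += vat_pence(line_net, VAT_RATE_BP[rate_name])
    return net, vat, net + vat


if __name__ == "__main__":
    # Hypothetical prices: two items at 899p net, one at 150p net.
    print(order_totals([(899, "standard", 2), (150, "standard", 1)]))
```

No floating-point value appears anywhere in the calculation, so the same inputs always produce the same pence totals.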
Integrations
Sotto isn't a closed loop. It writes orders to your existing POS, takes payment through your existing Stripe account, and dispatches couriers through the 3PL you already use. The AI providers are pluggable; today Sotto runs on Groq and Deepgram for latency.
Onboarding a new restaurant is self-service through the merchant dashboard. The menu is pulled from the POS, embedded with bge-large-en-v1.5 vectors, and the AI is on the line the same day.
Compared with the alternatives
The honest comparison is not Sotto against another voice AI startup. It is Sotto against the staff member taking the call right now, the IVR you already abandoned, and the chat widget you bolted onto your website.
| Capability | Human staff | IVR tree | Generic chatbot | |
|---|---|---|---|---|
| Picks up under 500ms | Variable | Web only | ||
| Available 24/7 | ||||
| Natural conversation | Text | |||
| Multi-language (EN + BN) | Depends | Varies | ||
| Big 14 allergen enforcement | Mandatory | Hopefully | ||
| Writes to your POS | Square/Toast/Clover | Manual entry | Limited | |
| Reads full menu accurately | pgvector RAG | Memory | Tone tree | |
| Human escalation built in | Is human | Maybe | ||
| GDPR audit trail | Varies | |||
| Cost per order at scale | Pence | Pounds | Pence | Pence |
Human staff are still essential in the dining room. Sotto exists for the phone line at the moment a kitchen ticket needs to be punched in correctly, without the host having to choose between the caller and the customer at the bar.
The shipped reality
Sotto is live at sotto.karitkarma.com today. The codebase is a 29-project .NET 10 solution with a Next.js 16 dashboard, PostgreSQL 18 with pgvector for menu RAG, Redis 8 for conversation state, and RabbitMQ via MassTransit for inter-service events.
PostgreSQL 18 with pgvector HNSW indexing for menu RAG. Redis 8 for session and conversation state. All money in integer pence, timestamps in UTC DateTimeOffset.
609 unit tests across 6 service test projects, 29 integration tests via Testcontainers, 4 k6 load scenarios, 3 BenchmarkDotNet suites.
Groq Llama 4 Scout 17B for reasoning, Groq Whisper Large v3 Turbo for STT, Deepgram Aura 2 for TTS. AI persona "Emma", British female voice.
The dashboard has 16 mobile-first routes: live orders over SignalR with a pulsing live indicator, live transcripts, multi-location bulk editing, and self-service onboarding.
.NET 10.0.2 GA. Next.js 16.1.6, React 19.2.4, Tailwind 4. PostgreSQL 18 + pgvector. RabbitMQ 4 + MassTransit 8. OpenTelemetry traces, Jaeger UI, Prometheus alerts, Grafana dashboards.
Frequently asked
These answers are mirrored in JSON-LD so they are quotable by AI answer engines and search results.
Sotto is a voice-first AI that answers your restaurant's phone, takes orders through natural conversation, and pushes the captured order straight into your POS. It also handles 2-way SMS through the same conversation engine. Built on .NET 10 microservices with Groq Llama 4 Scout for reasoning, Whisper Large v3 Turbo for transcription, and Deepgram Aura 2 for speech. End-to-end latency sits under 500ms, which is fast enough that callers do not realise they are speaking to software.
Sotto takes calls and texts in both English and Bengali. The voice and SMS pipelines run on Groq Llama 4 Scout, which handles both languages with native fluency. Whisper Large v3 Turbo transcribes both languages. Deepgram Aura 2 produces British-English speech today; multilingual voices from the same provider are the planned next step. Mixed-language conversations are handled mid-call without a restart.
Every Sotto call runs on its own gRPC stream against a shared ConversationOrchestrator pool, so concurrent calls do not queue. The voice pipeline is bounded under 500ms even under load thanks to a streaming token bridge that batches LLM tokens at sentence boundaries and ships them to TTS before the full response is generated. Load tests run against k6 scenarios covering voice latency, order flow, menu API, and concurrent calls.
Sotto is built for UK compliance. AllergenCheck is a mandatory state in the conversation state machine, which means the AI enumerates the Big 14 for every ordered item and asks about caller allergies before any order can be confirmed. VAT is calculated per item in basis points (Standard 20%, Zero 0%, Reduced 5%) and stored as integer pence so totals never drift. GDPR retention is per-tenant configurable with automated daily purge and a right-to-erasure endpoint.
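The retention arithmetic behind a daily purge can be sketched like this. The 365-day default and the 02:00 UTC run time are the values stated in the compliance section; the function names are hypothetical, and the sketch covers only date calculations, not the erasure or HMRC-retention logic.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 365  # per-tenant default stated in the compliance section


def purge_cutoff(now: datetime, retention_days: int = DEFAULT_RETENTION_DAYS) -> datetime:
    """Records created before this instant are past their retention window."""
    return now - timedelta(days=retention_days)


def is_purgeable(created_at: datetime, now: datetime,
                 retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    return created_at < purge_cutoff(now, retention_days)


def next_purge_run(now: datetime) -> datetime:
    """Next 02:00 UTC strictly after `now` (expects a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    run = now.replace(hour=2, minute=0, second=0, microsecond=0)
    return run if run > now else run + timedelta(days=1)


if __name__ == "__main__":
    now = datetime(2026, 1, 15, 3, 0, tzinfo=timezone.utc)
    print(purge_cutoff(now))
    print(next_purge_run(now))
```

Keeping every timestamp timezone-aware and in UTC means the cutoff does not shift when UK clocks change.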
Sotto ships POS connectors for Square UK, Toast, and Clover behind a common IEPosConnector interface, so menus sync and orders write through without manual entry. Payments run on Stripe UK with checkout sessions, payment links, SMS payment links, and Apple Pay or Google Pay toggles. Delivery is dispatched through Uber Direct or Stuart with UK postcode zone validation.
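The idea of several POS systems behind one interface can be shown with a small sketch. It is written in Python for illustration and does not reproduce the actual connector interface; `PosConnector`, its two methods, `OrderLine`, `InMemoryPos`, and `push_order` are all hypothetical names, and no real Square, Toast, or Clover API is called.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class OrderLine:
    name: str
    quantity: int
    unit_price_pence: int  # integer pence, never floats


class PosConnector(Protocol):
    """Hypothetical common surface that every POS connector would implement."""

    def sync_menu(self) -> List[str]: ...

    def write_order(self, lines: List[OrderLine]) -> str: ...


class InMemoryPos:
    """Fake connector for demonstration; it only stores tickets in a list."""

    def __init__(self, menu: List[str]) -> None:
        self._menu = list(menu)
        self.tickets: List[List[OrderLine]] = []

    def sync_menu(self) -> List[str]:
        return list(self._menu)

    def write_order(self, lines: List[OrderLine]) -> str:
        self.tickets.append(list(lines))
        return f"ticket-{len(self.tickets)}"


def push_order(pos: PosConnector, lines: List[OrderLine]) -> str:
    """Write an order through any connector, rejecting items not on the menu."""
    menu = set(pos.sync_menu())
    unknown = [line.name for line in lines if line.name not in menu]
    if unknown:
        raise ValueError(f"items not on the synced menu: {unknown}")
    return pos.write_order(lines)
```

The calling code depends only on the two methods, so swapping one POS for another means swapping the object passed to `push_order`.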
Self-service onboarding runs through the merchant dashboard. Once a Twilio number is connected and a POS is paired, the menu is pulled, embedded with bge-large-en-v1.5 vectors for RAG, and the AI is live on the line. A typical single-location setup is finished the same day. Multi-location estates use the bulk-edit screens to clone menu and policy across sites.
Self-service onboarding, day-one POS sync, mandatory allergen check, GDPR-by-default. Bring your existing phone number, keep your existing Stripe.