Sotto
Voice AI for restaurants

The line is busy
because Sotto picked up.

Sotto is a voice and SMS AI that answers your restaurant's phone in under 500 milliseconds. It takes the order, enforces the UK Big 14 allergens, and writes the ticket to your POS. No missed calls, no hand-typed orders, no walked-away customers at peak.

400ms p50 voice
24/7 on the line
.NET 10 runtime
Live call · 00:42
Transcript
Caller

Hi, can I get two Margheritas and a Diet Coke, for collection?

Sotto

Two Margheritas, one Diet Coke, collection. Quick check on allergens: does anyone on the order need to avoid gluten, milk, or nuts?

State: AllergenCheck
RAG hits: 3 / 3
VAT (bps): 2000

What is Sotto

A voice and SMS AI built specifically for the moment a restaurant phone rings.

Sotto answers your existing UK restaurant number, holds a natural conversation with the caller in English or Bengali, captures the order with allergens and modifiers, takes payment by SMS link or pay-on-arrival, and writes the ticket into Square, Toast, or Clover without anyone touching a keyboard.

The voice pipeline is built end-to-end in .NET 10 microservices around a Twilio Media Stream WebSocket. Groq runs Whisper Large v3 Turbo for transcription and Llama 4 Scout for reasoning. Deepgram Aura 2 produces the spoken reply. A streaming token bridge ships sentence-boundary chunks to the TTS before the LLM has finished its full response, which is how the round-trip stays under 500 milliseconds.

Sotto ships as a multi-tenant SaaS at sotto.karitkarma.com with per-restaurant subdomains, Traefik routing, and Let's Encrypt certs per merchant.

Sotto · Sub-500ms end-to-end · 8-state conversation machine · Mandatory AllergenCheck · Per-tenant GDPR retention · Square, Toast, Clover

Voice pipeline

Ten steps. 470 milliseconds. Most of it is model inference.

A caller's patience for dead air is measured in fractions of a second. Sotto's latency budget is published, broken down by stage, and held to under 500ms end-to-end. The streaming token bridge between the LLM and the TTS is the trick that keeps it there.
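The streaming token bridge can be pictured as a small buffer that releases each sentence the moment it is complete. The sketch below is a minimal Python illustration, not Sotto's .NET implementation; the function name `sentence_chunks` and the boundary rule (a full stop, exclamation mark, or question mark followed by whitespace) are assumptions made for the example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ., ! or ? followed by one whitespace character.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in the token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            end = match.start() + 1  # keep the punctuation mark
            yield buffer[:end].strip()
            buffer = buffer[match.end():]  # carry the remainder forward
    if buffer.strip():
        # The LLM has stopped: flush whatever is left, even without a boundary.
        yield buffer.strip()


if __name__ == "__main__":
    llm_tokens = ["Two ", "Margher", "itas, ", "one Diet ", "Coke. ", "Any ", "allergies", "?"]
    for chunk in sentence_chunks(llm_tokens):
        print(chunk)  # each chunk would be handed to TTS immediately
```

Because the function is a generator, the first sentence is available to the TTS while later tokens are still arriving, which is the property the budget depends on.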

Network legs (Twilio PSTN in and out) account for around a quarter of the budget. The model legs (STT, LLM time-to-first-token, TTS time-to-first-audio) account for the bulk. Internal hops are negligible.

Latency budget: 470ms
Legend: Network · Model inference · Internal hop

Stage | Detail | ms
Twilio PSTN | Carrier-side | 50
WebSocket ingress | VoiceGateway | 5
VAD + buffer drain | 50 RMS / 200ms guard | 5
μ-law to PCM WAV | 8 to 16 kHz | 5
Groq Whisper STT | whisper-large-v3-turbo | 150
gRPC to orchestrator | Duplex stream | 5
pgvector RAG search | bge-large-en-v1.5 | 15
Groq Llama 4 Scout TTFT | 17B Scout, streaming | 50
Deepgram Aura 2 TTFA | aura-2-luna-en | 130
WebSocket egress + PSTN | Back to caller | 55
End-to-end time-to-first-audio | | 470
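The budget can be checked with simple arithmetic. The Python sketch below sums the published stage figures by category; the category assigned to each stage is an assumption inferred from the prose (PSTN legs as network, STT, LLM, and TTS as model inference, everything else as internal hop).

```python
from collections import defaultdict

# (stage, assumed category, milliseconds) taken from the published budget.
BUDGET = [
    ("Twilio PSTN", "network", 50),
    ("WebSocket ingress", "internal", 5),
    ("VAD + buffer drain", "internal", 5),
    ("mu-law to PCM WAV", "internal", 5),
    ("Groq Whisper STT", "model", 150),
    ("gRPC to orchestrator", "internal", 5),
    ("pgvector RAG search", "internal", 15),
    ("Groq Llama 4 Scout TTFT", "model", 50),
    ("Deepgram Aura 2 TTFA", "model", 130),
    ("WebSocket egress + PSTN", "network", 55),
]


def totals_by_category(budget):
    """Sum milliseconds per category."""
    totals = defaultdict(int)
    for _stage, category, ms in budget:
        totals[category] += ms
    return dict(totals)


if __name__ == "__main__":
    totals = totals_by_category(BUDGET)
    end_to_end = sum(totals.values())
    for category, ms in totals.items():
        print(f"{category}: {ms}ms ({ms / end_to_end:.1%})")
    print(f"end-to-end: {end_to_end}ms")
```

Under those assumptions the network legs come to 105ms, model inference to 330ms, and internal hops to 35ms, for 470ms in total.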

Conversation state machine

Eight states. One that the AI cannot skip.

Sotto uses Stateless v5 for the conversation state machine. Every call walks through a defined set of states: greeting, menu inquiry, order building, allergen check, confirmation, payment, completion. A separate escalation branch handles human handoff.

AllergenCheck is mandatory. The state machine refuses to advance to confirmation until the AI has enumerated the Big 14 for every ordered item and asked the caller about their allergies. This is the law in the UK and it is the difference between a useful AI and a liability.

Stateless v5 · UK Big 14 enforced · Sliding 10-turn window
  1. Greeting
     Static greeting plays under 50ms, no LLM wait.

  2. MenuInquiry
     RAG against per-tenant menu schema; AI explains items.

  3. OrderBuilding
     Slot filling with validation, modifiers, upsell triggers.

  4. AllergenCheck (Mandatory)
     Big 14 enumerated for every item. Cannot be skipped.

  5. OrderConfirmation
     Read-back of items, totals, and customer name.

  6. PaymentPending
     Stripe checkout link via SMS, Apple/Google Pay, or pay-on-arrival.

  7. Completed
     Order pushed to POS, kitchen ticket fires, daily metrics update.

  8. EscalationRequested
     Human-agent handoff with full conversation context.
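The eight states and the mandatory gate can be sketched in a few lines. This is a plain-Python analogue, not the Stateless API and not Sotto's code: the transition graph, the `Call` class, and its field names are assumptions chosen to show how a guard keeps an order out of confirmation until the allergen check is complete.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    GREETING = auto()
    MENU_INQUIRY = auto()
    ORDER_BUILDING = auto()
    ALLERGEN_CHECK = auto()
    ORDER_CONFIRMATION = auto()
    PAYMENT_PENDING = auto()
    COMPLETED = auto()
    ESCALATION_REQUESTED = auto()


# Assumed transition graph; OrderConfirmation is reachable only via AllergenCheck.
TRANSITIONS = {
    State.GREETING: {State.MENU_INQUIRY, State.ORDER_BUILDING},
    State.MENU_INQUIRY: {State.ORDER_BUILDING},
    State.ORDER_BUILDING: {State.MENU_INQUIRY, State.ALLERGEN_CHECK},
    State.ALLERGEN_CHECK: {State.ORDER_BUILDING, State.ORDER_CONFIRMATION},
    State.ORDER_CONFIRMATION: {State.ORDER_BUILDING, State.PAYMENT_PENDING},
    State.PAYMENT_PENDING: {State.COMPLETED},
    State.COMPLETED: set(),
    State.ESCALATION_REQUESTED: set(),
}


@dataclass
class Call:
    state: State = State.GREETING
    items: list[str] = field(default_factory=list)
    allergens_read_for: set[str] = field(default_factory=set)
    caller_asked: bool = False

    def allergen_check_done(self) -> bool:
        """True once every ordered item was enumerated and the caller was asked."""
        return (
            bool(self.items)
            and self.caller_asked
            and set(self.items) <= self.allergens_read_for
        )

    def advance(self, target: State) -> None:
        # Escalation is allowed from any non-terminal state.
        if target is State.ESCALATION_REQUESTED and TRANSITIONS[self.state]:
            self.state = target
            return
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.name} -> {target.name} is not allowed")
        # The guard: confirmation is refused until the allergen check is complete.
        if target is State.ORDER_CONFIRMATION and not self.allergen_check_done():
            raise ValueError("AllergenCheck is incomplete")
        self.state = target
```

Two things enforce the rule here: the graph offers no edge into confirmation that bypasses AllergenCheck, and the guard rejects the edge that does exist until the check has actually been done.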

UK compliance, built in

The three things UK restaurant tech keeps getting wrong.

Most ordering software treats VAT, allergens, and GDPR as features you configure later. Sotto ships them as default behaviours of the platform itself.

VAT, all integers

Standard 20%, zero 0%, reduced 5%, stored as basis points. Every money value is an integer in pence so VAT never drifts a rounding penny.
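Integer pence and basis points keep the arithmetic exact. The Python sketch below shows one way to do it; the function names and the round-half-up rule are assumptions for illustration, and the platform's actual rounding policy is not stated here.

```python
STANDARD_BPS = 2000  # 20%
REDUCED_BPS = 500    # 5%
ZERO_BPS = 0         # 0%
BPS_SCALE = 10_000   # 100% expressed in basis points


def _div_round_half_up(numerator: int, denominator: int) -> int:
    """Integer division rounded half up, with no floats involved."""
    return (2 * numerator + denominator) // (2 * denominator)


def vat_on_net(net_pence: int, rate_bps: int) -> int:
    """VAT to add to a net (VAT-exclusive) price, in pence."""
    return _div_round_half_up(net_pence * rate_bps, BPS_SCALE)


def vat_in_gross(gross_pence: int, rate_bps: int) -> int:
    """VAT contained in a gross (VAT-inclusive) price, in pence."""
    return _div_round_half_up(gross_pence * rate_bps, BPS_SCALE + rate_bps)


if __name__ == "__main__":
    print(vat_on_net(1299, STANDARD_BPS))    # 260
    print(vat_in_gross(1200, STANDARD_BPS))  # 200
    print(vat_on_net(999, REDUCED_BPS))      # 50
```

Every intermediate value is an `int`, so the same inputs give the same pence on every machine.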

UK Big 14 allergens

AllergenCheck is a state in the conversation machine, not a checkbox. The AI enumerates allergens for every ordered item and asks about the caller's allergies before the order can confirm.

GDPR on a schedule

Configurable per-tenant retention (default 365 days). Daily purge at 02:00 UTC. Right-to-erasure anonymises calls, customers, and transcripts in one transaction. HMRC 7-year financial retention is preserved.
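The retention rule reduces to a cutoff date and an exemption. The sketch below is an illustrative Python version under stated assumptions: the function names are hypothetical, financial records are simply exempted from this purge to stand in for the HMRC retention, and the real purge runs against the database, not in memory.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 365  # per-tenant default from the text above


def should_purge(created_at: datetime, is_financial: bool, now: datetime,
                 retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """True if a record is older than the tenant's retention window."""
    if is_financial:
        return False  # assumption: financial records are exempt from this purge
    return created_at < now - timedelta(days=retention_days)


def next_purge_run(now: datetime) -> datetime:
    """Next 02:00 UTC strictly after `now` (which must be timezone-aware)."""
    run = now.astimezone(timezone.utc).replace(hour=2, minute=0, second=0, microsecond=0)
    return run if run > now else run + timedelta(days=1)


if __name__ == "__main__":
    now = datetime(2025, 3, 1, 1, 30, tzinfo=timezone.utc)
    old_call = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(should_purge(old_call, is_financial=False, now=now))
    print(next_purge_run(now).isoformat())
```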

Integrations

Plugs into the systems your kitchen already runs.

Sotto isn't a closed loop. It writes orders to your existing POS, takes payment through your existing Stripe account, and dispatches couriers through the 3PL you already use. The AI providers are pluggable; today Sotto runs on Groq and Deepgram for latency.

Onboarding a new restaurant is self-service through the merchant dashboard. The menu is pulled from the POS, embedded with bge-large-en-v1.5 vectors, and the AI is on the line the same day.
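Menu lookup by embedding comes down to ranking items by vector similarity. The Python sketch below uses toy three-dimensional vectors and a brute-force cosine ranking purely for illustration; in production the vectors come from bge-large-en-v1.5 and the search runs inside pgvector, and the names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], menu: dict[str, list[float]], k: int = 3) -> list[str]:
    """Names of the k menu items most similar to the query vector."""
    ranked = sorted(menu, key=lambda name: cosine(query, menu[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones have far more dimensions.
    menu = {
        "Margherita": [0.9, 0.1, 0.0],
        "Diet Coke": [0.0, 0.1, 0.9],
        "Garlic bread": [0.7, 0.6, 0.1],
    }
    print(top_k([1.0, 0.2, 0.0], menu, k=2))
```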

Twilio | PSTN + SMS | Telephony
Square UK | Menu sync + order push | POS
Toast | REST API v2.5 | POS
Clover | REST API v3 | POS
Stripe UK | GBP / integer pence | Payments
Uber Direct | Zone + ETA dispatch | Delivery
Stuart | UK 3PL dispatcher | Delivery
Groq | Llama 4 + Whisper | AI
Deepgram | Aura 2 TTS | AI

Compared with the alternatives

Sotto vs the people, the tone trees, and the generic chatbots.

The honest comparison is not Sotto against another voice AI startup. It is Sotto against the staff member taking the call right now, the IVR you already abandoned, and the chat widget you bolted onto your website.

Capability
Sotto
Human staff
IVR tree
Generic chatbot
Picks up under 500ms: Variable, Web only
Available 24/7
Natural conversation: Text
Multi-language (EN + BN): Depends, Varies
Big 14 allergen enforcement: Mandatory, Hopefully
Writes to your POS: Square/Toast/Clover, Manual entry, Limited
Reads full menu accurately: pgvector RAG, Memory, Tone tree
Human escalation built in: Is human, Maybe
GDPR audit trail: Varies
Cost per order at scale: Pence, Pounds, Pence, Pence

Human staff are still essential in the dining room. Sotto exists for the phone line at the moment a kitchen ticket needs to be punched in correctly, without the host having to choose between the caller and the customer at the bar.

The shipped reality

Built, tested, deployed. Not a roadmap deck.

Sotto is live at sotto.karitkarma.com today. The codebase is a 29-project .NET 10 solution with a Next.js 16 dashboard, PostgreSQL 18 with pgvector for menu RAG, Redis 8 for conversation state, and RabbitMQ via MassTransit for inter-service events.

Data layer

PostgreSQL 18 with pgvector HNSW indexing for menu RAG. Redis 8 for session and conversation state. All money in integer pence, timestamps in UTC DateTimeOffset.

Test coverage

609 unit tests across 6 service test projects, 29 integration tests via Testcontainers, 4 k6 load scenarios, 3 BenchmarkDotNet suites.

Voice stack

Groq Llama 4 Scout 17B for reasoning, Groq Whisper Large v3 Turbo for STT, Deepgram Aura 2 for TTS. AI persona "Emma", British female voice.

Dashboard

16 mobile-first routes. Live orders over SignalR with pulsing live indicator. Live transcripts. Multi-location bulk editing. Self-service onboarding.

<500ms End-to-end voice latency
8 Conversation states
609 Unit tests, all green
29 .NET 10 projects
Built from

.NET 10.0.2 GA. Next.js 16.1.6, React 19.2.4, Tailwind 4. PostgreSQL 18 + pgvector. RabbitMQ 4 + MassTransit 8. OpenTelemetry traces, Jaeger UI, Prometheus alerts, Grafana dashboards.

Frequently asked

Six questions buyers actually ask.

These answers are mirrored in JSON-LD so they are quotable by AI answer engines and search results.
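For readers curious what that mirroring looks like, the sketch below builds a schema.org FAQPage document from question and answer pairs. It is a minimal Python illustration; the helper name `faq_jsonld` is hypothetical and the site's own build step may differ.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage structure from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    doc = faq_jsonld([
        ("What is Sotto?", "A voice-first AI that answers your restaurant's phone."),
    ])
    # The serialised JSON goes inside a <script type="application/ld+json"> tag.
    print(json.dumps(doc, indent=2))
```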

What is Sotto?

Sotto is a voice-first AI that answers your restaurant's phone, takes orders through natural conversation, and pushes the captured order straight into your POS. It also handles 2-way SMS through the same conversation engine. Built on .NET 10 microservices with Groq Llama 4 Scout for reasoning, Whisper Large v3 Turbo for transcription, and Deepgram Aura 2 for speech. End-to-end latency sits under 500ms, which is fast enough that callers do not realise they are speaking to software.

Does Sotto speak Bangla?

Yes. The voice and SMS pipelines run on Groq Llama 4 Scout, which handles Bengali and English with native fluency. Whisper Large v3 Turbo transcribes both languages. Deepgram Aura 2 produces British-English speech today; multilingual voices from the same provider are the next planned switch. Mixed-language conversations are handled mid-call without a restart.

How does Sotto handle peak-hour call volume?

Every Sotto call runs on its own gRPC stream against a shared ConversationOrchestrator pool, so concurrent calls do not queue. The voice pipeline is bounded under 500ms even under load thanks to a streaming token bridge that batches LLM tokens at sentence boundaries and ships them to TTS before the full response is generated. Load tests run against k6 scenarios covering voice latency, order flow, menu API, and concurrent calls.

Is Sotto compliant with UK food regulations?

Yes. AllergenCheck is a mandatory state in the conversation state machine, which means the AI enumerates the Big 14 for every ordered item and asks about caller allergies before any order can be confirmed. VAT is calculated per item in basis points (Standard 20%, Zero 0%, Reduced 5%) and stored as integer pence so totals never drift. GDPR retention is per-tenant configurable with automated daily purge and a right-to-erasure endpoint.

Which POS systems and payment providers does Sotto support?

Sotto ships POS connectors for Square UK, Toast, and Clover behind a common IEPosConnector interface, so menus sync and orders write through without manual entry. Payments run on Stripe UK with checkout sessions, payment links, SMS payment links, and Apple Pay or Google Pay toggles. Delivery is dispatched through Uber Direct or Stuart with UK postcode zone validation.
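A common connector interface means the ordering logic never needs to know which POS sits behind it. The Python sketch below illustrates the idea with a `Protocol`; the method names, the `OrderLine` type, and the in-memory stand-in are all hypothetical and do not reflect the actual .NET interface.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OrderLine:
    name: str
    quantity: int
    unit_price_pence: int  # integer pence, as elsewhere on the platform


class PosConnector(Protocol):
    """Hypothetical shape shared by every POS connector."""

    def sync_menu(self) -> list[str]: ...

    def push_order(self, lines: list[OrderLine]) -> str: ...


class InMemoryPos:
    """Stand-in connector for the sketch; a real one would call the POS API."""

    def __init__(self, menu: list[str]) -> None:
        self._menu = list(menu)
        self.tickets: list[list[OrderLine]] = []

    def sync_menu(self) -> list[str]:
        return list(self._menu)

    def push_order(self, lines: list[OrderLine]) -> str:
        self.tickets.append(list(lines))
        return f"ticket-{len(self.tickets)}"


def place_order(pos: PosConnector, lines: list[OrderLine]) -> str:
    """Validate lines against the synced menu, then write the ticket."""
    menu = set(pos.sync_menu())
    unknown = [line.name for line in lines if line.name not in menu]
    if unknown:
        raise ValueError(f"Not on the menu: {', '.join(unknown)}")
    return pos.push_order(lines)


if __name__ == "__main__":
    pos = InMemoryPos(["Margherita", "Diet Coke"])
    order = [OrderLine("Margherita", 2, 1100), OrderLine("Diet Coke", 1, 250)]
    print(place_order(pos, order))
```

Swapping Square for Toast or Clover then means supplying a different object with the same two methods; `place_order` stays untouched.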

How long does it take to set up Sotto for a restaurant?

Self-service onboarding runs through the merchant dashboard. Once a Twilio number is connected and a POS is paired, the menu is pulled, embedded with bge-large-en-v1.5 vectors for RAG, and the AI is live on the line. A typical single-location setup is finished the same day. Multi-location estates use the bulk-edit screens to clone menu and policy across sites.

Sotto is open for restaurants

Hand the busy line
to the AI that doesn't mishear.

Self-service onboarding, day-one POS sync, mandatory allergen check, GDPR-by-default. Bring your existing phone number, keep your existing Stripe.

On the line, right now
  • Twilio number connected in minutes, not weeks
  • Big 14 allergen check the AI cannot bypass
  • Direct write to Square, Toast, or Clover
  • Stripe UK checkout, payment links, SMS pay
  • Per-tenant GDPR retention with daily automated purge