BACCO
An autonomous AI concierge for the ultra-high-net-worth luxury travel market — built on proprietary vector databases and personalized inference pipelines.
VISIT SITE →
<50ms
Vector Search Latency
10k+
Curated Experiences
24/7
AI Concierge
The luxury travel industry serves a clientele that expects bespoke service measured in seconds, not days. Traditional concierge services scale linearly with headcount: every new member means another human in the loop. BACCO needed an AI system capable of understanding hyper-personal preferences (a member who only flies private after 10am, prefers boutique hotels under 30 rooms, and avoids restaurants with white tablecloths) while delivering recommendations indistinguishable from those of a top human concierge with 20 years of experience. The challenge: build an inference pipeline that combines proprietary curation, real-time availability data, and member-specific context, and responds in under a second.
How we built it
01 · Discovery
Mapped the concierge mental model
Spent 3 weeks shadowing top human concierges to understand the implicit reasoning steps. Codified 40+ preference dimensions and 12 reasoning patterns into a structured ontology.
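To make that concrete, here is an illustrative sketch of how a slice of such an ontology might be typed. Every field name below is a hypothetical example of a preference dimension, not BACCO's actual schema:

```typescript
// Illustrative only: these dimensions echo the examples in the brief
// (private flights after 10am, boutique hotels, no white tablecloths).
type TimeOfDay = `${number}:${number}`;

interface MemberPreferences {
  flight: {
    privateOnly: boolean;
    earliestDeparture: TimeOfDay; // e.g. "10:00"
  };
  hotel: {
    maxRooms: number;             // e.g. 30 — "boutique hotels under 30 rooms"
    style: "boutique" | "resort" | "city";
  };
  dining: {
    avoid: string[];              // e.g. ["white tablecloths"]
    favoriteCuisines: string[];
  };
}

// One way to codify a reasoning pattern learned from shadowing, e.g.
// "if the member declines a category twice, stop suggesting it."
interface ReasoningPattern {
  id: string;
  trigger: (conversationHistory: string[]) => boolean;
  adjustment: Partial<MemberPreferences>;
}
```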
02 · Architecture
Vector + relational hybrid
Designed a dual-database architecture: Pinecone for semantic similarity across experiences, PostgreSQL for hard constraints (dates, locations, budget caps). Custom retrieval orchestrator merges both in real time.
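For illustration, a minimal sketch of what that merge step could look like, assuming the official Pinecone TypeScript client (@pinecone-database/pinecone) and node-postgres. The index name, table schema, and topK value are hypothetical, not BACCO's production configuration:

```typescript
import { Pinecone } from "@pinecone-database/pinecone";
import { Pool } from "pg";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Sketch of an orchestrator: semantic recall from Pinecone, then a
// relational filter for hard constraints (location, budget).
async function searchExperiences(
  queryEmbedding: number[],
  constraints: { location: string; maxBudget: number }
) {
  // 1. Semantic similarity over the curated-experience index
  //    ("experiences" is a hypothetical index name).
  const semantic = await pc
    .index("experiences")
    .query({ vector: queryEmbedding, topK: 50, includeMetadata: true });

  const ids = semantic.matches.map((m) => m.id);

  // 2. Hard-constraint filter in PostgreSQL over the same candidates.
  const { rows } = await db.query(
    `SELECT id FROM experiences
      WHERE id = ANY($1) AND location = $2 AND price <= $3`,
    [ids, constraints.location, constraints.maxBudget]
  );

  // 3. Merge: keep Pinecone's similarity order, drop filtered-out ids.
  const allowed = new Set(rows.map((r) => r.id));
  return semantic.matches.filter((m) => allowed.has(m.id));
}
```

One appeal of this split: Postgres acts purely as a hard filter, so a constraint change (a new budget cap, a blackout date) never requires re-embedding or re-indexing anything on the vector side.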
03 · LLM Integration
Multi-model routing
Different models for different jobs: GPT-4 for conversational nuance, Claude for long-context preference synthesis, fine-tuned Llama for fast triage. Routing layer picks the right model per query type.
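As a rough sketch of what that routing layer might look like (the model identifiers and the classify() heuristic here are placeholders, not BACCO's production logic):

```typescript
// Placeholder routing layer: classify the query, pick a model.
type QueryType = "triage" | "conversation" | "preference-synthesis";

const MODEL_FOR: Record<QueryType, { provider: string; model: string }> = {
  triage: { provider: "self-hosted", model: "llama-finetuned" },   // fast triage
  conversation: { provider: "openai", model: "gpt-4" },            // nuance
  "preference-synthesis": { provider: "anthropic", model: "claude" }, // long context
};

function classify(query: string, contextTokens: number): QueryType {
  // Hypothetical heuristics — thresholds invented for illustration.
  if (contextTokens > 20_000) return "preference-synthesis";
  if (query.length < 80) return "triage";
  return "conversation";
}

export function route(query: string, contextTokens: number) {
  return MODEL_FOR[classify(query, contextTokens)];
}
```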
04 · Launch
Quiet rollout to top members
Soft launch to 50 high-net-worth members with weekly feedback loops. Iterated on prompt engineering and retrieval scoring based on real conversations.
What powers it
Next.js 14
App Router for streaming UI + edge runtime for low-latency conversations
Pinecone
Production-grade vector search with sub-50ms p95 latency
PostgreSQL + pgvector
Relational data + secondary vector store for hot queries (sketched below this list)
OpenAI + Anthropic + Llama
Multi-model routing for cost/quality optimization
LangGraph
Structured reasoning workflows with explicit decision trees
Vercel + AWS
Edge for chat interface, AWS for heavy ML inference jobs
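On the pgvector layer flagged above: a minimal sketch of what a hot-query path might look like, assuming node-postgres and a hypothetical hot_experiences table. pgvector's `<->` operator computes L2 distance, and casting the parameter with ::vector lets the driver pass the embedding as a string literal:

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Sketch of a "hot query": nearest-neighbour search directly in
// Postgres, skipping the hop to Pinecone for frequently repeated
// lookups. Table and column names are assumptions.
async function hotSearch(embedding: number[], limit = 10) {
  // pgvector accepts vectors as a '[x,y,z]' string literal.
  const literal = `[${embedding.join(",")}]`;
  const { rows } = await pool.query(
    `SELECT id, name, embedding <-> $1::vector AS distance
       FROM hot_experiences
      ORDER BY embedding <-> $1::vector
      LIMIT $2`,
    [literal, limit]
  );
  return rows;
}
```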
BACCO is now serving members in real time with response quality that members consistently rate as equal to or better than human concierges in blind tests. The system has handled over 10,000 unique requests, learning from each interaction to refine member-specific recommendations. Average response time is under 800ms end-to-end, including LLM generation. Most importantly, BACCO scales without adding headcount: each new member requires near-zero marginal effort.
800ms
End-to-end response
94%
Member satisfaction
10k+
Requests handled
0
Human FTE added
5★
Avg rating
92%
Retention
“We turned a process that took our team hours into something that responds in under a second, without losing the personal touch our members expect.”
— BACCO Founding Team
Luxury Travel Tech