Home

AI/ML · Santa Clara University — GenAI Course

PetTriage AI

Dual-agent veterinary triage system that helps pet owners make confident care decisions

Product & Technical Lead — 4-person team·10-day sprint — Jan 2026·GitHub
PetTriage AI
PetTriage AI · product surfaceLangGraph / Pinecone / GPT-4.1-mini / GPT-4o-mini
500ms
Response Time
70%
Cost Reduction
18,909
Documents Indexed
$0.003
Per Query Cost

The Challenge

First-time pet owners face a critical gap: Google provides contradictory advice, emergency vets cost $800+ for what's often a 'monitor at home' diagnosis, and 52% of US pet owners skip care entirely due to cost anxiety. Across 94 million pet-owning households spending $39.8B on vet care annually, there's no trusted middle layer for urgent decision-making. Existing pet health apps offer static checklists, not AI-powered triage grounded in veterinary literature.

My Role & Approach

Led a 4-person team through a 10-day sprint to ship a production-ready system. Owned the product vision ('Clear answers in anxious moments — helping pet parents make confident care decisions'), conducted user research with first-time pet owners, designed the technical architecture, built the RAG pipeline and evaluation framework, and defined the freemium business model ($270K ARR potential). Made every key architecture decision: dual-agent routing, safety system design, model selection for cost optimization.

Product Thinking

Problem Framing

We reframed the problem from 'build a pet health chatbot' to 'reduce the anxiety gap between Google and the ER.' This shifted our success metric from response accuracy to user confidence in their care decision — a fundamentally different product.

Key Tradeoffs

Safety vs. user experience: we chose hard-coded emergency routing over LLM-based detection for life-threatening symptoms. This meant slower iteration (code changes vs. prompt changes) but zero false negatives on critical cases. We also chose dual-agent over single-agent despite higher complexity — the consistency gain was worth the engineering cost.

What We Didn't Build

We scoped out breed-specific dietary recommendations, vet clinic locator/booking, and a symptom history timeline. Each was a good idea, but none served the core job-to-be-done: 'Should I go to the ER right now?' Keeping scope tight let us ship in 10 days.

Cross-Functional Leadership

Led a 4-person team with mixed technical backgrounds. Ran daily standups, owned the product roadmap, conducted user interviews with pet owners, presented to course judges, and coordinated architecture decisions across frontend and backend engineers.

Solution & Execution

Dual-agent architecture orchestrated by LangGraph (ReAct pattern). Specialized routing between a conversational agent (GPT-4.1-mini) and a medical triage agent (GPT-4o-mini) — the system decides which agent handles each query based on clinical complexity. RAG pipeline embeds 18,909 veterinary records (openFDA adverse events, clinical notes, pain assessments, genetic profiles) in Pinecone with sub-100ms search latency. Multi-modal inputs: GPT-4o Vision analyzes symptom photos, Gemini 2.0 Flash provides real-time web search for latest treatments. 11-layer safety validation (5 input + 6 output guardrails) ensures triage-only guidance with zero diagnosis claims. Hard-coded emergency routing for life-threatening conditions bypasses the LLM entirely.

Tech: LangGraph, Pinecone, GPT-4.1-mini, GPT-4o-mini, GPT-4o Vision, Gemini 2.0 Flash, Next.js, FastAPI

Impact

  • 500ms average response time (down from 2.1s with single-agent approach)
  • 70% cost reduction through strategic model selection ($0.003/query vs $0.01)
  • Sub-100ms vector search across 18,909 veterinary documents
  • 11-layer safety system: zero diagnosis claims, 100% triage-only guidance
  • Freemium business model: $270K ARR potential at 5% conversion of 50K downloads

Challenges & Pivots

Challenge: Single-agent system produced inconsistent medical triage — mixing casual conversation tone with clinical assessments

Resolution: Implemented dual-agent architecture with temperature tuning: 0.3 for medical triage (factual, conservative), 0.7 for conversation (empathetic, reassuring). This separated concerns and improved both accuracy and user experience.

Challenge: RAG hallucination on edge cases — the system would synthesize plausible-sounding but unsupported clinical guidance

Resolution: Built 11-layer safety validation with source citation requirements. If the RAG can't find supporting evidence, the system explicitly says 'I don't have enough information' rather than guessing.

Challenge: High API costs with GPT-4o for every query made the product economically unviable

Resolution: Analyzed query distribution — 73% were simple questions. Routed these to GPT-4o-mini with cached RAG results, reserving GPT-4o for complex multi-step triage. Cut costs by 70%.

Your dual-agent architecture and safety-first approach demonstrated production-level thinking. The cost optimization through intelligent routing shows mature product judgment.

Course Judge, GenAI for Enterprises, Santa Clara University

Screenshots

Symptom checker with 4 urgency levels (ER, Today, Soon, Monitor)
Symptom checker with 4 urgency levels (ER, Today, Soon, Monitor)
System architecture — dual-agent routing with LangGraph orchestration
System architecture — dual-agent routing with LangGraph orchestration
Conversational chat with veterinary source citations
Conversational chat with veterinary source citations
Freemium business model — $270K ARR target at 5% conversion
Freemium business model — $270K ARR target at 5% conversion

Technical Highlight

Dual-Agent Router — LangGraph
def classify_complexity(state: TriageState) -> TriageState:
    score = complexity_classifier.predict(state["query"])
    state["complexity"] = "simple" if score < 0.6 else "complex"
    return state

graph = StateGraph(TriageState)
graph.add_node("classify", classify_complexity)
graph.add_node("fast_agent", fast_agent)   # GPT-4o-mini — 73% of queries
graph.add_node("deep_agent", deep_agent)   # GPT-4o — complex triage

graph.add_conditional_edges(
    "classify",
    lambda s: s["complexity"],
    {"simple": "fast_agent", "complex": "deep_agent"},
)

What I Learned

  • Dual-agent systems beat generalist agents for complex domains — separating concerns improves both accuracy and cost efficiency.
  • Safety in AI products requires architectural validation, not just clever prompts. Hard-coded emergency routing was more reliable than any prompt engineering.
  • Cost optimization through strategic model selection is a product decision, not just an engineering one. Understanding query distribution patterns drives architecture choices.
  • Building an evaluation framework early (before optimizing) saved significant time — every prompt tweak could be objectively measured against the 200-question test suite.

What I'd Do Differently

I'd invest more time in user testing before building. We validated the technical approach thoroughly but only had informal conversations with pet owners. A structured user research phase with prototype testing would have surfaced the photo upload feature need earlier — we discovered it post-launch from user feedback.