Ledgerly AI reaches a new milestone: the system now learns from every LLM response and writes that knowledge back into the KB, so the same question is never sent to an LLM twice. Every LLM call is a one-time cost that permanently expands the KB — trending toward zero LLM calls over time. This release also ships the KB-First Agent Runner (multi-domain agents execute against live data with zero LLM calls), the Knowledge Expansion Engine (auto-researches web sources to fill KB gaps), a complete provider abstraction layer (no SDK is hardwired anywhere), a rebuilt KB scoring formula that correctly routes symptom queries, and a fully upgraded chat rendering engine. The platform now counts 539 registered AI tools and 156 agents.
Added: KB Gap Learner (kb-gap-learner.ts) — every time the LLM answers a query the KB could not, the Q→A pair is stored back into ai_response_cache. The next identical or similar query hits the KB at zero LLM cost (~2ms vs 500ms+). Category-aware TTL (definitions: 7 days, troubleshooting: 24h, inventory: 1h, analytics: 30min). Quality gates reject low-value responses. Writes are fire-and-forget, so learning never blocks the user response.
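A minimal sketch of that write path, assuming illustrative names throughout (`TTL_SECONDS`, `passesQualityGate`, and `learnFromGap` are not the shipped kb-gap-learner.ts API):

```typescript
// Illustrative sketch of the gap-learning write path; names and the
// store signature are assumptions, not the shipped kb-gap-learner.ts API.
type Category = "definition" | "troubleshooting" | "inventory" | "analytics";

// Category-aware TTLs from the release notes, in seconds.
const TTL_SECONDS: Record<Category, number> = {
  definition: 7 * 24 * 3600, // 7 days
  troubleshooting: 24 * 3600, // 24 hours
  inventory: 3600, // 1 hour
  analytics: 30 * 60, // 30 minutes
};

// Quality gate: reject low-value responses before they enter the KB.
// (Heuristics here are placeholders for the real gates.)
function passesQualityGate(answer: string): boolean {
  return answer.trim().length > 40 && !/i (don't|do not) know/i.test(answer);
}

// Fire-and-forget: the returned promise is deliberately not awaited by
// the chat path, so learning never blocks the user response.
function learnFromGap(
  query: string,
  answer: string,
  category: Category,
  store: (row: { query: string; answer: string; expiresAt: number }) => Promise<void>,
): void {
  if (!passesQualityGate(answer)) return;
  const expiresAt = Date.now() + TTL_SECONDS[category] * 1000;
  void store({ query, answer, expiresAt }).catch(() => {
    // Swallow: a failed cache write must never surface to the user.
  });
}
```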
Added: KB-First Agent Runner (kb-first-agent-runner.ts) — multi-domain parallel agents now execute entirely without LLM calls: each agent hits KB cache + FAQ for its domain (zero cost, ~2ms), then runs domain tools against the live DB (zero LLM). Only when ALL agents return empty does the system escalate to the LLM. Covers 15+ domains: inventory, work_orders, crm, hr, leasing, financials, vendors, compliance, and more.
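The dispatch rule above can be sketched in a few lines; the agent and escalation signatures are illustrative, not the shipped runner's API:

```typescript
// Sketch of KB-first dispatch: every domain agent answers from KB/FAQ and
// live-DB tools first; the LLM is consulted only when ALL agents come back
// empty. Agent shape is an assumption for illustration.
type DomainAgent = (query: string) => Promise<string | null>;

async function runKbFirst(
  query: string,
  agents: DomainAgent[],
  escalateToLlm: (query: string) => Promise<string>,
): Promise<{ answer: string; usedLlm: boolean }> {
  // All domain agents run in parallel at zero LLM cost.
  const results = await Promise.all(agents.map((agent) => agent(query)));
  const hits = results.filter((r): r is string => r !== null);
  if (hits.length > 0) {
    return { answer: hits.join("\n"), usedLlm: false };
  }
  // Only an empty sweep across every domain escalates to the LLM.
  return { answer: await escalateToLlm(query), usedLlm: true };
}
```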
Added: Knowledge Expansion Engine (knowledge-expansion-engine.ts) — automatically researches and ingests knowledge from external repair databases, OEM parts sources, HVAC forums, CPSC recall feeds, and completed work order history. KB grows while the system runs — new failure modes, parts, and fixes are indexed without manual input.
Added: Parallel Learning Module (parallel-learning.ts) — tracks execution metrics and domain combination performance across parallel agent dispatches. After 5+ executions on a query pattern, it recommends parallel dispatch if latency savings exceed 200ms and success rate is ≥ 70%. The routing strategy self-optimizes.
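The thresholds above translate directly into a recommendation rule; the stats shape here is assumed for illustration (the module's real fields may differ):

```typescript
// Sketch of the self-optimizing dispatch recommendation. Thresholds come
// from the release notes; PatternStats is an assumed shape.
interface PatternStats {
  executions: number;        // parallel dispatches observed for this pattern
  avgSequentialMs: number;   // baseline sequential latency
  avgParallelMs: number;     // observed parallel latency
  successes: number;         // successful parallel dispatches
}

function recommendParallel(s: PatternStats): boolean {
  if (s.executions < 5) return false; // not enough evidence yet
  const savingsMs = s.avgSequentialMs - s.avgParallelMs;
  const successRate = s.successes / s.executions;
  return savingsMs > 200 && successRate >= 0.7;
}
```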
Added: Unified Learning Engine (unified-learning-engine.ts) — permanent memory, community best practices federation, cross-portal intelligence transfer, gap detection, and internet knowledge ingestion all unified into one engine. Organizations that use the platform collectively improve the KB for every other org (privacy-safe, scrubbed).
Added: Visual KB — vision-based knowledge layer: visual-weather-damage-kb.ts, visual-cost-estimation-kb.ts, visual-age-estimation-kb.ts, visual-diagnosis-engine.ts, image-kb-seeder.ts, image-kb-service.ts. Field photos now contribute to and query the KB directly. Recurring issue detector (recurring-issue-detector.ts) identifies patterns across properties.
Added: KB visual contribution pipeline — /api/ai/vision/kb-contribute/route.ts accepts field tech photo uploads, runs a regex-based PII/sensitive-data scrubber, and queues submissions for admin approval before indexing into the visual KB. Field knowledge flows back into the system.
Added: Non-stream chat route quick-responder — /api/ai-chat/route.ts was missing tryQuickResponse() entirely. It now runs before the KB gate on this route, matching the stream route's behavior and preventing LLM calls for common queries on both chat paths.
Added: AI Command Center (/dashboard/ai-command-center) — unified hub replacing all scattered AI admin pages. 8 tabs: Overview, Governance, Compliance, Ethics, Metrics, Gaps, Costs, Explainability. Intelligence gap detection surfaces queries the KB couldn't answer — these gaps are the input queue for the Knowledge Expansion Engine.
Added: 36 AI API routes now in the platform — covering: streaming + non-streaming chat, KB contribution, vision analysis, repair guidance, step verification, meeting summarization, workflow generation, document processing, property analysis, anomaly streaming, internet knowledge ingestion, value impact, transcription, TTS, image analysis, smart form suggestions, feedback loop, relationship context, and action tracking.
Improved: KB scoring formula — Jaccard (matches / totalUnique) has been replaced with a query-coverage weighted formula (queryCoverage × 0.8 + specificity × 0.2). "Leak in sink" now scores 0.83 instead of 0.13 — symptom queries that were invisible to the KB now resolve from it directly. The v5.0 symptom expansion was actively hurting routing under Jaccard; the new formula makes it work correctly.
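To see why the change matters, here is a sketch of both formulas. The "specificity" term is assumed here to be the matched fraction of the KB entry's tokens (the shipped definition may differ); under that assumption, a 3-token query fully covered by a 23-token entry gets Jaccard ≈ 0.13 but ≈ 0.83 under the new formula, matching the numbers above:

```typescript
// Old score: shared tokens over the union of both token sets. A short
// symptom query against a long KB entry is crushed by the large union.
function jaccardScore(query: string[], entry: string[]): number {
  const q = new Set(query);
  const e = new Set(entry);
  const matches = [...q].filter((t) => e.has(t)).length;
  const totalUnique = new Set([...q, ...e]).size;
  return matches / totalUnique;
}

// New score: weight how much of the QUERY is covered (0.8) plus a small
// specificity term (0.2, assumed = matched share of the entry's tokens).
function coverageScore(query: string[], entry: string[]): number {
  const q = new Set(query);
  const e = new Set(entry);
  const matches = [...q].filter((t) => e.has(t)).length;
  const queryCoverage = matches / q.size;
  const specificity = matches / e.size;
  return queryCoverage * 0.8 + specificity * 0.2;
}
```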
Improved: Main Trade KB symptom indexing — TRADE_KNOWLEDGE_BASE now extracts and indexes failureModes[].mode, failureModes[].symptoms[], and failureModes[].fix/repair keywords — not just component names and common parts. "No hot water", "drain backing up", "breaker trips" now hit the KB directly.
Improved: Knowledge base depth — 30+ specialized KB files covering trades, work order procedures, compliance/safety, property management, appliance troubleshooting, building codes, fair housing, lease documents, HR/workforce, accounting/finance, inventory, vendor management, utilities, lease renewal, collections, unit turnover, emergency response, seasonal operations, pet policy, insurance/risk, procurement, multi-state landlord-tenant law, sustainable/green ops, affordable housing, equipment nameplates, inspection checklists, and hyper-local city intelligence (NYC). Each KB file grows via the Knowledge Expansion Engine.
Improved: AIClientFactory (ai-client-factory.ts) — fully provider-agnostic: routes by task type across Ollama, Gemini, Claude, and GPT. vision → Ollama/Gemini, reasoning → Ollama/Claude/Gemini Pro, embedding → Ollama/Google, structured → Ollama/Gemini Flash, code → Ollama/Claude/Gemini, chat → Ollama/Gemini. Dynamic imports mean zero hard dependencies at load time. 24 scattered raw Anthropic SDK usages were replaced, so no provider is hardwired anywhere in the codebase.
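The routing table above can be sketched as follows; the chain-walking loop and the loader signature are illustrative (the real factory dynamically imports provider SDKs rather than taking a loader callback):

```typescript
// Illustrative routing in the spirit of ai-client-factory.ts. The loader
// callback stands in for a dynamic import(), so no SDK loads until used.
type TaskType = "vision" | "reasoning" | "embedding" | "structured" | "code" | "chat";

const PROVIDER_CHAIN: Record<TaskType, string[]> = {
  vision: ["ollama", "gemini"],
  reasoning: ["ollama", "claude", "gemini-pro"],
  embedding: ["ollama", "google"],
  structured: ["ollama", "gemini-flash"],
  code: ["ollama", "claude", "gemini"],
  chat: ["ollama", "gemini"],
};

// Walk the chain for the task type, lazily constructing each provider,
// until one reports itself available.
async function getClient(
  task: TaskType,
  load: (name: string) => Promise<{ available: boolean } | null>,
): Promise<string> {
  for (const name of PROVIDER_CHAIN[task]) {
    const client = await load(name);
    if (client?.available) return name;
  }
  throw new Error(`no provider available for ${task}`);
}
```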
Improved: AI Tools Registry — 539 registered tools across all domains, including new tools for image generation/editing, video generation, orchestration, visual reference lookup, LoRA fine-tuning, product harvesting, skill matching, sensor-triggered work orders, permit tracking, offline AI, observation memory, lease amendments, workforce forecasting, voice commands, codebase health, and agent lifecycle management.
Improved: AI Agents — 156 agent files across functional (field-service, fashion, telecom, agriculture, construction, e-commerce), scaled (cross-domain parallel), autonomous, domain-specific (automotive, aerospace, food & beverage), support, governance, competitor analysis, and legal sentinel categories.
Improved: Work order action-intent coverage — the quick-responder pattern was expanded from "how do I create" to catch "can you create / make / open / start / submit / file / log a work order". FAQ keywords were expanded with make, start, submit, file, log, leaking, sink, broken, issue, repair, fix. "Can you create a work order for a leaking sink?" now resolves from the KB with zero LLM calls.
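As a hedged sketch, the widened pattern might look like this; the verb list comes from the notes above, but the exact regex in the quick-responder may differ:

```typescript
// Illustrative intent pattern: any of the action verbs, optionally led by a
// polite prefix, followed somewhere later by "work order".
const WORK_ORDER_INTENT =
  /\b(?:can you\s+|please\s+)?(?:create|make|open|start|submit|file|log)\b[\s\S]*\bwork order\b/i;

function isWorkOrderIntent(message: string): boolean {
  return WORK_ORDER_INTENT.test(message);
}
```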
Improved: Stream route early quick-responder — a length guard (> 14 chars) prevents short greetings from triggering the early QR with empty context; the route now passes pageContext and an orgName fallback so page-help queries resolve before auth completes.
Improved: Numbered list circles in chat — upgraded from a pale background to a copper gradient fill (from-primary-400 to-primary-600) with white text, a shadow, and proper <ol> container wrapping. A RichMessageContent component was added for full markdown rendering: bold, italic, code blocks, headers, YouTube video previews with thumbnails, and syntax-highlighted code fences.
Added: Vision Provider Abstraction — 7 open-source-first providers behind a unified model-router.ts, including Ollama (local/offline), vLLM (self-hosted GPU), GOT-OCR2.0 (document OCR specialist), DeepInfra (Qwen2.5-VL-32B, cost-optimized), Fireworks (Qwen2.5-VL-72B, fast inference), and OpenRouter (multi-model fallback). Routes 16 VisionIntent task types with local-first priority. unified-vision-pipeline.ts enforces a 7-stage security pipeline including EXIF metadata stripping, decompression bomb detection, rate limiting, and audit logging on every image processed.
Added: Security Infrastructure — 8-layer defense stack now documented: Agent Detection (9-signal composite agentScore classifying human/suspicious/likely_agent/confirmed_agent), Anomaly Detector (3-sigma on 8 behavior metrics, 5× baseline triggers alert), Canary System (honeytoken records — any access is an immediate CRITICAL event indicating exfiltration), Admin Monitor (26 sensitive actions in CRITICAL/HIGH/MEDIUM tiers, all logged to admin_audit_logs), Sentinel (13-file suite: AI threat orchestration, behavior learning, dependency scanning, IP reputation, real-time event stream, NVD/GitHub Advisory threat intel), API Guard (withOrgGuard() on every route), AI Route Guard (per-type rate limits: vision 10/min, chat 60/min, analytics 30/min, transcribe 15/min), Agent Middleware (blocks confirmed agents on billing/settings/v1 routes).
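The Anomaly Detector's two triggers (3-sigma deviation, 5× baseline) combine into a simple check; the baseline shape here is an assumption for illustration:

```typescript
// Minimal sketch of the two anomaly triggers described for the Anomaly
// Detector; the real detector tracks 8 behavior metrics per actor.
interface Baseline {
  mean: number;
  stdDev: number;
}

function isAnomalous(value: number, b: Baseline): boolean {
  // Trigger 1: observation more than 3 standard deviations from the mean.
  if (b.stdDev > 0 && Math.abs(value - b.mean) > 3 * b.stdDev) return true;
  // Trigger 2: observation exceeds 5x the learned baseline outright.
  return b.mean > 0 && value > 5 * b.mean;
}
```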
Added: KB persistence mechanism, now fully documented — the ai_response_cache table (migration 20260416000017) stores learned Q→A pairs with organization_id isolation (a unique constraint on org+hash enforces isolation at the DB level), portal_type context (the same query can have different cached answers per owner/vendor/tenant portal), category-aware TTLs (definition/compliance: 7d, troubleshooting/procedure: 24h, product: 3d, maintenance/general: 6h, inventory: 1h, analytics: 30min, data-driven results: capped at 1h), a hit_count popularity signal for KB promotion, and a parallel write to ai_intelligence_gaps as input for the gap-detection cron.
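A sketch of the cache-key scheme implied by the unique org+hash constraint; the normalization, hash input, and key layout here are assumptions, not the shipped migration's exact columns:

```typescript
import { createHash } from "node:crypto";

// Illustrative cache key: the org prefix enforces tenant isolation, and the
// portal type is folded into the hash so the same query can carry different
// cached answers per owner/vendor/tenant portal.
function cacheKey(orgId: string, portalType: string, query: string): string {
  const hash = createHash("sha256")
    .update(`${portalType}:${query.trim().toLowerCase()}`)
    .digest("hex");
  return `${orgId}:${hash}`;
}
```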
Fixed: Anthropic credit error surfacing to UI — proactive-check/route.ts was calling new Anthropic() directly on every chat mount. When Gemini 403'd (referrer restriction on localhost), the multi-provider chain cascaded to Anthropic, burned credits, and showed a 400 error in the chat window. Replaced with AIClientFactory — no provider hardwired, no credit errors.
Fixed: Work order query duplication — "can you create a work order for X" was falling through to the LLM, which emitted the full system-prompt knowledge verbatim, causing the same response to appear 3–4 times with stray ### aerospace section headers. These queries are now intercepted at the quick-responder before touching the KB or LLM.