Architecture

LocoPuente is the convergence point for the entire LocoLabo research programme. Five sibling projects each feed into one of the three layers that shape what students experience.


| Layer | Projects | What They Contribute |
| --- | --- | --- |
| Infrastructure | LocoBench + LocoConvoy | Hardware selection, capacity planning, multi-GPU scaling |
| Intelligence | LocoLLM + LocoAgente | Routed specialist models, agentic tools, scaffolded reasoning |
| Experience | LocoEnsayo | AI-populated rehearsal environments, simulation scenarios |

LocoBench feeds the infrastructure decisions

LocoBench maps the floor of local LLM inference across every consumer GPU VRAM tier. That research answers the question every deployment starts with: what is the minimum viable hardware for a student-facing service? LocoPuente proof-of-concept (PoC) deployments are essentially LocoBench findings applied to a real environment, validating benchmarks under genuine academic load.
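As a back-of-envelope sketch of the capacity arithmetic behind "minimum viable hardware" questions (the quantisation figures and overhead allowance here are illustrative rules of thumb, not LocoBench's measured results):

```python
# Rough VRAM estimate for local inference: weights at a given
# quantisation plus a flat allowance for KV cache and runtime overhead.
# The 1.5 GB overhead figure is an assumption for illustration.

def vram_estimate_gb(params_b: float, bits: int, overhead_gb: float = 1.5) -> float:
    """Estimate VRAM in GB for a model with `params_b` billion
    parameters quantised to `bits` bits per weight."""
    return params_b * bits / 8 + overhead_gb

# e.g. a 7B model at 4-bit quantisation:
# vram_estimate_gb(7, 4) -> 5.0 GB, comfortably inside an 8 GB tier
```

Real-world fit also depends on context length and batch size, which is exactly why measured benchmarks matter more than this arithmetic.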

When fifty students hit the service simultaneously, single-GPU inference becomes a bottleneck. LocoConvoy’s research into multi-GPU parallelism on consumer PCIe hardware, load balancing, and Mixture of Agents architectures feeds directly into LocoPuente’s capacity planning and scaling decisions.
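As an illustrative sketch only (not LocoConvoy's actual strategy), the simplest way to spread concurrent student requests across several single-GPU backends is round-robin dispatch; the hostnames and ports below are hypothetical:

```python
# Round-robin load balancing across per-GPU inference backends:
# each incoming request is sent to the next endpoint in rotation.
from itertools import cycle

BACKENDS = cycle([
    "http://gpu-node-0:11434",  # hypothetical per-GPU inference servers
    "http://gpu-node-1:11434",
])

def next_backend() -> str:
    """Pick the next backend in rotation for an incoming request."""
    return next(BACKENDS)
```

More sophisticated schemes weight by queue depth or model residency, but rotation is the baseline any multi-GPU plan starts from.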

Rather than routing every student request through one large model, LocoLLM’s swarm of small specialist models behind a smart router delivers more efficient, more targeted responses at lower VRAM cost. As LocoPuente scales, LocoLLM is the architecture that makes it affordable.
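A minimal sketch of the routing idea, with hypothetical model names and a keyword heuristic standing in for whatever classifier a real router would use:

```python
# Toy LocoLLM-style router: inspect the request, dispatch to a small
# specialist model rather than one large generalist. Model names and
# keyword lists are placeholders for illustration.

SPECIALISTS = {
    "code": "qwen2.5-coder:7b",
    "math": "mathstral:7b",
    "general": "llama3.2:3b",
}

KEYWORDS = {
    "code": {"function", "bug", "python", "compile"},
    "math": {"integral", "equation", "derivative", "proof"},
}

def route(prompt: str) -> str:
    """Return the specialist model best suited to the prompt."""
    words = set(prompt.lower().split())
    for domain, vocab in KEYWORDS.items():
        if words & vocab:
            return SPECIALISTS[domain]
    return SPECIALISTS["general"]
```

The VRAM saving comes from only the matched specialist needing to be resident for a given request, instead of one model large enough to cover every domain.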

The Study Buddy, the Curriculum Explainer, the research assistant that can read a PDF and generate a structured summary — these require more than single-turn chat. LocoAgente’s research into agentic scaffolding on small models is what makes these tools viable without frontier hardware.
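One way to picture what scaffolding buys a small model, using the PDF-summary tool as the example (the decomposition and function names here are hypothetical, not LocoAgente's implementation):

```python
# Agentic scaffolding sketch: rather than one open-ended prompt, the
# task is broken into fixed steps a small model can handle reliably.
# `summarise` stands in for any chat-completion call.

def scaffolded_summary(pages: list[str], summarise) -> dict:
    """Summarise each page separately, then combine the notes,
    so every individual model call stays small and focused."""
    page_notes = [summarise(f"Summarise this page:\n{p}") for p in pages]
    outline = summarise(
        "Combine these notes into a structured summary:\n" + "\n".join(page_notes)
    )
    return {"page_notes": page_notes, "summary": outline}
```

The scaffold, not raw model size, is what keeps each step within a small model's reliable range.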

LocoEnsayo builds AI-populated rehearsal environments for professional education. In LocoPuente, those environments become the richest student-facing experiences on offer — simulated organisations, role-play scenarios, virtual client meetings. The research and the deployment are the same system.


All core AI services expose OpenAI-compatible APIs, enabling interoperability across the stack.

| Endpoint | Provided By | Consumed By |
| --- | --- | --- |
| `/v1/chat/completions` | Ollama | Open WebUI, Perplexica, AnythingLLM, Open Notebook, Custom chat |
| `/v1/audio/transcriptions` | Speaches | Open WebUI, Open Notebook |
| `/v1/audio/speech` | Speaches | Open WebUI, Open Notebook |
| Image generation API | ComfyUI | Open WebUI (in-chat images), direct student UI |
| Web search API | SearXNG | Open WebUI, Perplexica |

This means any component can be swapped or upgraded independently without breaking integrations.
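A sketch of what that compatibility looks like from the client side, assuming Ollama's default port (11434); swapping the base URL is all it takes to point the same code at a different backend. Building the request without sending it keeps the example self-contained:

```python
# Construct a /v1/chat/completions request for any OpenAI-compatible
# server. The host/port assume Ollama's default; the model name is
# illustrative.
import json
from urllib import request

BASE_URL = "http://localhost:11434/v1"  # swap this to change providers

def chat_request(model: str, prompt: str) -> request.Request:
    """Build the standard chat-completions payload; every
    OpenAI-compatible backend accepts the same shape."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("llama3.2:3b", "Summarise photosynthesis in one line.")
# Sending it (urllib.request.urlopen(req)) requires the server running.
```

Because every consumer in the table above speaks this same wire format, replacing Ollama with another compatible server changes configuration, not code.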