The Quiet Problem Hiding in Every Organisation
Walk into almost any modern organisation and you will see the same landscape.
Screens filled with dashboards. Documentation systems storing procedures. Messaging platforms capturing conversations. Databases recording transactions. Project trackers monitoring progress. Shared drives filled with files.
It should be the most information-rich working environment in human history.
And yet, ask people how much time they spend simply looking for information, and the answers are surprisingly consistent.
A document that you know exists but cannot find. A conversation that happened months ago but is buried in a thread. A decision that was made but never properly recorded.
Five minutes here. Ten minutes there. Small interruptions that accumulate quietly.
Research from McKinsey estimates that employees spend 1.8 hours every day — 9.3 hours per week — searching and gathering information. IDC puts it higher: roughly 2.5 hours per day, or 30% of the workday. Interact found that 19.8% of business time — the equivalent of one day per working week — is wasted searching for information to do the job effectively.
Not analysing it. Not making decisions. Not creating value.
Simply trying to locate it.
Across an organisation, this becomes enormous. If each employee loses a quarter of the working day to retrieval, in line with the estimates above, a company with two hundred employees loses the equivalent of fifty people's productivity every day.
The strange part is that the knowledge usually already exists. It has been written down. Stored. Archived. Indexed.
But accessing it is far more difficult than it should be.
The Fragmentation of Organisational Knowledge
The core problem is not lack of information.
The problem is fragmentation.
Modern organisations accumulate knowledge across dozens of systems. Documentation platforms contain procedures and internal policies. Messaging tools capture informal decisions and operational updates. Databases hold structured operational data. Code repositories contain technical logic and system behaviour. Emails contain negotiations and historical context.
Each system captures part of the organisation's knowledge. None of them captures the whole picture.
Over time, something subtle happens. Humans become the integration layer between systems. We remember where things live. We translate between formats. We search across multiple tools. We ask colleagues for missing context.
This behaviour works reasonably well in small teams.
But as organisations grow and information multiplies, the cognitive burden becomes overwhelming. The average healthcare organisation now uses more than five platforms to document and share information. Over 30% of workers are not even sure how many knowledge tools their organisation uses.
Organisations are not suffering from lack of knowledge.
They are suffering from the inability to access the knowledge they already possess.
How Computing Has Historically Solved Complexity
To understand why this problem is emerging now, it helps to look briefly at how computing has evolved.
Each era of computing solved a different kind of complexity.
In the early days, software had to interact directly with hardware. Programs had to manage memory, storage, and devices themselves. This quickly became impractical.
Operating systems emerged to manage hardware resources and provide a structured environment where applications could run safely.
Later, as networks expanded, another challenge appeared: how could computers communicate reliably across the world? Protocols such as TCP/IP solved that problem by creating standard ways for machines to exchange data.
As the internet grew, the next challenge was information itself. The web created enormous volumes of data that humans needed to navigate. Search engines and indexing systems became the solution.
Each of these innovations introduced a new layer of abstraction that simplified complexity.
Today we are encountering a new form of complexity.
The complexity of knowledge.
The Arrival of Large Language Models
The emergence of large language models in the early 2020s changed how machines interact with language.
Large Language Models (LLMs) are AI systems trained on vast collections of text. They learn statistical patterns in language that allow them to understand questions, summarise documents, generate explanations, and reason across large bodies of text. Systems such as GPT, Claude, Gemini, Llama, and Qwen belong to this category.
For the first time, machines could interpret human language with surprising fluency.
This immediately suggested a possibility: if machines could read and understand text, perhaps they could help organisations navigate their internal knowledge.
But several obstacles quickly became apparent.
Why Language Models Alone Were Not Enough
The first obstacle was knowledge access. Language models are trained on public data. They understand general topics but know nothing about a specific organisation's internal information. Without access to organisational knowledge, language models can only provide generic answers.
The second obstacle was reliability. Language models generate responses by predicting likely sequences of words. This allows them to produce remarkably fluent text. But it also means they occasionally produce answers that sound convincing but are incorrect — a phenomenon commonly known as hallucination. In casual applications this is tolerable. In professional environments it can be dangerous.
The third obstacle was infrastructure. Many early AI tools operated entirely in cloud environments. For organisations handling sensitive information, this raised immediate concerns. Where does the data go? Who can access it? Does the system comply with industry regulations? 44% of organisations cite data privacy and security as the top barrier to adopting LLMs. In healthcare, this is not merely a concern. It is a hard stop.
Language models were powerful. But they were one component of a much larger system that still needed to be built.
Five Technologies That Changed Everything
Something remarkable happened between 2024 and 2026. Five technologies matured almost simultaneously, making it possible — for the first time — to build something that was previously the exclusive domain of tech giants with billion-dollar budgets.
1. Retrieval-Augmented Generation (RAG)
RAG is a technique that gives an AI the ability to look things up before answering. Instead of relying solely on what the model memorised during training, a RAG system retrieves relevant documents from your organisation's knowledge base and feeds them to the AI alongside your question. The AI's answer is now grounded in your actual data — and because every answer can cite its sources, you can verify it.
RAG moved from research novelty (Lewis et al., NeurIPS 2020) to production reality. A study in npj Digital Medicine found that a RAG-powered system achieved 96.4% accuracy in assessing surgical fitness — compared to 86.6% for human responses using the same guidelines. By 2025, 73% of large-organisation AI implementations involved RAG architectures.
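The retrieve-then-generate loop at the heart of RAG can be sketched in a few lines. This is an illustrative toy, not a real implementation: it assumes a naive term-overlap scorer in place of a real retriever, omits the model call itself, and the function names and sample corpus are invented for the example.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by naive term overlap with the query (stand-in for a real retriever)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc_id)
              for doc_id, doc in corpus.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def build_prompt(query, corpus, doc_ids):
    """Ground the model by placing retrieved passages, with IDs for citation, before the question."""
    context = "\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    return f"Answer using only the sources below. Cite source IDs.\n{context}\n\nQuestion: {query}"

corpus = {
    "sop-12": "Sterile instruments must be re-counted before wound closure.",
    "hr-03": "Annual leave requests require two weeks of notice.",
}
doc_ids = retrieve("instrument count before closure", corpus)
prompt = build_prompt("instrument count before closure", corpus, doc_ids)
```

In production, `retrieve` would query a vector or hybrid index and the prompt would go to a language model, but the structure is the same: the answer is grounded in retrieved, citable sources rather than the model's memory.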
2. Semantic Search and Vector Embeddings
Traditional search matches keywords. Semantic search understands meaning. It converts text into mathematical representations called embeddings — numerical fingerprints that capture what a passage is about, not just what words it contains. "Post-op infection protocol" finds documents titled "surgical site infection prevention guidelines" — because the meaning is the same, even though the words are not.
Modern embedding models like BGE-M3 support over 100 languages and handle documents up to 8,192 tokens. With extensions like pgvector, these capabilities run directly inside PostgreSQL — with no additional infrastructure required.
Combining semantic search with traditional keyword search creates hybrid search — the union of both catches what either alone would miss.
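One common way to fuse the two result lists is Reciprocal Rank Fusion (RRF), which rewards documents that rank well in either list without having to reconcile incompatible score scales. A minimal sketch, with hypothetical document IDs:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Keyword search and semantic search each return a ranked list of document IDs.
keyword_hits = ["doc-a", "doc-c"]
semantic_hits = ["doc-b", "doc-a"]
fused = rrf([keyword_hits, semantic_hits])
```

A document that appears near the top of both lists ("doc-a" here) outranks one that appears in only a single list, which is exactly the behaviour hybrid search is after.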
3. Open-Weight Models
Until 2023, the most capable language models were locked behind proprietary APIs. You could use GPT-4, but you could not download it, inspect it, or run it on your own hardware. Your data had to travel to someone else's servers.
That changed when Meta released Llama, Alibaba released Qwen, and Mistral released a family of efficient models — all with open weights. For the first time, organisations could download production-quality language models and run them entirely within their own infrastructure.
Open-weight models broke the monopoly. They made self-hosting not just theoretically possible but practically viable. And they shifted the economics of AI from per-token API costs to fixed infrastructure costs — which, for organisations above a certain size, is dramatically cheaper.
4. The Model Context Protocol (MCP)
RAG, embeddings, and open models solve retrieval, understanding, and inference. But there is a fourth problem none of them addresses: how does an AI agent connect to the tools and data sources it needs to act?
Before MCP, every integration was bespoke. Connecting an AI to your documentation platform required one custom connector. Connecting it to your messaging tool required another. Connecting it to your database required a third. If you had ten AI applications and a hundred tools, you potentially needed a thousand different integrations.
The Model Context Protocol, introduced by Anthropic in November 2024, provides a universal, open standard for connecting AI systems to external tools and data sources. Think of it as USB-C for AI applications: a single interface that any model can use to communicate with any tool.
The adoption has been remarkably fast. OpenAI, Google DeepMind, and Microsoft all adopted MCP within months of its release. By late 2025, thousands of MCP servers had been built by the community, SDK downloads were in the tens of millions per month, and the protocol was being used in production at major enterprises. In December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation — co-founded by Anthropic, Block, and OpenAI — ensuring vendor-neutral governance.
MCP is what transforms a knowledge retrieval system into an operating system. Without it, every connector is hand-built plumbing. With it, agents get a standard interface to tools and data — the same way TCP/IP gave computers a standard way to talk to networks.
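The shape of the idea can be illustrated without the real SDK. The sketch below is not MCP itself, just a toy registry showing why one shared describe-and-call surface turns N times M bespoke integrations into N plus M; every name in it is invented for the example.

```python
# Conceptual illustration only (not the real MCP SDK): every tool exposes the
# same describe/call surface, so any agent that speaks this one interface can
# discover and use any tool.

class Tool:
    def __init__(self, name, description, fn):
        self.name, self.description, self.fn = name, description, fn

    def call(self, **kwargs):
        return self.fn(**kwargs)

registry = {}

def register(tool):
    registry[tool.name] = tool

register(Tool("search_docs", "Search the documentation index",
              lambda query: f"results for {query!r}"))
register(Tool("query_db", "Run a read-only database query",
              lambda sql: f"rows for {sql!r}"))

# An agent discovers available tools, then invokes one through the uniform surface.
listing = {name: t.description for name, t in registry.items()}
result = registry["search_docs"].call(query="infection protocol")
```

The point is the uniformity: once both sides speak the same protocol, adding a new tool or a new agent is one integration, not one per pairing.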
5. Self-Hosted Inference Engines
The final piece was runtime efficiency. Open models needed an efficient way to run on affordable hardware.
A new generation of inference engines solved this. vLLM — the most widely adopted — achieves 2–4x higher throughput than standard approaches by eliminating GPU memory waste through a technique called PagedAttention, and serves multiple concurrent users with an OpenAI-compatible API. SGLang, developed at UC Berkeley, optimises for structured generation and complex agent workflows. llama.cpp made it possible to run models on CPUs and consumer hardware — the project that democratised "AI on a laptop." TensorRT-LLM from NVIDIA delivers the highest raw performance on NVIDIA GPUs for production deployments. And Ollama wrapped local inference in a developer experience as simple as pulling a Docker image.
Model quantisation — compressing weights from 16-bit to 4-bit precision — shrinks a 14-billion parameter model from roughly 28 GB to 9 GB, making it practical on a single GPU.
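The arithmetic behind that compression is simple enough to check. A back-of-the-envelope sketch (real quantised files land somewhat above the raw 4-bit figure because embeddings and certain layers are typically kept at higher precision):

```python
params = 14e9                    # a 14-billion parameter model

fp16_gb = params * 2.0 / 1e9     # 16-bit precision: 2 bytes per weight
int4_gb = params * 0.5 / 1e9     # 4-bit precision: half a byte per weight

# Raw weights: 28.0 GB at 16-bit vs 7.0 GB at 4-bit. The quoted ~9 GB for a
# real quantised file reflects the layers kept at higher precision plus metadata.
```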
A single GPU server can now serve a production-grade AI assistant to an entire organisation. Sub-second response times. Zero data leaving the network. The performance gap between self-hosted and cloud-hosted AI has narrowed dramatically. The privacy and compliance advantages remain absolute.
Together, these five technologies — retrieval, semantic understanding, downloadable models, standardised connectivity, and efficient local inference — form the complete stack. Each one was necessary. None alone was sufficient. Their simultaneous maturation is what made the agentic OS possible.
The Emergence of Agentic Systems
Over the past two years, a new architectural idea has begun to emerge.
Instead of treating AI as a simple chatbot that responds to prompts, engineers have begun designing systems where AI behaves more like an agent interacting with information systems.
An AI agent is capable of performing sequences of actions. It interprets a query, searches for relevant information, evaluates evidence, uses tools, and generates responses based on the results. This is the agent loop: reason, decide, execute, observe, iterate — up to a maximum number of steps.
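That loop can be sketched directly. Everything here is a stand-in: the `reason` function and the single `search` tool are invented placeholders for a real planner and real connectors, and only the bounded reason-decide-execute-observe structure is the point.

```python
def run_agent(query, tools, reason, max_steps=5):
    """Minimal agent loop: reason -> decide -> execute -> observe, bounded by max_steps."""
    observations = []
    for _ in range(max_steps):
        action = reason(query, observations)             # decide the next step from evidence so far
        if action["type"] == "answer":
            return action["text"]
        result = tools[action["tool"]](action["input"])  # execute the chosen tool
        observations.append(result)                      # observe, then iterate
    return "Step limit reached without a grounded answer."

# Stand-in reasoner: search once, then answer from the observation.
def reason(query, observations):
    if not observations:
        return {"type": "tool", "tool": "search", "input": query}
    return {"type": "answer", "text": f"Based on: {observations[0]}"}

tools = {"search": lambda q: f"top hit for {q!r}"}
answer = run_agent("reprocessing cycle time", tools, reason)
```

The `max_steps` bound matters: without it, a confused agent can loop indefinitely, which is why production loops are always capped.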
This approach allows AI systems to interact with real organisational knowledge rather than relying solely on their training data.
Deloitte's 2025 Emerging Technology Trends study found that while 30% of organisations are exploring agentic AI and 38% are piloting solutions, only 14% have solutions ready to deploy and just 11% are actively using them in production. 42% are still developing their strategy, with 35% having no formal strategy at all (Deloitte, 2025).
The concept is real. The implementation gap is enormous.
The Idea of an Agentic Operating System
This gap leads to the concept of an Agentic Operating System.
Traditional operating systems manage computational resources for software applications. They coordinate memory, storage, permissions, and processes.
An agentic operating system performs a similar role for AI agents. It manages how agents access knowledge. It controls which tools they can use. It verifies sources and evidence. It enforces safety and compliance boundaries.
In effect, it becomes a governance layer for machine intelligence.
When a user asks a question, the system does not simply generate text. Instead, the agent performs a sequence of controlled actions: interpret the query, search relevant knowledge sources, evaluate the retrieved information, construct a grounded response, and verify that the response is safe and supported by evidence.
The result is not merely generated text. It is an answer connected to real organisational knowledge, with an audit trail for every step.
Gartner predicts that by 2026, over 40% of enterprise applications will embed role-specific AI agents. The market is converging on a common architecture: multi-agent orchestration, tool sandboxing, audit trails, and governance layers.
But there is a critical gap. Most enterprise agentic platforms assume cloud deployment. They assume your data can travel to a third-party provider. For healthcare, for defence, for legal, for any regulated industry where data sovereignty is non-negotiable, this assumption is a dealbreaker.
Why Healthcare Gets Hit Hardest
The knowledge fragmentation problem described in this article exists everywhere. But nowhere is the combination of fragmentation, regulation, and consequence more acute than in healthcare.
The stakes are real. When a warehouse manager cannot find a delivery manifest, the cost is a delayed shipment. When a surgical team cannot confirm a protocol, the cost is measured differently.
The regulation is unforgiving. Healthcare data is the most regulated data on earth. HIPAA in the United States. GDPR in Europe. 47 US states introduced over 250 AI-related bills in 2024–2025 alone, with 33 signed into law. The cost of getting data handling wrong is not a fine. It is an average $7.42 million per breach — the highest of any industry, for fourteen consecutive years (IBM, 2025).
The tools keep multiplying. Each new platform promises to consolidate information. In practice, each one becomes another silo.
In healthcare, the knowledge problem is not an inconvenience. It is a safety issue. And it is precisely the domain where the gap between cloud-first agentic platforms and the requirements of the real world is widest.
What Building These Systems Actually Teaches You
I know this because I built one.
I am the CTO of a surgical technology company. Our teams work across operating theatres, sterile processing departments, hospital warehouses, and distribution centres. Our knowledge lived in the same fragmented landscape as everyone else.
So I built an agentic OS: a self-hosted AI knowledge platform that connects every internal knowledge source into a single intelligent agent, running entirely within our own infrastructure. Our team uses it every day — to answer operational questions, find engineering decisions buried in old pull requests, pull delivery data across hospital sites, and onboard new hires into institutional knowledge they would otherwise spend weeks discovering.
I did not build this as a demo. I built it because the same principles that define our surgical systems — traceability, accountability, and precision — should define how we work internally.
Once you begin building a real-world agentic system, something becomes immediately clear.
The AI model itself is rarely the hardest part.
The real challenge lies in the surrounding infrastructure. Getting data out of documentation platforms, messaging tools, code repositories, and databases — keeping it fresh, handling rate limits, detecting changes, managing failures gracefully — accounts for the majority of the engineering effort. The AI model is often the easiest component.
Every connector has different rate limits, different authentication schemes, different change detection mechanisms, and different failure modes. You build a connector, it works beautifully for three weeks, and then a platform API update breaks your change detection logic at 2 AM.
In practice, building an agentic knowledge system resembles distributed systems engineering more than traditional machine learning.
Three Principles That Emerged
Building a production agentic OS taught me three principles that I believe apply to anyone building these systems, in any domain.
Trust requires evidence, not fluency
Fluent language alone does not create trust. Evidence does.
Early versions of the system I built would produce fluent, confident, entirely fabricated answers when retrieval returned weak results. The response sounded authoritative. A user unfamiliar with the topic would have no reason to doubt it. In a system used by people making decisions about surgical instruments, patient protocols, or delivery logistics, this is dangerous.
The fix was structural: evidence gating. Below a confidence threshold, the system blocks the answer entirely and returns an explicit "insufficient evidence" response. Not a soft warning. A hard gate. I ended up building a four-step citation recovery pipeline — because the language model alone cannot be trusted to cite accurately. Trust must be verified, not assumed.
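A minimal version of such a gate can be sketched as follows; the threshold value and scoring scheme here are invented for illustration, not the ones from any particular system:

```python
def evidence_gate(answer, sources, min_confidence=0.6):
    """Hard gate: refuse outright unless retrieval confidence clears the threshold."""
    confidence = max((s["score"] for s in sources), default=0.0)
    if confidence < min_confidence:
        # Not a soft warning appended to a guess: the answer is blocked entirely.
        return {"status": "refused",
                "message": "Insufficient evidence to answer this question."}
    return {"status": "answered", "text": answer,
            "citations": [s["id"] for s in sources if s["score"] >= min_confidence]}

weak = evidence_gate("Plausible-sounding answer", [{"id": "doc-9", "score": 0.31}])
strong = evidence_gate("Grounded answer", [{"id": "sop-12", "score": 0.87}])
```

The asymmetry is deliberate: a refused answer costs a follow-up question, while a fluent fabrication can cost far more.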
Systems that occasionally admit uncertainty often become more trustworthy than those that answer everything.
Language models should speak. Classical systems should decide.
Language models are excellent at interpreting language and generating explanations. But critical decisions should not depend solely on probabilistic systems.
Reliable architectures separate the probabilistic layer — where language models interpret questions and generate candidate responses — from the deterministic layer — where traditional software logic verifies evidence, enforces rules, and ensures compliance.
The language model handles perception: reading documents, understanding queries, generating natural language. But the consequential decisions — whether to answer, what to cite, whether to escalate, whether to block — are made by deterministic logic. Evidence gates, citation enforcers, compliance scanners, tool sandboxes. These are not AI. They are rules, thresholds, and validation layers designed by engineers who understand the consequences of getting it wrong.
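A citation enforcer is a good example of the deterministic layer, because it needs no AI at all. A sketch, assuming (purely for illustration) that the model marks citations as bracketed document IDs:

```python
import re

def enforce_citations(draft, retrieved_ids):
    """Deterministic check on probabilistic output: every [id] cited in the draft
    must be a document the retriever actually returned."""
    cited = set(re.findall(r"\[([\w-]+)\]", draft))
    if not cited:
        return {"ok": False, "reason": "no citations present"}
    unknown = cited - set(retrieved_ids)
    if unknown:
        return {"ok": False, "reason": f"cites unretrieved sources: {sorted(unknown)}"}
    return {"ok": True}

# The model (probabilistic layer) drafts; the enforcer (deterministic layer) decides.
good = enforce_citations("Counts are verified before closure [sop-12].", ["sop-12", "sop-14"])
bad = enforce_citations("As documented in [made-up-7].", ["sop-12"])
```

Nothing in the enforcer is learned or probabilistic: it is a rule written by an engineer, which is exactly why it can be trusted to say no.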
This principle maps directly to how we build surgical AI systems: probabilistic perception, deterministic checks, human escalation. The domain changes. The architecture does not.
Security is the architecture, not a layer
In healthcare AI, security cannot be bolted on. It must be the architecture.
Protected health information scanning at a single boundary is not enough. Real systems need scanning at multiple boundaries: when a query enters, when a response leaves, when tools return output, when documents are ingested, and when the system writes to memory. Authentication must be enforced, not optional. Tool execution must be sandboxed with schema validation, scope boundaries, and prompt injection detection.
This is not paranoia. It is the minimum standard for any system that handles healthcare data.
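The same idea, reduced to a sketch: one scanning function applied identically at every boundary. The two patterns here are deliberately simplistic placeholders; real PHI detection uses far richer detectors, and the field names are invented for the example.

```python
import re

# Illustrative detectors only: a US SSN-like pattern and a medical record number.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,}\b", re.IGNORECASE),
}

def scan(text):
    """Return the PHI categories detected in a piece of text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

def guard(boundary, text):
    """Apply the same scan at every boundary: query in, tool output, response out."""
    hits = scan(text)
    if hits:
        return {"boundary": boundary, "blocked": True, "detected": hits}
    return {"boundary": boundary, "blocked": False}

inbound = guard("query", "What is the reprocessing cycle for tray 7?")
outbound = guard("response", "Patient MRN: 12345678 was scheduled for Tuesday.")
```

The structural point is that `guard` runs at every boundary with the same rules, so a leak caught nowhere on the way in is still caught on the way out.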
Where the Industry Is Today
Despite the intense attention surrounding AI, agentic systems remain in an early stage.
Many organisations are experimenting. Fewer have deployed successfully at scale. Major technology companies — Microsoft, Salesforce, ServiceNow, Oracle — have all announced agentic AI platforms. The market is real.
RAG itself is evolving from a retrieval pipeline into what some now call a "knowledge runtime" or "context platform" — an orchestration layer that manages retrieval, verification, reasoning, access control, and audit trails as integrated operations. By 2026, 60% of new RAG deployments are expected to include systematic evaluation from day one, up from under 30% in 2025 (NStarX, 2025).
85% of healthcare organisations have already adopted or are exploring generative AI. The question is no longer whether to adopt AI. It is how — and critically, where your data goes when you do.
Here is my thesis for what comes next: by 2028, the agentic OS will be as fundamental to organisational infrastructure as the database is today. Not as a product category — as a layer. Every organisation above a certain size will have a knowledge runtime that continuously indexes internal information, connects agents to tools via standard protocols, enforces compliance at every boundary, and provides an audit trail for every action. The organisations that build this layer now — with governance, citation enforcement, and data sovereignty from the beginning — will have a structural advantage that is nearly impossible to replicate later. The ones that wait will spend years retrofitting trust into systems that were never designed for it.
Key Learnings
1. The hardest problem is not AI — it is plumbing. Data synchronisation, connector reliability, rate limit handling, and change detection account for the majority of the engineering effort. The AI model is often the easiest component.
2. Citations are non-negotiable. An AI that gives confident-sounding answers without evidence is worse than no AI at all. Trust must be verified, not assumed.
3. Evidence gating prevents harm. When retrieval confidence is low, the system must refuse to answer — not guess. A fluent fabrication is more dangerous than silence.
4. Language models should speak. Classical systems should decide. The probabilistic layer handles perception and generation. The deterministic layer handles consequences.
5. Security is the architecture, not a layer. Compliance scanning at multiple boundaries, enforced authentication, tool sandboxing, and injection detection are not features — they are the foundation.
6. Self-hosting is no longer a sacrifice. Modern quantised models on optimised inference engines deliver production-grade quality at fixed cost, with absolute data sovereignty.
7. Cost predictability matters more than cost minimisation. A fixed infrastructure cost that scales to five hundred users without increasing is more valuable than the cheapest option for five.
A Final Reflection
For most of computing history, we focused on storing information.
Databases and document systems captured knowledge. But capturing information is only the first step.
Understanding it is far more valuable.
Agentic systems represent an early step toward a future where organisations can reason over their own knowledge — where the answers that already exist, scattered across dozens of systems, can be found in seconds rather than hours, grounded in evidence rather than memory, and governed with the same rigour we apply to the instruments themselves.
When that future arrives, the operating system of the modern organisation will not simply manage files and processes.
It will manage knowledge itself.
And that may prove to be one of the most important shifts in computing since the rise of the internet.
These systems will be built. The only thing that matters is whether they are built with the traceability, accountability, and precision that the domains they serve demand.
I build AI for surgical safety. The knowledge is already there — written down, stored, scattered across dozens of systems. Whether we build systems worthy of the trust required to find it will define this era. Because in my world, the cost of a wrong answer is not a bug report. It is a patient outcome.
Further Reading and References
Primary Research
- Lewis, P. et al. (2020): "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020
- Gao, Y. et al. (2024): "Retrieval-Augmented Generation for Large Language Models: A Survey"
- npj Digital Medicine (2025): "RAG for LLMs in Assessing Medical Fitness for Surgery"
- Chen, J. et al. (2024): "M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity"
Industry Analysis
- Deloitte (2025): "Agentic AI Strategy"
- IBM (2025): "Cost of a Data Breach Report"
- McKinsey Global Institute (2012): "The Social Economy"
- IDC: Knowledge workers spend approximately 2.5 hours/day searching for information
- Gartner (2025): 40%+ of enterprise applications will embed role-specific AI agents by 2026
On Self-Hosted AI
- vLLM Project: High-throughput LLM serving engine
- Meta AI (2025): "Self-Hosted Deployments for Regulated Industries"
- pgvector Project: Open-source vector similarity search for PostgreSQL
On RAG Evolution and Agent Connectivity
- NStarX (2025): "The Next Frontier of RAG"
- RAGFlow (2025): "From RAG to Context"
- Anthropic (2024): "Introducing the Model Context Protocol"
- Anthropic (2025): "Donating MCP and Establishing the Agentic AI Foundation"
- The New Stack (2025): "Why the Model Context Protocol Won"
On Healthcare AI Regulation
- Foley & Lardner (2025): "HIPAA Compliance for AI in Digital Health"
- GenHealth.ai (2026): "Navigating the AI Regulatory Landscape in Healthcare"
On Knowledge Fragmentation
- Stange, K.C. (2009): "The Problem of Fragmentation and the Need for Integrative Solutions." Annals of Family Medicine
- Cottrill Research: Survey statistics on time spent searching for information