The Quiet Problem Hiding in Every Organisation
Walk into almost any modern organisation and you will see the same landscape.
Screens filled with dashboards. Documentation systems storing procedures. Messaging platforms capturing conversations. Databases recording transactions. Project trackers monitoring progress. Shared drives filled with files.
It should be the most information-rich working environment in human history.
And yet, ask people how much time they spend simply looking for information, and the answers are surprisingly consistent.
A document that you know exists but cannot find. A conversation that happened months ago but is buried in a thread. A decision that was made but never properly recorded.
Five minutes here. Ten minutes there. Small interruptions that accumulate quietly.
Research from McKinsey estimates that employees spend 1.8 hours every day — 9.3 hours per week — searching and gathering information. IDC puts it higher: roughly 2.5 hours per day, or 30% of the workday. Interact found that 19.8% of business time — the equivalent of one day per working week — is wasted searching for information to do the job effectively.
Not analysing it. Not making decisions. Not creating value.
Simply trying to locate it.
Across an organisation, this becomes enormous. If each employee loses a quarter of the working day to retrieval, in line with the estimates above, a company with two hundred employees loses the equivalent of fifty people's productivity every day.
The strange part is that the knowledge usually already exists. It has been written down. Stored. Archived. Indexed.
But accessing it is far more difficult than it should be.
The Fragmentation of Organisational Knowledge
The core problem is not lack of information.
The problem is fragmentation.
Modern organisations accumulate knowledge across dozens of systems. Documentation platforms contain procedures and internal policies. Messaging tools capture informal decisions and operational updates. Databases hold structured operational data. Code repositories contain technical logic and system behaviour. Emails contain negotiations and historical context.
Each system captures part of the organisation's knowledge. None of them captures the whole picture.
Over time, something subtle happens. Humans become the integration layer between systems. We remember where things live. We translate between formats. We search across multiple tools. We ask colleagues for missing context.
This behaviour works reasonably well in small teams.
But as organisations grow and information multiplies, the cognitive burden becomes overwhelming. The average healthcare organisation now uses more than five platforms to document and share information. Over 30% of workers are not even sure how many knowledge tools their organisation uses.
Organisations are not suffering from lack of knowledge.
They are suffering from the inability to access the knowledge they already possess.
How Computing Has Historically Solved Complexity
To understand why this problem is emerging now, it helps to look briefly at how computing has evolved.
Each era of computing solved a different kind of complexity.
In the early days, software had to interact directly with hardware. Programs had to manage memory, storage, and devices themselves. This quickly became impractical.
Operating systems emerged to manage hardware resources and provide a structured environment where applications could run safely.
Later, as networks expanded, another challenge appeared: how could computers communicate reliably across the world? Protocols such as TCP/IP solved that problem by creating standard ways for machines to exchange data.
As the internet grew, the next challenge was information itself. The web created enormous volumes of data that humans needed to navigate. Search engines and indexing systems became the solution.
Each of these innovations introduced a new layer of abstraction that simplified complexity.
Today we are encountering a new form of complexity.
The complexity of knowledge.
The Arrival of Large Language Models
The emergence of large language models in the early 2020s changed how machines interact with language.
Large Language Models (LLMs) are AI systems trained on vast collections of text. They learn statistical patterns in language that allow them to understand questions, summarise documents, generate explanations, and reason across large bodies of text. Systems such as GPT, Claude, Gemini, Llama, and Qwen belong to this category.
For the first time, machines could interpret human language with surprising fluency.
This immediately suggested a possibility: if machines could read and understand text, perhaps they could help organisations navigate their internal knowledge.
But several obstacles quickly became apparent.
Why Language Models Alone Were Not Enough
The first obstacle was knowledge access. Language models are trained on public data. They understand general topics but know nothing about a specific organisation's internal information. Without access to organisational knowledge, language models can only provide generic answers.
The second obstacle was reliability. Language models generate responses by predicting likely sequences of words. This allows them to produce remarkably fluent text. But it also means they occasionally produce answers that sound convincing but are incorrect — a phenomenon commonly known as hallucination. In casual applications this is tolerable. In professional environments it can be dangerous.
The third obstacle was infrastructure. Many early AI tools operated entirely in cloud environments. For organisations handling sensitive information, this raised immediate concerns. Where does the data go? Who can access it? Does the system comply with industry regulations? 44% of organisations cite data privacy and security as the top barrier to adopting LLMs. In healthcare, this is not merely a concern. It is a hard stop.
Language models were powerful. But they were one component of a much larger system that still needed to be built.
Five Technologies That Changed Everything
Something remarkable happened between 2024 and 2026. Five technologies matured almost simultaneously, making it possible — for the first time — to build something that was previously the exclusive domain of tech giants with billion-dollar budgets.
1. Retrieval-Augmented Generation (RAG)
RAG is a technique that gives an AI the ability to look things up before answering. Instead of relying solely on what the model memorised during training, a RAG system retrieves relevant documents from your organisation's knowledge base and feeds them to the AI alongside your question. The AI's answer is now grounded in your actual data — and because every answer can cite its sources, you can verify it.
RAG moved from research novelty (Lewis et al., NeurIPS 2020) to production reality. A study in npj Digital Medicine found that a RAG-powered system achieved 96.4% accuracy in assessing surgical fitness — compared to 86.6% for human responses using the same guidelines. By 2025, 73% of large-organisation AI implementations involved RAG architectures.
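The retrieve-then-generate loop at the heart of RAG can be sketched in a few lines. This is an illustrative toy, not a real implementation: it assumes a naive term-overlap scorer in place of a real retriever, omits the model call itself, and the function names and sample corpus are invented for the example.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by naive term overlap with the query (stand-in for a real retriever)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc_id)
              for doc_id, doc in corpus.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def build_prompt(query, corpus, doc_ids):
    """Ground the model by placing retrieved passages, with IDs for citation, before the question."""
    context = "\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    return f"Answer using only the sources below. Cite source IDs.\n{context}\n\nQuestion: {query}"

corpus = {
    "sop-12": "Sterile instruments must be re-counted before wound closure.",
    "hr-03": "Annual leave requests require two weeks of notice.",
}
doc_ids = retrieve("instrument count before closure", corpus)
prompt = build_prompt("instrument count before closure", corpus, doc_ids)
```

In production, `retrieve` would query a vector or hybrid index and the prompt would go to a language model, but the structure is the same: the answer is grounded in retrieved, citable sources rather than the model's memory.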
2. Semantic Search and Vector Embeddings
Traditional search matches keywords. Semantic search understands meaning. It converts text into mathematical representations called embeddings — numerical fingerprints that capture what a passage is about, not just what words it contains. "Post-op infection protocol" finds documents titled "surgical site infection prevention guidelines" — because the meaning is the same, even though the words are not.
Modern embedding models like BGE-M3 support over 100 languages and handle documents up to 8,192 tokens. With extensions like pgvector, these capabilities run directly inside PostgreSQL — with no additional infrastructure required.
Combining semantic search with traditional keyword search creates hybrid search — the union of both catches what either alone would miss.
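One common way to fuse the two result lists is Reciprocal Rank Fusion (RRF), which rewards documents that rank well in either list without having to reconcile incompatible score scales. A minimal sketch, with hypothetical document IDs:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Keyword search and semantic search each return a ranked list of document IDs.
keyword_hits = ["doc-a", "doc-c"]
semantic_hits = ["doc-b", "doc-a"]
fused = rrf([keyword_hits, semantic_hits])
```

A document that appears near the top of both lists ("doc-a" here) outranks one that appears in only a single list, which is exactly the behaviour hybrid search is after.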
3. Open-Weight Models
Until 2023, the most capable language models were locked behind proprietary APIs. You could use GPT-4, but you could not download it, inspect it, or run it on your own hardware. Your data had to travel to someone else's servers.
That changed when Meta released Llama, Alibaba released Qwen, and Mistral released a family of efficient models — all with open weights. For the first time, organisations could download production-quality language models and run them entirely within their own infrastructure.
Open-weight models broke the monopoly. They made self-hosting not just theoretically possible but practically viable. And they shifted the economics of AI from per-token API costs to fixed infrastructure costs — which, for organisations above a certain size, is dramatically cheaper.
4. The Model Context Protocol (MCP)
RAG, embeddings, and open models solve retrieval, understanding, and inference. But there is a fourth problem none of them addresses: how does an AI agent connect to the tools and data sources it needs to act?
Before MCP, every integration was bespoke. Connecting an AI to your documentation platform required one custom connector. Connecting it to your messaging tool required another. Connecting it to your database required a third. If you had ten AI applications and a hundred tools, you potentially needed a thousand different integrations.
The Model Context Protocol, introduced by Anthropic in November 2024, provides a universal, open standard for connecting AI systems to external tools and data sources. Think of it as USB-C for AI applications: a single interface that any model can use to communicate with any tool.
The adoption has been remarkably fast. OpenAI, Google DeepMind, and Microsoft all adopted MCP within months of its release. By late 2025, thousands of MCP servers had been built by the community, SDK downloads were in the tens of millions per month, and the protocol was being used in production at major enterprises. In December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation — co-founded by Anthropic, Block, and OpenAI — ensuring vendor-neutral governance.
MCP is what transforms a knowledge retrieval system into an operating system. Without it, every connector is hand-built plumbing. With it, agents get a standard interface to tools and data — the same way TCP/IP gave computers a standard way to talk to networks.
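The shape of the idea can be illustrated without the real SDK. The sketch below is not MCP itself, just a toy registry showing why one shared describe-and-call surface turns N times M bespoke integrations into N plus M; every name in it is invented for the example.

```python
# Conceptual illustration only (not the real MCP SDK): every tool exposes the
# same describe/call surface, so any agent that speaks this one interface can
# discover and use any tool.

class Tool:
    def __init__(self, name, description, fn):
        self.name, self.description, self.fn = name, description, fn

    def call(self, **kwargs):
        return self.fn(**kwargs)

registry = {}

def register(tool):
    registry[tool.name] = tool

register(Tool("search_docs", "Search the documentation index",
              lambda query: f"results for {query!r}"))
register(Tool("query_db", "Run a read-only database query",
              lambda sql: f"rows for {sql!r}"))

# An agent discovers available tools, then invokes one through the uniform surface.
listing = {name: t.description for name, t in registry.items()}
result = registry["search_docs"].call(query="infection protocol")
```

The point is the uniformity: once both sides speak the same protocol, adding a new tool or a new agent is one integration, not one per pairing.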
5. Self-Hosted Inference Engines
The final piece was runtime efficiency. Open models needed an efficient way to run on affordable hardware.
A new generation of inference engines solved this. vLLM — the most widely adopted — achieves 2–4x higher throughput than standard approaches by eliminating GPU memory waste through a technique called PagedAttention, and serves multiple concurrent users with an OpenAI-compatible API. SGLang, developed at UC Berkeley, optimises for structured generation and complex agent workflows. llama.cpp made it possible to run models on CPUs and consumer hardware — the project that democratised "AI on a laptop." TensorRT-LLM from NVIDIA delivers the highest raw performance on NVIDIA GPUs for production deployments. And Ollama wrapped local inference in a developer experience as simple as pulling a Docker image.
Model quantisation — compressing weights from 16-bit to 4-bit precision — shrinks a 14-billion parameter model from roughly 28 GB to 9 GB, making it practical on a single GPU.
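The arithmetic behind that compression is simple enough to check. A back-of-the-envelope sketch (real quantised files land somewhat above the raw 4-bit figure because embeddings and certain layers are typically kept at higher precision):

```python
params = 14e9                    # a 14-billion parameter model

fp16_gb = params * 2.0 / 1e9     # 16-bit precision: 2 bytes per weight
int4_gb = params * 0.5 / 1e9     # 4-bit precision: half a byte per weight

# Raw weights: 28.0 GB at 16-bit vs 7.0 GB at 4-bit. The quoted ~9 GB for a
# real quantised file reflects the layers kept at higher precision plus metadata.
```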
A single GPU server can now serve a production-grade AI assistant to an entire organisation. Sub-second response times. Zero data leaving the network. The performance gap between self-hosted and cloud-hosted AI has narrowed dramatically. The privacy and compliance advantages remain absolute.
Together, these five technologies — retrieval, semantic understanding, downloadable models, standardised connectivity, and efficient local inference — form the complete stack. Each one was necessary. None alone was sufficient. Their simultaneous maturation is what made the agentic OS possible.
The Emergence of Agentic Systems
Over the past two years, a new architectural idea has begun to emerge.
Instead of treating AI as a simple chatbot that responds to prompts, engineers have begun designing systems where AI behaves more like an agent interacting with information systems.
An AI agent is capable of performing sequences of actions. It interprets a query, searches for relevant information, evaluates evidence, uses tools, and generates responses based on the results. This is the agent loop: reason, decide, execute, observe, iterate — up to a maximum number of steps.
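That loop can be sketched directly. Everything here is a stand-in: the `reason` function and the single `search` tool are invented placeholders for a real planner and real connectors, and only the bounded reason-decide-execute-observe structure is the point.

```python
def run_agent(query, tools, reason, max_steps=5):
    """Minimal agent loop: reason -> decide -> execute -> observe, bounded by max_steps."""
    observations = []
    for _ in range(max_steps):
        action = reason(query, observations)             # decide the next step from evidence so far
        if action["type"] == "answer":
            return action["text"]
        result = tools[action["tool"]](action["input"])  # execute the chosen tool
        observations.append(result)                      # observe, then iterate
    return "Step limit reached without a grounded answer."

# Stand-in reasoner: search once, then answer from the observation.
def reason(query, observations):
    if not observations:
        return {"type": "tool", "tool": "search", "input": query}
    return {"type": "answer", "text": f"Based on: {observations[0]}"}

tools = {"search": lambda q: f"top hit for {q!r}"}
answer = run_agent("reprocessing cycle time", tools, reason)
```

The `max_steps` bound matters: without it, a confused agent can loop indefinitely, which is why production loops are always capped.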
This approach allows AI systems to interact with real organisational knowledge rather than relying solely on their training data.
Deloitte's 2025 Emerging Technology Trends study found that while 30% of organisations are exploring agentic AI and 38% are piloting solutions, only 14% have solutions ready to deploy and just 11% are actively using them in production. 42% are still developing their strategy, with 35% having no formal strategy at all (Deloitte, 2025).
The concept is real. The implementation gap is enormous.
The Idea of an Agentic Operating System
This gap leads to the concept of an Agentic Operating System.
Traditional operating systems manage computational resources for software applications. They coordinate memory, storage, permissions, and processes.
An agentic operating system performs a similar role for AI agents. It manages how agents access knowledge. It controls which tools they can use. It verifies sources and evidence. It enforces safety and compliance boundaries.
In effect, it becomes a governance layer for machine intelligence.
When a user asks a question, the system does not simply generate text. Instead, the agent performs a sequence of controlled actions: interpret the query, search relevant knowledge sources, evaluate the retrieved information, construct a grounded response, and verify that the response is safe and supported by evidence.
The result is not merely generated text. It is an answer connected to real organisational knowledge, with an audit trail for every step.
Gartner predicts that by 2026, over 40% of enterprise applications will embed role-specific AI agents. The market is converging on a common architecture: multi-agent orchestration, tool sandboxing, audit trails, and governance layers.
But there is a critical gap. Most enterprise agentic platforms assume cloud deployment. They assume your data can travel to a third-party provider. For healthcare, for defence, for legal, for any regulated industry where data sovereignty is non-negotiable, this assumption is a dealbreaker.
Why Healthcare Gets Hit Hardest
The knowledge fragmentation problem described in this article exists everywhere. But nowhere is the combination of fragmentation, regulation, and consequence more acute than in healthcare.
The stakes are real. When a warehouse manager cannot find a delivery manifest, the cost is a delayed shipment. When a surgical team cannot confirm a protocol, the cost is measured differently.
The regulation is unforgiving. Healthcare data is the most regulated data on earth. HIPAA in the United States. GDPR in Europe. 47 US states introduced over 250 AI-related bills in 2024–2025 alone, with 33 signed into law. The cost of getting data handling wrong is not a fine. It is an average $7.42 million per breach — the highest of any industry, for fourteen consecutive years (IBM, 2025).
The tools keep multiplying. Each new platform promises to consolidate information. In practice, each one becomes another silo.
In healthcare, the knowledge problem is not an inconvenience. It is a safety issue. And it is precisely the domain where the gap between cloud-first agentic platforms and the requirements of the real world is widest.
What Building These Systems Actually Teaches You
I know this because I built one.
I am the CTO of a surgical technology company. Our teams work across operating theatres, sterile processing departments, hospital warehouses, and distribution centres. Our knowledge lived in the same fragmented landscape as everyone else.
So I built an agentic OS: a self-hosted AI knowledge platform that connects every internal knowledge source into a single intelligent agent, running entirely within our own infrastructure. Our team uses it every day — to answer operational questions, find engineering decisions buried in old pull requests, pull delivery data across hospital sites, and onboard new hires into institutional knowledge they would otherwise spend weeks discovering.
I did not build this as a demo. I built it because the same principles that define our surgical systems — traceability, accountability, and precision — should define how we work internally.
Once you begin building a real-world agentic system, something becomes immediately clear.
The AI model itself is rarely the hardest part.
The real challenge lies in the surrounding infrastructure. Getting data out of documentation platforms, messaging tools, code repositories, and databases — keeping it fresh, handling rate limits, detecting changes, managing failures gracefully — accounts for the majority of the engineering effort. The AI model is often the easiest component.
Every connector has different rate limits, different authentication schemes, different change detection mechanisms, and different failure modes. You build a connector, it works beautifully for three weeks, and then a platform API update breaks your change detection logic at 2 AM.
In practice, building an agentic knowledge system resembles distributed systems engineering more than traditional machine learning.
Three Principles That Emerged
Building a production agentic OS taught me three principles that I believe apply to anyone building these systems, in any domain.
Trust requires evidence, not fluency
Fluent language alone does not create trust. Evidence does.
Early versions of the system I built would produce fluent, confident, entirely fabricated answers when retrieval returned weak results. The response sounded authoritative. A user unfamiliar with the topic would have no reason to doubt it. In a system used by people making decisions about surgical instruments, patient protocols, or delivery logistics, this is dangerous.
The fix was structural: evidence gating. Below a confidence threshold, the system blocks the answer entirely and returns an explicit "insufficient evidence" response. Not a soft warning. A hard gate. I ended up building a four-step citation recovery pipeline — because the language model alone cannot be trusted to cite accurately. Trust must be verified, not assumed.
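A minimal version of such a gate can be sketched as follows; the threshold value and scoring scheme here are invented for illustration, not the ones from any particular system:

```python
def evidence_gate(answer, sources, min_confidence=0.6):
    """Hard gate: refuse outright unless retrieval confidence clears the threshold."""
    confidence = max((s["score"] for s in sources), default=0.0)
    if confidence < min_confidence:
        # Not a soft warning appended to a guess: the answer is blocked entirely.
        return {"status": "refused",
                "message": "Insufficient evidence to answer this question."}
    return {"status": "answered", "text": answer,
            "citations": [s["id"] for s in sources if s["score"] >= min_confidence]}

weak = evidence_gate("Plausible-sounding answer", [{"id": "doc-9", "score": 0.31}])
strong = evidence_gate("Grounded answer", [{"id": "sop-12", "score": 0.87}])
```

The asymmetry is deliberate: a refused answer costs a follow-up question, while a fluent fabrication can cost far more.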
Systems that occasionally admit uncertainty often become more trustworthy than those that answer everything.
Language models should speak. Classical systems should decide.
Language models are excellent at interpreting language and generating explanations. But critical decisions should not depend solely on probabilistic systems.
Reliable architectures separate the probabilistic layer — where language models interpret questions and generate candidate responses — from the deterministic layer — where traditional software logic verifies evidence, enforces rules, and ensures compliance.
The language model handles perception: reading documents, understanding queries, generating natural language. But the consequential decisions — whether to answer, what to cite, whether to escalate, whether to block — are made by deterministic logic. Evidence gates, citation enforcers, compliance scanners, tool sandboxes. These are not AI. They are rules, thresholds, and validation layers designed by engineers who understand the consequences of getting it wrong.
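A citation enforcer is a good example of the deterministic layer, because it needs no AI at all. A sketch, assuming (purely for illustration) that the model marks citations as bracketed document IDs:

```python
import re

def enforce_citations(draft, retrieved_ids):
    """Deterministic check on probabilistic output: every [id] cited in the draft
    must be a document the retriever actually returned."""
    cited = set(re.findall(r"\[([\w-]+)\]", draft))
    if not cited:
        return {"ok": False, "reason": "no citations present"}
    unknown = cited - set(retrieved_ids)
    if unknown:
        return {"ok": False, "reason": f"cites unretrieved sources: {sorted(unknown)}"}
    return {"ok": True}

# The model (probabilistic layer) drafts; the enforcer (deterministic layer) decides.
good = enforce_citations("Counts are verified before closure [sop-12].", ["sop-12", "sop-14"])
bad = enforce_citations("As documented in [made-up-7].", ["sop-12"])
```

Nothing in the enforcer is learned or probabilistic: it is a rule written by an engineer, which is exactly why it can be trusted to say no.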
This principle maps directly to how we build surgical AI systems: probabilistic perception, deterministic checks, human escalation. The domain changes. The architecture does not.
Security is the architecture, not a layer
In healthcare AI, security cannot be bolted on. It must be the architecture.
Protected health information scanning at a single boundary is not enough. Real systems need scanning at multiple boundaries: when a query enters, when a response leaves, when tools return output, when documents are ingested, and when the system writes to memory. Authentication must be enforced, not optional. Tool execution must be sandboxed with schema validation, scope boundaries, and prompt injection detection.
This is not paranoia. It is the minimum standard for any system that handles healthcare data.
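The same idea, reduced to a sketch: one scanning function applied identically at every boundary. The two patterns here are deliberately simplistic placeholders; real PHI detection uses far richer detectors, and the field names are invented for the example.

```python
import re

# Illustrative detectors only: a US SSN-like pattern and a medical record number.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,}\b", re.IGNORECASE),
}

def scan(text):
    """Return the PHI categories detected in a piece of text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

def guard(boundary, text):
    """Apply the same scan at every boundary: query in, tool output, response out."""
    hits = scan(text)
    if hits:
        return {"boundary": boundary, "blocked": True, "detected": hits}
    return {"boundary": boundary, "blocked": False}

inbound = guard("query", "What is the reprocessing cycle for tray 7?")
outbound = guard("response", "Patient MRN: 12345678 was scheduled for Tuesday.")
```

The structural point is that `guard` runs at every boundary with the same rules, so a leak caught nowhere on the way in is still caught on the way out.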
Where the Industry Is Today
Despite the intense attention surrounding AI, agentic systems remain in an early stage.
Many organisations are experimenting. Fewer have deployed successfully at scale. Major technology companies — Microsoft, Salesforce, ServiceNow, Oracle — have all announced agentic AI platforms. The market is real.
RAG itself is evolving from a retrieval pipeline into what some now call a "knowledge runtime" or "context platform" — an orchestration layer that manages retrieval, verification, reasoning, access control, and audit trails as integrated operations. By 2026, 60% of new RAG deployments are expected to include systematic evaluation from day one, up from under 30% in 2025 (NStarX, 2025).
85% of healthcare organisations have already adopted or are exploring generative AI. The question is no longer whether to adopt AI. It is how — and critically, where your data goes when you do.
Here is my thesis for what comes next: by 2028, the agentic OS will be as fundamental to organisational infrastructure as the database is today. Not as a product category — as a layer. Every organisation above a certain size will have a knowledge runtime that continuously indexes internal information, connects agents to tools via standard protocols, enforces compliance at every boundary, and provides an audit trail for every action. The organisations that build this layer now — with governance, citation enforcement, and data sovereignty from the beginning — will have a structural advantage that is nearly impossible to replicate later. The ones that wait will spend years retrofitting trust into systems that were never designed for it.
Key Learnings
1. The hardest problem is not AI — it is plumbing. Data synchronisation, connector reliability, rate limit handling, and change detection account for the majority of the engineering effort. The AI model is often the easiest component.
2. Citations are non-negotiable. An AI that gives confident-sounding answers without evidence is worse than no AI at all. Trust must be verified, not assumed.
3. Evidence gating prevents harm. When retrieval confidence is low, the system must refuse to answer — not guess. A fluent fabrication is more dangerous than silence.
4. Language models should speak. Classical systems should decide. The probabilistic layer handles perception and generation. The deterministic layer handles consequences.
5. Security is the architecture, not a layer. Compliance scanning at multiple boundaries, enforced authentication, tool sandboxing, and injection detection are not features — they are the foundation.
6. Self-hosting is no longer a sacrifice. Modern quantised models on optimised inference engines deliver production-grade quality at fixed cost, with absolute data sovereignty.
7. Cost predictability matters more than cost minimisation. A fixed infrastructure cost that scales to five hundred users without increasing is more valuable than the cheapest option for five.
A Final Reflection
For most of computing history, we focused on storing information.
Databases and document systems captured knowledge. But capturing information is only the first step.
Understanding it is far more valuable.
Agentic systems represent an early step toward a future where organisations can reason over their own knowledge — where the answers that already exist, scattered across dozens of systems, can be found in seconds rather than hours, grounded in evidence rather than memory, and governed with the same rigour we apply to the instruments themselves.
When that future arrives, the operating system of the modern organisation will not simply manage files and processes.
It will manage knowledge itself.
And that may prove to be one of the most important shifts in computing since the rise of the internet.
These systems will be built. The only thing that matters is whether they are built with the traceability, accountability, and precision that the domains they serve demand.
I build AI for surgical safety. The knowledge is already there — written down, stored, scattered across dozens of systems. Whether we build systems worthy of the trust required to find it will define this era. Because in my world, the cost of a wrong answer is not a bug report. It is a patient outcome.
Further Reading and References
Primary Research
- Lewis, P. et al. (2020): "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020
- Gao, Y. et al. (2024): "Retrieval-Augmented Generation for Large Language Models: A Survey"
- npj Digital Medicine (2025): "RAG for LLMs in Assessing Medical Fitness for Surgery"
- Chen, J. et al. (2024): "M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity"
Industry Analysis
- Deloitte (2025): "Agentic AI Strategy"
- IBM (2025): "Cost of a Data Breach Report"
- McKinsey Global Institute (2012): "The Social Economy"
- IDC: Knowledge workers spend approximately 2.5 hours/day searching for information
- Gartner (2025): 40%+ of enterprise applications will embed role-specific AI agents by 2026
On Self-Hosted AI
- vLLM Project: High-throughput LLM serving engine
- Meta AI (2025): "Self-Hosted Deployments for Regulated Industries"
- pgvector Project: Open-source vector similarity search for PostgreSQL
On RAG Evolution and Agent Connectivity
- NStarX (2025): "The Next Frontier of RAG"
- RAGFlow (2025): "From RAG to Context"
- Anthropic (2024): "Introducing the Model Context Protocol"
- Anthropic (2025): "Donating MCP and Establishing the Agentic AI Foundation"
- The New Stack (2025): "Why the Model Context Protocol Won"
On Healthcare AI Regulation
- Foley & Lardner (2025): "HIPAA Compliance for AI in Digital Health"
- GenHealth.ai (2026): "Navigating the AI Regulatory Landscape in Healthcare"
On Knowledge Fragmentation
- Stange, K.C. (2009): "The Problem of Fragmentation and the Need for Integrative Solutions." Annals of Family Medicine
- Cottrill Research: Survey statistics on time spent searching for information