The Model That Disappeared
We had a model in production that worked. Then one day it didn't. Not because the model failed—because we couldn't find it.
Someone had retrained with new data and deployed to production. The weights file was overwritten. The old configuration was gone. When validation failed on a customer site, we had no way to reproduce the previous state. No way to compare what changed. No way to roll back.
The investigation took three days. The fix took two weeks. The loss of customer trust was immeasurable.
That incident taught me something fundamental: the model is not the product. The product is the entire system that makes inference possible—weights, configurations, thresholds, preprocessing parameters, version metadata, and the infrastructure that ties them together. Lose any piece, and you've lost the product.
The Artifact Management Problem
Machine learning systems are fundamentally different from traditional software. In conventional applications, the source code is the product—version it, and you can reproduce any build. In ML systems, the 'product' is a complex constellation of artifacts: training data, preprocessing logic, model architecture, learned weights, inference parameters, threshold configurations, and runtime dependencies.
The 2024-2025 MLOps literature bears this out: research shows that 85% of ML models never make it to production—not because the algorithms fail, but because the engineering infrastructure around them does. The global MLOps market reached $1.58 billion in 2024 and is projected to grow at 35% annually, driven almost entirely by organizations realizing that model management is harder than model development.
The core challenges are interconnected:
Reproducibility: Can you recreate the exact inference behavior from six months ago? Without explicit artifact management, the answer is almost always no.
Traceability: When something goes wrong, can you trace from a specific prediction back to the model version, training data, and configuration that produced it?
Compatibility: Does model version X work with inference engine version Y? Without explicit constraints, the answer is 'maybe, until it doesn't.'
Auditability: For regulated industries—medical devices, financial services, autonomous systems—can you prove to an auditor exactly what was running when?
Design Principles for Production Artifact Management
After that incident, I rebuilt our artifact management system from first principles. These are the design decisions that matter:
1. Immutability
Once an artifact is created, it is never modified. New versions create new artifacts; old versions remain forever accessible. This seems wasteful until you need to reproduce a failure from three months ago.
2. Explicit References
No implicit dependencies. Every artifact required for inference must be explicitly listed with a resolvable URI. If a configuration file references 'model.onnx', that path must be a full URI that resolves to exactly one artifact.
3. Integrity Verification
Every artifact reference includes a cryptographic hash (SHA-256). At load time, the hash is verified. If verification fails, inference fails. No silent corruption, no 'it worked on my machine.'
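A load-time check of this kind takes only a few lines. This is a sketch, not a prescribed API—the function and message wording are illustrative:

```python
import hashlib
from pathlib import Path

def verify_artifact(path: str, expected_sha256: str) -> None:
    """Raise if the artifact on disk does not match its recorded SHA-256."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(
            f"Integrity check failed for {path}: "
            f"expected {expected_sha256}, got {digest}"
        )
```

Failing loudly is the point: a mismatch aborts loading rather than letting inference run against a corrupted or swapped file.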
4. Modular Separation
Different components (classification, registration, preprocessing) are independent modules with independent versioning. Upgrading the classification model shouldn't require touching registration artifacts.
5. Standard Formats
ONNX for neural networks (framework-agnostic inference). JSON for configuration (human-readable, diff-able). Pickle only for learned non-NN artifacts where no standard exists. No proprietary formats that create vendor lock-in.
6. Version Compatibility as Metadata
Explicit min/max version constraints for the inference engine. A model trained for engine v2.3 may not work on v3.0. This must be checked before deployment, not discovered in production.
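A pre-deployment compatibility gate can be sketched as follows (the field names and purely numeric version scheme are assumptions):

```python
def parse_version(v: str) -> tuple:
    """Turn '2.3.1' into (2, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def check_engine_compat(engine: str, min_engine: str, max_engine: str) -> None:
    """Refuse deployment if the engine falls outside the model's declared range."""
    if not (parse_version(min_engine)
            <= parse_version(engine)
            <= parse_version(max_engine)):
        raise RuntimeError(
            f"Engine {engine} outside supported range [{min_engine}, {max_engine}]"
        )
```

Comparing tuples of integers matters: as plain strings, "2.10" sorts before "2.9", which is exactly the kind of bug that surfaces in production rather than at deploy time.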
The Manifest Pattern
The central concept is the model manifest—a single JSON file that serves as the authoritative source of truth for a validated model configuration. Everything required to run inference is referenced from this file. Nothing is implied.
The manifest contains: unique identifiers for the deployment context (client, product, configuration level); URIs and SHA hashes for all model weights; URIs and SHA hashes for all parameter files; inference engine version constraints; and a timestamp that makes every configuration uniquely identifiable.
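A minimal manifest along these lines might look as follows. All names, paths, and hash values here are illustrative placeholders, not a prescribed schema:

```json
{
  "client": "acme",
  "product": "defect-detection",
  "config_level": "production",
  "created": "20250115T093000Z",
  "engine": {"min_version": "2.3.0", "max_version": "2.9.0"},
  "artifacts": [
    {
      "module": "classification",
      "uri": "file:///models/acme/defect-detection/production/20250115T093000Z/classification/model.onnx",
      "sha256": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
    },
    {
      "module": "classification",
      "uri": "file:///models/acme/defect-detection/production/20250115T093000Z/classification/params.json",
      "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    }
  ]
}
```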
A model is considered valid only if: all referenced URIs are resolvable, all SHA hashes match, version constraints are satisfied, and all required parameters are present. Failure of any condition invalidates the entire configuration. There is no 'mostly works.'
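The all-or-nothing check can be sketched like this, assuming a manifest with an "artifacts" list of file:// URIs and hashes (names are illustrative):

```python
import hashlib
import json
from pathlib import Path
from urllib.parse import urlparse

def validate_manifest(manifest_path: str) -> dict:
    """Load a manifest; raise unless every referenced artifact checks out."""
    manifest = json.loads(Path(manifest_path).read_text())
    for artifact in manifest["artifacts"]:
        # Resolve file:// URIs to local paths; other schemes would need handlers.
        path = Path(urlparse(artifact["uri"]).path)
        if not path.is_file():
            raise RuntimeError(f"Unresolvable artifact: {artifact['uri']}")
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != artifact["sha256"]:
            raise RuntimeError(f"Hash mismatch: {artifact['uri']}")
    return manifest  # returned only if every check above passed
```

Any single failure raises, which enforces the "no mostly works" rule: the caller either gets a fully validated configuration or nothing.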
This pattern enables several critical capabilities:
Atomic deployment: Either the complete configuration deploys successfully, or nothing changes. No partial states.
Instant rollback: Rolling back means pointing to a previous manifest. All artifacts still exist because they're immutable.
Reproducible debugging: Given a manifest and an input, you can reproduce exactly what happened. No 'it works differently now.'
Audit compliance: Every prediction can be traced to a manifest, and every manifest defines exactly what ran.
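Because manifests and artifacts are immutable, "which configuration is live" can be reduced to a single pointer that is swapped atomically. A sketch, assuming a pointer-file convention (this mechanism is an illustration, not the only option—a database row or symlink works the same way):

```python
import os

def set_active_manifest(pointer_path: str, manifest_path: str) -> None:
    """Atomically repoint the 'active' marker at a manifest.

    Rollback is the identical operation with an older manifest path.
    """
    tmp = pointer_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(manifest_path)
        f.flush()
        os.fsync(f.fileno())
    # Atomic rename: readers see the old pointer or the new one, never a partial write.
    os.replace(tmp, pointer_path)

def get_active_manifest(pointer_path: str) -> str:
    with open(pointer_path) as f:
        return f.read()
```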
Hierarchical Storage Structure
For multi-client, multi-product deployments, artifacts are organized hierarchically: client → product → configuration level → timestamp. Each timestamp directory contains a complete, self-contained configuration—all modules, all parameters, the manifest.
The timestamp is monotonically increasing and never reused. This creates a natural audit trail: you can list all configurations for a given product in chronological order, see exactly when each was created, and understand the evolution over time.
Within each configuration, modules are separated: classification artifacts in one directory, registration artifacts in another, preprocessing in a third. Each module has its own parameter file that fully specifies its execution—no cross-module dependencies, no implicit assumptions.
This structure scales. Adding a new client means creating a new top-level directory. Adding a new product configuration means adding under the client. Updating a model means creating a new timestamp. The old configuration remains accessible forever.
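Concretely, the on-disk layout might look like this (client, product, and file names are illustrative):

```
models/
  acme/                        # client
    defect-detection/          # product
      production/              # configuration level
        20250115T093000Z/      # timestamp = one complete configuration
          manifest.json
          classification/
            model.onnx
            params.json
          registration/
            model.pkl
            params.json
          preprocessing/
            params.json
        20250301T141200Z/      # newer configuration, created alongside,
          manifest.json        # never overwriting the previous one
```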