The Infrastructure Gap Nobody's Talking About
Walk into any enterprise AI initiative and you'll find a familiar scene: brilliant data scientists building impressive models that never make it to production. The culprit isn't the models—it's the plumbing.
The gap between a working prototype and a production system is filled with decidedly unsexy problems: How do you version model artifacts alongside the data that trained them? How do you handle graceful degradation when your inference endpoint times out? What happens when your model needs data from three different systems that have never talked to each other?
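To make the degradation question concrete, here is a minimal sketch of an inference call that fails fast and falls back to a canned response rather than hanging the caller. The endpoint URL, timeout, and fallback text are placeholders, not a real API.

```python
# Sketch: graceful degradation when an inference endpoint times out.
# INFERENCE_URL and the fallback text are hypothetical placeholders.
import requests

INFERENCE_URL = "http://inference.internal/v1/generate"
TIMEOUT_SECONDS = 2.0

def generate_with_fallback(prompt: str) -> dict:
    """Call the model, but degrade to a canned response instead of failing the request."""
    try:
        resp = requests.post(
            INFERENCE_URL,
            json={"prompt": prompt},
            timeout=TIMEOUT_SECONDS,  # fail fast instead of hanging the caller
        )
        resp.raise_for_status()
        return {"source": "model", "text": resp.json()["text"]}
    except (requests.Timeout, requests.ConnectionError, requests.HTTPError):
        # Degraded path: return something useful rather than propagating the failure.
        return {"source": "fallback", "text": "The assistant is busy; showing cached guidance instead."}
```

The interesting decision isn't in the code; it's deciding up front what a degraded answer should look like for your product.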
These aren't machine learning problems. They're the same distributed systems challenges I spent years solving in adtech—where milliseconds matter and "it works on my laptop" is a career-limiting phrase.
What AdTech Taught Me About AI Systems
Video advertising infrastructure operates under constraints that would make most AI systems buckle: sub-100ms decision windows, zero tolerance for downtime during premium inventory, and data pipelines that must reconcile information from dozens of sources in real time.
Building these systems taught me that reliability isn't a feature—it's a design philosophy. Every component assumes its dependencies will fail. Every pipeline has a fallback. Every deployment can be rolled back in seconds, not hours.
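As a rough illustration of that posture, here is a bare-bones circuit breaker: after enough consecutive failures, calls skip the dependency for a cool-off period and return a fallback instead. The thresholds are illustrative defaults, not tuned values.

```python
# Sketch: a minimal circuit breaker -- "assume your dependencies will fail" in code.
# max_failures and reset_after are illustrative, not recommendations.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, fallback=None, **kwargs):
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback  # open: don't even touch the dependency
            self.failures = self.max_failures - 1  # half-open: allow a single probe
        try:
            result = fn(*args, **kwargs)
            self.failures = 0  # healthy again; close the breaker
            return result
        except Exception:
            self.failures += 1
            self.opened_at = time.monotonic()
            return fallback
```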
When I started building AI infrastructure, I brought these instincts with me. The result? Systems that actually survive contact with production traffic.
Consider the Model Context Protocol (MCP)—an emerging standard for connecting AI models to external tools and data sources. The concept is elegant: give language models a standardized way to access databases, APIs, and file systems. The implementation challenge is pure systems engineering: connection pooling, authentication flows, error handling, rate limiting, and graceful degradation when tools become unavailable.
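As a hedged sketch of that systems work, here is a thin wrapper that rate-limits a tool and converts failures into structured responses the model can act on. It assumes no particular MCP SDK; the wrapper and its return shape are purely illustrative.

```python
# Sketch: rate limiting and graceful degradation around a generic tool call.
# No specific MCP SDK is assumed; the return shape is illustrative.
import time
from collections import deque

class RateLimitedTool:
    def __init__(self, tool_fn, max_calls: int = 10, per_seconds: float = 1.0):
        self.tool_fn = tool_fn
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.calls = deque()  # timestamps of recent calls

    def __call__(self, *args, **kwargs):
        now = time.monotonic()
        # Drop timestamps that have aged out of the sliding window.
        while self.calls and now - self.calls[0] > self.per_seconds:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Tell the model the tool is temporarily throttled instead of erroring out.
            return {"status": "rate_limited", "retry_after": self.per_seconds}
        self.calls.append(now)
        try:
            return {"status": "ok", "result": self.tool_fn(*args, **kwargs)}
        except Exception as exc:
            # Graceful degradation: the model gets a structured error, not a stack trace.
            return {"status": "unavailable", "detail": str(exc)}
```

Wrapping a real tool is then one line, something like `lookup = RateLimitedTool(query_user_profile)`, where `query_user_profile` stands in for whatever function actually hits your database.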
I've built production MCP servers that handle real estate appraisal data, user profile databases, and filesystem operations. The AI component—the language model—is almost an afterthought. The hard work is making sure the model can reliably access what it needs, when it needs it, without bringing down everything else when something goes wrong.
Local LLMs and the Return of Edge Computing
There's a quiet revolution happening in AI deployment: the move toward local and edge inference. Organizations are discovering that not every query needs to hit a cloud API, and not every use case can tolerate the latency, cost, or data governance implications of centralized inference.
This is edge computing redux—a domain where systems engineers have been operating for decades. The questions are familiar: How do you distribute workloads efficiently? How do you keep edge nodes updated without downtime? How do you monitor systems you can't directly access?
I've been running local LLMs on custom hardware—coordinating inference across machines on my local network, building tools that let models on one machine access GPUs on another. It's not fundamentally different from the content delivery architectures I've worked with before. The payload is different; the engineering principles are identical.
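As a sketch of that coordination, here is a prompt router that tries each inference node on the local network in turn, assuming each one exposes Ollama's default HTTP API. The host addresses and model name are placeholders for whatever your machines actually run.

```python
# Sketch: route a prompt to whichever LAN node answers, assuming each runs
# Ollama's default /api/generate endpoint. Hosts and model are placeholders.
import requests

HOSTS = ["http://192.168.1.20:11434", "http://192.168.1.21:11434"]
MODEL = "llama3"

def generate(prompt: str) -> str:
    last_error = None
    for host in HOSTS:
        try:
            resp = requests.post(
                f"{host}/api/generate",
                json={"model": MODEL, "prompt": prompt, "stream": False},
                timeout=60,
            )
            resp.raise_for_status()
            return resp.json()["response"]
        except requests.RequestException as exc:
            last_error = exc  # this node is down or overloaded; try the next one
    raise RuntimeError(f"No inference node responded: {last_error}")
```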
The Skills That Transfer (and the Ones That Don't)
For technical leaders considering how to staff AI infrastructure teams, here's what I've learned about skill transferability:
What translates directly:
- Distributed systems design and failure mode analysis
- API design and versioning strategies
- Observability and debugging production systems
- Performance optimization under constraints
- Security thinking and threat modeling
What requires genuine new learning:
- Model behavior characteristics (latency profiles, failure modes, resource requirements)
- Prompt engineering as a systems interface
- Evaluation methodologies for non-deterministic outputs
- The rapidly evolving tooling landscape
The ratio matters: roughly 70% of building production AI systems is traditional engineering; the other 30% is AI-specific knowledge. Yet most hiring treats that ratio as if it were inverted.
Building for What AI Actually Needs
The next wave of AI adoption won't be constrained by model capability—we're already past that threshold for most enterprise use cases. The constraint is infrastructure: the ability to connect models to proprietary data, integrate AI into existing workflows, and operate these systems reliably at scale.
This is why I'm focused on building AI infrastructure tooling—specifically, standardized ways to connect language models to real data sources and enterprise systems. The models are commoditizing; the integration layer is where lasting value gets created.
Organizations that recognize this will build AI teams differently. They'll pair ML specialists with systems engineers. They'll invest in infrastructure before they invest in model fine-tuning. They'll measure success by production reliability, not benchmark scores.
A Practical Path Forward
For systems engineers curious about this space: you're more prepared than you think. Start by deploying a local LLM—tools like Ollama make this trivial. Build something that connects that model to data you actually care about. You'll quickly discover that the hard problems feel familiar.
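If you want a concrete first exercise, here is one possibility: pipe the tail of a local log file into a locally served model and ask it to summarize the errors. It assumes Ollama's default HTTP endpoint; the file path and model name are placeholders for data and models you actually have.

```python
# Sketch: a first "connect the model to your own data" exercise.
# LOG_PATH and the model name are placeholders; Ollama's default endpoint is assumed.
import requests

LOG_PATH = "/var/log/myapp/errors.log"
OLLAMA_URL = "http://localhost:11434/api/generate"

with open(LOG_PATH) as f:
    log_tail = f.read()[-4000:]  # keep the prompt small; the last few KB is usually enough

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "llama3",
        "prompt": f"Summarize the recurring errors in this log:\n\n{log_tail}",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```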
For technical leaders building AI capabilities: look at your existing engineering talent differently. The senior engineer who's been keeping your payment systems running at four-nines availability understands more about production AI than most candidates with "AI" in their title.
The best AI infrastructure I've seen isn't built by people who studied AI—it's built by people who've been burned by production failures and never want to experience that again.