Open Source AI Stack: LiteLLM, OpenWebUI & sovereign model usage
How we built a flexible open-source stack for language models with LiteLLM, OpenWebUI, local models, and automated data anonymization — without vendor lock-in and with full control over our data.

Table of Contents
Share on Social Media
The world of language models is moving fast — new models, new prices, new providers. If you commit early to a single vendor, you quickly risk vendor lock-in and a lack of transparency around costs and data.
In this article we show how we built a flexible, open-source AI stack with LiteLLM, OpenWebUI, local models, and automated data anonymization that makes LLMs usable for our team — without giving up control over infrastructure and data.
Abstract
The LLM landscape is changing at a pace that can make even experienced tech teams sweat. New models show up weekly, pricing shifts regularly, providers disappear or pivot. What was "state of the art" yesterday can be a compromise today.
If you align your entire setup around a single model or a single provider in such an environment, you're building structural risk — technically, economically, and organizationally.
That's why we deliberately chose a different path: instead of one tool or one vendor, we rely on a modular, open-source AI stack. At the center is LiteLLM as a unified API proxy, combined with OpenWebUI as the team interface, a local GPU workstation for sensitive workloads, and our own tool for structured data anonymization.
This article describes how this stack came to be, why we made certain decisions — and what we still haven't solved.
The problem: why build your own stack at all?
When teams start integrating LLMs seriously into day-to-day work, the same problems show up surprisingly quickly.
At first it looks harmless: one developer tries a few prompts with the OpenAI API, someone else tests Claude, a third person uses an Azure deployment. A few weeks later reality looks more like this:
- Everyone has their own API keys
- Nobody really knows what costs are being generated
- It's unclear which data ends up in which prompts
- Switching models requires changes across multiple apps
- Vendor lock-in happens almost by accident
In short: it can get out of hand. Of course there's an obvious solution: buy a SaaS product that abstracts exactly these problems. That's a perfectly valid decision — many teams do well with it.
We still chose not to — not for ideological reasons, but after a pragmatic trade-off. For us, flexibility, transparency, and data control mattered more than minimizing operational effort. That means more responsibility. But also a lot more freedom.
Our stack — four building blocks
At its core our setup consists of four components that work together, each with a clearly defined role.
1. LiteLLM — one entry point for every model
The heart of our system is LiteLLM. The idea is simple, but extremely effective: LiteLLM acts as a universal proxy for LLMs. Whether it's OpenAI, Anthropic, Azure, or a locally running model — to clients it always looks like the same standardized API. For applications this means they only ever talk to a single endpoint; which model answers in the background is configured centrally.
This brings several practical benefits:
- You can switch providers without changing code
- Experimenting with new models becomes trivial
- Apps stay stable even when infrastructure changes
One especially helpful feature is built-in token tracking. Each internal API key can be associated with teams or projects. Usage and costs become transparent, and budgets can even be enforced. That sounds trivial, but in practice it's hugely helpful.
Of course LiteLLM isn't the only tool in this space. Many solutions are emerging for the same problem (e.g. OpenRouter, Together AI, …). We chose LiteLLM because it combines three properties that were important to us:
- fully open source
- easy to self-host
- an active community and fast development pace
What LiteLLM does not solve is model behavior. A prompt optimized for Claude won't automatically work equally well with GPT‑4 or DeepSeek. LiteLLM abstracts infrastructure — not model "personality". And like any self-hosted software, LiteLLM needs a minimum amount of upkeep. The effort is manageable. But it exists.
2. OpenWebUI — LLMs for the entire team
APIs are great — for developers. Most of the rest of an organization simply wants a good chat interface. That's where OpenWebUI comes in.
OpenWebUI is a self-hosted chat UI that connects directly to our LiteLLM proxy. For the team, the day-to-day experience feels very similar to ChatGPT. The key difference: everything runs on our own infrastructure — our models (local or cloud), our rules, our data control.
- Our models (local or cloud)
- Our rules
- Our data control
Rolling it out in the team was surprisingly straightforward. The UI is familiar and the barrier to entry is low. Instead of technical model names, we defined several "human" aliases — for example for different quality or cost levels.
The OpenWebUI team also ships new features regularly, so you're "almost" always up to date. We also equipped marketing and sales with their own CustomGPT-like assistants. For broader functionality and more autonomy we integrate small agent systems — n8n is a proven tool for implementation and integration into OpenWebUI.
3. Guardrails
LiteLLM allows integrating content filters and third-party solutions directly at the proxy level — centrally enforced, without every application having to implement them independently.
We use doccape for data anonymization. It replaces sensitive entities — names, companies, project references — with neutral placeholders before anything reaches a model. The text structure stays intact; the real entities don't leave the building.
This shifts privacy from a regime of prohibitions to a technical default. Colleagues can work with real documents without asking each time whether it's allowed.
4. Local vs. cloud
Not every task belongs in the cloud. For data-sensitive or experimental workloads we also run a local GPU workstation from AIME. We run various open-source models on it, including:
- DeepSeek
- Qwen
- Mistral
- Tulu
- GLM
- …
These models are surprisingly capable by now — especially for many internal tasks or specialized agents (e.g. with n8n). We decide local vs. cloud based on three factors:
- data sensitivity
- cost
- required model quality
Thanks to LiteLLM, switching is completely transparent to applications. You reconfigure a model alias — and the app automatically uses a different backend. No code changes, no new integration.
What we still haven't solved
As stable as the setup is by now, there are still several open topics.
Model evaluation
Which model is best for which use case? Right now this evaluation still happens rather informally — via team exchange and small experiments. That works in a small setup, but long-term it will need a more structured evaluation process.
Governance
Another topic is model governance. Who decides which models are allowed for which tasks? For now, this is still fairly loosely organized. For larger organizations that would likely be too informal.
Local scaling
A single GPU workstation works great for a small team. But what happens if ten teams want to use local models at the same time? We haven't fully answered that question yet.
Conclusion
Our AI stack gives us three things that matter to us:
- control over costs
- control over models
- control over data
At the same time, we stay flexible in an industry that's currently changing faster than almost any other infrastructure technology in recent years.
The downside is clear: open source means ownership. You operate infrastructure. You make decisions yourself. And sometimes you have to solve problems that don't have an off-the-shelf solution yet.
For us, the benefits outweigh the costs. But we're curious: how did you solve this?
Outlook: language models as a local automation layer
A next step that we've already implemented internally goes even further. We connect local systems and internal data sources directly to OpenWebUI and LiteLLM — meaning LLMs can not only chat, but also actively work with our infrastructure:
- create Jira tickets
- summarize appointments from internal systems
- search internal data sources
- trigger workflows
All local. All under our control. Without data leaving the company. That would probably be worth a dedicated article.
TL;DR
- LiteLLM unifies providers behind one API — with centralized token tracking
- OpenWebUI brings LLMs to the whole team via a chat interface
- doccape structurally anonymizes data before API calls
- Local models on AIME workstations complement cloud models
- Open topics: model evaluation, governance, and scaling
Are you evaluating how to deploy language models in your organization?
Whether cloud, hybrid or on-prem: we help you identify the right use cases, assess privacy and architecture, and turn it into a realistic implementation path.
