How We Embedded GenAI into a Legacy Claims System in 4 Weeks
We were working with a mid-sized healthcare services company that processes thousands of insurance claims weekly through a legacy platform built over a decade ago. The system handled a critical volume of work, but the UX was brittle, workflows were manual, and it lacked modern tools like contextual search, summarization, or intelligent routing.
The client didn’t want a rewrite. Their mandate was clear:
“Help us experiment with GenAI, but don’t break the system that pays the bills.”
This was a practical engagement, not an innovation showcase. The team wanted to know if large language models (LLMs) could add value to their operations without disrupting stability.
The Problem
Claims processors were overwhelmed with context-switching and repetition.
They had to toggle between multiple database views, case notes, PDFs, and regulatory content to make a decision. Most of their questions fell into a few categories:
“Has this provider submitted this before?”
“What’s the patient’s history on this diagnosis?”
“What’s the appropriate policy clause that applies here?”
Each answer required scanning dense text, searching internal systems, and relying on tribal knowledge. No one trusted automation to handle final decisions, but there was an appetite for assistance.
The big constraint? The legacy system was closed, with no API access and no frontend rewrite possible. Anything we built had to sit on top of what existed.
The Task
Embed a GenAI-powered layer that could summarize claim history and suggest next steps for reviewers, without replacing or rewriting the core system.
This meant building a lightweight, secure, non-invasive assistant that could:
Retrieve relevant context from internal data sources
Generate summaries or suggestions using a language model
Present output inside the reviewer’s existing workflow
Respect PHI and enterprise security requirements
The goal wasn’t to replace humans. It was to reduce decision fatigue.
The Actions We Took
Week 1: Define use cases and constraints
We started by shadowing claims reviewers for two days. That gave us three high-leverage use cases:
Claim history summarization
Policy matching and clause extraction
Suggested next-step routing (approve, escalate, or hold)
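To make the first use case concrete, here is a minimal prompt-template sketch for claim history summarization. The field names (claim_id, provider_history, case_notes) are illustrative placeholders, not the client's actual schema.

```python
# Illustrative prompt template for the claim-history summarization use case.
# Field names are hypothetical placeholders, not the client's real schema.
CLAIM_SUMMARY_PROMPT = """You are assisting an insurance claims reviewer.
Summarize the claim history below in 5 bullet points or fewer.
Flag any prior submissions from the same provider and any repeat diagnoses.

Claim ID: {claim_id}
Provider history:
{provider_history}

Case notes:
{case_notes}
"""

def build_summary_prompt(claim_id: str, provider_history: str, case_notes: str) -> str:
    """Fill the template with claim context pulled from internal sources."""
    return CLAIM_SUMMARY_PROMPT.format(
        claim_id=claim_id,
        provider_history=provider_history,
        case_notes=case_notes,
    )
```

Keeping the template explicit like this made it easy to iterate on wording with reviewers rather than burying instructions in code.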
We scoped access boundaries with IT and Legal. No PHI would leave their cloud environment, and no write access to the legacy system was allowed.
Week 2: Build a sidecar API and data layer
We stood up a secure sidecar API in Azure that could:
Query structured claim data from their internal DB
Ingest and tokenize relevant text from past case notes and PDFs
Package context into prompts using a templated approach
We used Azure OpenAI for inference so that all model calls stayed within the client's cloud and security policies.
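As a rough sketch of that flow, here is what the sidecar's summarization path can look like with the openai Python SDK's AzureOpenAI client. The endpoint, deployment name, and the fetch_claim_context stub are assumptions for illustration, not the client's actual code.

```python
import os
from openai import AzureOpenAI  # openai>=1.0 Python SDK

# Assumption: endpoint, key, and deployment name come from the client's own
# Azure environment, so calls never leave their cloud boundary.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def fetch_claim_context(claim_id: str) -> dict:
    """Stand-in for the retrieval layer: the real service queried the internal
    claims DB and pulled chunked text from past case notes and PDFs."""
    return {"provider_history": "...", "case_notes": "..."}

def summarize_claim(claim_id: str) -> str:
    # Package retrieved context into a templated prompt (abbreviated here),
    # then ask the model for a reviewer-facing summary.
    context = fetch_claim_context(claim_id)
    prompt = (
        "Summarize this claim history for a reviewer in 5 bullets or fewer.\n\n"
        f"Provider history:\n{context['provider_history']}\n\n"
        f"Case notes:\n{context['case_notes']}"
    )
    response = client.chat.completions.create(
        model="claims-assist-gpt4",  # the Azure *deployment* name (hypothetical)
        messages=[
            {"role": "system", "content": "You assist insurance claims reviewers. Do not invent facts."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content
```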
Week 3: Frontend overlay + feedback loop
We built a browser-based overlay (like a sidebar extension) that reviewers could open next to the legacy app. It showed:
A claim summary (generated by the model)
Matching policy clauses (with source links)
Suggested disposition (with low/moderate/high confidence indicators)
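The overlay consumed a small structured payload from the sidecar API. Below is a hypothetical sketch of that shape; the field names are our own illustration, not the client's actual contract.

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class PolicyClauseMatch:
    clause_id: str
    excerpt: str
    source_url: str   # link back to the source policy document
    rationale: str    # "why this clause was suggested"

@dataclass
class ReviewerAssist:
    claim_id: str
    summary: str                                   # model-generated claim summary
    matched_clauses: list[PolicyClauseMatch] = field(default_factory=list)
    suggested_disposition: Literal["approve", "escalate", "hold"] = "hold"
    confidence: Literal["low", "moderate", "high"] = "low"
```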
Each suggestion included quick feedback buttons:
👍 "Helpful", 👎 "Wrong", or 📝 "Comment"
Reviewers used these not just to rate, but to provide inline corrections. That helped us tune prompts and filtering over time.
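Those buttons posted to a small feedback route on the sidecar. A minimal sketch with FastAPI, assuming a hypothetical /feedback route and field names:

```python
from datetime import datetime, timezone
from typing import Literal, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Feedback(BaseModel):
    claim_id: str
    suggestion_id: str
    rating: Literal["helpful", "wrong", "comment"]
    correction: Optional[str] = None  # inline correction text, if the reviewer left one

@app.post("/feedback")
def record_feedback(fb: Feedback) -> dict:
    # In the pilot this fed a review log we used to tune prompts and retrieval
    # filters; here we simply acknowledge receipt.
    return {"received_at": datetime.now(timezone.utc).isoformat(), "claim_id": fb.claim_id}
```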
Week 4: Pilot with one team
We rolled out to 8 reviewers in one region. IT monitored network access, Legal reviewed logs, and we had a Slack channel for real-time feedback.
No changes were made to their core system. Everything ran in parallel, and use was entirely voluntary.
The Results
Claim review time dropped 15–20% in common scenarios
Model suggestions were accepted 60–70% of the time during the pilot
User trust increased when we added transparency (“why this clause was suggested”)
No production incidents or security violations occurred
The team now wants to explore expansion to prior authorization workflows
More importantly, adoption came from the bottom up. Reviewers asked for it, not because it replaced them, but because it helped them.
Takeaway
You don’t need to rebuild legacy systems to embed AI; you need to respect their boundaries.
A well-scoped GenAI pilot doesn’t have to be flashy. In our case, starting small and building alongside the system rather than inside it created space for experimentation without risk.
If you’re working with older infrastructure and want to explore AI:
Don’t focus on models first; focus on use cases.
Build a wrapper, not a replacement.
Involve the people using it early and often.
It might not be transformative. But it can be useful. And that’s a much better place to start.