Why billing teams want AI — and why most prototypes fail
Billing operations are full of expensive, repetitive judgment calls: denial reason codes need interpretation, EOBs need to be read in context, coding mismatches need to be surfaced fast, and appeals need to be drafted with enough specificity to stand up to payer review. The buyer does not want “AI” as a feature. They want fewer misses, faster follow-up, and a system that can explain why it took an action.
The problem is that most LLM demos are built like consumer assistants. They can summarize text, but they cannot reliably preserve billing logic, audit trails, or role-based access. In revenue cycle, that is where the project dies. A real assistant has to work from structured data (X12 837 claims, X12 835 remittance advice, claim status data, fee schedules, and policy rules) and use the model only where language generation adds value.
AST’s view: start with control points, not prompts
We have seen this play out in revenue cycle and adjacent clinical workflows: the teams that win do not start by asking what the model can do. They start by defining where the assistant is allowed to read, where it can recommend, where it can act, and where a human must approve. That distinction matters because a billing assistant touches protected health information, financial data, and payer-specific logic at the same time.
When our team works on RCM automation, we design the workflow around three layers: structured ingestion, policy enforcement, and language generation. That keeps the model from inventing codes, guessing at coverage, or bypassing human review on edge cases.
How AST thinks about the core architecture
The architecture should look like this: ingest 835 remittance, 837 claim, eligibility, payer rules, and internal policy data into a normalized billing store; run deterministic checks for missing modifiers, incompatible codes, coverage mismatches, and denial patterns; then pass a condensed context packet to the LLM for summarization, explanation, or appeal drafting. The LLM should never be the first system to interpret raw claims data.
That keeps the prompt small, limits token waste, and reduces the chance of hallucinated billing logic. It also makes the product easier to defend to compliance, operations, and payer-contract teams.
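A minimal Python sketch of that ordering is shown below: deterministic checks first, then a condensed context packet. Everything in it is hypothetical. The `ClaimLine` fields, the rule tables, and the made-up procedure, modifier, and place-of-service codes stand in for whatever your normalized billing store and payer edits actually contain. The point is the sequence: rules run before any model call, and the model only ever sees the packet.

```python
from dataclasses import asdict, dataclass


# Hypothetical normalized claim line; field names are illustrative, not an X12 or FHIR schema.
@dataclass(frozen=True)
class ClaimLine:
    line_id: str
    procedure_code: str
    modifiers: tuple[str, ...]
    place_of_service: str
    diagnosis_codes: tuple[str, ...]


@dataclass(frozen=True)
class Finding:
    rule_id: str
    line_id: str
    detail: str


# Placeholder rule tables with made-up codes; real edits would come from payer rules and policy data.
REQUIRED_MODIFIERS: dict[str, set[str]] = {"X0001": {"MA"}}
DISALLOWED_POS: dict[str, set[str]] = {"X0002": {"P9"}}


def run_checks(lines: list[ClaimLine]) -> list[Finding]:
    """Deterministic pre-checks that run before any model call."""
    findings: list[Finding] = []
    for line in lines:
        missing = REQUIRED_MODIFIERS.get(line.procedure_code, set()) - set(line.modifiers)
        if missing:
            findings.append(Finding("MISSING_MODIFIER", line.line_id, f"missing {sorted(missing)}"))
        if line.place_of_service in DISALLOWED_POS.get(line.procedure_code, set()):
            findings.append(Finding("INVALID_POS", line.line_id, f"POS {line.place_of_service} not allowed"))
        if not line.diagnosis_codes:
            findings.append(Finding("NO_DIAGNOSIS", line.line_id, "no diagnosis code on line"))
    return findings


def build_context_packet(
    claim_id: str, denial_codes: list[str], lines: list[ClaimLine], findings: list[Finding]
) -> dict:
    """Condense to flagged lines and rule findings only; no demographics or full patient record."""
    flagged = {f.line_id for f in findings}
    return {
        "claim_id": claim_id,
        "denial_codes": list(denial_codes),
        "flagged_lines": [
            {
                "line_id": line.line_id,
                "procedure_code": line.procedure_code,
                "modifiers": list(line.modifiers),
                "place_of_service": line.place_of_service,
            }
            for line in lines
            if line.line_id in flagged
        ],
        "findings": [asdict(f) for f in findings],
    }
```

Only flagged lines and their findings reach the packet, so a clean line on the same claim never enters the prompt. Keeping the rule tables as data rather than prompt text also means a change to a payer edit is visible outside the model.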
| Approach | Best For | Tradeoff |
|---|---|---|
| LLM only over claim text | Fast prototypes, low-risk summaries | ✗ Weak auditability and higher hallucination risk |
| Rules engine + LLM for exceptions | Denial triage, coding review, appeal drafting | ✓ Strong control, needs structured data model |
| Human-in-the-loop copilot | High-stakes appeals and coding QA | ✓ Best for accuracy, slower throughput |
| Autonomous workflow agent | Low-risk work queues with strict guardrails | ✗ Harder to certify and govern |
4 technical approaches for a HIPAA-compliant billing assistant
1) Deterministic rules first, LLM second. Use policy engines and claim validators to catch obvious issues before the model sees anything. Examples include modifier mismatches, diagnosis-to-procedure conflicts, invalid place-of-service combinations, and payer-specific edits. The LLM then explains the issue in plain English or drafts the next step for the work queue.
2) Retrieval over structured claim context. Pull the relevant claim lines, remittance events, denial codes, and contract language into a bounded context window. Do not feed the model the whole patient record. Build retrieval around claim IDs, encounter IDs, and denial categories so the assistant can answer questions like “Why was this denied?” without wandering into unrelated PHI.
3) Routed workflows for appeals and escalations. Use the model to classify the denial, estimate confidence, and draft the appeal packet. But route low-confidence cases to human review and attach the evidence bundle automatically: claim history, EOB excerpt, payer policy references, and prior correspondence. This is where the assistant saves time without pretending to replace billing expertise.
4) Domain-tuned language generation with hard constraints. You do not need to fine-tune a giant model on every billing document. In many cases, prompt engineering plus template-constrained generation is enough. If you do fine-tune, focus on denial taxonomy classification, coding anomaly detection, and concise appeal language — not free-form medical reasoning.
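Approaches 3 and 4 both depend on keeping model output inside hard limits. The sketch below shows one way to do that, using assumed names:

- Ask the model for a small JSON object containing a denial category, a confidence value, and a short basis sentence.
- Reject anything outside a closed taxonomy.
- Let the model fill only one slot of a fixed appeal template.

The taxonomy, template wording, field names, and 500-character limit are all illustrative assumptions. The model call itself is omitted.

```python
import json
from dataclasses import dataclass

# Hypothetical closed taxonomy; in practice, derive it from your own denial history.
DENIAL_TAXONOMY = frozenset(
    {"missing_modifier", "coding_conflict", "eligibility", "timely_filing", "medical_necessity"}
)

# Fixed appeal skeleton: only {basis} comes from the model; other slots come from verified data.
APPEAL_TEMPLATE = (
    "Claim: {claim_id}\n"
    "Denial category: {category}\n"
    "Basis for reconsideration: {basis}\n"
    "Enclosed evidence: {evidence}\n"
)


class OutputRejected(ValueError):
    """Raised when model output does not fit the expected shape."""


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float
    basis: str


def parse_model_output(raw: str, max_basis_chars: int = 500) -> Classification:
    """Accept model output only if it is a JSON object that fits the taxonomy and limits."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputRejected(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputRejected("expected a JSON object")

    category, confidence, basis = data.get("category"), data.get("confidence"), data.get("basis")
    if not isinstance(category, str) or category not in DENIAL_TAXONOMY:
        raise OutputRejected(f"category outside taxonomy: {category!r}")
    # bool is a subclass of int in Python, so exclude it explicitly; NaN fails the range check.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        raise OutputRejected("confidence must be a number between 0 and 1")
    if not isinstance(basis, str) or not basis.strip() or len(basis) > max_basis_chars:
        raise OutputRejected("basis must be non-empty and within the length limit")
    return Classification(category, float(confidence), basis.strip())


def render_appeal(claim_id: str, result: Classification, evidence: list[str]) -> str:
    """Fill the fixed template; the model never controls the claim ID or evidence list."""
    return APPEAL_TEMPLATE.format(
        claim_id=claim_id,
        category=result.category,
        basis=result.basis,
        evidence=", ".join(evidence),
    )
```

In this sketch a rejected output simply raises `OutputRejected`. Where it goes next, whether a retry or the human queue with the evidence bundle attached, is a policy choice. The `confidence` value is whatever the model reports. How you estimate and calibrate confidence is a separate design decision.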
Decision framework: what to build first
- Map the highest-cost denial paths. Start with the 5-10 denial categories that create the most rework or write-off exposure. That is where automation pays back fastest.
- Normalize your claim and remittance data. Build a canonical billing layer that ties together 837s, 835s, notes, policy references, and work queue events.
- Define allowed model actions. Separate summarize, classify, recommend, draft, and submit into different permission levels.
- Instrument every output. Store prompt version, model version, input references, confidence score, and human override status.
- Design the fallback route. Every AI decision needs a path to a human queue when model confidence is low or the rule engine flags an inconsistency.
That framework keeps scope under control. It also gives product teams a roadmap they can defend internally when operations asks for speed and compliance asks for proof.
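The framework's permission levels, instrumentation, and fallback route can be combined into one small routing step. This is a sketch with assumed values. The `Action` ordering, the `DRAFT` ceiling (the model never submits), the 0.8 confidence floor, and the queue names are placeholders for whatever your own policy defines.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Action(IntEnum):
    """Model actions ordered from least to most consequential (illustrative ordering)."""

    SUMMARIZE = 1
    CLASSIFY = 2
    RECOMMEND = 3
    DRAFT = 4
    SUBMIT = 5


# Assumed policy values, not recommendations: the model may draft but never submit.
MODEL_ACTION_CEILING = Action.DRAFT
CONFIDENCE_FLOOR = 0.8  # placeholder threshold


@dataclass(frozen=True)
class AuditRecord:
    prompt_version: str
    model_version: str
    input_refs: tuple[str, ...]  # claim, remittance, and policy identifiers rather than raw PHI
    action: Action
    confidence: float
    queue: str
    human_override: Optional[bool] = None  # set later when a reviewer accepts or overrides


def route(action: Action, confidence: float, rule_conflicts: list[str]) -> str:
    """Pick the destination queue for a proposed model action."""
    if action > MODEL_ACTION_CEILING:
        return "human_approval"  # above the model's permission level
    if confidence < CONFIDENCE_FLOOR or rule_conflicts:
        return "human_review"  # fallback for low confidence or a rule-engine inconsistency
    return "work_queue"


def propose(
    action: Action,
    confidence: float,
    rule_conflicts: list[str],
    prompt_version: str,
    model_version: str,
    input_refs: list[str],
) -> AuditRecord:
    """Route a model output and capture the fields needed to audit it later."""
    queue = route(action, confidence, rule_conflicts)
    return AuditRecord(prompt_version, model_version, tuple(input_refs), action, confidence, queue)
```

Every proposal produces an `AuditRecord`, whether or not it is routed to a person. That makes it possible to measure how often reviewers override each prompt and model version.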
AST’s engineering model for revenue cycle AI
AST builds these systems with integrated pods, not detached contractors. That matters because billing assistants require product thinking, backend rigor, QA discipline, and HIPAA-aware deployment in the same sprint. We have seen too many teams bolt on an LLM after the data model is already frozen. That usually leads to weak workflows, unclear accountability, and an inbox full of “AI said so” exceptions.
Our team has spent years building healthcare software where the workflow matters as much as the model. In one environment supporting 160+ respiratory care facilities, the lesson was simple: if the input data is inconsistent, no amount of model sophistication will save the workflow. Normalize the data, control the actions, and make every output reviewable.
For a HIPAA-compliant billing assistant, that means we typically pair a secure cloud environment with encrypted storage, least-privilege access, immutable logs, and environment separation for development, testing, and production. We also put QA around denial classification and appeal generation because billing bugs are not cosmetic bugs — they become cash flow problems.
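Immutable logs are usually enforced at the storage layer, but the application can also make its own audit trail tamper-evident. One common technique is a hash chain, shown here as a minimal in-memory sketch rather than a production design. Each entry's SHA-256 hash covers the previous entry's hash plus the entry's own canonical JSON, so editing an earlier event invalidates every hash after it. The `HashChainedLog` class and the event fields are hypothetical, and the events carry identifiers rather than PHI.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64


def _canonical(event: dict) -> str:
    # Stable serialization so the same event always hashes the same way.
    return json.dumps(event, sort_keys=True, separators=(",", ":"))


def _link(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class HashChainedLog:
    """Append-only in-memory log where each entry commits to the one before it."""

    entries: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        payload = _canonical(event)
        digest = _link(prev_hash, payload)
        # Store a deep copy so later mutation of the caller's dict cannot silently break the chain.
        self.entries.append({"event": json.loads(payload), "prev_hash": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link. Edited, reordered, or mid-chain deleted entries fail;
        truncation of the newest entries is not detected here."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _link(prev_hash, _canonical(entry["event"])):
                return False
            prev_hash = entry["hash"]
        return True
```

A hash chain only detects tampering; it does not prevent it. Someone with write access could rewrite the whole chain, and `verify()` alone does not catch removal of the newest entries. For that reason the latest hash is typically anchored somewhere the application cannot modify, such as write-once storage.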
Build the assistant around trust, not novelty
If your team is serious about a billing assistant that can read EOBs, flag coding errors, and route appeals, the architecture has to reflect the realities of healthcare operations. That means structured claim data first, LLMs second, and auditability everywhere. It also means designing the workflow so finance, compliance, and operations can all live with the output.
We build to that standard because healthcare products do not get judged on demo day. They get judged when a denial is disputed, an appeal is filed, and someone asks who changed what, when, and why.
Need a billing assistant that can explain every decision?
We have built healthcare software where audit trails, denial workflows, and secure cloud controls are part of the product architecture from the start. If you are trying to ship an AI billing assistant without creating compliance debt, book a free 15-minute discovery call — no pitch, just straight answers from engineers who have done this.


