Transparency is core to our mission. Here's exactly how we collect, process, and analyze Medicare data.
All data comes from the CMS Medicare Provider Utilization and Payment Data, published by the Centers for Medicare & Medicaid Services. This is the same data the federal government uses to track Medicare spending.
Each record includes: provider name and NPI, specialty, location, HCPCS procedure code, number of services, number of unique beneficiaries, submitted charges, Medicare allowed amount, and Medicare payment amount. We aggregate each provider's records across all available years to build provider profiles.
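The aggregation step can be pictured with a short sketch. This is an illustration under stated assumptions, not the project's actual pipeline: the field names (`npi`, `year`, `services`, `beneficiaries`, `payment`) and the `build_profiles` helper are hypothetical and do not mirror the CMS column names.

```python
from collections import defaultdict


def build_profiles(records):
    """Sum per-record figures into one profile per NPI across all years.

    Field names are hypothetical stand-ins for the CMS columns.
    """
    profiles = defaultdict(
        lambda: {"services": 0, "beneficiaries": 0, "payment": 0.0, "years": set()}
    )
    for rec in records:
        profile = profiles[rec["npi"]]
        profile["services"] += rec["services"]
        # Beneficiaries are unique per record only, so this sum can count the
        # same patient more than once across codes and years.
        profile["beneficiaries"] += rec["beneficiaries"]
        profile["payment"] += rec["payment"]
        profile["years"].add(rec["year"])
    return dict(profiles)


# Made-up sample rows for one provider across two years.
sample = [
    {"npi": "1234567890", "year": 2021, "services": 100, "beneficiaries": 40, "payment": 1000.0},
    {"npi": "1234567890", "year": 2022, "services": 50, "beneficiaries": 30, "payment": 2500.0},
]
profiles = build_profiles(sample)
```

Because beneficiary counts are unique only within a single record, the summed figure is better read as an upper bound than as a count of distinct patients.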
Our risk score (0–100) identifies statistical outliers who may warrant further investigation. It is not a fraud determination. The score combines several signals:
How far is a provider from the median in their own specialty? A cardiologist billing 50x the median cardiologist stands out more than one billing 2x. We compare total payments, services, and beneficiary counts against specialty peers.
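A minimal sketch of the peer comparison, assuming each provider profile is a dictionary with hypothetical `npi`, `specialty`, and `payment` keys. The same function would apply to service or beneficiary totals by changing `metric`.

```python
from statistics import median


def specialty_ratios(providers, metric="payment"):
    """Return {npi: value / specialty median} for the chosen metric."""
    by_specialty = {}
    for p in providers:
        by_specialty.setdefault(p["specialty"], []).append(p[metric])
    medians = {name: median(values) for name, values in by_specialty.items()}
    return {
        p["npi"]: p[metric] / medians[p["specialty"]]
        for p in providers
        if medians[p["specialty"]] > 0  # skip specialties with a zero median
    }


# Made-up figures: provider C bills 25x the specialty median.
providers = [
    {"npi": "A", "specialty": "Cardiology", "payment": 100_000.0},
    {"npi": "B", "specialty": "Cardiology", "payment": 200_000.0},
    {"npi": "C", "specialty": "Cardiology", "payment": 5_000_000.0},
]
ratios = specialty_ratios(providers)
```

The median is used rather than the mean because a single extreme biller would pull the mean upward and make every other provider in the specialty look closer to typical.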
The ratio of what a provider submits (charges) to what Medicare actually pays. While some markup is normal, extreme ratios (10x, 25x, or higher) can indicate billing anomalies. We compare each provider's markup to their specialty average.
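As a rough sketch of the arithmetic, the markup is submitted charges divided by Medicare payments, and the comparison divides that by the specialty's average markup. The function names are hypothetical. Whether the specialty average includes the provider being scored is not stated, so this version takes the peer list as given.

```python
def markup_ratio(submitted_charges, medicare_payment):
    """Submitted charges divided by Medicare payments, or None if undefined."""
    if medicare_payment <= 0:
        return None
    return submitted_charges / medicare_payment


def relative_markup(provider_markup, peer_markups):
    """Provider markup as a multiple of the peer (specialty) average markup."""
    average = sum(peer_markups) / len(peer_markups)
    return provider_markup / average


# Made-up figures: $500,000 submitted against $20,000 paid is a 25x markup.
markup = markup_ratio(500_000.0, 20_000.0)
# Peers average a 5x markup, so this provider sits at 5 times the average.
relative = relative_markup(markup, [2.0, 3.0, 5.0, 10.0])
```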
Unusually high service volumes or beneficiary counts relative to peers. A single provider seeing more patients than seems physically possible is a meaningful signal.
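One simple way to make "physically possible" concrete is to convert a service total into an average per working day. The sketch below assumes 250 working days per year, an illustrative figure not taken from our data, and the function name is hypothetical.

```python
def services_per_working_day(total_services, years, working_days_per_year=250):
    """Average billed services per working day over the covered years.

    The default of 250 working days per year is an assumption.
    """
    if years <= 0 or working_days_per_year <= 0:
        raise ValueError("years and working_days_per_year must be positive")
    return total_services / (years * working_days_per_year)


# Made-up figures: 100,000 services over 2 years is 200 per working day.
rate = services_per_working_day(100_000, years=2)
```

A high figure is a prompt for review rather than proof of anything, since one visit can generate several billed services.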
Specific billing patterns associated with known fraud schemes: excessive COVID testing concentration, wound care billing anomalies, high-cost drug administration patterns, and others.
Individual signals are weighted and combined using logarithmic scaling to produce a final score from 0–100. Logarithmic scaling compresses large ratios, so extreme outliers remain clearly distinguished while moderate variations contribute little and are less likely to trigger false alarms.
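The combination step might look something like the sketch below. The weights, the cap of 100x, the signal names, and the choice of base-10 logarithm are all assumptions made for illustration; the actual weights and scaling constants are not given here.

```python
import math


def log_scaled(ratio, cap=100.0):
    """Map a peer ratio onto 0..1 on a log scale.

    Ratios at or below 1x score 0; ratios at or above `cap` score 1.
    The cap of 100x is an assumed value.
    """
    if ratio <= 1.0:
        return 0.0
    return min(math.log10(ratio) / math.log10(cap), 1.0)


def risk_score(signals, weights):
    """Weighted average of log-scaled signals, expressed on a 0-100 scale."""
    total_weight = sum(weights.values())
    combined = sum(weights[name] * log_scaled(signals[name]) for name in weights)
    return round(100 * combined / total_weight, 1)


# Hypothetical signals and weights for one provider.
signals = {"payment_ratio": 10.0, "markup_ratio": 100.0, "volume_ratio": 1.0}
weights = {"payment_ratio": 4, "markup_ratio": 3, "volume_ratio": 3}
score = risk_score(signals, weights)
```

Under these assumptions a 2x ratio contributes about 0.15 of a signal's maximum while a 50x ratio contributes about 0.85, which is the behavior the logarithm is meant to provide: moderate deviations barely move the score, and extreme ones dominate it.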
⚠️ A high risk score is not an accusation of fraud.
It identifies statistical outliers — providers whose billing patterns differ significantly from their peers. There may be legitimate reasons: a specialist handling unusually complex cases, a provider in an underserved area seeing more patients, or data reporting differences.
Only proper investigation by qualified authorities (CMS, OIG, law enforcement) can determine whether actual fraud has occurred.
In addition to our statistical approach, we developed a supervised machine learning model trained on providers with documented enforcement outcomes: those who have been indicted by the DOJ, excluded by the HHS OIG (LEIE database), or who settled False Claims Act cases.
The model learns which billing patterns these providers share (volume anomalies, markup ratios, specialty concentration, geographic signals, beneficiary patterns), then scores every active provider on how closely they resemble that group. The result is a "fraud match probability": not proof of fraud, but a statistical measure of resemblance to providers who have faced enforcement action.
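To illustrate the general idea of supervised scoring, here is a toy logistic regression written in plain Python. It is not our model: the algorithm is not specified above, and logistic regression is swapped in only because it is the simplest classifier that outputs a probability. The two features (log-scaled payment and markup ratios), the training rows, and the hyperparameters are all invented for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent."""
    n, n_features = len(rows), len(rows[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def match_probability(x, w, b):
    """Probability that feature vector x resembles the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Invented training data: label 1 marks a provider with an enforcement outcome.
rows = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.3], [1.2, 1.0], [1.5, 1.4], [1.0, 1.3]]
labels = [0, 0, 0, 1, 1, 1]
w, b = train(rows, labels)
p_high = match_probability([1.4, 1.2], w, b)  # resembles the labeled group
p_low = match_probability([0.1, 0.1], w, b)   # resembles typical providers
```

Real enforcement data is far more imbalanced than this toy set, with labeled providers making up a tiny fraction of all providers, so a production model would need to handle that imbalance and be evaluated on held-out cases.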
Our earlier statistical analysis flagged providers before the DOJ charged them, which supports this approach. The supervised model builds on it by learning directly from the ground truth of known enforcement outcomes.
Results are available on our "Still Out There" page.
This data is valuable for:
We do not label any provider as "fraudulent." We identify statistical outliers and present the data transparently so that qualified parties can draw informed conclusions. Provider names are included because this is public data published by the federal government, but we encourage users to consider context before drawing conclusions.
Note: All data is from publicly available Medicare records. OpenMedicare is an independent journalism project not affiliated with CMS.