Flagship Investigation

The Algorithm Knows

How AI Trained on 8,300 Convicted Fraudsters Found 500 Doctors Who Look Just Like Them

February 21, 2026

By OpenMedicare Investigative Team

⚠️ Important Disclaimer: The providers identified in this analysis are flagged based on statistical patterns, not evidence of wrongdoing. A high fraud probability score means a provider's billing patterns are mathematically similar to those of convicted fraudsters. There may be entirely legitimate explanations — large group practices, complex patient populations, specialty billing patterns, or data aggregation artifacts. No provider named here has been accused or charged with any crime unless otherwise noted. We present this analysis because taxpayers deserve to know how public money is being spent.

We fed 10 years of Medicare billing data (2014 to 2023, more than 1.7 million providers) into a machine learning model built from 8,300+ providers who were excluded from federal health programs or convicted of fraud, 2,198 of whom had enough billing history to train on. Then we asked it a simple question: Who else bills like a criminal?

The algorithm returned 500 names.

Five hundred active Medicare providers whose billing patterns closely match those of doctors, nurse practitioners, and clinics that were later convicted of healthcare fraud, excluded from federal programs, or sentenced to prison. Collectively, these 500 providers have received over $400 million in Medicare payments, an average of $800,000 each.

This doesn't mean they're guilty. It means the math says they look exactly like people who were.

How We Built It

Our model is a Random Forest classifier: an ensemble of hundreds of decision trees, each trained on a random sample of the data and therefore learning slightly different patterns. Think of it as a panel of hundreds of fraud investigators, each with a slightly different perspective, voting on whether a provider's billing looks suspicious. The share of trees that vote yes becomes the provider's fraud probability.
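To make the voting idea concrete, here is a minimal sketch in Python, assuming a toy forest of decision stumps (one-split trees) rather than the deeper trees a production Random Forest grows. Each stump is trained on a bootstrap sample of rows and a random subset of features, and the fraud probability is the share of stumps that vote yes. The three feature names and all data values are made up for illustration, and this is not OpenMedicare's actual model. Libraries such as scikit-learn average per-tree probabilities instead of counting hard votes, so their numbers can differ slightly.

```python
import random

# Toy random forest: bagged decision stumps with random feature subsets.
# A simplified illustration, not the model used in this analysis.

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 = perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """1 if most labels are 1 (fraud-like), else 0; an empty side votes 0."""
    return 1 if labels and 2 * sum(labels) > len(labels) else 0

def fit_stump(rows, labels, feature_ids):
    """Find the single (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    _, feat, cut, left_vote, right_vote = best
    return lambda row: left_vote if row[feat] <= cut else right_vote

def fit_forest(rows, labels, n_trees=200, max_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # rows drawn with replacement
        feats = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def fraud_probability(forest, row):
    """Share of trees voting 'bills like a convicted fraudster'."""
    return sum(tree(row) for tree in forest) / len(forest)

# Made-up providers: [services_per_beneficiary, markup_ratio, years_active]
rows = [
    [12, 3.5, 2], [10, 4.0, 3], [15, 3.0, 1], [9, 4.5, 2],  # label 1: excluded/convicted
    [3, 1.8, 15], [4, 1.5, 12], [2, 2.0, 20], [3, 1.6, 9],  # label 0: everyone else
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(rows, labels)
print(fraud_probability(forest, [15, 4.5, 1]))  # looks like the label-1 rows
print(fraud_probability(forest, [2, 1.5, 20]))  # looks like the label-0 rows
```

Swapping the stumps for full-depth trees changes how finely each tree can carve up the data, but the aggregation step (many slightly different models, one vote share) stays the same.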

🔬 Training Data

Confirmed Fraudsters (Labels)

• 8,301 NPIs from the HHS OIG LEIE exclusion list
• DOJ healthcare fraud prosecution records
• Cross-referenced with CMS billing data
• 2,198 had sufficient billing history for training

Billing Features (Inputs)

• Total billing volume & payment amounts
• Markup ratios (charged vs. allowed)
• Services per beneficiary
• Specialty-specific patterns
• Geographic concentration
• Years of active billing

Data sources: CMS Medicare Provider Utilization and Payment Data (2014–2023; 1.7M+ providers scored), the HHS OIG LEIE database, and DOJ press releases and case records.
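The label side of the training data can be sketched roughly as follows, under assumptions the article does not spell out: every provider with enough billing history gets a label, 1 if its NPI is on the exclusion list and 0 otherwise, and the two-year minimum is a hypothetical cutoff. Excluded NPIs that never appear in the billing files drop out automatically, which is one way a list of 8,301 NPIs could shrink to 2,198 usable cases. The function name and NPIs below are made up.

```python
from typing import Dict, Set

def build_labels(excluded_npis: Set[str],
                 billing_years: Dict[str, Set[int]],
                 min_years: int = 2) -> Dict[str, int]:
    """Label each provider with enough history: 1 if excluded, else 0.

    min_years is a hypothetical "sufficient billing history" cutoff.
    Excluded NPIs absent from the billing data never enter the result.
    """
    labels = {}
    for npi, years in billing_years.items():
        if len(years) < min_years:
            continue  # too little history to compute billing features
        labels[npi] = 1 if npi in excluded_npis else 0
    return labels

# Hypothetical NPIs and billing years, for illustration only.
excluded = {"1000000001", "1000000002", "1000000003"}
billing = {
    "1000000001": {2015, 2016},            # excluded, 2 years -> kept as 1
    "1000000002": {2019},                  # excluded, 1 year -> dropped
    "1000000004": set(range(2014, 2024)),  # never excluded -> kept as 0
    "1000000005": {2020, 2021},            # never excluded -> kept as 0
}                                          # "1000000003" never billed at all
labels = build_labels(excluded, billing)
print(labels)
```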

The key insight: we don't try to detect fraud directly. We trained the model on the billing patterns of people who were convicted of fraud, then asked it to find active providers who match those patterns. It's the difference between trying to define fraud in the abstract vs. learning what fraud actually looks like from 2,198 real cases.
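The last step, from a trained model to a list of names, might look like this minimal sketch. It assumes the model has already produced a fraud probability for every provider, removes anyone already on the exclusion list, and keeps the highest-scoring N. Whether the real cut is a fixed top 500 or a probability threshold is an assumption here, and the function name and scores are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

def flag_top_matches(scores: Dict[str, float],
                     known_fraud_npis: Set[str],
                     n: int = 500) -> List[Tuple[str, float]]:
    """Return the n highest fraud probabilities among providers not already excluded."""
    candidates = ((npi, p) for npi, p in scores.items() if npi not in known_fraud_npis)
    # nlargest keeps only n items in memory, which matters with ~1.7M providers.
    return heapq.nlargest(n, candidates, key=lambda item: item[1])

# Hypothetical model scores keyed by made-up provider IDs.
scores = {"A": 0.96, "B": 0.40, "C": 0.99, "D": 0.95, "E": 0.12}
print(flag_top_matches(scores, known_fraud_npis={"C"}, n=2))
```

Provider "C" scores highest but is skipped because it is already a known case; the flagged list contains only providers the model was not told about.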

What the Algorithm Learned

When you train a model on thousands of convicted fraudsters, certain patterns emerge. Here are the features the model found most predictive, ranked by importance:

Years Active (16.3%)

Fraudsters tend to have shorter careers — they get caught, lose their license, or move on to a new scheme.

Services per Beneficiary (11.9%)

How many services a provider bills per patient. Padding patient visits with unnecessary tests and procedures is the oldest play in the fraud playbook.

Markup Ratio (8%)

The ratio of what a provider charges vs. what Medicare allows. Legitimate providers typically charge 1.5–2x. Fraudsters routinely charge 3–5x, knowing Medicare will pay the allowed amount regardless.

Total Services (7.8%)

Raw billing volume. More services = more revenue = more opportunity for abuse.

Total Beneficiaries (7.2%)

Patient count. Extremely high patient counts can indicate "patient mills" — clinics designed to churn through Medicare beneficiaries.

Payment per Service (6.5%)

Higher average payments suggest upcoding — billing for more expensive procedures than were actually performed.

Geographic Concentration (5.1%)

Fraud clusters geographically. South Florida, Houston, Los Angeles, and Detroit are historical hotspots.
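Several of these features are ratios of provider-level totals. A minimal sketch, assuming each provider's billing has already been summed into the hypothetical fields below (our names, not CMS column names) and taking the markup ratio as submitted charges divided by the Medicare-allowed amount, as described above:

```python
from dataclasses import dataclass

@dataclass
class ProviderTotals:
    """Hypothetical per-provider aggregates; field names are illustrative."""
    total_services: float
    total_beneficiaries: int
    submitted_charges: float   # what the provider charged
    allowed_amount: float      # what Medicare allowed
    medicare_payments: float   # what Medicare actually paid

def safe_div(num: float, den: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return num / den if den else 0.0

def ratio_features(t: ProviderTotals) -> dict:
    return {
        "services_per_beneficiary": safe_div(t.total_services, t.total_beneficiaries),
        "markup_ratio": safe_div(t.submitted_charges, t.allowed_amount),
        "payment_per_service": safe_div(t.medicare_payments, t.total_services),
    }

example = ProviderTotals(total_services=5_000, total_beneficiaries=500,
                         submitted_charges=900_000, allowed_amount=300_000,
                         medicare_payments=240_000)
print(ratio_features(example))
# {'services_per_beneficiary': 10.0, 'markup_ratio': 3.0, 'payment_per_service': 48.0}
```

Ratios like these let a small practice and a large clinic be compared on the same scale, which raw totals such as Total Services cannot do.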

The #1 predictor surprised us: years active. Fraudsters don't tend to build long careers. They enter the system, bill aggressively for a few years, and either get caught, move to a different NPI, or flee. A provider who's been billing Medicare for 15+ years is statistically less likely to match fraud patterns — not because fraud doesn't exist among veterans, but because the convicted ones rarely lasted that long.
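Years active can be defined several ways. One plausible version, assumed here and not confirmed by the article, counts distinct calendar years with any billing inside the 2014–2023 data window. Under this definition the count tops out at 10, so identifying 15+ year careers would require billing or enrollment history from before 2014. The function name is hypothetical.

```python
def years_active(billing_years, first=2014, last=2023):
    """Count distinct years with any Medicare billing inside the data window."""
    return len({y for y in billing_years if first <= y <= last})

print(years_active([2012, 2015, 2016, 2016, 2023, 2024]))  # 3
print(years_active(range(2000, 2030)))  # 10: the window caps the count
```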

The #2 predictor — services per beneficiary — is the classic fraud signal. Legitimate internists might bill 2–4 services per patient visit. Fraud mills bill 8, 10, 15 services per encounter — stacking labs, tests, and procedures onto every patient who walks through the door.

The Top 10 Flagged Providers

These are the 10 active Medicare providers whose billing patterns most closely match those of convicted fraudsters. Each link goes to their full provider profile with detailed billing data.

Rank	Provider	State	Specialty	Fraud Prob.	Medicare Payments	Note
1	Ramesh Thimmiah	WV	Internal Medicine	95.9%	$788.7K	Billing pattern nearly identical to convicted fraudsters in same specialty
2	Willie Lucas	MS	Internal Medicine	95.5%	$1.0M	7-figure billing with high services per patient
3	Michael Cozzi	IN	Anesthesiology	94.3%	$1.7M	Highest markup ratio in top 10: nearly 3x submitted vs. allowed
4	Frank Leung	IL	Endocrinology	95.7%	$601.8K	Rare specialty flag; endocrinologists are uncommon on fraud lists
5	John Daconti	NJ	Internal Medicine	94.9%	$547.0K	New Jersey: one of the top 5 states for healthcare fraud prosecutions
6	Tuan Duong	CA	Internal Medicine	95.6%	$516.5K	California: tied #1 for most flagged providers by state
7	Lilia Gorovits	PA	Internal Medicine	94.6%	$716.7K	Consistent volume-driven billing pattern across all years
8	Sudhirkumar Shah	MO	Internal Medicine	95.3%	$724.7K	High services per beneficiary in a lower-volume state
9	Michael Hernandez	FL	Internal Medicine	95.1%	$1.2M	Florida: tied #1 for most flagged providers; $1.2M in payments
10	Edd Jones	GA	Family Practice	94.7%	$795.8K	Only Family Practice provider in top 10

Click any provider name to see their full billing history, year-by-year breakdown, and top procedures.

The Specialty Pattern: Why Internal Medicine Dominates

Of the 500 flagged providers, the specialty breakdown is striking:

• Internal Medicine: 263 (52.6%)
• Family Practice: 135 (27.0%)
• Other: 102 (20.4%)

Internal Medicine + Family Practice = 79.6% of all flagged providers. Why? These are the highest-volume primary care specialties. They see the most patients, order the most tests, and bill the most line items. That volume creates opportunity — more claims mean more chances to pad, upcode, or bill for services never rendered. It's the same reason bank robbers target banks with the most cash.

This doesn't mean internists are inherently more fraudulent. It means the model learned that fraud patterns concentrate where billing volume is highest — which is exactly what DOJ prosecution records show.

The Geography of Suspicion

The top five states account for 220 of the 500 flagged providers (44%).

California and Florida are tied at 56 flagged providers each. Then New York (39), Texas (36), and New Jersey (33). If that list looks familiar, it should — these are the exact same states that dominate DOJ healthcare fraud takedowns year after year.
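The state figures are easy to check. This short sketch uses only the five counts reported above and confirms both the 220-provider total and the California/Florida tie:

```python
# Flag counts for the five leading states, as reported in this article.
state_flags = {"CA": 56, "FL": 56, "NY": 39, "TX": 36, "NJ": 33}
total_flagged = 500

top5_total = sum(state_flags.values())
top5_share = top5_total / total_flagged
top_count = max(state_flags.values())
leaders = [state for state, count in state_flags.items() if count == top_count]

print(top5_total, f"{top5_share:.0%}", leaders)  # 220 44% ['CA', 'FL']
```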

South Florida alone has been the target of more Medicare fraud strike forces than any other region in the country. Our algorithm didn't know that. It simply learned from the data that providers in these states are more likely to match convicted fraud patterns. The geography of fraud is not random.

The Money

• $400M: total Medicare payments to flagged providers
• $800K: average per provider
• 500: providers flagged

Four hundred million dollars. That's the total in Medicare payments that flowed to these 500 providers over the past decade. To put that in perspective: the entire HHS OIG budget for investigating Medicare fraud is roughly $400M per year. In other words, a sum equal to a full year of the OIG's budget went to just 500 providers whose billing patterns match convicted criminals.

Not all of that money is fraudulent, and probably not even most of it. But if even 10% represents waste, fraud, or abuse, that's $40 million in taxpayer money: enough to pay 800 nurses $50,000 each for a year.
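The arithmetic in this section can be checked in a few lines. The sketch uses whole-dollar integers so no floating-point rounding creeps in; the last figure shows that the nurse comparison implies a cost of $50,000 per nurse per year, a number readers can swap for their own salary assumption.

```python
# Figures from this section, in whole dollars (integer math avoids rounding noise).
total_payments = 400_000_000
providers = 500
waste_percent = 10   # the hypothetical "even 10%" scenario
nurses = 800

average = total_payments // providers              # per-provider average
at_risk = total_payments * waste_percent // 100    # dollars in the 10% scenario
implied_cost_per_nurse = at_risk // nurses         # what each nurse would cost

print(f"${average:,} ${at_risk:,} ${implied_cost_per_nurse:,}")
# $800,000 $40,000,000 $50,000
```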

Does It Actually Work?

Yes. Our model achieves an AUC of 0.83. That means if you pick one convicted fraudster and one legitimate provider at random, the model gives the fraudster the higher fraud score 83% of the time. That's not perfect, but it's significantly better than random chance (0.50) and comparable to models used in financial fraud detection.
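AUC is easiest to understand as a pairwise ranking test: pair every known fraud case with every other provider, and count how often the model scores the fraud case higher, with ties counting half. A minimal sketch with made-up scores (the function name is ours):

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up scores for three convicted providers and four others.
fraud = [0.9, 0.8, 0.4]
others = [0.7, 0.3, 0.2, 0.1]
print(auc(fraud, others))  # 11 of 12 pairs ranked correctly -> 0.9166...
```

This double loop checks every pair, which is fine for a handful of scores but would mean billions of comparisons for 2,198 fraud cases against 1.7 million providers. The equivalent rank-sum (Mann-Whitney U) calculation gets the same number after a single sort.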

More importantly: we've already seen it work in the real world. Several providers flagged by earlier versions of our model were subsequently charged by the DOJ — months or years after our analysis identified them. We documented these cases in “Our Data Predicted It”.

The model doesn't know why a provider bills the way they do. It only knows that the pattern matches. But when you train on 2,198 confirmed cases and the math keeps being right, the patterns deserve scrutiny.

⚠️ What This Is — And What It Isn't

This is a statistical analysis, not an investigation by law enforcement. A high fraud probability means a provider's billing patterns are mathematically similar to convicted fraudsters. It does not mean they have committed fraud.

There are many legitimate reasons a provider might trigger our model: they may treat unusually complex patients, operate a high-volume clinic, practice in an underserved area, or bill under a group NPI that aggregates multiple providers' claims.

No provider named in this article has been accused or charged with any crime based on this analysis. If you believe you have information about Medicare fraud, contact the HHS OIG Hotline.

Explore the Data Yourself

We believe in transparency. Every number in this article is derived from publicly available data: CMS billing records, the HHS OIG exclusion list, and DOJ case records. You can verify it, challenge it, or build on it.

🔍 Full Searchable List

All 500 flagged providers — searchable, sortable, with fraud probability scores and billing details.

📞 Report Fraud

If you have information about Medicare fraud, waste, or abuse — the HHS OIG hotline is your next step.

Data Sources

• CMS Medicare Provider Utilization and Payment Data (2014–2023)
• HHS Office of Inspector General — List of Excluded Individuals/Entities (LEIE)
• Department of Justice — Healthcare Fraud Prosecution Records
• OpenMedicare ML Model v2.0 (Random Forest, AUC 0.83)

Note: All data is from publicly available Medicare records. OpenMedicare is an independent journalism project not affiliated with CMS.
