The Algorithm Knows
How AI Trained on 8,300 Convicted Fraudsters Found 500 Doctors Who Look Just Like Them
⚠️ Important Disclaimer: The providers identified in this analysis are flagged based on statistical patterns, not evidence of wrongdoing. A high fraud probability score means a provider's billing patterns are mathematically similar to those of convicted fraudsters. There may be entirely legitimate explanations — large group practices, complex patient populations, specialty billing patterns, or data aggregation artifacts. No provider named here has been accused or charged with any crime unless otherwise noted. We present this analysis because taxpayers deserve to know how public money is being spent.
We fed 10 years of Medicare billing data (every claim, every dollar, every provider) into a machine learning model whose fraud labels come from 8,300+ excluded or prosecuted providers, 2,198 of whom had enough billing history to train on. Then we asked it a simple question: Who else bills like a criminal?
The algorithm returned 500 names.
Five hundred active Medicare providers whose billing patterns closely match those of doctors, nurse practitioners, and clinics that were later convicted of healthcare fraud, excluded from federal programs, or sentenced to prison. Collectively, these 500 providers have billed Medicare over $400 million in taxpayer money, an average of about $800,000 each.
This doesn't mean they're guilty. It means the math says they look exactly like people who were.
How We Built It
Our model is a Random Forest classifier: an ensemble of hundreds of decision trees, each one learning slightly different patterns in the data. Think of it as 500 fraud investigators, each with a slightly different perspective, voting on whether a provider's billing looks suspicious. The larger the share of investigators who vote yes, the higher the provider's fraud probability.
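The voting idea can be illustrated with a toy example. This is only a sketch: the five hand-written rules, their thresholds, and the feature names are hypothetical stand-ins, whereas a real Random Forest learns its trees from the training data.

```python
from typing import Callable, Dict, List

Provider = Dict[str, float]
Tree = Callable[[Provider], int]  # each "tree" votes 1 (suspicious) or 0

# Hypothetical hand-written rules standing in for learned decision trees.
trees: List[Tree] = [
    lambda p: int(p["services_per_beneficiary"] > 8),
    lambda p: int(p["markup_ratio"] > 3.0),
    lambda p: int(p["years_active"] < 5),
    lambda p: int(p["services_per_beneficiary"] > 6 and p["years_active"] < 8),
    lambda p: int(p["markup_ratio"] > 2.5 and p["services_per_beneficiary"] > 10),
]


def fraud_probability(provider: Provider, ensemble: List[Tree]) -> float:
    """Return the share of trees in the ensemble that vote 'suspicious'."""
    votes = [tree(provider) for tree in ensemble]
    return sum(votes) / len(votes)


# Illustrative provider, not real data.
example = {"services_per_beneficiary": 9.0, "markup_ratio": 2.0, "years_active": 10.0}
example_score = fraud_probability(example, trees)  # one of five trees votes yes
```

With more trees, the vote share becomes a finer-grained score, which is why the model's output reads as a probability and not a yes/no verdict.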
🔬 Training Data
Confirmed Fraudsters (Labels)
- 8,301 NPIs from the HHS OIG LEIE exclusion list
- DOJ healthcare fraud prosecution records
- Cross-referenced with CMS billing data
- 2,198 had sufficient billing history for training
Billing Features (Inputs)
- Total billing volume & payment amounts
- Markup ratios (charged vs. allowed)
- Services per beneficiary
- Specialty-specific patterns
- Geographic concentration
- Years of active billing
Data source: CMS Medicare Provider Utilization and Payment Data (2014–2023), 1.7M+ providers scored. HHS OIG LEIE database. DOJ press releases and case records.
The key insight: we don't try to detect fraud directly. We trained the model on the billing patterns of people who were convicted of fraud, then asked it to find active providers who match those patterns. It's the difference between trying to define fraud in the abstract vs. learning what fraud actually looks like from 2,198 real cases.
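One way to picture the labeling step is as a join between the exclusion list and the billing data, followed by a filter on billing history. The sketch below is an assumption about how that might look: the function name, the `MIN_YEARS` cutoff, and the toy NPIs are all hypothetical, and the article does not state what counts as "sufficient" history.

```python
from typing import Dict, Set

MIN_YEARS = 2  # hypothetical cutoff; the actual threshold is not stated


def build_labels(
    excluded_npis: Set[str],
    years_billed: Dict[str, int],
    min_years: int = MIN_YEARS,
) -> Dict[str, int]:
    """Map each NPI with enough billing history to 1 (excluded) or 0 (not)."""
    labels: Dict[str, int] = {}
    for npi, years in years_billed.items():
        if years < min_years:
            continue  # too little history to learn a billing pattern from
        labels[npi] = 1 if npi in excluded_npis else 0
    return labels


# Toy inputs, not real NPIs.
excluded = {"A", "B", "C"}                      # on the exclusion list
history = {"A": 5, "B": 1, "D": 7, "E": 3}      # years of billing per NPI
labels = build_labels(excluded, history)
```

Here "B" is dropped for lack of history and "C" never appears in the billing data, which mirrors how 8,301 listed NPIs could shrink to 2,198 usable training cases.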
What the Algorithm Learned
When you train a model on thousands of convicted fraudsters, certain patterns emerge. Here are the features the model found most predictive, ranked by importance:
- Years active: Fraudsters tend to have shorter careers. They get caught, lose their license, or move on to a new scheme.
- Services per beneficiary: How many services a provider bills per patient. Padding patient visits with unnecessary tests and procedures is the oldest fraud playbook.
- Markup ratio: The ratio of what a provider charges vs. what Medicare allows. Legitimate providers typically charge 1.5–2x. Fraudsters routinely charge 3–5x, knowing Medicare will pay the allowed amount regardless.
- Billing volume: Raw billing volume. More services = more revenue = more opportunity for abuse.
- Patient count: Extremely high patient counts can indicate "patient mills," clinics designed to churn through Medicare beneficiaries.
- Average payment: Higher average payments suggest upcoding, that is, billing for more expensive procedures than were actually performed.
- Geography: Fraud clusters geographically. South Florida, Houston, Los Angeles, and Detroit are historical hotspots.
The #1 predictor surprised us: years active. Fraudsters don't tend to build long careers. They enter the system, bill aggressively for a few years, and either get caught, move to a different NPI, or flee. A provider who's been billing Medicare for 15+ years is statistically less likely to match fraud patterns — not because fraud doesn't exist among veterans, but because the convicted ones rarely lasted that long.
The #2 predictor — services per beneficiary — is the classic fraud signal. Legitimate internists might bill 2–4 services per patient visit. Fraud mills bill 8, 10, 15 services per encounter — stacking labs, tests, and procedures onto every patient who walks through the door.
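The ratio features described in this section are simple to compute from yearly billing totals. The sketch below assumes a hypothetical record layout; the field names are illustrative and are not the actual CMS column names, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProviderYear:
    # Hypothetical field names for one provider's totals in one year.
    npi: str
    year: int
    total_services: float
    total_beneficiaries: float
    submitted_charges: float
    allowed_amount: float


def billing_features(rows: List[ProviderYear]) -> Dict[str, float]:
    """Aggregate one provider's yearly rows into ratio features."""
    services = sum(r.total_services for r in rows)
    # Summing yearly beneficiary counts counts returning patients once per year.
    beneficiaries = sum(r.total_beneficiaries for r in rows)
    charged = sum(r.submitted_charges for r in rows)
    allowed = sum(r.allowed_amount for r in rows)
    return {
        "years_active": float(len({r.year for r in rows})),
        "services_per_beneficiary": services / beneficiaries if beneficiaries else 0.0,
        "markup_ratio": charged / allowed if allowed else 0.0,
    }


# Made-up example rows.
rows = [
    ProviderYear("123", 2022, 1000, 200, 300_000.0, 100_000.0),
    ProviderYear("123", 2023, 500, 100, 150_000.0, 50_000.0),
]
features = billing_features(rows)
```

In this example the provider charges three times the allowed amount and bills five services per beneficiary over two active years.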
The Top 10 Flagged Providers
These are the 10 active Medicare providers whose billing patterns most closely match those of convicted fraudsters. Each link goes to their full provider profile with detailed billing data.
| Rank | Provider | Note | State | Specialty | Fraud Prob. | Medicare Payments |
|---|---|---|---|---|---|---|
| 1 | Ramesh Thimmiah | Billing pattern nearly identical to convicted fraudsters in same specialty | WV | Internal Medicine | 95.9% | $788.7K |
| 2 | Willie Lucas | 7-figure billing with high services per patient | MS | Internal Medicine | 95.5% | $1.0M |
| 3 | Michael Cozzi | Highest markup ratio in top 10: nearly 3x submitted vs. allowed | IN | Anesthesiology | 94.3% | $1.7M |
| 4 | Frank Leung | Rare specialty flag: endocrinologists are uncommon on fraud lists | IL | Endocrinology | 95.7% | $601.8K |
| 5 | John Daconti | New Jersey: one of the top 5 states for healthcare fraud prosecutions | NJ | Internal Medicine | 94.9% | $547.0K |
| 6 | Tuan Duong | California: tied #1 for most flagged providers by state | CA | Internal Medicine | 95.6% | $516.5K |
| 7 | Lilia Gorovits | Consistent volume-driven billing pattern across all years | PA | Internal Medicine | 94.6% | $716.7K |
| 8 | Sudhirkumar Shah | High services per beneficiary in a lower-volume state | MO | Internal Medicine | 95.3% | $724.7K |
| 9 | Michael Hernandez | Florida: tied #1 for most flagged providers, $1.2M in payments | FL | Internal Medicine | 95.1% | $1.2M |
| 10 | Edd Jones | Only Family Practice provider in top 10 | GA | Family Practice | 94.7% | $795.8K |
The Specialty Pattern: Why Internal Medicine Dominates
Of the 500 flagged providers, the specialty breakdown is striking:
Internal Medicine + Family Practice = 79.6% of all flagged providers. Why? These are the highest-volume primary care specialties. They see the most patients, order the most tests, and bill the most line items. That volume creates opportunity — more claims mean more chances to pad, upcode, or bill for services never rendered. It's the same reason bank robbers target banks with the most cash.
This doesn't mean internists are inherently more fraudulent. It means the model learned that fraud patterns concentrate where billing volume is highest — which is exactly what DOJ prosecution records show.
The Geography of Suspicion
Top 5 states by number of flagged providers (220 of 500, or 44%)
California and Florida are tied at 56 flagged providers each. Then New York (39), Texas (36), and New Jersey (33). If that list looks familiar, it should — these are the exact same states that dominate DOJ healthcare fraud takedowns year after year.
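The top-five share follows directly from the state counts quoted here. A quick check, using only the numbers given in the text:

```python
# Flagged-provider counts for the top five states, as quoted in the article.
flagged_by_state = {"CA": 56, "FL": 56, "NY": 39, "TX": 36, "NJ": 33}
TOTAL_FLAGGED = 500

top5_total = sum(flagged_by_state.values())   # 220
top5_share = top5_total / TOTAL_FLAGGED       # 0.44, i.e. 44%

print(f"{top5_total} of {TOTAL_FLAGGED} flagged providers ({top5_share:.0%})")
```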
South Florida alone has been the target of more Medicare fraud strike forces than any other region in the country. Our algorithm didn't know that. It simply learned from the data that providers in these states are more likely to match convicted fraud patterns. The geography of fraud is not random.
The Money
Four hundred million dollars. That's the total Medicare payments flowing to these 500 providers over the past decade. To put that in perspective: the entire HHS OIG budget for investigating Medicare fraud is roughly $400M per year. In other words, a decade of payments to just 500 providers whose billing patterns match convicted criminals adds up to about one full year of that budget.
Not all of that money is fraudulent — probably not even most of it. But if even 10% represents waste, fraud, or abuse, that's $40 million in taxpayer money. Enough to fund 800 nurses for a year.
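The arithmetic behind these figures is easy to reproduce. The 10% rate is the article's hypothetical, and the per-nurse cost below is simply what the "800 nurses" comparison implies, not a sourced salary figure.

```python
TOTAL_PAYMENTS = 400_000_000   # dollars paid to the 500 flagged providers
FLAGGED_PROVIDERS = 500
ASSUMED_WASTE_PCT = 10         # hypothetical share lost to waste, fraud, or abuse
NURSES = 800

average_per_provider = TOTAL_PAYMENTS // FLAGGED_PROVIDERS     # 800,000
at_risk = TOTAL_PAYMENTS * ASSUMED_WASTE_PCT // 100            # 40,000,000
implied_cost_per_nurse = at_risk // NURSES                     # 50,000

print(f"Average per provider: ${average_per_provider:,}")
print(f"At risk at {ASSUMED_WASTE_PCT}%: ${at_risk:,}")
print(f"Implied annual cost per nurse: ${implied_cost_per_nurse:,}")
```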
Does It Actually Work?
Yes. Our model achieves an AUC of 0.83. That means if you pick one known fraudster and one legitimate provider at random, the model gives the fraudster the higher score about 83% of the time. It does not mean 83% of individual flags are correct. That's not perfect, but it's significantly better than random chance (0.50) and comparable to models used in financial fraud detection.
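AUC can be read as a ranking statistic: the chance that a randomly chosen positive case scores above a randomly chosen negative one, with ties counted as half. The sketch below computes it by brute force over all pairs. The scores and labels are made up for illustration and are not the model's actual outputs.

```python
from typing import List


def pairwise_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up scores: two known fraud cases (label 1), two legitimate (label 0).
toy_scores = [0.9, 0.4, 0.6, 0.2]
toy_labels = [1, 1, 0, 0]
toy_auc = pairwise_auc(toy_scores, toy_labels)  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic, so it suits small illustrations; production tooling typically computes the same quantity from sorted ranks.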
More importantly: we've already seen it work in the real world. Several providers flagged by earlier versions of our model were subsequently charged by the DOJ — months or years after our analysis identified them. We documented these cases in “Our Data Predicted It”.
The model doesn't know why a provider bills the way they do. It only knows that the pattern matches. But when you train on 2,198 confirmed cases and the math keeps being right, the patterns deserve scrutiny.
⚠️ What This Is — And What It Isn't
This is a statistical analysis, not an investigation by law enforcement. A high fraud probability means a provider's billing patterns are mathematically similar to convicted fraudsters. It does not mean they have committed fraud.
There are many legitimate reasons a provider might trigger our model: they may treat unusually complex patients, operate a high-volume clinic, practice in an underserved area, or bill under a group NPI that aggregates multiple providers' claims.
No provider named in this article has been accused or charged with any crime based on this analysis. If you believe you have information about Medicare fraud, contact the HHS OIG Hotline.
Explore the Data Yourself
We believe in transparency. Every number in this article comes from publicly available CMS data. You can verify it, challenge it, or build on it.
All 500 flagged providers — searchable, sortable, with fraud probability scores and billing details.
If you have information about Medicare fraud, waste, or abuse — the HHS OIG hotline is your next step.
Data Sources
- CMS Medicare Provider Utilization and Payment Data (2014–2023)
- HHS Office of Inspector General: List of Excluded Individuals/Entities (LEIE)
- Department of Justice: Healthcare Fraud Prosecution Records
- OpenMedicare ML Model v2.0 (Random Forest, AUC 0.83)
Note: All data is from publicly available Medicare records. OpenMedicare is an independent journalism project not affiliated with CMS.