OpenMedicare

Data Sources: Centers for Medicare & Medicaid Services (CMS), Medicare Provider Utilization and Payment Data
Disclaimer: This site is an independent journalism project. Data analysis and editorial content are not affiliated with or endorsed by CMS or any government agency. All spending figures are based on publicly available Medicare payment records.
Sister Sites: OpenMedicaid · OpenFeds · OpenSpending

© 2026 OpenMedicare. Independent data journalism. Built by TheDataProject.ai


Our Methodology

Transparency is core to our mission. Here's exactly how we collect, process, and analyze Medicare data.

Data Source

All data comes from the CMS Medicare Provider Utilization and Payment Data, published by the Centers for Medicare & Medicaid Services. This is the same data the federal government uses to track Medicare spending.

  • Dataset: Medicare Physician & Other Practitioners — by Provider and Service
  • Years covered: 2014–2023 (10 years)
  • Scale: ~96 million rows of physician/supplier claims for Medicare Part B
  • Scope: Every physician, nurse practitioner, and clinical supplier who billed Medicare

What's Included

Each record includes: provider name and NPI, specialty, location, HCPCS procedure code, number of services, number of unique beneficiaries, submitted charges, Medicare allowed amount, and Medicare payment amount. We aggregate across all years to build provider profiles.
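
The aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the site's actual pipeline: the field names (npi, hcpcs, paid, and so on) and the sample rows are made up for the example.

```python
from collections import defaultdict

# Hypothetical records: one row per provider (NPI), HCPCS code, and year.
# Field names and values are illustrative, not the CMS column names.
rows = [
    {"npi": "1000000001", "year": 2022, "hcpcs": "99213",
     "services": 400, "submitted": 60000.0, "allowed": 32000.0, "paid": 25000.0},
    {"npi": "1000000001", "year": 2023, "hcpcs": "99214",
     "services": 300, "submitted": 75000.0, "allowed": 40000.0, "paid": 31000.0},
    {"npi": "1000000002", "year": 2023, "hcpcs": "99213",
     "services": 120, "submitted": 15000.0, "allowed": 9000.0, "paid": 7000.0},
]


def build_profiles(rows):
    """Sum services and dollar amounts per NPI across all years and codes."""
    profiles = defaultdict(lambda: {
        "services": 0, "submitted": 0.0, "allowed": 0.0, "paid": 0.0,
        "years": set(), "codes": set(),
    })
    for row in rows:
        profile = profiles[row["npi"]]
        for field in ("services", "submitted", "allowed", "paid"):
            profile[field] += row[field]
        profile["years"].add(row["year"])
        profile["codes"].add(row["hcpcs"])
    return dict(profiles)


profiles = build_profiles(rows)
print(profiles["1000000001"]["paid"])  # 56000.0
```

Beneficiary counts are deliberately left out of the sum in this sketch: the same patient can appear under several procedure codes and years, so adding per-service counts would overstate the number of unique beneficiaries.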

Risk Score Calculation

Our risk score (0–100) identifies statistical outliers who may warrant further investigation. It is not a fraud determination. The score combines several signals:

1. Specialty Peer Comparison

How far is a provider from the median in their own specialty? A cardiologist billing 50x the median cardiologist stands out more than one billing 2x. We compare total payments, services, and beneficiary counts against specialty peers.
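
As a rough sketch of the peer comparison, the ratio of a provider's total payments to the specialty median could be computed as below. The function name peer_ratios, the provider labels, and the dollar figures are hypothetical; the same calculation would apply to services and beneficiary counts.

```python
from statistics import median


def peer_ratios(total_paid, specialty):
    """Return each provider's total payments divided by their specialty median."""
    by_specialty = {}
    for npi, amount in total_paid.items():
        by_specialty.setdefault(specialty[npi], []).append(amount)
    medians = {name: median(values) for name, values in by_specialty.items()}
    return {
        npi: amount / medians[specialty[npi]]
        for npi, amount in total_paid.items()
        if medians[specialty[npi]] > 0  # skip specialties with a zero median
    }


# Hypothetical totals for three cardiologists.
total_paid = {"A": 100_000.0, "B": 120_000.0, "C": 6_000_000.0}
specialty = {"A": "Cardiology", "B": "Cardiology", "C": "Cardiology"}

ratios = peer_ratios(total_paid, specialty)
print(ratios["C"])  # 50.0
```

The median is used rather than the mean so that a handful of very large billers does not shift the baseline they are being compared against.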

2. Markup Ratio Analysis

The ratio of what a provider bills (submitted charges) to what Medicare actually pays. While some markup is normal, extreme ratios (10x, 25x, or higher) can indicate billing anomalies. We compare each provider's markup to their specialty average.
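
A worked example of the markup calculation, using made-up numbers. The helper names markup_ratio and relative_markup and the 2.5x specialty average are assumptions for illustration only.

```python
def markup_ratio(submitted, paid):
    """Submitted charges divided by Medicare payments; None if nothing was paid."""
    if paid <= 0:
        return None
    return submitted / paid


def relative_markup(provider_markup, specialty_average):
    """How many times the specialty average a provider's markup is."""
    return provider_markup / specialty_average


# Hypothetical provider: $500,000 submitted, $20,000 paid.
markup = markup_ratio(500_000.0, 20_000.0)
print(markup)  # 25.0

# Hypothetical specialty average markup of 2.5x.
print(relative_markup(markup, 2.5))  # 10.0
```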

3. Volume Anomalies

Unusually high service volumes or beneficiary counts relative to peers. A single provider seeing more patients than seems physically possible is a meaningful signal.
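
One simple way to express the "physically possible" check is services per working day. The sketch below assumes 250 working days per year and a cutoff of 100 services per day; both constants are hypothetical placeholders, not the thresholds used on this site.

```python
WORKING_DAYS_PER_YEAR = 250  # assumption: about 5 days a week, 50 weeks a year
MAX_PLAUSIBLE_PER_DAY = 100  # hypothetical cutoff, not the site's threshold


def services_per_working_day(total_services, years_active):
    """Average number of billed services per working day."""
    if years_active <= 0:
        raise ValueError("years_active must be positive")
    return total_services / (years_active * WORKING_DAYS_PER_YEAR)


def is_volume_outlier(total_services, years_active):
    """True when the daily average exceeds the plausibility cutoff."""
    return services_per_working_day(total_services, years_active) > MAX_PLAUSIBLE_PER_DAY


print(services_per_working_day(500_000, 10))  # 200.0
print(is_volume_outlier(500_000, 10))         # True
print(is_volume_outlier(50_000, 10))          # False
```

A fixed cutoff like this is crude: payments often go to practices rather than individuals, so a high daily average may reflect several clinicians billing under one NPI.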

4. Pattern Flags

Specific billing patterns associated with known fraud schemes: excessive COVID testing concentration, wound care billing anomalies, high-cost drug administration patterns, and others.
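
A concentration flag of this kind can be sketched as the share of a provider's payments that comes from a chosen set of procedure codes. The code set, the 80% cutoff, and the payment figures below are illustrative assumptions, not the lists or thresholds used on this site.

```python
# Illustrative code set only; the actual codes behind each flag are not
# specified in this methodology.
COVID_TEST_CODES = {"U0003", "U0004", "87635"}
CONCENTRATION_CUTOFF = 0.8  # hypothetical threshold


def concentration_share(paid_by_code, flagged_codes):
    """Fraction of total payments that falls under the flagged codes."""
    total = sum(paid_by_code.values())
    if total <= 0:
        return 0.0
    flagged = sum(
        amount for code, amount in paid_by_code.items() if code in flagged_codes
    )
    return flagged / total


# Hypothetical provider with most payments under one testing code.
paid_by_code = {"U0003": 900_000.0, "99213": 100_000.0}
share = concentration_share(paid_by_code, COVID_TEST_CODES)
print(share)                         # 0.9
print(share > CONCENTRATION_CUTOFF)  # True
```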

5. Scoring

Individual signals are weighted and combined using logarithmic scaling to produce a final score from 0–100. Logarithmic scaling ensures that extreme outliers are clearly distinguished while avoiding false alarms from moderate variations.
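
The exact weights and scaling constants are not published here, so the following is only a sketch of how a weighted, log-scaled combination might work. The weights, the saturation cap of 100x, and the signal names are all hypothetical.

```python
import math

# Hypothetical weights (they sum to 1) and saturation cap.
WEIGHTS = {"peer": 0.5, "markup": 0.25, "volume": 0.25}
CAP = 100.0  # a ratio of 100x or more counts as the maximum signal


def log_scaled(ratio, cap=CAP):
    """Map a ratio to 0..1 on a log scale: 1x or less -> 0, cap or more -> 1."""
    if ratio <= 1.0:
        return 0.0
    return min(math.log(ratio) / math.log(cap), 1.0)


def risk_score(signals):
    """Weighted sum of log-scaled signals, expressed on a 0-100 scale."""
    total = sum(weight * log_scaled(signals[name]) for name, weight in WEIGHTS.items())
    return round(100.0 * total, 1)


# Hypothetical providers: one at the peer baseline, one moderate, one extreme.
print(risk_score({"peer": 1.0, "markup": 1.0, "volume": 1.0}))
print(risk_score({"peer": 2.0, "markup": 1.5, "volume": 1.2}))
print(risk_score({"peer": 100.0, "markup": 100.0, "volume": 100.0}))
```

On a log scale, going from 1x to 10x moves a signal as much as going from 10x to 100x. In this sketch a provider at 2x the peer median therefore scores low, while only ratios that are large multiples of the baseline approach the top of the range.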

What This Is NOT

⚠️ A high risk score is not an accusation of fraud.

It identifies statistical outliers — providers whose billing patterns differ significantly from their peers. There may be legitimate reasons: a specialist handling unusually complex cases, a provider in an underserved area seeing more patients, or data reporting differences.

Only proper investigation by qualified authorities (CMS, OIG, law enforcement) can determine whether actual fraud has occurred.

Supervised Fraud Detection (ML v2)

In addition to our statistical approach, we developed a supervised machine learning model trained on confirmed fraudsters — providers who have been indicted by the DOJ, excluded by the HHS OIG (LEIE database), or who settled False Claims Act cases.

Model Details

  • Algorithm: Random Forest classifier
  • Training labels: 2,198 confirmed fraudsters (LEIE exclusions + DOJ charges)
  • Providers scored: 1,719,625 active Medicare providers
  • AUC: 0.83 — the model correctly ranks a random fraudster above a random legitimate provider 83% of the time
  • Flagged: 500 providers scored above the 86% fraud-match threshold
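
The pairwise reading of AUC given above can be checked directly on a toy example. The scores below are invented; the function pairwise_auc simply counts how often a labeled provider outranks an unlabeled one, with ties counted as half. The 0.86 threshold is taken from the list above.

```python
def pairwise_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores.
labeled_fraud = [0.9, 0.8, 0.4]
unlabeled = [0.7, 0.3, 0.2, 0.1]

auc = pairwise_auc(labeled_fraud, unlabeled)
print(round(auc, 4))  # 0.9167 (11 of 12 pairs ranked correctly)

# Flagging providers above the 86% threshold.
scores = {"P1": 0.91, "P2": 0.86, "P3": 0.40}
flagged = [npi for npi, score in scores.items() if score > 0.86]
print(flagged)  # ['P1']
```

This brute-force version compares every pair, which is fine for a toy example but too slow for 1.7 million providers; rank-based formulas give the same number far more efficiently.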

The model learns which billing patterns the labeled providers share (volume anomalies, markup ratios, specialty concentration, geographic signals, beneficiary patterns), then scores every active provider on how closely they resemble that labeled set. The result is a "fraud match probability": not proof of fraud, but a statistical measure of resemblance to providers who were charged, excluded, or settled.

This approach was validated when our earlier statistical analysis flagged providers before the DOJ charged them. The supervised model builds on this by learning directly from the ground truth of confirmed fraud.

Results are available on our "Still Out There" page.

Limitations

  • Part B only: This data covers physician and supplier claims. It does not include hospital inpatient stays (Part A), prescription drugs (Part D), or Medicare Advantage (Part C) plans.
  • No clinical context: We see billing codes, not patient charts. We cannot assess medical necessity.
  • Aggregated data: CMS suppresses data for providers with fewer than 11 beneficiaries for a given service, which may affect small-practice providers.
  • Payment ≠ Income: Medicare payments go to practices and organizations, not necessarily individual providers.
  • Historical data: Patterns may reflect past practices that have since changed.

How to Use This Data

This data is valuable for:

  • Journalists investigating Medicare spending patterns and potential waste
  • Researchers studying healthcare economics, utilization, and geographic variation
  • Policymakers evaluating program integrity and spending trends
  • Concerned citizens understanding how their tax dollars fund healthcare

Responsible Disclosure

We do not label any provider as "fraudulent." We identify statistical outliers and present the data transparently so that qualified parties can draw informed conclusions. Provider names are included because this is public data published by the federal government, but we encourage users to consider context before drawing conclusions.

Learn More

  • Glossary: Every Medicare and fraud detection term defined, A–Z
  • Data Sources: Where our data comes from, how fresh it is, and what it contains

Data Sources

  • Centers for Medicare & Medicaid Services (CMS)
  • Medicare Provider Utilization and Payment Data (2014–2023)
  • CMS National Health Expenditure Data
