The following is a detailed discussion of the measures we report on deaths and other bad outcomes (scored with one, two, three, three-and-a-half, four, or five stars, with five stars being best) and how to interpret them. Our analyses were conducted by MPA Healthcare Solutions (formerly Michael Pine & Associates), a Chicago-based firm with expertise in evaluating the clinical quality of hospitals. The technical appendix provides details of the analysis, and our very extensive Technical Report offers even more background for biomedical research experts.
In choosing a surgeon, you will want to know the success rate of each one you are considering. The most important result is keeping patients alive.
For some categories of procedures, our ratings of surgeons identify doctors who have had better- or worse-than-average risk-adjusted death rates, counting deaths in the hospital and within 90 days of discharge from the hospital. These rates were adjusted in an effort to take into account the fact that some surgeons treat a relatively high percentage of sicker and frailer patients, who would have a relatively high risk of dying with any surgeon.
The adjusted death rates are calculated based on analyses of records of in-hospital surgeries for patients age 65 or older in the traditional Medicare program who had these surgeries during a five-year period (2010-14). The Medicare data are the only available uniform, nationwide data file of surgical cases. For states and types of procedures where we have data to compare surgeons' results for Medicare patients to results for all patients, the surgeons who do well for Medicare patients tend to do well for other patients.
Hospitals submit records to Medicare to get reimbursed for services rendered. Medicare adds one fact to the records submitted by the hospitals: whether the patient died within a specified number of days after hospital admission. Medicare gets this information on deaths, including deaths that occur after a patient's discharge from the hospital, from Social Security records.
The analysis began with the selection of a subset of cases. We selected types of cases that are relatively common and have substantial death rates or other bad outcome rates that might be affected by the quality of a surgeon's or hospital's care. The analysis covered 12 types of surgical cases, detailed on this website, such as heart valve and coronary artery bypass graft surgery (CABG), large and small bowel surgery, and total hip and knee replacement. The cases were selected from the Medicare records based on detailed definitions using standard procedure codes listed in the technical appendix.
In our ratings tables, for each of the surgical procedures we evaluated, there are columns for ratings of surgeons' outcomes. For seven of the procedure categories, there is a column for having fewer deaths. For all 12 of the procedure categories we report on, there is a column for having fewer deaths, prolonged lengths of stay, or readmissions. That column includes events that are likely, though not always, associated with complications. For five of the procedure categories, we don't have a "Having Fewer Deaths" column because we concluded that deaths are infrequent enough and sample sizes small enough that reporting on the death outcomes alone would not be valuable to users of this website.
Adjusted death rates were calculated in several steps.
First, for each type of case, we calculated the actual death rate for each surgeon. We counted deaths that occurred in-hospital or within 90 days after discharge from the hospital where the patient was initially admitted for surgery. This 90-day window holds the surgeon accountable for results well after surgery. There is reason to believe that a surgeon can affect 90-day outcomes by doing better work during the surgery itself, by making sure the hospital provides high-quality and safe care, by following up with the patient and the patient's other providers after the patient's hospital release, and by helping to ensure that the patient is released to a safe and supportive environment. Checking for this 90-day period also eliminates the possibility that a hospital and surgeon might have relatively low death rates only because they discharge patients to their homes, hospice care, or nursing homes when they are on the verge of death.
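As a minimal sketch of the counting rule in this step, the Python below flags a case as a death if the patient died in the hospital or within 90 days after discharge. The record layout (discharge date, death date or `None`, in-hospital death flag) is hypothetical, and treating day 90 itself as inside the window is an assumption; the actual case definitions are in the technical appendix.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

# Follow-up window after discharge, as described in the text.
FOLLOW_UP = timedelta(days=90)

# Hypothetical record: (discharge_date, death_date or None, died_in_hospital)
Case = Tuple[date, Optional[date], bool]


def counts_as_death(discharge_date: date, death_date: Optional[date],
                    died_in_hospital: bool) -> bool:
    """True if the patient died in the hospital or within 90 days of discharge."""
    if died_in_hospital:
        return True
    if death_date is None:
        return False
    # Assumption: a death on day 90 after discharge still counts.
    return discharge_date <= death_date <= discharge_date + FOLLOW_UP


def actual_death_rate(cases: Iterable[Case]) -> float:
    """Share of a surgeon's cases that count as deaths."""
    flags = [counts_as_death(*case) for case in cases]
    return sum(flags) / len(flags)


cases = [
    (date(2012, 3, 1), None, False),               # survived
    (date(2012, 3, 1), date(2012, 5, 29), False),  # died 89 days after discharge
    (date(2012, 3, 1), date(2012, 6, 15), False),  # died after the window
    (date(2012, 3, 1), date(2012, 3, 1), True),    # died in the hospital
]
print(actual_death_rate(cases))  # 0.5
```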
We then calculated a "predicted" death rate for each type of case for each surgeon. The predicted death rate indicates what percentage of the surgeon's patients would have died if the surgeon had been as successful at keeping similar patients alive as the average of all U.S. surgeons in the entire national Medicare dataset we were using. The patient characteristics taken into account in defining "similar patients" were age, gender, presence or absence of selected principal and secondary diagnoses, and whether certain surgical procedures were performed. For example, a surgeon whose large bowel surgery patients were mostly over age 80 and had secondary diagnoses such as diabetes or malnutrition might have a considerably higher predicted death rate for these cases than a surgeon whose large bowel surgery patients were mostly age 65 to 70 and had few other medical problems. (More detail on the methods used by MPA Healthcare Solutions to calculate predicted death rates appears in the technical appendix.)
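One common way to produce a predicted rate like this is to fit a logistic regression on patient characteristics across the national dataset and then average each patient's predicted probability over the surgeon's caseload. The sketch below assumes that approach purely for illustration; the intercept and coefficients are made-up placeholders, not MPA Healthcare Solutions' model, whose actual methods are described in the technical appendix.

```python
import math
from typing import Dict, List

# Hypothetical coefficients for illustration only (NOT the actual model).
INTERCEPT = -4.5
COEFFICIENTS = {
    "age_over_80": 0.9,
    "female": -0.1,
    "diabetes": 0.4,
    "malnutrition": 1.1,
}


def patient_risk(features: Dict[str, bool]) -> float:
    """Predicted probability of death for one patient (logistic model)."""
    logit = INTERCEPT + sum(
        COEFFICIENTS[name] for name, present in features.items() if present
    )
    return 1.0 / (1.0 + math.exp(-logit))


def predicted_death_rate(patients: List[Dict[str, bool]]) -> float:
    """Average predicted risk across one surgeon's patients."""
    return sum(patient_risk(p) for p in patients) / len(patients)


older_sicker = [
    {"age_over_80": True, "female": False, "diabetes": True, "malnutrition": True},
    {"age_over_80": True, "female": True, "diabetes": True, "malnutrition": False},
]
younger_healthier = [
    {"age_over_80": False, "female": False, "diabetes": False, "malnutrition": False},
    {"age_over_80": False, "female": True, "diabetes": False, "malnutrition": False},
]
print(predicted_death_rate(older_sicker) > predicted_death_rate(younger_healthier))  # True
```

Averaging per-patient risks, rather than plugging in the caseload's average characteristics, lets a handful of very high-risk patients raise the predicted rate appropriately.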
Next, we used each surgeon's predicted and actual death rates for the type of surgery, along with the national average death rate for such surgeries, to calculate the surgeon's "adjusted" death rate. The simplest way to calculate an adjusted rate is to, first, calculate the ratio of the surgeon's actual rate to his or her predicted rate for this type of case. If, for example, Surgeon A has an actual death rate of 3 percent but would be predicted to have a death rate of 6 percent based on the levels of sickness and frailty of his or her patients, then the ratio is 0.5 (3 percent divided by 6 percent). Second, we multiply this ratio by the national average death rate in the entire national Medicare dataset we were using to calculate the adjusted death rate for this surgeon for this procedure type. If the national average death rate were 8 percent, then the adjusted death rate for Surgeon A would be 4 percent (8 percent multiplied by the 0.5 ratio). In reality, we used a more complicated formula (employing odds ratios) for calculating adjusted rates, but the results are nearly identical.
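The arithmetic in this paragraph can be checked with a few lines of Python. The first function is the simple ratio method described above. The second is one plausible odds-ratio version, assumed here for illustration (it applies the ratio of actual to predicted odds to the national odds); the exact formula used in our analysis is not given in this section. With Surgeon A's numbers the two methods land within a few hundredths of a percentage point of each other.

```python
def adjusted_rate_simple(actual: float, predicted: float, national: float) -> float:
    """Ratio method from the text: national rate x (actual / predicted)."""
    return national * (actual / predicted)


def odds(p: float) -> float:
    """Convert a rate (probability) to odds."""
    return p / (1.0 - p)


def adjusted_rate_odds(actual: float, predicted: float, national: float) -> float:
    """Assumed odds-ratio variant: scale the national odds by actual/predicted odds."""
    adjusted_odds = odds(national) * odds(actual) / odds(predicted)
    return adjusted_odds / (1.0 + adjusted_odds)  # convert odds back to a rate


# Surgeon A: actual 3%, predicted 6%, national average 8%
print(round(adjusted_rate_simple(0.03, 0.06, 0.08), 4))  # 0.04
print(round(adjusted_rate_odds(0.03, 0.06, 0.08), 4))    # 0.0404
```

One practical advantage of the odds form is that the adjusted rate can never exceed 100 percent, which the simple ratio can do when a surgeon's actual rate is far above a low predicted rate.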
For each surgeon in each category of cases, we also checked whether the difference between the actual death rate and the predicted death rate was "statistically significant." For example, if the actual rate was 6 percent and the predicted rate 7 percent, what are the chances that the one-percentage-point difference was the result of the surgeon's having had unusually good luck during the five-year period of our analysis?
We know that some patients survive when the average patient in a similar condition who underwent the same treatment would be expected to die; and some die when the average similar patient given the same treatment would be expected to survive. Since these differences can't be explained, we call them good luck or bad luck, and any surgeon might have a string of either. But big differences between actual and predicted death rates for large numbers of cases are unlikely to result from luck alone. For every surgeon in each category of cases on our ratings tables, we calculate the chances that the difference between the surgeon's actual death rate and the surgeon's predicted death rate could have resulted from luck alone, and the lower those chances, the more "statistically significant" the surgeon's result is.
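A minimal sketch of this kind of luck check, assuming a normal approximation to the number of deaths (our analysis may use a different test): treat each patient's predicted risk as that patient's chance of dying, compare observed deaths with the expected total, and convert the gap into a one-sided confidence level. The example reuses the 6 percent actual versus 7 percent predicted rates from the previous paragraph.

```python
import math
from typing import List


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_better_than_predicted(observed_deaths: int,
                                     patient_risks: List[float]) -> float:
    """One-sided confidence that having fewer deaths than predicted is not luck.

    Each patient is treated as an independent trial with the given predicted
    risk; the total number of deaths is approximated by a normal distribution.
    """
    expected = sum(patient_risks)
    variance = sum(p * (1.0 - p) for p in patient_risks)
    z = (expected - observed_deaths) / math.sqrt(variance)
    return normal_cdf(z)


# 6% actual vs. 7% predicted, with 400 cases and with 4,000 cases
small = confidence_better_than_predicted(24, [0.07] * 400)
large = confidence_better_than_predicted(240, [0.07] * 4000)
print(f"{small:.2f} {large:.3f}")  # 0.78 0.993
```

With 400 cases the same one-point gap falls short of even 90 percent confidence; with 4,000 cases it clears 95 percent.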
Our star scores also take one other matter into account: how the surgeon's risk-adjusted rate ranks compared with those of other surgeons we analyzed. A surgeon's death (or other bad outcome) rate could be statistically significantly different from average even if the difference between the surgeon's rate and the average was relatively small, simply because we have data on a very large number of surgeries the surgeon performed. We therefore want to give you insight not only into whether a surgeon's rate was statistically significantly better or worse than average but also into how big the difference was, so we ranked surgeons by their adjusted death (or other bad outcome) rates.
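Ranking can be sketched as sorting surgeons by adjusted rate (lower is better) and marking the best and worst fifths. The cutoff rule below (rounding one-fifth of the count down, with ties broken by sort order) is a hypothetical choice for illustration; this section does not say how ties or fractional cutoffs were handled.

```python
from typing import Dict


def fifth_labels(adjusted_rates: Dict[str, float]) -> Dict[str, str]:
    """Label each surgeon 'best fifth', 'worst fifth', or 'middle'."""
    ranked = sorted(adjusted_rates, key=adjusted_rates.get)  # lowest rate first
    n = len(ranked)
    cutoff = n // 5  # hypothetical: floor of one-fifth of the surgeons
    labels = {}
    for position, surgeon in enumerate(ranked):
        if position < cutoff:
            labels[surgeon] = "best fifth"
        elif position >= n - cutoff:
            labels[surgeon] = "worst fifth"
        else:
            labels[surgeon] = "middle"
    return labels


rates = {f"Surgeon {i}": 0.01 * (i + 1) for i in range(10)}
labels = fifth_labels(rates)
print(labels["Surgeon 0"], labels["Surgeon 5"], labels["Surgeon 9"])  # best fifth middle worst fifth
```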
Our star scores give you insight into how confident you can be that a surgeon's score was better or worse than average and also how well the surgeon ranked compared to other surgeons we evaluated in the U.S.
Surgeons receive the following ratings based on our definitions, methods, and analysis of the cases we studied from five years of hospital records:
Five stars indicates that, using our analysis methods, a surgeon has met two criteria: (1) based on the surgeon's outcomes and number of cases, we can be at least 95 percent confident that his/her better-than-average outcomes were not just the result of good luck; and (2) the surgeon's outcome rates were among the best 1/5th of all surgeons studied.
Four stars indicates that, using our analysis methods, while the surgeon did not meet our criteria for a 5-star rating, the surgeon performed better than average, and based on the surgeon's outcomes and number of cases, we can be at least 90 percent confident that his/her better-than-average outcomes were not just the result of good luck.
Three and a half stars indicates that, using our analysis methods, the surgeon's outcome rates were among the best 1/5th of all surgeons studied, but the surgeon did not have enough cases for us to be at least 90 percent confident that his/her better-than-average outcomes were not just the result of good luck.
Three stars indicates that, using our analysis methods, (1) the surgeon's rates were not among the best 1/5th of all surgeons studied and (2) we can be neither 90 percent confident that the surgeon had better-than-average outcomes that were not just the result of good luck nor 95 percent confident that the surgeon had worse-than-average outcomes that were not just the result of bad luck.
Two stars indicates that, using our analysis methods, while the surgeon did not meet our criteria for a 1-star rating, the surgeon's outcomes were worse than average, and based on the surgeon's outcomes and number of cases, we can be at least 95 percent confident that his/her worse-than-average outcomes were not just the result of bad luck.
One star indicates that, using our analysis methods, a surgeon has met two criteria: (1) based on the surgeon's outcomes and number of cases, we can be at least 95 percent confident that his/her worse-than-average outcomes were not just the result of bad luck; and (2) the surgeon's outcome rates were among the worst 1/5th of the rates we calculated for all surgeons studied.
— A dash mark indicates that the surgeon did not have enough surgeries of this type in the records we were able to analyze to give us a basis for reporting on his/her outcomes.
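The rating rules above can be restated as a small decision function. This is a sketch of one reading of the definitions, not production code: the confidence levels and best/worst-fifth memberships are taken as inputs computed elsewhere, and `enough_cases` is a hypothetical stand-in for a minimum case count that this section does not specify. Returning `None` corresponds to the dash shown in the ratings tables.

```python
from typing import Optional


def star_rating(conf_better: float, conf_worse: float,
                in_best_fifth: bool, in_worst_fifth: bool,
                enough_cases: bool = True) -> Optional[float]:
    """Map confidence levels and rank position to a star score.

    conf_better / conf_worse: confidence (0 to 1) that better- or
    worse-than-average outcomes are not just the result of luck.
    enough_cases: hypothetical stand-in for the unstated minimum case count.
    """
    if not enough_cases:
        return None  # shown as a dash in the ratings tables
    if conf_better >= 0.95 and in_best_fifth:
        return 5.0
    if conf_better >= 0.90:
        return 4.0
    if in_best_fifth:
        return 3.5  # top fifth, but under 90% confidence
    if conf_worse >= 0.95 and in_worst_fifth:
        return 1.0
    if conf_worse >= 0.95:
        return 2.0
    return 3.0


print(star_rating(0.97, 0.03, True, False))   # 5.0
print(star_rating(0.80, 0.20, True, False))   # 3.5
print(star_rating(0.01, 0.99, False, False))  # 2.0
```

The order of the checks matters: a surgeon in the best fifth with 90 to 95 percent confidence falls through to four stars, not three and a half, which matches the definitions.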
Interpreting the Death Rate Data
Let's look more closely at the strengths and weaknesses of the death rate data. How valuable are they in choosing a surgeon for your care?
Comparison to Other Measurement Systems
The data are helpful in predicting a surgeon's outcomes. We have found, for example, that surgeons who look good on our measures tend to look good on other measures—at least for the specific types of surgeries for which comparable data are available. For example, before publishing our initial report of outcomes for individual surgeons, we looked at how our risk-adjusted death rates for heart valve and CABG surgery for 2009-2012 compared to the risk-adjusted death rates reported by the New York State Department of Health for such surgeries for 2009-2011. Looking at all eight surgeons whom the New York state government's analysis identified as having statistically significantly better-than-average death rates, six of those surgeons were identified in our analysis as having better-than-average death rates with at least a 95 percent confidence level, and the other two were identified as having better-than-average death rates at roughly a 90 percent confidence level (t=1.62 for one surgeon and 1.77 for the other surgeon). In contrast, looking at the six surgeons whom the New York analysis identified as having statistically significantly worse-than-average death rates, all had substantially worse death rates in our analysis than any of the eight surgeons New York identified as having significantly better-than-average death rates.
For the eight surgeons New York identified as having statistically significantly better-than-average death rates, the weighted average death rate in our analysis was 2.82 percent; and for the six with statistically significant worse-than-average death rates, the weighted average death rate in our analysis was 7.32 percent—meaning it was 2.59 times as high (bad) as the death rate for the surgeons the New York report identified as significantly better than average.
This comparability between the New York State analysis and our analysis is true although the New York analysis: 1) included non-Medicare cases, while our analysis included only Medicare cases with patients age 65 or older; 2) included cases in different though overlapping date ranges, compared to the cases analyzed by us; 3) looked at deaths within 30 days of the procedure rather than deaths within 90 days of discharge, as counted in our analysis; 4) had different data than we had on each case for use in adjustment for risk (with New York using data reported specifically for this analysis by hospitals to the state government, and our analysis relying on data from Medicare claims submitted by hospitals); and 5) differed in other ways from our analysis.
How Surgeons Are Evaluated by Other Doctors
It is reassuring that surgeons rated highly in our surveys, in which doctors are asked to name the doctors they would consider most desirable to care for a loved one, tend to have lower death rates and other bad outcome rates than other surgeons. For example, before we published our first ratings at the individual surgeon level, we did an analysis showing that, of all the surgeons we evaluated for heart valve and CABG surgery, those recommended often enough to make our "Top Doctors" list of doctor-recommended doctors had an average risk-adjusted death rate of 5.8 percent for such surgery, while all other doctors in the regions we surveyed had a risk-adjusted death rate of 6.8 percent. The same pattern has shown up when we compared complication and readmission rates for surgeons recommended by other doctors with those for other surgeons.
Accuracy of Attribution of Cases to Surgeons
We have also determined that the outcome data are attributed reasonably accurately to the named surgeons. In medical cases, as opposed to surgical cases, it is hard to ascertain which of several doctors is responsible for a case. But in surgical cases, the physician responsible for the surgery is supposed to be identified by the "claim operating physician NPI number" in the Medicare claim record. Medicare rules require the hospital filing the claim to use this data-entry field "to uniquely identify the physician with the primary responsibility for performing the surgical procedure(s)." Assuming hospitals code accurately, there should be no issue of accuracy in our attribution of cases to surgeons.
As described in the technical appendix, MPA Healthcare Solutions checked coding accuracy in several other ways.
In addition, before publishing our first ratings of individual surgeons, we compared physicians identified as having performed CABG surgery in our analysis with physicians identified as performing such surgery in the April 2013 report by the State of California, "California Report on Coronary Artery Bypass Graft Surgery," based on 2009-2010 state data. This report covered a different time period from the 2009-2012 data used in our analysis at that time, and it included non-Medicare data and Medicare Advantage Plan data rather than only the Medicare fee-for-service data we used. Although one would not expect the counts for each physician to match closely, for every physician identified in our analysis as having at least 70 isolated CABG cases, the California report showed a physician with the same name having at least 70 isolated CABG cases.
As a further quality control, we have reported outcome rates for each type of surgery only if the doctor identified by the National Provider Identifier (NPI) in the claims records we used is identified by other sources as practicing in a specialty that would be expected to do that type of surgery. For example, we report outcomes for a surgeon performing hip or knee replacement surgery only for doctors who are independently identified as specializing in orthopedic surgery. We checked specialties in the government's National Plan and Provider Enumeration System (NPPES), the official record of NPI numbers and the place where doctors are expected to indicate their specialties. We also checked specialties with other sources, such as the American Board of Medical Specialties.
There was one exception to our rule of counting and reporting only cases that fit the doctor's specialty: we included cases of a specific procedure type if the doctor was shown as having in-hospital cases of that procedure type in a file released by Medicare in April 2014 showing counts of procedure codes by doctor. That file was reportedly based on claims filed by doctors (as opposed to hospitals).
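The specialty rule and its exception can be sketched as a simple filter. The procedure-to-specialty mapping and the parameter names are hypothetical illustrations; the real specialty checks drew on NPPES and other sources, such as the American Board of Medical Specialties, as described above.

```python
from typing import Dict, Set

# Hypothetical mapping of procedure categories to expected specialties.
EXPECTED_SPECIALTIES: Dict[str, Set[str]] = {
    "hip replacement": {"orthopedic surgery"},
    "knee replacement": {"orthopedic surgery"},
    "cabg": {"thoracic surgery", "cardiac surgery"},
}


def include_surgeon_cases(procedure: str,
                          surgeon_specialties: Set[str],
                          procedures_in_april_2014_file: Set[str]) -> bool:
    """Decide whether a surgeon's cases of this procedure type are counted."""
    expected = EXPECTED_SPECIALTIES.get(procedure, set())
    if expected & surgeon_specialties:
        return True  # specialty fits the procedure
    # Exception: the surgeon appears with in-hospital cases of this procedure
    # type in Medicare's April 2014 procedure-counts-by-doctor file.
    return procedure in procedures_in_april_2014_file


print(include_surgeon_cases("knee replacement", {"orthopedic surgery"}, set()))  # True
print(include_surgeon_cases("knee replacement", {"general surgery"}, set()))     # False
print(include_surgeon_cases("knee replacement", {"general surgery"},
                            {"knee replacement"}))                              # True
```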
Other Issues with the Data
Be aware of other considerations when using our ratings of surgeons:
- From the billing records submitted to Medicare by the hospitals used for our analysis, one cannot always be sure whether secondary diagnoses existed when the patient entered the hospital or whether they occurred during the hospital stay. Consider heart surgery cases, for example. If the patient's record says the patient had diabetes as a secondary diagnosis, we can be confident that the diabetes was present upon admission, not acquired in the hospital. But if the record says the patient had pneumonia, we can't be sure whether the patient entered the hospital with pneumonia or acquired it in the hospital. Without this information, we can't know whether to give the surgeon credit for having more difficult cases if s/he has an unusually large number of heart surgery patients with pneumonia. We wouldn't want to give the surgeon such credit if the surgeon or hospital is causing pneumonia.
Hospitals are supposed to code secondary diagnoses as "Present on Admission," if that is the case. But we know that hospital coding is not always accurate. In our analysis, we recoded some secondary diagnoses when we knew the correct coding. For example, we treated diabetes or multiple sclerosis as always present on admission, and surgical-site infection or acute respiratory failure as always hospital-acquired, because it is very unlikely that a patient would be admitted to a hospital for elective surgery if something like respiratory failure was present. But in some cases it is not obvious whether a secondary diagnosis was Present on Admission or acquired while the patient was in the hospital under a surgeon's care. That uncertainty could produce improper risk adjustment in our analysis.
- Because of data limitations, various underlying characteristics of patients could not be considered. For example, when you are assessing a surgeon who serves mostly low-income, uninsured patients, there's a good chance that the patients have social problems—such as the absence of emotionally supportive family members—that are unreported in the data available for analysis but that could influence death rates or readmissions within 90 days of hospital discharge.
- As we have explained, some patients may die just as a result of bad luck unrelated to the surgery; for example, a heart surgery patient might be killed in an auto accident. Our analysis cannot control for such possibilities and assumes that such events are randomly distributed across surgeons' patients. Our analysis of the statistical significance of differences in death rates is designed to take into account the possibility that such bad events might occur simply as a result of unexplained bad luck.
- Within any of the types of cases we examined, patients may have diseases at different stages of progression, with very different risks of death or other bad outcomes. Some surgeons' heart surgery cases, for example, might include a disproportionately large number of cases in which the underlying condition had reached an advanced stage by the time the patient was admitted for surgery. The data we worked with did not include information on laboratory or imaging results that would make it possible to distinguish among patients on the basis of these findings. Undetected differences in patient mix are especially likely when comparing surgeons and hospitals that serve as referral centers to which other hospitals and surgeons send difficult cases.
- Cases were followed for only up to 90 days after discharge, but problems caused by surgeons or hospitals may not result in death until sometime later. (Longer follow-up periods, of course, have their own set of problems because more time increases the chances of death from causes unrelated to the surgery or hospital stay.)
- Some differences in death rates and readmission rates may result from differences in community practices or the availability of non-hospital facilities to care for patients. In some communities, for example, patients in the final stages of emphysema (a form of chronic obstructive pulmonary disease) may be allowed to die in their homes or in nursing facilities, while in other communities these patients may be admitted to hospitals for their final few days.
- Some of the data for some of the hospital claims records may not be accurate. There are, no doubt, many innocent errors when so many records are processed by hospital coding staffs. In addition, some hospitals may follow different coding guidelines in describing diagnoses in the records they report to Medicare in order to obtain the highest allowable reimbursements for the cost of care.
- Some of the data are incomplete. For example, the billing record that is the source of the data has space for hospitals to list only eight secondary diagnoses in addition to the principal diagnosis. If a patient had nine or more secondary diagnoses, the adjustment process could not allow for all of them.
- Time has elapsed since the period to which the data apply (2010-2014). Medicare records of hospital cases don't become immediately available to the public or to researchers, and it took time for us to perform our analyses.
- The data are for patients 65 or older. Surgeons who perform relatively well with that age group could perform relatively better or worse with younger patients.
- High or low surgeon death rates may result from the quality of treatment provided by specific hospitals, not from the quality of the surgeon's performance. If a surgeon performs well because he or she uses a specific hospital, that may not benefit patients who use a different hospital.
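The Present on Admission recoding described in the first consideration above can be sketched as follows. The diagnosis labels are plain-English stand-ins for the actual diagnosis codes, and excluding diagnoses whose status remains unknown from the risk factors is an assumption for this example; it is exactly the kind of judgment call that can distort risk adjustment.

```python
from typing import List, Optional, Set, Tuple

# Recoding rules taken from the examples in the text (labels, not real codes).
ALWAYS_PRESENT_ON_ADMISSION = {"diabetes", "multiple sclerosis"}
ALWAYS_HOSPITAL_ACQUIRED = {"surgical-site infection", "acute respiratory failure"}

# (diagnosis, present-on-admission flag: True, False, or None if unknown)
Diagnosis = Tuple[str, Optional[bool]]


def recode_poa(secondary_dx: List[Diagnosis]) -> List[Diagnosis]:
    """Override the hospital's POA flag where the correct answer is known."""
    recoded = []
    for dx, poa in secondary_dx:
        if dx in ALWAYS_PRESENT_ON_ADMISSION:
            poa = True
        elif dx in ALWAYS_HOSPITAL_ACQUIRED:
            poa = False
        recoded.append((dx, poa))
    return recoded


def risk_factors(secondary_dx: List[Diagnosis]) -> Set[str]:
    """Keep only diagnoses treated as present on admission.

    Assumption: diagnoses whose POA status is still unknown are excluded.
    """
    return {dx for dx, poa in recode_poa(secondary_dx) if poa}


record = [
    ("diabetes", None),                   # recoded to present on admission
    ("acute respiratory failure", True),  # recoded to hospital-acquired
    ("pneumonia", True),                  # hospital's flag kept as coded
]
print(sorted(risk_factors(record)))  # ['diabetes', 'pneumonia']
```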
Rates of Any Bad Outcome
The death rate information we have presented focuses only on one bad outcome: death. But there are other bad outcomes. You don't want to contract an infection in the hospital, have an adverse reaction to a drug, fall out of bed, or suffer any of many other types of complications—even if you ultimately survive. Some complications cause permanent disability or disfigurement; others just make your hospital stay longer and more unpleasant. You want neither.
You also don't want to have problems after you leave the hospital. Such problems might be caused by the surgeon's treatment or the care provided by the hospital where the surgery was performed. Or problems might occur because the surgeon or hospital did not do everything they could to help you transition back into the community and to care by other health care providers. Poor performance on any of those fronts can necessitate readmission to a hospital after your initial discharge.
Our ratings tables report surgeons' ratings for "having fewer prolonged lengths of stay, readmissions, or deaths," which we sometimes refer to as "having lower rates of deaths or other bad outcomes" or "having lower total adverse outcome rates." These rates indicate whether any of the following occurred: 1) deaths in hospital or within 90 days of discharge; 2) what we define as "prolonged lengths of stay," which often results from the patient's having complications during the initial hospital stay; or 3) readmissions within 90 days of initial hospital discharge. As with death rates, the other bad outcome rates were adjusted in an effort to take into account the fact that some surgeons and hospitals treat a relatively high percentage of sicker and frailer patients who would have a higher risk of bad outcomes regardless of surgeon or hospital.
The adjusted death or other bad outcome rates were calculated based on cases where the initial hospital discharge occurred in the 57-month period between January 1, 2010, and September 30, 2014. That allowed time to record readmissions that occurred up to December 31, 2014, the end of the five-year period for which we obtained Medicare data.
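The date arithmetic behind this window is easy to check. The sketch below (constant and function names are illustrative) confirms that January 2010 through September 2014 spans 57 months and that 90 days after the last eligible discharge still falls before the end of the data on December 31, 2014.

```python
from datetime import date, timedelta

FIRST_DISCHARGE = date(2010, 1, 1)
LAST_DISCHARGE = date(2014, 9, 30)
DATA_END = date(2014, 12, 31)
FOLLOW_UP = timedelta(days=90)


def months_covered(start: date, end: date) -> int:
    """Number of calendar months from start's month through end's month."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def is_eligible(discharge: date) -> bool:
    """Whether a discharge falls inside the analysis window."""
    return FIRST_DISCHARGE <= discharge <= LAST_DISCHARGE


print(months_covered(FIRST_DISCHARGE, LAST_DISCHARGE))  # 57
print(LAST_DISCHARGE + FOLLOW_UP)                        # 2014-12-29
print(LAST_DISCHARGE + FOLLOW_UP <= DATA_END)            # True
```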
To give you information intended to indicate whether a surgeon's patients were likely to have had in-hospital complications, we took a roundabout approach. First, MPA Healthcare Solutions (formerly Michael Pine & Associates), which performed the analysis for us, developed a proxy indicator for in-hospital complications in cases where death did not occur during the hospital stay. The proxy indicator is intended to highlight likely complications regardless of whether complications are reported in the hospital records. The proxy indicator looks for "prolonged lengths of stay" in hospital. Analyses of medical records indicate that a large proportion of prolonged lengths of stay are associated with important complications.
Here is a simplified explanation of how that analysis was performed: The analysis recognized that, for a given category of cases, a given surgeon or hospital will have varying lengths of stay, even after allowing for differences in patients' characteristics. But after allowing for differences in patient characteristics, most of this variation will be clustered around the hospital's and surgeon's average length of stay for that category of cases. Cases in which the length of stay is not within a hospital's and surgeon's predicted cluster are likely to involve complications. For each category of cases for each hospital and surgeon, the analysis identified cases that had lengths of stay outside the predicted cluster of lengths of stay; these were deemed prolonged lengths of stay. Because prolonged lengths of stay, like deaths, might occur more often for surgeons or hospitals with especially sick or frail patients, the analysis calculated a predicted percentage of prolonged lengths of stay for each surgeon and hospital based on the mix of characteristics of their patients.
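The exact clustering method is in the technical appendix; the sketch below is only a rough stand-in to make the idea concrete. It assumes each case already has a predicted length of stay that reflects the patient's characteristics, measures how far each actual stay departs from that prediction on a log scale, and flags stays that sit above the surgeon's own typical departure by more than a hypothetical `k` standard deviations.

```python
import math
from statistics import mean, stdev
from typing import List


def flag_prolonged_stays(actual_days: List[float],
                         predicted_days: List[float],
                         k: float = 2.0) -> List[bool]:
    """Flag stays far above the surgeon's cluster of risk-adjusted stays.

    k is a hypothetical threshold; the real cluster definition may differ.
    Requires at least two cases (stdev needs two data points).
    """
    log_ratios = [math.log(a / p) for a, p in zip(actual_days, predicted_days)]
    center = mean(log_ratios)
    spread = stdev(log_ratios)
    return [r > center + k * spread for r in log_ratios]


predicted = [4.0] * 10
actual = [4.0] * 9 + [16.0]  # one stay four times as long as predicted
print(flag_prolonged_stays(actual, predicted))  # only the last case is flagged
```

This sketch flags only the long side, since the measure concerns prolonged stays; unusually short stays are not treated as bad outcomes.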
We also calculated predicted inpatient death rates for each surgeon or hospital, based on its mix of patients, using an approach similar to the one used for calculating predicted death rates within 90 days of discharge (described above).
In addition, we calculated a predicted rate of readmissions within 90 days of discharge for each hospital, adjusting for each hospital's mix of patients.
Then we calculated a combined predicted rate of "deaths or other bad outcomes," which we sometimes refer to as a rate of "prolonged lengths of stay, readmissions, or deaths." This includes deaths in hospital, deaths after discharge within 90 days of discharge, prolonged lengths of stay, and readmissions within 90 days of discharge.
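Because the combined measure asks whether any of these events occurred, each case counts at most once, so a patient with both a prolonged stay and a readmission is not double-counted. A minimal sketch with hypothetical per-case flags:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseOutcome:
    """Hypothetical per-case flags for the combined measure."""
    died_in_hospital: bool = False
    died_within_90_days_of_discharge: bool = False
    prolonged_stay: bool = False
    readmitted_within_90_days: bool = False

    def any_bad_outcome(self) -> bool:
        """True if at least one of the four events occurred."""
        return (self.died_in_hospital
                or self.died_within_90_days_of_discharge
                or self.prolonged_stay
                or self.readmitted_within_90_days)


def bad_outcome_rate(cases: List[CaseOutcome]) -> float:
    """Share of cases with at least one bad outcome."""
    return sum(c.any_bad_outcome() for c in cases) / len(cases)


cases = [
    CaseOutcome(),                                                     # no events
    CaseOutcome(prolonged_stay=True, readmitted_within_90_days=True),  # counted once
    CaseOutcome(died_in_hospital=True),
    CaseOutcome(),
]
print(bad_outcome_rate(cases))  # 0.5
```

The resulting actual rate would then be risk-adjusted against a predicted rate in the same way as the death rates.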
We calculated the actual rate of "deaths or other bad outcomes" for each surgeon, calculated the ratio of the actual to the predicted rate for each surgeon, and did roughly the equivalent (except that we used odds ratios) of multiplying that ratio by the weighted-average rate of "deaths or other bad outcomes" for all surgeons in the entire national Medicare dataset we were using to get a risk-adjusted "deaths or other bad outcomes" rate. For each surgeon, we report whether that rate is better or worse than the all-surgeon average, and at what statistical confidence level. This is basically the same risk-adjustment process as was used (described above) in calculating risk-adjusted death rates from the actual, predicted, and all-surgeon death rates.
Keep in mind that most of the caveats set out above with regard to adjusted death rates also apply to adjusted rates of "deaths or other bad outcomes." In addition, while death rates directly measure something we care about (death itself), the rates of "deaths or other bad outcomes" use proxies (length of stay and readmissions) as indicators of complications and of deficiencies in care and follow-up that can result in readmissions.
Can You Rely on the Results We Report?
We have given you here and in our Technical Appendix quite a bit of explanation of our methods of calculating surgeons' results, and we have a lengthy Technical Report with even more background. All this information should alert you to limitations of our measurement systems and of all available systems for measuring surgical outcomes. There is considerable literature discussing the pros and cons of measurement systems. For example, you can find strong criticism of our use of prolonged lengths of stay as an indicator of likely complications in the following article: Lyman S, Fields KG, Nocon AA, Ricciardi BF, Boettner F. "Prolonged Length of Stay Is Not an Acceptable Alternative to Coded Complications in Assessing Hospital Quality in Elective Joint Arthroplasty." J Arthroplasty. November 2015, pp. 1863-7. And you can find a fuller explanation of our methods and the rationale for them in Fry DE, Pine M, Jones BL, Meimban RJ. "Adverse Outcomes in Surgery: Redefinition of Postoperative Complications." Am J Surg. 2009, vol. 197, pp. 479-84. You might find it helpful to read additional literature and to discuss our measures and alternative information sources with physicians or other experts.