We have been asked what the minimum sample size for reporting is.
We will give all physicians their own scores to review confidentially on all questions, regardless of sample size or other statistical properties.
But for many physicians, results will not be publicly reported for a substantial number of questions. In a small number of cases, no results on any question will be reported for a physician.
In Memphis, for example, we will publicly report on at least one question for 430 of the 437 doctors included in the survey, but for some of these 430 physicians, only one or a few questions will be publicly reported.
The decision of whether or not to publicly report survey results on a physician is made on a question-by-question basis and depends on the statistical properties of the question for the specific physician and on the statistical properties of the question for Memphis as a whole.
We will publicly report on any physician on any question if the t-test for that physician on that question indicates at a 95 percent confidence level that that physician's performance is better or worse than the all-physician average in the community. That is a function of (1) the size of the difference between that physician's score and the community average on that question, (2) the variation among the scores given on that question by that physician's patients, and (3) the number of patient ratings for that physician on that question. We did not have a t-test result sufficient for public reporting on any question for any physician where the number of completed surveys for that physician was fewer than 10.
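To illustrate how those three factors combine, here is a minimal sketch of a one-sample t-test of one physician's patient ratings against the community average. It is an assumption that the test takes this one-sample form; the function names, the example ratings, and the community averages are hypothetical. The critical value is passed in by the caller (2.262 is the standard two-sided 95 percent value for 9 degrees of freedom, i.e. 10 ratings).

```python
import math
import statistics


def t_statistic(ratings, community_mean):
    """Return (t, degrees_of_freedom) for one physician on one question.

    ratings: the individual patient ratings for this physician.
    community_mean: the all-physician average for the same question.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate variation")
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("ratings have no variation; t is undefined")
    # (1) size of the difference, (2) variation among ratings,
    # (3) number of ratings all enter this one ratio.
    t = (mean - community_mean) / (sd / math.sqrt(n))
    return t, n - 1


def differs_from_community(ratings, community_mean, critical_value):
    """True if |t| exceeds the caller-supplied two-sided critical value."""
    t, _ = t_statistic(ratings, community_mean)
    return abs(t) > critical_value


if __name__ == "__main__":
    example = [9, 10, 9, 10, 10, 9, 10, 10, 9, 10]  # hypothetical ratings
    print(differs_from_community(example, 8.5, 2.262))
    print(differs_from_community(example, 9.5, 2.262))
```

With the same ten ratings, a community average far from the physician's mean passes the test while a community average close to it does not, which is why a second standard is needed for physicians near the average.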
If a physician’s score on a question does not qualify for public reporting based on the t-test, a second standard determines whether the score can be publicly reported. Such a standard is, of course, necessary since many physicians with very large sample sizes and meaningful data should not be excluded from reporting simply because their scores are not far from average. This second standard is that the data must achieve a certain level on a "reliability" statistic.
This concept of "reliability" is calculated on each question by taking into account the between-physician variation in physician-average scores, the average within-physician variation in scores, and sample size. For each question, for every possible number of responses, we calculated what the "reliability" statistic of the dataset would be if all physicians had that sample size. This is a conservative approach in terms of what is publicly reported. We will be publicly reporting for each question all results for all physicians with a sample size that produces a "reliability" statistic of 0.7 or higher in this calculation.
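As a rough illustration of how a minimum sample size can fall out of such a calculation, the sketch below assumes one common form of reliability: between-physician variance divided by the sum of between-physician variance and within-physician variance over n. This is an assumption and may not be the exact formula we used; the function names and the example variances are hypothetical and are not Memphis figures.

```python
def reliability(var_between, var_within, n):
    """Reliability if every physician had n responses (assumed formula).

    var_between: variance of physician-average scores across physicians.
    var_within: average variance of ratings within a physician.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return var_between / (var_between + var_within / n)


def min_sample_size(var_between, var_within, threshold=0.7, max_n=10000):
    """Smallest n whose reliability reaches the threshold, or None."""
    for n in range(1, max_n + 1):
        if reliability(var_between, var_within, n) >= threshold:
            return n
    return None  # threshold not reached within max_n responses


if __name__ == "__main__":
    # Hypothetical variances for one question.
    print(min_sample_size(var_between=0.1, var_within=1.0))
```

Under this assumed formula, a question on which physicians differ little from one another relative to the spread of ratings within each physician needs more responses to reach 0.7, which is consistent with the minimum varying by question and by community.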
Using this standard, which is the one that actually determines reportability for most physicians on most questions (since most scores are not significantly different from the average), the following are the minimum numbers of patient responses required for public reporting on illustrative questions in Memphis: for Q20 (overall rating), 22; for Q21 (recommend to family and friends), 30; for Q14 (did doctor give easy-to-understand instructions), 39; for Q11 (did doctor explain things well), 31; for Q23 (courteous and respectful office staff), 39. In Denver, the minimums turn out to be substantially higher because of the statistical properties of the Denver pool of responses.
--rkrughoff at CHECKBOOK/CSS