Publishing negative studies is difficult.

Many physicians and readers believe that journals are systematically disposed to publishing positive studies (journals, after all, are responsive to their audience, and most readers are less interested in that which does not work) and that investigators are systematically disposed to publishing only their positive findings (newly minted physicians in the past 15 years have likely learned of funnel plots as a means of detecting this bias). Although these two beliefs have merit, my hesitancy is more methodological: negative studies are more difficult to evaluate carefully.

Many medical schools now teach a structured approach to reviewing the medical literature, drawing heavily on the “Users' Guides to the Medical Literature” series published in the Journal of the American Medical Association, and many physicians have therefore received some systematic training in evaluating results. Whether this cohort of physicians has filtered into the “reviewer pool” in sufficient numbers to affect the quality of journal review is uncertain. Unfortunately, this training is of less help in reviewing negative studies because the curriculum used in medical schools is heavily focused on evaluating studies that reach a positive conclusion.

Negative findings, setting aside the nuances of equivalence versus no difference, raise very specific review concerns. The most well-known issue is power. Power refers to the chance that a study will detect a difference if a difference actually exists; it is also occasionally described as the ability to avoid a Type 2 error. Although a power calculation is usually provided in randomized controlled trials, it is rarely found in observational studies.1

Although power is positively correlated with sample size, it also depends on the effect size chosen. Ideally, investigators choose the minimal clinically important difference as the effect size, but unless one is a content expert, it is difficult to know whether the effect size chosen was appropriate. In addition, studies are frequently powered for only a single primary outcome; when other outcomes are reported as negative, the study may simply not have been powered to detect differences in them.
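
As a rough illustration of how strongly power depends on the assumed effect size (a hypothetical sketch with invented numbers, not a calculation from any study discussed here), a few lines of Python using the statsmodels package make the point:

```python
# Hypothetical power calculation for a two-arm comparison of means.
# Effect sizes are standardized differences (Cohen's d); the sample size
# and alpha are invented for illustration only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = 100   # assumed number of subjects in each arm
alpha = 0.05      # conventional two-sided significance level

for d in (0.2, 0.5, 0.8):  # small, medium, and large standardized effects
    power = analysis.power(effect_size=d, nobs1=n_per_arm,
                           alpha=alpha, ratio=1.0, alternative='two-sided')
    print(f"d = {d:.1f}: power = {power:.2f}")
```

The same trial that is well powered for a medium effect can be badly underpowered for a smaller difference that is still clinically meaningful, which is why the choice of effect size deserves as much scrutiny as the sample size.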

Beyond power, which addresses only the likelihood that one has avoided an error due to chance, there are several additional considerations. Many examples exist2 of studies that enroll patients who are not the ones most likely to benefit from the intervention and then declare the intervention not effective. Conversely, the Hawthorne effect may be so powerful in the control arm of a study that no signal is detected between the groups.

Measurement is also an extremely common reason for negative studies. I am convinced that most studies examining quality of life fail to find differences because we cannot adequately measure quality of life. Similar concerns plague studies of dietary variables, which are difficult to measure in large, free-living populations.

The proper delivery of an intervention is an additional concern, often referred to as intervention fidelity. This is of particular concern in diabetes, where so many interventions target self-care behaviors and involve some form of education or more advanced methods to alter behavior. Unless the intervention was delivered as intended (information that is often unavailable to the reviewer), one may falsely conclude that a negative study reflects an inadequate intervention rather than inadequate delivery.

Data management is yet another concern in negative studies. Failure to provide quality control for data entry and management can result in negative studies if the resulting errors are non-differentially distributed. An often-quoted remark of uncertain origin reminds us that, “The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root, and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn well pleases.”
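
To make that attenuation concrete (a toy simulation with invented numbers, not data from any study cited here), the sketch below adds non-differential random error to a recorded exposure and shows the estimated association shrinking toward the null:

```python
# Toy simulation: non-differential (random) errors in recorded data
# attenuate an estimated effect toward the null. All values are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_slope = 1.0

x = rng.normal(0, 1, n)                    # the exposure as it truly is
y = true_slope * x + rng.normal(0, 1, n)   # outcome with a genuine effect

for noise_sd in (0.0, 0.5, 1.0, 2.0):      # increasing data-entry error
    x_recorded = x + rng.normal(0, noise_sd, n)   # what actually gets written down
    slope = np.polyfit(x_recorded, y, 1)[0]       # simple least-squares slope
    print(f"noise SD = {noise_sd:.1f}: estimated slope = {slope:.2f}")
```

With enough random error in the figures, a genuinely effective exposure or intervention can appear to have little or no effect, which is exactly how sloppy data handling manufactures a negative study.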

And these are but a few of the issues.

In this issue of Clinical Diabetes, an important “negative” study is reported in our Landmark Studies department (p. 17). Michael Pignone, MD, MPH, reports on a randomized clinical trial investigating the efficacy of self-monitoring of blood glucose (SMBG) in 184 subjects with newly diagnosed type 2 diabetes. The study assigned subjects to a fairly comprehensive management program with or without SMBG and concluded that SMBG provided no benefit with regard to glycemic control: a “negative” study.

This is an important observation that is somewhat at variance with accepted wisdom. Several considerations should give pause before these results are translated into practice. First, the population studied was apparently remarkably easy to control; < 10% required more than one medication to achieve control at 12 months, and nearly one-third required no medication at all. The less complex the population, the more likely that eliminating one component (here, SMBG) of an intense multidimensional intervention will have minimal effect on the outcome of interest. Second, the intervention that the control group received was itself multifactorial and robust. Third, subjects were asked to monitor their blood glucose eight times per week and were advised on how to respond to abnormal values. Without some form of fidelity analysis, it is hard to know how well this instruction was delivered, let alone whether the nature of the “advice” was sufficiently robust to influence any behavioral change. In addition, it is possible that monitoring only about once per day is insufficient to allow someone to detect patterns of response to medications, diet, activity, illness, or other factors. My own approach is to have individuals monitor more frequently early on, during the learning phase, and then cut back once they have stabilized.

Despite these limitations, we still published a review of this study in this issue. But it was difficult, because negative studies are simply a little harder to evaluate. No work is perfect or unambiguous, and this work clearly demonstrates common issues that affect negative studies. Still, it appears that the individuals with newly diagnosed type 2 diabetes in this study, who were enrolled in an intensive, multidimensional diabetes control program and who readily achieved control with no medication or just one medication, gained no additional glycemic benefit from performing SMBG about once a day during the first year after diagnosis, at least as SMBG was advised in this study.

1. Hebert RS, Wright SM, Dittus RS, Elasy TA: Prominent medical journals often provide insufficient information to assess the validity of studies with negative results. J Negat Results Biomed 1:1-7, 2002

2. O'Malley PG, Feuerstein IM, Taylor AJ: Impact of electron beam tomography, with or without case management, on motivation, behavioral change, and cardiovascular risk profile: a randomized controlled trial. JAMA 289:2215-2223, 2003