We also consider data from a study conducted in the United States and the Netherlands for a new sepsis diagnostic test. Three independent diagnoses per patient were made by expert panelists based on information contained in case report forms, and the combination of diagnoses was used to determine the overall confidence of the classification for each patient, as described in S2 Supporting Information (“A Method for Estimating Patient Classifications By an Expert Panel Comparator”). Erroneous classifications were introduced at random, weighted by the distribution of uncertainty observed in the classification of patients, as described in S3 Supporting Information (“Weighting for Misclassified Classification Events”). To obtain a statistically representative picture of this random selection, each injection of classification noise was drawn at random from the distribution of uncertainty observed in the study and repeated over 100 iterations; aggregate results are displayed. Four different patient selections from the full study cohort (N = 447) were made and analyzed separately: (1) the subset of patients (N = 290; 64.9% of all patients) who received unanimous diagnoses from the external expert panelists and to whom the same diagnosis was assigned by the investigators at the clinical sites. We termed this the “super-unanimous” group and assumed that, when the external panelists and the clinical site investigators agreed, the diagnoses were very likely correct. These patients represent the stratum of the study cohort with the lowest probability of error in the comparator; (2) the subset of patients (N = 410; 91.7% of the total) who received a consensus (majority) diagnosis from the external panel.
This subset excluded 37 patients considered “indeterminate” because the expert panelists did not reach a consensus diagnosis; (3) all patients (N = 447), each assigned a forced diagnosis of positive or negative regardless of the degree of uncertainty associated with that patient; (4) the subset of patients with clinical records indicating respiratory disorders (N = 93; 20.8% of the total), for whom relatively high classification uncertainty was expected and observed. Each of these four patient selections had an expected misclassification rate, determined from the average residual uncertainty measured in the evaluations of the three external panelists, as described in S4 Supporting Information. Figure 2 shows graphically the effects of comparator misclassification on the interpretation of diagnostic performance. This figure was generated from a simulation of 100 ground truth negative samples and 100 ground truth positive samples. Panel A shows the true test performance (0% comparator misclassification), while Panel B shows the effect of randomly injecting 5% misclassification into the comparator calls. The quantitative results of this specific simulation are compiled in Table 1.
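The kind of simulation behind Figure 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: it assumes a hypothetical index test that agrees perfectly with ground truth, and it flips 5% of the comparator calls uniformly at random, whereas the study weights the injected errors by the observed distribution of uncertainty. The function names `inject_noise` and `apparent_performance` are hypothetical.

```python
import random

def inject_noise(labels, rate, rng):
    """Return a copy of labels with round(rate * n) entries flipped (0 <-> 1)."""
    noisy = list(labels)
    n_flip = round(rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy

def apparent_performance(test_calls, comparator_calls):
    """Sensitivity and specificity of the test, judged against the comparator."""
    pairs = list(zip(test_calls, comparator_calls))
    tp = sum(t == 1 and c == 1 for t, c in pairs)
    fn = sum(t == 0 and c == 1 for t, c in pairs)
    tn = sum(t == 0 and c == 0 for t, c in pairs)
    fp = sum(t == 1 and c == 0 for t, c in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# 100 ground truth negative and 100 ground truth positive samples.
truth = [0] * 100 + [1] * 100
test_calls = list(truth)  # assumption: the index test is perfect

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sens_runs, spec_runs = [], []
for _ in range(100):  # 100 iterations of noise injection
    comparator = inject_noise(truth, 0.05, rng)  # 5% comparator misclassification
    sens, spec = apparent_performance(test_calls, comparator)
    sens_runs.append(sens)
    spec_runs.append(spec)

mean_sens = sum(sens_runs) / len(sens_runs)
mean_spec = sum(spec_runs) / len(spec_runs)
print(f"apparent sensitivity: {mean_sens:.3f}, apparent specificity: {mean_spec:.3f}")
```

Even though the assumed test makes no errors, its apparent sensitivity and specificity fall below 100% because every disagreement with the noisy comparator is counted against the test.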
To assess how meaningful the apparent differences in the misclassification rates presented in this table are, we conducted a more thorough examination in which the number of simulated samples (trial size) was varied (S5 Supporting Information, “Decrease in apparent performance of index test, with 5% noise injected into comparator”).
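The role of trial size can be illustrated with a similar sketch. Again, this is a simplified illustration rather than the analysis in S5 Supporting Information: it assumes a perfect index test, balanced classes, uniform random flipping of 5% of comparator calls, and an arbitrary set of trial sizes. The function names `apparent_sensitivity` and `spread_by_trial_size` are hypothetical.

```python
import random
import statistics

def apparent_sensitivity(n_per_class, rate, rng):
    """Apparent sensitivity of a perfect test against a noisy comparator."""
    truth = [0] * n_per_class + [1] * n_per_class
    comparator = list(truth)
    # Flip a fixed fraction of comparator calls, chosen uniformly at random.
    for i in rng.sample(range(len(truth)), round(rate * len(truth))):
        comparator[i] = 1 - comparator[i]
    # Assumption: the test call equals ground truth for every sample.
    tp = sum(t == 1 and c == 1 for t, c in zip(truth, comparator))
    comparator_positives = sum(comparator)
    return tp / comparator_positives

def spread_by_trial_size(sizes, rate=0.05, iterations=100, seed=0):
    """Mean and standard deviation of apparent sensitivity for each trial size."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        runs = [apparent_sensitivity(n, rate, rng) for _ in range(iterations)]
        results[n] = (statistics.mean(runs), statistics.stdev(runs))
    return results

results = spread_by_trial_size([50, 100, 500, 1000])
for n, (mean, sd) in results.items():
    print(f"{2 * n:5d} samples: mean = {mean:.3f}, sd = {sd:.4f}")
```

Under these assumptions the mean apparent sensitivity stays below 100% at every trial size, while the run-to-run spread shrinks as the trial grows, which is why differences seen in a single small simulation should be interpreted with caution.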