For testing Normality and Lognormality, the Shapiro test from the R package stats (version 3.2.2) was used with a threshold of 0.01, on the original and the log-transformed data, respectively. For Pareto, Gamma and Cauchy, the Kolmogorov-Smirnov test was applied. For this test, the parameters were estimated by Maximum Likelihood Estimation (MLE). For the MLE of Gamma, we used the rGammaGamma R package (version 1.0.12). For Cauchy, the two parameters were set as the median and the interquartile range. As the parameters were estimated directly from the data, we applied a parametric bootstrap to estimate the final P-value. This idea of resampling to find the null distribution of the test statistic when the parameters are estimated is based on the Lilliefors test. The significance threshold for the final P-value was set to 0.01. For testing Bimodality, the Bimodality Index was computed using the R package ClassDiscovery (version 3.0.0).
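The parametric bootstrap described above can be illustrated with a minimal Python sketch for the Cauchy case. It is not the code used in the study, which relied on R. The function names, the number of bootstrap replicates, and the seed are hypothetical, and the scale is taken here as half the interquartile range (the usual relation for a Cauchy distribution), which is an assumption because the text names only the interquartile range.

```python
import math
import random
import statistics


def fit_cauchy(xs):
    """Location = median; scale = half the interquartile range (assumption)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return statistics.median(xs), (q3 - q1) / 2.0


def cauchy_cdf(x, loc, scale):
    return 0.5 + math.atan((x - loc) / scale) / math.pi


def ks_statistic(xs, loc, scale):
    """Largest gap between the empirical CDF and the fitted Cauchy CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cauchy_cdf(x, loc, scale)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def bootstrap_ks_pvalue(xs, n_boot=200, seed=0):
    """P-value of the KS statistic when parameters are estimated from xs."""
    rng = random.Random(seed)
    loc, scale = fit_cauchy(xs)
    d_obs = ks_statistic(xs, loc, scale)
    exceed = 0
    for _ in range(n_boot):
        # Simulate a sample of the same size from the fitted model
        # by inverse-CDF sampling.
        sim = [loc + scale * math.tan(math.pi * (rng.random() - 0.5)) for _ in xs]
        # Re-estimate the parameters on the simulated sample, exactly as
        # was done for the observed data, then recompute the statistic.
        d_sim = ks_statistic(sim, *fit_cauchy(sim))
        exceed += d_sim >= d_obs
    # Add-one correction so the estimated P-value is never exactly zero.
    return (exceed + 1) / (n_boot + 1)
```

Re-fitting the parameters inside the loop is the essential step: comparing the observed statistic with standard Kolmogorov-Smirnov tables would ignore the fact that the parameters were tuned to the data, which makes the plain test too lenient.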
Concordance index (C-index) method
The C-index is an accuracy measure that is based on the probability of concordance between the predicted and observed responses [30,31]. Specifically, for a pair of patients chosen at random, the C-index determines the probability that the patient sample with the higher risk prediction will experience an event before the patient with the lower risk in the pair. This measure is computed for all pairs of observed responses, and the number of times the predictions are concordant is summarized as a statistic from which the P-value is derived under assumptions of asymptotic Normality.
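The pairwise counting behind the C-index can be sketched as follows. This is a minimal illustration in Python assuming a Harrell-style estimator for right-censored data; it is not the survcomp implementation, the function name and argument layout are hypothetical, and the P-value computation is omitted.

```python
def concordance_index(times, events, risks):
    """Fraction of usable patient pairs ranked correctly by predicted risk.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher means an earlier event is expected)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only when patient i is known to fail first:
            # i has an observed event and a strictly shorter time than j.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no comparable pairs")
    return concordant / usable
```

A value of 1 means every usable pair is ordered correctly, 0.5 corresponds to uninformative predictions, and 0 means every pair is ordered in reverse.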
The D-index is a statistical metric that assesses the prognostic ability of a potential biomarker by measuring the degree of separation between two Kaplan-Meier curves constructed by splitting the patient population into two groups based on the biomarker. Under the assumptions made by the Cox proportional hazards model, the D-index is calculated as an estimate of the log hazard ratio between the two populations being compared. The C-index and D-index functions were computed using the survcomp package (https://github.com/bhklab/survcomp/tree/master/man).
The eight survival analysis methods were evaluated on three criteria: reliability, accuracy, and robustness. First, reliability was assessed by dividing each cancer dataset into two halves, running the survival analysis method on each half to identify biomarkers, and comparing the consistency of the results between the two halves. Second, the accuracy of a method was assessed by comparing its results to a gold standard list of known prognostic expression markers that were specific to each tumor type. We then computed ROC curves to compare the false positive rate (FPR) against the true positive rate (TPR) for the eight methods. Third, the robustness of each method was tested by generating in silico data with controlled levels of noise and a set of known “positive controls”. In this way, the performance of the eight methods was assessed in the presence of increasing amounts of noise in the data.
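The first two criteria can be sketched in a few lines of Python. The sketch assumes that each method yields one score per gene, for example -log10(P-value), that reliability is summarized by the Spearman correlation between the scores from the two data halves (the statistic reported in Fig. 2), and that the gold standard is a list of booleans marking known markers. All function and variable names are hypothetical, and tied scores in the ROC step are not grouped, which is a simplification.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def roc_points(scores, is_known_marker):
    """(FPR, TPR) pairs obtained by lowering the score threshold gene by gene."""
    n_pos = sum(is_known_marker)
    n_neg = len(is_known_marker) - n_pos
    # Visit genes from the most to the least significant score.
    ranked = sorted(zip(scores, is_known_marker), key=lambda p: -p[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, positive in ranked:
        if positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

For reliability, `spearman(half_a, half_b)` would be called with the per-gene scores from the two data halves; for accuracy, `roc_points(scores, gold)` traces the curve from (0, 0) to (1, 1), and a curve that rises faster indicates that known markers are ranked ahead of other genes.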
Assessment of reliability identified the k-means and Cox regression methods as those with the strongest performance. Reliability was assessed by investigating whether a method would give similar results if only half of the data was presented to the method. Two sets of gene-specific log10-transformed P-values were obtained from applying the same method to two halves of the cancer datasets, and a correlation coefficient was used to evaluate the similarity in results
Fig. 2 Assessing the reliability of the eight methods. A. Bar chart of the Spearman correlation coefficients between the two data halves for each method, showing which methods are more reliable. B. Scatter plots of -log10(P-values) for all genes in one data half versus the other data half for each method, indicating which methods identified the same set of significant genes in each data half. Blue lines indicate equality between the two halves.