
3.3. Ranking stability analysis

The stability of six feature ranking algorithms is evaluated in this section. Each feature ranking algorithm was launched with 70% of the data randomly extracted from the whole dataset. Seven runs of this process resulted in a total of K = 7 rankings.

Table 2. Best feature selection techniques for different top-k lists and classifiers (for each of the SVM, BT, kNN, LR, and NN classifiers: #Features, Rank, and AUC).

Fig. 5. Classifier performance (AUC) with the full feature set and different cardinality of the feature subset for different classifiers: NN.

Fig. 6. Classifier performance (AUC) with the full feature set and different cardinality of the feature subset for different classifiers: SVM.
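The subsampling protocol described above can be sketched as follows. The toy dataset and the simple Pearson-correlation ranker are illustrative assumptions; the paper's actual dataset and ranking algorithms differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for the paper's dataset (shapes are assumptions).
X = rng.normal(size=(200, 100))  # 200 samples, 100 features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(float)

def pearson_ranking(X, y):
    """Rank features by |Pearson correlation| with the target (1 = best)."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-scores)            # feature indices, best first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks                           # rank position of each feature

K = 7
rankings = []
for _ in range(K):
    # Each run sees a random 70% subsample, as in the paper's protocol.
    idx = rng.choice(len(X), size=int(0.7 * len(X)), replace=False)
    rankings.append(pearson_ranking(X[idx], y[idx]))
rankings = np.array(rankings)              # shape (7, 100): one ranking per run
```

Each row is a full ranking (a permutation of the positions 1–100), which is the input format the stability metrics below operate on.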

3.3.1. Traditional stability analysis

The stability of the feature ranking algorithms can be evaluated with metrics like Spearman's rank correlation coefficient (SR).

In this case, we have computed the 7(7−1)/2 = 21 pairwise similarities for each algorithm and averaged these values according to Eq. (5). The SR is recorded in Table 3, where it can be seen that RF is the most stable ranking algorithm (0.712), whereas NN-Wrapper is quite unstable (0.036).
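A minimal sketch of this average pairwise Spearman stability, assuming Eq. (5) is the plain mean over all K(K−1)/2 pairs as the text suggests:

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

def average_spearman(rankings):
    """Mean Spearman correlation over all K(K-1)/2 ranking pairs (assumed Eq. (5))."""
    sims = [spearmanr(rankings[i], rankings[j])[0]
            for i, j in combinations(range(len(rankings)), 2)]
    return float(np.mean(sims))
```

With K = 7 rankings this averages exactly 21 pairwise correlations, matching the count used in the text.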

The Jaccard index allows us to study the stability of a feature subset that contains the top-k feature lists. Table 4 shows the Jaccard index for the selection of feature subsets with cardinality varying from 10 to 100, with the average in the last row. The results confirm that the NN-Wrapper method is very unstable and RF is very stable. Looking at stability and classifier performance jointly, the results show that RF was the most stable technique, but it performed worse than other rankers in terms of model performance. The SVM-Wrapper and the Pearson correlation coefficient were moderately robust and were the best ranking techniques in terms of model performance.
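The top-k Jaccard stability can be sketched as follows, assuming each ranking is stored as the rank position (1 = best) of every feature, and averaging over all pairs of runs as with the SR:

```python
from itertools import combinations

import numpy as np

def topk_jaccard(rankings, k):
    """Mean Jaccard index of the top-k feature sets over all pairs of runs."""
    # rankings[r][f] is the rank position (1 = best) of feature f in run r,
    # so the top-k set of a run is the features with rank position <= k.
    tops = [set(np.flatnonzero(np.asarray(r) <= k)) for r in rankings]
    sims = [len(a & b) / len(a | b) for a, b in combinations(tops, 2)]
    return float(np.mean(sims))
```

Evaluating this for k = 10, 20, …, 100 and averaging over k reproduces the layout of Table 4 for one algorithm.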

The analysis based on a single metric does not, however, allow us to say anything about how similar the rankings provided by the different algorithms are. Typical questions we would like to answer are: (i) which feature ranking algorithms provide similar rankings? (ii) which algorithm is more stable for a certain range of k values? Answering these directly from the results gathered in Table 4 does not seem straightforward.

Table 3. Stability of a set with 7 full rankings, assessed through average pairwise similarities with Spearman's rank correlation coefficient (SR). Columns: Pearson, Relief, SVM-Wrapper, NN-Wrapper, SVM-RFE, RF.

Table 4. Stability of a set with 7 top-k lists, assessed through average pairwise similarities with the Jaccard index for different values of k. Columns: k, Pearson, Relief, SVM-Wrapper, NN-Wrapper, SVM-RFE, RF; the last row reports the average over all k.

3.3.2. Visual stability analysis

A simple plot helps to see the relative and absolute stability of the feature selectors. Fig. 7 highlights that their relative stability changes with the value of k. In general terms, RF and Pearson appear to be the most stable algorithms. Note also that the stability of the SVM-RFE approach is very low for small values of k: no reliable information about the most relevant factors can be extracted from a single run of that algorithm, so it would be desirable to aggregate the rankings in order to obtain a more representative one. Likewise, NN-Wrapper is very unstable.
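One simple aggregation scheme consistent with this idea is mean-rank (Borda-style) aggregation; the paper does not specify its aggregation method, so this is an illustrative assumption:

```python
import numpy as np

def aggregate_rankings(rankings):
    """Borda-style aggregation: order features by their mean rank across runs."""
    rankings = np.asarray(rankings)          # shape (runs, features), 1 = best
    mean_ranks = rankings.mean(axis=0)       # average rank of each feature
    order = np.argsort(mean_ranks)           # best (lowest mean rank) first
    agg = np.empty_like(order)
    agg[order] = np.arange(1, len(order) + 1)
    return agg                               # aggregated rank per feature
```

The aggregated ranking smooths out run-to-run fluctuations, which matters most for the unstable selectors highlighted above.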

MDS [12] is used in this section to visualize the feature selectors in a graph so that comparisons between all of them can be established.

All the results gathered in the experiment can be organized as a set of 42 points (6 algorithms × 7 runs each) defined in a 100-dimensional space. These points are projected onto a 2D space using MDS. The distance between points is calculated with Spearman's rank correlation coefficient, and the stress criterion is normalized by the sum of squares of the dissimilarities.
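Under these assumptions (a 1 − ρ dissimilarity derived from Spearman's coefficient and the metric MDS implemented in scikit-learn, which may differ in detail from the variant used in [12]), the projection can be sketched as:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import MDS

def mds_projection(runs):
    """Project ranking runs to 2D via metric MDS on 1 - Spearman dissimilarities."""
    n = len(runs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho = spearmanr(runs[i], runs[j])[0]
            D[i, j] = D[j, i] = 1.0 - rho    # 0 for identical rankings, up to 2
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(D)              # one 2D point per run
```

In the paper's setting the input would be the 42 rankings (6 algorithms × 7 runs), so runs from the same algorithm should cluster together when that algorithm is stable.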