After the projection, each outcome of the algorithm is represented by two coordinates (x, y), and the similarities among feature selectors can be analyzed in Fig. 7b. Regarding stability, the points that correspond to the NN-Wrapper are very scattered; in other words, it is the most unstable feature selector. The outcomes of RF, however, are clustered together, and the same applies to the Pearson feature selector. The figure also shows that Pearson generates rankings similar to those of RF. Note also that SVM-RFE and Relief are very distant from these two methods, while SVM-Wrapper falls somewhere in between.
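In Fig. 7b a tight cluster of points means that repeated runs of a selector produce similar rankings. The text does not state which similarity measure or projection produced the figure, so the following is only a minimal sketch under an explicit assumption: it uses the normalized Spearman footrule distance between ranking lists (a standard rank distance chosen here for illustration, not necessarily the one used in this work) and summarizes a selector's instability as the mean pairwise distance between its repeated rankings. The function names `footrule_distance` and `instability` and the toy rankings are hypothetical.

```python
from itertools import combinations


def footrule_distance(rank_a, rank_b):
    """Normalized Spearman footrule distance between two full rankings
    of the same features (0 = identical order, 1 = maximally different).
    Assumption: this rank distance is illustrative, not the paper's measure."""
    if set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same features")
    n = len(rank_a)
    pos_b = {feature: i for i, feature in enumerate(rank_b)}
    # Sum of absolute position shifts of every feature between the two lists.
    total = sum(abs(i - pos_b[feature]) for i, feature in enumerate(rank_a))
    # The footrule maximum is floor(n^2 / 2), reached by reversing the list.
    max_total = (n * n) // 2
    return total / max_total if max_total else 0.0


def instability(rankings):
    """Mean pairwise footrule distance over repeated runs of one selector
    (e.g. runs on different data resamples); higher means less stable."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        return 0.0
    return sum(footrule_distance(a, b) for a, b in pairs) / len(pairs)


# Toy rankings (hypothetical), best feature first.
runs_stable = [["age", "sex", "zinc", "niacin"]] * 3
runs_scattered = [["age", "sex", "zinc", "niacin"], ["niacin", "zinc", "sex", "age"]]
print(instability(runs_stable))     # 0.0
print(instability(runs_scattered))  # 1.0
```

A matrix of such pairwise distances is the kind of input that a 2D projection (for example, multidimensional scaling) would take to place each run as an (x, y) point.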
Stability should be studied jointly with the capability of the selected features to predict the target class. This is crucial in order to provide experts with reliable information about the most important risk/protective factors, and not only with the most stable ranking lists. In terms of predictive power (see Fig. 2 in previous sections), RF and Pearson show similar behavior, but the models built with the features selected by Pearson tend to outperform those built with the features selected by RF. This is also confirmed by the analysis conducted in Section 3.2 (Table 2). This visual analysis also shows that SVM-Wrapper performs moderately in terms of robustness and is similar to Pearson. Additionally, SVM-Wrapper and Pearson are the best ranking techniques in terms of model performance.
There is sufficient evidence that screening tests such as fecal occult blood testing or colonoscopy are effective in reducing the incidence of and mortality from CRC [32,35]. Screening and preventive interventions can benefit from incorporating CRC risk prediction models able to identify individuals at high risk of developing CRC. Risk-adapted screening might also be more cost-effective than traditional screening interventions. The individualized risk information provided by these models could also encourage lifestyle changes. There are, however, several challenges to implementing risk prediction tools for CRC. The main one is the collection of family history, genetic, lifestyle and dietary information in a primary care environment. Other concerns are potential side effects such as anxiety, false reassurance, and false alarms among the general population. Further assessment in terms of research, clinical impact, and cost-effectiveness is necessary before these models can be deployed in clinical practice.
The aim of this study is to assess several feature selection techniques, together with classification models, for developing risk prediction models for colorectal cancer. This work focuses on the analysis of both the classification performance and the robustness of the feature selection algorithms.
This research work shows that the two best performance results are achieved with an SVM classifier using the top-41 features selected by the SVM-Wrapper approach (AUC = 0.693) and with LR using the top-40 features selected by the Pearson correlation ranking (AUC = 0.689). This implies an improvement of 3.9% and 1.9% for the SVM and LR classifiers, respectively, with respect to the results obtained with the full feature set. This performance is comparable to that of other studies on CRC (AUC = 0.63 in  and similar performance in references therein).
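The AUC values above summarize how well a model's predicted risk scores separate CRC cases from controls; an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. As a minimal, hedged sketch (not the evaluation code used in this work, which the text does not show), the AUC can be computed directly as the Mann-Whitney statistic: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney statistic: fraction of (case, control)
    pairs in which the case scores higher; ties count as 0.5.
    labels: 1 = case, 0 = control. scores: predicted risk scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): two cases and two controls.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The quadratic pairwise loop is chosen for readability; for large cohorts a rank-based formulation of the same statistic is much faster.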
Table 5 shows the most discriminant features for colorectal cancer prediction selected with the best-performing strategies: the SVM-Wrapper approach (top-41) and the ranking performed with the Pearson correlation coefficient (top-40). Although the features differ from one list to the other, some features are common to both lists (highlighted in grey): red meat, legume, physical exercise, family history of CRC, carotenes, cholesterol, age, level of education, ethanol in the past, food intake in grams, niacin, rs4939827, rs7014346, rs961253, rs9929218, sex and zinc. Some of the features found associated with CRC risk in MCC