



Joy Adhikary, Sriyankar Acharyya, in Recent Trends in Computational Intelligence Enabled Research, 2021

Abstract

This research proposes a new variant of the Elephant Swarm Water Search Algorithm (ESWSA), namely the Random Walk Elephant Swarm Water Search Algorithm (RW-ESWSA), to find order-preserving submatrices (OPSMs) in gene expression data sets expressed in matrix form. An OPSM is a submatrix in which a subset of genes changes its expression rate in approximately the same manner across different conditions of a disease. This is the first attempt to identify OPSMs using metaheuristic approaches. In this work, the proposed variant RW-ESWSA, which improves exploration in the search strategy by incorporating randomized walks or movements, proves its efficacy in its performance on benchmark functions and in statistical analysis. Because of its better exploration capability, it has performed better than other metaheuristic algorithms in convergence analysis. Apart from the benchmark functions, all these algorithms have been executed on two gene expression data sets: yeast and leukemia.

Asish Mukhopadhyay, in Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology, 2015

6.1 Coarse filtration

The purpose of coarse filtration is to remove most of the attributes that contribute to noise in the gene expression data set. This noise can be categorized into (i) biological noise and (ii) technical noise (Lu and Han, 2003). Biological noise refers to the genes in a gene expression data set that are irrelevant for classification. Technical noise refers to errors incurred at various stages during data preparation.

For coarse filtration, we follow an established approach based upon the t-metric discussed in the previous section. Following a general consensus (Golub et al., 1999; Kim et al., 2002), we chose to select a sufficient number of genes that can be further considered for fine filtration. This is a set of 100 genes obtained by taking the 50 genes with the largest positive t-values and another 50 genes with the smallest (most negative) t-values.

The ECA is compared with the k-means and the CMI. For the purpose of comparison, 10 simulations of each algorithm are run. Each simulation yields a set of clusters. A subset of genes, called the candidate genes, is selected from each cluster for the classification studies. The candidate genes are the top-ranking genes, that is, those with higher fitness. For the ECA and the CMI, the top-ranking genes are defined as the genes having the highest multiple redundancy measure in their clusters. For the k-means, they are the genes having the least Euclidean distance from the mean value of the cluster.

For the classification accuracy calculation, the leave-one-out cross-validation (LOOCV) method is used. For LOOCV, the first sample is selected as the test set and the remaining samples as the training set. The top-ranking genes from the training set are used in the algorithm to predict the test sample as diseased or normal. Then the second sample is used as the test sample and the remaining samples are taken as the training set. The process is repeated until each individual sample has served as the test sample. The classification accuracy is calculated as the percentage of test samples correctly predicted as either diseased or normal. A higher classification accuracy indicates that the algorithm is more effective at selecting the significant genes that carry the most diagnostic information.
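The LOOCV accuracy computation described for the ECA comparison can be sketched as follows. This is a minimal illustration under stated assumptions. The excerpt does not name the classifier, so `nearest_centroid` is a hypothetical stand-in. `select_genes` is a hypothetical hook representing whichever top-ranking-gene rule (ECA, CMI or k-means) is being evaluated. The hook sees only the training fold, so the held-out sample never influences gene selection.

```python
import math
from typing import Callable, List, Sequence

Sample = List[float]  # expression values of one sample, indexed by gene


def nearest_centroid(train_x: List[Sample], train_y: List[str], test_x: Sample) -> str:
    """Hypothetical classifier (not from the source): predict the label whose
    class centroid is closest, in Euclidean distance, to the test sample."""
    best_label, best_dist = "", math.inf
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = math.dist(centroid, test_x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(
    x: List[Sample],
    y: List[str],
    select_genes: Callable[[List[Sample], List[str]], Sequence[int]],
    classify: Callable[[List[Sample], List[str], Sample], str] = nearest_centroid,
) -> float:
    """Leave-one-out cross-validation accuracy, as a percentage.

    Each sample is held out once as the test sample. The hypothetical
    select_genes hook picks top-ranking genes from the remaining (training)
    samples only, and the classifier predicts the held-out label from them.
    """
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        genes = select_genes(train_x, train_y)

        def project(row: Sample) -> Sample:
            # Keep only the selected genes for this fold.
            return [row[g] for g in genes]

        predicted = classify([project(r) for r in train_x], train_y, project(x[i]))
        correct += predicted == y[i]
    return 100.0 * correct / len(x)


# Toy data: gene 0 separates the classes, gene 1 does not.
x = [[5.0, 1.0], [6.0, 9.0], [7.0, 3.0], [1.0, 2.0], [2.0, 8.0], [3.0, 4.0]]
y = ["diseased"] * 3 + ["normal"] * 3
print(loocv_accuracy(x, y, lambda tx, ty: [0]))  # 100.0
```

In this toy example, selecting the informative gene gives 100% LOOCV accuracy. Selecting only the uninformative gene (`lambda tx, ty: [1]`) gives 0%. This mirrors the excerpt's point that accuracy reflects how well the diagnostic genes were chosen.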

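The coarse-filtration step in section 6.1 of the Mukhopadhyay excerpt keeps the 50 genes with the largest positive t-values and the 50 with the most negative t-values. It can be sketched as below. The excerpt defines the t-metric only in a "previous section" that is not shown. The Welch (unequal-variance) two-sample t statistic used here is therefore an assumption, as is the 1/0 coding of the two classes.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch two-sample t statistic; an assumed stand-in for the source's
    t-metric. Each class needs at least two samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    diff = statistics.mean(a) - statistics.mean(b)
    if se == 0.0:
        # Zero within-class variance: infinitely separating if the means
        # differ, otherwise no signal at all.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def coarse_filter(
    expr: List[List[float]], labels: Sequence[int], k: int = 50
) -> Tuple[List[int], List[int]]:
    """Return (indices of the k genes with the largest positive t-values,
    indices of the k genes with the most negative t-values).

    expr[g][s] is the expression of gene g in sample s; labels[s] is 1 for
    one class (e.g. diseased) and 0 for the other (assumed coding).
    """
    scored = []
    for g, row in enumerate(expr):
        a = [v for v, lab in zip(row, labels) if lab == 1]
        b = [v for v, lab in zip(row, labels) if lab == 0]
        scored.append((welch_t(a, b), g))
    positive = sorted((s for s in scored if s[0] > 0), reverse=True)[:k]
    negative = sorted(s for s in scored if s[0] < 0)[:k]
    return [g for _, g in positive], [g for _, g in negative]
```

With the excerpt's setting `k=50`, the two returned lists together form the 100-gene set passed on to fine filtration. Genes with a t-value of exactly zero are not selected on either side.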