Abstract:
Genome sequencing technologies are advancing rapidly, and they produce volumes of data too large for manual analysis. Statistical and machine learning methods can draw inferences automatically, but the difficulty lies in choosing which methods to apply and how to apply them. In this study, dimensionality reduction algorithms and Bayesian inference were used to find overrepresented genes and pathways in patients with epilepsy, a neurological disorder characterized by spontaneous seizures and known to have a genetic basis in some cases. The methods were applied to Whole Exome Sequencing (WES) data from over 100 patients, each of whom belongs to one of four epilepsy phenotypes. De novo single nucleotide polymorphisms were filtered from the WES data in order to analyze the performance of the dimensionality reduction algorithms and to identify the genes, pathways, and cellular locations associated with each phenotype. Dimensionality reduction was applied to three pathogenicity scores obtained by different methods. Even a linear method (i.e., Principal Component Analysis) performed acceptably, and its nonlinear versions improved performance. Three manifold learning algorithms were also applied as alternative approaches. The genes and pathways found to be most significant are presented and discussed.
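As a rough illustration of the linear case mentioned above, the following minimal sketch reduces three scores per variant to a single combined value with Principal Component Analysis. It is not the pipeline used in the study. The data are synthetic, the three columns stand in for unnamed pathogenicity scores, and power iteration on the covariance matrix replaces a library PCA routine so that the example needs only the Python standard library. All function and variable names are hypothetical.

```python
import math
import random

random.seed(0)  # reproducible synthetic data

# Hypothetical input: 200 variants, each with three pathogenicity scores.
# A shared latent value makes the three scores correlated (an assumption).
scores = []
for _ in range(200):
    latent = random.random()
    scores.append([latent + random.gauss(0.0, 0.05) for _ in range(3)])


def center(rows):
    """Subtract the column mean from every column."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered_rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered_rows), len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_component(cov, iters=200):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


centered = center(scores)
cov = covariance(centered)
component, eigenvalue = first_component(cov)

# Fraction of total variance captured by the first principal component.
ratio = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# One combined value per variant: projection onto the first component.
projected = [sum(a * b for a, b in zip(row, component)) for row in centered]

print(f"first component: {[round(c, 3) for c in component]}")
print(f"explained variance ratio: {ratio:.3f}")
```

A nonlinear variant would replace the covariance matrix with a kernel matrix over the variants. That step is left out here because the abstract does not specify which kernels were used.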