Developing new approaches for multi-platform and multi-individual genomic sequence assembly

Kavak, Pınar.

dc.contributor	Ph.D. Program in Computer Engineering.
dc.contributor.advisor	Güngör, Tunga.
dc.contributor.advisor	Alkan, Can.
dc.contributor.author	Kavak, Pınar.
dc.date.accessioned	2023-03-16T10:13:49Z
dc.date.available	2023-03-16T10:13:49Z
dc.date.issued	2017.
dc.identifier.other	CMPE 2017 K38 PhD
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12616
dc.description.abstract	High throughput sequencing (HTS) technologies generate huge amounts of data at very low cost, which has prompted research on algorithm development to analyze large DNA sequence datasets. In this thesis, we propose new solutions to three related problems in the field of genomics. Firstly, although the accuracy and reproducibility of HTS-based analyses have greatly improved, the usability of these platforms in terms of robustness is still an open question. We produced whole genome shotgun (WGS) sequence data from the genomes of two individuals in two different centers to assess the usability of a widely used HTS platform in terms of robustness in clinical applications. We observe that HTS platforms are powerful enough to provide data for first-pass clinical tests, but before they are used in clinical applications, the variant predictions need to be confirmed by orthogonal methods. Secondly, we still need innovative methods for the de novo genome assembly problem. The task of assembling very short DNA sequence reads into, ideally, complete chromosome sequences is further complicated by (i) the repetitive and duplicated structure of genomes, and (ii) the fact that the data produced by HTS technologies tend to be short and error-prone. We present a new method to increase assembly accuracy by integrating data from the Illumina, Ion Torrent, and Roche 454 platforms. Lastly, characterization of novel sequence insertions longer than the average read length remains a challenging task. There are only a few algorithms developed specifically for novel sequence insertion discovery that can bypass the need for whole genome de novo assembly. We present a new algorithm, Pamir, to efficiently and accurately discover and genotype novel sequence insertions using either single or multiple genome sequencing datasets.
dc.format.extent	30 cm.
dc.publisher	Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2017.
dc.subject.lcsh	Sequence alignment (Bioinformatics)
dc.subject.lcsh	Genomics -- Data processing.
dc.title	Developing new approaches for multi-platform and multi-individual genomic sequence assembly
dc.format.pages	xviii, 103 leaves ;