dc.contributor |
Graduate Program in Computational Science and Engineering. |
|
dc.contributor.advisor |
Ecevit, Fatih. |
|
dc.contributor.author |
Güvenç, Betül. |
|
dc.date.accessioned |
2023-03-16T10:02:28Z |
|
dc.date.available |
2023-03-16T10:02:28Z |
|
dc.date.issued |
2016. |
|
dc.identifier.other |
CSE 2016 G88 |
|
dc.identifier.uri |
http://digitalarchive.boun.edu.tr/handle/123456789/12326 |
|
dc.description.abstract |
There is a large number of algorithms for keyword extraction and text summarization in natural language processing, and we discuss some of these in this thesis. We started with a survey of automatic text summarization in order to understand the state-of-the-art methods. We also proposed a new and efficient method for the keyword extraction task using the Word2Vec and PageRank algorithms. In this thesis, we investigated two different graph-based text summarization algorithms for both single- and multi-document settings on different types of texts, using LexRank for multi-document summarization and TextRank for single-document summarization. We also investigated a number of keyword extraction methods. Almost every keyword extraction method uses high-dimensional vectors to define words in a vector space. We approached the problem of automatic extraction of keywords from text as an unsupervised learning task and treated each word in the document as a low-dimensional vector. We developed a new keyword extraction method using the Word2Vec and PageRank algorithms. Our results show that summarization algorithms give the best results on news texts and usable results on legal texts, while they give less than optimal results for short stories. We also compared the one-hot representation with the Word2Vec representation, but we observed no significant differences between these methods. |
|
dc.format.extent |
30 cm. |
|
dc.publisher |
Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2016. |
|
dc.subject.lcsh |
Machine learning. |
|
dc.subject.lcsh |
Natural language processing (Computer science) |
|
dc.title |
Machine learning methods in natural language processing |
|
dc.format.pages |
xii, 119 leaves ; |
|