Text classification via word embeddings: An application for Turkish music mood detection

Çimen, Barış.

dc.contributor	Graduate Program in Management Information Systems.
dc.contributor.advisor	Durahim, Ahmet Onur.
dc.contributor.author	Çimen, Barış.
dc.date.accessioned	2023-03-16T12:52:02Z
dc.date.available	2023-03-16T12:52:02Z
dc.date.issued	2017.
dc.identifier.other	MIS 2017 C36
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/18202
dc.description.abstract	The objective of this study is to propose an approach that incorporates word embeddings into the Turkish text classification process, and to evaluate the applicability and performance of this approach by applying it to Turkish music mood detection. The methodology consists of two main parts. In the first part, word embeddings are trained with the Word2Vec and GloVe algorithms on a large collection of textual data comprising more than 2.5 million Turkish documents gathered from the Internet. Lyrics vectors are then generated from these pre-trained word embeddings for the pre-processed lyrics selected for mood detection. In the second part, the lyrics vectors are used as features for music mood detection with various machine learning techniques. For comparison, Turkish music mood detection is also performed with a traditional bag-of-words approach using the TF-IDF term weighting scheme, and with the Doc2Vec algorithm. The effects of stemming words to their roots and of filtering out a precompiled list of stop-words are investigated as well. The results show that incorporating word embeddings trained on a large text collection into the Turkish text classification process improves classification performance.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.A.) - Bogazici University. Institute for Graduate Studies in the Social Sciences, 2017.
dc.subject.lcsh	Text editors (Computer programs)
dc.subject.lcsh	Music, Turkish.
dc.title	Text classification via word embeddings: An application for Turkish music mood detection
dc.format.pages	viii, 46 leaves
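
The abstract describes generating lyrics vectors from pre-trained word embeddings but does not say how the word vectors are combined. The following is a minimal sketch of one common choice, assuming the vectors of a lyric's tokens are simply averaged. The function name `lyrics_vector`, the toy embedding values, and the averaging step itself are illustrative assumptions, not the method documented in the thesis.

```python
from typing import Dict, List, Sequence


def lyrics_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the embeddings of in-vocabulary tokens.

    Assumption: simple averaging; the abstract does not state the
    aggregation actually used. Returns a zero vector when no token
    is found in the embedding vocabulary.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 3-dimensional embeddings with made-up values (hypothetical,
# standing in for vectors trained with Word2Vec or GloVe).
toy_embeddings = {
    "aşk": [1.0, 0.0, 2.0],
    "hüzün": [3.0, 2.0, 0.0],
}

# Out-of-vocabulary tokens such as "bilinmeyen" are skipped.
vec = lyrics_vector(["aşk", "hüzün", "bilinmeyen"], toy_embeddings, dim=3)
print(vec)
```

The resulting fixed-length vector could then serve as the feature input to a classifier, in the same role that a TF-IDF row plays in the bag-of-words baseline mentioned in the abstract.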