Archives and Documentation Center
Digital Archives

Theme supervised nonnegative matrix factorization for topic modeling

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Suyunu, Burak.
dc.date.accessioned 2023-03-16T10:04:55Z
dc.date.available 2023-03-16T10:04:55Z
dc.date.issued 2020.
dc.identifier.other CMPE 2020 S88
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12442
dc.description.abstract Topic models are often used to organize and interpret large, unstructured corpora of text documents. They try to explain the topics that constitute the semantic infrastructure of the document sets and to find the distributions of these topics over the documents. Because topic models are unsupervised, their outputs have to be interpretable for the model to be considered successful. However, the results of a topic model are usually only weakly correlated with human interpretation. In this thesis, we propose a semi-supervised topic model called Theme Supervised Nonnegative Matrix Factorization that can benefit from labeled documents to improve and facilitate the interpretation of the topics. Our model constrains the representation of the topics to align with the labeled documents, which enables the topics discovered by the model to be readily understood. To utilize the labels provided by the documents more efficiently and to explore the document sets in more depth, we used a hierarchical topic structure consisting of themes, subtopics, and background topics in our model. We created layers under the themes that permit unsupervised learning of subtopics. This hierarchical structure, with the unsupervised learning capability it provides, enables our model, which was restricted by supervision, to discover new dimensions and make more detailed classifications. We tested our model on the Schwartz dataset that we created, as well as on the Brown and Reuters datasets, with different supervision ratios. Our model estimates the topics of the documents much better than traditional nonnegative matrix factorization and latent Dirichlet allocation in all settings; moreover, the effect of supervision is noteworthy, especially at low supervision ratios. In addition, our new term scoring metric successfully alters the weights of significant and insignificant terms for each topic and makes the topics easier to understand and interpret.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020.
dc.subject.lcsh Matrix groups.
dc.title Theme supervised nonnegative matrix factorization for topic modeling
dc.format.pages xii, 87 leaves ;
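
The abstract says that the model constrains the topic representation to align with labeled documents. The sketch below illustrates one generic way such a constraint can be imposed on nonnegative matrix factorization; it is not the thesis's algorithm. It assumes standard multiplicative updates for the Frobenius objective and a simple rule that a labeled document may only load on its own topic. The function name `label_masked_nmf`, its parameters, and the masking rule are all hypothetical, and the hierarchy of themes, subtopics, and background topics is not modeled.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def label_masked_nmf(X, n_topics, labels, n_iter=200, seed=0, eps=1e-9):
    """Approximate X (docs x terms) by W * H with W, H >= 0.

    labels[d] is a topic index for a labeled document, or None if the
    document is unlabeled. Hypothetical illustration, not the thesis's model.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    H = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]

    # Assumed supervision rule: zero out every weight of a labeled document
    # except the one for its label. Multiplicative updates keep zeros at zero,
    # so the constraint holds for the whole run.
    for d, lab in enumerate(labels):
        if lab is not None:
            W[d] = [w if k == lab else 0.0 for k, w in enumerate(W[d])]

    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (dn + eps) for h, n, dn in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]

        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (dn + eps) for w, n, dn in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]

    return W, H
```

Calling `label_masked_nmf(X, 2, [0, None, 1, None])` on a small document-term matrix returns document-topic weights `W` and topic-term weights `H`; the labeled documents keep exact zeros on every topic other than their label, while the unlabeled documents are free to load on any topic.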

