Archives and Documentation Center
Digital Archives

Identification of verbal multiword expressions using deep learning architectures and representation learning methods


dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Erden, Berna.
dc.date.accessioned 2023-03-16T10:04:27Z
dc.date.available 2023-03-16T10:04:27Z
dc.date.issued 2019.
dc.identifier.other CMPE 2019 E75
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12414
dc.description.abstract Understanding multiword expressions (MWEs) plays an instrumental role in Natural Language Processing applications such as parsing and machine translation. MWE identification is the task of automatically detecting and classifying MWEs in running text. Because of the basic characteristics of MWEs, MWE identification faces significant challenges. Building on the recent efforts of the PARSEME network on verbal multiword expressions (VMWEs), we focus on VMWE identification. We update the PARSEME Turkish training and test corpora 1.0 (2017) to serve as the PARSEME Turkish training and development corpora 1.1 (2018), and we construct the PARSEME Turkish test corpus 1.1. In addition, we develop a multilingual VMWE identification system based on bidirectional long short-term memory networks with conditional random fields, combined with the gappy 1-level tagging scheme. To extend our study, we examine the impact of data representation format on the VMWE identification task. We introduce the bigappy-unicrossy tagging scheme to recognize overlaps in sequence labelling tasks. Our results show that data representation format is important for identifying discontinuous VMWEs. Moreover, we enhance our neural VMWE identification model with embeddings learned automatically by neural networks to address the variability challenge. We compare character-level convolutional neural networks and character-level bidirectional long short-term memory (BiLSTM) networks. We analyze two different schemes for representing morphological information using BiLSTM networks. Our results demonstrate that character embeddings and morphological embeddings improve performance in general, and that the choice of representation learning method depends on the language.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh Natural language processing (Computer science)
dc.title Identification of verbal multiword expressions using deep learning architectures and representation learning methods
dc.format.pages xv, 77 leaves ;