Abstract:
Biometrics is the identification of a person by personal properties and traits, and can be divided into physiology-based and behaviour-based methods. In this thesis we investigate the identification of users of a social platform from their verbal behaviour, which is an example of behaviour-based biometrics. Online social platforms implement moderation mechanisms to filter out unwanted content and to take action against possible cases of verbal aggression and abuse, sexual harassment, and the like. Because these platforms can have large numbers of users, it is desirable to automate parts of this process. What we call chat biometrics aims to re-identify a user from chat messages. The typical application scenario is the re-identification of banned users who return under different identities, and of aggressors operating through multiple fake accounts. We propose a processing pipeline and contrast the problem with the authorship identification problem, which is well studied in the literature. We evaluate our proposed approach on a large corpus of multiparty chat records in Turkish (namely, the COPA database), which was collected from a multiplayer game environment. To test the robustness of the system across domain changes, we also introduce a new corpus collected from a well-known Turkish social platform called Ekşi Sözlük, and to show performance across languages we evaluate on Portuguese and English news datasets. We evaluate both profile-based and instance-based approaches, and provide detailed analyses of the amount of text required to identify a person reliably.