Abstract:
Sequential pattern mining is an important data mining task with wide applications. Traditional sequential pattern mining techniques aim to discover sequences that are frequent. The information acquired from sequential pattern mining can be used in marketing, medical record analysis, and sales analysis. The main disadvantages of traditional algorithms such as Generalized Sequential Patterns (GSP) are their lack of focus on user expectations and the large number of patterns they discover. Constraint-based data mining is an efficient solution because it restricts the discovered patterns to those satisfying a set of conditions derived from the user's requirements. Despite the importance of constraint-based sequential pattern mining, a constraint-based data mining tool that handles many user-specified constraints simultaneously does not exist in the literature. The aim of this study is to develop a constraint-based sequential pattern mining algorithm and a corresponding tool, CBSPM (Constraint Based Sequential Pattern Mining), which handles a set of user-specified constraints simultaneously. A data generator facility is also developed for generating artificial sequential datasets. A verification test was conducted, and the sensitivity of execution time to various parameters was examined. Performance tests showed that the performance of the tool improves as the number of constraints increases.
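
The general idea of constraint-based mining can be illustrated with a minimal Python sketch. This is not the CBSPM algorithm; it is a simple prefix-growth enumeration over sequences of single items, written only to show how several constraints applied together shrink the result set. The function name `mine_with_constraints`, its parameters (`min_support`, `max_length`, `required_item`), and the toy database are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Count the sequences in the database that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_with_constraints(
    database: List[Sequence[str]],
    min_support: int,
    max_length: int,
    required_item: Optional[str] = None,
) -> Dict[Pattern, int]:
    """Enumerate frequent patterns that satisfy all constraints at once.

    Hypothetical constraints used here:
      - min_support: minimum number of supporting sequences
      - max_length: maximum number of items in a pattern
      - required_item: item that must appear in a reported pattern
    """
    items = sorted({item for seq in database for item in seq})
    results: Dict[Pattern, int] = {}

    def grow(prefix: Pattern) -> None:
        for item in items:
            candidate = prefix + (item,)
            count = support(candidate, database)
            if count < min_support:
                # Support is anti-monotone: no extension of an
                # infrequent candidate can be frequent, so prune here.
                continue
            if required_item is None or required_item in candidate:
                results[candidate] = count
            if len(candidate) < max_length:
                # The length constraint stops the search early.
                grow(candidate)

    grow(())
    return results


# Toy sequence database (invented for this example).
db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "d"]]

unconstrained = mine_with_constraints(db, min_support=2, max_length=2)
constrained = mine_with_constraints(db, min_support=2, max_length=2, required_item="c")
print(len(unconstrained), len(constrained))
```

On this toy database, adding the item constraint reduces the number of reported patterns, which is the effect the user-specified constraints are meant to achieve.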