Abstract:
Requirements classification is an important step in organizing system requirements, and it is widely used when handling large requirements data sets. A basic example of a requirements classification problem is distinguishing between functional and non-functional (quality) requirements. State-of-the-art classifiers are most effective when they use a large set of word features, such as text n-grams or part-of-speech n-grams. However, as the number of features increases, the approach becomes harder to interpret, because many redundant features that do not capture the meaning of the requirements have to be explored. In this study, we propose the use of more general linguistic features, such as dependency types, for constructing interpretable machine learning classifiers for requirements engineering. Through a feature engineering effort, assisted by tools that graphically interpret how classifiers work, we derive a set of linguistic features. While classifiers that use the proposed features fit the training set slightly worse than those that use high-dimensional feature sets, they generally perform better on validation data sets and are more interpretable. Using industry data sets, we perform experimental runs with several automated feature selection algorithms to explore whether our feature set can be optimized further. Although impressive results were obtained on some data sets, the automated selection algorithms did not yield a significant improvement; on average, their results were even worse than those obtained with the set based on linguistic features.
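As a minimal illustration of the contrast drawn above (a sketch, not the authors' implementation), the snippet below compares a fixed, low-dimensional dependency-type representation against word n-gram features, whose dimensionality grows with the vocabulary. The requirement sentences, their dependency parses, and the label set are hypothetical examples in the Universal Dependencies style; in practice a parser would produce them.

```python
from collections import Counter

# Hypothetical pre-parsed requirements: each token is paired with its
# dependency type (Universal Dependencies-style labels, for illustration).
parsed_requirements = [
    # "The system shall encrypt stored passwords." (functional)
    [("The", "det"), ("system", "nsubj"), ("shall", "aux"),
     ("encrypt", "ROOT"), ("stored", "amod"), ("passwords", "obj")],
    # "The interface must be easy to use." (non-functional / quality)
    [("The", "det"), ("interface", "nsubj"), ("must", "aux"),
     ("be", "cop"), ("easy", "ROOT"), ("to", "mark"), ("use", "xcomp")],
]

# Fixed, low-dimensional feature space: one count per dependency type.
DEP_TYPES = ["det", "nsubj", "aux", "ROOT", "amod", "obj", "cop", "mark", "xcomp"]

def dep_type_features(parsed):
    """Count dependency types, ignoring the words themselves."""
    counts = Counter(dep for _word, dep in parsed)
    return [counts[d] for d in DEP_TYPES]

def word_bigram_features(parsed):
    """For contrast: word bigrams, whose feature space grows with the vocabulary."""
    words = [word for word, _dep in parsed]
    return list(zip(words, words[1:]))

for parsed in parsed_requirements:
    print(dep_type_features(parsed))
```

The dependency-type representation stays at `len(DEP_TYPES)` dimensions regardless of vocabulary size, which is what keeps a classifier built on it inspectable; the bigram representation adds a new dimension for every unseen word pair.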