Abstract:
As the importance of processing data to extract added value continues to grow, financial institutions, which can collect customer data from a variety of sources, are investing more and more resources in moving towards data-based decision making. One area where data have been used extensively in recent years, with tremendous benefits, is credit scoring for lending institutions. With credit scoring models, these institutions not only automate their application evaluation processes but also manage them in a way that is more objective, more reliable and in accordance with regulations. In this thesis, a study was conducted using logistic regression, one of the most frequently used prediction techniques in credit scoring systems. The study rests on three pillars. Firstly, a data set of consumer loan records was prepared for the specified model development period using credit bureau data, through which member financial institutions share the credit histories and payment behaviors of their customers. Secondly, after it was determined that the prepared data set can represent the whole sector, a random sample of records was selected, and it was shown that such a study can be carried out with open source programs such as R, without relying on popular data analytics and statistical packages whose licenses, installation and maintenance require large budgets. In addition to visual and numerical analyses, the algorithms and functions needed in the model development process were examined extensively. Thirdly, the automatic calculation of interaction terms using the Weight of Evidence (WOE) method is discussed, and the so-called Neighbours' approach is proposed to calculate WOE values for covariate patterns that are not observed in the development data set. In short, this study elaborates on how to develop an end-to-end credit scoring model using data that can portray the whole credit sector in a given period.
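
For readers unfamiliar with WOE, the following is a minimal sketch of the commonly used definition, in which the WOE of a category is the natural logarithm of its share of non-defaulted ("good") records divided by its share of defaulted ("bad") records. It is written in Python purely for illustration; the function name `weight_of_evidence`, the 0/1 default flag, and the sign convention are assumptions and are not taken from the thesis, which states that the study was carried out in R. The sketch does not implement the Neighbours' approach: categories with no good or no bad records are simply left undefined, which is the kind of gap that approach is proposed to address.

```python
import math
from collections import Counter


def weight_of_evidence(categories, defaults):
    """Return {category: WOE}, with WOE = ln(share of goods / share of bads).

    `categories` holds one categorical value per record; `defaults` holds
    1 for a defaulted ("bad") record and 0 for a "good" one. Both names are
    hypothetical and chosen only for this illustration.
    """
    goods, bads = Counter(), Counter()
    for category, is_bad in zip(categories, defaults):
        if is_bad:
            bads[category] += 1
        else:
            goods[category] += 1

    total_good = sum(goods.values())
    total_bad = sum(bads.values())

    woe = {}
    for category in set(goods) | set(bads):
        if goods[category] == 0 or bads[category] == 0:
            # WOE is undefined when a category has no goods or no bads;
            # this sketch leaves it as None rather than imputing a value.
            woe[category] = None
            continue
        good_share = goods[category] / total_good
        bad_share = bads[category] / total_bad
        woe[category] = math.log(good_share / bad_share)
    return woe
```

Under this sign convention, a positive WOE marks a category in which good records are over-represented relative to bad ones, and a negative WOE marks the opposite.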