Combining machine learning and scorecards to assess credit risk

Enterprise Applications Editor

This is a guest blogpost by Shafi Rahman, principal scientist, FICO

Last year, artificial intelligence (AI) generated countless news headlines and ideas that fascinate us. Its predictive ability has been called on in a range of industries, especially in the financial sector.

However, the use of AI and machine learning in retail banking poses a special challenge. There are numerous regulations requiring that lenders be able to explain their analytic models, not only to regulators but often to consumers. AI is frequently a “black box” approach, so how do you tell a customer why an AI-developed algorithm gave them a certain risk score, for example?

Our work at FICO has focused on bridging that gap. This example shows what is possible if you want to blend the best of traditional analytics approaches with AI and machine learning.

Limitations of traditional credit risk models

A traditional credit risk scorecard model relies on inputs of various customer characteristics to generate a score reflecting the probability of default. These factors are put into different value ranges, with each “bin” being assigned a score weight. The score weights corresponding to the individual’s information are then added up to produce the final score.
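To make the mechanics concrete, here is a minimal Python sketch of how a scorecard turns binned characteristics into a score. The characteristics, bin boundaries, score weights and base score are all hypothetical values chosen for illustration, not taken from any real FICO scorecard.

```python
import bisect

# Hypothetical scorecard: for each characteristic, the bin boundaries and
# one score weight per bin (len(weights) == len(edges) + 1).
# A value equal to a boundary falls into the higher bin.
SCORECARD = {
    "age": {"edges": [25, 40, 60], "weights": [-20, 5, 15, 25]},
    "utilisation": {"edges": [0.3, 0.7], "weights": [30, 0, -35]},
}
BASE_SCORE = 600  # hypothetical starting score


def bin_index(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect.bisect_right(edges, value)


def score(applicant):
    """Add up the score weights of the bins the applicant falls into."""
    total = BASE_SCORE
    for name, spec in SCORECARD.items():
        total += spec["weights"][bin_index(applicant[name], spec["edges"])]
    return total


def reasons(applicant):
    """List each characteristic with the weight it contributed, lowest first."""
    parts = [
        (name, spec["weights"][bin_index(applicant[name], spec["edges"])])
        for name, spec in SCORECARD.items()
    ]
    return sorted(parts, key=lambda part: part[1])
```

Because the score is a plain sum, a function like reasons() can show which characteristics pulled an individual's score down the most, which is the kind of explanation a lender can pass on to a consumer.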

The score weight of each bin is derived from its weight of evidence (WoE), which measures the separation between known good cases and known bad cases in that bin. A WoE of 0 means that the bin has the same distribution of good and bad cases as the overall population; the further the WoE is from 0, the more the bin is concentrated in one type of case relative to the overall population. A scorecard generally has a few bins with a smooth distribution of WoE.
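As a rough illustration, WoE can be computed from per-bin counts of good and bad cases. The sketch below assumes the commonly used definition, the natural log of a bin's share of all good cases divided by its share of all bad cases. That formula and its sign convention are assumptions made here for illustration, and the counts are hypothetical.

```python
import math


def weight_of_evidence(goods, bads):
    """Return one WoE value per bin from per-bin good and bad counts.

    Assumed definition: WoE_i = ln((good_i / total_good) / (bad_i / total_bad)).
    Every bin must contain at least one good and one bad case.
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    return [
        math.log((g / total_good) / (b / total_bad))
        for g, b in zip(goods, bads)
    ]


# Hypothetical counts for three bins of one characteristic.
goods = [100, 300, 600]
bads = [30, 30, 40]
woe = weight_of_evidence(goods, bads)
```

With these counts the middle bin holds 30% of the good cases and 30% of the bad cases, so its WoE is 0. The first bin is relatively heavy in bad cases and comes out negative, while the third is relatively heavy in good cases and comes out positive.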

However, these analytics-driven scorecards cannot function effectively without a sufficient number of known good and known bad cases. When data is limited, the WoE distribution across bins becomes noisy and choppy, leading to weak-performing scorecard models.
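A small numerical sketch shows why sparse data makes WoE noisy. It assumes the commonly used log-ratio definition of WoE (an assumption, as is every count below) and measures how far one bin's WoE moves when a single extra bad case lands in it.

```python
import math


def bin_woe(good, bad, total_good, total_bad):
    """Assumed WoE of one bin: ln((good / total_good) / (bad / total_bad))."""
    return math.log((good / total_good) / (bad / total_bad))


def woe_shift_from_one_extra_bad(good, bad, total_good, total_bad):
    """How far the bin's WoE moves if one more bad case lands in it."""
    before = bin_woe(good, bad, total_good, total_bad)
    after = bin_woe(good, bad + 1, total_good, total_bad + 1)
    return after - before


# Hypothetical portfolios with the same bad rate but very different sizes.
sparse_shift = woe_shift_from_one_extra_bad(
    good=40, bad=2, total_good=400, total_bad=20
)
dense_shift = woe_shift_from_one_extra_bad(
    good=4000, bad=200, total_good=40000, total_bad=2000
)
```

In the sparse portfolio a single additional bad case moves the bin's WoE by roughly 0.36, whereas in the portfolio that is 100 times larger the same event moves it by less than 0.005. With few cases, chance alone can produce the choppy pattern described above.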

A machine learning alternative

One machine learning alternative to the scorecard model is an algorithm called Tree Ensemble Modelling (TEM). TEM involves building multiple “tree” models, where each node of the tree is a variable which is split into two further sub-trees.

Each tree model uses just a handful of characteristics as input, which produces a shallow tree and ensures a limited splitting of variables. With TEM, the minimum number of good and bad cases can be met more frequently, thus solving a key problem of the scorecard approach.
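The structure of such an ensemble can be sketched in a few lines of Python. The trees, variables, thresholds and leaf values below are hypothetical, and combining the trees by summing their leaf values is an assumption made for illustration, since how the trees are trained and combined is not specified here.

```python
# Each hypothetical tree is a nested dict: an internal node splits one
# variable at a threshold into two sub-trees; a leaf holds a score contribution.
TREES = [
    {
        "var": "utilisation", "threshold": 0.7,
        "left": {"leaf": 12.0},
        "right": {
            "var": "age", "threshold": 30,
            "left": {"leaf": -25.0},
            "right": {"leaf": -8.0},
        },
    },
    {
        "var": "months_on_book", "threshold": 24,
        "left": {"leaf": -10.0},
        "right": {"leaf": 6.0},
    },
]


def tree_value(node, applicant):
    """Walk one tree from the root to a leaf and return the leaf value."""
    while "leaf" not in node:
        go_left = applicant[node["var"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def ensemble_score(applicant, trees=TREES):
    """Combine the trees by summing their leaf values (an assumption)."""
    return sum(tree_value(tree, applicant) for tree in trees)
```

Even in this toy version, the contribution of a variable such as age depends on which branch of which tree an applicant follows. Multiply that across thousands of trees and it becomes hard to give a simple reason for any one score.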

However, unlike a scorecard approach, TEM cannot point to the reasons for giving someone a particular score. This lack of explainability is a big limitation of a purely machine learning approach, given that TEM models can have thousands of trees and tens of thousands of parameters with no simple interpretation.

Although the machine learning model is not practical for use on its own, a comparison of the two showed that the machine learning score outperformed the scorecard. The next challenge was to narrow the performance gap between the machine learning and scorecard models.

A hybrid approach

At FICO, we wanted to merge the practical benefits of a scorecard – explainability, the ability to input domain knowledge, and ease of execution in a production environment – with the deep insights of machine learning and AI, which can uncover patterns scorecard approaches cannot.

To do this, we developed a tool that recodes the patterns and insights discovered using machine learning or AI and turns them into a set of scorecards. Instead of directly computing the WoE from good and bad data points, the tool tries to match the score distribution generated by a machine learning algorithm like TEM, which ends up providing an estimate of the WoE for each bin.
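The tool itself is described only at a high level, so the following Python sketch is a stand-in for the general idea, not FICO's method. It assumes the fitting target is the machine learning score of each case and estimates one additive weight per bin by repeatedly setting each weight to the mean leftover score in its bin (a simple least-squares backfitting loop). All names and data are hypothetical.

```python
def scorecard_score(row, intercept, weights):
    """Score one case: intercept plus the weight of each bin it falls into."""
    return intercept + sum(weights[j][b] for j, b in enumerate(row))


def fit_scorecard_to_scores(binned_rows, ml_scores, n_bins, sweeps=50):
    """Fit additive bin weights so their sum approximates the ML score.

    binned_rows: one tuple of bin indices per case, one index per characteristic.
    ml_scores:   the machine learning score of each case (the fitting target).
    n_bins:      number of bins for each characteristic.
    Returns (intercept, weights) with weights[j][b] for characteristic j, bin b.
    """
    intercept = sum(ml_scores) / len(ml_scores)
    weights = [[0.0] * k for k in n_bins]
    for _ in range(sweeps):
        for j, k in enumerate(n_bins):
            sums, counts = [0.0] * k, [0] * k
            for row, target in zip(binned_rows, ml_scores):
                # Score of this case with characteristic j's weight left out.
                partial = scorecard_score(row, intercept, weights) - weights[j][row[j]]
                sums[row[j]] += target - partial
                counts[row[j]] += 1
            # Each bin weight becomes the mean leftover score of its cases.
            weights[j] = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return intercept, weights


# Hypothetical data: two characteristics with two bins each, and the
# machine learning score assigned to each of four cases.
binned_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
ml_scores = [570.0, 610.0, 590.0, 630.0]
intercept, weights = fit_scorecard_to_scores(binned_rows, ml_scores, n_bins=[2, 2])
```

The point of the design is that the weights are fitted to scores, which exist for every case, and not to counts of good and bad outcomes, which may be too sparse in some bins. The result is still an ordinary additive scorecard that can be read bin by bin.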

Significantly, this hybrid model is almost as predictive as the machine learning one and, we think, overcomes the limitations imposed by an insufficient number of cases. Whereas it was previously considered impossible to build powerful scorecards with sparse cases, our approach now allows us to do so while remaining transparent.

Machine learning can expose powerful and predictive latent features that can be directly incorporated into a scorecard model to preserve transparency while improving prediction – a function that is not limited to credit risk modelling.