The solution provided for this client used machine learning to categorise transactions and fed the categorised data into consumer credit affordability models.
Our team used a hybrid agile delivery model: a single collaborative scrum team, coupled with project management oversight for internal and external reporting and governance. The transaction categorisation used artificial intelligence to deliver the most effective solution for the credit rating agency.
The training data comprised more than 1 million three-month bank statements from anonymised personal bank accounts. We used structured data, including bank SQL database queries, and unstructured data, including Excel spreadsheets, PDF bank statements and paper copies of statements. Optical Character Recognition (OCR) was used to convert the statements into digital form.
One of our biggest challenges was to simplify the 50,000+ merchant codes that banks use to classify transactions, so that affordability calculations could be automated. In doing this, we found that many of the codes were in fact duplicates that had been described differently. To solve this problem we evaluated a range of machine-learning classifiers to find the best fit: k-nearest neighbours (k-NN), Logistic Regression, Decision Trees, Random Forest, Gradient Boosting Classifiers and Support Vector Machines. We then also applied deep-learning and Bayesian classifiers (Naive Bayes models).
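As an illustration of the Bayesian classifier mentioned above, the following is a minimal sketch of a multinomial Naive Bayes categoriser written in plain Python. It is not the project's implementation: the class name `NaiveBayesCategoriser`, the example descriptions, the merchant names and the three categories are all hypothetical, and a production system would more likely use an established machine-learning library and far more training data.

```python
import math
from collections import Counter, defaultdict


def tokenize(description):
    """Lower-case a transaction description and split it on whitespace."""
    return description.lower().split()


class NaiveBayesCategoriser:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength
        self.class_counts = Counter()            # training examples per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocabulary = set()

    def fit(self, descriptions, labels):
        for text, label in zip(descriptions, labels):
            self.class_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def log_scores(self, description):
        """Return the unnormalised log posterior for each category."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(description):
                if token not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                numerator = self.word_counts[label][token] + self.alpha
                denominator = total_words + self.alpha * vocab_size
                score += math.log(numerator / denominator)
            scores[label] = score
        return scores

    def predict(self, description):
        scores = self.log_scores(description)
        return max(scores, key=scores.get)


# Hypothetical labelled descriptions; merchant names and categories are invented.
TRAINING = [
    ("ACME SUPERMARKET 3217", "groceries"),
    ("ACME SUPERMKT EXPRESS", "groceries"),
    ("GREENS GROCER LTD", "groceries"),
    ("CITY ENERGY DIRECT DEBIT", "utilities"),
    ("NORTH WATER DIRECT DEBIT", "utilities"),
    ("METRO RAIL TICKET", "transport"),
    ("CITY BUS TRAVEL", "transport"),
]

model = NaiveBayesCategoriser().fit(
    [text for text, _ in TRAINING],
    [label for _, label in TRAINING],
)

if __name__ == "__main__":
    for text in ["ACME EXPRESS STORE", "WATER DIRECT DEBIT", "RAIL TICKET"]:
        print(text, "->", model.predict(text))
```

In this sketch, two differently written descriptions of the same merchant ("ACME SUPERMARKET" and "ACME SUPERMKT") end up in the same category because they share tokens, which is the kind of consolidation the merchant-code simplification required.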
We also used a human-in-the-loop (HITL) approach, which draws on both human and machine intelligence to create machine-learning models. Annotation was initially carried out through a competition within the organisation, in which over one thousand staff each annotated five three-month bank statements. We also implemented reinforcement learning through feedback loops, with data scientists providing feedback during testing and in live operation.
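The human-in-the-loop idea can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than the process actually used: it assumes that several annotators may label the same transaction, that labels are combined by majority vote, and that model predictions below a confidence threshold are queued for human review. The function names and both thresholds (`min_agreement`, `threshold`) are hypothetical.

```python
from collections import Counter


def aggregate_annotations(annotations, min_agreement=0.6):
    """Combine several annotators' labels for one transaction by majority vote.

    Returns (label, agreement). The label is None when the share of annotators
    backing the most common label is below min_agreement, which signals that
    the transaction needs further review.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    label, votes = Counter(annotations).most_common(1)[0]
    agreement = votes / len(annotations)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement


def route_prediction(label, confidence, threshold=0.8):
    """Accept confident model predictions; queue the rest for a human reviewer."""
    if confidence >= threshold:
        return ("auto", label)
    return ("review", label)


if __name__ == "__main__":
    print(aggregate_annotations(["groceries", "groceries", "transport"]))
    print(aggregate_annotations(["groceries", "transport"]))
    print(route_prediction("utilities", 0.95))
    print(route_prediction("utilities", 0.50))
```

Labels that reviewers correct in the "review" queue can then be added back to the training set, which is one simple way to realise a feedback loop of the kind described above.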