This repository contains the official implementation of "Towards Financially Inclusive Credit Products Through Financial Time Series Clustering" by Tristan Bester and Benjamin Rosman, published in AAAI W5: AI in Finance for Social Impact.
The project presents a novel time series clustering algorithm designed to help financial institutions understand consumer financial behavior through transaction data without relying on restrictive credit scoring techniques. This approach promotes financial inclusion by enabling institutions to create more tailored financial products based on actual spending behavior.
Financial inclusion ensures that individuals have access to financial products and services that meet their needs. As a key contributing factor to economic growth and investment opportunity, financial inclusion increases consumer spending and consequently business development. It has been shown that institutions are more profitable when they provide marginalised social groups access to financial services.
Customer segmentation based on consumer transaction data is a well-known strategy used to promote financial inclusion. While the required data is available to modern institutions, the challenge remains that segment annotations are usually difficult or expensive to obtain. Without labelled segments, institutions cannot train time series classification models that segment customers according to domain expert knowledge.
As a result, clustering is an attractive alternative to partition customers into homogeneous groups based on the spending behaviour encoded within their transaction data. In this paper, we present a solution to one of the key challenges preventing modern financial institutions from providing financially inclusive credit, savings and insurance products: the inability to understand consumer financial behaviour, and hence risk, without the introduction of restrictive conventional credit scoring techniques. We present a novel time series clustering algorithm that allows institutions to understand the financial behaviour of their customers. This enables unique product offerings to be provided based on the needs of the customer, without reliance on restrictive credit practices.
You can set up the environment using Conda:
conda env create -f environment.yml
conda activate berka

Required packages include:
- PyTorch
- NumPy
- Pandas
- scikit-learn
- MongoDB Python driver
- tqdm
- python-dotenv
The project uses MongoDB to store configurations and results. You can run MongoDB using Docker:
docker compose up -d

Configure your database credentials in a .env file:
MONGO_USERNAME=root
MONGO_PASSWORD=rootpassword
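
As a rough illustration of how these credentials might be consumed, the sketch below builds a MongoDB connection URI from the two environment variables. The helper name `build_mongo_uri`, the host and the port are assumptions made for this example and are not taken from the repository. In practice, python-dotenv's `load_dotenv()` would typically be called first so that the values in `.env` are present in the environment.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(host: str = "localhost", port: int = 27017) -> str:
    """Build a MongoDB URI from MONGO_USERNAME and MONGO_PASSWORD.

    Hypothetical helper: host and port are assumed defaults, not values
    confirmed by the repository.
    """
    username = os.environ["MONGO_USERNAME"]
    password = os.environ["MONGO_PASSWORD"]
    # Percent-encode the credentials so characters such as '@' or ':' are safe.
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"


if __name__ == "__main__":
    # Fall back to the example values from the .env file shown above.
    os.environ.setdefault("MONGO_USERNAME", "root")
    os.environ.setdefault("MONGO_PASSWORD", "rootpassword")
    print(build_mongo_uri())
```

The resulting string could then be passed to the MongoDB Python driver's client constructor. How the project itself assembles its connection may differ.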
This project uses the Berka dataset, which contains banking transactions. To use the system, place the dataset files in the following structure:
data/
└── Berka/
    ├── account.csv   - Account information (4502 accounts)
    ├── card.csv      - Card details (894 cards)
    ├── client.csv    - Client information (5371 clients)
    ├── disp.csv      - Dispositions (account-client relationships)
    ├── district.csv  - District/demographic data
    ├── loan.csv      - Loan information
    ├── order.csv     - Payment orders
    └── trans.csv     - Transaction data
Initialize the database:

python init_db.py

This creates a database with various model configurations to evaluate.

Run the experiments:

python main.py

This will:
- Load the Berka dataset
- Process financial transactions
- Train different autoencoder architectures
- Apply clustering methods
- Evaluate clusters using metrics like Silhouette Score and Davies-Bouldin Index
- Store results in the MongoDB database
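
To make the evaluation step concrete, here is a small sketch that clusters a set of embeddings and computes both metrics with scikit-learn. It is not the repository's evaluation code. The use of k-means, the function name `evaluate_clustering` and the synthetic embeddings are all assumptions chosen for illustration; in the project, the embeddings would come from a trained autoencoder.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score


def evaluate_clustering(embeddings: np.ndarray, n_clusters: int, seed: int = 0) -> dict:
    """Cluster embeddings with k-means and score the resulting partition."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(embeddings)
    return {
        # Silhouette lies in [-1, 1]; higher means better-separated clusters.
        "silhouette": float(silhouette_score(embeddings, labels)),
        # Davies-Bouldin is non-negative; lower means better-separated clusters.
        "davies_bouldin": float(davies_bouldin_score(embeddings, labels)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for encoder outputs: two well-separated Gaussian blobs.
    embeddings = np.vstack([
        rng.normal(loc=-5.0, scale=0.5, size=(50, 8)),
        rng.normal(loc=5.0, scale=0.5, size=(50, 8)),
    ])
    print(evaluate_clustering(embeddings, n_clusters=2))
```

The two metrics point in opposite directions: a good partition has a high Silhouette Score and a low Davies-Bouldin Index, so both are worth checking when comparing configurations.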

Project structure:
- main.py: Main script to run experiments
- init_db.py: Script to initialize the database with configurations
- src/: Source code directory
  - datasets/: Dataset handling classes
  - models/: Neural network models
  - drivers/: Training procedures
  - factories/: Factory methods for model components
  - db/: Database interaction
  - modules/: Neural network modules
- data/: Directory for dataset files
- plots/: Directory for saved visualizations
- environment.yml: Conda environment configuration
- docker-compose.yml: Docker configuration for MongoDB

The system implements multiple neural network architectures for financial time series clustering:
- Fully Connected Neural Networks (FCNN)
- Residual Networks (ResNet)
- Long Short-Term Memory networks (LSTM)
- Deep Temporal Clustering (DTC)
Various pretext losses are implemented:
- Mean Squared Error (MSE)
- Multi-task Reconstruction (multi_rec)
- Variational Autoencoders (VAE)
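
For readers unfamiliar with these objectives, the sketch below shows what an MSE reconstruction loss and a VAE-style loss typically compute. It is written in plain NumPy for brevity, whereas the repository uses PyTorch, and its actual loss code may differ. The function names and the `beta` weight are hypothetical, and the multi-task reconstruction loss is not covered here.

```python
import numpy as np


def mse_loss(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean squared reconstruction error over all elements."""
    return float(np.mean((x - x_hat) ** 2))


def vae_loss(x: np.ndarray, x_hat: np.ndarray, mu: np.ndarray,
             log_var: np.ndarray, beta: float = 1.0) -> float:
    """Reconstruction error plus a KL term against a standard normal prior.

    Assumes a diagonal Gaussian posterior N(mu, exp(log_var)) with arrays of
    shape (batch, latent_dim). `beta` is a hypothetical weighting knob, not
    a documented option of this project.
    """
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent
    # dimensions and then averaged over the batch.
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return mse_loss(x, x_hat) + beta * float(np.mean(kl))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 12))           # batch of 4 series, 12 steps each
    x_hat = x + 0.1 * rng.normal(size=x.shape)
    mu = rng.normal(size=(4, 2))           # 2-dimensional latent space
    log_var = np.zeros((4, 2))
    print("MSE:", mse_loss(x, x_hat))
    print("VAE:", vae_loss(x, x_hat, mu, log_var))
```

The KL term pulls the latent codes towards a standard normal distribution, which tends to give a smoother embedding space for the subsequent clustering step.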
If you use this code in your research, please cite:
@article{bester2024towards,
  title={Towards Financially Inclusive Credit Products Through Financial Time Series Clustering},
  author={Bester, Tristan and Rosman, Benjamin},
  journal={AAAI W5: AI in Finance for Social Impact},
  year={2024},
  eprint={2402.11066},
  archivePrefix={arXiv}
}