Financial Time Series Clustering

Overview

This repository contains the official implementation of "Towards Financially Inclusive Credit Products Through Financial Time Series Clustering" by Tristan Bester and Benjamin Rosman, published in AAAI W5: AI in Finance for Social Impact.

The project presents a novel time series clustering algorithm designed to help financial institutions understand consumer financial behavior through transaction data without relying on restrictive credit scoring techniques. This approach promotes financial inclusion by enabling institutions to create more tailored financial products based on actual spending behavior.

Abstract

Financial inclusion ensures that individuals have access to financial products and services that meet their needs. As a key contributing factor to economic growth and investment opportunity, financial inclusion increases consumer spending and consequently business development. It has been shown that institutions are more profitable when they provide marginalised social groups access to financial services.

Customer segmentation based on consumer transaction data is a well-known strategy used to promote financial inclusion. While the required data is available to modern institutions, the challenge remains that segment annotations are usually difficult and/or expensive to obtain. This prevents the usage of time series classification models for customer segmentation based on domain expert knowledge.

As a result, clustering is an attractive alternative to partition customers into homogeneous groups based on the spending behaviour encoded within their transaction data. In this paper, we present a solution to one of the key challenges preventing modern financial institutions from providing financially inclusive credit, savings and insurance products: the inability to understand consumer financial behaviour, and hence risk, without the introduction of restrictive conventional credit scoring techniques. We present a novel time series clustering algorithm that allows institutions to understand the financial behaviour of their customers. This enables unique product offerings to be provided based on the needs of the customer, without reliance on restrictive credit practices.

Requirements

Environment Setup

You can set up the environment using Conda:

conda env create -f environment.yml
conda activate berka

Required packages include:

- PyTorch
- NumPy
- Pandas
- scikit-learn
- MongoDB Python driver
- tqdm
- python-dotenv
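
After activating the environment, a quick import check can confirm that the packages listed above are installed. The sketch below is not part of the repository. The import names (`torch`, `sklearn`, `pymongo`, `dotenv`) are the usual module names for these packages and are assumed here, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Usual import names for the packages listed above (assumed, not taken
# from the repository's environment.yml).
REQUIRED_MODULES = ["torch", "numpy", "pandas", "sklearn", "pymongo", "tqdm", "dotenv"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```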

Database Setup

The project uses MongoDB to store configurations and results. You can run MongoDB using Docker:

docker compose up -d

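Configure your database credentials in a .env file in the repository root (the repository includes a .env.example file that can serve as a starting point):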

MONGO_USERNAME=root
MONGO_PASSWORD=rootpassword
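
As an illustration of how these two variables might be turned into a connection string, the sketch below builds a MongoDB URI from them. It is not the repository's code: the helper name `build_mongo_uri` is hypothetical, and the host `localhost` and port `27017` are assumed defaults that should be checked against docker-compose.yml.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(env=os.environ, host="localhost", port=27017):
    """Build a MongoDB connection URI from MONGO_USERNAME / MONGO_PASSWORD."""
    # quote_plus escapes characters such as '@' or ':' that would otherwise
    # break the URI syntax. Host and port are assumed defaults.
    user = quote_plus(env["MONGO_USERNAME"])
    password = quote_plus(env["MONGO_PASSWORD"])
    return f"mongodb://{user}:{password}@{host}:{port}/"


if __name__ == "__main__":
    example_env = {"MONGO_USERNAME": "root", "MONGO_PASSWORD": "rootpassword"}
    print(build_mongo_uri(example_env))
```

In a real script, calling `load_dotenv()` from python-dotenv first would populate `os.environ` from the .env file, and the resulting URI could then be passed to `pymongo.MongoClient`.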

Data

This project uses the Berka dataset, which contains banking transactions. To use the system, place the dataset files in the following structure:

data/
└── Berka/
    ├── account.csv  - Account information (4502 accounts)
    ├── card.csv     - Card details (894 cards)
    ├── client.csv   - Client information (5371 clients)
    ├── disp.csv     - Dispositions (account-client relationships)
    ├── district.csv - District/demographic data
    ├── loan.csv     - Loan information
    ├── order.csv    - Payment orders
    └── trans.csv    - Transaction data
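
Before running anything, it can help to confirm that the layout above is in place. The following sketch checks for the eight files. The helper name `missing_berka_files` is hypothetical, and the default path assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the layout above.
EXPECTED_FILES = [
    "account.csv", "card.csv", "client.csv", "disp.csv",
    "district.csv", "loan.csv", "order.csv", "trans.csv",
]


def missing_berka_files(data_dir="data/Berka"):
    """Return the expected Berka files that are absent from `data_dir`."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_berka_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All Berka files found.")
```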

Usage

1. Initialize the database with configurations

python init_db.py

This creates a database with various model configurations to evaluate.
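
One way such a set of configurations could be enumerated is a Cartesian product over architectures and pretext losses. The sketch below uses the architecture and loss names listed in the Architecture section. The dictionary keys, the latent sizes, and the function name `build_config_grid` are illustrative assumptions and do not describe what init_db.py actually does.

```python
from itertools import product

# Architecture and loss names as listed in this README; the dictionary
# keys and the latent sizes are illustrative assumptions.
ARCHITECTURES = ["FCNN", "ResNet", "LSTM", "DTC"]
PRETEXT_LOSSES = ["MSE", "multi_rec", "VAE"]
LATENT_DIMS = [8, 16]


def build_config_grid():
    """Return one configuration dict per combination of settings."""
    return [
        {"architecture": arch, "pretext_loss": loss, "latent_dim": dim}
        for arch, loss, dim in product(ARCHITECTURES, PRETEXT_LOSSES, LATENT_DIMS)
    ]


if __name__ == "__main__":
    configs = build_config_grid()
    print(len(configs), "configurations")
    print(configs[0])
```

A list of dictionaries like this could be stored in MongoDB with PyMongo's `Collection.insert_many`.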

2. Run the clustering experiments

python main.py

This will:

- Load the Berka dataset
- Process financial transactions
- Train different autoencoder architectures
- Apply clustering methods
- Evaluate clusters using metrics like Silhouette Score and Davies-Bouldin Index
- Store results in the MongoDB database
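
To make the evaluation step above concrete, the sketch below computes the two named metrics from scratch with NumPy on a toy set of points standing in for learned embeddings. It is a reference illustration under the assumption of Euclidean distance and at least two clusters, not the repository's evaluation code, and the function names are hypothetical.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters get a score of 0 by convention
        # a: mean distance to the other members of the sample's own cluster.
        a = dists[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


def davies_bouldin_index(X, labels):
    """Davies-Bouldin index: lower values indicate better separation."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(clusters)
    ])
    worst = []
    for i in range(len(clusters)):
        # Similarity of cluster i to every other cluster; keep the worst case.
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(clusters)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))


if __name__ == "__main__":
    # Two well-separated toy clusters standing in for learned embeddings.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    labels = np.array([0, 0, 1, 1])
    print("Silhouette:", mean_silhouette(X, labels))
    print("Davies-Bouldin:", davies_bouldin_index(X, labels))
```

In practice, `sklearn.metrics.silhouette_score` and `sklearn.metrics.davies_bouldin_score` from scikit-learn compute the same quantities and would normally be preferred over a hand-written version.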

Project Structure

- main.py: Main script to run experiments
- init_db.py: Script to initialize the database with configurations
- src/: Source code directory
  - datasets/: Dataset handling classes
  - models/: Neural network models
  - drivers/: Training procedures
  - factories/: Factory methods for model components
  - db/: Database interaction
  - modules/: Neural network modules
- data/: Directory for dataset files
- plots/: Directory for saved visualizations
- environment.yml: Conda environment configuration
- docker-compose.yml: Docker configuration for MongoDB

Architecture

The system implements multiple neural network architectures for financial time series clustering:

- Fully Connected Neural Networks (FCNN)
- Residual Networks (ResNet)
- Long Short-Term Memory networks (LSTM)
- Deep Temporal Clustering (DTC)

Various pretext losses are implemented:

- Mean Squared Error (MSE)
- Multi-task Reconstruction (multi_rec)
- Variational autoencoder objective (VAE)
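
For reference, the standard VAE objective combines a reconstruction term with a KL divergence between the approximate posterior N(mu, exp(log_var)) and a standard normal prior. The NumPy sketch below shows that textbook form next to plain MSE. The repository's PyTorch implementation may differ, and the function names and the `beta` weight are assumptions introduced for illustration.

```python
import numpy as np


def mse_loss(x, x_hat):
    """Mean squared reconstruction error over all elements."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), averaged over the batch."""
    mu, log_var = np.asarray(mu, dtype=float), np.asarray(log_var, dtype=float)
    # Closed form for diagonal Gaussians, summed over latent dimensions.
    per_sample = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(np.mean(per_sample))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction error plus beta-weighted KL term (beta is assumed)."""
    return mse_loss(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    x = np.array([[1.0, 2.0, 3.0]])      # one series of length 3
    x_hat = np.array([[1.0, 2.0, 2.0]])  # its reconstruction
    mu = np.zeros((1, 2))                # posterior mean (latent size 2)
    log_var = np.zeros((1, 2))           # posterior log-variance
    print(vae_loss(x, x_hat, mu, log_var))
```

With `mu` and `log_var` both zero the KL term vanishes, so the loss reduces to the reconstruction error alone.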

Citation

If you use this code in your research, please cite:

@article{bester2024towards,
  title={Towards Financially Inclusive Credit Products Through Financial Time Series Clustering},
  author={Bester, Tristan and Rosman, Benjamin},
  journal={AAAI W5: AI in Finance for Social Impact},
  year={2024},
  eprint={2402.11066},
  archivePrefix={arXiv}
}
