GitHub - doscsy12/synthetic_data_generation: generate synthetic data

Synthetic data generation (SYN)

Introduction

There are many techniques for generating synthetic data. Using them responsibly requires a thorough understanding of three things: how effective they are, how closely their outputs match the real data statistically, and how well they preserve data integrity. A systematic review therefore has to examine the robustness and reliability of the algorithms used. For example, Generative Adversarial Networks (GANs) show considerable promise, but they are notoriously hard to train. They are also susceptible to instability such as divergence, which puts the authenticity of their outputs at risk. Evaluating the fidelity of generated samples also remains an open challenge, because conventional metrics may fail to capture certain characteristics of real datasets.
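A common first check of statistical similarity compares each column's distribution in the real and synthetic data. The sketch below uses the two-sample Kolmogorov-Smirnov (KS) statistic for this purpose. It is an illustrative assumption, not the metric used in this repository. It also shows the limitation noted above: a per-column check says nothing about joint, temporal, or graph structure. The row format (a list of dicts) and the column name `amount` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the real and synthetic samples."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so evaluating at every pooled value finds the supremum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def column_fidelity(real_rows: List[dict], synthetic_rows: List[dict],
                    columns: Sequence[str]) -> Dict[str, float]:
    """Per-column KS statistic: 0.0 means identical marginals,
    1.0 means the two samples do not overlap at all."""
    return {
        col: ks_statistic([r[col] for r in real_rows],
                          [s[col] for s in synthetic_rows])
        for col in columns
    }
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` is 0.5. A synthetic dataset can score near 0.0 on every column and still break correlations between columns, so marginal checks like this are only a starting point.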

Aim

The aim of this systematic review is to work through these complexities. It seeks a comprehensive understanding of the efficacy and limitations of synthetic data generation techniques in the financial domain.

Systematic review

A systematic literature review was conducted to examine existing research on synthetic data generation techniques in the financial domain. The method had three steps: defining the research question, defining the article search strategy, and analysing the results after full-text screening of the selected articles. The results of this process will inform the development of our algorithms for accurately and reliably replicating financial transactions.

Evaluating Generative Models for Synthetic Financial Transaction Data

The banking sector faces challenges in applying deep learning because of data sensitivity and regulatory constraints, and generative AI may offer a solution. This study identifies effective algorithms for generating synthetic financial transaction data. It evaluates five leading models against five criteria (fidelity, synthesis quality, efficiency, privacy, and graph structure). The models are Conditional Tabular Generative Adversarial Networks (CTGAN), DoppelGANger (DGAN), Wasserstein GAN, Financial Diffusion (FinDiff), and Tabular Variational AutoEncoders (TVAE). None of the algorithms replicates the real data's graph structure, but each excels in specific areas. DGAN is ideal for privacy-sensitive tasks, and FinDiff and TVAE excel at data replication and augmentation. CTGAN strikes a balance across all five criteria, which makes it suitable for general applications with moderate privacy concerns. These findings can help decision-makers choose the right generative AI model to support the safe, effective, and scalable use of synthetic data in financial services.
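Privacy is one of the five criteria above. One widely used proxy for it is the distance to closest record (DCR): for each synthetic row, measure how far it is from its nearest real row. The sketch below is a minimal version of that idea. It assumes rows are already numerically encoded and scaled, and the `threshold` default is a hypothetical choice. It does not claim to reproduce how privacy was scored in the study.

```python
import math
from typing import List, Sequence

# Each row is assumed to be a numerically encoded, scaled record.
Row = Sequence[float]


def distance_to_closest_record(real: Sequence[Row],
                               synthetic: Sequence[Row]) -> List[float]:
    """For each synthetic row, the Euclidean distance to its nearest real row."""
    if not real:
        raise ValueError("real data must be non-empty")
    # Brute force, O(len(real) * len(synthetic)): fine for a sketch,
    # too slow for large tables (a nearest-neighbour index would be used there).
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def share_of_near_copies(real: Sequence[Row], synthetic: Sequence[Row],
                         threshold: float = 1e-6) -> float:
    """Fraction of synthetic rows lying within `threshold` of some real row.
    A high value suggests the generator may be memorising training records."""
    dcr = distance_to_closest_record(real, synthetic)
    if not dcr:
        return 0.0
    return sum(d <= threshold for d in dcr) / len(dcr)
```

With `real = [(0.0, 0.0), (3.0, 4.0)]` and `synthetic = [(0.0, 0.0), (6.0, 8.0)]`, the first synthetic row is an exact copy of a real one. `share_of_near_copies` therefore returns 0.5, which is the kind of leakage a privacy-oriented model is meant to avoid.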

Algorithmic Development and Evaluation for Replicating Temporal and Graph Patterns in Financial Data

This topic is important because financial data often exhibits complex temporal patterns and network structures, for example transactions occurring over time and relationships between accounts. Accurately replicating these aspects makes it possible to create realistic datasets that reflect the dynamic, interconnected nature of financial systems. Such datasets support the development, testing, and benchmarking of AI models under realistic conditions, including models for fraud detection, credit scoring, and risk assessment. By capturing both time-based behaviour and relational structure, these algorithms help improve model performance, robustness, and relevance in real-world financial applications.
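To check whether synthetic transactions preserve both structures described above, one can compare simple graph and temporal statistics between real and synthetic data. The sketch below compares out-degree distributions (the number of distinct counterparties per sending account) using total variation distance. It also extracts per-account inter-arrival times. The `Txn` schema and the choice of metrics are illustrative assumptions, not the repository's Temporal Graph algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple


class Txn(NamedTuple):
    # Hypothetical minimal transaction schema for illustration.
    sender: str
    receiver: str
    timestamp: float  # e.g. seconds since epoch
    amount: float


def out_degree_distribution(txns: List[Txn]) -> Dict[int, float]:
    """Fraction of sending accounts that have k distinct counterparties."""
    counterparties = defaultdict(set)
    for t in txns:
        counterparties[t.sender].add(t.receiver)
    degrees = Counter(len(c) for c in counterparties.values())
    n = sum(degrees.values())
    if n == 0:
        return {}
    return {k: v / n for k, v in degrees.items()}


def inter_arrival_times(txns: List[Txn]) -> Dict[str, List[float]]:
    """Gaps between consecutive outgoing transactions of each account."""
    times = defaultdict(list)
    for t in txns:
        times[t.sender].append(t.timestamp)
    gaps = {}
    for account, ts in times.items():
        ts.sort()
        gaps[account] = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return gaps


def total_variation(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Total variation distance between two discrete distributions
    (0.0 = identical, 1.0 = disjoint supports)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)
```

A generator can match every column's marginal distribution and still score poorly here. This happens when it spreads transactions evenly across accounts instead of reproducing hub accounts or bursty timing.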

	File	Description
1.	Algorithm development	Algorithm development
2.	Article	Article accepted at WITS2024
3.	Podcast on paper	Podcast on the paper, created with NotebookLM
4.	Friday Hacks	Talk given to NUS students
5.	Temporal Graph algorithm	Algorithm development
