Mondrian Anonymization Project

This project implements the Mondrian algorithm to anonymize a dataset, protecting sensitive information while retaining the utility of the data. It supports both numerical and categorical columns; categorical columns are handled through JSON files. A random perturbation technique is applied to sensitive data to reduce the risk of re-identification. The project also includes a script that generates a synthetic dataset for testing the anonymization process.

Overview

The project consists of:

Data Generation: A script generates a synthetic dataset containing personal information such as name, gender, city, age, profession, education, and annual income. The attribute distributions are taken from ISTAT, the Italian National Institute of Statistics.
Anonymization: The Mondrian algorithm anonymizes the dataset by generalizing the quasi-identifiers while preserving data utility and the sensitive attributes.
Random Perturbation: To reduce the risk of re-identification, a script applies random perturbation to the sensitive data while preserving the mean and variance of each sensitive attribute, so that utility is maintained.
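
As an illustration of the random perturbation step, the sketch below adds Gaussian noise to a numerical sensitive attribute and then rescales the result so that its sample mean and variance match the original values. This is only one possible way to meet that goal, not necessarily the approach used in this repository; the function name `perturb_preserving_moments` and the `noise_scale` parameter are assumptions made for the example.

```python
import numpy as np

def perturb_preserving_moments(values, noise_scale=0.1, seed=None):
    """Perturb values with noise while keeping the sample mean and variance.

    Hypothetical helper: noise_scale is the noise standard deviation
    expressed as a fraction of the standard deviation of the data.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    noisy = x + rng.normal(0.0, noise_scale * x.std(), size=x.shape)
    noisy_std = noisy.std()
    if noisy_std == 0.0:
        # Constant column: there is nothing to rescale.
        return x.copy()
    # Standardize the noisy values, then map them back onto the
    # original mean and standard deviation.
    return (noisy - noisy.mean()) / noisy_std * x.std() + x.mean()

incomes = [21000.0, 25500.0, 30000.0, 41000.0, 58000.0]
perturbed = perturb_preserving_moments(incomes, noise_scale=0.2, seed=0)
```

Individual values change, while `perturbed.mean()` and `perturbed.var()` agree with the originals up to floating-point error.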

Instructions

The requirements for using the project are:

Faker
Pandas
Numpy
Matplotlib

Clone the repository:

git clone https://github.com/Alebasso01/K-anonymity.git

Install the requirements:

pip install faker pandas numpy matplotlib

Run the Mondrian script:

python mondrian.py

Parameters

database: The input dataset.
k: The k-anonymity parameter.
qis: List of quasi-identifiers (both numerical and categorical).
sd: List of sensitive attributes.
ei: List of explicit identifiers to be dropped.
json_files: Dictionary mapping each categorical quasi-identifier to the path of the JSON file used for its lowest common ancestor (LCA) generalization.
statistic: Method for numerical quasi-identifier generalization ('range' or 'mean').
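
To show how `k`, `qis`, and `statistic` interact, here is a minimal sketch of Mondrian partitioning restricted to numerical quasi-identifiers. It is not the code in `mondrian.py`: the function names `mondrian_partitions` and `generalize` are hypothetical, categorical columns and their JSON hierarchies are left out, and the split dimension is chosen by raw value range rather than a normalized one.

```python
import pandas as pd

def mondrian_partitions(df, qis, k):
    """Split recursively at the median while both halves keep at least k rows."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Try the quasi-identifier with the widest value range first.
    spans = {q: df[q].max() - df[q].min() for q in qis}
    for q in sorted(spans, key=spans.get, reverse=True):
        median = df[q].median()
        left, right = df[df[q] <= median], df[df[q] > median]
        if len(left) >= k and len(right) >= k:
            return mondrian_partitions(left, qis, k) + mondrian_partitions(right, qis, k)
    # No allowable cut: this partition becomes one equivalence class.
    return [df]

def generalize(df, qis, k, statistic="range"):
    """Replace each quasi-identifier by its partition's range or mean."""
    # Object dtype lets a column hold either range strings or means.
    out = df.astype({q: object for q in qis})
    for part in mondrian_partitions(df, qis, k):
        for q in qis:
            if statistic == "range":
                value = f"[{part[q].min()} - {part[q].max()}]"
            else:
                value = part[q].mean()
            out.loc[part.index, q] = value
    return out

data = pd.DataFrame({
    "age": [23, 25, 31, 35, 42, 47, 53, 60],
    "zip": [10, 10, 20, 20, 30, 30, 40, 40],
    "income": [21, 25, 30, 41, 58, 33, 27, 45],  # sensitive, left untouched
})
anonymized = generalize(data, qis=["age", "zip"], k=2, statistic="range")
```

With `statistic="range"` every record in a partition shares the same interval for each quasi-identifier; with `statistic="mean"` it shares the partition mean instead. In both cases each combination of generalized quasi-identifier values appears in at least `k` rows.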

Test

To evaluate the project, we analyzed both the mean and the variance of the attributes. Mean and variance are computed directly for numerical attributes, while categorical attributes are first mapped to numbers according to their hierarchical ordering, where feasible. We also implemented a discernibility penalty function, which quantifies how distinguishable records remain on their quasi-identifiers, and a function that calculates the equivalence size metric, which provides an additional measure of the quality of the k-anonymization.
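
For reference, the two quality metrics can be sketched as follows. The definitions used here are the common ones from the k-anonymity literature and are assumptions about what the project computes: the discernibility penalty is taken as the sum of squared equivalence class sizes (with no suppressed records), and the equivalence size metric is read as the normalized average equivalence class size. The function names are hypothetical.

```python
import pandas as pd

def discernibility_penalty(df, qis):
    """Sum of squared equivalence class sizes; lower means less information loss."""
    sizes = df.groupby(qis).size()
    return int((sizes ** 2).sum())

def average_equivalence_class_size(df, qis, k):
    """Average class size divided by k; 1.0 means every class has exactly k rows."""
    sizes = df.groupby(qis).size()
    return len(df) / (len(sizes) * k)

anonymized = pd.DataFrame({
    "age": ["[20 - 30]"] * 2 + ["[30 - 40]"] * 3 + ["[40 - 50]"] * 2,
    "income": [21, 25, 30, 41, 58, 33, 27],
})
dm = discernibility_penalty(anonymized, ["age"])                 # 2**2 + 3**2 + 2**2 = 17
c_avg = average_equivalence_class_size(anonymized, ["age"], k=2)  # 7 / (3 * 2)
```

Both metrics are computed only from the sizes of the groups that share the same generalized quasi-identifier values, so they can be applied to any anonymized table regardless of how it was produced.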
