This repository contains scripts that generate synthetic datasets for Brighthive demos and storytelling across various use cases. The generated data is designed to be realistic and representative of real-world scenarios while containing no real personal or production data.
The mock data generator serves several key purposes:
- Create realistic datasets for demonstration purposes
- Support storytelling and use case presentations
- Enable testing and development without using real production data
- Provide consistent, reproducible data for demos and training
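Reproducibility in a generator usually comes down to seeding the random number source. The following is a minimal sketch of that idea, not code taken from this repository: the function name `generate_customer_ids`, the `CUST-` prefix, and the default seed are all assumptions made for illustration.

```python
import random


def generate_customer_ids(count, seed=42):
    """Return `count` pseudo-random customer IDs (hypothetical format).

    The same seed always yields the same IDs, which keeps demo data
    stable from one run to the next.
    """
    # A local Random instance avoids touching the global random state.
    rng = random.Random(seed)
    return [f"CUST-{rng.randint(10000, 99999)}" for _ in range(count)]


first_run = generate_customer_ids(5)
second_run = generate_customer_ids(5)
print(first_run == second_run)  # prints True: same seed, same data
```

Passing a different `seed` produces a different but equally repeatable dataset, which can be handy when a demo needs several distinct variants.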
- Python 3.7 or higher
- Required Python packages (install using pip):
pip install -r requirements.txt
brighthive-mock-data/
├── documentation/ # Documentation files
├── output/ # Generated data files
├── scripts/ # Data generation scripts
└── requirements.txt # Python dependencies
- Create a virtual environment:
  python -m venv venv
- Activate the virtual environment:
  source venv/bin/activate
- Install the required dependencies:
  pip install -r requirements.txt
- Run the data generation script. Here is an example for CRM data:
  python scripts/generate_crm_data.py
- The script will:
  - Create an output directory if it doesn't exist
  - Generate synthetic CRM data with realistic fields
  - Save the data as a CSV file in the output directory with the current date
  - The output file will be named crm_data_MM-DD.csv
- Verify the generated data:
  - Check the output directory for the new CSV file
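The behavior described for the CRM script (create the output directory, generate records, write a dated CSV) can be sketched roughly as follows. This is an illustrative approximation using only the standard library, not the contents of `scripts/generate_crm_data.py`: the field names, sample values, and the helpers `generate_crm_rows` and `write_crm_csv` are hypothetical.

```python
import csv
import random
from datetime import date
from pathlib import Path

# Hypothetical schema and sample values; the real script's fields may differ.
FIELDNAMES = ["contact_id", "name", "email", "stage"]
FIRST_NAMES = ["Ava", "Liam", "Noah", "Mia", "Zoe"]
LAST_NAMES = ["Nguyen", "Patel", "Garcia", "Smith", "Kim"]
STAGES = ["lead", "prospect", "customer", "churned"]


def generate_crm_rows(count, seed=None):
    """Build `count` synthetic CRM records as dictionaries."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "contact_id": i,
            "name": f"{first} {last}",
            # example.com is a reserved domain, so no real inbox is referenced.
            "email": f"{first}.{last}{i}@example.com".lower(),
            "stage": rng.choice(STAGES),
        })
    return rows


def write_crm_csv(rows, output_dir="output", today=None):
    """Write rows to <output_dir>/crm_data_MM-DD.csv and return the path."""
    today = today or date.today()
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    path = out_dir / f"crm_data_{today.strftime('%m-%d')}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Calling `write_crm_csv(generate_crm_rows(100))` would write 100 records to the output directory under the current month and day. Because the filename carries no year or time, a second run on the same day overwrites the first file.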
The repository includes generators for various types of data:
- CRM data
- Healthcare data
- Financial data
- Student data
- Web analytics data
- Health devices data
Each script generates data specific to its domain while maintaining realistic relationships and patterns.
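One way to read "realistic relationships" is that child records should always point at a parent that exists, and that related fields should vary together. The sketch below illustrates this with hypothetical accounts and contacts; the names `generate_accounts` and `generate_contacts`, the ID formats, and the sizing rule are assumptions, not the repository's actual logic.

```python
import random


def generate_accounts(count, rng):
    """Create hypothetical company accounts with a rough size attribute."""
    return [
        {"account_id": f"ACC-{i:04d}", "employees": rng.choice([10, 50, 250, 1000])}
        for i in range(1, count + 1)
    ]


def generate_contacts(accounts, rng):
    """Create contacts that each reference an existing account.

    Larger accounts receive more contacts, a simple stand-in for the
    kind of pattern a real CRM export would show.
    """
    contacts = []
    for account in accounts:
        contact_count = 1 + account["employees"] // 250
        for _ in range(contact_count):
            contacts.append({
                "contact_id": f"CON-{len(contacts) + 1:05d}",
                "account_id": account["account_id"],  # foreign key to the parent
                "role": rng.choice(["buyer", "champion", "admin"]),
            })
    return contacts


rng = random.Random(7)
accounts = generate_accounts(5, rng)
contacts = generate_contacts(accounts, rng)

# Every contact points at an account that was actually generated.
known_ids = {account["account_id"] for account in accounts}
print(all(contact["account_id"] in known_ids for contact in contacts))  # prints True
```

Generating parents first and deriving children from them guarantees referential integrity by construction, so no cleanup pass is needed afterwards.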
When adding new data generators:
- Follow the existing code structure and patterns
- Include appropriate documentation
- Use realistic data ranges and distributions
- Ensure data privacy and security
- Add your script to this README's "Available Datasets" section
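As a starting point for a new generator, the guidelines above might translate into something like the following sketch. It is an assumption-laden template rather than the repository's established pattern: the retail-orders domain, the `generate_orders` function, and the distribution parameters are invented for illustration and should be tuned to the data being modeled.

```python
import random


def clamp(value, low, high):
    """Keep a sampled value inside a plausible range."""
    return max(low, min(high, value))


def generate_orders(count, seed=0):
    """Generate hypothetical retail orders with skewed, bounded values."""
    rng = random.Random(seed)  # seeded so repeated runs match
    orders = []
    for i in range(1, count + 1):
        # Most orders are small: weight low quantities more heavily.
        quantity = rng.choices([1, 2, 3, 4, 5], weights=[50, 25, 15, 7, 3])[0]
        # Prices are right-skewed, so sample log-normally and clamp outliers.
        unit_price = round(clamp(rng.lognormvariate(3.0, 0.6), 1.0, 500.0), 2)
        orders.append({
            "order_id": f"ORD-{i:06d}",
            # Synthetic address on a reserved domain; no real person is referenced.
            "customer_email": f"customer{i}@example.com",
            "quantity": quantity,
            "unit_price": unit_price,
            "total": round(quantity * unit_price, 2),
        })
    return orders
```

Weighted choices and clamped log-normal sampling tend to look more natural than uniform random values, since real quantities and prices cluster at the low end with a long tail.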