Raw2DataBase loads raw CSV data into a PostgreSQL database, using Docker for deployment and Metabase for data visualization. The project provides a framework for managing database connections, processing CSV data, and integrating with Metabase for data analysis and reporting.
- Database Connection Handler
  - Generic and extensible to support multiple database types (PostgreSQL, MySQL, MongoDB, etc.).
  - Handles CSV processing using pandas, converting files into dataframes for database insertion.
- Main Application Logic
  - Script to receive configuration and raw data paths.
  - Manages database connection setup, data processing, and data insertion.
- Tests
  - Tests for each feature to ensure correct functionality and reliability.
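The main-script responsibilities listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual `src/main.py`: the flag names are taken from the run command later in this README and the required keys from the sample `postgres_config.json`, while `parse_args`, `validate_config`, `load_config`, and the `postgres` default for `--db_type` are assumptions made for the example.

```python
import argparse
import json
from pathlib import Path

# Keys copied from the sample postgres_config.json shown in this README.
REQUIRED_KEYS = ("db_type", "DB_NAME", "DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD")


def parse_args(argv=None):
    """Parse the three flags used in the README's run command."""
    parser = argparse.ArgumentParser(description="Load raw CSV data into a database.")
    parser.add_argument("--raw_files_path", required=True, type=Path)
    parser.add_argument("--config_file", required=True, type=Path)
    # Assumed default; the README always passes this flag explicitly.
    parser.add_argument("--db_type", default="postgres")
    return parser.parse_args(argv)


def validate_config(config):
    """Fail early if a required key is missing, and normalise the port to int."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing config keys: {', '.join(missing)}")
    return {**config, "DB_PORT": int(config["DB_PORT"])}


def load_config(config_file):
    """Read the JSON config file and validate its contents."""
    raw = json.loads(Path(config_file).read_text(encoding="utf-8"))
    return validate_config(raw)


def main(argv=None):
    args = parse_args(argv)
    config = load_config(args.config_file)
    # Hypothetical next steps: open the connection, process the CSV, insert the rows.
    print(f"Would load {args.raw_files_path} into {config['DB_NAME']} ({args.db_type})")
    return config
```

Validating the configuration before opening a connection means a missing key or a non-numeric port is reported before any data is processed.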
- Database Connection Handler: A flexible, extensible handler for connecting to various databases.
- CSV Processing: Efficient CSV data processing using pandas, converting raw data into SQL-like objects for database insertion.
- Dockerized Environment: Easy setup and deployment using Docker and Docker Compose.
- Data Visualization: Integration with Metabase for creating and sharing interactive dashboards and reports.
- Test Coverage: Comprehensive tests using Pytest to ensure the reliability and correctness of each component.
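One way an extensible connection handler like the one described above might be structured is a small abstract interface plus a registry keyed by `db_type`. The sketch below is an assumption-laden illustration rather than the project's code: `DatabaseHandler`, `SQLiteHandler`, `HANDLERS`, `get_handler`, and `read_csv_rows` are hypothetical names, SQLite stands in for PostgreSQL so the example runs without a server, and the standard-library `csv` module stands in for pandas.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod


class DatabaseHandler(ABC):
    """Hypothetical common interface that every backend would implement."""

    @abstractmethod
    def connect(self):
        ...

    @abstractmethod
    def insert_rows(self, table, columns, rows):
        ...


class SQLiteHandler(DatabaseHandler):
    """Stand-in backend so the sketch runs without a PostgreSQL server."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None

    def connect(self):
        self.conn = sqlite3.connect(self.path)
        return self.conn

    def insert_rows(self, table, columns, rows):
        cols = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        self.conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        self.conn.executemany(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})', rows)
        self.conn.commit()
        return len(rows)


# Registry mapping a db_type string to a handler class; new backends register here.
HANDLERS = {"sqlite": SQLiteHandler}


def get_handler(db_type, **kwargs):
    """Look up and instantiate the handler registered for db_type."""
    if db_type not in HANDLERS:
        raise ValueError(f"Unsupported db_type: {db_type}")
    return HANDLERS[db_type](**kwargs)


def read_csv_rows(text):
    """Split CSV text into a header list and a list of row tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, [tuple(row) for row in reader]
```

Under these assumptions, supporting another database would mean adding one subclass and one registry entry, while the calling code keeps using `get_handler(db_type)` unchanged.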
raw2database/
├── docker/
│   ├── .env
│   └── docker-compose.yml
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   └── data_processor.py
│   └── database/
│       ├── __init__.py
│       ├── database.py
│       └── database_loader.py
├── tests/
│   ├── __init__.py
│   ├── test_database.py
│   ├── test_data_loader.py
│   └── test_data_processor.py
├── config/
│   └── your_db_config.json
├── data/
│   ├── raw/
│   ├── processed
│   ├── interim
│   └── external
├── requirements.txt
├── README.md
└── .gitignore
- Clone the Repository
git clone https://github.com/JMasr/raw2database.git
- Navigate to the Project Directory
cd raw2database
- Create and Activate the Conda Environment
conda create -n raw2database python=3.9
conda activate raw2database
- Install Requirements
pip install -r requirements.txt
- Navigate to the Docker Folder
cd docker
- Configure the .env File with your Credentials
cat <<EOL > .env
POSTGRES_USER=<user_postgres>
POSTGRES_PASSWORD=<pass_postgres>
POSTGRES_DB=<ps_db_name>
PGADMIN_DEFAULT_EMAIL=<[email protected]>
PGADMIN_DEFAULT_PASSWORD=<pass_ui-admin_tool>
EOL
- Run Docker Compose
docker-compose up -d
- Return to the Project Root and Create a Configuration Folder
cd ..
mkdir config
- Configure the Database
Edit the config/postgres_config.json file to set the database connection details:
cd config
cat <<EOL > postgres_config.json
{
"db_type": "postgres",
"DB_NAME": "<ps_db_name>",
"DB_HOST": "<host>",
"DB_PORT": <port>,
"DB_USER": "<user_postgres>",
"DB_PASSWORD": "<pass_postgres>"
}
EOL
- Running the Application
To load data from a CSV file into the database, run the following from the project root:
python src/main.py --raw_files_path <path/to/your_data.csv> --config_file config/postgres_config.json --db_type postgres
Contributions are welcome! Please fork the repository and create a pull request with your improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or issues, please open an issue in the repository or contact the maintainer at [email protected]