Spotify Data Retrieval is a Python ETL pipeline that extracts song data from Spotify's API using OAuth 2.0, focusing on the user's last 24 hours. It transforms and loads the data into a PostgreSQL database, with daily automation via CRON. The project features a Streamlit interface for visualization and a FastAPI server for data access.

DishenMakwana/spotify-data-retrieval

Spotify ETL Pipeline: Extract, Transform, and Load Data into PostgreSQL

Overview

This repository implements an ETL (Extract, Transform, Load) pipeline that retrieves song data from Spotify's API based on the user's last 24 hours of activity. The data is transformed and loaded into a PostgreSQL database for storage and further analysis. The process is automated to run daily using CRON.
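The transform step in the middle of this pipeline can be sketched as follows. This is an illustrative helper, not the repository's actual code: the field names follow Spotify's "Get Recently Played Tracks" response shape, but the function and column names are assumptions.

```python
def transform_recently_played(response: dict) -> list[dict]:
    """Flatten Spotify's 'recently played' JSON into rows for loading.

    Field names follow Spotify's Get Recently Played Tracks response;
    this helper itself is illustrative, not part of the repository.
    """
    rows = []
    for item in response.get("items", []):
        track = item["track"]
        rows.append({
            "song_name": track["name"],
            "artist_name": track["artists"][0]["name"],
            "played_at": item["played_at"],
            "played_date": item["played_at"][:10],  # YYYY-MM-DD, for daily grouping
        })
    return rows

# Minimal example payload in the shape Spotify returns:
sample = {
    "items": [{
        "track": {"name": "Song A", "artists": [{"name": "Artist A"}]},
        "played_at": "2024-01-01T12:34:56.000Z",
    }]
}
```

Keeping the transform as a pure function of the API response makes it easy to unit-test without network or database access.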

Technologies Used:

  • Spotify API (OAuth 2.0 Authentication)
  • Python (for data extraction, transformation, and loading)
  • PostgreSQL (for database storage)
  • CRON (for scheduling automation)

Requirements

1. Developer Setup on Spotify API

To use the Spotify API, you must first set up a Spotify Developer account and create an application to get your client_id and client_secret.

  1. Go to the Spotify Developer Dashboard (https://developer.spotify.com/dashboard).
  2. Log in with your Spotify account and click on "Create an App".
  3. Provide necessary information (App Name, Description, etc.).
  4. Once the app is created, you’ll have access to:
    • Client ID
    • Client Secret
    • Redirect URI (You’ll need to set up one if it’s not automatically provided)
  5. Save these credentials for later use in the .env file.
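These credentials are used to obtain access tokens from Spotify's token endpoint. As a sketch of how the refresh-token exchange works (the endpoint URL and Basic-auth scheme are Spotify's; the helper function itself is hypothetical):

```python
import base64

TOKEN_URL = "https://accounts.spotify.com/api/token"  # Spotify's token endpoint

def build_refresh_request(client_id: str, client_secret: str, refresh_token: str):
    """Build the pieces of a token-refresh POST (illustrative helper).

    Spotify expects HTTP Basic auth with base64(client_id:client_secret)
    and a form body with grant_type=refresh_token.
    """
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    data = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    return TOKEN_URL, headers, data

# POSTing these pieces (e.g. with requests.post(url, headers=headers, data=data))
# returns a JSON body containing a short-lived "access_token".
```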

2. Installing Requirements

Before running the ETL pipeline, you need to install the necessary dependencies. Follow these steps:

Clone the repository:

git clone https://github.com/DishenMakwana/spotify-data-retrieval.git
cd spotify-data-retrieval

Create a virtual environment:

python3 -m venv env

source env/bin/activate

Install the dependencies:

pip install -r requirements.txt

3. Setup Environment Variables

To securely store your credentials and other configuration variables, create a .env file based on the .env.example template.

  1. Copy the example .env file:
cp .env.example .env
  2. Update the .env file with your Spotify credentials and PostgreSQL connection details.
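The exact variable names come from .env.example in the repository; as an illustrative layout only, a filled-in .env typically looks something like:

```shell
# Hypothetical .env layout -- check .env.example for the actual variable names
SPOTIFY_CLIENT_ID=your_client_id
SPOTIFY_CLIENT_SECRET=your_client_secret
SPOTIFY_REDIRECT_URI=http://localhost:8888/callback
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=spotify
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password
```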

4. Running the ETL Pipeline

To run the ETL pipeline, execute the following command:

python main.py
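The load step writes the transformed rows into PostgreSQL. A minimal sketch of that step, assuming a psycopg2-style driver; the table and column names here are illustrative, not the repository's actual schema:

```python
# Illustrative load step: a parameterized INSERT for a psycopg2-style driver.
# Table and column names are assumptions, not the repository's schema.
INSERT_SQL = """
    INSERT INTO played_tracks (song_name, artist_name, played_at)
    VALUES (%s, %s, %s)
    ON CONFLICT (played_at) DO NOTHING
"""

def row_to_params(row: dict) -> tuple:
    """Order a transformed row's values to match INSERT_SQL's placeholders."""
    return (row["song_name"], row["artist_name"], row["played_at"])

# With a live connection this would run as:
#   with conn.cursor() as cur:
#       cur.executemany(INSERT_SQL, [row_to_params(r) for r in rows])
#   conn.commit()
```

The ON CONFLICT clause makes the daily load idempotent: re-running the pipeline over an overlapping 24-hour window will not insert duplicate plays.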

5. Automating the ETL Pipeline

To automate the ETL pipeline to run daily, you can use CRON jobs. To set up a CRON job, follow these steps:

  1. Open your crontab for editing:
crontab -e
  2. Add the following line to run the ETL pipeline every day at 12:00 AM:
0 0 * * * /path/to/your/virtualenv/bin/python /path/to/your/repository/main.py
  3. Save and exit.

6. Viewing the Data in UI using Streamlit

To view the data in a user-friendly interface, you can use Streamlit. To run the Streamlit app, execute the following command:

streamlit run frontend.py

To fetch data from the database via the FastAPI server, run:

uvicorn server:app --reload

Note: If you want to run the Streamlit app and FastAPI server simultaneously, you can use the following commands:

chmod +x run.sh
./run.sh
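The contents of run.sh are not shown here, but a minimal script that launches both processes might look like this (a sketch, not the repository's actual script):

```shell
#!/bin/bash
# Hypothetical run.sh: start the FastAPI server in the background,
# then run the Streamlit app in the foreground.
uvicorn server:app --reload &
streamlit run frontend.py
```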
