These scripts must be located in the dags volume so that Airflow can import them.
- top_songs_to_csv: makes a request to the Spotify API to retrieve the most popular songs for a specific genre and saves the result to a CSV file.
- top_songs_to_postgres: uploads the aforementioned CSV to a table in a PostgreSQL DB.
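A minimal sketch of what top_songs_to_csv could look like, using Spotify's client-credentials flow and search endpoint; the output path and the CSV columns are assumptions, not the DAG's actual code:

```python
# Hypothetical implementation of top_songs_to_csv (path and columns assumed).
import base64
import csv

import requests

def top_songs_to_csv(genre: str, limit: int, client_id: str, client_secret: str) -> None:
    # Client-credentials flow: exchange the app credentials for a bearer token.
    auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    token = requests.post(
        "https://accounts.spotify.com/api/token",
        headers={"Authorization": f"Basic {auth}"},
        data={"grant_type": "client_credentials"},
    ).json()["access_token"]

    # Search for tracks tagged with the genre; Spotify caps limit at 50.
    resp = requests.get(
        "https://api.spotify.com/v1/search",
        headers={"Authorization": f"Bearer {token}"},
        params={"q": f"genre:{genre}", "type": "track", "limit": limit},
    )
    resp.raise_for_status()
    tracks = resp.json()["tracks"]["items"]

    # Write name/artist/popularity rows, most popular first.
    with open("/tmp/top_songs.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "artist", "popularity"])
        for t in sorted(tracks, key=lambda t: t["popularity"], reverse=True):
            writer.writerow([t["name"], t["artists"][0]["name"], t["popularity"]])
```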
- Spotify credentials: spotify_client_id and spotify_client_secret
- PostgreSQL connection
Our PostgreSQL table must be created before triggering the DAG for the first time.
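If the table does not exist yet, it can be created once with a short script like the one below; the connection id and the schema are only guesses based on the CSV described above:

```python
# One-off table creation; adjust the connection id and columns to your setup.
from airflow.providers.postgres.hooks.postgres import PostgresHook

hook = PostgresHook(postgres_conn_id="postgres_default")  # assumed connection id
hook.run("""
    CREATE TABLE IF NOT EXISTS top_songs (
        name TEXT,
        artist TEXT,
        popularity INTEGER
    );
""")
```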
We can pass a dag_run.conf in the following format: {"genre": "pop", "limit": "25"}. The default values are {"genre": "reggaeton", "limit": "50"}.
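Inside a task, the conf and its defaults are typically resolved like this (the helper name is illustrative):

```python
# Read dag_run.conf, falling back to the documented defaults.
def get_genre_and_limit(**context):
    conf = context["dag_run"].conf or {}
    return conf.get("genre", "reggaeton"), conf.get("limit", "50")
```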
- get_pg_stats: makes a request to the Nifi API to monitor a Process Group and stores the response in a temporary JSON file.
- upload_to_mongo: inserts the JSON into a MongoDB collection.
- remove_json_file: bash command to remove the temporary JSON file.
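A rough sketch of the first two tasks, assuming a base URL like https://nifi:8443/nifi-api and placeholder credentials, ids, and file path; remove_json_file would then be a BashOperator running rm on the same path:

```python
# Hypothetical get_pg_stats / upload_to_mongo tasks (all values are placeholders).
import json

import requests
from pymongo import MongoClient

NIFI_URL = "https://nifi:8443/nifi-api"  # from the nifi_url variable
PG_ID = "process-group-id"               # the monitored Process Group
TMP_FILE = "/tmp/pg_stats.json"

def get_pg_stats():
    # Exchange the NiFi credentials for an access token, then fetch the
    # Process Group status and stash it in a temporary JSON file.
    token = requests.post(
        f"{NIFI_URL}/access/token",
        data={"username": "nifi_user", "password": "nifi_password"},
    ).text
    stats = requests.get(
        f"{NIFI_URL}/flow/process-groups/{PG_ID}/status",
        headers={"Authorization": f"Bearer {token}"},
    ).json()
    with open(TMP_FILE, "w") as f:
        json.dump(stats, f)

def upload_to_mongo():
    # Insert the stored stats as a single document in the collection.
    with open(TMP_FILE) as f:
        MongoClient("mongodb://mongo:27017")["nifi"]["pg_stats"].insert_one(json.load(f))
```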
- Apache Nifi credentials: {"user":"nifi_user","pass":"nifi_password"}, nifi_url
- Nifi Process Group ID
- MongoDB connection
- get_partitions_usage: retrieves information on the disk partitions with the psutil module.
- no_warning: if no partition exceeds the disk usage threshold, no action is required.
- warning_mail: sends a mail listing the partitions that exceed the defined threshold.
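A sketch of how the branch could be decided, assuming a BranchPythonOperator and illustrative task ids and threshold:

```python
# Route to warning_mail only when some partition exceeds the threshold.
import psutil

THRESHOLD = 80  # assumed disk usage threshold, in percent

def get_partitions_usage():
    # Usage percentage per mounted partition.
    return {
        p.mountpoint: psutil.disk_usage(p.mountpoint).percent
        for p in psutil.disk_partitions()
    }

def choose_branch(**context):
    # Return the task_id to follow; used as the BranchPythonOperator callable.
    full = {mp: pct for mp, pct in get_partitions_usage().items() if pct > THRESHOLD}
    return "warning_mail" if full else "no_warning"
```

The warning itself can then be sent with an EmailOperator whose body lists the offending partitions.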
- Disk usage threshold
- Email/list of emails to send the warning to