airflow-example-dags

These scripts must be located in the dags volume so that Airflow can import them.

Tasks

  • top_songs_to_csv: makes a request to the Spotify API to retrieve the most popular songs for a given genre and saves the result to a CSV file (see the sketch after this list).
  • top_songs_to_postgres: uploads the aforementioned CSV to a table in a PostgreSQL database.
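
A minimal sketch of how these two tasks could look, assuming the TaskFlow API, a temporary CSV at /tmp/top_songs.csv, a Postgres connection id of postgres_default and a target table named top_songs (DAG wiring omitted); the repository's actual implementation may differ:

```python
# Hypothetical sketch only; the CSV path, connection id ("postgres_default")
# and target table ("top_songs") are assumptions, not the repository's code.
import csv

import requests
from airflow.decorators import task
from airflow.models import Variable
from airflow.providers.postgres.hooks.postgres import PostgresHook

CSV_PATH = "/tmp/top_songs.csv"  # assumed temporary location


@task
def top_songs_to_csv(genre: str = "reggaeton", limit: int = 50) -> str:
    # Client-credentials flow against the Spotify accounts service, using
    # the Variables listed in this README.
    token = requests.post(
        "https://accounts.spotify.com/api/token",
        data={"grant_type": "client_credentials"},
        auth=(
            Variable.get("spotify_client_id_secret"),
            Variable.get("spotify_client_secret"),
        ),
    ).json()["access_token"]

    # Search for the most popular tracks of the requested genre.
    items = requests.get(
        "https://api.spotify.com/v1/search",
        headers={"Authorization": f"Bearer {token}"},
        params={"q": f"genre:{genre}", "type": "track", "limit": limit},
    ).json()["tracks"]["items"]

    with open(CSV_PATH, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "artist", "popularity"])
        for track in items:
            writer.writerow(
                [track["name"], track["artists"][0]["name"], track["popularity"]]
            )
    return CSV_PATH


@task
def top_songs_to_postgres(csv_path: str) -> None:
    # Bulk-load the CSV into the (pre-created) table via the Postgres connection.
    hook = PostgresHook(postgres_conn_id="postgres_default")
    hook.copy_expert("COPY top_songs FROM STDIN WITH CSV HEADER", csv_path)
```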

Required Variables and Connections

  • Spotify credentials: spotify_client_id_secret and spotify_client_secret
  • PostgreSQL connection

How to trigger the DAG?

The PostgreSQL table must be created before triggering the DAG for the first time [1]. A dag_run.conf can be passed in the following format: {"genre": "pop", "limit": "25"}. The default values are {"genre": "reggaeton", "limit": "50"}.
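
The DAG can be triggered from the Airflow UI or with the airflow dags trigger --conf CLI command. Below is a minimal sketch, assuming the parameters are read from dag_run.conf inside a task (resolve_params is a hypothetical name, not necessarily present in the repository), of how the documented defaults could be applied:

```python
# Hypothetical illustration of applying the documented defaults; the real DAG
# may resolve dag_run.conf differently.
from airflow.decorators import task


@task
def resolve_params(**context) -> dict:
    conf = context["dag_run"].conf or {}
    return {
        "genre": conf.get("genre", "reggaeton"),  # documented default
        "limit": conf.get("limit", "50"),         # documented default
    }
```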

Tasks

  • get_pg_stats: makes a request to the NiFi API to monitor a Process Group and stores the response in a temporary JSON file (see the sketch after this list).
  • upload_to_mongo: inserts the JSON document into a MongoDB collection [2].
  • remove_json_file: removes the temporary JSON file with a Bash command.
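
A minimal sketch of these three tasks, assuming the NiFi credentials are stored as a JSON Variable, token-based authentication against the NiFi REST API, a temporary file at /tmp/pg_stats.json, the default MongoDB connection and a collection named nifi_stats; the Variable names nifi_credentials, nifi_url and nifi_pg_id are placeholders (DAG wiring omitted):

```python
# Hypothetical sketch only; the Variable names, temp file path and the Mongo
# collection name are assumptions, not the repository's code.
import json

import requests
from airflow.decorators import task
from airflow.models import Variable
from airflow.operators.bash import BashOperator
from airflow.providers.mongo.hooks.mongo import MongoHook

JSON_PATH = "/tmp/pg_stats.json"  # assumed temporary file


@task
def get_pg_stats() -> str:
    creds = Variable.get("nifi_credentials", deserialize_json=True)  # {"user": ..., "pass": ...}
    base_url = Variable.get("nifi_url")
    pg_id = Variable.get("nifi_pg_id")  # Process Group ID

    # Obtain an access token, then query the Process Group status endpoint.
    token = requests.post(
        f"{base_url}/nifi-api/access/token",
        data={"username": creds["user"], "password": creds["pass"]},
    ).text
    status = requests.get(
        f"{base_url}/nifi-api/flow/process-groups/{pg_id}/status",
        headers={"Authorization": f"Bearer {token}"},
    ).json()

    with open(JSON_PATH, "w") as f:
        json.dump(status, f)
    return JSON_PATH


@task
def upload_to_mongo(json_path: str) -> None:
    with open(json_path) as f:
        doc = json.load(f)
    # Uses the default MongoDB connection configured in Airflow.
    MongoHook().insert_one("nifi_stats", doc)


# In the real DAG this task would run downstream of upload_to_mongo.
remove_json_file = BashOperator(
    task_id="remove_json_file", bash_command=f"rm -f {JSON_PATH}"
)
```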

Required Variables and Connections

  • Apache NiFi credentials in the format {"user":"nifi_user","pass":"nifi_password"}, plus nifi_url
  • Nifi Process Group ID
  • MongoDB connection

Tasks

  • get_partitions_usage: retrieves disk partition usage information with the psutil module (see the sketch after this list).
  • no_warning: if no partition exceeds the disk usage threshold, no action is required [3].
  • warning_mail: an email listing the partitions that exceed the defined threshold is sent.
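
A minimal sketch of the branching logic, assuming a @task.branch callable and the hypothetical Variable names disk_usage_threshold and warning_emails (DAG wiring omitted); the repository's implementation may differ:

```python
# Hypothetical sketch; the Variable names ("disk_usage_threshold",
# "warning_emails") and default values are assumptions.
import psutil
from airflow.decorators import task
from airflow.models import Variable
from airflow.operators.email import EmailOperator
from airflow.operators.empty import EmptyOperator


@task.branch
def get_partitions_usage(**context) -> str:
    threshold = float(Variable.get("disk_usage_threshold", default_var=80))
    exceeded = [
        p.mountpoint
        for p in psutil.disk_partitions()
        if psutil.disk_usage(p.mountpoint).percent > threshold
    ]
    # Share the offending partitions with the mail task via XCom.
    context["ti"].xcom_push(key="exceeded", value=exceeded)
    # Branch to the warning mail only when at least one partition exceeds the threshold.
    return "warning_mail" if exceeded else "no_warning"


no_warning = EmptyOperator(task_id="no_warning")

warning_mail = EmailOperator(
    task_id="warning_mail",
    to=Variable.get("warning_emails", default_var="admin@example.com"),
    subject="Disk usage warning",
    html_content=(
        "Partitions over the threshold: "
        "{{ ti.xcom_pull(task_ids='get_partitions_usage', key='exceeded') }}"
    ),
)
```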

Required Variables and Connections

  • Disk usage threshold
  • Email/list of emails to send the warning to

Footnotes

  1. The create script can be found in the utils directory.

  2. Apache Airflow 2.8.0 and higher include MongoHook.

  3. Airflow >= 2.3.0 provides EmptyOperator.