Learning the fundamentals of Apache Airflow through a hands-on mini-project.
Download the official Docker Compose file:

```bash
curl -O 'https://airflow.apache.org/docs/apache-airflow/3.0.0/docker-compose.yaml'
```

Then create the folders `dags`, `plugins`, and `logs` (e.g. `mkdir -p ./dags ./plugins ./logs`).
These folders are mounted into the Airflow containers and are necessary for DAG storage, logging, and optional plugins.
Create a `.env` file in the root directory of your project with `AIRFLOW_UID` and `AIRFLOW_GID`. As the docs put it: "Those additional variables are useful in case you are trying out/testing Airflow installation via docker compose. They are not intended to be used in production, but they make the environment faster to bootstrap for first time users with the most common customizations." (source: https://airflow.apache.org/docs/apache-airflow/2.1.1/start/docker.html)
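A sample `.env` using the defaults suggested by the Airflow quick-start docs; on Linux, `AIRFLOW_UID` should usually be set to your host user's ID (e.g. the output of `id -u`):

```bash
AIRFLOW_UID=50000
AIRFLOW_GID=0
```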
Initialize the environment:

```bash
docker-compose up airflow-init
```

This sets up the Airflow metadata database and runs first-time setup; the supporting services (PostgreSQL and Redis) are started as dependencies.
Start all services:

```bash
docker-compose up
```

To check if the services are running:
```bash
docker ps
```

Then open http://localhost:8080 and log in with username `airflow` and password `airflow`.
This DAG:
- Simulates training 3 models in parallel.
- Randomly generates an accuracy score for each.
- Uses `BranchPythonOperator` to pick the best model.
- Branches to either "accurate" or "inaccurate" based on the score.
- `PythonOperator`: Executes Python functions for training models.
- `BranchPythonOperator`: Decides the next task based on logic.
- `BashOperator`: Prints the result.
- XCom: Used to pass values between tasks.
The dependency chain:

```python
[training_model_A, training_model_B, training_model_C] >> choose_best_model >> [accurate, inaccurate]
```

Reference: https://www.youtube.com/watch?v=IH1-0hwFZRQ
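Putting the pieces together, a minimal sketch of this classic-style DAG. The DAG id, schedule, and the `> 8` accuracy threshold are illustrative choices, not taken from the video; imports assume Airflow 3 with the standard provider (Airflow 2 uses the `airflow.operators.*` paths instead):

```python
from datetime import datetime
from random import randint

from airflow import DAG
from airflow.providers.standard.operators.bash import BashOperator
from airflow.providers.standard.operators.python import (
    BranchPythonOperator,
    PythonOperator,
)


def _training_model():
    # Simulate a training run; the return value is pushed to XCom automatically.
    return randint(1, 10)


def _choose_best_model(ti):
    # Pull all three accuracy scores from XCom and branch on the best one.
    accuracies = ti.xcom_pull(
        task_ids=["training_model_A", "training_model_B", "training_model_C"]
    )
    return "accurate" if max(accuracies) > 8 else "inaccurate"


with DAG("ml_branching", start_date=datetime(2025, 1, 1),
         schedule=None, catchup=False) as dag:
    training_tasks = [
        PythonOperator(task_id=f"training_model_{m}", python_callable=_training_model)
        for m in "ABC"
    ]
    choose_best_model = BranchPythonOperator(
        task_id="choose_best_model", python_callable=_choose_best_model
    )
    accurate = BashOperator(task_id="accurate", bash_command="echo 'accurate'")
    inaccurate = BashOperator(task_id="inaccurate", bash_command="echo 'inaccurate'")

    training_tasks >> choose_best_model >> [accurate, inaccurate]
```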
A cleaner, more modern DAG using Airflow's `@dag` and `@task` decorators.
- Training tasks use `@task`.
- Branching logic uses `@task.branch`.
- `BashOperator`s are still used for the final outputs.
- XCom is handled automatically via return values.
- Simpler syntax.
- Better readability.
- Automatic XCom handling.
- Encouraged by the Airflow community for new DAGs.
Same output as above, but cleaner code.
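A minimal sketch of the TaskFlow version, with the same caveats on names, threshold, and import paths as the classic version above:

```python
from datetime import datetime
from random import randint

from airflow.decorators import dag, task
from airflow.providers.standard.operators.bash import BashOperator


@dag(start_date=datetime(2025, 1, 1), schedule=None, catchup=False)
def ml_branching_taskflow():
    @task
    def training_model(model: str) -> int:
        # The return value becomes an XCom automatically.
        return randint(1, 10)

    @task.branch
    def choose_best_model(accuracies: list[int]) -> str:
        # Returning a task_id selects which downstream branch runs.
        return "accurate" if max(accuracies) > 8 else "inaccurate"

    accurate = BashOperator(task_id="accurate", bash_command="echo 'accurate'")
    inaccurate = BashOperator(task_id="inaccurate", bash_command="echo 'inaccurate'")

    scores = [
        training_model.override(task_id=f"training_model_{m}")(m) for m in "ABC"
    ]
    choose_best_model(scores) >> [accurate, inaccurate]


ml_branching_taskflow()
```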
References-
- TaskFlow tutorial: https://airflow.apache.org/docs/apache-airflow/stable/tutorial/taskflow.html
- XComs in Airflow: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/xcoms.html
This mini-project is a personal learning initiative to practice using Apache Airflow and working with external APIs. It fetches real-time location data from the International Space Station (ISS) and uses reverse geocoding to identify the nearest location on Earth.
- Data Source: Open Notify ISS API - http://api.open-notify.org/iss-now.json
- Reverse Geocoding: Nominatim (OpenStreetMap) API - https://nominatim.org/release-docs/latest/api/Reverse/
DAG Tasks-
- Load the current ISS (International Space Station) location
- Use the OpenStreetMap Nominatim API to reverse-geocode the address of that location (see the sketch after this list)
- Insert the coordinates and the address into an Oracle database
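A minimal sketch of the fetch and reverse-geocode tasks; the function names and schedule are illustrative (the DAG id `iss_dag` appears in the logs below), and the database insert is sketched further down:

```python
from datetime import datetime

import requests
from airflow.decorators import dag, task


@dag(start_date=datetime(2025, 1, 1), schedule=None, catchup=False)
def iss_dag():
    @task
    def get_iss_location() -> dict:
        # Open Notify returns {"iss_position": {"latitude": "...", "longitude": "..."}, ...}
        resp = requests.get("http://api.open-notify.org/iss-now.json", timeout=10)
        resp.raise_for_status()
        return resp.json()["iss_position"]

    @task
    def reverse_geocode(position: dict) -> str:
        # Nominatim asks for a descriptive User-Agent and at most one request per second.
        resp = requests.get(
            "https://nominatim.openstreetmap.org/reverse",
            params={"lat": position["latitude"], "lon": position["longitude"],
                    "format": "json"},
            headers={"User-Agent": "airflow-iss-mini-project"},
            timeout=10,
        )
        resp.raise_for_status()
        # display_name is missing when the ISS is over open water.
        return resp.json().get("display_name", "the ocean")

    reverse_geocode(get_iss_location())


iss_dag()
```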
Another DAG was added that copies the data from that table to a separate log table.
Sample output:

```
Log message source details: sources=["/opt/airflow/logs/dag_id=iss_dag/run_id=manual__2025-06-01T11:02:47.640122+00:00/task_id=reverse_geocode/attempt=1.log"]
[2025-06-01, 16:32:52] INFO - DAG bundles loaded: dags-folder, example_dags: source="airflow.dag_processing.bundles.manager.DagBundlesManager"
[2025-06-01, 16:32:52] INFO - Filling up the DagBag from /opt/airflow/dags/iss_location.py: source="airflow.models.dagbag.DagBag"
[2025-06-01, 16:32:52] INFO - Task instance is in running state: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Previous state of the Task instance: queued: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Current task name:reverse_geocode: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Dag name:iss_dag: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - The ISS currently is above: Municipio de Cañada de Gómez, Departamento Iriondo, 2505, Argentina.: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Done. Returned value was: Municipio de Cañada de Gómez, Departamento Iriondo, 2505, Argentina: source="airflow.task.operators.airflow.providers.standard.decorators.python._PythonDecoratedOperator"
[2025-06-01, 16:32:52] INFO - latitude:-32.7150: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Pushing xcom: ti="RuntimeTaskInstance(id=UUID('01972b28-8e6d-7cf7-9e7a-b8096cc64955'), task_id='reverse_geocode', dag_id='iss_dag', run_id='manual__2025-06-01T11:02:47.640122+00:00', try_number=1, map_index=-1, hostname='02497a58636d', context_carrier={}, task=<Task(_PythonDecoratedOperator): reverse_geocode>, bundle_instance=LocalDagBundle(name=dags-folder), max_tries=2, start_date=datetime.datetime(2025, 6, 1, 11, 2, 51, 568020, tzinfo=TzInfo(UTC)), end_date=None, is_mapped=False)": source="task"
[2025-06-01, 16:32:52] INFO - longitude:-61.4837: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - :): chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Task instance in success state: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Previous state of the Task instance: running: chan="stdout": source="task"
[2025-06-01, 16:32:52] INFO - Task operator:<Task(_PythonDecoratedOperator): reverse_geocode>: chan="stdout": source="task"
```
Oracle DB integration for logging real-time ISS location data into a relational database. Two DAGs are used:
The first DAG:
- Fetches the ISS location using the Open Notify API
- Uses the Nominatim API for reverse geocoding
- Inserts data directly into Oracle using the `oracledb` Python client (see the sketch after this list)
- Fully written using the TaskFlow API (`@task`)

The second DAG:
- Calls the stored PL/SQL procedure (`copy_new_iss_data`, defined below) that copies rows from a source table to a target table
- Uses Airflow's `OracleOperator`
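A minimal sketch of the direct-`oracledb` insert task. The credentials and DSN are placeholders, and in the real project this task lives inside the DAG sketched earlier; the columns match the `source_table` definition below:

```python
import oracledb
from airflow.decorators import task


@task
def insert_location(position: dict, address: str) -> None:
    # Hard-coded credentials are shown only for brevity; see the note at the end
    # of this section for the connection-based alternative.
    with oracledb.connect(user="app_user", password="app_password",
                          dsn="oracle-host:1521/FREEPDB1") as conn:
        with conn.cursor() as cur:
            # id and captured_at are filled by the identity column and DEFAULT SYSDATE.
            cur.execute(
                """INSERT INTO source_table (iss_latitude, iss_longitude, address)
                   VALUES (:lat, :lon, :addr)""",
                {"lat": float(position["latitude"]),
                 "lon": float(position["longitude"]),
                 "addr": address},
            )
        conn.commit()
```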
Features-
Two tables-
- Source table (`source_table`)
- Target table (`target_table`)

Airflow copies new rows from the source table to the target table every ten minutes (see the DAG sketch after the stored procedure below).
```sql
CREATE TABLE target_table (
    id            NUMBER PRIMARY KEY,
    iss_latitude  NUMBER(10,6),
    iss_longitude NUMBER(10,6),
    address       VARCHAR2(255),
    captured_at   DATE
);

CREATE TABLE source_table (
    id            NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    iss_latitude  NUMBER(10,6),
    iss_longitude NUMBER(10,6),
    address       VARCHAR2(255),
    captured_at   DATE DEFAULT SYSDATE
);
```
```sql
CREATE OR REPLACE PROCEDURE copy_new_iss_data AS
BEGIN
    INSERT INTO target_table (id, iss_latitude, iss_longitude, address, captured_at)
    SELECT s.id, s.iss_latitude, s.iss_longitude, s.address, s.captured_at
    FROM source_table s
    WHERE NOT EXISTS (
        SELECT 1 FROM target_table t WHERE t.id = s.id
    );
    COMMIT;
END;
```
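A minimal sketch of the copy DAG, assuming `apache-airflow-providers-oracle` is installed and an Airflow connection with conn_id `oracle_default` is configured. The write-up mentions `OracleOperator`; newer provider versions recommend `SQLExecuteQueryOperator` from the common SQL provider, which is used here and works with Oracle connections:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG("copy_iss_data", start_date=datetime(2025, 1, 1),
         schedule="*/10 * * * *", catchup=False) as dag:
    # Invoke the stored procedure defined above; it commits internally.
    copy_new_data = SQLExecuteQueryOperator(
        task_id="copy_new_data",
        conn_id="oracle_default",
        sql="BEGIN copy_new_iss_data; END;",
    )
```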
Note: using `oracledb` to connect to the database directly is not good practice. A better approach is to install the Oracle provider (`apache-airflow-providers-oracle`), add an Airflow connection for it, and let hooks and operators manage the credentials, as sketched below.
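A minimal sketch of that approach, assuming a connection with conn_id `oracle_default` has been created (e.g. through the Airflow UI under Admin → Connections):

```python
from airflow.decorators import task
from airflow.providers.oracle.hooks.oracle import OracleHook


@task
def insert_location(position: dict, address: str) -> None:
    # The hook reads host, credentials, and service name from the Airflow
    # connection, so nothing sensitive is hard-coded in the DAG.
    hook = OracleHook(oracle_conn_id="oracle_default")
    hook.run(
        """INSERT INTO source_table (iss_latitude, iss_longitude, address)
           VALUES (:lat, :lon, :addr)""",
        parameters={"lat": float(position["latitude"]),
                    "lon": float(position["longitude"]),
                    "addr": address},
    )
```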


