# Advanced Programming Techniques – Sapienza University of Rome
A hands-on lab for building and deploying a cloud-native batch processing pipeline on AWS. Students provision infrastructure with Terraform, containerize a Python application with Docker, and execute it as a serverless job on AWS Batch (Fargate).
- Purpose of the Lab
- What You Will Learn
- Architecture
- Project Structure
- Technology Stack
- Prerequisites
- Getting Started
- Local Development
- Cleanup
- License
## Purpose of the Lab

This project is a didactic lab for the Advanced Programming Techniques course at Sapienza. Its goal is to teach students how to:
- **Define cloud infrastructure as code** – write declarative Terraform configurations to provision a full AWS environment (networking, storage, compute, IAM).
- **Containerize an application** – package a Python program into a Docker image, push it to a private container registry (ECR), and run it in a serverless compute environment.
- **Design a batch processing pipeline** – use Amazon S3 as a data lake with separate input/output prefixes, and AWS Batch with Fargate to execute the processing job without managing servers.
- **Understand IAM and security** – configure task execution roles and least-privilege policies (with educational shortcuts noted in the code).
The business logic is intentionally simple (summing numbers from a file) so that students can focus on the cloud engineering and DevOps aspects rather than the application logic itself.
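To make the simplicity concrete, here is a minimal sketch of that summing logic. This is an illustrative reimplementation, not the actual `Adder` class from `main.py`, which may parse its input differently:

```python
def sum_addends(text: str) -> int:
    """Parse one integer per line and return the total.

    Hypothetical sketch of the number-summing business logic;
    blank lines are skipped so trailing newlines are harmless.
    """
    return sum(int(line) for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    print(sum_addends("1\n2\n3\n"))  # prints 6
```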
## What You Will Learn

| Area | Concepts |
|---|---|
| Infrastructure as Code | Terraform providers, resources, variables, outputs, tfvars |
| Networking | VPC, subnets, internet gateways, route tables, security groups |
| Storage | S3 buckets, object prefixes, data upload/download via boto3 |
| Containers | Dockerfiles, image tagging, ECR authentication, multi-architecture builds |
| Serverless Compute | AWS Batch, Fargate, compute environments, job queues, job definitions |
| IAM | Assume-role policies, managed policy attachments, task execution roles |
| Python | boto3 SDK, OOP design, logging, environment variable configuration |
## Architecture

```mermaid
flowchart TB
    subgraph Developer["👨‍💻 Developer Machine"]
        TF["Terraform CLI"]
        Docker["Docker CLI"]
        Script["Bash Scripts"]
    end
    subgraph AWS["☁️ AWS Cloud – eu-west-1"]
        subgraph VPC["VPC 10.0.0.0/16"]
            subgraph Subnet["Public Subnet 10.0.1.0/24"]
                Fargate["AWS Fargate Task<br/>(Python Container)"]
            end
            IGW["Internet Gateway"]
            SG["Security Group<br/>(outbound-only)"]
        end
        ECR["ECR Repository"]
        S3["S3 Bucket"]
        Batch["AWS Batch"]
        IAM["IAM Roles"]
        subgraph S3_Detail["S3 Bucket Structure"]
            Input["📄 input/addends.txt"]
            Output["📄 output/sum.txt"]
        end
    end
    TF -- "provisions" --> VPC
    TF -- "provisions" --> ECR
    TF -- "provisions" --> S3
    TF -- "provisions" --> Batch
    TF -- "provisions" --> IAM
    Docker -- "push image" --> ECR
    ECR -- "pull image" --> Fargate
    Batch -- "launches" --> Fargate
    Fargate -- "reads" --> Input
    Fargate -- "writes" --> Output
    Fargate --> IGW
    IGW --> Fargate
```
The Terraform configuration (`main.tf`) provisions five logical layers:

### 1. Networking

| Resource | Purpose |
|---|---|
| `aws_vpc` | Isolated virtual network (`10.0.0.0/16`) |
| `aws_subnet` (public) | Hosts Fargate tasks with a public IP (`10.0.1.0/24`, `eu-west-1a`) |
| `aws_internet_gateway` | Enables outbound internet access for image pulling & S3 communication |
| `aws_route_table` + association | Routes `0.0.0.0/0` traffic through the IGW |
| `aws_security_group` | Allows all outbound traffic, no inbound rules |
### 2. Storage & Registry

| Resource | Purpose |
|---|---|
| `aws_s3_bucket` | Data lake for input/output files (prefix-separated) |
| `aws_ecr_repository` | Private Docker image registry |

> Both resources have `force_destroy`/`force_delete` enabled for easy lab cleanup – not recommended for production.
### 3. IAM

| Role | Trusted Service | Purpose |
|---|---|---|
| ECS Task Execution Role | `ecs-tasks.amazonaws.com` | Allows Fargate to pull images from ECR, write logs, and access S3 |

> `AmazonS3FullAccess` is attached for educational simplicity – in production, scope this down to the specific bucket ARN.
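As a hedged illustration of what "scoping down" could look like, the managed policy might be replaced by an inline policy limited to the lab bucket's two prefixes. The bucket ARN below is a placeholder, not the name Terraform actually generates:

```python
import json

# Placeholder ARN: substitute the real bucket name from Terraform output.
BUCKET_ARN = "arn:aws:s3:::apt-sapienza-fg-cloud-infra-bucket"

# Hypothetical least-privilege alternative to AmazonS3FullAccess:
# the task may only read input/ objects and write output/ objects.
least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadInput",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"{BUCKET_ARN}/input/*"],
        },
        {
            "Sid": "WriteOutput",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [f"{BUCKET_ARN}/output/*"],
        },
    ],
}

print(json.dumps(least_privilege_policy, indent=2))
```

The same JSON could be embedded in `main.tf` via an `aws_iam_role_policy` resource instead of the managed-policy attachment.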
### 4. Batch Compute

| Resource | Purpose |
|---|---|
| `aws_batch_compute_environment` | Managed Fargate compute pool (max 16 vCPUs) |
| `aws_batch_job_queue` | Job submission queue linked to the compute environment |
### 5. Job Definition

| Property | Value |
|---|---|
| Platform | Fargate |
| vCPU | 0.25 |
| Memory | 512 MB |
| Architecture | Configurable (`ARM64` / `X86_64`) |
| Environment Variables | `BUCKET_NAME`, `INPUT_PREFIX`, `OUTPUT_PREFIX` |
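The container picks its configuration up from those environment variables. A minimal sketch of how `main.py` might read them follows; the variable names come from the job definition above, but the defaults are assumptions of this example, not values from the lab:

```python
import os


def load_config() -> dict:
    """Read the configuration injected by the Batch job definition.

    BUCKET_NAME is treated as required (KeyError if missing); the
    prefix defaults here are assumptions for local experimentation.
    """
    return {
        "bucket": os.environ["BUCKET_NAME"],
        "input_prefix": os.environ.get("INPUT_PREFIX", "input/"),
        "output_prefix": os.environ.get("OUTPUT_PREFIX", "output/"),
    }
```

Failing fast on a missing `BUCKET_NAME` makes misconfigured job definitions show up immediately in the job logs.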
### Data Flow

```mermaid
sequenceDiagram
    participant User as 👨‍💻 User
    participant S3 as S3 Bucket
    participant Batch as AWS Batch
    participant Fargate as Fargate Task
    User->>S3: 1. Upload addends.txt to input/
    User->>Batch: 2. Submit job
    Batch->>Fargate: 3. Launch container from ECR image
    Fargate->>S3: 4. Download input/addends.txt
    Note over Fargate: 5. Parse numbers & compute sum
    Fargate->>S3: 6. Upload output/sum.txt
    User->>S3: 7. Download output/sum.txt
```
## Project Structure

```text
aptsapienza/
├── main.py                          # Python application (S3BucketManager + Adder)
├── main.tf                          # Terraform infrastructure definition
├── terraform_tfvars_template.txt    # Template for terraform.tfvars
├── Dockerfile                       # Containerization (python:3.14-slim)
├── build_and_push_to_ecr.sh         # Script to build & push Docker image to ECR
├── local_launch_with_aws_profile.sh # Script to run main.py locally
├── requirements.txt                 # Python dependencies (boto3)
├── input/
│   └── addends.txt                  # Sample input file (numbers) to be uploaded to S3
├── .gitignore                       # Ignores Terraform state, venvs, Docker, OS files
└── LICENSE                          # GNU GPLv3
```
| File | Description |
|---|---|
| `main.py` | Contains `S3BucketManager` (S3 I/O operations) and `Adder` (number-summing logic). Reads configuration from environment variables. |
| `main.tf` | Single-file Terraform configuration that provisions the entire AWS stack: VPC, S3, ECR, IAM, Batch compute environment, job queue, and job definition. |
| `Dockerfile` | Builds a slim Python 3.14 image with a virtual environment, installs boto3, and sets `main.py` as the entrypoint. |
| `build_and_push_to_ecr.sh` | Automates ECR login, Docker build, tag, and push. Reads the ECR URL from Terraform output. |
| `local_launch_with_aws_profile.sh` | Runs `main.py` locally with a named AWS profile. Creates a venv, installs dependencies, and sets env vars from Terraform output. |
| `terraform_tfvars_template.txt` | Template for `terraform.tfvars` – includes `project_name`, `docker_image_tag`, and `docker_image_architecture`. |
## Technology Stack

| Layer | Technology | Version / Details |
|---|---|---|
| Language | Python | 3.14 |
| AWS SDK | boto3 | Latest |
| IaC | Terraform | AWS Provider `~> 6.0` |
| Container | Docker | `python:3.14-slim` base image |
| Compute | AWS Batch + Fargate | Serverless containers |
| Storage | Amazon S3 | Prefix-based partitioning |
| Registry | Amazon ECR | Private container images |
| Region | `eu-west-1` | Ireland |
## Prerequisites

- An AWS account with an IAM user/profile named `aptsapienza` (or update `main.tf`)
- terraform-local (`tflocal`) installed
- Docker installed and running
- awslocal CLI configured with the `aptsapienza` profile
- Python 3.14+ (for local development only)
## Getting Started

### 1. Configure variables

Copy the template and edit it with your values:

```bash
cp terraform_tfvars_template.txt terraform.tfvars
```

Edit `terraform.tfvars`:

```hcl
project_name              = "apt-sapienza-fg-cloud-infra"  # Use your initials
docker_image_tag          = "v1.0"
docker_image_architecture = "ARM64"  # ARM64 for Apple Silicon, X86_64 for Intel/AMD
```

### 2. Provision the infrastructure

```bash
tflocal init
tflocal plan                  # Review the changes
tflocal apply --auto-approve  # Provision resources, skipping the interactive confirmation
```

Terraform will output:

- `ecr_repository_url` – the ECR image URL
- `s3_bucket_name` – the S3 bucket name
- `job_queue_name` – the Batch job queue name
### 3. Upload the input file

Upload the sample input file to S3:

```bash
awslocal s3 cp input/addends.txt s3://$(tflocal output -raw s3_bucket_name)/input/addends.txt \
  --profile aptsapienza
```

### 4. Build and push the Docker image

```bash
chmod +x build_and_push_to_ecr.sh
./build_and_push_to_ecr.sh
```

This script:
- Retrieves the ECR repository URL from Terraform output
- Authenticates Docker with ECR
- Builds the image
- Tags it as `v1.0`
- Pushes it to ECR
### 5. Submit the Batch job

```bash
awslocal batch submit-job \
  --job-name "my-first-job" \
  --job-queue "$(tflocal output -raw job_queue_name)" \
  --job-definition "$(tflocal output -raw job_queue_name | sed 's/-queue/-job/')" \
  --profile aptsapienza \
  --region eu-west-1
```

Monitor the job in the AWS Batch Console.
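The same submission and monitoring can be scripted with boto3 instead of the CLI. A hedged sketch follows: the job name default and the injectable `client` are conventions of this example, and the queue/definition names must come from your Terraform output:

```python
import time


def submit_and_wait(client, queue: str, definition: str,
                    name: str = "my-first-job") -> str:
    """Submit a Batch job and poll until it reaches a terminal state.

    `client` is a boto3 Batch client, e.g.
    boto3.client("batch", region_name="eu-west-1"); passing it in
    keeps this sketch testable without AWS.
    """
    job_id = client.submit_job(
        jobName=name, jobQueue=queue, jobDefinition=definition
    )["jobId"]
    while True:
        status = client.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(5)  # jobs move SUBMITTED -> RUNNABLE -> RUNNING -> terminal
```

`submit_job` and `describe_jobs` are the standard boto3 Batch operations; the polling interval is arbitrary.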
### 6. Retrieve the result

Once the job status is `SUCCEEDED`, download the result:

```bash
awslocal s3 cp s3://$(tflocal output -raw s3_bucket_name)/output/sum.txt ./sum.txt \
  --profile aptsapienza

cat sum.txt
```

## Local Development

Run the application locally (without AWS Batch) using the provided script:
```bash
chmod +x local_launch_with_aws_profile.sh
./local_launch_with_aws_profile.sh aptsapienza
```

This will:
- Retrieve the S3 bucket name from Terraform output
- Create a Python virtual environment (if needed) and install dependencies
- Set the required environment variables
- Execute `main.py`
> Note: the input file (`input/addends.txt`) must already be uploaded to S3.
## Cleanup

Destroy all AWS resources when you are done to avoid charges:

```bash
tflocal destroy  # Type "yes" to confirm
```

> `force_destroy` on S3 and `force_delete` on ECR ensure a clean teardown even with existing objects/images.
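Without `force_destroy`, a non-empty bucket blocks `destroy` and has to be emptied first. A hedged boto3 sketch of that manual step; the injectable `client` is an assumption of this example so it can run without AWS:

```python
def empty_bucket(client, bucket: str) -> int:
    """Delete every object in the bucket; return how many were removed.

    Mirrors what force_destroy does automatically. `client` is a
    boto3 S3 client; inject a stub to exercise the sketch offline.
    """
    deleted = 0
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

Note that versioned buckets would also need their object versions deleted; this sketch only covers the unversioned case used in the lab.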
## License

This project is licensed under the GNU General Public License v3.0 – see the LICENSE file for details.