BERT-based binary classifier (Disaster vs Not Disaster) with a production-ready FastAPI service, Docker image, CI, and deployment notes.
Contents:

- Quick local smoke test (Docker)
- Google Cloud Run (recommended production flow)
- GitHub Actions / CI
- API spec
- Environment variables & secrets
- Testing
- Production sizing & ops
- Observability & miscellaneous
- License & contact
## Quick local smoke test (Docker)

Build and run locally to verify the API:
```bash
# build
docker build -t disaster-tweets-api:local .

# run (replace HF_TOKEN)
docker run --rm -p 8080:8080 \
  -e HF_TOKEN="hf_xxx" -e HF_REPO_ID="sakibalfahim/disaster-tweets-bert" \
  disaster-tweets-api:local

# test
curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{"text":"Huge explosion reported near the financial district — multiple injuries."}'
```

If you have GPU access and want to use it inside Docker, add `--gpus all` to the `docker run` command and use a CUDA-capable base image (see the sketch below).
## Google Cloud Run (recommended production flow)

Build and push with Cloud Build, then deploy with Cloud Run. Replace `PROJECT_ID`, `REGION`, and `hf_xxx` with your own values.
```bash
gcloud auth login
gcloud config set project PROJECT_ID

# Build container in GCP
gcloud builds submit --tag gcr.io/PROJECT_ID/disaster-tweets-api:latest

# Deploy (recommended sizing)
gcloud run deploy disaster-tweets-api \
  --image=gcr.io/PROJECT_ID/disaster-tweets-api:latest \
  --platform=managed \
  --region=REGION \
  --allow-unauthenticated \
  --set-env-vars HF_REPO_ID="sakibalfahim/disaster-tweets-bert" \
  --memory=4Gi --cpu=2 --concurrency=1
```

Secrets: store `HF_TOKEN` as a Cloud Run secret and mount it into the service; do not put the token directly in CLI history.
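A minimal sketch of that secret wiring, assuming a Secret Manager secret named `hf-token`, the token saved in a local file `hf_token.txt`, and a runtime service account with the Secret Manager Secret Accessor role:

```bash
# Create the secret once from a local file (keeps the token out of CLI history),
# then expose it to the service as the HF_TOKEN environment variable.
gcloud secrets create hf-token --data-file=hf_token.txt
gcloud run services update disaster-tweets-api \
  --region=REGION \
  --update-secrets=HF_TOKEN=hf-token:latest
```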
## GitHub Actions / CI

Current workflow (`.github/workflows/ci.yml`):

- Builds the Docker image using `docker/build-push-action`
- Pushes the image to GHCR (configured)

Next CI improvement: run `pytest` inside a lightweight test container for quick smoke tests (the tests live in `tests/test_basic.py`).
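One possible shape for that step, assuming `pytest` and the `tests/` directory are included in the built image (otherwise use a dedicated test stage):

```bash
# Build the image, then run the smoke tests inside it before pushing.
docker build -t disaster-tweets-api:ci .
docker run --rm --entrypoint pytest disaster-tweets-api:ci -q tests/test_basic.py
```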
## API spec

Endpoints: `GET /health` and `POST /predict`.

### GET /health

Returns service readiness and device info:

```json
{"status": "ok", "device": "cuda"}
```

`device` is `"cuda"` or `"cpu"` depending on the runtime.

### POST /predict
Request:

```json
{
  "text": "Single tweet string"
}
```

or

```json
{
  "text": ["tweet1", "tweet2"]
}
```

Response:

```json
{
  "predictions": ["Disaster"],
  "confidences": [{"Disaster": 0.987, "Not Disaster": 0.013}],
  "latency_ms": 123.45
}
```

## Environment variables & secrets

- `HF_TOKEN`: Hugging Face token (READ permission). Store as a secret.
- `HF_REPO_ID`: model repo id; defaults to `sakibalfahim/disaster-tweets-bert`.
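For a local, non-Docker run, a minimal sketch (assuming the FastAPI instance in `app/main.py` is named `app` and `uvicorn` is installed via `requirements.txt`):

```bash
# Export the variables, then start the API locally.
export HF_TOKEN="hf_xxx"
export HF_REPO_ID="sakibalfahim/disaster-tweets-bert"
uvicorn app.main:app --host 0.0.0.0 --port 8080
```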
Do not commit tokens. Use cloud secret managers or GitHub Secrets for CI.
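If you use the GitHub CLI, one way to store the token for CI (assumes `gh` is authenticated against this repository and the token is saved in a local file `hf_token.txt`):

```bash
# Reads the secret value from a file so it never appears in shell history.
gh secret set HF_TOKEN < hf_token.txt
```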
## Testing

A lightweight test file, `tests/test_basic.py`, is included; it injects a minimal fake `transformers` shim so CI can run quickly without downloading the real model.

Run locally:

```bash
pip install -r requirements.txt
pip install pytest httpx
pytest -q
```

CI currently runs the container build; the next step is to run the tests inside the image before pushing.
## Production sizing & ops

- Model artifact is ~400+ MB (safetensors). Memory usage depends on batch size and device.
- Start with `--memory=4Gi`, `--cpu=2`, `--concurrency=1`. Increase memory to `8Gi` if using larger batches.
- Use `--concurrency=1` to avoid memory competition inside a single instance.
- If you expect sustained high QPS, use autoscaling with a reasonable minimum instance count to mitigate cold starts (see the example after this list).
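A minimal sketch of keeping one warm instance, assuming the service was deployed as `disaster-tweets-api`:

```bash
# Keep at least one instance warm to reduce cold-start latency under sustained traffic.
gcloud run services update disaster-tweets-api \
  --region=REGION \
  --min-instances=1
```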
## Observability & miscellaneous

- `/metrics` exposes basic in-memory metrics (requests, average latency); see the curl example after this list. Add a Prometheus exporter or forward logs to Cloud Monitoring.
- Add health checks & readiness probes for autoscalers.
- Implement request rate limiting + authentication for production.
- Keep `HF_TOKEN` secret and rotate it periodically.
- Consider a private network / API gateway for production.
- Validate and sanitize inputs; set a maximum text length to avoid resource abuse.
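A quick way to inspect those in-memory metrics against a local instance (assumes the smoke-test container above is running on port 8080):

```bash
# Fetch the basic request/latency counters exposed by the service.
curl -s http://localhost:8080/metrics
```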
Roadmap:

- CI: run `pytest` inside the built image; fail fast on regressions.
- Add async batching & queueing to maximize GPU throughput.
- Add model warmup & quantization options (ORT, bitsandbytes) to reduce memory and improve latency.
- Add monitoring (Prometheus), tracing, and alerting.
Repository layout:

- `app/main.py`: FastAPI app and inference logic
- `Dockerfile`: production image
- `.github/workflows/ci.yml`: CI build & push
- `tests/test_basic.py`: smoke tests
- `README.md`: this file
## License & contact

MIT License.

Author: sakibalfahim. Contact via email, LinkedIn, GitHub, or the Hugging Face profile.