
Lightweight facial landmark pipeline for ESP32-S3 using ESP-WHO + ESP-DL. Goal: a compact model that predicts ~22 2D facial keypoints (AIS-2141) #261

@ryan-sheriff-b

Description


Checklist

  • Checked the issue tracker for similar issues to ensure this is not a duplicate.
  • Provided a clear description of your suggestion.
  • Included any relevant context or examples.

Issue or Suggestion Description

Hello everyone — I’m building a lightweight facial landmark pipeline for ESP32-S3 using ESP-WHO + ESP-DL. Goal: a compact model that predicts ~22 2D facial keypoints (eyes, pupils, mouth corners, nose, chin) for blink, yawn and coarse gaze detection.

Current status:

  • The ESP-WHO human_face_detect example runs on my ESP32-S3 with a GC2145 camera.
  • I can flash and run .espdl models on the board.

Questions / requests:

  1. What is the recommended training → quantization flow for a 22-point regressor: PyTorch → ONNX → esp-ppq (PTQ), or is QAT preferred? Any example repo? (My rough model/export plan is the first sketch after this list.)
  2. What input resolution works best on ESP32-S3 (FPS vs. accuracy trade-off)? Suggestions among 128×128 / 160×160 / 224×224, please.
  3. Operator support checklist: which ops commonly trip up conversion to .espdl (conv2d, depthwise conv, relu6, upsample, transposed conv, gather/index ops)?
  4. An example quantization script or minimal esp-ppq template for converting ONNX → .espdl with a representative calibration dataset (my current attempt is the second sketch below).
  5. Any tips for designing a regression head (heatmaps vs. direct coordinate regression) that works well with ESP-DL and quantization?
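
For context on questions 1, 3 and 5, this is the kind of model I have in mind. It is my own minimal sketch, not taken from an ESP-DL example: a depthwise-separable backbone using only ops I expect to convert cleanly (conv2d, depthwise conv, ReLU6, global average pool, linear), a direct 44-value coordinate head, and a plain torch.onnx.export. Layer counts, the 128×128 input size, and the opset choice are placeholders, please correct me if any of these are known to be problematic.

```python
# Minimal sketch (my own, untested on-device): depthwise-separable backbone
# with a direct coordinate head (22 points -> 44 outputs), restricted to
# ops I *assume* convert to .espdl without surprises.
import torch
import torch.nn as nn

def dw_block(cin, cout, stride=1):
    # depthwise 3x3 + pointwise 1x1, ReLU6 activations
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
        nn.BatchNorm2d(cin),
        nn.ReLU6(inplace=True),
        nn.Conv2d(cin, cout, 1, 1, 0, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU6(inplace=True),
    )

class LandmarkNet(nn.Module):
    def __init__(self, num_points=22):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, 2, 1, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU6(inplace=True),
            dw_block(16, 32, 2),
            dw_block(32, 64, 2),
            dw_block(64, 128, 2),
            dw_block(128, 128, 1),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)          # exports as GlobalAveragePool
        self.head = nn.Linear(128, num_points * 2)   # direct (x, y) regression

    def forward(self, x):
        x = self.backbone(x)
        x = self.pool(x).flatten(1)
        return self.head(x)   # normalized coords in [0, 1] if trained that way

model = LandmarkNet().eval()
dummy = torch.randn(1, 3, 128, 128)   # 128x128 input as a starting guess
torch.onnx.export(
    model, dummy, "landmark22.onnx",
    input_names=["input"], output_names=["landmarks"],
    opset_version=13,          # opset choice is a guess; please advise
    do_constant_folding=True,
)
```

The reason I lean toward direct coordinate regression here is that a heatmap head would need upsample or transposed-conv layers, which is exactly the kind of op I am unsure about for .espdl conversion; if heatmaps quantize better in practice, I would like to know.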

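For question 4, this is roughly the PTQ script I am trying to adapt. It assumes the espdl_quantize_onnx helper exposed by esp-ppq via ppq.api, as shown in the ESP-DL quantization tutorial; the exact argument names may differ between esp-ppq versions, and the calibration image folder, preprocessing, and batch count are placeholders, so please treat it as a sketch rather than a working template.

```python
# Rough PTQ sketch assuming esp-ppq's espdl_quantize_onnx helper
# (per the ESP-DL quantization tutorial). Argument names should be
# checked against the installed esp-ppq version; the calibration data
# path and preprocessing below are placeholders.
import glob
import numpy as np
import torch
from PIL import Image
from ppq.api import espdl_quantize_onnx   # provided by esp-ppq (assumption)

INPUT_SHAPE = [3, 128, 128]

def load_calib_images(folder="calib_faces", limit=256):
    # Representative face crops, preprocessed exactly like the training data.
    batches = []
    for path in sorted(glob.glob(f"{folder}/*.jpg"))[:limit]:
        img = Image.open(path).convert("RGB").resize((128, 128))
        arr = np.asarray(img, dtype=np.float32) / 255.0   # same normalization as training
        batches.append(torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0))
    return batches

calib_dataloader = load_calib_images()

espdl_quantize_onnx(
    onnx_import_file="landmark22.onnx",
    espdl_export_file="landmark22.espdl",
    calib_dataloader=calib_dataloader,
    calib_steps=len(calib_dataloader),
    input_shape=[1] + INPUT_SHAPE,
    target="esp32s3",        # target chip string as used in the ESP-DL docs
    num_of_bits=8,           # int8 PTQ
    device="cpu",
    error_report=True,       # per-layer quantization error report, if supported
    verbose=1,
)
```
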
Any further tips are also welcome.
Thanks in advance; I'm looking for a practical, minimal working example and the gotchas to avoid.
