Neural Network Inference

William Robert Robertson edited this page Dec 14, 2025 · 19 revisions

WildCamera will be the first wildlife camera able to carry out neural network inference in the field.

Requirements from Ecologists

On proprietary wildlife cameras, PIR (passive infrared) sensors are often triggered by unwanted sources. The main ones are:

| Environment | Unwanted Activation Sources |
| --- | --- |
| Forest | Sun-warmed foliage and branches moving in the wind, birds. |
| Urban | Humans passing at a distance, vehicles passing at a distance. |

Video from these unwanted activations consumes storage space on the camera and consumes time as ecologists sift through it manually.

To address this, categorisation by neural networks into the following categories is being investigated:

Categorisation in Forest Environments

| Category | Description |
| --- | --- |
| Mammal | Dormouse (family Gliridae), squirrel (family Sciuridae), tree-roosting bats (family Vespertilionidae). |
| Bird | Avian species - usually passerine species. |
| Background | Foliage and branches moving in the wind, with a lower likelihood of people on paths at a distance. |

Juvenile dormice (family Gliridae) typically do not leave their nest holes until they are able to climb. By this stage they have fur, their eyes are open and they look similar to adults. Juvenile tree-dwelling bats (family Vespertilionidae) typically do not leave a roost until they are able to fly.

Categorisation in Urban Environments

| Category | Description |
| --- | --- |
| Mammal | Dormouse (family Gliridae), squirrel (family Sciuridae). |
| Bird | Avian species. |
| Background | Humans passing at a distance, vehicles passing at a distance, possibly branches and foliage moving in the wind. |

(Initially, training and testing will use video that includes two dormouse species, Muscardinus avellanarius and Eliomys quercinus, as well as moving branches and foliage, and humans and vehicles at a distance.)

Categorisation Errors in Forest Environments

In forest environments, cameras can often be positioned to face away from paths and roadways, which avoids unwanted activations by humans on paths and by vehicles. The severity of neural network (NN) categorisation errors in forest environments is:

| Categorisation Error | Severity |
| --- | --- |
| Human classed as Mammal | Very Low |
| Bird classed as Mammal | Moderate - storage may fill with videos of birds investigating nest holes |
| Mammal classed as Background | Very High - data would be lost |
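
The asymmetry in the table above (losing a mammal video is far worse than keeping an unwanted one) can be expressed as a cost-sensitive decision rule. The following is a minimal sketch, not the project's method. The cost values, the `COSTS` table and the `min_cost_category` helper are all hypothetical, and the table above does not rate every error pair, so the remaining entries are assumptions.

```python
# Illustrative only: the cost values below are hypothetical and are not taken
# from the WildCamera project. Layout: COSTS[true_category][predicted_category].
CATEGORIES = ("Mammal", "Bird", "Background")

COSTS = {
    # Mammal -> Background is "Very High" (data lost); Mammal -> Bird is an assumption.
    "Mammal":     {"Mammal": 0.0, "Bird": 10.0, "Background": 100.0},
    # Bird -> Mammal is "Moderate" (storage fills); Bird -> Background is an assumption.
    "Bird":       {"Mammal": 5.0, "Bird": 0.0, "Background": 1.0},
    # Background (e.g. a human) -> Mammal is "Very Low".
    "Background": {"Mammal": 1.0, "Bird": 1.0, "Background": 0.0},
}


def min_cost_category(probs):
    """Return the category with the lowest expected cost.

    probs maps each category name to the network's estimated probability.
    """
    def expected_cost(predicted):
        return sum(probs[true] * COSTS[true][predicted] for true in CATEGORIES)

    return min(CATEGORIES, key=expected_cost)


# Background is the most probable category here, but the high cost of losing
# a mammal video makes Mammal the lowest-cost choice.
uncertain = {"Mammal": 0.2, "Bird": 0.1, "Background": 0.7}
print(min_cost_category(uncertain))  # prints "Mammal"
```

With costs like these, a clip is only classed as Background when the network is very confident that no mammal is present, trading some wasted storage for a lower risk of lost data.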

Efficiency

Electrical power, compute power and fast RAM will all be limited on the camera, so only simple neural network inference will be carried out on the camera itself. Advanced tasks that need much deeper neural networks, and therefore much more electrical power, compute power and fast RAM, will be handled later on more powerful machines.

A rough estimate of what is possible using the 600 GOPS Neural-ART Accelerator on the ST STM32N6x7 MCU is given in this article:

"For example, when running the Ultralytics YOLOv8n model at 256 by 256 resolution on the STM32N6, the system reached 34 frames per second with each inference taking about 29 milliseconds. Power measurements showed it used only 9.4 millijoules per inference, making it well-suited for real-time vision tasks on low-power devices." (Will's emphasis.)

https://www.ultralytics.com/customers/embedded-vision-ai-with-ultralytics-yolo-and-stmicroelectronics-mcu

The above results were for colour images. Optimisation for the monochrome images available at night-time may give better performance. The ST Getting Started Object Detection example uses only RGB888 images.

Video may be colour video during the day or monochrome infrared (IR) video at night. For video recorded for analysis, conservationists have requested 30 fps (frames per second). However, it is not critical that the NN (neural network) analyses every frame of video: NN analysis of every 2nd frame, for example, would be acceptable.
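
To show what analysing every 2nd frame would mean in practice, the sketch below combines the 30 fps requirement with the YOLOv8n figures quoted above (about 29 ms and 9.4 mJ per inference). It is a rough calculation only: the `inference_budget` function is hypothetical, and the model eventually used on WildCamera may have quite different latency and energy figures.

```python
def inference_budget(video_fps, analyse_every_nth, latency_ms, energy_mj):
    """Estimate the NN load when only every n-th video frame is analysed."""
    inferences_per_s = video_fps / analyse_every_nth
    # Time available between two analysed frames.
    period_ms = 1000.0 * analyse_every_nth / video_fps
    return {
        "inferences_per_s": inferences_per_s,
        "period_ms": period_ms,
        # Fraction of the time the accelerator is busy.
        "busy_fraction": latency_ms / period_ms,
        # mJ per second is the same as mW.
        "avg_power_mw": inferences_per_s * energy_mj,
        # True if one inference finishes before the next analysed frame arrives.
        "keeps_up": latency_ms <= period_ms,
    }


# Figures from the quoted YOLOv8n example at 256 by 256 resolution.
for n in (1, 2):
    budget = inference_budget(video_fps=30, analyse_every_nth=n,
                              latency_ms=29.0, energy_mj=9.4)
    print(n, budget)
```

Under these assumptions, analysing every frame would keep the accelerator busy about 87% of the time at roughly 282 mW for inference alone, while analysing every 2nd frame halves both figures.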

Overview of MPU NPUs

This is a brief overview of the NPUs (Neural Processing Units) integrated into some of the MPUs (Microprocessor Units) under consideration. These MPUs run embedded Linux.

| MPU | NPU |
| --- | --- |
| Texas Instruments AM67A on the BeagleY-AI | c. 4 TOPS NPU (MMA in TI documents) |
| NXP i.MX 8M Plus | 2 to 3 TOPS |
| ST STM32MP25 | 1.35 TOPS |

A GOPS (Giga Operations Per Second) or TOPS (Tera Operations Per Second) figure gives only a very rough indication of likely real-world performance. Other factors, such as memory bandwidth and which network layers the NPU supports, play a very strong role.
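
As a rough illustration of how far a peak figure can sit from measured throughput, the sketch below compares the 600 GOPS peak of the Neural-ART Accelerator with the 34 fps quoted earlier for YOLOv8n. The operation count per inference is an assumption: it takes the roughly 8.7 GFLOPs that Ultralytics publishes for YOLOv8n at 640 by 640 and scales it by image area to 256 by 256. Floating-point operation counts and accelerator OPS are not counted in the same way, so the result is an order-of-magnitude estimate at best. The function name `implied_utilisation` is hypothetical.

```python
def implied_utilisation(peak_gops, gop_per_inference, measured_fps):
    """Return achieved throughput as a fraction of the peak figure."""
    achieved_gops = gop_per_inference * measured_fps
    return achieved_gops / peak_gops


# Assumption: about 8.7 giga-operations per inference at 640 by 640,
# scaled by image area down to 256 by 256.
ASSUMED_GOP_AT_640 = 8.7
gop_at_256 = ASSUMED_GOP_AT_640 * (256 / 640) ** 2

utilisation = implied_utilisation(peak_gops=600.0,
                                  gop_per_inference=gop_at_256,
                                  measured_fps=34.0)
print(f"{utilisation:.1%} of peak")
```

Under these assumptions the measured frame rate corresponds to well under a tenth of the peak figure, so a table of peak TOPS values cannot be used on its own to rank the candidate processors.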

NN inference on central processor cores consumes vastly more time and electrical energy than NN inference on hardware NPUs and so is not a viable option.

Notes

Neural-ART Accelerator is a trademark of ST.
MMA is an abbreviation for Matrix Multiplication Accelerator in Texas Instruments documents.
