Computer Vision pipeline designed for precision agriculture applications, featuring automated dataset processing, advanced data augmentation, hyperparameter optimization, and edge-optimized model deployment for real-time crop and weed detection.

manueljesus/CropEdgeAI

CropEdgeAI 🌱

CropEdgeAI is a complete Computer Vision pipeline designed for precision agriculture applications, featuring automated dataset processing, advanced data augmentation, hyperparameter optimization, and edge-optimized model deployment for real-time crop and weed detection.

Attribution

This project uses data from:

  • Weed-crop dataset in precision agriculture by Upadhyay et al., North Dakota State University
  • Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)
  • DOI: 10.17632/mthv4ppwyw.2

Please check NOTICE for further details.

📋 Table of Contents

  • 🎯 Overview & Features
  • 🏗️ Technical Architecture
  • ⚡ Installation & Setup
  • 🚀 Quick Start
  • 📖 Usage Documentation
  • 🧩 Module Documentation
  • 🛠️ Development Guide
  • 🚢 Deployment

🎯 Overview & Features

CropEdgeAI addresses the critical need for automated crop monitoring and weed detection in modern precision agriculture. The system provides an end-to-end solution from raw agricultural imagery to production-ready edge deployment.

🌟 Key Features

  • 🔄 Complete ML Pipeline: From data ingestion to model deployment
  • 📊 Advanced EDA Tools: Comprehensive exploratory data analysis with rich visualizations
  • 🎨 Smart Data Augmentation: Overlay and close-up augmentation techniques for balanced datasets
  • ⚡ Hyperparameter Optimization: Optuna-powered HPO with multi-objective optimization
  • 🎯 Multi-Model Support: YOLO v8/v9/v10/v11/v12 compatibility with automatic benchmarking
  • 📱 Edge Optimization: NCNN export and optimization for mobile/embedded deployment
  • 🔍 Batch Inference: High-performance batch processing with detailed analytics
  • 📈 Rich Visualizations: Interactive plots for model performance and dataset insights
  • 🔄 Reproducible Experiments: Deterministic training with comprehensive logging
  • ☁️ Cloud Integration: Google Drive backup and experiment synchronization

🌾 Supported Crops & Applications

  • Crops: Black bean, Canola, Corn, Field pea, Flax, Lentil, Soybean, Sugar beet
  • Detection Tasks: Multi-class crop identification, weed detection, spatial analysis
  • Use Cases: Precision spraying, yield optimization, field monitoring, automated farming

๐Ÿ—๏ธ Technical Architecture

System Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Layer    │    │  Processing     │    │   Deployment    │
│                 │    │     Layer       │    │     Layer       │
│ • Raw Images    │    │ • EDA Tools     │    │ • NCNN Models   │
│ • Annotations   │    │ • Augmentation  │    │ • Batch Proc.   │
│ • Validation    │    │ • Training      │    │ • Edge Deploy   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Core Modules  │    │   Experiments   │    │   Utilities     │
│                 │    │                 │    │                 │
│ • Dataset Ops   │    │ • HPO Pipeline  │    │ • Visualizers   │
│ • Augmenters    │    │ • Benchmarking  │    │ • Validators    │
│ • Validators    │    │ • NCNN HPO      │    │ • Config Mgmt   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Technology Stack

| Category | Technologies |
|---|---|
| Core ML | PyTorch, Ultralytics YOLO, NCNN |
| Data Science | Pandas, NumPy, Scikit-learn, OpenCV |
| Optimization | Optuna |
| Visualization | Matplotlib, Seaborn, adjustText |
| Development | Python 3.11 or 3.12, uv package manager, Pytest |
| Deployment | NCNN (mobile/edge) |

Module Architecture

src/cropedgeai/
├── dataset/           # Data loading, validation, splitting
│   ├── loader.py      # Weed-crop dataset loader
│   ├── validator.py   # Data validation utilities
│   ├── splitter.py    # Stratified dataset splitting
│   └── augmenter/     # Data augmentation pipeline
├── eda/               # Exploratory data analysis
│   ├── eda.py         # Statistical analysis engine
│   └── visualizer.py  # Advanced visualization tools
├── experiments/       # ML experiment management
│   ├── yolo_experiment.py    # Multi-model benchmarking
│   ├── yolo_hpo.py           # Hyperparameter optimization
│   └── yolo_ncnn_hpo.py      # Edge deployment optimization
├── inference/         # Production inference
│   └── yolo_batch_processor.py  # Batch processing engine
└── main.py           # CLI entry point

⚡ Installation & Setup

Prerequisites

  • Python: 3.11 or 3.12 (3.13 is not yet supported)
  • GPU: CUDA-compatible GPU (optional, but highly recommended)
  • Storage: 40 GB+ of free space for models and datasets

Quick Installation

# Clone the repository
git clone https://github.com/manueljesus/cropedgeai.git
cd cropedgeai

# Install with UV
pip install uv
uv sync

Development Installation

uv sync --group dev

🚀 Quick Start

1. Data Preparation

from cropedgeai.dataset import WeedCropDatasetLoader

# Load your dataset
loader = WeedCropDatasetLoader("path/to/weed-crop-dataset")
dataset = loader()

print(f"Loaded {len(dataset)} annotations")
print(f"Classes: {dataset['class_name'].unique()}")

2. Exploratory Data Analysis

from cropedgeai.eda import WeedCropEDA, WeedCropVisualizer

# Perform EDA
eda = WeedCropEDA(dataset, dataset_name="My Dataset")
stats = eda.dataset_distribution()
...

# Generate visualizations
visualizer = WeedCropVisualizer(eda)
visualizer.plot_class_balance()
visualizer.plot_spatial_distribution()
...
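The EDA API above reports class balance, but the exact metric behind `class_balance_scores()` is not documented here. One common choice is the normalized Shannon entropy of per-class instance counts: 1.0 when every class has the same count, approaching 0 when one class dominates. A minimal sketch, assuming the counts are already available as a dict; `normalized_entropy` is a hypothetical helper, not part of `cropedgeai`:

```python
import math
from collections import Counter


def normalized_entropy(counts: dict[str, int]) -> float:
    """Shannon entropy of the class distribution divided by log(k).

    k is the number of classes in `counts` (zero-count classes included,
    so a missing class lowers the score). Hypothetical helper.
    """
    k = len(counts)
    total = sum(counts.values())
    if k < 2 or total == 0:
        raise ValueError("need at least two classes and a non-zero total")
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return entropy / math.log(k)


# Usage: count instances per class label, then score the balance.
labels = ["corn", "corn", "weed", "weed", "weed"]
print(normalized_entropy(Counter(labels)))
```

With the loader's DataFrame, the counts could come from pandas via `dataset['class_name'].value_counts().to_dict()`.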

3. Dataset Splitting & Augmentation

from cropedgeai.dataset import DatasetSplitter, DatasetAugmenter

# Split dataset
splitter = DatasetSplitter(dataset, validation_size=0.15, test_size=0.15, output_dir="./processed", organize_files=True)
train, val, test = splitter()

# Augment training data
augmenter = DatasetAugmenter(train, "./augmented", "./processed")
augmented_dataset = augmenter(
    run_overlay=True,
    run_closeup=True,
    overlay_images=1000,
    weeds_per_image=5
)
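`DatasetSplitter` is described as stratified. For object detection, one simple way to stratify is at the image level, using a single key per image (for example its dominant class) so that no image ends up in two sets. The sketch below assumes that simplification; `stratified_split` and its arguments are hypothetical and only mirror the 0.15/0.15 proportions used above:

```python
import random
from collections import defaultdict


def stratified_split(
    image_labels: dict[str, str],
    val_size: float = 0.15,
    test_size: float = 0.15,
    seed: int = 42,
) -> tuple[list[str], list[str], list[str]]:
    """Split image IDs into train/val/test, stratified by one key per image.

    `image_labels` maps image ID -> stratification key (e.g. the image's
    dominant class). Each key group is shuffled and split on its own, so
    every split keeps roughly the same class proportions.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for image_id, label in sorted(image_labels.items()):
        groups[label].append(image_id)

    rng = random.Random(seed)  # local RNG: reproducible, no global state touched
    train: list[str] = []
    val: list[str] = []
    test: list[str] = []
    for label in sorted(groups):
        ids = groups[label]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_size)
        n_val = round(len(ids) * val_size)
        test.extend(ids[:n_test])
        val.extend(ids[n_test:n_test + n_val])
        train.extend(ids[n_test + n_val:])
    return train, val, test
```

Sorting before shuffling makes the result independent of the input dict's ordering, which keeps splits reproducible across runs.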

4. Model Training & Benchmarking

from cropedgeai.experiments import YOLOExperiment

# Run multi-model benchmark
experiment = YOLOExperiment("config/baseline_experiment.yaml")
results = experiment()

5. Hyperparameter Optimization

from cropedgeai.experiments import YOLOHyperparameterOptimizer

optimizer = YOLOHyperparameterOptimizer("config/hpo_yolo11n.yaml")
hpo_results = optimizer()
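The HPO is described as multi-objective. With several objectives there is usually no single best trial, only a Pareto front of trials that no other trial beats on every objective (Optuna exposes these as `study.best_trials` for multi-objective studies). A minimal stdlib sketch, assuming the two objectives are validation mAP (maximize) and latency (minimize); the objectives actually configured in `config/hpo_yolo11n.yaml` may differ, and the type and function names are hypothetical:

```python
from typing import NamedTuple


class TrialResult(NamedTuple):
    name: str
    map50_95: float    # higher is better
    latency_ms: float  # lower is better


def dominates(a: TrialResult, b: TrialResult) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.map50_95 >= b.map50_95 and a.latency_ms <= b.latency_ms
    strictly_better = a.map50_95 > b.map50_95 or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(results: list[TrialResult]) -> list[TrialResult]:
    """Trials not dominated by any other trial (O(n^2), fine for HPO-sized lists)."""
    return [r for r in results if not any(dominates(other, r) for other in results)]
```

Choosing one trial from the front is then a deployment decision, for example the most accurate trial that meets a latency budget.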

6. Edge Deployment Optimization

from cropedgeai.experiments import YOLONCNNHyperparameterOptimizer

ncnn_optimizer = YOLONCNNHyperparameterOptimizer("config/ncnn_hpo.yaml")
results = ncnn_optimizer()

# Set best_size from the input size selected in `results` (640 is only a placeholder)
best_size = 640
ncnn_path = ncnn_optimizer.export_ncnn(input_size=best_size)
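How `best_size` is derived from `results` is not shown. One reasonable rule for edge deployment is to take the most accurate input size whose measured latency fits a per-image budget. A sketch under that assumption; `SizeResult`, `pick_input_size`, and the numbers in the usage lines are hypothetical placeholders, not CropEdgeAI output:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeResult:
    input_size: int    # square input resolution, e.g. 320, 416, 640
    map50_95: float    # validation accuracy at this size
    latency_ms: float  # measured per-image latency on the target device


def pick_input_size(results: list[SizeResult], latency_budget_ms: float) -> int:
    """Return the most accurate input size whose latency fits the budget.

    Accuracy ties are broken in favour of the faster size.
    Raises ValueError if no size meets the budget.
    """
    feasible = [r for r in results if r.latency_ms <= latency_budget_ms]
    if not feasible:
        raise ValueError(f"no input size meets {latency_budget_ms} ms")
    best = max(feasible, key=lambda r: (r.map50_95, -r.latency_ms))
    return best.input_size


# Placeholder values for illustration only
candidates = [SizeResult(320, 0.40, 12.0), SizeResult(416, 0.45, 20.0), SizeResult(640, 0.50, 45.0)]
print(pick_input_size(candidates, latency_budget_ms=25.0))  # 416
```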

7. Batch Inference

# CLI inference
cropedgeai model.pt input_folder/ output_folder/ \
  --conf-threshold 0.35 \
  --img-size 640 \
  --stats-csv detections.csv

# Programmatic inference
from cropedgeai.inference import YOLOBatchProcessor

processor = YOLOBatchProcessor(
    model_path="best.pt",
    input_folder="test_images/",
    output_folder="results/",
    stats_csv_file="detection_stats.csv",
    conf_threshold=0.35
)
processor()
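The processor exports detection statistics to CSV, but the column layout is not documented here. A sketch of one possible layout (one row per image, one count column per class) using only the standard library; `write_detection_stats` and the column names are assumptions:

```python
import csv
from collections import Counter
from pathlib import Path


def write_detection_stats(
    detections: dict[str, list[str]],
    class_names: list[str],
    csv_path: Path,
) -> None:
    """Write one row per image with the total and a per-class detection count.

    `detections` maps image filename -> predicted class names that already
    passed the confidence threshold.
    """
    fieldnames = ["image", "total", *class_names]
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for image, labels in sorted(detections.items()):
            counts = Counter(labels)
            row = {"image": image, "total": len(labels)}
            row.update({name: counts.get(name, 0) for name in class_names})
            writer.writerow(row)
```

Writing explicit zero counts (rather than leaving cells empty) keeps the file easy to aggregate later, for example with `pandas.read_csv`.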

📖 Usage Documentation

Configuration Files

CropEdgeAI uses YAML configuration files for experiment management:

Experiment Configuration

# config/my_experiment.yaml
name: "My_Crop_Experiment"
description: "Custom crop detection experiment"

dataset_yaml: "data/dataset.yaml"
models_to_test:
  yolo11n: "yolo11n.pt"
  yolo11s: "yolo11s.pt"

training_config:
  epochs: 100
  batch: 32
  imgsz: 640
  patience: 20
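The `training_config` block maps naturally onto a typed structure such as the `TrainingConfig` listed in the module documentation. The sketch below is a hypothetical stand-in (`TrainingConfigSketch`), not the project's class: it rejects unknown keys and non-positive values so YAML typos fail early. It takes an already-parsed dict, for example the result of PyYAML's `yaml.safe_load(...)["training_config"]`:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainingConfigSketch:
    # Defaults mirror the example YAML above
    epochs: int = 100
    batch: int = 32
    imgsz: int = 640
    patience: int = 20

    @classmethod
    def from_dict(cls, raw: dict) -> "TrainingConfigSketch":
        """Build a config from a parsed YAML mapping, failing fast on typos."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown training_config keys: {sorted(unknown)}")
        config = cls(**raw)
        for f in fields(cls):
            if getattr(config, f.name) <= 0:
                raise ValueError(f"{f.name} must be positive")
        return config


config = TrainingConfigSketch.from_dict({"epochs": 100, "batch": 32, "imgsz": 640, "patience": 20})
```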

HPO Configuration

# config/my_hpo.yaml
name: "crop_hpo"
dataset_yaml: "data/dataset.yaml"
base_model_path: "yolo11n.pt"

hyperparameter_search_space:
  lr0:
    type: "float"
    min: 0.0001
    max: 0.01
    log: true
  
  mosaic:
    type: "float"
    min: 0.0
    max: 1.0
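`log: true` matters for ranges spanning several orders of magnitude such as `lr0`: sampling uniformly in log space gives each decade equal probability, whereas a linear draw over 0.0001 to 0.01 would land above 0.001 about 90% of the time. In Optuna this corresponds to `trial.suggest_float(name, low, high, log=True)`. A stdlib sketch of the same idea, assuming the spec dict mirrors the YAML above; `sample_param` is hypothetical:

```python
import math
import random


def sample_param(spec: dict, rng: random.Random) -> float:
    """Draw one value from a search-space entry like the YAML above.

    Only the "float" type is handled here. With log: true the value is
    drawn uniformly in log space, so 1e-4..1e-3 and 1e-3..1e-2 are
    equally likely.
    """
    if spec.get("type") != "float":
        raise ValueError(f"unsupported type: {spec.get('type')!r}")
    low, high = float(spec["min"]), float(spec["max"])
    if spec.get("log", False):
        if low <= 0:
            raise ValueError("log-scale ranges need min > 0")
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    return rng.uniform(low, high)
```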

Command Line Interface

# Basic inference
cropedgeai model.pt input/ output/

# Advanced inference with custom parameters
cropedgeai model.ncnn input/ output/ \
  --conf-threshold 0.4 \
  --img-size 416 \
  --stats-csv results.csv

# Help and options
cropedgeai --help

Python API Examples

Custom Augmentation Pipeline

from cropedgeai.dataset.augmenter import DatasetAugmenter

augmenter = DatasetAugmenter(
    train=train_data,
    output_dir="augmented_output",
    validation_dir="validation_data",
    random_seed=42
)

# Custom augmentation parameters
augmented = augmenter(
    run_overlay=True,
    run_closeup=True,
    overlay_images=2000,
    weeds_per_image=3,
    closeup_padding=2.0
)
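The meaning of `closeup_padding` is not spelled out. One plausible reading is a scale factor on the object's bounding box: a padding of 2.0 gives a crop window twice the box's width and height, centred on the object. A sketch under that assumption (pixel `(x1, y1, x2, y2)` boxes, with the window shifted rather than shrunk at image borders); `closeup_window` is hypothetical:

```python
import math


def closeup_window(
    box: tuple[float, float, float, float],
    image_w: int,
    image_h: int,
    padding: float = 2.0,
) -> tuple[tuple[int, int, int, int], tuple[float, float, float, float]]:
    """Return (crop window, box remapped into crop coordinates)."""
    if padding < 1.0:
        raise ValueError("padding < 1 would cut off part of the object")
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Scale the box around its centre, but never beyond the image itself.
    half_w = min((x2 - x1) * padding, image_w) / 2
    half_h = min((y2 - y1) * padding, image_h) / 2
    # Shift the window back inside the image instead of shrinking it.
    left = min(max(cx - half_w, 0), image_w - 2 * half_w)
    top = min(max(cy - half_h, 0), image_h - 2 * half_h)
    window = (
        math.floor(left),
        math.floor(top),
        min(math.ceil(left + 2 * half_w), image_w),
        min(math.ceil(top + 2 * half_h), image_h),
    )
    # Express the object box relative to the crop's top-left corner.
    remapped = (x1 - window[0], y1 - window[1], x2 - window[0], y2 - window[1])
    return window, remapped
```

Shifting instead of shrinking keeps every close-up at the same relative zoom, at the cost of the object not being centred near image borders.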

Advanced EDA

from cropedgeai.eda import WeedCropEDA

eda = WeedCropEDA(dataset, "Advanced Analysis")

# Get comprehensive statistics
stats = eda.dataset_distribution()
class_balance = eda.class_balance_scores()
bbox_stats = eda.bounding_boxes_stats()

# Generate detailed report
report = eda.generate_summary_report()
print(report)

# Custom visualizations
from cropedgeai.eda import WeedCropVisualizer

viz = WeedCropVisualizer(eda)
viz.plot_cooccurrence_matrix("crop")
viz.plot_spatial_distribution_per_crop()
viz.plot_bbox_size_distributions()

🧩 Module Documentation

๐Ÿ“ Dataset Module (dataset)

Purpose: Comprehensive dataset management with loading, validation, splitting, and augmentation capabilities.

Key Components

  • WeedCropDatasetLoader: Loads Weed-crop datasets with automatic validation
  • DatasetValidator: Ensures data integrity and format compliance
  • DatasetSplitter: Stratified splitting preserving class distributions
  • DatasetAugmenter: Orchestrates augmentation pipeline
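Validation details are not listed here. Since training goes through Ultralytics YOLO (`dataset.yaml`), the labels presumably follow the YOLO text format, one object per line: `class_id x_center y_center width height`, normalized to [0, 1]. Assuming that format, a per-line check could look like this; `validate_yolo_label_line` is hypothetical:

```python
def validate_yolo_label_line(line: str, num_classes: int) -> list[str]:
    """Return the problems found in one YOLO label line (empty list if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        xc, yc, w, h = map(float, parts[1:])
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} out of range")
    for name, value in (("x_center", xc), ("y_center", yc), ("width", w), ("height", h)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} outside [0, 1]")
    if w == 0 or h == 0:
        problems.append("zero-area box")
    return problems
```

Returning a list of problems instead of raising on the first one lets a validator report every issue in a label file in a single pass.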

Augmentation Sub-module

  • BaseAugmenter: Abstract base for augmentation techniques
  • OverlayAugmenter: Overlays weed instances onto crop images
  • CloseupAugmenter: Creates close-up image crops around annotated objects

📊 EDA Module (eda)

Purpose: Statistical analysis and visualization tools for agricultural datasets.

Key Components

  • WeedCropEDA: Core statistical analysis engine

    • Dataset distribution metrics
    • Class balance analysis
    • Bounding box statistics
    • IoU calculations
    • Comprehensive reporting
  • WeedCropVisualizer: Advanced visualization toolkit

    • Class distribution plots
    • Spatial distribution heatmaps
    • Co-occurrence matrices
    • Bounding box analysis
    • Interactive image annotation display
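The IoU calculations listed above typically use the standard definition: the intersection area of two boxes divided by the area of their union. A minimal version for corner-format `(x1, y1, x2, y2)` boxes; the function name and box format are assumptions about how `WeedCropEDA` represents boxes:

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap along each axis; clamped at 0 when the boxes do not overlap.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```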

🧪 Experiments Module (experiments)

Purpose: ML experiment orchestration with benchmarking, HPO, and edge optimization.

Key Components

  • YOLOExperiment: Multi-model benchmarking pipeline
  • YOLOHyperparameterOptimizer: Optuna-powered HPO with advanced search spaces
  • YOLONCNNHyperparameterOptimizer: Edge deployment optimization
  • TrainingConfig: Structured training configuration management

🔮 Inference Module (inference)

Purpose: Production-ready inference capabilities with batch processing and performance analytics.

Key Components

  • YOLOBatchProcessor: High-performance batch inference
    • Multi-format model support (PyTorch, NCNN)
    • Automatic performance profiling
    • CSV statistics export
    • Progress tracking and logging
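Automatic performance profiling usually means timing many inference calls after a few warm-up runs and reporting robust statistics. A stdlib sketch of that pattern; `profile_latency` is hypothetical and `infer` stands for any zero-argument callable, such as a lambda wrapping a single-image prediction:

```python
import statistics
import time
from collections.abc import Callable


def profile_latency(
    infer: Callable[[], object], warmup: int = 3, runs: int = 20
) -> dict[str, float]:
    """Time repeated calls to `infer` and summarise latency in milliseconds.

    Warm-up calls are excluded because first runs often include one-off
    costs such as lazy initialisation or cache misses.
    """
    for _ in range(warmup):
        infer()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

The median is usually the figure to compare across models, since occasional scheduler hiccups inflate the mean and the maximum.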

🎯 Main Entry Point (main.py)

Purpose: Command-line interface for production inference workflows.

Features:

  • Argument parsing and validation
  • Model compatibility checks
  • Batch processing orchestration
  • Results summarization

๐Ÿ› ๏ธ Development Guide

Setting Up Development Environment

# Clone and setup
git clone https://github.com/manueljesus/cropedgeai.git
cd cropedgeai

# Install development dependencies
uv sync --group dev

# Install pre-commit hooks (optional)
pre-commit install

Code Organization Principles

  1. Separation of Concerns: Each module has a clear, single responsibility
  2. Dependency Injection: Configuration-driven architecture
  3. Type Safety: Comprehensive type hints throughout
  4. Error Handling: Graceful error handling with informative messages
  5. Logging: Structured logging for debugging and monitoring

Coding Standards

  • Style: Follows PEP 8 with Black formatting
  • Documentation: Comprehensive docstrings following Google style
  • Type Hints: Required for all public APIs
  • Error Handling: Custom exceptions with clear error messages
  • Testing: Pytest with fixtures and mocking

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov --cov-report=html

# Run specific test modules
uv run pytest test/dataset/
uv run pytest test/experiments/ -v

Test Structure

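# Example test structure
import pytest

from cropedgeai.eda import WeedCropEDA


# create_mock_dataset() is assumed to be a helper defined elsewhere in the test suite
class TestWeedCropEDA: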
    @pytest.fixture
    def sample_dataset(self):
        return create_mock_dataset()
    
    def test_dataset_distribution(self, sample_dataset):
        eda = WeedCropEDA(sample_dataset)
        stats = eda.dataset_distribution()
        assert stats['total_images'] > 0

Adding New Features

  1. Create Feature Branch: git checkout -b feature/new-feature
  2. Implement with Tests: Write tests first (TDD approach)
  3. Update Documentation: Add docstrings and update README if needed
  4. Run Quality Checks: Ensure tests pass and code follows standards
  5. Submit PR: Include description of changes and test results

Performance Considerations

  • Memory Management: Use generators for large datasets
  • GPU Utilization: Automatic CUDA detection and fallback
  • Batch Processing: Optimized batch sizes for different hardware
  • Caching: Strategic caching of computed results
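A sketch of the generator-based approach to memory management: yield image paths lazily and group them into fixed-size batches, so only one batch of images needs to be loaded at a time. The helper names and accepted suffixes are assumptions; on Python 3.12+ `itertools.batched` provides the batching step, but it is not available on 3.11:

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from pathlib import Path
from typing import TypeVar

T = TypeVar("T")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def iter_images(folder: Path) -> Iterator[Path]:
    """Yield image paths in a stable order; only paths, never pixels, are held."""
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Group any iterable into lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    # islice pulls at most batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

A batch loop would then read and infer on one `batched(iter_images(folder), 16)` chunk at a time.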

🚢 Deployment

Edge Deployment with NCNN

from cropedgeai.experiments import YOLONCNNHyperparameterOptimizer
from cropedgeai.inference import YOLOBatchProcessor

# Export the optimized edge model
optimizer = YOLONCNNHyperparameterOptimizer("config/ncnn_hpo.yaml")
ncnn_path = optimizer.export_ncnn(input_size=640)

# Run batch inference on the edge device
processor = YOLOBatchProcessor(
    model_path=ncnn_path,
    input_folder="edge_input/",
    output_folder="edge_output/",
    img_size=640,
    conf_threshold=0.25
)
processor()
