CropEdgeAI 🌱

CropEdgeAI is a complete computer vision pipeline for precision agriculture. It covers automated dataset processing, data augmentation, hyperparameter optimization, and edge-optimized model deployment for real-time crop and weed detection.

Attribution

This project uses data from:

Weed-crop dataset in precision agriculture by Upadhyay et al., North Dakota State University
Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)
DOI: 10.17632/mthv4ppwyw.2
Please check NOTICE for further details.

📋 Table of Contents

🎯 Overview & Features
🏗️ Technical Architecture
⚡ Installation & Setup
🚀 Quick Start
📖 Usage Documentation
🧩 Module Documentation
🛠️ Development Guide
🚢 Deployment

🎯 Overview & Features

CropEdgeAI addresses the critical need for automated crop monitoring and weed detection in modern precision agriculture. The system provides an end-to-end solution from raw agricultural imagery to production-ready edge deployment.

🌟 Key Features

🔄 Complete ML Pipeline: From data ingestion to model deployment
📊 Advanced EDA Tools: Comprehensive exploratory data analysis with rich visualizations
🎨 Smart Data Augmentation: Overlay and close-up augmentation techniques for balanced datasets
⚡ Hyperparameter Optimization: Optuna-powered HPO with multi-objective optimization
🎯 Multi-Model Support: YOLO v8/v9/v10/v11/v12 compatibility with automatic benchmarking
📱 Edge Optimization: NCNN export and optimization for mobile/embedded deployment
🔍 Batch Inference: High-performance batch processing with detailed analytics
📈 Rich Visualizations: Interactive plots for model performance and dataset insights
🔄 Reproducible Experiments: Deterministic training with comprehensive logging
☁️ Cloud Integration: Google Drive backup and experiment synchronization

🌾 Supported Crops & Applications

Crops: Black bean, Canola, Corn, Field pea, Flax, Lentil, Soybean, Sugar beet
Detection Tasks: Multi-class crop identification, weed detection, spatial analysis
Use Cases: Precision spraying, yield optimization, field monitoring, automated farming

🏗️ Technical Architecture

System Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Data Layer    │    │  Processing     │    │   Deployment    │
│                 │    │     Layer       │    │     Layer       │
│ • Raw Images    │    │ • EDA Tools     │    │ • NCNN Models   │
│ • Annotations   │    │ • Augmentation  │    │ • Batch Proc.   │
│ • Validation    │    │ • Training      │    │ • Edge Deploy   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Core Modules  │    │   Experiments   │    │   Utilities     │
│                 │    │                 │    │                 │
│ • Dataset Ops   │    │ • HPO Pipeline  │    │ • Visualizers   │
│ • Augmenters    │    │ • Benchmarking  │    │ • Validators    │
│ • Validators    │    │ • NCNN HPO      │    │ • Config Mgmt   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Technology Stack

Category	Technologies
Core ML	PyTorch, Ultralytics YOLO, NCNN
Data Science	Pandas, NumPy, Scikit-learn, OpenCV
Optimization	Optuna
Visualization	Matplotlib, Seaborn, adjustText
Development	Python 3.11 or 3.12, UV package manager, Pytest
Deployment	NCNN (mobile/edge)

Module Architecture

src/cropedgeai/
├── dataset/           # Data loading, validation, splitting
│   ├── loader.py      # Weed-crop dataset loader
│   ├── validator.py   # Data validation utilities
│   ├── splitter.py    # Stratified dataset splitting
│   └── augmenter/     # Data augmentation pipeline
├── eda/               # Exploratory data analysis
│   ├── eda.py         # Statistical analysis engine
│   └── visualizer.py  # Advanced visualization tools
├── experiments/       # ML experiment management
│   ├── yolo_experiment.py    # Multi-model benchmarking
│   ├── yolo_hpo.py           # Hyperparameter optimization
│   └── yolo_ncnn_hpo.py      # Edge deployment optimization
├── inference/         # Production inference
│   └── yolo_batch_processor.py  # Batch processing engine
└── main.py           # CLI entry point

⚡ Installation & Setup

Prerequisites

Python: 3.11 or 3.12 (3.13 not yet supported)
GPU: CUDA-compatible GPU (optional but highly recommended)
Storage: 40GB+ free space for models and datasets

Quick Installation

# Clone the repository
git clone https://github.com/manueljesus/cropedgeai.git
cd cropedgeai

# Install with UV
pip install uv
uv sync

Development Installation

uv sync --group dev

🚀 Quick Start

1. Data Preparation

from cropedgeai.dataset import WeedCropDatasetLoader

# Load your dataset
loader = WeedCropDatasetLoader("path/to/weed-crop-dataset")
dataset = loader()

print(f"Loaded {len(dataset)} annotations")
print(f"Classes: {dataset['class_name'].unique()}")

2. Exploratory Data Analysis

from cropedgeai.eda import WeedCropEDA, WeedCropVisualizer

# Perform EDA
eda = WeedCropEDA(dataset, dataset_name="My Dataset")
stats = eda.dataset_distribution()
...

# Generate visualizations
visualizer = WeedCropVisualizer(eda)
visualizer.plot_class_balance()
visualizer.plot_spatial_distribution()
...

3. Dataset Splitting & Augmentation

from cropedgeai.dataset import DatasetSplitter, DatasetAugmenter

# Split dataset
splitter = DatasetSplitter(dataset, validation_size=0.15, test_size=0.15, output_dir="./processed", organize_files=True)
train, val, test = splitter()

# Augment training data
augmenter = DatasetAugmenter(train, "./augmented", "./processed")
augmented_dataset = augmenter(
    run_overlay=True,
    run_closeup=True,
    overlay_images=1000,
    weeds_per_image=5
)

4. Model Training & Benchmarking

from cropedgeai.experiments import YOLOExperiment

# Run multi-model benchmark
experiment = YOLOExperiment("config/baseline_experiment.yaml")
results = experiment()

5. Hyperparameter Optimization

from cropedgeai.experiments import YOLOHyperparameterOptimizer

optimizer = YOLOHyperparameterOptimizer("config/hpo_yolo11n.yaml")
hpo_results = optimizer()

6. Edge Deployment Optimization

from cropedgeai.experiments import YOLONCNNHyperparameterOptimizer

ncnn_optimizer = YOLONCNNHyperparameterOptimizer("config/ncnn_hpo.yaml")
results = ncnn_optimizer()

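# `best_size` is not set by the call above: choose the input size that
# `results` shows performing best (640 here is only a placeholder).
best_size = 640
ncnn_path = ncnn_optimizer.export_ncnn(best_size)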

7. Batch Inference

# CLI inference
cropedgeai model.pt input_folder/ output_folder/ \
  --conf-threshold 0.35 \
  --img-size 640 \
  --stats-csv detections.csv

# Programmatic inference
from cropedgeai.inference import YOLOBatchProcessor

processor = YOLOBatchProcessor(
    model_path="best.pt",
    input_folder="test_images/",
    output_folder="results/",
    stats_csv_file="detection_stats.csv",
    conf_threshold=0.35
)
processor()

📖 Usage Documentation

Configuration Files

CropEdgeAI uses YAML configuration files for experiment management:

Experiment Configuration

# config/my_experiment.yaml
name: "My_Crop_Experiment"
description: "Custom crop detection experiment"

dataset_yaml: "data/dataset.yaml"
models_to_test:
  yolo11n: "yolo11n.pt"
  yolo11s: "yolo11s.pt"

training_config:
  epochs: 100
  batch: 32
  imgsz: 640
  patience: 20

HPO Configuration

# config/my_hpo.yaml
name: "crop_hpo"
dataset_yaml: "data/dataset.yaml"
base_model_path: "yolo11n.pt"

hyperparameter_search_space:
  lr0:
    type: "float"
    min: 0.0001
    max: 0.01
    log: true
  
  mosaic:
    type: "float"
    min: 0.0
    max: 1.0

Command Line Interface

# Basic inference
cropedgeai model.pt input/ output/

# Advanced inference with custom parameters
cropedgeai model.ncnn input/ output/ \
  --conf-threshold 0.4 \
  --img-size 416 \
  --stats-csv results.csv

# Help and options
cropedgeai --help

Python API Examples

Custom Augmentation Pipeline

from cropedgeai.dataset.augmenter import DatasetAugmenter

augmenter = DatasetAugmenter(
    train=train_data,
    output_dir="augmented_output",
    validation_dir="validation_data",
    random_seed=42
)

# Custom augmentation parameters
augmented = augmenter(
    run_overlay=True,
    run_closeup=True,
    overlay_images=2000,
    weeds_per_image=3,
    closeup_padding=2.0
)

Advanced EDA

from cropedgeai.eda import WeedCropEDA

eda = WeedCropEDA(dataset, "Advanced Analysis")

# Get comprehensive statistics
stats = eda.dataset_distribution()
class_balance = eda.class_balance_scores()
bbox_stats = eda.bounding_boxes_stats()

# Generate detailed report
report = eda.generate_summary_report()
print(report)

# Custom visualizations
from cropedgeai.eda import WeedCropVisualizer

viz = WeedCropVisualizer(eda)
viz.plot_cooccurrence_matrix("crop")
viz.plot_spatial_distribution_per_crop()
viz.plot_bbox_size_distributions()

🧩 Module Documentation

📁 Dataset Module (dataset)

Purpose: Comprehensive dataset management with loading, validation, splitting, and augmentation capabilities.

Key Components

WeedCropDatasetLoader: Loads Weed-crop datasets with automatic validation
DatasetValidator: Ensures data integrity and format compliance
DatasetSplitter: Stratified splitting preserving class distributions
DatasetAugmenter: Orchestrates augmentation pipeline
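
To illustrate what stratified splitting means here, the sketch below splits items per label so that each subset keeps similar class proportions. It is an illustration only and makes assumptions: `stratified_split` is a hypothetical function, and each image is assumed to carry a single stratification label (for example its crop type). How `DatasetSplitter` handles images with several classes is not covered.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_size=0.15, test_size=0.15, seed=42):
    """Split item ids into train/val/test, keeping label proportions similar.

    `labels` maps an item id (e.g. an image filename) to one label.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in labels.items():
        by_label[label].append(item)

    train, val, test = [], [], []
    for label in sorted(by_label):
        items = sorted(by_label[label])
        rng.shuffle(items)  # seeded shuffle keeps the split reproducible
        n_val = round(len(items) * validation_size)
        n_test = round(len(items) * test_size)
        val.extend(items[:n_val])
        test.extend(items[n_val:n_val + n_test])
        train.extend(items[n_val + n_test:])
    return train, val, test


# Toy data: 40 corn images and 20 soybean images.
labels = {f"corn_{i}.jpg": "corn" for i in range(40)}
labels.update({f"soy_{i}.jpg": "soybean" for i in range(20)})

train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```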

Augmentation Sub-module

BaseAugmenter: Abstract base for augmentation techniques
OverlayAugmenter: Overlays weed instances onto crop images
CloseupAugmenter: Creates focused crops around objects

📊 EDA Module (eda)

Purpose: Statistical analysis and visualization tools for agricultural datasets.

Key Components

WeedCropEDA: Core statistical analysis engine
- Dataset distribution metrics
- Class balance analysis
- Bounding box statistics
- IoU calculations
- Comprehensive reporting
WeedCropVisualizer: Advanced visualization toolkit
- Class distribution plots
- Spatial distribution heatmaps
- Co-occurrence matrices
- Bounding box analysis
- Interactive image annotation display
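
For reference, Intersection over Union (IoU) between two axis-aligned boxes can be computed as in the sketch below. The `iou` function is a generic illustration using corner coordinates, not the `WeedCropEDA` method itself, whose name and box format are not shown in this README.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap along each axis; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```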

🧪 Experiments Module (experiments)

Purpose: ML experiment orchestration with benchmarking, HPO, and edge optimization.

Key Components

YOLOExperiment: Multi-model benchmarking pipeline
YOLOHyperparameterOptimizer: Optuna-powered HPO with advanced search spaces
YOLONCNNHyperparameterOptimizer: Edge deployment optimization
TrainingConfig: Structured training configuration management

🔮 Inference Module (inference)

Purpose: Production-ready inference capabilities with batch processing and performance analytics.

Key Components

YOLOBatchProcessor: High-performance batch inference
- Multi-format model support (PyTorch, NCNN)
- Automatic performance profiling
- CSV statistics export
- Progress tracking and logging
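
A per-image statistics CSV could be produced along the lines of the sketch below. The column layout (`image`, `total`, then one column per class) and the `write_detection_stats` helper are assumptions made for illustration; the file that `YOLOBatchProcessor` actually writes may use different columns.

```python
import csv
import io
from collections import Counter


def write_detection_stats(detections, stream):
    """Write one CSV row per image: image name, total count, per-class counts."""
    classes = sorted({d["class_name"] for dets in detections.values() for d in dets})
    writer = csv.DictWriter(stream, fieldnames=["image", "total", *classes])
    writer.writeheader()
    for image, dets in sorted(detections.items()):
        counts = Counter(d["class_name"] for d in dets)
        row = {"image": image, "total": len(dets)}
        row.update({c: counts.get(c, 0) for c in classes})
        writer.writerow(row)


# Toy detections; in practice these would come from the model's predictions.
detections = {
    "field_001.jpg": [
        {"class_name": "corn", "confidence": 0.91},
        {"class_name": "weed", "confidence": 0.47},
        {"class_name": "weed", "confidence": 0.62},
    ],
    "field_002.jpg": [],
}

buffer = io.StringIO()
write_detection_stats(detections, buffer)
print(buffer.getvalue())
```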

🎯 Main Entry Point (main.py)

Purpose: CLI interface for production inference workflows.

Features:

Argument parsing and validation
Model compatibility checks
Batch processing orchestration
Results summarization

🛠️ Development Guide

Setting Up Development Environment

# Clone and setup
git clone https://github.com/manueljesus/cropedgeai.git
cd cropedgeai

# Install development dependencies
uv sync --group dev

# Install pre-commit hooks (optional)
pre-commit install

Code Organization Principles

Separation of Concerns: Each module has a clear, single responsibility
Dependency Injection: Configuration-driven architecture
Type Safety: Comprehensive type hints throughout
Error Handling: Graceful error handling with informative messages
Logging: Structured logging for debugging and monitoring

Coding Standards

Style: Follows PEP 8 with Black formatting
Documentation: Comprehensive docstrings following Google style
Type Hints: Required for all public APIs
Error Handling: Custom exceptions with clear error messages
Testing: Pytest with fixtures and mocking

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov --cov-report=html

# Run specific test modules
uv run pytest test/dataset/
uv run pytest test/experiments/ -v

Test Structure

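# Example test structure
import pytest

from cropedgeai.eda import WeedCropEDA


class TestWeedCropEDA:
    @pytest.fixture
    def sample_dataset(self):
        # create_mock_dataset() stands in for a project-specific helper
        # that builds a small annotations dataset.
        return create_mock_dataset()

    def test_dataset_distribution(self, sample_dataset):
        eda = WeedCropEDA(sample_dataset)
        stats = eda.dataset_distribution()
        assert stats['total_images'] > 0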

Adding New Features

Create Feature Branch: git checkout -b feature/new-feature
Implement with Tests: Write tests first (TDD approach)
Update Documentation: Add docstrings and update README if needed
Run Quality Checks: Ensure tests pass and code follows standards
Submit PR: Include description of changes and test results

Performance Considerations

Memory Management: Use generators for large datasets
GPU Utilization: Automatic CUDA detection and fallback
Batch Processing: Optimized batch sizes for different hardware
Caching: Strategic caching of computed results
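
As an illustration of the generator-based approach to memory management and batching, the sketch below yields fixed-size batches from any iterable without loading it all at once. `batched` is a hypothetical helper written for this example (Python 3.12 also ships `itertools.batched`, which is not available on 3.11).

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of up to `batch_size` items without materialising `items`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `batch_size` items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# A generator stands in for a large stream of image paths.
paths = (f"img_{i:04d}.jpg" for i in range(10))
batch_sizes = [len(batch) for batch in batched(paths, batch_size=4)]
print(batch_sizes)
```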

🚢 Deployment

Edge Deployment with NCNN

# Export optimized edge model
from cropedgeai.experiments import YOLONCNNHyperparameterOptimizer
from cropedgeai.inference import YOLOBatchProcessor

optimizer = YOLONCNNHyperparameterOptimizer("config/ncnn_hpo.yaml")
ncnn_path = optimizer.export_ncnn(input_size=640)

# Run on the edge device
processor = YOLOBatchProcessor(
    model_path=ncnn_path,
    input_folder="edge_input/",
    output_folder="edge_output/",
    img_size=640,
    conf_threshold=0.25
)
processor()  # Run batch inference, as in the Quick Start example
