CropEdgeAI is a complete Computer Vision pipeline designed for precision agriculture applications, featuring automated dataset processing, advanced data augmentation, hyperparameter optimization, and edge-optimized model deployment for real-time crop and weed detection.
This project uses data from:
- Weed-crop dataset in precision agriculture by Upadhyay et al., North Dakota State University
- Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)
- DOI: 10.17632/mthv4ppwyw.2

Please check NOTICE for further details.
- Overview & Features
- Technical Architecture
- Installation & Setup
- Quick Start
- Usage Documentation
- Module Documentation
- Development Guide
- Deployment
CropEdgeAI addresses the critical need for automated crop monitoring and weed detection in modern precision agriculture. The system provides an end-to-end solution from raw agricultural imagery to production-ready edge deployment.
- Complete ML Pipeline: From data ingestion to model deployment
- Advanced EDA Tools: Comprehensive exploratory data analysis with rich visualizations
- Smart Data Augmentation: Overlay and close-up augmentation techniques for balanced datasets
- Hyperparameter Optimization: Optuna-powered HPO with multi-objective optimization
- Multi-Model Support: YOLO v8/v9/v10/v11/v12 compatibility with automatic benchmarking
- Edge Optimization: NCNN export and optimization for mobile/embedded deployment
- Batch Inference: High-performance batch processing with detailed analytics
- Rich Visualizations: Interactive plots for model performance and dataset insights
- Reproducible Experiments: Deterministic training with comprehensive logging
- Cloud Integration: Google Drive backup and experiment synchronization
- Crops: Black bean, Canola, Corn, Field pea, Flax, Lentil, Soybean, Sugar beet
- Detection Tasks: Multi-class crop identification, weed detection, spatial analysis
- Use Cases: Precision spraying, yield optimization, field monitoring, automated farming
- Data Layer: Raw Images, Annotations, Validation
- Processing Layer: EDA Tools, Augmentation, Training
- Deployment Layer: NCNN Models, Batch Proc., Edge Deploy
- Core Modules: Dataset Ops, Augmenters, Validators
- Experiments: HPO Pipeline, Benchmarking, NCNN HPO
- Utilities: Visualizers, Validators, Config Mgmt
| Category | Technologies |
|---|---|
| Core ML | PyTorch, Ultralytics YOLO, NCNN |
| Data Science | Pandas, NumPy, Scikit-learn, OpenCV |
| Optimization | Optuna |
| Visualization | Matplotlib, Seaborn, adjustText |
| Development | Python 3.11+, UV package manager, Pytest |
| Deployment | NCNN (mobile/edge) |
src/cropedgeai/
├── dataset/                      # Data loading, validation, splitting
│   ├── loader.py                 # Weed-crop dataset loader
│   ├── validator.py              # Data validation utilities
│   ├── splitter.py               # Stratified dataset splitting
│   └── augmenter/                # Data augmentation pipeline
├── eda/                          # Exploratory data analysis
│   ├── eda.py                    # Statistical analysis engine
│   └── visualizer.py             # Advanced visualization tools
├── experiments/                  # ML experiment management
│   ├── yolo_experiment.py        # Multi-model benchmarking
│   ├── yolo_hpo.py               # Hyperparameter optimization
│   └── yolo_ncnn_hpo.py          # Edge deployment optimization
├── inference/                    # Production inference
│   └── yolo_batch_processor.py   # Batch processing engine
└── main.py                       # CLI entry point

- Python: 3.11 or 3.12 (3.13 not yet supported)
- GPU: CUDA-compatible GPU (optional but highly recommended)
- Storage: 40 GB+ free space for models and datasets
# Clone the repository
git clone https://github.com/manueljesus/cropedgeai.git
cd cropedgeai
# Install with UV
pip install uv
uv sync

# Install development dependencies
uv sync --group dev

from cropedgeai.dataset import WeedCropDatasetLoader
# Load your dataset
loader = WeedCropDatasetLoader("path/to/weed-crop-dataset")
dataset = loader()
print(f"Loaded {len(dataset)} annotations")
print(f"Classes: {dataset['class_name'].unique()}")

from cropedgeai.eda import WeedCropEDA, WeedCropVisualizer
# Perform EDA
eda = WeedCropEDA(dataset, dataset_name="My Dataset")
stats = eda.dataset_distribution()
...
# Generate visualizations
visualizer = WeedCropVisualizer(eda)
visualizer.plot_class_balance()
visualizer.plot_spatial_distribution()
...

from cropedgeai.dataset import DatasetSplitter, DatasetAugmenter
# Split dataset
splitter = DatasetSplitter(
    dataset,
    validation_size=0.15,
    test_size=0.15,
    output_dir="./processed",
    organize_files=True,
)
train, val, test = splitter()
# Augment training data
augmenter = DatasetAugmenter(train, "./augmented", "./processed")
augmented_dataset = augmenter(
    run_overlay=True,
    run_closeup=True,
    overlay_images=1000,
    weeds_per_image=5
)

from cropedgeai.experiments import YOLOExperiment
# Run multi-model benchmark
experiment = YOLOExperiment("config/baseline_experiment.yaml")
results = experiment()

from cropedgeai.experiments import YOLOHyperparameterOptimizer
optimizer = YOLOHyperparameterOptimizer("config/hpo_yolo11n.yaml")
hpo_results = optimizer()

from cropedgeai.experiments import YOLONCNNHyperparameterOptimizer
ncnn_optimizer = YOLONCNNHyperparameterOptimizer("config/ncnn_hpo.yaml")
results = ncnn_optimizer()
ncnn_path = ncnn_optimizer.export_ncnn(best_size)  # Check results to confirm `best_size`

# CLI inference
cropedgeai model.pt input_folder/ output_folder/ \
    --conf-threshold 0.35 \
    --img-size 640 \
    --stats-csv detections.csv

# Programmatic inference
from cropedgeai.inference import YOLOBatchProcessor
processor = YOLOBatchProcessor(
    model_path="best.pt",
    input_folder="test_images/",
    output_folder="results/",
    stats_csv_file="detection_stats.csv",
    conf_threshold=0.35
)
processor()

CropEdgeAI uses YAML configuration files for experiment management:
# config/my_experiment.yaml
name: "My_Crop_Experiment"
description: "Custom crop detection experiment"
dataset_yaml: "data/dataset.yaml"
models_to_test:
  yolo11n: "yolo11n.pt"
  yolo11s: "yolo11s.pt"
training_config:
  epochs: 100
  batch: 32
  imgsz: 640
  patience: 20

# config/my_hpo.yaml
name: "crop_hpo"
dataset_yaml: "data/dataset.yaml"
base_model_path: "yolo11n.pt"
hyperparameter_search_space:
  lr0:
    type: "float"
    min: 0.0001
    max: 0.01
    log: true
  mosaic:
    type: "float"
    min: 0.0
    max: 1.0

# Basic inference
cropedgeai model.pt input/ output/
# Advanced inference with custom parameters
cropedgeai model.ncnn input/ output/ \
    --conf-threshold 0.4 \
    --img-size 416 \
    --stats-csv results.csv
# Help and options
cropedgeai --help

from cropedgeai.dataset.augmenter import DatasetAugmenter
augmenter = DatasetAugmenter(
    train=train_data,
    output_dir="augmented_output",
    validation_dir="validation_data",
    random_seed=42
)
# Custom augmentation parameters
augmented = augmenter(
    run_overlay=True,
    run_closeup=True,
    overlay_images=2000,
    weeds_per_image=3,
    closeup_padding=2.0
)

from cropedgeai.eda import WeedCropEDA
eda = WeedCropEDA(dataset, "Advanced Analysis")
# Get comprehensive statistics
stats = eda.dataset_distribution()
class_balance = eda.class_balance_scores()
bbox_stats = eda.bounding_boxes_stats()
# Generate detailed report
report = eda.generate_summary_report()
print(report)
# Custom visualizations
from cropedgeai.eda import WeedCropVisualizer
viz = WeedCropVisualizer(eda)
viz.plot_cooccurrence_matrix("crop")
viz.plot_spatial_distribution_per_crop()
viz.plot_bbox_size_distributions()

Purpose: Comprehensive dataset management with loading, validation, splitting, and augmentation capabilities.
- WeedCropDatasetLoader: Loads Weed-crop datasets with automatic validation
- DatasetValidator: Ensures data integrity and format compliance
- DatasetSplitter: Stratified splitting preserving class distributions
- DatasetAugmenter: Orchestrates augmentation pipeline
- BaseAugmenter: Abstract base for augmentation techniques
- OverlayAugmenter: Overlays weed instances onto crop images
- CloseupAugmenter: Creates focused crops around objects
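Stratified splitting can be pictured as shuffling each class separately and cutting every class at the same proportions. The sketch below illustrates that idea with the standard library only; it is not the actual `DatasetSplitter` implementation. The function name `stratified_split` and the `(image_id, class_name)` record layout are assumptions made for the example, and it simplifies to one label per image.

```python
import random
from collections import defaultdict


def stratified_split(records, validation_size=0.15, test_size=0.15, seed=42):
    """Split (image_id, class_name) records so each class keeps its share.

    Hypothetical helper for illustration; assumes one label per image.
    """
    rng = random.Random(seed)

    # Group image ids by class so every class is split independently.
    by_class = defaultdict(list)
    for image_id, class_name in records:
        by_class[class_name].append(image_id)

    train, val, test = [], [], []
    for class_name in sorted(by_class):
        ids = by_class[class_name]
        rng.shuffle(ids)
        n_val = round(len(ids) * validation_size)
        n_test = round(len(ids) * test_size)
        val += [(i, class_name) for i in ids[:n_val]]
        test += [(i, class_name) for i in ids[n_val:n_val + n_test]]
        train += [(i, class_name) for i in ids[n_val + n_test:]]
    return train, val, test


records = [(f"corn_{i}", "Corn") for i in range(100)]
records += [(f"flax_{i}", "Flax") for i in range(20)]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))
```

Because each class is cut at the same ratios, a rare class such as the 20 Flax images here still appears in the validation and test sets instead of landing entirely in one split by chance.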
Purpose: Statistical analysis and visualization tools for agricultural datasets.
- WeedCropEDA: Core statistical analysis engine
  - Dataset distribution metrics
  - Class balance analysis
  - Bounding box statistics
  - IoU calculations
  - Comprehensive reporting
- WeedCropVisualizer: Advanced visualization toolkit
  - Class distribution plots
  - Spatial distribution heatmaps
  - Co-occurrence matrices
  - Bounding box analysis
  - Interactive image annotation display
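For readers unfamiliar with the IoU metric listed above, here is a minimal sketch of Intersection over Union for two axis-aligned boxes. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` corners; the box format used internally by `WeedCropEDA` is not documented here and may differ (YOLO labels, for instance, store normalized centers and sizes).

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (it may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Values close to 1 indicate heavily overlapping annotations, which is one way crowded crop and weed instances can be spotted in a dataset.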
Purpose: ML experiment orchestration with benchmarking, HPO, and edge optimization.
- YOLOExperiment: Multi-model benchmarking pipeline
- YOLOHyperparameterOptimizer: Optuna-powered HPO with advanced search spaces
- YOLONCNNHyperparameterOptimizer: Edge deployment optimization
- TrainingConfig: Structured training configuration management
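To make the search-space entries in the HPO configuration more concrete, the sketch below shows what `min`, `max`, and `log: true` mean when a value is drawn. It uses only the standard library and plain random sampling; it is not Optuna's sampler and not the project's code. With Optuna, an entry like `lr0` would typically be passed to `trial.suggest_float(name, low, high, log=True)`, but whether the optimizer does exactly that is an assumption. The helper name `sample_params` is hypothetical.

```python
import math
import random

# Mirrors the hyperparameter_search_space section of the HPO config.
search_space = {
    "lr0": {"type": "float", "min": 0.0001, "max": 0.01, "log": True},
    "mosaic": {"type": "float", "min": 0.0, "max": 1.0},
}


def sample_params(space, rng):
    """Draw one value per hyperparameter (illustrative, not Optuna's TPE)."""
    params = {}
    for name, spec in space.items():
        low, high = spec["min"], spec["max"]
        if spec.get("log", False):
            # Uniform in log space: each order of magnitude is equally likely.
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            value = rng.uniform(low, high)
        # Guard against floating-point drift at the interval edges.
        params[name] = min(max(value, low), high)
    return params


rng = random.Random(0)
print(sample_params(search_space, rng))
```

Sampling the learning rate on a log scale means roughly half of the trials fall below 0.001, the geometric midpoint of the range, rather than clustering near the upper bound as linear sampling would.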
Purpose: Production-ready inference capabilities with batch processing and performance analytics.
- YOLOBatchProcessor: High-performance batch inference
  - Multi-format model support (PyTorch, NCNN)
  - Automatic performance profiling
  - CSV statistics export
  - Progress tracking and logging
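As a rough picture of what a CSV statistics export could contain, the sketch below aggregates detections per class and writes one summary row for each. The column names, the `(image_name, class_name, confidence)` tuple layout, and the function `write_detection_stats` are all assumptions for illustration; the real file written by `YOLOBatchProcessor` may be structured differently.

```python
import csv
import io
from collections import defaultdict


def write_detection_stats(detections, stream):
    """Write per-class detection counts and mean confidence as CSV.

    `detections` is an iterable of (image_name, class_name, confidence).
    """
    counts = defaultdict(int)
    confidence_sums = defaultdict(float)
    for _image_name, class_name, confidence in detections:
        counts[class_name] += 1
        confidence_sums[class_name] += confidence

    writer = csv.writer(stream)
    writer.writerow(["class_name", "detections", "mean_confidence"])
    for class_name in sorted(counts):
        mean_confidence = confidence_sums[class_name] / counts[class_name]
        writer.writerow([class_name, counts[class_name], f"{mean_confidence:.3f}"])


detections = [
    ("img1.jpg", "Corn", 0.9),
    ("img1.jpg", "Weed", 0.5),
    ("img2.jpg", "Corn", 0.7),
]
buffer = io.StringIO()
write_detection_stats(detections, buffer)
print(buffer.getvalue())
```

Writing to a stream rather than a path keeps the helper easy to test; for a file on disk, open it with `newline=""` as the `csv` module documentation recommends.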
Purpose: CLI interface for production inference workflows.
Features:
- Argument parsing and validation
- Model compatibility checks
- Batch processing orchestration
- Results summarization
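The argument parsing and validation steps listed above can be sketched with `argparse`. The positional arguments and the `--conf-threshold`, `--img-size`, and `--stats-csv` flags come from the CLI examples in this README; the default values, help strings, and the `validate` helper are assumptions, and the real `main.py` may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        prog="cropedgeai", description="Batch inference on a folder of images"
    )
    parser.add_argument("model", help="Path to a PyTorch or NCNN model")
    parser.add_argument("input_folder", help="Folder with input images")
    parser.add_argument("output_folder", help="Folder for annotated results")
    parser.add_argument("--conf-threshold", type=float, default=0.25)
    parser.add_argument("--img-size", type=int, default=640)
    parser.add_argument("--stats-csv", default=None)
    return parser


def validate(args):
    """Reject values that cannot be valid before any model is loaded."""
    if not 0.0 <= args.conf_threshold <= 1.0:
        raise ValueError(f"--conf-threshold must be in [0, 1], got {args.conf_threshold}")
    if args.img_size <= 0:
        raise ValueError(f"--img-size must be positive, got {args.img_size}")
    return args


args = validate(
    build_parser().parse_args(
        ["model.pt", "input/", "output/", "--conf-threshold", "0.4", "--img-size", "416"]
    )
)
print(args)
```

Validating right after parsing lets a typo in a threshold fail immediately with a clear message instead of partway through a long batch run.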
# Clone and setup
git clone https://github.com/manueljesus/cropedgeai.git
cd cropedgeai
# Install development dependencies
uv sync --group dev
# Install pre-commit hooks (optional)
pre-commit install

- Separation of Concerns: Each module has a clear, single responsibility
- Dependency Injection: Configuration-driven architecture
- Type Safety: Comprehensive type hints throughout
- Error Handling: Graceful error handling with informative messages
- Logging: Structured logging for debugging and monitoring
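One way to read "configuration-driven" and "type safety" together is a typed object built from the `training_config` section of an experiment YAML. The sketch below is a hypothetical stand-in named `TrainingConfigSketch`, not the project's `TrainingConfig` class; the field names come from the YAML example in this README, while the validation rules are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainingConfigSketch:
    """Typed view of a training_config mapping (illustrative only)."""

    epochs: int = 100
    batch: int = 32
    imgsz: int = 640
    patience: int = 20

    def __post_init__(self):
        # Fail early with an informative message instead of deep inside training.
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, int) or value <= 0:
                raise ValueError(
                    f"{field.name} must be a positive integer, got {value!r}"
                )

    @classmethod
    def from_dict(cls, raw):
        """Build a config from a parsed YAML mapping, rejecting unknown keys."""
        known = {field.name for field in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise KeyError(f"Unknown training_config keys: {sorted(unknown)}")
        return cls(**raw)


config = TrainingConfigSketch.from_dict({"epochs": 50, "batch": 16})
print(config)
```

Rejecting unknown keys catches misspelled options such as `epoch` that would otherwise be ignored silently.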
- Style: Follows PEP 8 with Black formatting
- Documentation: Comprehensive docstrings following Google style
- Type Hints: Required for all public APIs
- Error Handling: Custom exceptions with clear error messages
- Testing: Pytest with fixtures and mocking
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov --cov-report=html
# Run specific test modules
uv run pytest test/dataset/
uv run pytest test/experiments/ -v

# Example test structure
import pytest

from cropedgeai.eda import WeedCropEDA


class TestWeedCropEDA:
    @pytest.fixture
    def sample_dataset(self):
        # create_mock_dataset() stands in for a project-specific test helper
        return create_mock_dataset()

    def test_dataset_distribution(self, sample_dataset):
        eda = WeedCropEDA(sample_dataset)
        stats = eda.dataset_distribution()
        assert stats['total_images'] > 0

- Create Feature Branch: `git checkout -b feature/new-feature`
- Implement with Tests: Write tests first (TDD approach)
- Update Documentation: Add docstrings and update README if needed
- Run Quality Checks: Ensure tests pass and code follows standards
- Submit PR: Include description of changes and test results
- Memory Management: Use generators for large datasets
- GPU Utilization: Automatic CUDA detection and fallback
- Batch Processing: Optimized batch sizes for different hardware
- Caching: Strategic caching of computed results
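The advice to use generators for large datasets can be illustrated with a small batching helper that never holds more than one batch of paths in memory. This is a standard-library sketch; `iter_batches` is a hypothetical name and not part of the package.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of at most `batch_size` items from any iterable, lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A generator expression stands in for a lazy directory scan.
image_paths = (f"field_{i:04d}.jpg" for i in range(10))
for batch in iter_batches(image_paths, batch_size=4):
    print(len(batch), batch[0])
```

Because the input is consumed lazily, the same loop works for ten images or ten million, and the batch size can be tuned to the memory available on the target hardware.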
# Export optimized edge model
from cropedgeai.experiments import YOLONCNNHyperparameterOptimizer
from cropedgeai.inference import YOLOBatchProcessor

optimizer = YOLONCNNHyperparameterOptimizer("config/ncnn_hpo.yaml")
ncnn_path = optimizer.export_ncnn(input_size=640)
# Run on the edge device
processor = YOLOBatchProcessor(
    model_path=ncnn_path,
    input_folder="edge_input/",
    output_folder="edge_output/",
    img_size=640,
    conf_threshold=0.25
)
processor()