A comprehensive Python project for processing, analyzing, and visualizing single and unified meteor observation data from CSV files.
CemPyLib provides a structured framework for working with meteor observation data. It includes:
- Data loading and validation for unified and all meteor datasets
- Data cleaning and processing with batch support for large files
- Statistical analysis and meteor stream identification
- Visualization of meteor characteristics and orbital elements
- Memory optimization for efficient processing
- Multiple output formats (CSV, Parquet, JSON)
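The memory-optimization feature amounts to downcasting numeric dtypes and converting repetitive strings to pandas categoricals. The sketch below reuses the `optimize_dataframe()` name from the fixes list further down, but its body is an assumption about how such a function typically works, not the project's actual implementation:

```python
import numpy as np
import pandas as pd

def optimize_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Shrink a DataFrame by downcasting numbers and categorizing strings."""
    df = df.copy()
    for col in df.select_dtypes(include=["int64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    for col in df.select_dtypes(include=["float64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    for col in df.select_dtypes(include=["object"]).columns:
        # Low-cardinality strings (e.g. stream codes) compress well as categories.
        if df[col].nunique() < 0.5 * len(df):
            df[col] = df[col].astype("category")
    return df  # the fixes list below mentions this return was once missing

rng = np.random.default_rng(0)
example = pd.DataFrame({
    "_amag": rng.random(1000),                        # float64 magnitudes
    "_stream": rng.choice(["PER", "GEM", "SPO"], 1000),
})
optimized = optimize_dataframe(example)
```

On wide, repetitive CSV data this kind of downcast can roughly halve memory usage.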
```
CemPyLib/
│
├── data/                    # Data directory
│   ├── raw/                 # Place your CSV files here
│   │   ├── *_unified.csv    # Unified meteor observations
│   │   └── *_all.csv        # All meteor observations
│   └── processed/           # Output will go here
│
├── src/                     # Source code
│   ├── config.py            # Configuration settings
│   ├── data_loader.py       # Functions to load and validate CSV data
│   ├── data_processor.py    # Core data processing functions
│   ├── data_analyzer.py     # Analysis functions
│   ├── data_visualizer.py   # Visualization functions
│   └── utils.py             # Utility functions
│
├── scripts/
│   └── run_processing.py    # Main script to run the complete pipeline
│
├── tests/                   # Unit tests (for future development)
│
├── requirements.txt         # Python dependencies
└── readme.md                # This file
```
1. Clone or download the repository:

   ```bash
   git clone https://github.com/yourusername/CemPyLib.git
   cd CemPyLib
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Note: if you use Anaconda/conda, you can also install them with:

   ```bash
   conda install pandas numpy scipy matplotlib seaborn scikit-learn pyarrow tqdm joblib
   ```

3. Verify the installation:

   ```bash
   python scripts/run_processing.py --help
   ```

4. Place your meteor CSV files in the `data/raw/` directory.

5. Run the processing pipeline:

   ```bash
   python scripts/run_processing.py
   ```

6. View the results in the `data/processed/` directory.
The main script supports various command-line options:

```bash
python scripts/run_processing.py --help
```

Common options:

- `--input-dir PATH`: Custom input directory
- `--output-dir PATH`: Custom output directory
- `--batch-size N`: Process data in batches of size N
- `--skip-cleaning`: Skip the data cleaning step
- `--skip-visualization`: Skip visualization generation
- `--optimize-memory`: Optimize DataFrame memory usage
- `--stream-filter STREAM`: Filter data by a specific meteor stream
- `--save-parquet`: Save processed data in Parquet format
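Internally, a batch size like `--batch-size N` typically maps onto pandas' chunked CSV reading, which keeps peak memory proportional to the batch size rather than the file size. A self-contained sketch of the pattern (the in-memory CSV and the per-chunk `dropna()` cleaning step are purely illustrative):

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a large *_unified.csv file.
raw = io.StringIO("_stream,_amag\nPER,2.1\nGEM,3.4\nPER,1.8\nSPO,4.0\nGEM,2.9\n")

# Read and clean the file in fixed-size chunks instead of all at once.
chunks = []
for chunk in pd.read_csv(raw, chunksize=2):
    chunk = chunk.dropna()      # illustrative per-chunk cleaning step
    chunks.append(chunk)

processed = pd.concat(chunks, ignore_index=True)
print(len(processed))  # 5 rows survive, assembled from chunks of 2, 2, and 1
```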
The processing pipeline generates several output files in `data/processed/`:

**Processed CSV files**: Cleaned and processed meteor data

- `unified_YYYYMMDD_HHMMSS_*.csv`: Processed unified observations
- `all_YYYYMMDD_HHMMSS_*.csv`: Processed all observations
**Statistics JSON**: Statistical summary of the data

- `statistics_YYYYMMDD_HHMMSS.json`: Descriptive statistics and distribution metrics
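A statistics file like this is essentially a JSON serialization of per-column descriptive statistics. A minimal sketch using pandas' `describe()`; the exact keys and layout of `statistics_YYYYMMDD_HHMMSS.json` may differ:

```python
import json
import pandas as pd

df = pd.DataFrame({"_vg": [35.2, 58.9, 41.0, 33.7], "_amag": [2.1, 3.4, 1.8, 4.0]})

# describe() yields count/mean/std/min/quartiles/max per numeric column;
# to_dict() makes each Series JSON-serializable.
stats = {col: df[col].describe().to_dict() for col in df.columns}
payload = json.dumps(stats, indent=2)
```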
**Stream analysis**: Analysis of meteor streams

- `stream_analysis_YYYYMMDD_HHMMSS.csv`: Stream counts and statistics
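Per-stream counts and statistics of this kind reduce to a group-by over the `_stream` column. The aggregate names in this sketch are illustrative, not the file's actual schema:

```python
import pandas as pd

df = pd.DataFrame({
    "_stream": ["PER", "PER", "GEM", "SPO", "GEM", "PER"],
    "_vg": [59.1, 58.7, 34.9, 28.0, 35.2, 59.4],
})

# One row per stream: how many meteors it contributed and its velocity profile.
stream_stats = (
    df.groupby("_stream")["_vg"]
      .agg(count="count", mean_vg="mean", std_vg="std")
      .sort_values("count", ascending=False)
      .reset_index()
)
```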
**Outlier detection**: Identified outliers in the data

- `outliers_YYYYMMDD_HHMMSS.csv`: Flagged outliers for key parameters
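The source does not say which outlier test the pipeline applies; the interquartile-range (IQR) rule below is one common choice and is shown purely as a sketch:

```python
import pandas as pd

def flag_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

vg = pd.Series([35.0, 36.1, 34.8, 35.5, 120.0])  # one implausible velocity
mask = flag_outliers_iqr(vg)
```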
**Visualizations** (if enabled):

- `meteor_distribution.png`: Sky distribution (RA/Dec)
- `velocity_distribution.png`: Velocity histograms and box plots
- `orbital_elements.png`: Orbital parameter analysis
- `stream_analysis.png`: Meteor stream comparisons
- `report.html`: Interactive HTML report with all visualizations
The project expects CSV files with 106 columns of meteor data, including:

Key columns:

- `_Version`, `_#`: Version and identifier
- `_localtime`, `_mjd`, `_sol`: Time information
- `_ID1`, `_ID2`: Observer identifiers
- `_amag`: Apparent magnitude
- `_ra_o`, `_dc_o`: Right ascension and declination (observed)
- `_ra_t`, `_dc_t`: Right ascension and declination (theoretical)
- `_vg`, `_vo`, `_vi`, `_vs`: Velocity measurements
- `_a`, `_q`, `_e`, `_incl`, `_peri`, `_node`: Orbital elements
- `_stream`: Meteor stream identifier
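Validating input of this shape usually amounts to checking for required column names before processing. In this sketch, `REQUIRED` is a hypothetical subset of the 106 columns, not the actual check performed by `data_loader.py`:

```python
import pandas as pd

# Hypothetical required subset of the 106 columns.
REQUIRED = ["_Version", "_localtime", "_amag", "_ra_o", "_dc_o", "_vg", "_stream"]

def validate_columns(df: pd.DataFrame, required=REQUIRED) -> list:
    """Return the required columns missing from the DataFrame (empty if valid)."""
    return [c for c in required if c not in df.columns]

df = pd.DataFrame(columns=["_Version", "_localtime", "_amag", "_ra_o", "_dc_o", "_vg"])
missing = validate_columns(df)
```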
File naming convention:

- Unified observations: `*_unified.csv` (e.g., `U2_20240105_unified.csv`)
- All observations: `*_all.csv` (e.g., `U2_20240105_all.csv`)
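These patterns can be matched with the standard library's `fnmatch` (or `pathlib.Path.glob` against `data/raw/`):

```python
import fnmatch

filenames = ["U2_20240105_unified.csv", "U2_20240105_all.csv", "notes.txt"]

# Split a directory listing into the two expected observation types;
# files matching neither pattern simply fall through.
unified = [f for f in filenames if fnmatch.fnmatch(f, "*_unified.csv")]
all_obs = [f for f in filenames if fnmatch.fnmatch(f, "*_all.csv")]
```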
**`ImportError: No module named 'pandas'`**

- Solution: Install the dependencies with `pip install -r requirements.txt`

**`FileNotFoundError: No CSV files found`**

- Solution: Ensure the CSV files are placed in the `data/raw/` directory
- Check that the files follow the naming convention (`*_unified.csv` or `*_all.csv`)

**`MemoryError` on large datasets**

- Solution: Use the `--batch-size` option to process the data in smaller chunks
- Example: `python scripts/run_processing.py --batch-size 5000`

**Visualization issues**

- Solution: Use `--skip-visualization` to skip the plots if matplotlib has issues
- Example: `python scripts/run_processing.py --skip-visualization`
This version includes several fixes and improvements:

- Fixed file naming inconsistencies (hyphen vs. underscore)
- Added a missing pandas import in the main script
- Fixed a missing return statement in the `optimize_dataframe()` function
- Completed the incomplete `plot_velocity_distribution()` function
- Fixed a column name issue in the `analyze_meteor_streams()` function
- Updated the README with accurate instructions and troubleshooting
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.