Add script to calculate dataset statistics and monthly label distribution for temporal segmentation#2
Co-authored-by: peppocola <45401861+peppocola@users.noreply.github.com>
I also need you to list the months you included in training, for debugging purposes. @copilot
Added training and test month file listings to the output in commit 244d5a8. The script now displays:
The files are sorted chronologically to make debugging easier.
Also produce a table with the distribution of labels for each month. I also need to know how many samples of each label I have up to a given month, and how many total samples I have when I consider every month up to a given month. Can you add this as an output? @copilot
Added monthly label distribution feature in commit 70f6874. Use
Example output:
Save to CSV with
Need to calculate training/test months and instances from configuration files to populate a LaTeX table for paper submission.
Implementation
calculate_dataset_stats.py: Script that reuses DataHandler, ConfigManager, and LabelMapper to calculate statistics using the same data loading logic as the classification pipeline
Usage
Output Example
Dataset Statistics Summary
The script also outputs detailed month file listings for debugging:
Monthly Label Distribution
With the --monthly-distribution flag, the script generates a detailed table showing:
CSV output format for monthly distribution:
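As a rough illustration of the three quantities requested for this table (per-month label counts, cumulative counts per label, and cumulative totals), the computation can be sketched with the standard library alone. This is not the code in calculate_dataset_stats.py: the function name `monthly_label_distribution`, the `(month, label)` input pairs, the `YYYY-MM` month format, and the example labels are all assumptions made for the sketch.

```python
from collections import Counter


def monthly_label_distribution(samples):
    """Build one row per month from (month, label) pairs.

    Hypothetical helper: months are assumed to be 'YYYY-MM' strings,
    so plain string sorting is also chronological sorting.
    """
    per_month = {}
    for month, label in samples:
        per_month.setdefault(month, Counter())[label] += 1

    # Use one fixed label order so every row has the same columns.
    labels = sorted({label for counts in per_month.values() for label in counts})

    rows = []
    running = Counter()  # label counts accumulated up to the current month
    for month in sorted(per_month):
        counts = per_month[month]
        running.update(counts)
        rows.append({
            "month": month,
            "counts": {label: counts[label] for label in labels},
            "cumulative": {label: running[label] for label in labels},
            "month_total": sum(counts.values()),
            "cumulative_total": sum(running.values()),
        })
    return rows


# Made-up sample data, for illustration only.
samples = [
    ("2021-01", "bug"), ("2021-01", "feature"),
    ("2021-02", "bug"), ("2021-02", "bug"), ("2021-02", "feature"),
    ("2021-03", "bug"),
]
rows = monthly_label_distribution(samples)
for row in rows:
    print(row)
```

Each row carries both the monthly and the running figures, so writing it out as one CSV line per month is straightforward; a label missing from a month is reported as 0 rather than omitted.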
Notes
data/windows/month_range_1/{project}/ populated by create_time_windows.py
_get_files_after_range() to load files chronologically after the training period
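The idea of loading files chronologically after the training period can be sketched as follows. This is a minimal sketch, not the actual `_get_files_after_range()`: the helper names `month_key` and `files_after_range` are hypothetical, and so is the assumption that each file is named after its month as `YYYY-MM.csv`.

```python
from pathlib import Path


def month_key(path):
    """Turn 'YYYY-MM.csv' (assumed naming scheme) into a sortable (year, month) tuple."""
    year, month = Path(path).stem.split("-")
    return int(year), int(month)


def files_after_range(paths, last_training_month):
    """Return the files strictly after the last training month, oldest first."""
    cutoff = month_key(last_training_month)
    later = [p for p in paths if month_key(p) > cutoff]
    return sorted(later, key=month_key)


# Made-up file names; '2021-2.csv' lacks zero padding on purpose.
paths = ["2021-10.csv", "2021-2.csv", "2021-03.csv", "2021-01.csv"]
test_files = files_after_range(paths, "2021-02")
print(test_files)
```

Comparing numeric `(year, month)` tuples instead of raw file names keeps the order correct even when months are not zero-padded.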