Enron Data Loading Optimization

I took a piece of code from a Kaggle notebook that loads the Enron Email Dataset and optimized it.
It uses the multiprocessing library to run the code on multiple cores. On my machine this brought the loading time down to 2 min 55.14 s, compared with 4 min 9.27 s for the original code.
The code uses the multiprocessing Pool class with both imap and map. imap was chosen because it is less memory-intensive than map.
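A minimal sketch of the Pool pattern described above. It assumes the raw messages are already loaded as a list of strings, for example from the Kaggle dataset's CSV file. The parse_message and parse_all helpers, the lazy flag, and the header fields they extract are hypothetical and are not taken from the repository's notebooks. The sketch shows the practical difference between the two calls: imap returns an iterator whose results arrive one at a time, in input order, while map blocks until everything is parsed and returns one complete list.

```python
import email
from multiprocessing import Pool


def parse_message(raw: str) -> dict:
    """Parse one raw email string into a few header fields plus the body.

    Hypothetical helper: the chosen fields are an assumption.
    """
    msg = email.message_from_string(raw)
    # Join the text of every non-multipart part (Enron messages are mostly plain text).
    parts = [part.get_payload() for part in msg.walk() if not part.is_multipart()]
    return {
        "From": msg.get("From"),
        "To": msg.get("To"),
        "Subject": msg.get("Subject"),
        "body": "".join(parts),
    }


def parse_all(raw_messages, processes=None, chunksize=1000, lazy=True):
    """Parse messages on several worker processes (hypothetical helper)."""
    with Pool(processes=processes) as pool:
        if lazy:
            # imap yields results as workers finish them, in input order.
            # They are collected into a list here only for simplicity; a real
            # loader could handle each row as it arrives. The iteration must
            # happen inside the with block, because leaving it terminates the pool.
            return [row for row in pool.imap(parse_message, raw_messages, chunksize=chunksize)]
        # map waits for every message to be parsed and returns one full list.
        return pool.map(parse_message, raw_messages, chunksize=chunksize)


if __name__ == "__main__":
    sample = [
        "From: alice@enron.com\nTo: bob@enron.com\nSubject: Hi\n\nHello Bob",
        "From: bob@enron.com\nTo: alice@enron.com\nSubject: Re: Hi\n\nHello Alice",
    ]
    for row in parse_all(sample, processes=2, chunksize=1):
        print(row["From"], "->", row["To"], ":", row["body"])
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter, because each worker re-imports the module.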
There is also a helper that can be used in a 'with' block, so you can wrap any piece of code and easily measure its performance.
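A minimal sketch of such a timing helper, written as a class-based context manager. The Timer name, its label argument, the elapsed attribute and the printed format are assumptions. The actual helper in my_functions.py may look different.

```python
import time


class Timer:
    """Measure the wall-clock time spent inside a with block (hypothetical helper)."""

    def __init__(self, label: str = "block"):
        self.label = label
        self.elapsed = None  # filled in when the block exits

    def __enter__(self):
        # perf_counter is a monotonic, high-resolution clock suited to timing.
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the block raised, so the timing is always recorded.
        self.elapsed = time.perf_counter() - self._start
        minutes, seconds = divmod(self.elapsed, 60)
        print(f"{self.label}: {int(minutes)}min {seconds:.2f}s")
        return False  # do not suppress exceptions raised inside the block


if __name__ == "__main__":
    with Timer("sum of squares") as t:
        total = sum(i * i for i in range(1_000_000))
    print(f"elapsed seconds: {t.elapsed:.3f}")
```

Because the timer returns itself from `__enter__`, the measured time remains available after the block as `t.elapsed`. This is convenient when comparing the original loader with the multiprocessing version.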
For testing, you will need the email dataset, which is available on Kaggle. It is not included here because it is a large file (1.4 GB).
(Original Kaggle notebook: https://www.kaggle.com/code/zichen/explore-enron/notebook)

Files:
.gitignore
README.md
enron_data_loadding_kaggle.ipynb
enron_data_loading_optimized.ipynb
my_functions.py

Repository: marius-florea/Enron-Data-Loading

Folders and files

Latest commit

History

Repository files navigation

Enron Data Loading Optimization

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages