In this 4-hour workshop, attendees will learn basic data processing skills using Python. We will start with how to import code from other modules and packages to take advantage of the existing Python ecosystem, and then explore popular data analysis packages. We will see how to use NumPy to perform operations on large data arrays and how to use Matplotlib to generate clear data visualisations. We will also scratch the surface of using pandas to store data in tables. Along the way, we will discuss how to approach new, unfamiliar packages and learn how to use them.
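As a small taste of the first topic, here is a minimal sketch of three common import styles. It uses only standard-library modules so that it runs without installing anything. The variable names and sample values are made up for illustration and are not taken from the workshop notebooks. The same patterns apply to third-party packages, which are conventionally imported as `import numpy as np`, `import pandas as pd` and `import matplotlib.pyplot as plt`.

```python
# Three common ways to import code, shown with standard-library modules.
import math                      # import a whole module
import statistics as stats       # import a module under a shorter alias
from collections import Counter  # import a single name from a module

# Hypothetical sample data, for illustration only.
readings = [2.0, 4.0, 4.0, 5.0]

root = math.sqrt(readings[1])    # call a function through the module name
average = stats.mean(readings)   # call a function through the alias
counts = Counter(readings)       # use the imported name directly

print(root, average, counts.most_common(1))
```

Which style to use is mostly a matter of readability: keeping the module name or alias in front of each call makes it clear where a function comes from.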
By the end of this workshop, you should be able to:
- Import code from existing modules and packages.
- Use NumPy to easily process multidimensional data.
- Use Matplotlib to generate different types of plots to visualise data.
- Use pandas to represent data stored in tables.
- Approach a new package and explore its documentation and examples.
- Basic knowledge of Python is required.
- Attendees must be comfortable using variables for simple data types, as well as collections. Attendees should also be comfortable with loops and control flow and be familiar with the basics of using functions in Python.
- To be able to participate in the exercises, participants must either:
  - (Preferred) Have a Google Account to run in-browser as a Colab notebook
  - Have a local installation of Python and software to edit Jupyter notebooks (e.g., Jupyter Lab, Microsoft Visual Studio Code, PyCharm)
This workshop is intended to be interactive. Before the workshop, please download the materials from this repository. You can download the material as a ZIP file using the green button higher up on this page, or you can simply clone this repository by typing the following in a terminal:
git clone https://github.com/QLS-MiCM/DataProcessingInPython.git
In your Python environment, you must have the following packages installed:
- NumPy
- Matplotlib
- pandas
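One way to confirm that a local environment has the three packages listed above is a quick check like the following. This is only a sketch and not part of the workshop materials: the helper name `find_missing` is hypothetical, and the check only tests whether each package can be located, not which version is installed.

```python
import importlib.util

# Import names of the packages this workshop requires.
required = ["numpy", "matplotlib", "pandas"]


def find_missing(package_names):
    """Return the names that cannot be located in the current environment."""
    return [
        name
        for name in package_names
        if importlib.util.find_spec(name) is None
    ]


missing = find_missing(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported missing, install it with the package manager you normally use for that environment (for example, pip or conda) before the workshop.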
If you don't want to install anything locally, you can open the workshop materials using Google Colab:
- Student version (with blank fields): https://colab.research.google.com/github/QLS-MiCM/DataProcessingInPython/blob/main/Exercises/scripts/DataProcessingPython.ipynb
- Compact student version (with blank fields and shorter explanations): https://colab.research.google.com/github/QLS-MiCM/DataProcessingInPython/blob/main/Exercises/scripts/DataProcessingPythonCompact.ipynb
- Solution version (filled out): https://colab.research.google.com/github/QLS-MiCM/DataProcessingInPython/blob/main/Exercises/solutions/DataProcessingPython.ipynb
⚠ Warning: Make sure that using_colab = True is set in the first code cell, then run that cell to download all the data files required for this workshop.
This workshop material relies heavily on the documentation of the various projects discussed, including NumPy, Matplotlib, pandas, conda and pip, as well as the official Python documentation. Links to relevant documentation pages are provided throughout the Jupyter notebook. There are also references to a few other useful tutorials.
This workshop is based on previous iterations of this workshop (as Intermediate Python) and the Intro to Python workshop, which can be found at the following repositories:
- Intro to Python:
- Intermediate Python:
Colab badge created using https://shields.io.
Some cool Markdown tricks can be found at https://www.markdownguide.org/hacks/.
Workshop created as part of the McGill Initiative in Computational Medicine.
For more information about the QLS-MiCM, visit: https://www.mcgill.ca/micm/.
The contents of this repository are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.