Project - Webscrape-Faculty-Info

A small web-scraping project to update the ECE Department's professor list.

Project Overview

I was assigned to update an Excel file with information on the current faculty in the ECE Department, but I decided to turn it into a small project. Using Python, Beautiful Soup, pandas, and a custom module I created for data-related tasks, I scraped the relevant information from the department's website. In the end, I merged two separate lists to create a consolidated, up-to-date faculty list.
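The scraping and merging steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It uses Python's standard-library html.parser instead of Beautiful Soup so it runs without extra dependencies. It also assumes a hypothetical page layout: each faculty member sits in one p tag, with the name in a strong element followed by a br and a title. The names ParagraphCollector, scrape_faculty, merge_lists, and SAMPLE_HTML are all illustrative.

```python
from html.parser import HTMLParser

# Hypothetical markup: one faculty member per <p>, name in <strong>, title after <br>.
SAMPLE_HTML = """
<div class="faculty">
  <p><strong>Jane A. Doe</strong><br>Associate Professor</p>
  <p><strong>Wei Li</strong><br>Professor</p>
</div>
"""


class ParagraphCollector(HTMLParser):
    """Collect the non-empty text chunks found inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # one list of text chunks per <p>
        self._chunks = None   # None while we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._chunks is not None:
            if self._chunks:
                self.paragraphs.append(self._chunks)
            self._chunks = None

    def handle_data(self, data):
        # Text inside nested tags such as <strong> still belongs to the open <p>.
        if self._chunks is not None and data.strip():
            self._chunks.append(data.strip())


def scrape_faculty(html):
    """Turn each <p> into a {'name', 'title'} record (first chunk = name)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [
        {"name": chunks[0], "title": chunks[1] if len(chunks) > 1 else ""}
        for chunks in parser.paragraphs
    ]


def merge_lists(old_names, scraped_names):
    """Label every name as 'both', 'old_only' or 'new_only' (case-insensitive)."""
    old = {n.lower(): n for n in old_names}
    new = {n.lower(): n for n in scraped_names}
    merged = []
    for key in sorted(old.keys() | new.keys()):
        if key in old and key in new:
            status = "both"
        elif key in old:
            status = "old_only"
        else:
            status = "new_only"
        # Prefer the spelling from the freshly scraped site when available.
        merged.append((new.get(key, old.get(key)), status))
    return merged


if __name__ == "__main__":
    faculty = scrape_faculty(SAMPLE_HTML)
    print(faculty)
    print(merge_lists(["Jane A. Doe", "Retired Person"], [f["name"] for f in faculty]))
```

With Beautiful Soup, the collection step would be roughly soup.find_all("p") followed by get_text() on each result. In pandas, the merge corresponds to DataFrame.merge(..., how="outer", indicator=True), which labels each row "both", "left_only", or "right_only".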

Faculty Site

Issues Encountered

During this project, I ran into challenges accessing and parsing data wrapped in p tags. The hardest part was identifying professors correctly by last name while also preserving the original Excel document's styling. Variations in the professors' names added complexity: some included middle names, and some were as short as two letters, which made filtering and matching names more difficult. To keep the effort efficient, I limited myself to a few hours of work over two days, which added urgency to the task.
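One way to make last-name matching robust to these variations is to compare whole surname tokens instead of substrings. A plain substring check lets a two-letter surname such as "Li" match "Lindsey" or "Lima". This is a sketch under assumptions, not the project's method. The helpers (normalize, last_name, naive_matches, find_by_last_name) and the sample roster are hypothetical, and names are assumed to arrive either as "First Middle Last" or as "Last, First Middle".

```python
import re


def normalize(text):
    """Lowercase, turn periods/commas into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[.,]", " ", text.lower()).split())


def last_name(name):
    """Extract the surname from 'First Middle Last' or 'Last, First Middle'."""
    if "," in name:
        # Assumed spreadsheet style: the surname comes before the comma.
        surname = name.split(",", 1)[0]
    else:
        parts = name.split()
        surname = parts[-1] if parts else ""
    return normalize(surname)


def naive_matches(target, names):
    """Substring test: short surnames like 'Li' leak into 'Lindsey' or 'Lima'."""
    t = target.lower()
    return [n for n in names if t in n.lower()]


def find_by_last_name(target, names):
    """Whole-token comparison on the surname only, so middle names are ignored."""
    t = normalize(target)
    return [n for n in names if last_name(n) == t]


if __name__ == "__main__":
    roster = ["Wei Li", "Lindsey Oliver", "Li, Wei", "Ana Lima", "Jane A. Doe"]
    print(naive_matches("Li", roster))
    print(find_by_last_name("Li", roster))
    print(find_by_last_name("Doe", roster))
```

Multi-word surnames (for example, "De La Cruz") would still need special handling, because this sketch treats only the final token as the surname.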

Future Improvements

I plan to enhance the project by implementing a method to remove middle names, leaving only the first and last names of the professors. Additionally, I aim to maintain the original formatting of the Excel document during data processing. Expanding the functionality to access and extract other relevant information from the remaining p tags is also a key area for improvement.
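The planned middle-name removal could start from a rule as simple as "keep the first given name and the surname." This is a minimal sketch, not a finished design. The function first_and_last is hypothetical, and it assumes the same two name layouts as above ("First Middle Last" and "Last, First Middle").

```python
def first_and_last(name):
    """Drop middle names and initials, keeping 'First Last'.

    Handles 'First Middle Last' and the assumed 'Last, First Middle' layout.
    Compound surnames are NOT detected: only the final token is kept.
    """
    if "," in name:
        surname, _, given = name.partition(",")
        given_parts = given.split()
        first = given_parts[0] if given_parts else ""
        return f"{first} {surname.strip()}".strip()
    parts = name.split()
    if len(parts) <= 2:
        return " ".join(parts)
    return f"{parts[0]} {parts[-1]}"


if __name__ == "__main__":
    for raw in ["Jane A. Doe", "Doe, Jane A.", "  Wei   Li ", "Maria Elena Garcia Lopez"]:
        print(repr(raw), "->", first_and_last(raw))
```

The last sample shows the main limitation. For a compound surname such as "Garcia Lopez", this rule keeps only "Lopez", so names like that may still need a manual check.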

Repository Files

__pycache__
.gitattributes
ECE_PantherSoft_ID_and_Employee_Position.xls
README.md
cleaner.py
main.ipynb
new prof. list
new prof. list.csv
new_professors.csv

DmanDSR/Project---Webscrape-Faculty-Info
