This is my capstone project for CODE:You. The project analyzes four streaming services and their price histories to gain insight into content differences such as total content volume, content types, genres, and more, along with a comparison of price points. The goal of this project is to demonstrate a general knowledge of Python.
Four of the datasets used in this project contain content information about each streaming service listed below, including title, content type, genre, release year, IMDb ID, and IMDb Average Rating. All four came from kaggle.com and are updated daily via an API used by the datasets' owner. The files used below were last updated on 03/23/2025. A fifth dataset contains pricing information for each service over a specific timeframe, noted as month-year. This dataset was manually derived; see details below.
- Netflix
- Hulu
- Prime
- AppleTV
- Streaming_Service_Pricing_Histories
- Manually Derived From: Price_History_Reference.docx
This project is organized as follows:
- Preliminary Data Exploration: Jupyter notebooks or scripts to explore a dataset.
- Preliminary Data Manipulation: Using Python for feature creation to differentiate the streaming service datasets.
- Data Cleaning & Preparation: Using Python and other packages to clean and prepare data for analysis.
- Analysis: Using Python with the Pandas package to analyze the data.
- Visualizations: Using Matplotlib and Seaborn to visualize my findings.
- Summary: Summary of analysis/findings.
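The feature-creation step above, which differentiates the streaming service datasets, can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrames, the `service` column name, and the `tag_and_combine` helper are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature stand-ins for two of the Kaggle CSV files;
# the real files would be loaded with pd.read_csv(...).
netflix = pd.DataFrame({"title": ["Show A", "Movie B"], "type": ["tv", "movie"]})
hulu = pd.DataFrame({"title": ["Show C"], "type": ["tv"]})


def tag_and_combine(frames):
    """Add a 'service' column to each DataFrame and stack them into one."""
    tagged = [df.assign(service=name) for name, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)


combined = tag_and_combine({"Netflix": netflix, "Hulu": hulu})

# With the label in place, per-service comparisons become a simple groupby.
counts = combined.groupby("service")["title"].count()
print(counts)
```

Tagging each row before combining keeps the source of every title recoverable after the datasets are merged into a single DataFrame.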
| Feature | Description |
|---|---|
| Read FIVE data files | Used 4 CSV files from Kaggle & created one of my own. |
| Created several seaborn & matplotlib plots, 4 Stacked Bar Charts, 3 WordClouds, & 1 Line Graph | Made various plots to visually show my findings |
| Utilized a virtual environment | Created a venv for this project to keep my computer clean |
| Utilized Markdown & Commenting in my Jupyter Notebook | Included Markdown Language and commenting in my code to describe each section of my project & to define clear notes describing each code block. |
| Best practices | Created a function to wrap text on the x-axis of several graphs |
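As an illustration of the text-wrapping feature in the table above, here is a minimal sketch using the standard-library `textwrap` module. The function name `wrap_labels` and its `width` default are assumptions for this example; the function in "my_functions.py" may be written differently.

```python
import textwrap


def wrap_labels(labels, width=12):
    """Return the labels with newlines inserted so no line exceeds `width`."""
    return [textwrap.fill(str(label), width=width) for label in labels]


# Example genre labels (made up for illustration).
wrapped = wrap_labels(["Action & Adventure", "Drama"], width=10)
print(wrapped)
```

The wrapped strings could then be passed to a Matplotlib axis, for example with `ax.set_xticklabels(wrapped)`, so that long category names stack vertically instead of overlapping.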
The following is a guide to running the project files locally:
- If you want to save a copy on your GitHub, fork the repository located here; otherwise, move to step 2
- In your command line or in the terminal of VS Code, clone the repository to your local machine: 'git clone https://github.com/rkynhoff/Streaming_Service_Comparisons.git'
- Ensure your command line is opened to the folder in which you wish to save this repository
- Follow the first three steps in the "Virtual Environment Instructions" to create and activate a virtual environment, depending on your operating system (OS)
- This step should also include installing the packages listed in the requirements.txt file
- Explore the Jupyter notebooks and contents in the respective folders.
- Open the "my_functions.py" file then run it
- Open the "STRM_SERV_COMP_V2.ipynb" file
- In the toolbar, select "Run All" to run the program
- Investigate the code blocks, comments, and markdown areas for insight into the program
- Refer to the data dictionaries within the Jupyter Notebook, located after the initial DataFrames load and after the final cleaned DataFrame, or their respective ipynb files if needed
- Helpful Hint: You may want to turn on Word Wrap as some of the cells contain comments/notes that would require scrolling without Word Wrap enabled
- To do this in VS Code:
- Select File > Preferences > Settings
- Type in Word Wrap in the search
- Toggle Word Wrap to "on" if not already on
- To do this in Jupyter Notebooks online (JupyterLab, JupyterLite, etc.):
- Select File > Wrap Words
- Choose to turn it on
- If running an editor which requires the ipykernel extension, proceed with the install when prompted
- When you are finished perusing the repository, run the final command (Deactivate) for your OS from the Virtual Environment Instructions below
Depending upon your OS, enter the commands below into your terminal to create and activate a virtual environment on your machine and install the requirements. Only use Deactivate when you are finished with the program.
| Command | Linux/Mac | GitBash |
|---|---|---|
| Create | python3 -m venv venv | python -m venv venv |
| Activate | source venv/bin/activate | source venv/Scripts/activate |
| Install | pip install -r requirements.txt | pip install -r requirements.txt |
| Deactivate | deactivate | deactivate |
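To confirm that the Activate step took effect before running the notebook, a quick check like the one below may help. It is an optional sketch and not part of the repository; the function name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
```

If this prints `False` after activation, the terminal is most likely still using the system Python rather than the one inside the `venv` folder.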
- pandas and numpy for data manipulation and analysis
- matplotlib and seaborn for data visualization
- wordcloud for generating word cloud visuals
- PIL (Python Imaging Library) for image processing
- textwrap for wrapping text on graph axes
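One way to verify that the packages listed above are installed in the active environment is sketched below. The `REQUIRED` list and the `missing_packages` helper are assumptions written for this example, based on the import names of the packages above; they are not part of the project files.

```python
from importlib.util import find_spec

# Import names of the third-party packages listed above
# (textwrap ships with Python, so it is not checked here).
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "wordcloud", "PIL"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any names reported as missing can usually be resolved by re-running the Install command from the Virtual Environment Instructions with the environment activated.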



