This project builds and visualizes a knowledge graph of movies, directors, and genres from The Movies Dataset on Kaggle. The data is processed in Python with pandas, and the graph is built with networkx and drawn with matplotlib.
The project was developed and tested on Google Colab: https://colab.research.google.com/drive/1Kp0fB5VcTfnefknErd8omrzohUlwOZeC?usp=sharing
We use the following files from The Movies Dataset on Kaggle:
- movies_metadata.csv
- credits.csv
These files contain movie metadata, including:
- Movie titles
- Genres
- Directors (from crew information)
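
As a rough illustration of how genres and directors could be pulled out of these files, here is a minimal sketch. It assumes that the `genres` column (movies_metadata.csv) and the `crew` column (credits.csv) hold stringified Python lists of dicts with a `name` key, and that crew entries carry a `job` key. The helper `parse_names` and the toy rows are hypothetical and are not taken from the notebook.

```python
import ast

import pandas as pd


def parse_names(cell, job=None):
    """Return the 'name' values from a stringified list of dicts.

    If `job` is given, keep only entries whose 'job' field matches it.
    """
    if not isinstance(cell, str):
        return []  # NaN or other non-string cells carry no entries
    try:
        items = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []  # malformed cell
    if not isinstance(items, list):
        return []
    return [
        item["name"]
        for item in items
        if isinstance(item, dict)
        and "name" in item
        and (job is None or item.get("job") == job)
    ]


# Toy row that mimics the assumed column layout (not a real dataset row).
movies = pd.DataFrame({
    "title": ["Toy Story"],
    "genres": ["[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"],
    "crew": ["[{'job': 'Director', 'name': 'John Lasseter'}, "
             "{'job': 'Editor', 'name': 'Hypothetical Editor'}]"],
})

movies["genre_names"] = movies["genres"].apply(parse_names)
movies["directors"] = movies["crew"].apply(parse_names, job="Director")
```

`ast.literal_eval` is used rather than `eval` because it only accepts Python literals, so a malformed or unexpected cell cannot execute code.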
What the notebook does:
- Parses genres and directors from the dataset
- Builds triples:
  - Movie → has_genre → Genre
  - Director → directed → Movie
- Filters a small sample of movies with directors for better visualization
- Builds a directed knowledge graph
- Visualizes the graph with nodes colored by type:
  - Movies (pink)
  - Directors (purple)
  - Genres (blue)
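
The triple-building and graph-building steps can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual code: the `records` list, the `SAMPLE_SIZE` value, and the `kind` and `relation` attribute names are all hypothetical.

```python
import networkx as nx

# Hypothetical, already-parsed records: (title, genres, directors).
records = [
    ("Toy Story", ["Animation", "Comedy"], ["John Lasseter"]),
    ("Heat", ["Action", "Crime"], ["Michael Mann"]),
    ("Untitled Example", ["Drama"], []),  # no director, so it is filtered out
]

SAMPLE_SIZE = 2  # assumed value, kept small so the plot stays readable

# Keep only movies with at least one director, then take a small sample.
sample = [r for r in records if r[2]][:SAMPLE_SIZE]

# Build (subject, predicate, object) triples.
triples = []
for title, genres, directors in sample:
    triples += [(title, "has_genre", g) for g in genres]
    triples += [(d, "directed", title) for d in directors]

# Load the triples into a directed graph, tagging each node with its type.
graph = nx.DiGraph()
for subj, pred, obj in triples:
    graph.add_edge(subj, obj, relation=pred)
    if pred == "has_genre":
        graph.nodes[subj]["kind"] = "movie"
        graph.nodes[obj]["kind"] = "genre"
    else:  # "directed"
        graph.nodes[subj]["kind"] = "director"
        graph.nodes[obj]["kind"] = "movie"
```

In this sketch nodes are keyed by their plain names, so a movie whose title equals a genre name would collapse into a single node; prefixing node IDs with their type would avoid that.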
📁 Movie_Knowledge_Graph/
├── credits.csv
├── Example.png
├── Movie_Knowledge_Graph.ipynb
├── movies_metadata.csv
└── README.md
1. Download the two CSV files from Kaggle and save them to your computer.
2. Open Google Colab and upload:
   - movies_metadata.csv
   - credits.csv
3. Upload and run the Movie_Knowledge_Graph.ipynb file in Colab.
4. The notebook will:
   - Process and clean the data
   - Build the graph
   - Visualize it as a plot
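
For orientation, the "process and clean" step might look roughly like the sketch below. It assumes both CSV files share an `id` column and that some `id` values in movies_metadata.csv may not be numeric. The in-memory CSV text is a made-up stand-in so the example runs without the real files, and the ids in it are illustrative only.

```python
import io

import pandas as pd

# In Colab you would pass the uploaded file names to pd.read_csv instead,
# e.g. pd.read_csv("movies_metadata.csv", low_memory=False).
metadata_csv = io.StringIO(
    "id,title\n"
    "862,Toy Story\n"
    "not-an-id,Broken Row\n"
    "949,Heat\n"
)
credits_csv = io.StringIO(
    "id,crew\n"
    "862,[]\n"
    "949,[]\n"
)

metadata = pd.read_csv(metadata_csv, dtype={"id": str})
credits = pd.read_csv(credits_csv)

# Coerce ids to numbers; non-numeric ids become NaN and those rows are dropped.
metadata["id"] = pd.to_numeric(metadata["id"], errors="coerce")
metadata = metadata.dropna(subset=["id"]).astype({"id": "int64"})

# Join movie metadata with crew information on the shared id column.
merged = metadata.merge(credits, on="id", how="inner")
```

Coercing the key to a common integer type before merging avoids a failed or empty join when one file stores ids as text and the other as numbers.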
You'll get a graph like this:
- Pink nodes: movies
- Purple nodes: directors
- Blue nodes: genres
Edges show the relationships (has_genre, directed).
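
A plot with this color scheme could be produced along the following lines. This is a minimal sketch rather than the notebook's code: the `kind` and `relation` attribute names, the three-node graph, and the output file name `graph_preview.png` are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Color per node type, following the legend above.
COLORS = {"movie": "pink", "director": "purple", "genre": "blue"}

# Tiny hand-built graph so the sketch is self-contained.
graph = nx.DiGraph()
graph.add_node("Toy Story", kind="movie")
graph.add_node("John Lasseter", kind="director")
graph.add_node("Animation", kind="genre")
graph.add_edge("Toy Story", "Animation", relation="has_genre")
graph.add_edge("John Lasseter", "Toy Story", relation="directed")

# One color per node, in the same order networkx iterates the nodes.
node_colors = [COLORS[graph.nodes[n]["kind"]] for n in graph.nodes]
edge_labels = nx.get_edge_attributes(graph, "relation")

pos = nx.spring_layout(graph, seed=42)  # fixed seed for a repeatable layout
fig, ax = plt.subplots(figsize=(6, 4))
nx.draw_networkx(graph, pos, ax=ax, node_color=node_colors,
                 node_size=1200, font_size=8)
nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels,
                             ax=ax, font_size=7)
ax.set_axis_off()
fig.savefig("graph_preview.png", dpi=150, bbox_inches="tight")
```

In Colab, calling `plt.show()` instead of `fig.savefig(...)` displays the figure inline.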
Requirements:
- Google Colab (recommended) or Python 3.x
- Python packages:
  - pandas
  - matplotlib
  - networkx
All packages are already available in Colab!
Pull requests are welcome!
Feel free to open an issue if you have ideas, questions, or improvements.
This project is open-source and free to use under the MIT License.
⭐ If you like it, please give the repo a ⭐ on GitHub!
