Build and visualize a small knowledge graph of movies, directors, and genres using the Kaggle Movies Dataset, pandas, and networkx.

HanineKh/Movie_Knowledge_Graph

🎬 Movie Knowledge Graph

This project builds and visualizes a Knowledge Graph of movies, directors, and genres using the Kaggle Movies Dataset, processed in Python with pandas and visualized with networkx.

The project was developed and tested on Google Colab. https://colab.research.google.com/drive/1Kp0fB5VcTfnefknErd8omrzohUlwOZeC?usp=sharing


📊 Dataset

We use the following files from The Movies Dataset on Kaggle:

  • movies_metadata.csv
  • credits.csv

These files include metadata about movies, including:

  • Movie titles
  • Genres
  • Directors (from crew information)
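As a rough sketch of the parsing step: this assumes the `genres` column of movies_metadata.csv and the `crew` column of credits.csv hold stringified Python lists of dictionaries, each with a `name` key (and a `job` key for crew). The helper names below are hypothetical and are not taken from the notebook.

```python
import ast


def _load_list(cell):
    """Parse a stringified list; return [] for missing or malformed cells."""
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return []
    return value if isinstance(value, list) else []


def parse_genres(cell):
    """Return the genre names found in one `genres` cell."""
    return [d["name"] for d in _load_list(cell)
            if isinstance(d, dict) and "name" in d]


def parse_directors(cell):
    """Return the names of crew members whose job is 'Director'."""
    return [m["name"] for m in _load_list(cell)
            if isinstance(m, dict) and m.get("job") == "Director" and "name" in m]


# Illustrative cells written in the assumed format
genres_cell = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"
crew_cell = ("[{'job': 'Director', 'name': 'John Lasseter'}, "
             "{'job': 'Screenplay', 'name': 'Joss Whedon'}]")
print(parse_genres(genres_cell))
print(parse_directors(crew_cell))
```

`ast.literal_eval` is used rather than `json.loads` because the cells are assumed to use single-quoted Python literals, which are not valid JSON.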

🛠️ Features

✅ Parses genres and directors from the dataset
✅ Builds triples:

  • Movie → has_genre → Genre
  • Director → directed → Movie

✅ Filters a small sample of movies with directors for better visualization
✅ Builds a directed knowledge graph
✅ Visualizes the graph with nodes colored by type:

  • Movies (pink)
  • Directors (purple)
  • Genres (blue)
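The triple and graph steps above could be sketched roughly as follows. It assumes each movie has already been reduced to a dictionary with `title`, `genres`, and `directors` keys. That record layout, the function names, and the `kind` and `relation` attributes are illustrative assumptions, not the notebook's actual code.

```python
import networkx as nx


def sample_with_directors(movies, limit):
    """Keep the first `limit` movies that have at least one director."""
    return [m for m in movies if m["directors"]][:limit]


def build_triples(movies):
    """Turn movie records into (subject, relation, object) triples."""
    triples = []
    for m in movies:
        for genre in m["genres"]:
            triples.append((m["title"], "has_genre", genre))
        for director in m["directors"]:
            triples.append((director, "directed", m["title"]))
    return triples


def build_graph(triples):
    """Build a directed graph; each node is tagged with an assumed `kind`."""
    graph = nx.DiGraph()
    for subj, rel, obj in triples:
        graph.add_edge(subj, obj, relation=rel)
        if rel == "has_genre":
            graph.nodes[subj]["kind"] = "movie"
            graph.nodes[obj]["kind"] = "genre"
        elif rel == "directed":
            graph.nodes[subj]["kind"] = "director"
            graph.nodes[obj]["kind"] = "movie"
    return graph


# Illustrative records, not rows read from the dataset
movies = [
    {"title": "Toy Story", "genres": ["Animation", "Comedy"],
     "directors": ["John Lasseter"]},
    {"title": "No Director Listed", "genres": ["Drama"], "directors": []},
]
graph = build_graph(build_triples(sample_with_directors(movies, limit=10)))
print(graph.number_of_nodes(), graph.number_of_edges())
```

Because node identity here is the plain name string, a movie and a genre that share a name would collapse into one node. Prefixing names with their type is one way to avoid that if it matters for your sample.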

📂 Project Structure

📁 Movie_Knowledge_Graph/
├── credits.csv
├── Example.png
├── Movie_Knowledge_Graph.ipynb
├── movies_metadata.csv
└── README.md


🚀 How to Run

1️⃣ Download the two CSV files from Kaggle and save them to your computer.

2️⃣ Open Google Colab and upload:

  • movies_metadata.csv
  • credits.csv

3️⃣ Upload and run the Movie_Knowledge_Graph.ipynb file in Colab.

4️⃣ The notebook will:

  • Process and clean the data
  • Build the graph
  • Visualize it as a plot
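For the "process and clean" step, one possible way to join the two files is sketched below. It assumes both CSVs share an `id` column and that some `id` values in movies_metadata.csv may not be numeric. The function names are hypothetical, and the notebook may do this differently.

```python
import pandas as pd


def _with_int_id(df):
    """Drop rows whose id is not numeric and cast the rest to int64."""
    ids = pd.to_numeric(df["id"], errors="coerce")
    keep = ids.notna()
    return df.loc[keep].assign(id=ids[keep].astype("int64"))


def merge_movies_credits(movies, credits):
    """Inner-join metadata and credits on the cleaned movie id."""
    return _with_int_id(movies).merge(_with_int_id(credits), on="id", how="inner")


def load_and_merge(movies_path="movies_metadata.csv", credits_path="credits.csv"):
    """Read both CSVs (paths as uploaded to Colab) and merge them."""
    movies = pd.read_csv(movies_path, low_memory=False)
    credits = pd.read_csv(credits_path)
    return merge_movies_credits(movies, credits)


# Tiny in-memory frames standing in for the real files
movies_df = pd.DataFrame({"id": ["862", "1997-08-20"],
                          "title": ["Toy Story", "malformed row"]})
credits_df = pd.DataFrame({"id": [862], "crew": ["[]"]})
print(merge_movies_credits(movies_df, credits_df))
```

In Colab, calling `load_and_merge()` after uploading the two files should return one row per movie that appears in both of them.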

🖼️ Example Output

You'll get a graph like this:

  • Pink nodes: movies
  • Purple nodes: directors
  • Blue nodes: genres

Edges show the relationships (has_genre, directed).
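A minimal sketch of how this coloring might be drawn with networkx and matplotlib. It assumes every node carries a `kind` attribute (`movie`, `director`, or `genre`) and every edge a `relation` attribute. Both attribute names are assumptions for illustration, and the layout settings are arbitrary.

```python
import networkx as nx

# Named matplotlib colors matching the legend above
NODE_COLORS = {"movie": "pink", "director": "purple", "genre": "blue"}


def node_color_list(graph, default="gray"):
    """Return one color per node, in the graph's node order."""
    return [NODE_COLORS.get(graph.nodes[n].get("kind"), default)
            for n in graph.nodes]


def draw_graph(graph, seed=42):
    """Plot the graph with colored nodes and labeled edges."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    pos = nx.spring_layout(graph, seed=seed)  # fixed seed gives a repeatable layout
    nx.draw(graph, pos, with_labels=True, arrows=True,
            node_color=node_color_list(graph), font_size=8)
    nx.draw_networkx_edge_labels(
        graph, pos,
        edge_labels=nx.get_edge_attributes(graph, "relation"),
        font_size=7,
    )
    plt.show()


# Tiny illustrative graph
demo = nx.DiGraph()
demo.add_node("Toy Story", kind="movie")
demo.add_node("John Lasseter", kind="director")
demo.add_node("Animation", kind="genre")
demo.add_edge("John Lasseter", "Toy Story", relation="directed")
demo.add_edge("Toy Story", "Animation", relation="has_genre")
print(node_color_list(demo))
```

Calling `draw_graph(demo)` in a Colab cell should render the plot inline.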

[Image: example output graph (Example.png)]


📚 Requirements

  • Google Colab (recommended) or Python 3.x
  • Python packages:
      • pandas
      • matplotlib
      • networkx

All packages are already available in Colab!


🤝 Contributing

Pull requests are welcome!
Feel free to open an issue if you have ideas, questions, or improvements.


📜 License

This project is open-source and free to use under the MIT License.


⭐ If you like it, please give the repo a ⭐ on GitHub!