This project explores the relationship between the Best Actress winner at the Academy Awards (Oscar) and the influence of other major awards. By analyzing the winners over the last decade (2014-2023), the goal is to uncover patterns and trends that can help predict this year's winner using Machine Learning.
- Analyze the last 10 years of Best Actress Oscar nominees historical.
- Examine the influence of other prestigious awards.
- Train a Logistic Regression Machine Learning (ML) model to predict this year’s Best Actress at the Oscars.
The following major awards were analyzed to evaluate their influence on the Best Actress winner at the Oscars:
- Golden Globes
- SAG Awards
- BAFTA
- Critics' Choice Awards
The dataset includes the following columns:
- ano: Year of the award season
- nome: Name of the actress
- indicada_sag: Nominated for the SAG Awards (Yes [1] /No [0])
- indicada_gg: Nominated for the Golden Globe Awards (Yes [1] /No [0])
- indicada_os: Nominated for the Oscar (Yes [1] /No [0])
- indicada_bafta: Nominated for the BAFTA (Yes [1] /No [0])
- indicada_cc: Nominated for the Critics' Choice Awards (Yes [1] /No [0])
- ganhou_sag: Won the SAG Award (Yes [1] /No [0])
- ganhou_bafta: Won the BAFTA (Yes [1] /No [0])
- ganhou_cc: Won the Critics' Choice Awards (Yes [1] /No [0])
- ganhou_gg: Won the Golden Globe (Yes [1] /No [0])
- ganhou_os: Won the Oscar (Yes [1] /No [0])
Note: The column ganhou_os (Won Academy Awards) is the target variable we are predicting.
For the prediction of the Best Actress winner, were employed:
- Logistic Regression: A simple yet effective model for binary classification, useful for predicting outcomes based on the given features.
- Python: For data analysis and model training
- BeautifulSoup: For web scraping
- Pandas: For data manipulation
- Matplotlib/Seaborn: For data visualization
Finally our model achieved the following results:
-
Precision and Recall of 0.500: These values indicate that the model is making a notable amount of errors in both directions—misclassifying positive examples (false positives) and failing to identify positive examples (false negatives). This is likely due to the scarcity of positive instances (1s) in the dataset.
-
F1-score of 0.500: This reflects the balanced relationship between precision and recall, showing that the model performs equally in both error directions.
-
Accuracy of 0.750: This suggests the model is making correct predictions in the majority of cases. However, it’s worth noting that accuracy might be skewed due to the dominance of class 0.
-
Remembering these results are influenced by the small sample size, as only one winner is selected per category each year. Based on our analysis, the two most probable candidates for winning the 97º Best Actress winner Oscars award are Demi Moore and Mikey Madison.
Important Note: The model is not capable of accounting for variables influenced by human bias, and predictions are based solely on available data, which may not capture all real-world influences.
- Add additional features such as gender, age, critic ratings, and news articles. This will help not only to analyze potential correlations but also to see if these features can improve the model's performance.
- Train different models to compare metrics
- Clone the repository:
git clone https://github.com/JamboTechUFPB/oscar-prediction/
- Navigate to the project folder:
cd oscar-prediction/
If you want to contribute, follow these steps:
- Fork the repository
- Create a branch for your feature:
git checkout -b my-feature
- Commit your changes:
git commit -m 'Adding new feature' - Push to the remote repository:
git push origin my-feature
- Open a Pull Request, we will analyze this
