Man vs Machine! Predicting the Academy Awards

Andrew Nolan

2023-03-25

Every year I try my best to complete the Oscars Death Race. I enjoy watching all the movies and I appreciate the chance to learn about and watch some films I would not have watched otherwise. Like this year, one of my favorite films was Mrs. Harris Goes to Paris, not the type of movie I would normally watch. But this post isn't about movies, it's about machine learning.

I didn't predict any of them would win. But I'm glad they did! Especially the Everything Everywhere all at once folks, they deserved it!

AI and machine learning are all the rage right now. This isn't another generative text algorithm post though. In an effort to teach myself more about machine learning and refine my skills, I decided to make this year's Oscar watching experience a challenge. Can I train an algorithm to predict the Academy Award winners better than I can.

And the results are, I can! I will admit none of the models I trained performed that well, but they did beat me this year. (Although the Oscar results this year were somewhat surprising, that's a discussion for another time).

The following table summarizes the predictions, training, and final accuracy of me and the five models.

Prediction Results

Category	My Predictions	Naive Bayes	KNN	Logistic Regression	Random Forest	SVM	Actual Results
actor in a leading role	Austin Butler - Elvis	The Banshees of Inisherin	Elvis	The Banshees of Inisherin	The Banshees of Inisherin	Elvis	The Whale
actor in a supporting role	Brendan Gleeson - The Banshees of Inisherin	Everything Everywhere All at Once	Everything Everywhere All at Once	The Banshees of Inisherin	Everything Everywhere All at Once	Causeway	Everything Everywhere All at Once
actress in a leading role	Cate Blanchett - Tar	Everything Everywhere All at Once	Everything Everywhere All at Once	To Leslie	Everything Everywhere All at Once	Tar	Everything Everywhere All at Once
actress in a supporting role	Kerry Condon - The Banshees of Inisherin	Everything Everywhere All at Once	Everything Everywhere All at Once	The Banshees of Inisherin	Everything Everywhere All at Once	Black Panther: Wakanda Forever	Everything Everywhere All at Once
animated feature film	Guillermo del Toro's Pinocchio	Guillermo del Toro's Pinocchio	Guillermo del Toro's Pinocchio	Puss in Boots: The Last Wish	Puss in Boots: The Last Wish	Guillermo del Toro's Pinocchio	Guillermo del Toro's Pinocchio
cinematography	All Quiet on the Western Front	All Quiet on the Western Front	All Quiet on the Western Front	Tar	All Quiet on the Western Front	Bardo, False Chronicle of a Handful of Truths	All Quiet on the Western Front
costume design	Everything Everywhere All at Once	Everything Everywhere All at Once	Everything Everywhere All at Once	Everything Everywhere All at Once	Everything Everywhere All at Once	Black Panther: Wakanda Forever	Black Panther: Wakanda Forever
directing	Everything Everywhere All at Once	Everything Everywhere All at Once	Everything Everywhere All at Once	Triangle of Sadness	Everything Everywhere All at Once	The Banshees of Inisherin	Everything Everywhere All at Once
documentary feature film	Navalny	Fire of Love	All That Breathes	Fire of Love	All the Beauty and the Bloodshed	All That Breathes	Navalny
documentary short film	The Elephant Whisperers	The Elephant Whisperers	The Elephant Whisperers	The Elephant Whisperers	Haulout	The Elephant Whisperers	The Elephant Whisperers
film editing	Top Gun: Maverick	Everything Everywhere All at Once	Everything Everywhere All at Once	Top Gun: Maverick	Everything Everywhere All at Once	Elvis	Everything Everywhere All at Once
international feature film	All Quiet on the Western Front	All Quiet on the Western Front	All Quiet on the Western Front	The Quiet Girl	All Quiet on the Western Front	Argentina, 1985	All Quiet on the Western Front
makeup and hairstyling	Elvis	Black Panther: Wakanda Forever	All Quiet on the Western Front	The Batman	All Quiet on the Western Front	Black Panther: Wakanda Forever	The Whale
music (original score)	Babylon	Everything Everywhere All at Once	All Quiet on the Western Front	Everything Everywhere All at Once	Everything Everywhere All at Once	Babylon	All Quiet on the Western Front
music (original song)	RRR	Everything Everywhere All at Once	Everything Everywhere All at Once	RRR	Everything Everywhere All at Once	Black Panther: Wakanda Forever	RRR
best picture	Everything Everywhere All at Once	Everything Everywhere All at Once	All Quiet on the Western Front	Top Gun: Maverick	The Banshees of Inisherin	Avatar: The Way of Water	Everything Everywhere All at Once
production design	Babylon	Avatar: The Way of Water	All Quiet on the Western Front	Babylon	All Quiet on the Western Front	Avatar: The Way of Water	All Quiet on the Western Front
short film (animated)	Ice Merchants	My Year of Dicks	The Boy, the Mole, the Fox and the Horse	Ice Merchants	Ice Merchants	The Boy, the Mole, the Fox and the Horse	The Boy, the Mole, the Fox and the Horse
short film (live action)	Le Pupille	Le Pupille	An Irish Goodbye	The Red Suitcase	Le Pupille	An Irish Goodbye	An Irish Goodbye
sound	All Quiet on the Western Front	Avatar: The Way of Water	All Quiet on the Western Front	Top Gun: Maverick	All Quiet on the Western Front	Avatar: The Way of Water	Top Gun: Maverick
visual effects	Avatar: The Way of Water	Black Panther: Wakanda Forever	All Quiet on the Western Front	Top Gun: Maverick	All Quiet on the Western Front	Avatar: The Way of Water	Avatar: The Way of Water
writing (adapted screenplay)	Women Talking	All Quiet on the Western Front	All Quiet on the Western Front	Top Gun: Maverick	All Quiet on the Western Front	All Quiet on the Western Front	Women Talking
writing (original screenplay)	Everything Everywhere All at Once	Everything Everywhere All at Once	Everything Everywhere All at Once	Triangle of Sadness	Everything Everywhere All at Once	Everything Everywhere All at Once	Everything Everywhere All at Once
Training Accuracy	60.86%*	66.15%	79.68%	79.68%	80.27%	79.68%	-
True Accuracy	11/23 (47%)	11/23 (47%)	14/23 (61%)	4/23 (17%)	8/23 (35%)	6/23 (26%)	-

*My training accuracy is based on the average of my predictions the past 3 years.

Results

K-nearest neighbors (KNN) performed the best and Logistic Regression performed the worst. Interesting to see how well or poorly they both generalized considering their training accuracy (basic cross-validation).

I originally expected Random Forest to be the most successful as it is an ensemble method and it's often considered one of the best (non-deep learning) models for classifications. It also has more hyperparameters for me to play with and tune, so I was optimistic it would be a stronger model.

KNN did the best though. In the code I did hyperparameter tuning on each of my models. Although I only tuned two for KNN, in the final model it was determined that Manhattan Distance and 20 neighbors were the best parameters to use. So that is what our winning algorithm did.

Although KNN won, the other models (mostly) did well too! The expected value of predictions is 4.5 out of 23 if you guess completely randomly (1/5 * 22 + 1/10). All categories have 5 nominees except best picture which has 10. Therefore, all of these models did better than true random except Logistic Regression.

That said, I should mention, the winners were chosen based on the highest probability of winning according to the models. All the models, except Naive Bayes, predicted it was more likely every movie lost every category. But obviously we can't have that so I used the highest probabilities for winning (lowest odds of losing) as each winner.

I personally only got 11 right, so I lost to the machine this year. But this was a fun learning experience and I hope to try it again next year! Although, if I improve the code I'll probably just lose even harder...

Technical Details

This was a pretty bare bones machine learning experiment. It's been a while since I have worked with machine learning so this was a reintroduction for me, and thus I kept it simple.

If you want to check out the code, you can take a look at the Git repo and mess around with it yourself here.

This project went through the full data mining process. After searching for a while, I found a Kaggle dataset that had all of the Academy Award data since the first Academy Awards in 1929. I downloaded that and we were off to a good start.

We needed to do some data cleaning before we could use it. I converted all special characters in the titles to their unaccented versions. This was needed to webscrape Letterboxd (more about that below). Any row without a film (like special awards) we didn't need, so we removed those rows.

The second step in the process was to do the web scraping and feature building. Nominees and categories from the Kaggle set were a good start but not much to work with. To build out the dataset, I used two additional sources of data, Letterboxd and Twitter.

Letterboxd is the top film-watching social media (in my opinion). I figured the ratings on Letterboxd may be useful data for determining if a film is an award winner. Using the Letterboxdpy package I scraped data for each film in the dataset. I collected the average user rating and also the genre from here.

From Twitter, I used the snscrape package to search for the film title + "oscars" and film title + "academy awards." With that data we collected the average likes and retweets on some of the search results to determine a metric for engagement. I also used the textblob package to perform a quick sentiment analysis on the data. This gave us additional features of sentiment and subjectivity of the tweets.

For both Twitter and Letterboxd, if any movies were included more than once, I reused the data instead of scraping again. This was a small optimization for improving the time it took to build the data. Scraping the web data is pretty slow, so the process to build the data set (not even training models yet) took about 6 hours to run.

With the data collected, I was almost ready to jump into the fun machine learning stuff.

First we needed to preprocess all this data. There were a few steps to this.

For the training data, we dropped any rows with NaN (not a number) or null values. This mostly included special awards (a category we weren't predicting) and some older films. The dataset was around 10,000 records long, so dropping a few still left us with a good set of data to train on.

For this year's data, I couldn't drop any rows since we need to predict on all this year's results! The only NaN values in this case were when letterboxd scaping failed. To account for that, I imputed the missing ratings as averages of the other ratings. Not perfect, but it at least got us working data. This could be improved next time.

For the "genre" and "nomination category" features, I needed to convert these from nominal variables to categorical variables. In order to make this something useful we could train and predict on, I used one-hot encoding to make each category into its own column/variable.

After creating the one-hot encodings I had to align the test data and this year's unknown data to ensure that there were no missing columns when we went to predict later.

Now that we had all the data ready we could start machine learning!

First I plotted a correlation heatmap to show the relationships between all the features. Unfortunately, due to all the one hot columns it got pretty big...

Heatmap of the correlation of all the variables we are going to train on.

Both text analytics values, sentiment and subjectivity seemed correlated. The film count (number of times a film was nominated) correlated a bit with its Letterboxd rating, sentiment, and subjectivity as well.

Not too surprising, Animated Short and the Animation Genre also had some strong correlations.

With the data ready, we jumped into training. I used scikit-learn to train the models. Since I was only using basic classification algorithms, they trained quickly. This let me implement several different models. I chose 5 different types of classification algorithms to train, a fun bonus experiment to see which model worked best:

Gaussian Naive Bayes
Random Forest
K-Nearest Neighbors
Support Vector Classification
Logistic Regression

During the training sci-kit made it super easy. We used 20% of the data for cross validation to get our training accuracy. I also used the builtin Randomized Search CV and Grid Search CV for hyperparameter tuning on the models that supported it.

Once all of this was there I predicted the outcomes using each model and saved it to a CSV. The results are seen in the table above.

I used a lot of great packages to make this work. Apart from machine learning, my biggest takeaway from this experience was how massive the network of Python packages is! People really have done everything, so it's a very easy ecosystem to jump into and just start working. The packages I used are:

letterboxdpy- to scrape Letterboxd data.
numpy- you always need this one.
pandas- to create dataframes and handle the data.
scikit-learn- to train the machine learning models.
seaborn- for data visualization.
snscrape- to scrape Twitter data.
textblob- for sentiment analysis.
unidecode- To clean up special characters

Future Work

I had a lot of fun with this project this year, but there is still a lot of room to grow. The Oscars aren't going away, so next year we will be back with, hopefully, an even stronger set of models!

This code was very basic and can be optimized in the future. A few ideas I have for improving these predictions next year are:

Additional Data Points

We used Twitter and Letterboxd data this year. Some other features that may be helpful to include in the model are:

Rotten Tomatoes data, either Audience or Critic scores (maybe both).
Other awards data: For example, Golden Globes or BAFTA winners.
Scraping another social media, maybe Reddit?
Google search trends.

I think these additional data points may reveal some additional patterns we can use to train the models.

Deep Learning

Speaking of training models. This year I focused on basic, classical machine learning models. Next year, I think it could be good to do some deep learning and build a neural network from this dataset. The hope would be a multilayered neural net can uncover more patterns we didn't see and perform with even better accuracy.

More Specific Models

This year, the models were trained on all data to predict for all categories. This worked pretty well. But, it may work better to train models specifically for each category instead of generalizing to all categories. This would be more time to train but may get better results.

This was a fun project, and I am sad I lost. Next year I hope to train an even better model and lose by even more! (I'm sure I'll get over it by then).

Thanks for reading! If you want to check out the code you can find it in this Git repo.

Andrew's Blog