Jerome Chua

Movie Genre Prediction

Presentation Slides

GitHub Repo

Wrote a multi-label classification algorithm to predict the genres of unseen movie posters, with a model accuracy of 0.8

Problem Statement

Can we predict the label set (genres) of unseen posters by analysing training instances whose label sets are known?
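
To make the multi-label framing concrete, here is a minimal sketch of how a poster's genres could be encoded as a binary indicator vector, and how per-genre scores could be turned back into a predicted label set. The helper names (`encode_genres`, `decode_scores`), the toy genre lists and the 0.5 threshold are assumptions for illustration and are not taken from the project code.

```python
def encode_genres(genre_lists):
    """Turn per-poster genre lists into binary indicator rows.

    Returns the sorted list of genre names and one 0/1 row per poster.
    """
    classes = sorted({g for genres in genre_lists for g in genres})
    index = {g: i for i, g in enumerate(classes)}
    rows = []
    for genres in genre_lists:
        row = [0] * len(classes)
        for g in genres:
            row[index[g]] = 1
        rows.append(row)
    return classes, rows


def decode_scores(scores, classes, threshold=0.5):
    """Keep every genre whose score reaches the threshold (assumed 0.5)."""
    return [g for g, s in zip(classes, scores) if s >= threshold]


# Hypothetical toy data: two posters, three genres overall.
classes, rows = encode_genres([["Drama", "Comedy"], ["Horror"]])
print(classes)  # genre names in column order
print(rows)     # one indicator row per poster

# Hypothetical per-genre scores for one unseen poster.
print(decode_scores([0.2, 0.8, 0.6], classes))
```

Unlike single-label classification, each poster can switch on several columns at once, which is why each genre is thresholded independently instead of picking only the highest-scoring one.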

Technologies Used

Image Processing: scikit-learn
Data Modelling: keras, pandas, numpy
Data Visualisation: seaborn, matplotlib
Others: os, itertools, collections

Challenges

Over-representation of certain genres
Subjective movie labelling by IMDb sources, and varied posters per release
Limited packages for multi-label classification problems, e.g. iterative stratification for splitting data effectively, one-error loss, etc.
Transfer learning proved to be of little value in this multi-label problem. A ConvNet was our model of choice, as it achieved a higher model accuracy (0.9) than VGG16 (0.6)
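
As an illustration of the one-error loss mentioned above, the sketch below computes it by hand: the fraction of posters whose single highest-scored genre is not among the true genres. The function name `one_error`, the data layout (one dict of genre scores per poster) and the toy values are assumptions, not the project's actual evaluation code.

```python
def one_error(true_label_sets, score_dicts):
    """Fraction of examples whose top-scored label is not a true label.

    true_label_sets: list of sets of genre names.
    score_dicts: list of dicts mapping genre name to predicted score.
    """
    misses = 0
    for truth, scores in zip(true_label_sets, score_dicts):
        top_genre = max(scores, key=scores.get)
        if top_genre not in truth:
            misses += 1
    return misses / len(true_label_sets)


# Hypothetical toy data: three posters, four genres.
truth = [{"Drama", "Romance"}, {"Horror"}, {"Comedy"}]
scores = [
    {"Drama": 0.7, "Romance": 0.6, "Horror": 0.1, "Comedy": 0.2},
    {"Drama": 0.5, "Romance": 0.1, "Horror": 0.4, "Comedy": 0.2},
    {"Drama": 0.1, "Romance": 0.2, "Horror": 0.1, "Comedy": 0.9},
]

# Only the second poster has a wrong top-ranked genre (Drama instead of Horror).
print(one_error(truth, scores))
```

Lower is better: a value of 0 means the top-ranked genre was always one of the true genres.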

What I'd Do Differently

Use SMOTE together with under-sampling to heavily over-sample the minority classes and under-sample the majority classes, creating a large number of distinct training sets to increase performance
Employ iterative stratification when splitting the data into train and test sets
Further explore interpretability of the ConvNet
Deepen understanding of inner workings of the ConvNet
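
The iterative stratification idea listed above can be sketched in plain Python. This is a simplified, greedy version loosely following the procedure described by Sechidis et al. (2011): repeatedly take the rarest remaining genre and hand each of its posters to the subset that still "wants" that genre the most. The function name, the toy data and the tie-breaking details are assumptions for illustration, not the project's implementation.

```python
import random
from collections import defaultdict


def iterative_stratified_split(label_sets, proportions=(0.8, 0.2), seed=0):
    """Split example indices so each subset keeps roughly the same genre mix."""
    rng = random.Random(seed)
    remaining = set(range(len(label_sets)))
    subsets = [[] for _ in proportions]

    # Desired number of examples per subset, overall and per label.
    want_total = [len(label_sets) * p for p in proportions]
    label_counts = defaultdict(int)
    for labels in label_sets:
        for lab in set(labels):
            label_counts[lab] += 1
    want_label = [
        {lab: count * p for lab, count in label_counts.items()}
        for p in proportions
    ]

    def assign(idx, target):
        subsets[target].append(idx)
        remaining.discard(idx)
        want_total[target] -= 1
        for lab in set(label_sets[idx]):
            want_label[target][lab] -= 1

    while True:
        # Group the still-unassigned examples by label.
        left = defaultdict(list)
        for idx in sorted(remaining):
            for lab in set(label_sets[idx]):
                left[lab].append(idx)
        if not left:
            break
        # Handle the rarest remaining label first (ties broken by name).
        rarest = min(left, key=lambda lab: (len(left[lab]), lab))
        for idx in left[rarest]:
            # Prefer the subset that most needs this label, then the one
            # that most needs examples overall, then pick at random.
            target = max(
                range(len(proportions)),
                key=lambda j: (want_label[j][rarest], want_total[j], rng.random()),
            )
            assign(idx, target)

    # Examples without any label go wherever the most room is left.
    for idx in sorted(remaining):
        target = max(
            range(len(proportions)),
            key=lambda j: (want_total[j], rng.random()),
        )
        assign(idx, target)
    return subsets


# Hypothetical toy data: Drama is over-represented, Horror and Comedy are rare.
posters = [
    {"Drama"}, {"Drama"}, {"Drama", "Comedy"}, {"Drama", "Comedy"},
    {"Drama"}, {"Drama"}, {"Horror"}, {"Horror"},
]
train, test = iterative_stratified_split(posters, proportions=(0.5, 0.5))
print(sorted(train), sorted(test))
```

A plain random split could easily place both Horror posters in the same subset; handling rare genres first is what keeps them represented on both sides.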