Jerome Chua

Movie Genre Prediction

Presentation Slides
GitHub Repo

Wrote a multi-label classification algorithm to predict the genres of unseen movie posters with 0.8 model accuracy

Problem Statement

  • Can we predict the label set of unseen posters by learning from training instances whose label sets are known?
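
To make the multi-label framing concrete, here is a minimal sketch of how a poster's genres can be represented as a multi-hot vector and how per-genre scores can be turned back into a label set. The genre vocabulary, the function names, and the 0.5 threshold are illustrative assumptions, not taken from the project.

```python
# Hypothetical genre vocabulary; the real project would use its own list.
GENRES = ["Action", "Comedy", "Drama", "Horror", "Romance"]


def encode_genres(genres, vocabulary=GENRES):
    """Turn a collection of genre names into a multi-hot list (1 = genre present)."""
    present = set(genres)
    return [1 if genre in present else 0 for genre in vocabulary]


def decode_scores(scores, vocabulary=GENRES, threshold=0.5):
    """Keep every genre whose predicted score reaches the (assumed) threshold."""
    return [genre for genre, score in zip(vocabulary, scores) if score >= threshold]


# A poster can carry several genres at once, unlike single-label classification.
target = encode_genres(["Drama", "Romance"])
predicted = decode_scores([0.1, 0.7, 0.9, 0.2, 0.4])
```

Each position in the vector is an independent yes/no decision, which is why a poster can end up with zero, one, or many predicted genres.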

Technologies Used

  • Image Processing: scikit-learn
  • Data Modelling: keras, pandas, numpy
  • Data Visualisation: seaborn, matplotlib
  • Others: os, itertools, collections

Challenges

  • Over-representation of certain genres
  • Subjective genre labelling by IMDb sources, and varied posters for the same release
  • Limited packages for multi-label classification problems, e.g. iterative stratification for splitting data effectively, one-error loss, etc.
  • Transfer learning proved to be of little value in this multi-label problem: a ConvNet reached a model accuracy of 0.9, compared with 0.6 for VGG16, and became our model of choice
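
As an illustration of the one-error measure mentioned above, the sketch below computes it by hand: the fraction of posters whose single highest-scored genre is not among the true genres. The function name and the example scores are made up for illustration and are not results from the project.

```python
def one_error(true_label_sets, score_rows):
    """Fraction of samples whose top-scored label is not one of the true labels.

    true_label_sets: list of sets of genre names, one per sample.
    score_rows: list of dicts mapping genre name -> predicted score.
    """
    misses = 0
    for true_labels, scores in zip(true_label_sets, score_rows):
        top_label = max(scores, key=scores.get)  # highest-scored genre
        if top_label not in true_labels:
            misses += 1
    return misses / len(true_label_sets)


# Made-up example: the top genre is wrong for the second and fourth posters.
true_sets = [{"Drama", "Romance"}, {"Horror"}, {"Comedy"}, {"Action"}]
scores = [
    {"Drama": 0.9, "Romance": 0.4, "Horror": 0.1},
    {"Drama": 0.6, "Horror": 0.5},
    {"Comedy": 0.7, "Drama": 0.2},
    {"Action": 0.3, "Comedy": 0.8},
]
error = one_error(true_sets, scores)
```

Lower is better: a one-error of 0 means the top-ranked genre was always a correct one.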

What I'd Do Differently

  • Resample the data, undersampling majority genres and oversampling minority genres (e.g. with SMOTE), to create a large number of distinct training sets and increase performance
  • Employ iterative stratification when splitting the data into train and test sets
  • Further explore interpretability of the ConvNet
  • Deepen understanding of inner workings of the ConvNet
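
The iterative stratification idea can be sketched without any extra packages. The version below is a simplified, dependency-free take on it: it repeatedly picks the rarest remaining genre and hands each of its posters to whichever split still needs that genre most. The function name, the deterministic tie-breaking (the published algorithm breaks ties randomly), and the two-way split are assumptions made for illustration.

```python
from collections import defaultdict


def iterative_train_test_split(label_sets, test_ratio=0.2):
    """Simplified iterative stratification; returns (train_indices, test_indices).

    label_sets: list of sets of labels, one per sample.
    """
    ratios = [1.0 - test_ratio, test_ratio]  # subset 0 = train, subset 1 = test
    n = len(label_sets)
    remaining = set(range(n))

    # Desired number of samples per subset, overall and per label.
    want_total = [n * r for r in ratios]
    label_counts = defaultdict(int)
    for labels in label_sets:
        for label in labels:
            label_counts[label] += 1
    want_label = [{lab: c * r for lab, c in label_counts.items()} for r in ratios]

    subsets = [[], []]

    def assign(idx, label=None):
        # Prefer the subset that most needs this label, then the one that
        # most needs samples overall; remaining ties go to the first subset.
        def need(j):
            by_label = want_label[j][label] if label is not None else 0.0
            return (by_label, want_total[j])

        j = max(range(len(ratios)), key=need)
        subsets[j].append(idx)
        remaining.discard(idx)
        want_total[j] -= 1
        for lab in label_sets[idx]:
            want_label[j][lab] -= 1

    while remaining:
        # Group the unassigned samples by the labels they carry.
        left = defaultdict(list)
        for idx in sorted(remaining):
            for lab in label_sets[idx]:
                left[lab].append(idx)
        if not left:
            # Only unlabelled samples remain: place them by overall need.
            for idx in sorted(remaining):
                assign(idx)
            break
        # Handle the rarest remaining label first so it is spread fairly.
        rarest = min(left, key=lambda lab: (len(left[lab]), str(lab)))
        for idx in left[rarest]:
            assign(idx, rarest)

    return sorted(subsets[0]), sorted(subsets[1])
```

Handling rare genres first matters because a plain random split can easily leave a minority genre entirely out of the test set, which is the failure this approach is meant to avoid.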