Last summer (2015), as I put myself through the paces of this brilliant course by one of my personal heroes, Andrew Ng, I grew exceedingly confident in my ability to implement complex machine learning approaches (I credit Dr. Ng). Consequently, upon finishing the course, I jumped straight into [what I later realized was] the deep end by signing up for the Metis¹ Naive Bees Classifier challenge, hosted by DrivenData.org².
Although my main intention was just to get my hands dirty with machine learning code, I quickly realized that my approach to training an algorithm to differentiate between bee genera was rather, well… naive: I was trying to extract the dominant colors from the training images, using either Principal Components Analysis or K-Means clustering, and then run a classifier on this much smaller subspace of features. This turned out to be an ill-informed strategy (I'm too embarrassed to post the training error), simply because… well, take a look at some of the training images for yourself:
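For the curious, the dominant-color idea can be sketched roughly as follows. This is a minimal illustration using scikit-learn's `KMeans`, not my actual pipeline: the synthetic 64×64 image and the choice of k = 5 clusters are assumptions made purely for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for one training photo: a synthetic 64x64 RGB
# image with values in [0, 1]. In practice this would be a loaded bee image.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))

# Flatten the image into a (n_pixels, 3) matrix of RGB values.
pixels = image.reshape(-1, 3)

# Cluster the pixel colors; the k cluster centers act as the image's
# "dominant colors".
k = 5  # illustrative choice
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
dominant_colors = kmeans.cluster_centers_  # shape (k, 3)

# A compact per-image feature vector: the k dominant colors, flattened.
# A classifier would then be trained on these vectors instead of raw pixels.
features = dominant_colors.ravel()  # shape (3 * k,)
print(features.shape)
```

The appeal of this reduction is that each image collapses from thousands of pixels to a handful of color coordinates; its weakness, as the training images make painfully clear, is that bees of different genera photographed against similar backgrounds produce nearly identical dominant colors.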