K-Means Clustering in R: Unsupervised Color Extraction from an Image

Last summer (2015), as I put myself through the paces in this brilliant course by one of my personal heroes, Andrew Ng, I grew exceedingly confident about my ability to implement complex machine learning approaches (I blame credit Dr. Ng). Consequently, upon finishing the course, I jumped straight into [what I later realized was] the deep end by signing up for the Metis¹ Naive Bees Classifier challenge, hosted by DrivenData.org² .

Nevertheless, despite the fact that my main intention was just to get my hands dirty with machine learning code,  I quickly realized that my approach to training an algorithm to differentiate between the Bees genus was rather, well… naive: I was trying to extract the dominant colors from the training images, using either Principal Components Analysis or K-Means clustering; once done, I wanted to run a classifier on this much smaller subspace of features. This turned out to be an ill-informed strategy – too embarrassed to post the training error –  simply because… well, take a look at some of the training images for yourself:

[Click “Read More” to read how I explored Kmeans clustering on these images.]

Read More »