Saturday, January 18, 2014

Machine Learning - Stanford@Coursera

So, I just got done with another course on Coursera; this time it is Andrew Ng's Machine Learning. I have tried to summarize what I've learnt over the past 10 weeks. This post may end up being a bit too technical and one dimensional, and that's because the course goes depth-wise into Machine Learning; even so, it is still just a baby step into that field.

To start, what is Machine Learning? There are several definitions available online and I'm not going to repeat one of them here. Instead, let me tell you what Machine Learning is not. Machine Learning is certainly not Big Data; it precedes MapReduce and NoSQL by a couple of decades, and it can work without terabytes of data. The programming assignments in this course covered a wide range of applications such as handwriting recognition, predicting house prices and image compression, and all of them used modest amounts of training data. Big Data, in turn, is not used solely for predictive analysis (Machine Learning) problems; it can be used for simple applications like searching log files for the keywords that occur most often. That said, Machine Learning is supposed to work well with large amounts of data, so there is a considerable overlap between the two fields. Technologies like Apache Mahout operate in this space.

From a coding perspective, I could say that Machine Learning is Matrix Manipulation :) or at least, it often boils down to that. The matrix could represent anything: the features of a product, a pixel map of a photograph, or a vector representation of an audio clip. Matrix manipulations are computationally intensive, i.e. they take a lot of time and memory. Writing efficient algorithms for matrix operations is more of a mathematician's job than an engineer's. Fortunately, prepackaged solutions exist in tools such as Matlab or Octave. So, if you are an engineer working on a Machine Learning problem, most likely you wouldn't be coming up with an algorithmic solution; instead you would be doing the following:-

1. Building the data representation and choosing the most appropriate features (the columns of the matrix)
2. Reducing the number of dimensions, i.e. columns (compression)
3. Choosing the algorithm (inbuilt function), or pipeline of algorithms, most relevant to the problem
4. Fine-tuning the parameters

This seemed a little counterintuitive to me at first.
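To make that concrete, here is a toy sketch in Octave of what those steps can look like. Everything here is made up for illustration - the data, the sizes and the choice of a plain least-squares fit - none of it comes from the course itself:

% A made-up pipeline: 50 products, 4 hand-picked features, known prices.
m = 50;
X = rand(m, 4);                           % 1. data representation: one row per product
y = X * [3; 0; 1; 2] + 0.1 * randn(m, 1); % synthetic prices to fit against

% 2. compression: project onto the top 2 principal components via SVD
Xc = X - mean(X);                         % center each feature column
[U, S, V] = svd(Xc);                      % an inbuilt function does the hard math
Z = Xc * V(:, 1:2);                       % reduced feature matrix: 4 columns -> 2

% 3. algorithm choice: a least-squares fit via Octave's backslash operator
theta = [ones(m, 1), Z] \ y;              % the solver is prepackaged, not ours

% 4. fine tuning: re-run keeping 1, 2 or 3 components and compare the errors

Notice that the mathematically hard parts (the SVD, the backslash solve) are one-liners; the engineering effort is in deciding what goes into X and how much of it to keep.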

Next, for those of you who are curious specifically about this course, I can broadly classify its syllabus into three categories:-

1. The concept is lucid and so is the underlying mathematics - Linear Regression, Gradient Descent for Linear Regression, Feature Scaling and Normalization, Regularization to prevent overfitting the training set, K-means Clustering and Collaborative Filtering.

Collaborative Filtering is about building recommender systems, i.e. the ones behind "People who bought this also bought" and "You may like". If you are in e-commerce, this might interest you.
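As a taste of this first bucket, here is a minimal sketch of batch gradient descent for linear regression in Octave. The update rule is the one from the lectures; the function and variable names here are my own:

% Batch gradient descent for linear regression: repeatedly nudge theta
% in the direction that reduces the squared error over all m examples.
function theta = gradient_descent(X, y, theta, alpha, num_iters)
  m = length(y);                                % number of training examples
  for iter = 1:num_iters
    err = X * theta - y;                        % predictions minus actual values
    theta = theta - (alpha / m) * (X' * err);   % simultaneous update of every theta
  end
end

The learning rate alpha and the iteration count are the knobs to tune: too high an alpha and the cost diverges, too low and it crawls.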

2. The concept is relatively simple to implement with Octave's inbuilt functions, but its underlying mathematics is beyond the scope of the course, and beyond me as well :) - Normal Equations for Linear Regression, the Sigmoid function and Logistic Regression, Dimensionality Reduction with SVD, and Anomaly Detection using the Gaussian distribution.
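For a flavour of this second bucket, the two snippets below are the standard Octave forms of a couple of those ideas; X, y and z stand for a feature matrix, a target vector and an arbitrary input:

% Normal equation: a closed-form solution for linear regression,
% with no learning rate and no iterations to worry about.
theta = pinv(X' * X) * X' * y;   % pinv copes even when X'*X is singular

% Sigmoid: squashes any real input into (0, 1), which is what lets
% logistic regression read its output as a probability.
g = 1 ./ (1 + exp(-z));          % element-wise, so z can be a vector or matrix

One inbuilt function call each, even though the derivations behind them filled entire lectures.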

3. And finally, the part that evaporated as soon as the video lecture stopped playing :) - Neural Networks and Support Vector Machines.
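Before it all evaporates completely :) here is the one bit of Neural Networks that did stick: forward propagation. This sketch uses a made-up network (3 inputs, 4 hidden units, 1 output) with random weights; a real network would learn W1, W2, b1 and b2 via backpropagation:

% Forward propagation through a tiny network with sigmoid activations.
sigmoid = @(z) 1 ./ (1 + exp(-z));    % the same squashing function as before

x  = [0.5; -1.2; 0.3];                % one input example (3 features)
W1 = randn(4, 3);  b1 = randn(4, 1);  % hidden layer: made-up random weights
W2 = randn(1, 4);  b2 = randn(1, 1);  % output layer

a1 = sigmoid(W1 * x + b1);            % hidden layer activations
h  = sigmoid(W2 * a1 + b2);           % network output, a value in (0, 1)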

After the course, what next? I have my eyes set on a Kaggle competition about classifying galaxies: http://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge If time permits, I would like to make at least one non-trivial submission. Anyone who is interested in teaming up, please do reach out :) The prize money is $10,000 :)