How do I learn machine learning?
Rohit Malshe, Chemical Engineer, Programmer, Author, Thinker, Engineer at Intel Corporation
Updated Feb 5
I was inspired by the best one-liner about machine learning, from a genius computer scientist: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." ~ Tom M. Mitchell - Wikipedia
This is a very open-ended question that has gotten a large number of answers. Many of them are amazing and certainly add a lot of value for the audience. I read many of them, and I thank all the authors!
Machine learning is a very wide field, and a large number of resources exist for learning it. I will go over many of them, collected through some online research, add some of my own inspiration and learning, and finally try to set readers on a simple enough path that they can follow.
This is exciting science ~ look at what these amazing companies have done with it. These companies have been front-runners of technology and science. I have deliberately left Facebook and Netflix off this list, simply as a matter of choice: here I wanted to highlight companies that also build physical products and move real things from one place to another, rather than just run web platforms. That said, Facebook and Netflix are serious, extremely successful companies that showcase the finesse of machine learning!
A few months back I wrote an answer about curated paths to Data Science: Rohit Malshe's answer to Are ‘curated paths to a Data Science career’ on Coursera worth the money and time?
In the same time frame I wrote about a comparison between Python and R: Rohit Malshe's answer to Is Python better than R?
Very recently I wrote a simple article on some of the interesting things to do with Python. Rohit Malshe's answer to What are some interesting things to do with Python?
All these answers got several upvotes and views, hence I wanted to make a good attempt at an elaborate answer to this question. I would certainly encourage readers to go over those answers too, because they were written earlier than this one and convey messages on those topics in detail. I don't want to copy and paste text from them here, but I do want to mention that they carry a lot of value.
Now the answer:
First, the inspiration part:
I would like to inspire readers to take a look at the Wikipedia page on machine learning, and more specifically at its applications section. There are a large number of applications, and one can easily get lost choosing among them.
Applications of machine learning include the following:
Adaptive websites
Affective computing
Bioinformatics
Brain-machine interfaces
Cheminformatics
Classifying DNA sequences
Computational anatomy
Computer vision, including object recognition
Detecting credit card fraud
Game playing
Information retrieval
Internet fraud detection
Marketing
Machine perception
Medical diagnosis
Economics
Natural language processing
Natural language understanding
Optimization and metaheuristic
Online advertising
Recommender systems
Robot locomotion
Search engines
Sentiment analysis (or opinion mining)
Sequence mining
Software engineering
Speech and handwriting recognition
Stock market analysis
Structural health monitoring
Syntactic pattern recognition
User behavior analytics
My personal inspiration has been to go in an order. The order that felt natural to me was ~ structured text data, unstructured text data, human comments (sentiment-analysis-type projects), image data, voice data. I don't want to learn just anything, and learning everything is simply impossible, so this simple order has helped me very much as a beginner.
For text data, a simple pipeline looks somewhat like the one shown in the picture below. More or less, any engineer at any of these or other companies should be able to build a pipeline like this.
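As a minimal sketch of what such a text pipeline might look like in scikit-learn ~ the example texts and labels below are made up purely for illustration, not from any real dataset:

```python
# A toy text-classification pipeline: raw text -> TF-IDF features -> classifier.
# The texts and labels here are hypothetical stand-ins for real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

texts = ["great product", "terrible support", "works fine", "broke in a week"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# stratify keeps one example of each class in the training split
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),   # raw text -> sparse TF-IDF matrix
    ("clf", LogisticRegression()),  # TF-IDF features -> class prediction
])
pipe.fit(X_train, y_train)
print("Accuracy:", pipe.score(X_test, y_test))
```

On a real project, the `texts` list would be replaced by a column read from a database or CSV file, but the shape of the pipeline stays the same.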
90% of my job has to do with text data, and the only 5 languages that have helped me excel are ~ SQL, Python, R, VBScript, and C#. Wow! That is a very small number of things to learn, and with them one can achieve a lot. Similar other languages exist that one can learn. The other day I asked a data scientist what she does. Her reply: "I'm on a big data team that creates Spark Streaming applications in Scala on the Cloudera distribution of Hadoop, with Cassandra as a data store. Most of my work is accomplished using Hive, with a bit of Bash, Python, and R." Now, I admit that I haven't used Hive and haven't explored Scala much, but could I if I wanted to? Yes, absolutely. It would take me a few weeks to learn Hive and Scala and switch my mindset toward them.
Second, the science part:
I got my hands on the Kaggle website and looked up some housing market data. Seriously, I had no clue how house prices are determined, but I was able to predict a lot about them without knowing everything inside and out. Isn't that the beauty of Data Science! You don't even need to know the science to make a significant impact ~ but is this completely true?
No! My inspiration is: before you do anything with data at all, know what the science tells you. So talk to a realtor, look at the Redfin website, and see which things about housing data seem important. Take a tour with a realtor to see various houses and understand what all those features mean. That will make you a better data scientist.
In simple words: Data Scientist + Scientist = Better Data Scientist.
The housing market data carried way too many features, and it was pretty clear early on that a ridge-type regression model could work nicely for this one.
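To show what a ridge-type model looks like in practice, here is a small sketch with scikit-learn's RidgeCV ~ the data below is synthetic, standing in for the real Kaggle housing features:

```python
# Ridge regression with cross-validated regularization strength.
# Synthetic data stands in for the Kaggle housing features: many columns,
# but only a handful actually drive the target.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.RandomState(0)
X = rng.randn(200, 50)                       # 200 "houses", 50 features
true_coefs = np.zeros(50)
true_coefs[:5] = [3, -2, 1.5, 1, -1]         # only 5 features matter
y = X @ true_coefs + rng.randn(200) * 0.5    # target plus noise

ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
ridge.fit(X, y)
print("Best alpha:", ridge.alpha_)
print("R^2 on training data:", ridge.score(X, y))
```

The cross-validation picks the regularization strength automatically, which is exactly the kind of tuning the housing problem needs when there are too many features to reason about one by one.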
To give you some more inspiration on the housing market data, one can, for example, use ElasticNet:
# 4* ElasticNet
# Assumes X_train, X_test, y_train, y_test and the helper functions
# rmse_cv_train / rmse_cv_test are defined earlier in the notebook.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import ElasticNetCV

# First pass: coarse grid over l1_ratio and alpha
elasticNet = ElasticNetCV(l1_ratio = [0.1, 0.3, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1],
                          alphas = [0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006,
                                    0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6],
                          max_iter = 50000, cv = 10)
elasticNet.fit(X_train, y_train)
alpha = elasticNet.alpha_
ratio = elasticNet.l1_ratio_
print("Best l1_ratio :", ratio)
print("Best alpha :", alpha)

# Second pass: refine l1_ratio around the first estimate
print("Try again for more precision with l1_ratio centered around " + str(ratio))
elasticNet = ElasticNetCV(l1_ratio = [ratio * .85, ratio * .9, ratio * .95, ratio,
                                      ratio * 1.05, ratio * 1.1, ratio * 1.15],
                          alphas = [0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006,
                                    0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6],
                          max_iter = 50000, cv = 10)
elasticNet.fit(X_train, y_train)
if elasticNet.l1_ratio_ > 1:
    elasticNet.l1_ratio_ = 1
alpha = elasticNet.alpha_
ratio = elasticNet.l1_ratio_
print("Best l1_ratio :", ratio)
print("Best alpha :", alpha)

# Third pass: refine alpha with l1_ratio fixed
print("Now try again for more precision on alpha, with l1_ratio fixed at " + str(ratio) +
      " and alpha centered around " + str(alpha))
elasticNet = ElasticNetCV(l1_ratio = ratio,
                          alphas = [alpha * .6, alpha * .65, alpha * .7, alpha * .75,
                                    alpha * .8, alpha * .85, alpha * .9, alpha * .95,
                                    alpha, alpha * 1.05, alpha * 1.1, alpha * 1.15,
                                    alpha * 1.25, alpha * 1.3, alpha * 1.35, alpha * 1.4],
                          max_iter = 50000, cv = 10)
elasticNet.fit(X_train, y_train)
if elasticNet.l1_ratio_ > 1:
    elasticNet.l1_ratio_ = 1
alpha = elasticNet.alpha_
ratio = elasticNet.l1_ratio_
print("Best l1_ratio :", ratio)
print("Best alpha :", alpha)

print("ElasticNet RMSE on Training set :", rmse_cv_train(elasticNet).mean())
print("ElasticNet RMSE on Test set :", rmse_cv_test(elasticNet).mean())
y_train_ela = elasticNet.predict(X_train)
y_test_ela = elasticNet.predict(X_test)

# Plot residuals against predictions
plt.scatter(y_train_ela, y_train_ela - y_train, c = "blue", marker = "s", label = "Training data")
plt.scatter(y_test_ela, y_test_ela - y_test, c = "lightgreen", marker = "s", label = "Validation data")
plt.title("Linear regression with ElasticNet regularization")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.legend(loc = "upper left")
plt.show()

# Plot predictions against real values
plt.scatter(y_train, y_train_ela, c = "blue", marker = "s", label = "Training data")
plt.scatter(y_test, y_test_ela, c = "lightgreen", marker = "s", label = "Validation data")
plt.title("Linear regression with ElasticNet regularization")
plt.xlabel("Real values")
plt.ylabel("Predicted values")
plt.legend(loc = "upper left")
plt.show()

# Plot the most important coefficients
coefs = pd.Series(elasticNet.coef_, index = X_train.columns)
print("ElasticNet picked " + str(sum(coefs != 0)) + " features and eliminated the other " +
      str(sum(coefs == 0)) + " features")
imp_coefs = pd.concat([coefs.sort_values().head(10),
                       coefs.sort_values().tail(10)])
imp_coefs.plot(kind = "barh")
plt.title("Coefficients in the ElasticNet Model")
plt.show()
As you can see, some knowledge of which model to use is important. So learn these various models one after another, from courses given by professors and from blogs written by various people online.
Third, more science, math!:
More scientifically, you should know a lot of mathematical concepts and plenty of physics before you dive into this science, because without those fundamentals it won't come in very handy.
Example: A long time ago, I was in engineering school and enrolled in a math class where I studied singular value decomposition: Singular value decomposition - Wikipedia
At the time, it made no inspirational impact on me whatsoever. I just came out with this feeling ~ "Why! Why do you want to do all this, Sir!" Today it makes a lot of sense. Why? Because using singular value decomposition, I can reduce the size of images significantly, deriving conclusions from them while making the code faster.
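A quick sketch of the idea with NumPy ~ the "image" below is a synthetic matrix rather than a real photo, but the mechanics are the same: keep only the top singular values and you store far fewer numbers while preserving most of the structure.

```python
# Rank-k approximation of an "image" via SVD.
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0, 3, 64)
# A smooth, nearly low-rank matrix stands in for a grayscale image
image = np.outer(np.sin(x), np.cos(x)) + 0.01 * rng.randn(64, 64)

U, s, Vt = np.linalg.svd(image, full_matrices=False)

k = 10  # keep only the 10 largest singular values
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Storage drops from 64*64 values to k*(2*64 + 1)
print("original values:", image.size)
print("compressed values:", k * (2 * 64 + 1))
print("relative error:", np.linalg.norm(image - approx) / np.linalg.norm(image))
```

For real photographs the needed k is larger, but the trade-off is the same: a small fraction of the singular values carries almost all of the visual information.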
In the same way, a long time ago I studied Fourier transforms and learned how to compute 2D Fourier transforms of images. It was only when I applied them practically that I got real dollar value out of them. The Fourier transform takes us from the spatial domain to the frequency domain. If I compare the Fourier transforms of two images, I can highlight the differences between the two more easily than with many other methods.
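Here is a small sketch of that comparison with NumPy's FFT ~ the two "images" are synthetic stripe patterns invented for illustration:

```python
# Comparing two images in the frequency domain: periodic structure
# concentrates into a few spectral peaks, so differences stand out.
import numpy as np

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
image_a = np.outer(np.sin(4 * x), np.ones(64))  # horizontal stripes
image_b = np.outer(np.ones(64), np.sin(8 * x))  # finer vertical stripes

# Magnitude spectra of the 2D Fourier transforms
spec_a = np.abs(np.fft.fft2(image_a))
spec_b = np.abs(np.fft.fft2(image_b))

# The largest spectral difference pinpoints where the two images'
# periodic content diverges
diff = np.abs(spec_a - spec_b)
peak = np.unravel_index(diff.argmax(), diff.shape)
print("peak spectral difference at frequency index:", peak)
```

In the spatial domain the two stripe patterns overlap everywhere; in the frequency domain each collapses to a couple of peaks, so the difference is immediately localized.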
Fourth, how do I learn the required math first?
Follow these professors:
Nando de Freitas: Nando de Freitas
Andrew Ng: http://www.andrewng.org/
There are others too, but I say start here. Look up their lectures on YouTube and go through them one after the other. This will take roughly a whole semester's worth of time, but without it, progressing down the machine learning path is a waste of time. Sitting through those lectures on YouTube will sometimes seem boring, and you will want to do something that feels practical, but be patient and do it. I did it over a period of about 1.5 months, revising as many concepts as I could, and accelerated by watching two to three lectures every day.
A very good way to learn machine learning is by going through the test cases on Kaggle: Your Home for Data Science, which is what I do every now and then, and I always end up learning something. Most of the time, I am interested in seeing how others apply machine learning concepts to projects where you may not really have much of a clue about the data. This way I have learned plenty of things when my own knowledge was not enough.
Python and R seem to be very popular in the community, and significant results can be achieved with the two. For beginners, I suggest you begin with Python. The programming will look and feel natural, and you will be able to do lots of things quickly.
Fifth, what can I do with all this quickly?
Take a look, for example, at this very small project that I did on Kaggle: Amazon Reviews: Unlocked Mobile Phones. The data comes from Amazon.com and tells me just about everything I need to know about the cell phone market in a very short time.
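To give a flavor of that kind of analysis, here is a tiny pandas sketch ~ the rows and column names below are made up for illustration, not taken from the actual Kaggle file:

```python
# A quick first look at review-style data with pandas.
# These toy rows stand in for the "Unlocked Mobile Phones" dataset;
# the column names are hypothetical.
import pandas as pd

reviews = pd.DataFrame({
    "Brand":  ["Apple", "Apple", "Samsung", "Samsung", "Nokia"],
    "Rating": [5, 4, 3, 5, 2],
    "Price":  [699, 649, 499, 549, 99],
})

# One groupby already says a lot about how the market segments
summary = reviews.groupby("Brand")[["Rating", "Price"]].mean()
print(summary)
```

With the real dataset, the same groupby-and-aggregate pattern over a few hundred thousand reviews reveals which brands and price points keep customers happy.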
A lot of people publish their workbooks there. Take a look at as many of them as you can ~ you will learn from others very quickly. Then build your own code bit by bit and add more every day.
Sixth, can I get access to some example data sets?
Yes, more than you can even handle!
US Government Data http://www.data.gov/
Reddit/r/DataSets https://www.reddit.com/r/datasets
Kaggle: Your Home for Data Science
Federal Reserve Economic Data: https://fred.stlouisfed.org/
City-Data.com - Stats about all US cities - real estate, relocation info, crime, house prices, cost of living, races, home value estimator, recent sales, income, photos, schools, maps, weather, neighborhoods, and more
Seventh, is there a long list of machine learning / data mining resources I can follow?
The following is a list of free, open-source books on machine learning, statistics, and data mining that I gathered through online search:
Real World Machine Learning [Free Chapters]
An Introduction To Statistical Learning - Book + R Code
Elements of Statistical Learning - Book
Probabilistic Programming & Bayesian Methods for Hackers - Book + IPython Notebooks
Think Bayes - Book + Python Code
Information Theory, Inference, and Learning Algorithms
Gaussian Processes for Machine Learning
Data Intensive Text Processing w/ MapReduce
Reinforcement Learning: An Introduction
Mining Massive Datasets
A First Encounter with Machine Learning
Pattern Recognition and Machine Learning
Machine Learning & Bayesian Reasoning
Introduction to Machine Learning - Alex Smola and S.V.N. Vishwanathan
A Probabilistic Theory of Pattern Recognition
Introduction to Information Retrieval
Forecasting: principles and practice
Practical Artificial Intelligence Programming in Java
Introduction to Machine Learning - Amnon Shashua
Reinforcement Learning
Machine Learning
A Quest for AI
Introduction to Applied Bayesian Statistics and Estimation for Social Scientists - Scott M. Lynch
Bayesian Modeling, Inference and Prediction
A Course in Machine Learning
Machine Learning, Neural and Statistical Classification
Bayesian Reasoning and Machine Learning Book+MatlabToolBox
R Programming for Data Science
Data Mining - Practical Machine Learning Tools and Techniques Book
Deep-Learning
Deep Learning - An MIT Press book
Natural Language Processing
Coursera Course Book on NLP
NLTK
NLP w/ Python
Foundations of Statistical Natural Language Processing
Information Retrieval
An Introduction to Information Retrieval
Neural Networks
A Brief Introduction to Neural Networks
Neural Networks and Deep Learning
Probability & Statistics
Think Stats - Book + Python Code
From Algorithms to Z-Scores - Book
The Art of R Programming - Book (Not Finished)
All of Statistics
Introduction to statistical thought
Basic Probability Theory
Introduction to probability - By Dartmouth College
Principle of Uncertainty
Probability & Statistics Cookbook
Advanced Data Analysis From An Elementary Point of View
Introduction to Probability - Book and course by MIT
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. -Book
An Introduction to Statistical Learning with Applications in R - Book
Learning Statistics Using R
Introduction to Probability and Statistics Using R - Book
Advanced R Programming - Book
Practical Regression and Anova using R - Book
R practicals - Book
The R Inferno - Book
Linear Algebra
Linear Algebra Done Wrong
Linear Algebra, Theory, and Applications
Convex Optimization
Applied Numerical Computing
Applied Numerical Linear Algebra
All the best learning machine learning! Stay blessed and stay inspired!