How do I learn machine learning?

Rohit Malshe, Chemical Engineer, Programmer, Author, Thinker, Engineer at Intel Corporation

Updated Feb 5

I got inspired by the best one-liner about machine learning, from a genius computer scientist: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” ~ Tom M. Mitchell - Wikipedia

This is a very open-ended question that has received a large number of answers. Many of them are amazing, and certainly add a lot of value for the audience. I read many of them, and thank all the authors!

  • Machine learning is a very wide field, and a large number of resources exist for learning it. I will go over many of them, collected through some online research. I will add some of my own inspiration and learning, and finally I will try to set readers on a path simple enough to follow.

  • This is exciting science, and look what these amazing companies have done using it. These companies have been front-runners in technology and science. I have deliberately left Facebook and Netflix out of this list, because I wanted to highlight companies that also build physical products and move real things from one place to another, rather than only running web platforms. That said, Facebook and Netflix are extremely successful companies that show off the finesse of machine learning!

  • A few months back I wrote an answer about curated paths to Data Science: Rohit Malshe's answer to Are ‘curated paths to a Data Science career’ on Coursera worth the money and time?

  • In the same time frame I wrote about a comparison between Python and R: Rohit Malshe's answer to Is Python better than R?

  • Very recently I wrote a simple article on some of the interesting things to do with Python. Rohit Malshe's answer to What are some interesting things to do with Python?

  • All these answers got several upvotes and views, so I wanted to make a good attempt at an elaborate answer to this question. I would certainly encourage readers to go over those answers too, because they were written earlier than this one and cover those topics in detail. I don’t want to copy and paste text from those answers into this one, but I do want to mention that they carry a lot of value.

Now the answer:

First the inspiration part:

  • I would like to inspire readers to take a look at the Wikipedia page on machine learning, and more specifically at its applications. There are a large number of them, and one can easily get lost deciding which to choose.

Applications for machine learning include the following:

  • Adaptive websites

  • Affective computing

  • Bioinformatics

  • Brain-machine interfaces

  • Cheminformatics

  • Classifying DNA sequences

  • Computational anatomy

  • Computer vision, including object recognition

  • Detecting credit card fraud

  • Game playing

  • Information retrieval

  • Internet fraud detection

  • Marketing

  • Machine perception

  • Medical diagnosis

  • Economics

  • Natural language processing

  • Natural language understanding

  • Optimization and metaheuristic

  • Online advertising

  • Recommender systems

  • Robot locomotion

  • Search engines

  • Sentiment analysis (or opinion mining)

  • Sequence mining

  • Software engineering

  • Speech and handwriting recognition

  • Stock market analysis

  • Structural health monitoring

  • Syntactic pattern recognition

  • User behavior analytics

  • My personal inspiration has been to go in an order. The order that sounded natural to me was ~ Structured text data, unstructured text data, human comments (sentiment analysis type of projects), image data, voice data. I don’t want to learn just about anything, and learning everything is simply impossible, hence this simple order has helped me very much as a beginner.

  • For text data, a simple pipeline has to look somewhat like the one shown in the picture below. More or less, any engineer in any of these or other companies should be able to build a pipeline like this.

  • 90% of my job has to do with text data, and the only 5 languages that have helped me excel are ~ SQL, Python, R, VBScript, C#. Wow! That is a very small number of things to learn, and one can achieve a lot with them. There are other, similar languages one can learn too. The other day I was talking to a Data Scientist and asked her what she does. Her reply ~ “I'm on a big data team that creates Spark Streaming applications in Scala on the Cloudera distribution of Hadoop with Cassandra as a data store. Most of my work is accomplished using Hive with a bit of Bash, Python, and R.” Now I do admit that I haven’t used Hive, and I haven’t explored Scala much, but could I do it if I wanted to? Yes, absolutely. It would take me a few weeks to learn Hive/Scala and switch my mindset towards them.
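A minimal sketch of what such a text pipeline might look like. The stages here (tokenize, count words, classify) are an assumption about a typical pipeline, not a reproduction of the picture, and the class name `NaiveBayesText`, the helper `tokenize`, and the tiny review data are all made up for illustration. It uses only the Python standard library:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesText:
    """Hypothetical multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen during training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data purely for illustration.
train_texts = [
    "great phone, love the battery",
    "excellent screen and great camera",
    "terrible battery, broke after a week",
    "awful screen, very bad phone",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesText().fit(train_texts, train_labels)
prediction_good = model.predict("love this great camera")
prediction_bad = model.predict("bad and terrible")
print(prediction_good, prediction_bad)
```

In real work the same three stages would usually come from a library rather than being written by hand, but the shape of the pipeline stays the same.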

Second, the science part:

  • I got my hands on the Kaggle website and looked up some housing market data. Seriously, I had no clue how house prices are determined, but I was able to predict a lot about house prices without knowing the subject in and out. Isn’t that the beauty of data science! You don’t even need to know the science to make a significant impact. But is this completely true?

  • No! My inspiration is that before you do anything with data at all, know what the science tells you. So talk to a realtor, look at the Redfin website, and find out which things in housing data seem important. Take a tour with a realtor to see various houses, and understand what all those features mean. That would make you a better data scientist.

  • In simple words: Data Scientist + Scientist = Better Data Scientist.

  • The housing market data carried far too many features, and it was clear early on that a ridge type of regression model could work nicely for it.

  • To give you some more inspiration on just the housing market data, one can for example use ElasticNet:

    # 4* ElasticNet
    # Assumes X_train, X_test (pandas DataFrames), y_train, y_test and the scoring
    # helpers rmse_cv_train / rmse_cv_test are already defined earlier in the notebook.
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import ElasticNetCV

    # Pass 1: coarse grid over l1_ratio and alpha, scored with 10-fold cross-validation
    elasticNet = ElasticNetCV(l1_ratio = [0.1, 0.3, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1],
                              alphas = [0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006,
                                        0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6],
                              max_iter = 50000, cv = 10)
    elasticNet.fit(X_train, y_train)
    alpha = elasticNet.alpha_
    ratio = elasticNet.l1_ratio_
    print("Best l1_ratio :", ratio)
    print("Best alpha :", alpha)

    # Pass 2: finer grid of l1_ratio around the best value found so far.
    # l1_ratio must lie in [0, 1], so candidates are capped at 1 before fitting.
    print("Try again for more precision with l1_ratio centered around " + str(ratio))
    ratios = sorted({min(ratio * f, 1) for f in (.85, .9, .95, 1, 1.05, 1.1, 1.15)})
    elasticNet = ElasticNetCV(l1_ratio = ratios,
                              alphas = [0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006,
                                        0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6],
                              max_iter = 50000, cv = 10)
    elasticNet.fit(X_train, y_train)
    alpha = elasticNet.alpha_
    ratio = elasticNet.l1_ratio_
    print("Best l1_ratio :", ratio)
    print("Best alpha :", alpha)

    # Pass 3: l1_ratio fixed, finer grid of alpha around the best value found so far
    print("Now try again for more precision on alpha, with l1_ratio fixed at " + str(ratio) +
          " and alpha centered around " + str(alpha))
    elasticNet = ElasticNetCV(l1_ratio = ratio,
                              alphas = [alpha * f for f in (.6, .65, .7, .75, .8, .85, .9, .95, 1,
                                                            1.05, 1.1, 1.15, 1.25, 1.3, 1.35, 1.4)],
                              max_iter = 50000, cv = 10)
    elasticNet.fit(X_train, y_train)
    alpha = elasticNet.alpha_
    ratio = elasticNet.l1_ratio_
    print("Best l1_ratio :", ratio)
    print("Best alpha :", alpha)

    print("ElasticNet RMSE on Training set :", rmse_cv_train(elasticNet).mean())
    print("ElasticNet RMSE on Test set :", rmse_cv_test(elasticNet).mean())
    y_train_ela = elasticNet.predict(X_train)
    y_test_ela = elasticNet.predict(X_test)

    # Plot residuals (predicted minus real) against predicted values
    plt.scatter(y_train_ela, y_train_ela - y_train, c = "blue", marker = "s", label = "Training data")
    plt.scatter(y_test_ela, y_test_ela - y_test, c = "lightgreen", marker = "s", label = "Validation data")
    plt.title("Linear regression with ElasticNet regularization")
    plt.xlabel("Predicted values")
    plt.ylabel("Residuals")
    plt.legend(loc = "upper left")
    #plt.hlines(y = 0, xmin = 10.5, xmax = 13.5, color = "red")
    plt.show()

    # Plot predictions: predicted values on x, real values on y, matching the axis labels
    plt.scatter(y_train_ela, y_train, c = "blue", marker = "s", label = "Training data")
    plt.scatter(y_test_ela, y_test, c = "lightgreen", marker = "s", label = "Validation data")
    plt.title("Linear regression with ElasticNet regularization")
    plt.xlabel("Predicted values")
    plt.ylabel("Real values")
    plt.legend(loc = "upper left")
    #plt.plot([10.5, 13.5], [10.5, 13.5], c = "red")
    plt.show()

    # Plot important coefficients: the 10 most negative and the 10 most positive
    coefs = pd.Series(elasticNet.coef_, index = X_train.columns)
    print("ElasticNet picked " + str(sum(coefs != 0)) + " features and eliminated the other " + str(sum(coefs == 0)) + " features")
    imp_coefs = pd.concat([coefs.sort_values().head(10),
                           coefs.sort_values().tail(10)])
    imp_coefs.plot(kind = "barh")
    plt.title("Coefficients in the ElasticNet Model")
    plt.show()

As you see, some knowledge of which model to use is important. So learn these various models one after the other from courses given by professors, and by following blogs written by various people online.
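To make the "ridge type of regression" idea a little more concrete, here is a small sketch using only NumPy. It is an illustration under stated assumptions, not the housing analysis itself: the data is synthetic, the helper name `ridge_fit` and the penalty value `lam=1.0` are made up, and no intercept is fitted. It compares ordinary least squares with ridge when two features are almost copies of each other, which is the kind of situation that comes up when a data set has very many features.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
n_samples = 50
base = rng.normal(size=n_samples)

# Two nearly identical (highly correlated) features plus one independent feature.
X = np.column_stack([
    base,
    base + 0.01 * rng.normal(size=n_samples),
    rng.normal(size=n_samples),
])
y = 3.0 * base + 0.5 * X[:, 2] + 0.1 * rng.normal(size=n_samples)

w_ols = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=1.0)  # illustrative penalty strength

print("OLS coefficients  :", np.round(w_ols, 3))
print("Ridge coefficients:", np.round(w_ridge, 3))
print("Coefficient norms :", np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The penalty shrinks the coefficient vector, so the ridge norm comes out smaller than the least-squares norm. ElasticNet, as used above, mixes this kind of penalty with an L1 penalty that can set some coefficients exactly to zero.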

Third, more science, math!:

  • More scientifically, you should know a lot of mathematical concepts and a lot of physics before you dive into this science, because without those fundamentals it won’t come in very handy to you.

  • Example: A long time back, I was in engineering school and enrolled in a math class. I studied singular value decomposition in that course: Singular value decomposition - Wikipedia

  • At that point in time, it made no inspirational impact on me whatsoever. I just came out with this feeling ~ “Why! Why do you want to do all this, Sir!”. Today it makes a lot of sense. Why? Because using singular value decomposition, I can reduce the size of images significantly, and still derive conclusions from them while making the code faster.

  • In the same way, a long time back I studied Fourier transforms and more or less learned how to compute 2D Fourier transforms of images. It was only when I applied them practically that I got the real dollar value out of them. The Fourier transform takes us from the spatial domain to the frequency domain. If I compare the Fourier transforms of two images, I can often highlight the differences between the two more easily than with other methods.
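Both ideas can be tried in a few lines of NumPy. The sketch below is only an illustration: the 64x64 "image" is synthetic, and the helper names `low_rank_approx` and `spectrum_difference` are made up for this example. The first helper keeps only the largest singular values of an image; the second compares the 2D Fourier transforms of two images.

```python
import numpy as np


def low_rank_approx(image, k):
    """Rebuild a 2D grayscale image from its k largest singular values only."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def spectrum_difference(image_a, image_b):
    """Magnitude of the difference between the 2D Fourier transforms of two images."""
    return np.abs(np.fft.fft2(image_a) - np.fft.fft2(image_b))


# Synthetic 64x64 image: a row pattern plus a column pattern, which has rank 2.
rows, cols = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
image = np.sin(rows / 10.0) + np.cos(cols / 7.0)

# SVD: store k * (rows + cols + 1) numbers instead of rows * cols.
approx = low_rank_approx(image, k=2)
stored_original = image.size
stored_compressed = 2 * (64 + 64 + 1)
relative_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
print("numbers stored:", stored_original, "vs", stored_compressed)
print("relative error of rank-2 approximation:", relative_error)

# Fourier: the same image plus a faint stripe pattern with 8 cycles across the width.
striped = image + 0.05 * np.sin(2 * np.pi * 8 * cols / 64)
diff = spectrum_difference(image, striped)
peak = np.unravel_index(np.argmax(diff), diff.shape)
print("largest spectral difference at frequency bin:", peak)
```

The stripe changes every pixel by a tiny amount, which is hard to spot in the spatial domain, but in the frequency domain the whole difference lands in the two bins that correspond to 8 cycles across the image. Real photographs are not exactly low rank, so in practice `k` is a trade-off between size and quality.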

Fourth, how do I learn the required math first?

  • Follow these professors:

  • Nando de Freitas: Nando de Freitas

  • Andrew Ng: http://www.andrewng.org/

  • There are others too, but I say start here. Look up their lectures on YouTube, and go through them one after the other. This is going to take roughly a whole semester's worth of time, but without it, progressing on the machine learning path is a waste of time. Sitting through those lectures on YouTube will seem boring sometimes, and you will want to do something that feels more practical, but be patient and do this. I did it over a period of about 1.5 months and revised as many concepts as I could. I accelerated this by watching two to three lectures every day.

  • A very good way to learn machine learning is to go through the test cases given on Kaggle (Your Home for Data Science), which is what I do every now and then, and I always end up learning something or other. Most of the time, I am interested in seeing how others apply machine learning concepts to projects where you may not really have much of a clue about the data. This way I have learned plenty of things when my own knowledge did not seem enough.

  • Python and R seem to be very popular in the community, and significant results can be derived using either. For beginners, I recommend starting with Python. The programming will look and feel natural, and you will be able to do lots of things quickly enough.

Fifth, what can I do with all this quickly?

  • Take a look, for example, at this very small project that I did on Kaggle: Amazon Reviews: Unlocked Mobile Phones. The data comes from Amazon.com, and tells me just about everything I need to know about the cell phone market in a very short time.

  • There are a lot of people building their workbooks. Take a look at as many of them as you can. You will learn from others very quickly. Then build your own code bit by bit and add more to it every day.

Sixth, can I get access to some example data sets?

Yes, more than you can even handle!

  • US Government Data http://www.data.gov/

  • Reddit/r/DataSets https://www.reddit.com/r/datasets

  • Kaggle: Your Home for Data Science

  • Federal Reserve Economic Data: https://fred.stlouisfed.org/

  • City-Data.com - Stats about all US cities - real estate, relocation info, crime, house prices, cost of living, races, home value estimator, recent sales, income, photos, schools, maps, weather, neighborhoods, and more

Seventh, is there a long list of Machine-Learning / Data Mining websites I can follow?

The following is a list of free, open source books on machine learning, statistics, and data mining that I collected through online search:

  • Real World Machine Learning [Free Chapters]

  • An Introduction To Statistical Learning - Book + R Code

  • Elements of Statistical Learning - Book

  • Probabilistic Programming & Bayesian Methods for Hackers - Book + IPython Notebooks

  • Think Bayes - Book + Python Code

  • Information Theory, Inference, and Learning Algorithms

  • Gaussian Processes for Machine Learning

  • Data Intensive Text Processing w/ MapReduce

  • Reinforcement Learning: An Introduction

  • Mining Massive Datasets

  • A First Encounter with Machine Learning

  • Pattern Recognition and Machine Learning

  • Machine Learning & Bayesian Reasoning

  • Introduction to Machine Learning - Alex Smola and S.V.N. Vishwanathan

  • A Probabilistic Theory of Pattern Recognition

  • Introduction to Information Retrieval

  • Forecasting: principles and practice

  • Practical Artificial Intelligence Programming in Java

  • Introduction to Machine Learning - Amnon Shashua

  • Reinforcement Learning

  • Machine Learning

  • A Quest for AI

  • Introduction to Applied Bayesian Statistics and Estimation for Social Scientists - Scott M. Lynch

  • Bayesian Modeling, Inference and Prediction

  • A Course in Machine Learning

  • Machine Learning, Neural and Statistical Classification

  • Bayesian Reasoning and Machine Learning Book+MatlabToolBox

  • R Programming for Data Science

  • Data Mining - Practical Machine Learning Tools and Techniques Book

Deep-Learning

  • Deep Learning - An MIT Press book

Natural Language Processing

  • Coursera Course Book on NLP

  • NLTK

  • NLP w/ Python

  • Foundations of Statistical Natural Language Processing

Information Retrieval

  • An Introduction to Information Retrieval

Neural Networks

  • A Brief Introduction to Neural Networks

  • Neural Networks and Deep Learning

Probability & Statistics

  • Think Stats - Book + Python Code

  • From Algorithms to Z-Scores - Book

  • The Art of R Programming - Book (Not Finished)

  • All of Statistics

  • Introduction to statistical thought

  • Basic Probability Theory

  • Introduction to probability - By Dartmouth College

  • Principle of Uncertainty

  • Probability & Statistics Cookbook

  • Advanced Data Analysis From An Elementary Point of View

  • Introduction to Probability - Book and course by MIT

  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction. -Book

  • An Introduction to Statistical Learning with Applications in R - Book

  • Learning Statistics Using R

  • Introduction to Probability and Statistics Using R - Book

  • Advanced R Programming - Book

  • Practical Regression and Anova using R - Book

  • R practicals - Book

  • The R Inferno - Book

Linear Algebra

  • Linear Algebra Done Wrong

  • Linear Algebra, Theory, and Applications

  • Convex Optimization

  • Applied Numerical Computing

  • Applied Numerical Linear Algebra

All the best learning machine learning! Stay blessed and stay inspired!

