Best Data Science Resources for Absolute Beginners

Recommended resources for studying Data Science as an absolute beginner

When I started studying data science, I spent a lot of time finding resources that fit my background and needs. The ones listed below have helped me tremendously in my studies, and I hope you will find some of them helpful in your learning as well.

1. Programming

Programming is a fundamental skill that every data scientist must have. The most popular programming languages for data science are without a doubt Python and R. You should learn both, but if you have to pick one, choose Python. It has been both the fastest-growing major language and the most wanted language for several years in a row.

I strongly recommend MIT 6.0001 Introduction to Computer Science and Programming in Python, the most visited course on MIT OpenCourseWare.

2. Probability and Statistics

The lecture notes of MIT 18.05 Introduction to Probability and Statistics are an excellent resource for a quick review of probability and statistics, with clear and intuitive explanations. They are perhaps the best introduction to Probability and Statistics that I have ever seen.

After finishing these lecture notes, I recommend continuing with the book All of Statistics: A Concise Course in Statistical Inference by Larry A. Wasserman for broader coverage. As its title indicates, the book offers broad yet concise coverage of probability and statistics, including both basic and modern concepts that are closely related to machine learning.

3. Linear Algebra

After struggling with two books borrowed from my school’s library and with Khan Academy’s Linear Algebra series, I finally came to appreciate the beauty of matrices thanks to Prof. Gilbert Strang’s MIT 18.06 Linear Algebra course. For many people, myself included, it is the best introduction to Linear Algebra. While watching the lectures, I recommend working through the exercises in his book Introduction to Linear Algebra, 5th edition.

4. Optimization

Optimization is of central importance in Data Science. I have learned a great deal from the Stanford EE364 Convex Optimization course (lecture videos are available on YouTube) and its accompanying textbook, Convex Optimization by Stephen Boyd and Lieven Vandenberghe. However, I have to admit that I have finished only a small part of these resources, since they are not easy to digest. One of my main goals for the next few months is to finish them, because I feel that what I learned in my graduate program is not enough.
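As a small taste of the subject, here is a minimal sketch of fixed-step gradient descent on a simple convex function, f(x, y) = (x - 3)^2 + 2(y + 1)^2, which was chosen purely for illustration and does not come from the course or the book. Because f is convex with a unique minimizer at (3, -1), the iterates should approach that point. The step size of 0.1 and the 200 iterations are arbitrary assumptions.

```python
# Minimal sketch: fixed-step gradient descent on an illustrative convex function
# f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, whose unique minimizer is (3, -1).

def grad_f(x, y):
    """Gradient of f: (df/dx, df/dy)."""
    return 2 * (x - 3), 4 * (y + 1)

def gradient_descent(step=0.1, iters=200, start=(0.0, 0.0)):
    """Repeatedly move against the gradient with a fixed step size."""
    x, y = start
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        x -= step * gx
        y -= step * gy
    return x, y

x_min, y_min = gradient_descent()
print(f"approximate minimizer: ({x_min:.4f}, {y_min:.4f})")  # → (3.0000, -1.0000)
```

The step size matters. The largest curvature of f is 4, so a fixed step must stay below 2/4 = 0.5 for the iterates to converge. With 0.1, the error in x shrinks by a factor of 0.8 and the error in y by a factor of 0.6 at each iteration.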

5. Machine Learning

Following Prof. Andrew Ng’s Machine Learning course has also proved highly beneficial to my learning. I love his teaching style, in which complicated concepts become crystal clear. The course’s practical assignments, where you build algorithms from scratch, were extremely useful for understanding how those algorithms actually work.
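To give a flavor of what building an algorithm from scratch looks like, here is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. This is not code from the course. The function name `kmeans`, the choice to initialize centroids from the first k points (which keeps the example deterministic), and the sample data are all assumptions for illustration.

```python
import math

def kmeans(points, k, iters=100):
    """Hypothetical from-scratch k-means on a list of (x, y) tuples.

    Centroids start at the first k points; returns (centroids, labels).
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return centroids, labels

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

Writing the assignment and update steps yourself makes it clear why k-means can only converge to a local optimum: each step can only lower the within-cluster distances, so the final result depends on where the centroids start.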

For a more advanced level, I suggest two books: Pattern Recognition and Machine Learning by Christopher M. Bishop, and The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. Many people consider these two books the best Machine Learning textbooks.

Conclusion

There are a large number of online resources, and what works for some may not be as useful for others. These are only recommendations based on my personal experience. I hope you will find the ones that suit you best.

Hang Le
PhD Candidate in Speech and Language Processing

My interests include machine learning, deep learning and natural language processing.