Data Science & Machine Learning Resources

This is a collection of interesting sites and useful references on the broader topics of data science and machine learning.  I’m creating it partly to keep track of the things I find as I delve into ML, but also as a resource for others who might be interested.

General Resources

Online Courses

  • Andrew Ng’s Machine Learning course on Coursera
    • I’m currently going through this course.  The materials are great, and Ng explains the topics well.  He uses Octave throughout the course, rather than R or Python.  The course is fairly advanced in the sense that you need to be comfortable with math.
  • Data Science and Machine Learning with Python – Hands On – Udemy
    • I’m going through this course as well.  For me, the material is somewhat basic.  However, the tutorials and exercises are in Python, which I’m not too familiar with.  If you’re totally new to ML, this could be a great place to start.

Tools

Languages

Courses with Material Online

Hadoop Cluster on Raspberry Pis

Out of an interest in exploring Hadoop and related technologies, I’m planning to build a cluster using nifty Raspberry Pi computers.  Here are some tutorials and how-tos I’ve collected as guidance.

Data Sets by Category

Regression

Classification

Time Series

  • Beijing PM2.5 Data Set – This data set reports the weather and the level of pollution each hour for five years at the US embassy in Beijing, China. Used in example here.