Review of the most interesting materials on data analysis and machine learning No. 3 (review of online courses)
This edition of the review of the most interesting materials on data analysis and machine learning is devoted entirely to online courses on Data Science. The previous issue presented a list of online courses that were starting soon. In this issue, I have tried to collect the most interesting online courses on data analysis. Note that some of the courses have already ended, but for most of them an archive of all course materials is still available.
The review begins with a set of courses from Johns Hopkins University on Coursera, which are combined into a single Data Science specialization, so it makes sense to consider them separately from the other courses. There are 9 official specialization courses and two additional ones, Mathematical Biostatistics Boot Camp 1 and 2, which are not officially part of the specialization. It is important to note that the entire set of these courses starts anew and, in general, you can plan your schedule for the specialization quite flexibly. Most courses last 4 weeks. R is the main programming language in this set of courses. The following is a list of courses from the Johns Hopkins University Data Science specialization:
The Data Scientist's Toolbox is a basic course in the specialization that reviews the various tools of a data analyst. The amount of material is small, and the course can be completed in 3-4 hours.
R Programming is a basic course in the specialization that covers the basics of working with the R programming language.
Getting and Cleaning Data is also a basic course in the specialization. It is devoted to the very important topic of preparing and processing raw input data for further analysis.
Exploratory Data Analysis - the course is devoted to exploratory data analysis and data visualization using the R language and popular visualization packages such as lattice and ggplot2.
Reproducible Research - the course covers reproducible research, an important topic in data analysis. It looks at the knitr package for the R language, as well as the R Markdown markup language.
Statistical Inference - the course is formally devoted to statistical inference, but in essence it is a course on the basics of statistics and probability theory. The material is presented in a very rushed and muddled way. This is one of the most controversial courses in the specialization. I hope that the course will be seriously revised in future versions.
Regression Models - the course is devoted to regression analysis. Here, too, there are questions about how well the material has been worked out, and I hope that the creators of the course will pay attention to students' comments and seriously rework it in the future.
Developing Data Products - a course dedicated to the development of modern products in the subject of data analysis. Such popular frameworks as Shiny and Slidify are considered.
Mathematical Biostatistics Boot Camp 1 - the first part of the biostatistics course from Johns Hopkins University. It is an unofficial addition to the Data Science specialization and covers the basics of statistics and probability theory well.
Mathematical Biostatistics Boot Camp 2 - the second part of the biostatistics course from Johns Hopkins University. It is an unofficial addition to the Data Science specialization and covers the basics of statistics and probability theory well.
Next, we consider courses that will help improve the general skills required for a data analyst:
Intro to Hadoop and MapReduce (Udacity) - the course is devoted to the basics of working with Hadoop and large data sets.
Data Wrangling with MongoDB (Udacity) - the course focuses on working with data in MongoDB, a currently popular NoSQL database.
Programming Foundations with Python (Udacity) - the course is devoted to the basics of the Python programming language, which is rapidly gaining popularity among data analysts.
Introduction to Databases (Coursera - Stanford University) - the course covers working with relational data sources, as well as with other popular data storage formats (XML, JSON).
We now turn to courses devoted to probability theory and statistics. Knowledge of these disciplines will certainly be useful to anyone who claims to be a data analyst. In some cases, the division of courses into categories is rather arbitrary, since many courses cover several aspects of data analysis. The following is a list of courses in this category:
Probability and Statistics (Khan Academy) - an excellent set of introductory lessons on statistics and probability theory from Khan Academy.
Case-Based Introduction to Biostatistics (Coursera - Johns Hopkins University) - the course presents in an accessible form the basics of statistics and probability theory with examples from biostatistics.
Data Analysis and Statistical Inference (Coursera - Duke University) - an excellent course in data analysis that clearly explains the basics of probability theory and statistics.
Statistics One (Coursera - Princeton University) is a good course in basic statistics. The material is delivered at an accessible level and requires no special prior knowledge from the student.
Statistics in Medicine (Stanford Online) - basics of statistics based on examples from medicine.
Stat_2.2x - Introduction to Statistics: Probability (edX - BerkeleyX) is the second part of a series of courses on statistics and probability theory. The second part is devoted to the basics of probability theory.
Stat_2.3x - Introduction to Statistics: Inference (edX - BerkeleyX) is the third part of a series of courses on statistics and probability theory. The third part is devoted to statistical inference.
Explore Statistics with R (edX - KIx) - a new course on working with the statistical programming language R. The first session of this course begins on September 9, 2014.
Intro to Statistics (Udacity) is another course on the basics of statistics.
Statistics (Udacity) is a fairly simple course in probability theory and statistics.
The following is a list of courses that are devoted to various aspects of the topic of data analysis, such as machine learning, natural language processing, neural networks, recommendation systems, social network analysis, artificial intelligence, and others:
Data Analysis (Coursera - Johns Hopkins University) - an 8-week course on data analysis using the R language.
Introduction to Data Science (Coursera - University of Washington) - the course lasts 8 weeks. One of the most popular online courses on the basics of Data Science.
Machine Learning (Coursera - University of Washington) - an excellent 10-week machine learning course from the University of Washington.
Machine Learning (Coursera - Stanford University) is one of the best-known machine learning courses, taught by Stanford University professor Andrew Ng. The course lasts 10 weeks. It is quite simple and clear and requires no special knowledge to complete successfully, yet it covers quite a lot of machine learning areas.
Natural Language Processing (Coursera - Stanford University) is one of the most popular online natural language processing courses from Stanford University.
Introduction to Recommender Systems (Coursera - University of Minnesota) - an introduction to recommender systems. The course cannot be called thoroughly polished, but there are few courses on this topic, so it may be of interest to those who work on recommender systems.
15.071x The Analytics Edge (edX - MITx) - course with excellent material on the topic of data analysis and machine learning.
Learning From Data (edX - CaltechX) is one of the best courses in machine learning. It covers many machine learning topics.
CS188.1x Artificial Intelligence (edX - BerkeleyX) is probably one of the most interesting online courses on artificial intelligence. The course uses the Python programming language.
Machine Learning 1 — Supervised Learning (Udacity) is the first part of a series of machine learning courses from Udacity. The first part is devoted to supervised learning.
Machine Learning 2 — Unsupervised Learning (Udacity) is the second part of a series of machine learning courses from Udacity. The second part is devoted to the topic of unsupervised learning.
Machine Learning 3 — Reinforcement Learning (Udacity) is the third part of a series of machine learning courses from Udacity. The third part is devoted to reinforcement learning, a popular machine learning method.
Artificial Intelligence for Robotics (Udacity) - an introduction to programming artificial intelligence, using an unmanned vehicle as the example.