Review of the most interesting materials on data analysis and machine learning №16 (September 29 - October 5, 2014)
I present to you the next issue of the review of the most interesting materials on the topic of data analysis and machine learning.
General
Using a data-driven approach to machine learning: Another interesting article from the MachineLearningMastery blog, this time about the options available for improving the efficiency of machine learning algorithms.
Introduction to Machine Learning for Developers: A good introduction to machine learning for developers that covers many of the aspects needed to work with machine learning algorithms.
Top 30 Data Science Blogs: A ranking of the best Data Science blogs according to the DataScienceCentral portal.
Improving machine learning skills: A few helpful tips from the author of the MachineLearningMastery blog on how to improve your machine learning skills.
Vowpal Wabbit Modules in Azure ML: A continuation of the series from the "Microsoft TechNet Machine Learning" blog about the capabilities of Vowpal Wabbit in Azure ML, Microsoft's cloud machine learning service.
22 skills a Data Scientist needs: An interesting article by Vincent Granville on the popular DataScienceCentral portal about the skills a data analyst needs, depending on their specialization.
First week of Stanford's Machine Learning course: The author shares his impressions of the first week of the popular machine learning course by Andrew Ng and Stanford University, a new session of which recently started on Coursera.
Theory and algorithms of machine learning, code examples
Introduction to Neural Networks: A fairly lengthy article from the blog of Andrej Karpathy (CS PhD student at Stanford), in which the author discusses machine learning and neural networks, gives code examples, and notes that the article will be extended with new material over time.
Introduction to Support Vector Machines: A useful article from the Analytics Vidhya blog that describes how the Support Vector Machines method works in fairly simple language.
miniCRAN: your own package repository: An article that briefly covers miniCRAN, a package for the R programming language that lets you create your own package repository.
Running RStudio in the cloud: An article on how to quickly and easily run RStudio in a browser using a cloud solution and Docker.
Interview with Diogo Ferreira: A useful interview on the MachineLearningMastery blog with Diogo Ferreira, a successful participant in machine learning competitions.
Online courses, educational materials and literature
Mining Massive Datasets online course launched: On September 29, 2014, a much-anticipated online course started on Coursera: Mining Massive Datasets from Stanford University.
The book "The Field Guide to Data Science": A brief description and a free version of an interesting book on the basics of Data Science, The Field Guide to Data Science.
Reading List (October): A list of books and papers from the blog of Dave Giles (Professor of Economics at the University of Victoria) that, in the professor's opinion, may be worth reading.
Martin Maechler on good coding practices in R: Martin Maechler (member of the R-Core team) gave an interesting talk at the useR! 2014 conference. In this video he talks about good coding practices, both in the R programming language specifically and in programming in general.
Nando de Freitas on decision trees: An excellent lecture on decision trees by Professor Nando de Freitas of the University of British Columbia.
Jürgen Schmidhuber on Deep Learning: An interesting video in which Professor Jürgen Schmidhuber from IDSIA (Dalle Molle Institute for Artificial Intelligence Research) talks about the history of Deep Learning and the current revival of interest in this machine learning method.
Data engineering
Using Pinot for real-time analytics: An interesting article from the LinkedIn blog about the architecture of the company's real-time analytics solution, built on an in-house product called Pinot.
NoSQL storage performance test results: A recent and interesting comparison of the performance of several NoSQL stores (Apache Cassandra, MongoDB, Couchbase) under different load profiles.
Scalable decision trees in Apache Spark: Continuing the coverage of the new Apache Spark 1.1 release, this article focuses on decision trees and how they scale in the MLlib machine learning library.