Review of the most interesting materials on data analysis and machine learning №16 (September 29 - October 5, 2014)

Here is the next issue of the review of the most interesting materials on data analysis and machine learning.

General

Using a data-driven approach to machine learning
Another interesting article from the MachineLearningMastery blog, this time about the options available for improving the efficiency of machine learning algorithms.
Introduction to Machine Learning for Developers
A good introduction to the topic of machine learning for developers, which mentions many aspects that are necessary to work with machine learning algorithms.
Top 30 Data Science Blogs
Ranking of the best blogs on the subject of Data Science according to the DataScienceCentral portal.
Improving machine learning skills
A few helpful tips from the author of the MachineLearningMastery blog for improving machine learning skills.
How to successfully pass an interview for a position in the field of Data Science
An interesting and useful article that will help you prepare for an interview for a Data Science position.
Vowpal Wabbit Modules in Azure ML
A continuation of the series from the "Microsoft Technet Machine Learning" blog about the capabilities of Vowpal Wabbit in Azure ML, Microsoft's cloud machine learning service.
22 skills that a Data Scientist needs
An interesting article by Vincent Granville on the popular DataScienceCentral portal about the skills a data analyst needs, with regard to their specialization.
First week of Stanford's Machine Learning course
The author shares his impressions of the first week of the popular machine learning course by Andrew Ng and Stanford University, a new session of which recently started on Coursera.

Theory and algorithms of machine learning, code examples

Naive Bayes and Text Classification (Part 1)
On the computational complexity of MapReduce
A good article about the theoretical foundations of the MapReduce programming model.
Introduction to Neural Networks
A fairly voluminous article from the blog of Andrej Karpathy (CS PhD student at Stanford), in which the author talks about machine learning and neural networks, gives code examples, and notes that the article will be supplemented with new material over time.
Using machine learning and NodeJS to determine the gender of Instagram users
A good example of a neural-network-based predictive model, built with NodeJS, that determines the gender of Instagram users from various input parameters.
Introduction to Support Vector Machines
A useful article from the Analytics Vidhya blog that describes how Support Vector Machines work in fairly simple language.
Evaluation of the effectiveness of the binary classification system
A brief introduction to evaluating the effectiveness of binary classification systems.
miniCRAN: your own package repository
An article that briefly discusses the miniCRAN package for the R programming language, which lets you create your own package repository.
Running RStudio in the cloud
An article on how to quickly and easily start RStudio in a browser using a cloud solution and Docker.
Displaying several variables on a line chart in ggplot2
A small practical example of plotting several variables on a line chart using the R programming language and the ggplot2 library.

Machine learning competitions

Interview with Diogo Ferreira
A useful interview on the MachineLearningMastery blog with Diogo Ferreira, a successful participant in machine learning competitions.
Simple model for Kaggle "Bike Sharing Demand"
A description of a fairly simple model for the Kaggle Bike Sharing Demand machine learning competition, with examples in the R programming language.

Online courses, educational materials and literature

Mining Massive Datasets online course launched
On September 29, 2014, an online course that has attracted a great deal of attention started on Coursera: Mining Massive Datasets from Stanford University.
The book "The Field Guide to Data Science"
A brief description and a free version of an interesting book on the basics of Data Science, The Field Guide to Data Science.
Announcement of the book "Practical Data Science Cookbook"
A short announcement of a rather interesting book, Practical Data Science Cookbook.
Reading List (October)
A list of books from the blog of Dave Giles (Professor of Economics at the University of Victoria) that, in the professor's opinion, may be interesting to read.
The book "Getting Started with Impala"
Announcement of the interesting book “Getting Started with Impala” on the Cloudera company blog.

Video

Martin Maechler on good coding practices in R
Martin Maechler (member of the R-Core team) gave an interesting talk at the useR! 2014 conference. In this video, he talks about good coding practices, both in the R programming language and in programming in general.
Materials from the meeting "New PostgreSQL 9.4 and something else"
Not long ago, an interesting meeting devoted to the PostgreSQL DBMS took place at the Yandex office. The video from that meeting is now available.
Nando de Freitas on decision trees
An excellent lecture on decision trees by Professor Nando de Freitas of the University of British Columbia.
Jürgen Schmidhuber on Deep Learning
An interesting video in which Professor Jürgen Schmidhuber from IDSIA (Dalle Molle Institute for Artificial Intelligence Research) talks about the history of Deep Learning and the current revival of interest in this approach to machine learning.

Data engineering

Using Pinot for real-time analytics
An interesting article from the LinkedIn blog about the architecture of their real-time analytics solution, built on an in-house product called Pinot.
NoSQL storage performance test results
A recent, interesting comparison of the performance of various NoSQL stores (Apache Cassandra, MongoDB, Couchbase) under different load profiles.
Scalable decision trees in Apache Spark
A continuation of the discussion of the new Apache Spark 1.1 release, this time focusing on decision trees and how they scale in the MLlib machine learning library.
Announcement of the ForestDB beta
Announcement of ForestDB, an open-source key-value store from the creators of Couchbase.
What is Apache Storm
An article that gives a brief description of Apache Storm.

Reviews

Weekly Digest from DataScienceCentral
Regular weekly digest of articles on data analysis from the DataScienceCentral portal.
Nuit Blanche's best materials (September)
The best materials for September from the popular blog Nuit Blanche.
Hadoop Weekly review №89 (September 28)
Hadoop weekly news and materials.
Hadoop Weekly review №88 (September 21)
Hadoop weekly news and materials.
The most interesting materials from Freakonometrics №170
A collection of the most interesting materials from the popular portal Freakonometrics.
The most interesting materials from Freakonometrics №169
A collection of the most interesting materials from the popular portal Freakonometrics.
The most interesting materials from Freakonometrics №168
A collection of the most interesting materials from the popular portal Freakonometrics.
The most interesting materials on High Scalability
A review of the most interesting materials from the popular High Scalability portal.

Previous issue: Review of the most interesting materials on data analysis and machine learning №15 (September 22 - 28, 2014)

Source: https://habr.com/ru/post/239247/
