Overview of the most interesting materials on data analysis and machine learning №13 (September 8 - 14, 2014)
Here is the next issue of my review of the most interesting materials on data analysis and machine learning. This issue contains many examples that use the R and Python programming languages, as well as several articles on machine learning competitions. Quite a few of the materials will be of interest to beginners in data analysis and machine learning. As usual, a number of materials are devoted to Data Engineering.
Time series classification: KNN and DTW The author gives an example of time series classification using k-nearest neighbors (KNN) and dynamic time warping (DTW). The examples are implemented in Python.
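For readers unfamiliar with the technique, here is a minimal Python sketch of the general idea: a nearest-neighbor classifier that uses the DTW distance instead of the Euclidean one. This is not the code from the linked article; the function names `dtw_distance` and `knn_dtw_predict` and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences.

    Plain O(len(a) * len(b)) dynamic programming, no warping window.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def knn_dtw_predict(train, query, k=1):
    """Majority label among the k training series closest to `query` by DTW.

    `train` is a list of (series, label) pairs (hypothetical format).
    """
    neighbours = sorted(train, key=lambda pair: dtw_distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data: two "bump" series and two "flat" series.
train = [
    ([0, 0, 1, 2, 1, 0, 0], "bump"),
    ([0, 1, 2, 1, 0, 0, 0], "bump"),
    ([0, 0, 0, 0, 0, 0, 0], "flat"),
    ([0.1] * 7, "flat"),
]
query = [0, 0, 0, 1, 2, 1, 0]  # the same bump, shifted in time

print(knn_dtw_predict(train, query, k=1))
```

Because DTW is allowed to stretch the time axis, the shifted bump in `query` aligns with the training bumps at zero cost, which is exactly the case where a point-by-point Euclidean comparison would be misleading.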
Machine Learning Cheat Sheet I came across a very interesting machine learning cheat sheet that helps to quickly refresh your knowledge of the subject.
Visual proof that neural networks can compute any function I have already mentioned the draft version of the book “Neural Networks and Deep Learning”. This time it is one chapter from that book which I found very curious, called “A visual proof that neural nets can compute any function”.
Bike Sharing Demand on Kaggle: sample code A small, simple code example from the Kaggle machine learning competition Bike Sharing Demand, in which participants are asked to predict the hourly demand for rental bicycles in Washington, D.C.
K-means image clustering A small illustrative example of applying k-means clustering to an image. The example uses the R programming language.
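The linked example is written in R; the idea itself can be sketched in a few lines of plain Python. Each pixel is treated as an (R, G, B) point, and k-means groups the pixels into k color clusters. The function names and the six-pixel "image" below are hypothetical, and the initialization (the first k distinct colors) is a deliberate simplification so that the run is deterministic; real implementations usually start from random or k-means++ centroids.

```python
def nearest(px, centroids):
    """Index of the centroid closest to pixel `px` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(px, centroids[c])),
    )


def kmeans_pixels(pixels, k, iterations=20):
    """Cluster (R, G, B) tuples into at most k color groups.

    Returns (centroids, labels), where labels[i] is the cluster of pixels[i].
    """
    # Simplified, deterministic initialization: the first k distinct colors.
    centroids = []
    for px in pixels:
        if px not in centroids:
            centroids.append(px)
        if len(centroids) == k:
            break

    for _ in range(iterations):
        # Assignment step: attach every pixel to its nearest centroid.
        labels = [nearest(px, centroids) for px in pixels]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [px for px, label in zip(pixels, labels) if label == c]
            if members:
                mean = tuple(sum(ch) / len(members) for ch in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids

    labels = [nearest(px, centroids) for px in pixels]
    return centroids, labels


# A tiny hypothetical "image": three reddish and three bluish pixels.
pixels = [
    (250, 10, 10), (240, 20, 15), (245, 5, 20),
    (10, 10, 250), (20, 15, 240), (5, 20, 245),
]
centroids, labels = kmeans_pixels(pixels, k=2)
# Replacing each pixel with its centroid gives the color-quantized image.
quantized = [centroids[label] for label in labels]
print(labels)
```

For a real image, the pixel list would come from an image-loading library, and the quantized pixels would be reshaped back to the image dimensions.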
Building a spam filter in R A fairly simple code example of building a spam filter in the R programming language, using the caret machine learning package and training with a support vector machine (SVM).
From stumps to trees and forests Another article from the Microsoft TechNet Machine Learning Blog. This time, Chris Burges explains decision trees in fairly simple terms.
Apache Kafka Introduction This article from the Cloudera blog is an introduction to the Apache Kafka distributed messaging system.
Online course "KIx: KIexploRx Explore Statistics with R" edX has launched, for the first time, a course called KIx: KIexploRx Explore Statistics with R. The course will be of most interest to those who want to get acquainted with the R programming language and its practical application.
Efficient indexing in MongoDB 2.6 A short article on how to use indexing properly in the MongoDB NoSQL database, including the new indexing features introduced in version 2.6.
Video lectures from the “Learning From Data” course On September 25, a new session of the very popular online course “Learning From Data” will begin on edX, offered by the California Institute of Technology with Professor Yaser Abu-Mostafa as the main instructor. The complete set of video lectures and practical assignments is already available.
How data centers work A description of how data centers in the United States work, presented as an infographic.
High Performance Material Review A weekly digest of the most interesting materials on high performance from the popular HighScalability portal.
180 top bloggers A list of the top 180 Data Science bloggers proposed by DataScienceCentral.
Top Big Data Sites A list of 6 Big Data resources that may be of interest to big data specialists, although many of you probably already know most of them.
Introduction to Big Data Architecture This article from the Cloudera blog is a good introduction to Big Data architecture and a description of what a Big Data engineer does.
Forecasting example in R A small example of using the R programming language for forecasting, based on the Global Energy Forecasting Competition 2014 machine learning competition.
Data Mining News A short list of interesting Data Mining resources as of September 10.
The best materials of the month A list of the best data analysis articles of the month, according to the popular DataScienceCentral portal.
The third annual Russian AI Cup championship has begun. According to the Mail.Ru Group blog on Habrahabr, the third annual Russian AI Cup championship, this year called “CodeHockey”, has begun. Last year I reached the finals of CodeTroopers, and overall it was quite interesting, although very time-consuming. This year I also plan to try my hand at this competition.