Overview of the most interesting materials on data analysis and machine learning No. 12 (September 1-8, 2014)
Here is the next issue of my review of the most interesting materials on data analysis and machine learning. This issue turned out to be quite large and includes many materials on Data Engineering. More and more materials from the KDD 2014 conference are appearing. As usual, there are articles about various machine learning competitions, including articles about the recent “ImageNet Large Scale Visual Recognition Challenge (ILSVRC)”. There are also quite a few code examples in the R and Python programming languages. I also mention what I think is a very interesting online course, “Introduction to Computational Finance and Financial Econometrics”.
An analysis of the results of the recent ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an annual image recognition competition in which the Google team took first place.
Daprota recently published the Data Modeling Adviser for MongoDB on its website, a very useful guide to data modeling in the MongoDB NoSQL database.
The author describes the experience gained from participating in the AVITO.ru competition on Kaggle and analyzes the various approaches that other participants used to solve the problem.
Continuation of a series of articles on text analysis and work with unstructured data. In this article, the author talks about possible approaches to solving the problem of constructing a dictionary when analyzing text data.
An online course recently started on Coursera that will be useful to those interested in statistics and the R programming language, as well as to those interested in applying statistical methods in finance.
An article from the Hortonworks blog about plans for the new Stinger.next initiative, which is intended to significantly improve the performance of SQL queries on Hadoop.
A short news article about Google’s progress in Deep Learning, a branch of machine learning. The article does not go into the technical details of how the Deep Learning algorithms are implemented.
Video lectures from the “Big Data, Large Scale Machine Learning” course, which ran for 14 weeks in 2013 with Yann LeCun and John Langford as the main instructors.
A useful article from the Cloudera blog about how to translate MapReduce jobs to the increasingly popular Apache Spark and how the core concepts of the two approaches differ.
Five ways to estimate the accuracy of a predictive model with the caret machine learning package for the R programming language, described by the author of the popular blog MachineLearningMastery.
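The summary above does not say which methods the article covers, so the sketch below is only an illustration of one widely used resampling approach, k-fold cross-validation, and not necessarily one of the article's five. It is written in plain Python rather than R with caret. All function names and the toy data are hypothetical, and the "model" is a deliberately trivial majority-class predictor.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def majority_class(labels):
    """Toy 'model' (an assumption for illustration): predict the most frequent label."""
    return max(set(labels), key=labels.count)


def cross_val_accuracy(labels, k=5, seed=0):
    """Hold out each fold once, fit on the rest, and average the per-fold accuracy."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [labels[i] for i in range(len(labels)) if i not in held_out_set]
        prediction = majority_class(train)
        hits = sum(1 for i in held_out if labels[i] == prediction)
        scores.append(hits / len(held_out))
    return mean(scores)


if __name__ == "__main__":
    # Hypothetical toy data: 30 "spam" and 70 "ham" labels.
    toy_labels = ["spam"] * 30 + ["ham"] * 70
    print(cross_val_accuracy(toy_labels, k=5))
```

Averaging over folds means every observation is used for testing exactly once, so the estimate depends less on one lucky or unlucky split than a single train/test partition would.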
It is often difficult to keep track of all the news in data analysis and machine learning. The author of the popular blog MachineLearningMastery offers a small list of newsletters that can simplify the task of getting the latest news from the field of Data Science.