Review of the most interesting materials on data analysis and machine learning No. 11 (August 25 - September 1, 2014)
Here is the next issue of my review of the most interesting materials on data analysis and machine learning. This issue covers a wide range of topics. There are many articles on Data Engineering, some materials for beginners, and a few video lectures. As usual, machine learning competitions on Kaggle are mentioned. There is also an interesting article about startups in the field of Data Science and another about improving game AI with machine learning.
Data analysis and machine learning materials
Predictive modeling, supervised learning, and pattern classification A good article on machine learning that will also interest beginners. It covers topics such as supervised learning, visualization in machine learning, input data processing, feature engineering, sampling, and others.
Let's talk about Hadoop An introduction to the Hadoop ecosystem, in Russian. At the end there is a good set of links to useful materials on the topic.
How to become a Data Scientist An article from the DataScienceCentral portal for those interested in Data Science. It briefly describes what a Data Scientist is, identifies 4 areas within the profession, and discusses the tools a data analyst needs.
Using the pbapply() function An interesting example of using the pbapply() function from the pbapply package for the R programming language.
Azure DocumentDB An article about Azure DocumentDB, the new NoSQL database from Microsoft. DocumentDB is still in the preview stage. At the end of the article there is a good set of links on the topic.
Data Science startups from Y Combinator There are currently many business opportunities in the field of Data Science. This article lists the 2014 Data Science startups from the well-known startup incubator Y Combinator.
New Kaggle Competition: Epilepsy Seizure Prediction Challenge A new machine learning competition, the American Epilepsy Society Seizure Prediction Challenge, recently started on Kaggle. The competition will run until November 17, 2014.
33 unusual problems that can be solved with Data Science In a short post, Vincent Granville, the author behind the popular DataScienceCentral portal, lists 33 problems from various areas of life that, in his view, can be solved using Data Science.
List of interesting literature A list of books worth reading for anyone interested in data analysis.
New dataset from Microsoft Research Just yesterday, Microsoft Research published an interesting dataset called Microsoft Research Dense Visual Annotation Corpus.
How machine learning helped improve game AI A rather curious article, written in lively language, about how machine learning techniques helped the author significantly simplify a game bot's AI and make it more effective.
Convergence of machine learning and Big Data The article presents interesting observations by the fairly well-known data analyst Mikio Braun on the need to bring the machine learning and Big Data communities together. At present the two are quite far apart, which leads to certain problems and inconveniences.
Unstructured Data Analysis The next part in a series of articles on text analysis and working with unstructured data. This time the author moves from posing questions to practical aspects and discusses how to process and clean unstructured text data in preparation for the later steps of the analysis.
Using Big Data in the securities market The author offers 3 practical tips, usable by anyone, on applying Big Data to investing in the securities markets.
Online course "Data Analysis and Statistical Inference" On Monday, September 1, Coursera starts the second run of a well-regarded online course on data analysis and statistics from Duke University, titled "Data Analysis and Statistical Inference".
Using Bayesian machine learning methods with Apache Spark A short, interesting article from the Cloudera blog that gives an example of applying Bayesian machine learning methods with Apache Spark, a popular product of the Hadoop family, and the PyMC library for the Python programming language.
Facts and Myths about Big Data A short, interesting article from the popular insideBIGDATA portal in which the author discusses the currently popular topic of Big Data and shares his thoughts on common misconceptions in this area.
12 tips on MongoDB A short article with 12 useful tips for those who want to run the popular NoSQL database MongoDB in production.
John Chambers: Interfaces, Efficiency and Big Data In this talk from the useR! 2014 conference, titled Interfaces, Efficiency and Big Data, John Chambers discusses the past, present, and future of the R programming language.
Write operations in MongoDB An article that clearly describes the subtleties of writing and updating data in MongoDB and covers several of the modes available for these operations: Unacknowledged, Acknowledged, Journaled, etc.
Nonlinear classification in R using decision trees 7 types of nonlinear classification using decision trees, with code examples in the R programming language, from the author of the popular data analysis blog MachineLearningMastery.
Impala: future plans A short article from the Cloudera blog about the company's plans for Cloudera Impala, a popular Hadoop product that lets you work with data stored in Hadoop using SQL queries.
SlamData: SQL queries in MongoDB An announcement of a rather interesting product, SlamData, which will make it possible to run SQL queries against data stored in MongoDB. The product is currently in beta testing, with the release scheduled for early October of this year.