Overview of the most interesting materials on data analysis and machine learning №10 (August 18 - 25, 2014)
Here is the next issue of my review of the most interesting materials on data analysis and machine learning. This issue contains many materials for beginners, a couple of interesting videos, and several pieces on Data Engineering. As usual, a number of articles provide code examples related to data analysis and machine learning, and, as is traditional, several articles cover participation in machine learning competitions.
Data analysis and machine learning materials
MIT's Deep Learning Book A book from MIT on Deep Learning, a very popular area of machine learning. The book is not yet complete, but many chapters are already available to readers.
Data processing with R A short book that may be useful to anyone who works with data in the R programming language. It is devoted to processing and cleaning data during the preprocessing phase, which is known to take a lot of time and effort from data analysts.
The Hard Way to Learn Machine Learning - A Pony Story In this video, Nathan Taggart (Product Manager at New Relic) tells the story of how he learned machine learning and which mistakes to avoid in this challenging task. The video is aimed at beginners in data analysis and machine learning.
What is R A short, concise overview of the R programming language, with a description of its advantages and disadvantages.
What companies need to know about Big Data An article arguing that many companies might want to change the way they work with their data and focus more on modern trends in Data Science.
Unstructured text data analysis guide The first part of a series of articles from the popular portal Analytics Vidhya, devoted to the interesting topic of text analysis. This article describes the basic problems and questions; future articles will cover the details of how to address them.
Mario Garzia of Microsoft on data analysis In an article on the Microsoft Technet Machine Learning Blog, Microsoft data analyst Mario Garzia discusses the current state of affairs in Data Science.
Time series visualization using googleVis library Version 0.5.5 of googleVis was released not long ago. This short post provides a very simple code example for visualizing time series using the googleVis library for the R programming language.
Microsoft Azure DocumentDB A short article about Azure DocumentDB, Microsoft's new NoSQL database.
The use of machine learning for trading (part 1) An introduction to using machine learning for trading. This series of articles has already appeared in earlier issues of this review; this link is to the Russian translation of the first part.
Fast HDF5 with Pandas An example of working with the HDF5 data storage format from Pandas, the data analysis library for the Python programming language.
Interesting Deep Learning Resources A list of resources on Deep Learning, a popular machine learning technology, compiled by the well-known portal KDnuggets.
An example of solving a Kaggle problem An example of a possible solution to the popular Kaggle machine learning competition “Predict Bike Sharing Demand” using the Gradient Boosted Trees technique. The example uses the machine learning tool GraphLab Create.
Visualizing how logistic regression works Logistic regression is often used in machine learning. This short post shows how logistic regression works in the form of an animated image.
Machine Learning and Computer Vision (Part 2) The second part of a series of articles from the Microsoft Technet Machine Learning Blog on using machine learning and computer vision technologies to solve image recognition problems. The article is short and written in simple language, without going into the details of this rather complex topic.
Hadoop Ecosystem A short, useful article that briefly describes the main elements of the Hadoop ecosystem.
What is Big Data? A short, interesting article in which the author discusses what Big Data is and attempts to give the simplest possible description of the term.
Using expression in R An interesting article about using the expression() function in the R programming language.
Supervised learning flowchart Many readers are familiar with the machine learning approach known as supervised learning. This short post presents, in the form of a flowchart, a good visualization of the typical sequence of steps in supervised learning.
21 great charts Several excellent examples of data visualization using various types of graphs and charts from the DataScienceCentral portal.
Sybil: Google's machine learning scaling system In this talk, Tushar Chandra talks about the fate of Sybil at Google. Sybil is an important Google research project that implements various machine learning algorithms in a way that allows them to scale. It is widely used within Google.
Four main languages for data analysis The results of a poll conducted by the popular KDnuggets portal on the most popular languages used for data analysis.
Math for machine learning An article on the mathematical skills needed to acquire a basic knowledge of machine learning. The author notes that the article is a draft and that more information will be added over time.
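For readers who want to try the logistic regression entry above by hand, here is a minimal sketch of how such a model can be trained with plain gradient descent. It is not taken from any of the linked posts: the toy data, the learning rate, the number of epochs, and the function names (sigmoid, fit_logistic, predict) are my own assumptions, chosen only to keep the example short.

```python
import math


def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit a one-feature logistic regression with batch gradient descent.

    lr and epochs are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical toy data: small values belong to class 0, large ones to class 1.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print("weight:", w, "bias:", b)
print("prediction for 1.0:", predict(w, b, 1.0))
print("prediction for 4.0:", predict(w, b, 4.0))
```

The same fit / predict split mirrors the supervised learning sequence mentioned above: first the model is trained on labeled examples, then it is applied to new inputs.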