Review of the most interesting materials on data analysis and machine learning №9 (August 11 - 18, 2014)
Here is the next issue of the review of the most interesting materials on data analysis and machine learning. This issue has a lot of interesting video content and several pieces on data engineering. It also includes many practical code examples in R and Python and, as usual, plenty of material on machine learning algorithms.
Your own image search The author describes a tool he wrote in Python that simplifies working with images on a local computer.
Alex Smola talks about scalable machine learning Another lecture from the series presented at the Machine Learning Summer School (MLSS '14) in Pittsburgh. In this video, Alex Smola, a well-known expert in computer science and machine learning (a Google researcher and professor at Carnegie Mellon University), covers the important topic of scaling machine learning.
List of leading researchers in the field of data analysis A list of leading researchers in data analysis and data science from the popular KDnuggets portal, compiled from Microsoft Academic Search data.
Selecting a subset of records from a large file When working with a large file in R, it is often more convenient to work with a small random subset of its records than with the entire data set. This short article gives sample code for extracting such a subset from a file.
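The linked article uses R; as a rough Python analogue, the idea can be sketched with reservoir sampling, which reads the file once and keeps a uniform random subset in memory. This is a minimal sketch under assumptions: the function name `sample_lines` and the file name in the usage comment are hypothetical, and reservoir sampling is a technique chosen here for illustration, not necessarily the one the article uses.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random in a single pass.

    Implements reservoir sampling (Algorithm R), so the whole input
    never has to fit in memory.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Replace a kept line with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = line
    return reservoir


# Hypothetical usage: keep the header, sample 1000 data rows.
# with open("big_file.csv") as f:
#     header = next(f)
#     subset = sample_lines(f, 1000, seed=42)
```

If the input has fewer than k lines, all of them are returned in their original order.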
Apache Spark with IPython A short article from the Cloudera blog about integrating Apache Spark with IPython.
PyStruct machine learning library A Python library for structured learning (structured prediction). Its design deliberately follows that of the popular machine learning library scikit-learn.
Fast learning with Vowpal Wabbit A short article from the Microsoft TechNet Machine Learning Blog about the open-source machine learning system Vowpal Wabbit, which is developed by Microsoft Research and can be integrated with the cloud-based machine learning platform Microsoft Azure ML.
SAS in the cloud This article briefly describes running SAS in the Amazon AWS cloud and integrating the SAS platform with some AWS services.
How to make rotated labels on chart axes How to rotate axis labels is a question that often comes up when using the standard plotting tools in R. This article gives a small code example that draws axis labels at different angles.
Comparison of data analysis software A comparison table showing which statistical analysis tools are built into each of five software products: R, MATLAB, SAS, Stata and SPSS.
18 basic tools of the Hadoop family The number of new tools around Hadoop is growing rapidly, and it is hard to keep up with everything new in this area. This article lists 18 major tools, with a brief description of each.
semPlot library for R A small example of using the semPlot library, which visualizes structural equation models (SEM), a technique for exploring complex relationships between variables.
Prisoner's dilemma: an example in R An interesting implementation of the Prisoner's Dilemma, a fundamental problem in game theory, in the R programming language.
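For readers who prefer Python, here is a minimal sketch of an iterated Prisoner's Dilemma. It is not taken from the linked R article: the payoff values (5, 3, 1, 0) are the conventional textbook ones, and the strategy and function names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Payoff to "me" for (my_move, their_move); "C" = cooperate, "D" = defect.
# Conventional textbook values, assumed here for illustration.
PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}

# A strategy sees the opponent's past moves and returns its next move.
Strategy = Callable[[List[str]], str]


def always_defect(opponent_history: List[str]) -> str:
    return "D"


def tit_for_tat(opponent_history: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play the iterated game and return the total scores (a, b)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_b)
        move_b = b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

With these payoffs, `play(tit_for_tat, always_defect, 10)` gives `(9, 14)`: the defector wins the first round, after which both sides defect. Two tit-for-tat players score `(30, 30)` over the same ten rounds, which shows why mutual cooperation pays off in the repeated game.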
Some basic statistics A few simple statistical operations, with examples in Python.
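As a rough illustration of the kind of operations meant here, the sketch below computes a few summary statistics with Python's standard `statistics` module (available since Python 3.4). The helper name `summarize` and the sample data are assumptions and are not taken from the linked article.

```python
import statistics
from typing import Dict, Sequence


def summarize(data: Sequence[float]) -> Dict[str, float]:
    """Return a few basic descriptive statistics for a sample."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # Population standard deviation (divides by n, not n - 1).
        "pstdev": statistics.pstdev(data),
        "min": min(data),
        "max": max(data),
    }


# Hypothetical sample data.
values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(values)
```

For this sample the mean is 5, the median is 4.5 and the population standard deviation is 2.0. Use `statistics.stdev` instead of `pstdev` when the data is a sample from a larger population.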
GrapheR: GUI visualization system for R GrapheR is a library for the R programming language for visualizing various kinds of data. Its most important feature is that it comes with its own GUI.
Convolutional neural networks This publication covers convolutional neural networks and goes fairly deep into the theory behind this popular topic.
So you wanted to try Deep Learning? Less an article than a collection of useful and interesting resources that will help you understand Deep Learning better.
OpenML Short Description A short article about OpenML, an increasingly popular machine learning portal that, among other things, can be used to take part in machine learning contests.
Exploratory data analysis with Python and Pandas A very interesting article about exploratory data analysis using Python and Pandas, with code examples based on the popular Titanic dataset from Kaggle.
Building an infrastructure for machine learning In this video, presented in a very accessible style, Josh Wills (Senior Director of Data Science at Cloudera) talks about what Cloudera is currently working on and about using machine learning in production on large volumes of data. This industrial machine learning is often much harder than academic machine learning.
New in CDH 5.1: HDFS Read Caching This article describes a new feature in CDH 5.1: read caching in HDFS, which can significantly increase read speed in systems that use HDFS.