Review of the most interesting materials on data analysis and machine learning, No. 5 (July 7 - 21, 2014)
Here is the next issue of the review of the most interesting materials on data analysis and machine learning. As always, it includes materials on machine learning algorithms (including Deep Learning), along with several practical articles on Scikit-Learn, the popular Python machine learning package. There are also articles on the practical application of the R language, and a number of materials devoted to Data Engineering. Finally, there are interesting articles about two popular projects, 'Google Brain' and 'Project Adam'.
Data analysis and machine learning materials
About Google Brain [EN] Interesting thoughts about Google’s research project, which bears the unofficial name 'Google Brain'.
Microsoft's artificial intelligence system 'Project Adam' [EN] A fairly long article about a new project from Microsoft Research called 'Project Adam'. To some extent, this project can be seen as Microsoft's response to 'Google Brain'.
Startup Clarify [EN] A short, interesting story about Clarify, a new startup in the field of artificial intelligence and machine learning that has not yet been bought by any of the software giants and is doing quite interesting research in pattern recognition and image processing.
R naming conventions [EN] A discussion of naming conventions in the R programming language, which is well known for its inconsistent naming and lack of settled standards.
Adjusting algorithm parameters with Python Scikit-Learn [EN] A continuation of the series on working with scikit-learn, a popular machine learning library for Python. This installment covers tuning the parameters of an algorithm.
List of resources for NoSQL, Big Data and Machine Learning [EN] A large list of resources in various areas of data analysis (distributed computing, graph databases, time series analysis, data visualization, search engines and other areas).
Machine Learning with Java [EN] A small overview of technologies and products for machine learning using the Java programming language.
Introduction to Microsoft Azure Machine Learning [EN] A brief introduction to Microsoft Azure Machine Learning, a new cloud product for machine learning that is currently in the Public Preview stage.
Self-learning computers from DARPA [EN] An article about a DARPA project that aims to develop self-learning computer technology and its applications.
10 Tips for Deep Learning [EN] 10 short tips for getting better results with Deep Learning techniques.
Basics of data analysis with Python: libraries and data structures [EN] An article on the first steps in data analysis using Python and additional libraries. It continues an earlier discussion of this topic and focuses on libraries and data structures.
Preparing data with Python Scikit-Learn [EN] A continuation of the series on working with scikit-learn, a popular machine learning library for Python. This installment focuses on data preparation, specifically on rescaling data.
Feature Selection process using Python Scikit-Learn [EN] Another article about working with scikit-learn - a popular machine-learning library for Python. In this case, we will focus on the process of Feature Selection in machine learning.
R language rating [EN] The latest programming language ranking from IEEE, in which R takes 9th place among all languages.
Data loading with Scikit-Learn [EN] A short but useful article about loading data with scikit-learn, the popular Python library for machine learning.
Dependencies of popular R libraries [EN] A short article about which libraries the popular R packages (ggplot2, data.table, plyr, knitr, shiny, xts, lattice) depend on, and how many libraries end up being installed if all the packages from this list are installed.
Processing Time Series with Apache Crunch [EN] An article from the Cloudera blog about working with time series using Apache Crunch, with Java code examples.
Fast function for 2x2 tables in the R language [EN] A short example of writing your own faster function for building 2x2 tables in R, as a replacement for the standard table function.
HDFS and MapReduce in simple language [EN] A fairly simple description of basic Hadoop components such as the Hadoop Distributed File System (HDFS) and MapReduce.
Data Origami: Data Science Screencasts [EN] A short overview of the Data Origami website, which offers many screencasts of various levels of complexity on data analysis and machine learning. Note, however, that the site charges a monthly subscription fee.
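
Several of the scikit-learn articles listed above cover data loading, rescaling, feature selection, and parameter tuning. As a rough illustration of how those four steps can fit together, here is a minimal sketch. It is not taken from any of the linked articles. It assumes a current scikit-learn release (the module layout differs from the 2014 versions those articles used), and the Iris dataset, the logistic regression model, the helper name tune_iris_pipeline, and the parameter grid are all choices made purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def tune_iris_pipeline():
    """Hypothetical helper: load data, rescale, select features, tune parameters."""
    # Data loading: Iris ships with scikit-learn (illustrative dataset choice).
    X, y = load_iris(return_X_y=True)

    pipeline = Pipeline([
        # Rescaling: map every feature to the [0, 1] range.
        ("scale", MinMaxScaler()),
        # Feature selection: keep the k features with the best ANOVA F-score.
        ("select", SelectKBest(score_func=f_classif)),
        # Model: an arbitrary classifier picked for this sketch.
        ("model", LogisticRegression(max_iter=1000)),
    ])

    # Parameter tuning: try every combination with 5-fold cross-validation.
    param_grid = {
        "select__k": [2, 3, 4],
        "model__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(pipeline, param_grid, cv=5)
    search.fit(X, y)
    return search


search = tune_iris_pipeline()
print(search.best_params_, round(search.best_score_, 3))
```

Putting the rescaling and feature selection steps inside the pipeline means they are refit on each cross-validation fold, so the held-out data does not leak into the preprocessing.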