Today marks three years since the launch of one of our educational projects, Mail.Ru Technosphere, run jointly with the Faculty of Computational Mathematics and Cybernetics (VMK) of Lomonosov Moscow State University. The Technosphere program trains specialists in big data. It was originally designed to last one year and consisted of six disciplines. A year later, however, we revised the program and extended it to two years. Over four semesters, students study 12 disciplines and complete a large amount of practical work. At the same time, we developed a preparatory course, “Algorithms and Data Structures”.
Technosphere admits students in their second to fourth years of study. Although the entrance examination scheme is the same in all our educational projects (students take an online test and then pass a face-to-face interview), in Technosphere we place more emphasis on basic knowledge of higher mathematics. In addition to lectures, we have set up a laboratory where students work on real tasks that we encounter at Mail.Ru Group. For example, they try to improve analytical algorithms or devise heuristics. In other words, they do the same work they would do during a regular internship at the company. In the fall of 2015, the laboratory also began doing research, for example studying how neural networks can be applied to various business problems.
In honor of the anniversary, we are publishing the list of study materials that we recommend to our students over the two-year course.
Course: Algorithms for intelligent processing of large amounts of data

Lectures, articles and other materials
Mathematical Monk blog
Large-Scale High-Precision Topic Modeling on Twitter
Comparison of cluster algorithms
Carnegie Mellon Lectures on Statistics: Data Mining, Lecture 22
Selection of materials on IPython and Jupyter
Visual Information Theory, an article on information theory
Sergey Nikolenko's blog on Habr
Learning representations by back-propagating errors
Ruslan Salakhutdinov - Deep Learning
Yahoo! Hadoop tutorial
Tentative NumPy Tutorial
100 NumPy exercises
CRISP-DM User Guide
IBM CRISP-DM Guide
MapReduce: Simplified Data Processing on Large Clusters
Literature
Pattern Recognition and Machine Learning. The book describes approximate inference algorithms that give fast approximate answers in situations where an exact answer is not required. Graphical models are used to describe probability distributions.
Data Mining: Practical Machine Learning Tools and Techniques. The book describes the concepts of machine learning and gives practical advice on using tools and techniques in real data analysis tasks.
Introduction to Information Retrieval. The book teaches how to retrieve information efficiently using web search, text classification and clustering. It covers all aspects of designing and implementing systems for collecting, indexing and searching documents, methods for evaluating such systems, and the use of machine learning on text collections.
Mining of Massive Datasets. The emphasis here is on practical algorithms used to solve key problems in data analysis. The authors explain the tricks behind locality-sensitive hashing and algorithms for processing data that arrives in a fast stream. The book also touches on web-related problems, frequent itemset mining and clustering.
Pattern Classification. The book is devoted to neural networks, statistical pattern recognition, machine learning theory and invariance theory. It also gives practical examples and comparisons of different methods.
Machine Learning: a Probabilistic Perspective. The book is an introduction to machine learning based on a unified probabilistic approach.
An Introduction to Data Science. A book for those taking their first steps in data processing. It presents R code samples that solve a variety of interesting problems.
Data Mining and Knowledge Discovery Handbook. It describes the key ideas, theories, standards, methodologies, trends, challenges and application methods of data mining.
Stochastic Gradient Descent Tricks. This is the first chapter of the book “Neural Networks: Tricks of the Trade”. It discusses stochastic back-propagation for training neural networks, which is in essence a variant of stochastic gradient descent.
Neural Networks and Learning Machines (3rd Edition). The book discusses modern methods of applying neural networks from an engineering point of view. MATLAB code samples can be downloaded from here.
The Elements of Statistical Learning. It describes important ideas in statistics, data processing, machine learning and bioinformatics.
Ensemble Methods: Foundations and Algorithms. The book describes the theory and algorithms of machine learning, from simple to more complex.
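As a rough illustration of the stochastic gradient descent idea behind the “Stochastic Gradient Descent Tricks” chapter listed above, here is a minimal Python sketch. It is not taken from the book: the function name `sgd_linear_fit`, the learning rate, the number of epochs and the toy data are all assumptions made for this example. It fits a straight line by updating the parameters after every single sample rather than after a full pass over the data.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with plain stochastic gradient descent.

    Hypothetical helper for illustration only; lr, epochs and seed
    are arbitrary choices, not recommendations from the book.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for i in indices:
            err = (w * xs[i] + b) - ys[i]  # error on ONE sample
            w -= lr * err * xs[i]  # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err          # gradient of 0.5 * err**2 w.r.t. b
    return w, b


# Toy data invented for this sketch: points on the line y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(round(w, 2), round(b, 2))
```

On this noise-free data the estimates should end up close to the true slope 2 and intercept 1. On real data the step size usually needs tuning, which is the kind of practical question the chapter is about.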
Course: Introduction to Data Analysis
Lectures, articles and other materials
Time Series Analysis and Its Applications: With R Examples
Mathematical statistics
R: Data analysis and visualization
R documentation
IRkernel
Seaborn: statistical data visualization
SF GIS Crime
Apache Maven Guide
JUnit
Java Guide
Lambda expressions in Java 8
US Government Open Data
Data from the US Sociological Service
UN data
EU Open Data Portal
Dive into Python
Python documentation
CRISP-DM User Guide
IBM CRISP-DM Guide
Literature
R in Action: Data Analysis and Visualization in R. This is a guide to learning the R language with a focus on practical tasks. The book gives useful examples of statistical data processing and describes elegant methods for handling messy and incomplete data, as well as data whose distribution is not normal and that is hard to handle with conventional methods. You will also master R's extensive graphical capabilities for visually exploring and presenting data.
Statistical Analysis and Data Visualization with R. Today the R language is the undisputed leader among freely distributed statistical analysis systems. Leading universities, analysts at major companies and research centers regularly use R for scientific and technical computing and for building large information projects. Because statistics is widely taught on the basis of this system and the scientific community fully supports it, publishing R scripts is gradually becoming an accepted standard both in journal publications and in informal communication among scientists around the world.
Data Science for Business. You will learn how to improve the interaction between business people and data specialists, how to integrate data processing into the company's business processes, how to develop “data-analytic thinking”, how to use scientific methods when making business decisions, and more.
Course: Advanced C/C++ Programming
Lectures, articles and other materials
How to start working with GitHub: quick start
Debugging Programs with GDB
Work with Valgrind
UNIX operating system for programming students
Introduction to operating systems
Literature
Programming with POSIX Threads. The book gives you an understanding of threads and shows how this style of programming can be used in real projects. It covers in detail POSIX (Portable Operating System Interface) threads, part of the IEEE operating system interface standard and often called Pthreads. Intended for experienced C programmers.
Linux System Programming: Talking Directly to the Kernel and C Library. A guide to system programming for Linux, a system call reference, and a guide to writing fast, well-crafted code.
Advanced Programming in the UNIX Environment. This book has been a desk reference for UNIX programmers for over 20 years. The latest edition brings the material up to date. The author leads readers step by step through working with files, directories and processes, signal handling, and terminal I/O. It also discusses threads and multithreaded programming.
The Art of UNIX Programming. The book describes good Unix programming style, the variety of available languages with their advantages and disadvantages, various IPC techniques and development tools. The author analyzes the Unix philosophy, its culture and the main traditions of the community formed around it. The book explains the best practical techniques for designing and developing programs in Unix.
At the same time, many of the models and principles described in the book will also be useful to Windows developers. The book pays particular attention to the user interface styles of Unix programs and the tools for developing them. A separate chapter is devoted to the principles and tools for creating good documentation.
Course: Multi-threaded C/C++ Programming
Lectures, articles and other materials
Beej's Guide
Fast portable non-blocking network programming with Libevent
FD passing for DRI.Next
Nanomsg documentation
Useful materials on C++
Literature
UNIX Programming. This practical guide helps you study how system calls differ across implementations of UNIX and UNIX-like systems, so that you can write universal, portable applications. It covers inter-process and network communication, terminal and file I/O, signal handling, multithreading, real-time operation and much more.
Stevens W. UNIX. Network Application Development. The book is devoted to building web servers, client-server applications and any other network software on the UNIX operating system. It includes a description of key modern standards, implementations and methods.
Jeff Alger. C++: Programmer's Library. From this book you can learn about the nontrivial capabilities of one of the most ingenious object-oriented languages. The author discusses the subtleties of C++ programming, the specific problems that arise when developing software systems, and how to solve them.
Course: Information Retrieval. Part 1
Lectures, articles and other materials
A list of libraries and frameworks that help with natural language processing:
NLTK
Freeling
Gensim
Neural language models in distributional semantics
Word processing
Deep Structured Semantic Model / Deep Semantic Similarity Model
Learning Deep Structured Models for Web Search using Clickthrough Data
Literature
Foundations of Statistical Natural Language Processing (chapter “Collocations”). The book is an introduction to statistical natural language processing. It presents the theory and algorithms needed to build the necessary tools.
Course: Methods of processing large amounts of data
Lectures, articles and other materials
Image stylization using neural networks: no mysticism, just math
Convolution Arithmetic in Deep Learning. Part 2
Deep feedforward neural networks
A guide to convolution arithmetic for deep learning
A Neural Algorithm of Artistic Style
TensorFlow VGG-16 pre-trained model
Inception in TensorFlow
Efficient BackProp
Delving Deep into Rectifiers: Human-Level Performance on ImageNet Classification
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Batch Normalization: Accelerating Deep Network Training
An overview of gradient descent optimization algorithms
A Practical Guide to Training Restricted Boltzmann Machines
Neural Networks and Learning Machines (3rd Edition)
A Beginner's Guide To Understanding Convolutional Neural Networks
A Beginner's Guide To Understanding Convolutional Neural Networks. Part 2
The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)
Spatial Transformer Networks
Playing Atari with Deep Reinforcement Learning
A Neural Conversational Model
DCGAN neural network (2)
Jupyter notebook
Awesome TensorFlow
Tensor with unspecified dimension in TensorFlow
What is the difference?
A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach
Finding Structure in Time
The Unreasonable Effectiveness of Recurrent Neural Networks
Generating Text with Recurrent Neural Networks
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Composing Music with LSTM Recurrent Networks - Blues Improvisation
Hubel & Wiesel
Cognitron and Neocognitron
Neocognitron: A Self-organizing Mechanism of Pattern Recognition
LeNet-5, convolutional neural networks
Convolutional Neural Networks (LeNet)
ImageNet Classification with Deep Convolutional Neural Networks
Very Deep Convolutional Networks for Large-Scale Image Recognition
Classifying plankton with deep neural networks
Visualizing and Understanding Convolutional Networks
Transfer learning
Learning representations by back-propagating errors
A Growing Neural Gas Network Learns Topologies
Learning multiple layers of representation
An introduction to information retrieval
Active Learning to Rank
Semi supervised learning tutorial
Combining Labeled and Unlabeled Data with Co-Training
Neural Networks for Machine Learning
Variational inference
Explaining "Explaining away"
A fast learning algorithm for deep belief nets
Semantic hashing
Rectified Linear Units Improve Restricted Boltzmann Machines
Exponential Family Harmoniums
Gaussian-binary Restricted Boltzmann Machines On Modeling Natural Image Statistics
Improved Learning of Gaussian-Bernoulli Restricted Boltzmann Machines
Learning Deep Architectures for AI
Textbook "for beginners" on neural networks
Calculus on Computational Graphs: Backpropagation
Visual Information Theory, an article on information theory
A Step by Step Back Propagation Example
Literature
http://www.deeplearningbook.org
This guide for students and practitioners helps them get started in machine learning in general and deep learning in particular.
Course: Methods of distributed processing of large amounts of data in Hadoop
Lectures, articles and other materials
Writing an Hadoop MapReduce Program in Python
Course: Information Retrieval. Part 2
Lectures, articles and other materials
A list of libraries and frameworks that help with natural language processing:
Neural language models in distributional semantics
Word processing
Deep Structured Semantic Model / Deep Semantic Similarity Model
Learning Deep Structured Models for Web Search using Clickthrough Data
* * *
And if these materials are not enough for you, a reminder: up-to-date lectures and master classes on programming from our specialists in the Technopark, Technosphere and Technotrack projects are still being published on the Technostream channel. The following Technosphere courses are available: