
ITMO University Digest: materials for those who want to get into Data Science

Today we have prepared a digest collecting the most interesting books, articles, video courses and lectures (including those prepared by ITMO University teachers, students and staff) that will help you get acquainted with Data Science.

These materials cover both the theoretical aspects of working with data and the practical ones, aimed at creating algorithms and writing programs.


Articles


Working with data is a new science.
The volume of scientific data is growing at an astonishing rate, so new mathematical methods and methods of analysis are needed. It is not enough just to collect and store huge amounts of information: it needs to be properly organized, and this requires a special structure. An article about how scientists implement non-trivial approaches to working with data.

List of machine learning resources. Part 1
An adapted selection of useful machine learning materials discussed by users of Stack Overflow and Stack Exchange. It covers topics such as logistic regression, feedforward neural networks, natural language processing, support vector machines, etc.

List of machine learning resources. Part 2
The second part of the adapted collection of useful materials: frameworks, presentations, interviews and other materials on the topic.

Columbia Pictures does not present: what IMDb data can tell
Yuri Volkov, a student of the Computer Engineering Department, explains how he analyzed a dataset from IMDb, the world's largest repository of information about films, and what conclusions he reached.

Deep learning: A bit of theory
What you need to create artificial intelligence and which algorithms are used for this purpose, along with the implementation difficulties and ways to solve the problems that arise.

Selection: More than 70 sources for machine learning for beginners
This list is intended for those who are just starting to explore the topic of machine learning, for example, using Python. Here you will find articles, courses, books, packages and tools, chats and discussions.

40 tools and techniques used by data analysts
The most common terms, what they mean, and how they are used in the context of data science. Each item in the list links to several other articles on the portal.

Literature


"Naked Statistics. The most interesting book about the most boring science"
The book is suitable not only for data processing specialists. It covers the basics of statistical analysis, which are useful in other fields as well. The author, Professor Charles Wheelan, uses humor and good examples to teach readers how to find hidden relationships between phenomena.

Statistics: Tutorial
The book was developed in accordance with the program of the "Statistics" course at ITMO University and contains the main methodological provisions of the theory of statistics and their application.

Journal "Scientific and Technical Bulletin of Information Technologies, Mechanics and Optics"
The journal is published by ITMO University and is one of the oldest scientific periodicals in the country. It contains a large number of articles on computer systems and information technology, including deep learning and statistical data analysis.

Doing Data Science: Straight Talk from the Frontline
This book is based on a Columbia University course and lets you study in depth topics such as regression models, spam filtering, recommendation engines and big data.

Think Stats: Exploratory Data Analysis in Python
Think Stats focuses on simple techniques that you can use to explore real-world data sets. It also works through a specific example using data from the National Institutes of Health.

"Algorithms. Development Guide"
This is the most comprehensive guide to developing efficient algorithms. The first part of the book discusses types of data structures, sorting algorithms, examples of combinatorial search, heuristic methods and dynamic programming. The second part contains a list of references and a catalog of the 75 most common algorithmic problems with existing software implementations.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction
The book does not have a single line of Python or R code, but there are a lot of graphs and formulas. It covers a large number of areas: supervised and unsupervised learning, neural networks, decision trees, support vector machines and model ensembles. You can download it for free from the Stanford University website.

"Algorithms. Construction and Analysis"
The book is an exhaustive textbook covering the whole range of modern algorithms: from fast algorithms and data structures to polynomial-time algorithms and specialized substring search algorithms, computational geometry and number theory.

"Algorithms. Development and Application"
The reader first gets acquainted with the basic aspects of building algorithms, basic concepts and definitions, and then moves on to algorithm design methods, intractability and methods for tackling intractable problems. The most complex topics are explained with simple examples.

Lean Analytics
The book explains how to use data in a business environment. It shows why it is important to focus on one key metric when evaluating a company's workflows, and it also covers six types of online business and the data management strategies for each of them.

Analytics Lessons Learned: Free e-book with 13 case studies
This e-book is a kind of supplement to the previous book. It contains stories about how companies such as Airbnb, Backupify, Sincerely, Swiffer and EMI work with data.

I Heart Logs: Event Data, Stream Processing, and Data Integration
This small book is only 60 pages long, but it gives a good idea of the technical side of data collection and processing. The reader will also learn what data the infrastructure specialists of various companies work with.

Data Science at the Command Line
This book is designed to expand your capabilities in the field of data analysis. This is also the only book containing information about data analysis using the command line.

"Python and Data Analysis"
This book covers reformatting, cleaning, and processing data in Python. It can also be read as a modern practical introduction to developing scientific applications in Python, with a focus on data processing. It is about those parts of the Python language and its libraries that are needed to solve a wide range of analytical tasks effectively.

"R in Action. Analysis and visualization of data in the language R"
A practice-oriented guide to learning R. It offers useful examples of statistical data processing and describes methods for working with messy and incomplete data. It also teaches the reader how to present data properly for visual exploration.

"Hadoop. Detailed Guide"
Apache Hadoop is an open source framework that implements the computational paradigm known as MapReduce. This book will show you how to use the full power of Hadoop to create reliable, scalable distributed systems and handle large data sets.

"The Basics of Data Science and Big Data. Python and the science of data"
Each chapter of this book is devoted to one of the most interesting aspects of data analysis and processing. You will begin with the theoretical foundations, then move on to machine learning algorithms, working with huge data sets, NoSQL, streaming data, text mining and information visualization. Numerous practical examples use Python scripts.

Video courses


Learn more about Machine Learning and Catch the Robot: 10 Online Courses to Enroll
A selection of 10 online courses from the world's leading companies and universities that it is never too late to sign up for. The programs suit those who have long wanted to try a MOOC (massive open online course) but have only now made up their minds.

Methods and algorithms of graph theory
The goal of the course is to build the basic knowledge and skills needed to solve the most important graph problems that come up most often in practice. The online course combines video lectures with quizzes on their individual parts, exercises, interactive demonstrations and virtual laboratories to develop and test algorithmic problem-solving skills on graphs.




Functional programming: basic course
The course covers the basics of the functional approach to programming and practical aspects of programming in Lisp. Functional languages have many interesting features, and getting to know them broadens a programmer's horizons.

Web application programming and development
The goal of the course is to build the basic knowledge and skills needed to solve the most important and most common practical programming tasks in Python. Attention is also paid to building systems and applications with Django CMS. A free e-book on Python can serve as a supplement to the course.

Data 8: Basic Data Science
The course introduces important concepts and skills in programming and statistical analysis by working with real data sets: economic and geographical data and information from social networks. All software used in the course is open source.



Machine Learning with Andrew Ng
A machine learning course from Andrew Ng, a computer scientist from Stanford University. Andrew begins by explaining the principles of machine learning, and then moves smoothly to the algorithms and functions used.

P.S. ITMO University teachers also run online courses on other topics: geometric optics, rheology, management. A full list of available courses can be found here.

Source: https://habr.com/ru/post/326894/

