Image: cam analog computer / Wiki

In our blog, we have already covered the development of a quantum communication system and how ordinary students are trained into advanced programmers. Today we return to the topic of machine learning with an adapted selection (source) of useful materials.
This list is for those who are just starting to learn about machine learning, for example, using Python (if you want to start learning Python, this article will help you).
Machine learning is only one of the mathematical disciplines related to the concept of “data”. To understand data analytics, data analysis, data science, machine learning, and big data, read this material.
Here are the tools you need:
You can install Python 3 and all the necessary packages in a few clicks using the Anaconda Python distribution, which is fairly popular among people involved in machine learning.
If you already have Python 2.7 installed, that is fine; there is no need to switch to Python 3. Instead of Anaconda, you can use pip or virtualenv. Can't decide? Read this material.
To get started, get acquainted with IPython Notebook (it will take 5-10 minutes). You can also watch this video. Next, work through a small example (about 10 minutes) that classifies digits using the scikit-learn library.
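
A minimal sketch of what such a digit-classification exercise might look like. It assumes scikit-learn's bundled digits dataset and a support vector classifier; the linked example may use different data or a different model, and the `gamma` value and split ratio here are illustrative choices rather than tuned ones.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the 8x8 images of handwritten digits bundled with scikit-learn.
digits = load_digits()
X, y = digits.data, digits.target  # X: 64 pixel features per image, y: labels 0-9

# Hold out a quarter of the data to estimate accuracy on unseen images.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# gamma=0.001 is an illustrative value, not a tuned one.
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Evaluating on held-out images rather than on the training set is what makes the reported accuracy meaningful.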
Visual introduction to machine learning theory
Let's learn more about machine learning, its ideas and features. Read the article by Stephanie Yee and Tony Chu, “A Visual Introduction to Machine Learning, Part 1”.

Read the article by Professor Pedro Domingos. Take your time while reading and take notes. There are two main points in the article:
- Data alone is not enough. Domingos wrote: “... there is nothing surprising in the fact that learning requires knowledge. Machine learning cannot get something from nothing, but it can get more from less. Learning is like farming, where nature does most of the work. Farmers combine seeds with nutrients to grow crops. Likewise, to create a program, you need to combine knowledge and data.”
- A large amount of data is better than an elaborate algorithm. Do not try to reinvent the wheel or overcomplicate solutions: choose the shortest path leading to the goal. Domingos says: “As a rule, a ‘dumb’ algorithm with a large amount of data beats a ‘clever’ algorithm with a small amount of data. In machine learning, data always plays a major role.”
So, knowledge and data are crucial. This means that you should make your algorithms more complex only when you really have no choice.
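
The second point can be illustrated with a small experiment. The sketch below is a toy setup and does not come from the article: it assumes scikit-learn's digits dataset, treats 1-nearest-neighbor as the “simple” algorithm and a random forest as the “elaborate” one, and uses arbitrary sample sizes.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# "Simple" algorithm trained on the whole training set (about 1,300 images).
simple = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
acc_simple_big = simple.score(X_test, y_test)

# "Elaborate" algorithm trained on only 40 images (an arbitrary small sample).
X_small, _, y_small, _ = train_test_split(
    X_train, y_train, train_size=40, random_state=0, stratify=y_train
)
elaborate = RandomForestClassifier(n_estimators=200, random_state=0)
elaborate.fit(X_small, y_small)
acc_elaborate_small = elaborate.score(X_test, y_test)

print(f"1-NN, full training set:    {acc_simple_big:.3f}")
print(f"Random forest, 40 examples: {acc_elaborate_small:.3f}")
```

A single toy dataset only hints at the effect; it is not evidence for the general claim.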

The diagram is based on a slide from Alex Pinto's lecture “Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring”.
Learn by example
Select and review one or two of the examples below.
- Face recognition on photos from the Labeled Faces in the Wild database.
- Machine learning on data about the sinking of the Titanic. It demonstrates data transformation and analysis methods, as well as visualization techniques. It also includes examples of supervised machine learning methods.
- Predicting election results: using Nate Silver's model to forecast the results of the 2012 US presidential election, as published by The New York Times.
Here are more tutorials and reviews:
Other sources where you can find IPython notebooks:
- Gallery of interesting IPython notebooks: statistics, machine learning and data science.
- Fabian Pedregosa's large gallery of notebooks.
Machine Learning Courses
It helps to start working on a small independent project, so that you have the opportunity to apply this knowledge in practice. You can use one of these datasets.
The book “The Elements of Statistical Learning” is also often recommended, but it usually serves as a reference. The book is free, so download it or bookmark it in your browser.
There are also these online courses:
- The course "Machine Learning" by Professor Pedro Domingos of the University of Washington.
- Data science workshop.
- Data science.
- Video “Introduction to machine learning with scikit-learn” by Kevin Markham. After watching the video, you can take an online course on data science (there are earlier versions: 7, 5, 4, 3).
- Harvard course CS109: Data Science.
- Advanced statistical computing course (Vanderbilt University course BIOS8366).
Feedback on courses and various discussions:
- Check out Jack Golding's answer on Quora. There you will find a link to the Coursera specialization “Data Science”; if you don't need a certificate, you can complete all 9 courses for free.
- Another Quora discussion: how to become a data scientist?
- A large list of data science resources from Data Science Weekly, as well as a list of open online courses.
Learning Pandas
To work with data in Python, you need to get acquainted with the Pandas package. Here is a list of materials that will help:
- Essential: getting acquainted with Pandas.
- Tutorial: a few things in Pandas I wish I had known earlier (IPython notebooks).
- Useful Pandas code snippets.
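
For a first taste of what these materials cover, here is a minimal sketch of a few common Pandas operations. The table, its values, and its column names are made up for illustration.

```python
import pandas as pd

# A tiny, made-up table of passengers (hypothetical data).
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "cabin_class": [1, 3, 3, 2],
        "age": [29.0, None, 41.0, 35.0],
        "survived": [1, 0, 0, 1],
    }
)

# Fill the missing age with the median of the known ages.
df["age"] = df["age"].fillna(df["age"].median())

# Boolean indexing: keep only passengers older than 30.
older = df[df["age"] > 30]

# Group by class and compute the share of survivors in each group.
survival_by_class = df.groupby("cabin_class")["survived"].mean()

print(older)
print(survival_by_class)
```

Filling gaps, filtering rows, and grouping are the operations most introductory Pandas tutorials start with.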
You should also pay attention to these resources:
More materials and articles
- John Foreman's accessible book “Data Smart”.
- Data science course with IPython notebooks.
- Article: major challenges in data science (read both the article and the comment by Joseph McCarthy).
- IPython: key skills for data scientists.
Questions, Answers, Chats
At the moment, the best place to find answers to your questions is the machine learning section on stackexchange.com. There is also a subreddit, /r/machinelearning. Join the scikit-learn channel on Gitter! You should also pay attention to the discussions on Quora and to the large list of data science materials from Data Science Weekly.
Other things you need to know
- Data science: an article by John Foreman, a data scientist at MailChimp.
- Article: eleven factors that lead to overfitting, and how to avoid them.
- Noteworthy article: “Machine Learning: The High-Interest Credit Card of Technical Debt”. Its purpose is to identify risk factors specific to machine learning and to suggest patterns that help avoid them.
- John Foreman: “The Dangerous World of Machine Learning”.
- KDnuggets: “Costs of machine learning systems”.
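
Overfitting, the subject of the eleven-factors article in the list above, is easy to observe on a small scale. The sketch below is an illustration and is not taken from any of the linked articles: it assumes scikit-learn's digits dataset and an unrestricted decision tree, which can memorize its training set.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# With no depth limit, the tree keeps splitting until it fits the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)

print(f"Training accuracy: {train_acc:.3f}")
print(f"Test accuracy:     {test_acc:.3f}")
```

A gap between training and test accuracy is the usual first symptom; comparing the two is a cheap check before trusting a model.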
You need practice. A Hacker News user with the nickname Olympus noted that the way to get it is to take part in contests and competitions. Kaggle and ChaLearn are platforms for researchers where you can try your hand at various contests. Here you will find code examples for Kaggle contests. Another option is HackerRank.
Listen to and read what the winners of Kaggle contests say about their solutions. For example, read the blog “No Free Hunch”.
Competitions or contests are just one way to practice. You can start doing research:
- Start with the question. “The most important thing about data science is the question,” says Dr. Jeff T. Leek. Start with the question, then find the real data and analyze it.
- Publish your results and ask for expert review.
- Eliminate the problems found. Share your discoveries.
Learn more about the scientific method here and here.
Here are a couple more machine learning tutorials: