
Selection: More than 70 machine learning resources for beginners



Indicator of a cam analog computer / Wiki

In our blog, we have already written about developing a quantum communication system and about how ordinary students are trained to become advanced programmers. Today we return to the topic of machine learning with an adapted selection (source) of useful materials.
This list is for those who are just starting to learn machine learning with, for example, Python (if you want to start learning Python itself, this article will help).

Machine learning is only one of the mathematical disciplines built around the concept of “data”. To get a clear picture of data analytics, data analysis, data science, machine learning, and big data, read this material.

Here are the tools you need:


You can install Python 3 and all the necessary packages in a few clicks using the Anaconda Python distribution. Anaconda is popular among people who work with machine learning.

If you already have Python 2.7 installed, that is fine: there is no need to switch to Python 3. Instead of Anaconda, you can also use pip or virtualenv. Can't decide? Read this material.
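Whichever installation route you choose, it can help to confirm that the packages are actually importable. The following is a minimal sketch using only the standard library; the package list is an assumption (typical choices for beginner machine learning work, not a list taken from this article), and the code assumes Python 3.6 or later.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed package list (import names), chosen for illustration only.
ASSUMED_PACKAGES = ["numpy", "scipy", "pandas", "sklearn", "matplotlib"]

if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_packages(ASSUMED_PACKAGES).items():
        print(f"{name:12s} {'ok' if found else 'missing'}")
```

Any package reported as missing can then be installed with whichever tool you picked (Anaconda or pip).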

To get started, get acquainted with IPython Notebook (it will take 5-10 minutes). You can also watch this video. Next, work through a small example (about 10 minutes) that classifies handwritten digits using the scikit-learn library.
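To give an idea of what such a digit classification example looks like, here is a minimal sketch. It is not the linked example itself: the choice of a support vector classifier, the `gamma` value, and the 75/25 split are assumptions made for illustration, and the code assumes Python 3.6+ with a recent scikit-learn release.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small built-in dataset: 8x8 grayscale images of handwritten digits,
# already flattened into 64 features per image.
digits = load_digits()

# Hold out a quarter of the images so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Assumed model choice: a support vector classifier with an RBF kernel.
clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

The important habit here is the train/test split: accuracy is reported on images the classifier never saw during training.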

Visual introduction to machine learning theory


Let's learn more about the ideas and distinctive features of machine learning. Read the article by Stephanie Yee and Tony Chu, “A Visual Introduction to Machine Learning, Part 1”.



Read the article by Professor Pedro Domingos. Take your time and take notes as you read. The article makes two main points:

Data alone is not enough. Domingos writes: “... it is not surprising that learning requires knowledge. Machine learning cannot get something from nothing, but it can get more from less. Learning is like farming, which lets nature do most of the work. Farmers combine seeds with nutrients to grow crops. In the same way, learners combine knowledge with data to grow programs.”

More data beats a more elaborate algorithm. Do not reinvent the wheel or overcomplicate your solution: choose the shortest path to the goal. Domingos says: “As a rule, a ‘dumb’ algorithm with a large amount of data beats a ‘clever’ algorithm with a small amount of data. In machine learning, data always plays the major role.”

So, knowledge and data are crucial. This means you should make your algorithms more complex only when you really have no other choice.
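The second point can be illustrated with a toy experiment. The sketch below is not from Domingos's article: the synthetic dataset (points labeled by whether they fall inside a circle), the sample sizes, and the use of a 1-nearest-neighbor classifier are all assumptions chosen for illustration. It does not compare a "dumb" algorithm against a "clever" one; it only shows how a deliberately simple method behaves as it is given more data.

```python
import random


def make_points(n, rng):
    """Sample n points in [-1, 1]^2; label 1 inside a circle, else 0."""
    points = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        points.append(((x, y), int(x * x + y * y < 0.5)))
    return points


def predict_1nn(train, query):
    """Return the label of the training point closest to the query."""
    qx, qy = query
    _, label = min(train, key=lambda p: (p[0][0] - qx) ** 2 + (p[0][1] - qy) ** 2)
    return label


def accuracy(train, test):
    """Fraction of test points whose label is predicted correctly."""
    hits = sum(predict_1nn(train, point) == label for point, label in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
test_set = make_points(300, rng)

scores = {}
for n in (5, 50, 2000):
    scores[n] = accuracy(make_points(n, rng), test_set)
    print(f"{n:5d} training points -> accuracy {scores[n]:.2f}")
```

The algorithm never changes between runs; only the amount of training data does.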



The diagram is based on a slide from Alex Pinto's talk “Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring”.

Learn by example


Select and review one or two of the examples below.


Here are more tutorials and reviews:


Other sources where you can find IPython notebooks:



Machine Learning Courses


It is useful to start working on a small independent project, so that you have the opportunity to apply this knowledge in practice. You can use one of these datasets.

The book “The Elements of Statistical Learning” is also often recommended, but it usually serves as a reference. The book is free, so download it or bookmark it.

There are also these online courses:


Feedback on courses and various discussions:


Learning Pandas


To work with data in Python, you need to get acquainted with the Pandas package. Here is a list of materials that will help:


You should also pay attention to these resources:
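To give a feel for what Pandas is used for before you work through these materials, here is a minimal sketch. The table contents and column names are hypothetical, invented purely for illustration; the code assumes Pandas is installed.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog", "cat"],
        "weight_kg": [4.0, 20.0, 5.0, 30.0, 3.0],
    }
)

# Boolean filtering: keep only the rows that satisfy a condition.
heavy = df[df["weight_kg"] > 4.5]

# Group rows by a column and compute one aggregate value per group.
mean_weight = df.groupby("species")["weight_kg"].mean()

print(heavy)
print(mean_weight)
```

Filtering and grouping like this cover a large share of everyday data preparation before a dataset is handed to a machine learning library.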


More materials and articles



Questions, Answers, Chats


At the moment, the best place to find answers to your questions is the machine learning section on stackexchange.com. There is also a subreddit: /r/machinelearning. Join the scikit-learn channel on Gitter! You should also look at the discussions on Quora and at the large list of data science materials from Data Science Weekly.

Other things you need to know



You need practice. A Hacker News user with the nickname olympus noted that the way to get it is to take part in contests and competitions. Kaggle and ChaLearn are platforms for researchers where you can try your hand at various contests. Here you will find code examples for Kaggle contests. Another option is HackerRank.

Listen to and read what the winners of Kaggle contests say about their solutions. For example, read the blog “No Free Hunch”.

Competitions or contests are just one way to practice. You can start doing research:

  1. Start with a question. “The most important thing about data science is the question,” says Dr. Jeff T. Leek. Start with a question, then find real data and analyze it.
  2. Publish your results and ask for expert review.
  3. Fix the problems that are found. Share your findings.

Learn more about the scientific method here and here.

Here are a couple more machine learning tutorials:

Source: https://habr.com/ru/post/276479/

