Machine Learning: Questions and Answers

As you may have noticed, we often return to the topic of machine learning. We have talked about deep learning, written about working with data, and adapted several collections of sources on the topic.

Today we decided to look at the most interesting machine learning questions and answers on Quora.
Which programming language is best for machine learning?

Yoshua Bengio (head of the Montreal Institute for Learning Algorithms) says that his group has been programming in Python, along with other languages, for many years. However, he would like to have something like Python with a more powerful compiler, one that can produce efficient, distributed (across clusters) code that is easy to port.

This is why they began developing the Theano library. It is not a full-fledged language, but rather a set of functions for building expressions plus a compiler.
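To make the idea of "functions for building expressions plus a compiler" more concrete, here is a toy sketch in plain Python. It does not use Theano and does not reproduce its API: the names `Expr`, `Variable`, `BinaryOp` and `compile_function` are hypothetical and exist only for this illustration. An expression is first built symbolically and only then turned into a callable.

```python
class Expr:
    """A node in a symbolic expression; arithmetic builds bigger expressions."""

    def __add__(self, other):
        return BinaryOp("+", self, other)

    def __mul__(self, other):
        return BinaryOp("*", self, other)


class Variable(Expr):
    """A named placeholder with no value attached (hypothetical helper)."""

    def __init__(self, name):
        self.name = name

    def emit(self):
        return self.name


class BinaryOp(Expr):
    """An operation node that remembers its operator and both operands."""

    def __init__(self, symbol, left, right):
        self.symbol, self.left, self.right = symbol, left, right

    def emit(self):
        return f"({self.left.emit()} {self.symbol} {self.right.emit()})"


def compile_function(inputs, output):
    """'Compile' the expression by generating Python source and evaluating it."""
    args = ", ".join(v.name for v in inputs)
    return eval(f"lambda {args}: {output.emit()}")


x, y = Variable("x"), Variable("y")
expr = x * x + y                      # nothing is computed yet, only a graph is built
f = compile_function([x, y], expr)    # the "compiler" step
result = f(3, 4)                      # evaluates 3 * 3 + 4
```

A real library of this kind would optimize the expression graph and generate native or GPU code at the compile step; this sketch only shows the separation between describing a computation and running it.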

How many algorithms does the Netflix recommender use? Some say more than 800. Is this true?

Xavier Amatriain (Netflix CTO from 2011 to 2014) says it all depends on what is meant by a recommendation system. If we are talking about preferences based on ratings, then they use two algorithms.

If the question is about the Netflix recommendation ecosystem as a whole, then many more algorithms are used, of course, but nowhere near 800. How the movie recommendation algorithm works is described here.

Is a doctoral degree necessary to get a good job in machine learning? Is it true that at companies like Google a doctoral degree is a basic requirement [for candidates]?

Bin Zhao (a professor of computer science at the University of California) knows many students who landed positions at Google, Microsoft, Twitter, LinkedIn and Zynga after graduation. Most of them got these positions not because of a degree, but because they had done research on social network analysis together with Zhao, or because they were noticed by a perceptive HR department.

A doctoral degree does have its advantages, of course. It is an opportunity to spend several more years studying current problems and the constantly emerging techniques for tackling them. So a doctoral degree will not hurt when applying for a position (if the candidate really wants to work exclusively on machine learning).

What do you think of the recently released Yahoo machine learning dataset?

James Baker (who worked on machine learning before it was called that) hopes this will encourage other companies to release similar datasets. He understands perfectly well how large this dataset is, so he is not going to study it on his own: he is interested in assistants or collaborators.

The difficulty of working alone with such datasets, James notes, is that a researcher may not have enough computing power to process them.

James himself has a theoretical deep learning model that he would like to apply to the Yahoo dataset, but his hardware cannot handle it, and he also lacks people to help run the model.

So he is looking for interested people, and he strongly recommends that researchers in a similar position wait for teams of enthusiasts to form: that way, the chance of getting hands-on practice with the Yahoo data increases considerably.

Why are there so few startups in machine learning and in natural language processing?

Joseph Turian (data mining and natural language processing consultant) says it is a matter of heightened risk. Most technology startups face relatively high market risk, which is balanced by relatively low technology risk.

In machine learning and natural language processing, both market and technology risks are high, which makes it hard for founders of such startups to attract outside funding. It does not help that these founders do not always have an adequate understanding of business and the market as a whole: machine learning specialists spend most of their time in calm places that are little affected by the “big world”, such as universities and large corporations.

James Baker adds to Joseph's answer. He stresses that [despite the general pessimism] more startups are working in these areas than we tend to think. He notes that startups using machine learning or natural language processing need large amounts of data.

Here, giants such as Google, Microsoft and others become their competitors, so small companies, trying to avoid that competition, simply do not advertise this part of their work.

What great ideas are most popular in machine learning?

Charles Martin believes that one of them is the Hopfield neural network, together with its connection to the Ising model and its use in modern deep learning. Such simple models find applications not only in statistical physics but also in the development of modern deep learning algorithms.
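For readers who have not met the Hopfield network, here is a minimal sketch in plain Python, assuming the textbook formulation: Hebbian weights, spins of +1 or -1, and an Ising-style energy that the update rule never increases. The function names and the six-element pattern are made up for the illustration and do not come from the original answer.

```python
def train_hopfield(patterns):
    """Hebbian rule: W[i][j] is the average of p[i] * p[j], with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / len(patterns)
    return weights


def energy(weights, state):
    """Ising-style energy: E = -1/2 * sum over i, j of W[i][j] * s[i] * s[j]."""
    n = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n)
    )


def recall(weights, state, sweeps=5):
    """Update one spin at a time: each spin aligns with its local field."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
    return state


stored = [1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, -1, 1, -1, 1, 1]          # the last spin is flipped
recovered = recall(weights, noisy)    # settles back into the stored pattern
```

The stored pattern sits at an energy minimum, so the corrupted input "rolls downhill" to it. This is the same picture statistical physics uses for spin systems.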

He also notes the importance of the restricted Boltzmann machine in machine learning, even though almost 20 years passed between the appearance of this architecture and its active use in deep learning models.

Abinav Maurya adds the kernel trick (kernel method) for the support vector machine to this list (a list of the most commonly used kernel functions can be found here). Other researchers mention maximum likelihood estimation (for its clarity and simplicity) and Leslie Gabriel Valiant's theory of probably approximately correct (PAC) learning, because it is widely used in modern machine learning algorithms.
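The kernel trick is easiest to see on a small example. The sketch below, in plain Python, uses the degree-2 polynomial kernel on 2D points (our choice for the illustration, not something taken from the answer) and checks that it equals an ordinary dot product in a 3D feature space that is never built when the kernel is used. The function names are hypothetical.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit feature map phi with phi(x) . phi(y) = (x . y)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                      # computed in the original 2D space
explicit = dot(feature_map(x), feature_map(y))    # computed in the 3D feature space
same = math.isclose(implicit, explicit)
```

A support vector machine only needs dot products between points, so swapping the dot product for a kernel lets it separate data in a richer space at the cost of a 2D computation.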

Which algorithms should everyone who works with data know?

William Chen (data scientist at Quora) has three favorite algorithms:

Logistic Regression / Linear Regression - for binary classification and regression
Random Forests - for classification
TF-IDF - for textual analysis

In his opinion, regression models are extremely effective, and a knowledge of statistics helps unlock their hidden potential. He likes Random Forests for their strong predictive power, and TF-IDF is a convenient way to convert text into numerical vectors. Other researchers also mention the perceptron, the k-means method and recurrent neural networks.
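As an illustration of how TF-IDF turns text into numerical vectors, here is a minimal sketch in plain Python. It assumes the basic unsmoothed variant (term frequency times log of the number of documents divided by document frequency); libraries often use smoothed or normalized versions, so their numbers will differ. The function name `tf_idf` and the three toy documents are made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return a sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(doc_freq)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[term] / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocabulary, vectors = tf_idf(docs)
# "the" occurs in every document, so its weight is zero everywhere;
# "dog" occurs in one document only, so it gets the largest weight there.
```

The resulting vectors can be fed directly into a classifier such as logistic regression or a random forest.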

What is the future for data science?

Brian Lange (data scientist at Datascope) believes that new data sources will appear: data generated by sensors in manufacturing, in transport, and even in offices will become a source of new information for researchers.

There will be new tools that greatly simplify working with data. This is primarily due to the emergence of open libraries and the active exchange of information between researchers. Brian emphasizes that algorithms you had to write by hand 10 years ago are now readily available and easy to incorporate into your work.

The data scientist profession will branch into a number of specializations. According to Brian, as the amount of information and the number of tasks a data scientist handles grow, more and more employees from different departments will start working with data technology in one way or another, and the work of data scientists will not be limited to a single department.

Dima Korolyov (a Big Data specialist), on the contrary, believes that the future belongs to the full-stack data engineer (similar to full-stack developers). He says that today three people are usually needed to, for example, process numbers in Excel, apply different models in Python or R, and deliver results in real or near-real time. In the future, a single person will handle many of these processes from beginning to end.

Are there simple projects for the application of machine learning in financial markets?

Vladimir Novakovsky (head of machine learning at Quora) believes that any project that predicts trading outcomes well will definitely not be simple. He suggests thinking about two areas in which machine learning can be applied successfully to trading.

The first area is forecasting indicators that indirectly affect trading. Such an indicator could be volatility (machine learning can be used to improve on the GARCH model of volatility), the unemployment rate, or the inflation rate.

The second area is analyzing the behavior of market prices.

According to Vladimir, to create a good project that helps you understand trading, it is enough to apply machine learning to price analysis without “overloading” the model with information on transaction costs. Of course, you cannot trade on the exchange with such a model, but it can be an excellent way to ease into the profession.
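For context, the baseline that a machine learning model would try to improve on can be sketched in a few lines. Below is the standard GARCH(1,1) variance recursion in plain Python. The parameter values and the two sample returns are invented for the illustration and are not fitted to any market data; in practice the parameters are estimated, usually by maximum likelihood.

```python
def garch_variance(returns, omega, alpha, beta):
    """GARCH(1,1): var[t] = omega + alpha * r[t-1]^2 + beta * var[t-1].

    Starts from the unconditional variance omega / (1 - alpha - beta) and
    returns one value per observed return plus a one-step-ahead forecast.
    """
    variance = omega / (1.0 - alpha - beta)
    path = [variance]
    for r in returns:
        variance = omega + alpha * r ** 2 + beta * variance
        path.append(variance)
    return path


# Hypothetical parameters, chosen only so that alpha + beta < 1.
omega, alpha, beta = 0.1, 0.1, 0.8
returns = [0.0, 3.0]                  # a calm period followed by a large shock
path = garch_variance(returns, omega, alpha, beta)
# The variance decays after the calm period and jumps after the shock.
```

A learning-based project could, for example, compare forecasts from such a recursion with forecasts from a model trained on additional features.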

What is the difference between “big data” and “machine learning”?

Vladimir Novakovsky explains that “big data” is not tied to any specific computation. For example, building a system that aggregates data on billions of credit card transactions and running SQL queries against the resulting dataset to find out how many transactions exceeded $10 is a big data task, but it has nothing to do with machine learning.

Vladimir notes that a large amount of data is not a required component of machine learning: algorithms can be run on relatively small datasets (they do, however, usually perform better on large ones, which is why the two concepts so often overlap).
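The query from this example can be sketched on a toy scale with Python's built-in `sqlite3` module. The table name, column name and the five sample amounts are assumptions made for the illustration; a real system would hold billions of rows in a distributed store, but the question asked of the data is the same, and no model is trained anywhere.

```python
import sqlite3

# An in-memory database stands in for the real transaction store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount_usd REAL)"
)
conn.executemany(
    "INSERT INTO transactions (amount_usd) VALUES (?)",
    [(4.5,), (12.0,), (10.0,), (250.75,), (9.99,)],
)

# How many transactions were for more than $10?
(count_over_10,) = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE amount_usd > 10"
).fetchone()
conn.close()
```

With these sample rows, `count_over_10` is 2: the transaction for exactly $10 does not satisfy the strict comparison.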

P.S. On our blog, we write about developing communication systems and about first steps toward advanced programming. We will try to keep you supplied with regular posts, friends.

Source: https://habr.com/ru/post/278069/
