How I wrote the book 'Python Machine Learning'

Hello, dear Habr readers!

We seriously intend to offer you a substantial book on machine learning or deep learning in the foreseeable future. Among the books that interested us most, Sebastian Raschka's "Python Machine Learning" deserves special mention.

We suggest reading what the author himself has to say about the book. We took the liberty of cutting the article almost in half, since its entire second part is devoted to the finer points of writing and book design, while the subject matter and its relevance are covered at the very beginning. We hope that you enjoy the text, and that we enjoy the results of the survey.

A lot of time has passed, and now I am pleased to announce that the book “Python Machine Learning” is finally on the shelves! Of course, I could just send an email to everyone interested in the fate of this book. Another option would be to tweet 140 characters (or rather, 140 minus the hyperlink) and be done with it. But work on “Python Machine Learning” took many months, so I would gladly sit down in my favorite coffee shop and briefly describe how it came about.

ISBN-10: 1783555130
ISBN-13: 978-1783555130
Paperback: 454 pages, ebook
Packt Publishing Ltd. (September 24th, 2015)

For a long time people have been asking me about this book: what it is about, and how I found the time (1) to write it, (2) to read all the interesting articles I tweet about, (3) to research the necessary material, and (4) to relax and enjoy life along the way. As for the time, I can answer like this: I think all technical books are written in one's free time. I decided that for a few months I could step away from my usual hobbies, programming and blogging, and that this would free up time for the book.

Write what you know. That should leave you with a lot of free time.
- Howard Nemerov

“Python Machine Learning”? What is it all about?

It all started with good intentions: I wanted to put together a collection of useful tips for people starting out in machine learning. Granted, “Python Machine Learning” is no hefty tome, but the problem was never a lack of material. Quite the contrary: when you are passionate about a topic, you can write and write. Keeping the book to a reasonable length was the hard part!

So, “what is so interesting about machine learning and Python that you decided to write a book about it and devote almost all your free time to it?” If you read my posts on social networks and still have this question, I will simply quote someone else:

Data now flows to us in a continuous stream. According to one recent estimate, 2.5 quintillion (10¹⁸) bytes of data are generated every day. The volumes are staggering: more than 90% of the information we now store was generated over the past decade. Unfortunately, most of this information is of no direct use to a person. Either the data is beyond the reach of standard analytical methods, or it is simply too vast for us to comprehend.
Thanks to machine learning, computers can process such data, learn from it, and extract actionable information from behind the otherwise impenetrable walls of “big data.” From the supercomputers that power Google search to the smartphones that fit in your pocket, a wide range of devices rely on the principles of machine learning, which drives much of the world around us, often without our even knowing it.
What is “machine learning,” and how does it work? How can machine learning help me peer into the unknown, boost my business, or just find out what the online community thinks of my favorite movie? You will learn all this from the book written by my good friend and colleague Sebastian Raschka.
- Dr. Randal Olson (from the Preface)

Granted, "machine learning" is a pretty hot topic today, but "why Python"? In short: because it is intuitive, quite productive, and applicable at many levels. The word "productive" would pull me off topic right away, so, if you will forgive me, I will refer you to my earlier post, "Python, Machine Learning, and Language Wars. A Highly Subjective Point of View," which addresses this question.

Technology is nothing. What's important is that you have a faith in people, that they're basically good and smart, and if you give them tools, they'll do wonderful things with them.
- Steve Jobs

What will we get with this book?

“Does this book stand out from the other works on Python and machine learning?” After all, there are already quite a few of them. I think it does! When the publisher first approached me with a proposal to write such a book, I politely declined, since several books on the subject already existed. Having declined, I decided to read some of them. I am not saying they are better or worse than mine; I simply had a different picture of what a book on machine learning with applied examples in Python should look like.

[...] As for machine learning programs as such, they resemble the scientific method at work, except that the work is done by a computer rather than a scientist. The computer is, of course, far more powerful than any scientist, so it works much faster and handles much larger amounts of data.
- Pedro Domingos (excerpt from an interview with Domingos about his new book: "A Master Algorithm in Machine Learning Could Change Everything")

There are excellent books on the theory of machine learning. In particular, I really like Bishop's "Pattern Recognition and Machine Learning" and "Pattern Classification" by Duda, Hart, and Stork. They are great, really great, and I see no need to write another book in their vein. However, even though these books are positioned as “for beginners,” a newcomer will have a hard time with them. I recommend them to anyone who is serious about machine learning, but to me they are more "further reading" than a first book on the subject. In general, I believe that alongside studying the theory you need to tinker with machine learning algorithms and implement them yourself; that is the most productive way to spend your time when learning this discipline.

There are also very practical books that read more like documentation (insert scikit-learn, Vowpal Wabbit, caret, or any other machine learning library/API here). I think these are good books too, but I would leave the detailed discussion of libraries to the (online) documentation, which is convenient as a reference and, moreover, easy to keep up to date.

There are three rules for writing a novel. Unfortunately, no one knows what they are.
- W. Somerset Maugham

While working on the book, I set myself three fundamental goals: (1) to explain the general concepts, (2) to back them up with the necessary mathematics, and (3) to give examples and show how to apply them.

The big-picture overview helps piece the fragments of the puzzle together, and the mathematics helps you understand what is going on under the hood and avoid the most common pitfalls. Learning is certainly easier when the material comes with examples. Finally, it is really important to “get your hands dirty,” that is, to write code that puts the material into practice. Yes, we want to understand the concepts, but ultimately we want to use them to solve real-world problems.

Throughout the book we will work with a variety of libraries, in particular scikit-learn and Theano, efficient and well-tested tools that will serve us well when building real-world applications. But we will also implement some algorithms from scratch. This way we come to understand the algorithms, learn how the libraries work, and practice building our own algorithms; by the way, some of them are not yet part of scikit-learn.
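
As an illustration of what implementing an algorithm from scratch can look like, here is a minimal sketch of a perceptron classifier in plain Python. It is not code taken from the book: the class name, the parameters `eta` and `n_iter`, and the toy data are all assumptions made for this example, and a realistic version would probably use NumPy arrays instead of lists.

```python
class Perceptron:
    """Illustrative perceptron for binary labels in {-1, 1} (hypothetical names)."""

    def __init__(self, eta=0.1, n_iter=10):
        self.eta = eta        # learning rate
        self.n_iter = n_iter  # maximum number of passes over the training set
        self.w = []           # one weight per feature, set in fit()
        self.b = 0.0          # bias term

    def net_input(self, x):
        # Weighted sum of the features plus the bias.
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        # Threshold the net input at zero.
        return 1 if self.net_input(x) >= 0.0 else -1

    def fit(self, X, y):
        self.w = [0.0] * len(X[0])
        self.b = 0.0
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                # The update is zero when the sample is already classified correctly.
                update = self.eta * (target - self.predict(xi))
                if update != 0.0:
                    self.w = [wj + update * xj for wj, xj in zip(self.w, xi)]
                    self.b += update
                    errors += 1
            if errors == 0:
                break  # every training sample is classified correctly
        return self


# Toy, linearly separable data (made up for this sketch).
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
     [1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]

model = Perceptron(eta=0.1, n_iter=50).fit(X, y)
predictions = [model.predict(x) for x in X]
print(predictions)  # [-1, -1, -1, 1, 1, 1]
```

Writing the update rule by hand like this makes it clear what a library estimator's `fit` and `predict` methods have to do internally, which is the point of the from-scratch exercises.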

What you will find and what you will not find in this book

This book is not about “data science.” It says nothing about formulating hypotheses, collecting data, or drawing conclusions from the analysis of unusual or exotic data sets; the focus is on machine learning. However, besides discussing various machine learning algorithms, I believe it is very important to cover the other parts of a typical machine learning workflow, the "pipeline," starting with the preprocessing of data sets. We will discuss topics such as handling missing values, converting categorical variables into formats suitable for machine learning, selecting informative features, and compressing data by projecting it onto lower-dimensional subspaces. The book has an entire chapter on model evaluation, which covers holdout validation, k-fold cross-validation, nested cross-validation, hyperparameter tuning, and various performance metrics. To round things off, I added a chapter on embedding machine learning models into a web application that can be shared with the world.
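
To make the cross-validation idea above concrete, here is a minimal sketch of the k-fold scheme: the data is split into k parts, each part serves once as the test set while the rest is used for training, and the k scores are averaged. The sketch is written for this post and is not taken from the book; the function names, the 1-nearest-neighbor classifier, and the toy data are assumptions chosen to keep the example dependency-free. In practice, scikit-learn provides ready-made utilities such as `KFold` and `cross_val_score`.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def predict_1nn(train_X, train_y, x):
    """Return the label of the closest training point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]


def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the mean accuracy and the per-fold accuracies of 1-NN under k-fold CV."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(predict_1nn(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores), scores


# Toy data (made up for this sketch): two well-separated clusters of five points.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4],
     [5.0, 5.1], [5.2, 5.0], [5.1, 5.3], [5.3, 5.2], [5.2, 5.4]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

mean_accuracy, fold_scores = cross_val_accuracy(X, y, k=5)
print(mean_accuracy, fold_scores)
```

Because every sample lands in exactly one test fold, each observation is used for both training and evaluation, but never in the same round. Nested cross-validation wraps a second loop of this kind around the hyperparameter search.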

This is not yet another "this is how scikit-learn works" book. I want to explain how machine learning works and tell you everything you need to know about best practices and pitfalls. Then we will learn how to put these concepts into practice using NumPy, scikit-learn, Theano, and so on. Of course, the book contains a fair amount of “math and equations”; in my opinion there is no way around it if we do not want the book to turn into a “black box.” Still, I hope the thread will not be hard to follow and that the book will also suit readers without a strong mathematical background. Many sections include examples using scikit-learn, which is, in my opinion, the most elegant and practical machine learning library.

Here is a brief table of contents:

01 - Machine Learning - Giving Computers the Ability to Learn from Data
02 - Training Machine Learning Algorithms for Classification
03 - A Tour of Machine Learning Classifiers Using Scikit-Learn
04 - Building Good Training Sets - Data Preprocessing
05 - Compressing Data via Dimensionality Reduction
06 - Learning Best Practices for Model Evaluation and Hyperparameter Tuning
07 - Combining Different Models for Ensemble Learning
08 - Applying Machine Learning to Sentiment Analysis
09 - Embedding a Machine Learning Model into a Web Application
10 - Predicting Continuous Target Variables with Regression Analysis
11 - Working with Unlabeled Data - Clustering Analysis
12 - Training Artificial Neural Networks for Image Recognition
13 - Parallelizing Neural Network Training with Theano

Quick links

GitHub repository with general information and sample code
Bibliography and links to additional resources
Links to the electronic and paper versions at Amazon.com, Amazon.co.uk, Packt, and Apple iBooks
Some very interesting feedback, thank you!

Source: https://habr.com/ru/post/282167/
