
Data science is a collection of concepts and methods for making sense of huge amounts of data and presenting them clearly.
Each chapter of this book is devoted to one of the most interesting aspects of data analysis and processing. You will begin with the theoretical foundations, then move on to machine learning algorithms, working with very large data sets, NoSQL, streaming data, in-depth text analysis, and information visualization. Numerous practical examples use Python scripts.
Data processing and analysis is one of the hottest areas of IT, with constant demand for developers who can take on projects of any scale, from social networks to learning systems. We hope the book will be the starting point for your journey into the fascinating world of data science.
Everyone can analyze data. The brain's ability to see relationships, draw conclusions from facts, and learn from experience is what makes us human. More than any other species on the planet, humans depend on the brain for survival; humanity has bet everything on this trait to secure its place in nature. So far this strategy is working, and we are unlikely to want to change it in the near future.
However, when it comes to plain number crunching, our brains are limited. They cannot keep up with the amount of data we are able to take in at one time, or with our curiosity. For this reason, we entrust part of the work to machines: identifying patterns, building relationships, and answering numerous questions.
The desire for knowledge is in our genes. Relying on computers to do part of the work is not, but we cannot do without them.
Book structure
Chapters 1 and 2 provide the general theoretical foundations necessary for understanding other chapters of the book:
- Chapter 1 introduces the reader to data science and big data. It ends with a practical example of Hadoop.
- Chapter 2 is devoted to the data science process. It describes the steps that are present in almost every data science project.
Chapters 3–5 describe the application of machine learning principles to gradually increasing data sets:
- Chapter 3 deals with relatively small data that easily fits in the memory of an average computer.
- Chapter 4 makes the task harder: it examines “big data” that can be stored on your computer but does not fit in memory, which makes it difficult to process without a computing cluster.
- In Chapter 5, we finally get to real big data, which cannot be handled without many computers.
Chapters 6–9 discuss some interesting data science questions that are more or less independent of each other:
- Chapter 6 discusses the architecture of NoSQL and its difference from relational databases.
- In Chapter 7, data science is applied to streaming data. Here the main problem is not the size of the data but the speed at which it is generated and how quickly old data loses its relevance.
- Chapter 8 is devoted to in-depth text analysis. Not all data exists in numerical form. In-depth text analysis and text analytics are becoming increasingly important for data in text form: email, blogs, website content, and so on.
- Chapter 9 focuses on the last part of the data science process (data visualization and application prototyping), for which we will look at a number of useful HTML5 tools.
Appendices A–D cover the installation and configuration procedures for Elasticsearch, Neo4j, and MySQL, which are referred to in the chapters of the book, as well as Anaconda, a Python software package that is extremely useful in data science.
Who is this book written for?
This book introduces the reader to the field of data science. Experienced data scientists will find that some topics are covered only superficially. Other readers should be aware that getting the most out of the book requires some prerequisites: to work through the practical examples, it is desirable to have at least minimal knowledge of SQL, Python, HTML5, and statistics or machine learning.
About the authors

Davy Silenus is an experienced entrepreneur, author and professor. Together with Arno and Mo, he is co-owner of Optimately and Maiton, two data science companies based in Belgium and the United Kingdom, respectively, and one of the co-owners of another data science company in Somaliland. All these companies specialize in strategic processing of “big data”; many large companies turn to them for advice from time to time. Davy is a freelance lecturer at the IESEG School of Management in Lille, France, where he teaches and participates in research on the theory of “big data”.

Mohamed Ali is an entrepreneur and consultant in data science. Together with Davy and Arno, he co-owns Optimately and Maiton, two data science companies based in Belgium and the United Kingdom, respectively. His interests lie in two areas: data science and environmentally friendly projects. The latter led to the creation of a third company, based in Somaliland.

Arnaud Meisman is a dedicated entrepreneur and data science specialist. Together with Davy and Mo, he co-owns Optimately and Maiton, two data science companies based in Belgium and the United Kingdom, respectively, and is one of the co-owners of another data science company in Somaliland. All these companies specialize in strategic processing of “big data”; many large companies turn to them for advice from time to time. Arno is a data science specialist with a wide range of interests, from retail to game analytics. He believes that the information obtained from data processing, combined with some imagination, will help us improve this world.
More information about the book can be found on the publisher's website.