
Data Mining literature review

Good day!

The publication of several articles on Data Mining has shown that the community is very interested in this topic. Many readers asked questions like "what should I read?" and "where do I start?". Here is a selection of books and resources for a confident start in this area.


Books


A. A. Barsegyan, M. S. Kupriyanov, V. V. Stepanenko, I. I. Kholod. Methods and Models of Data Analysis: OLAP and Data Mining (+ CD-ROM)

The book presents the most relevant areas in the development of corporate systems: data warehousing, online analytical processing (OLAP) and data mining (Data Mining). All three areas are covered in enough depth to understand them and apply them in practice. The descriptions of data analysis methods and algorithms, illustrated with worked examples, make it possible to use the book not only as a textbook but also as a practical guide to software development.
Book on Ozon

My take: for a long time this book was my main source of information on Data Mining, so I strongly recommend it.

A. A. Barsegyan, M. S. Kupriyanov, V. V. Stepanenko, I. I. Kholod. Data Analysis Technologies: Data Mining, Visual Mining, Text Mining, OLAP (+ CD-ROM)

This book is the second, revised and expanded edition of the textbook “Methods and Models of Data Analysis: OLAP and Data Mining”.
It outlines the main directions in the development of corporate systems: data warehousing and distributed, online (OLAP), intelligent (Data Mining), visual (Visual Mining) and text (Text Mining) data analysis. It describes methods and algorithms for the main analysis tasks, such as classification and clustering. The idea behind each method is complemented by a specific example of its application. The attached CD contains Data Mining standards, the Xelopes algorithm library, a Data Mining lab workshop and related software.
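Clustering, one of the main analysis tasks the book covers, is easy to illustrate with k-means. The sketch below is a minimal plain-Python version written for this review, not code from the book or its CD. The function name `kmeans`, the use of the first k points as initial centroids and the sample points are assumptions for illustration; it needs Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[Point], Optional[List[int]]]:
    """Minimal k-means: returns final centroids and the cluster index of each point."""
    # Deterministic initialization with the first k points (illustration only;
    # practical implementations usually use random or k-means++ initialization).
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

The result depends on the initial centroids, which is why real tools typically run k-means several times from different starting points.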

Book on Ozon

My take: a newer edition of the previous book by the same authors, with new sections such as Text Mining.

N. B. Paklin, V. I. Oreshkov. Business Intelligence: From Data to Knowledge (+ CD)

The book systematically covers the main technologies used to build and deploy corporate information and analytical systems, grouped under the term “business intelligence”: data warehouses and OLAP, data transformation, ETL and data cleaning methods, basic Data Mining algorithms, time series analysis, and approaches to building and comparing model ensembles. In the second part, the authors use demo examples to show how to solve tasks such as consolidation, analytical reporting, scoring, sales promotion, demand forecasting and other business intelligence tasks with the BaseGroup Labs Deductor analytical platform.

The book is intended for business analysts, both future and practicing, data analysis specialists, and students of information and analytical systems in first- and second-degree university programs and in continuing professional education.

The book comes with a CD containing the free Deductor Academic edition of the analytical platform, demo files for the second part of the book, and additional materials on Deductor.

Book page
Book on the Piter publisher's website

My take: I leafed through this book yesterday, and it makes a good impression in both the quantity and the quality of its material. If you use (or want to use) Deductor, this book is for you (published in 2009).

Toby Segaran. Programming Collective Intelligence

Want to know how search result ranking, product recommendations, social bookmarking and online matchmaking are implemented? This exciting book shows how to build Web 2.0 applications that extract useful information from the huge amount of data created by people on the Internet. Using the sophisticated algorithms described here, you can write intelligent programs that collect interesting data sets from other sites or from the users of your own applications and analyze them to identify patterns.

“Programming Collective Intelligence” is an introduction to the world of machine learning and statistics. It explains how to draw useful, marketing-relevant conclusions about user behavior and preferences from the information collected every day by your own and third-party applications. Each algorithm is described clearly and concisely and comes with code that you can use right away in your own website, blog, wiki or specialized application. The following topics are covered:

* Collaborative filtering methods that let online retailers recommend products or media.
* Clustering methods used to detect groups of similar items in a large data set.
* Optimization algorithms that search through millions of possible solutions to a problem and choose the best one.
* Bayesian filtering, used in spam filters to classify documents based on words and other features.
* Support vector machines, used for matchmaking on dating sites.
* Evolutionary techniques for solving various problems: the computer learns by improving its own code after each game it plays.

Each chapter includes exercises for mastering the algorithms it covers. Go beyond simple database-backed applications and put the wealth of data on the Internet to work for you.
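The first topic in the list, collaborative filtering, fits in a few lines of Python. This is a minimal user-based sketch written for this review, not the book's code: similarity is derived from the Euclidean distance between two users' ratings on the items they share (one of several possible measures), and the names `similarity` and `recommend` and the rating data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def similarity(ratings: Ratings, a: str, b: str) -> float:
    """Euclidean-distance-based similarity in (0, 1]; 0 if no items in common."""
    common = [item for item in ratings[a] if item in ratings[b]]
    if not common:
        return 0.0
    sum_sq = sum((ratings[a][i] - ratings[b][i]) ** 2 for i in common)
    return 1.0 / (1.0 + math.sqrt(sum_sq))


def recommend(ratings: Ratings, user: str) -> List[Tuple[str, float]]:
    """Predict scores for unrated items as a similarity-weighted average of others' ratings."""
    totals: Dict[str, float] = {}
    sim_sums: Dict[str, float] = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings, user, other)
        if sim <= 0:
            continue
        for item, score in their_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * score
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    predicted = {item: totals[item] / sim_sums[item] for item in totals}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical ratings: bob agrees with alice, carol does not.
ratings: Ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "C": 1, "D": 2},
}
print(recommend(ratings, "alice"))  # "C" ranks above "D"
```

Weighting by similarity means that users with tastes close to yours (here bob) dominate the prediction, while dissimilar users (here carol) still contribute a little.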

Book on books.ru

My take: this book was highly recommended in one of the comments on the previous articles. I have not read it myself yet, but it is on my reading list :)

Other books

  1. Natalia Elmanova, Alexey Fedorov. Introduction to Microsoft OLAP Technologies - the book is intended for a wide range of readers, including experienced Microsoft Office users, business analysts, developers, and heads of IT and automation departments who want to learn the basics of online analytical processing (OLAP) and data warehousing, as well as the capabilities of modern OLAP tools. Microsoft OLAP tools are used to illustrate the topics discussed in the book.
  2. A. Berger. Microsoft SQL Server 2005 Analysis Services. OLAP and Multidimensional Data Analysis - written by developers of Microsoft SQL Server 2005 Analysis Services, this book gives the reader a complete picture of how it works and how it is built. It covers the basics of multidimensional data analysis and gives a deep understanding of multidimensional data models and the internals of an OLAP server. It describes the basic concepts of the MDX multidimensional query language and its advanced features, as well as the server architecture, data processing methods and data access algorithms. Internal and external data exchange protocols, including XML/A, are presented, along with the resource management algorithms of Analysis Services, including memory management. It describes how to build efficient client applications for Analysis Services and the mechanisms for integrating multidimensional and relational databases. Attention is also paid to the security and administration of Microsoft SQL Server 2005 Analysis Services.
  3. J. MacLennan, Z. Tang, B. Crivat. Microsoft SQL Server 2008: Data Mining - written by developers of Microsoft SQL Server Data Mining, this book gives the reader a complete picture of how it works and shows how it is used to solve various problems in SQL Server 2008. It introduces data mining and the DMX language, and shows data analysis with MS Office 2007 as well as building solutions with Business Intelligence Development Studio and SQL Server Management Studio. The application of various analysis algorithms is described in detail, as is data mining on OLAP cubes. Architecture, administration and other topics are also covered. The material comes with practical examples, tips and reference information.
  4. Data Mining with Microsoft SQL Server 2008


Online resources


Interface.ru

Course on Data Mining using the MS SQL 2005 platform. Part 1
Course on Data Mining using the MS SQL 2005 platform. Part 2
Course on Data Mining using the MS SQL 2005 platform. Part 3
Course on Data Mining using the MS SQL 2005 platform. Part 4

Intuit.ru

The “Data Mining” course by I. A. Chubukova.

The course introduces students to Data Mining technology and discusses its methods, tools and applications in detail. Each method is accompanied by a specific example of its use.

The course discusses how Data Mining differs from classical statistical analysis and from OLAP systems, and covers the types of patterns Data Mining reveals (association, classification, sequence, clustering, forecasting). It describes where Data Mining is applied and introduces the concept of Web Mining. Data Mining methods are covered in detail: neural networks, decision trees, limited search methods, genetic algorithms, evolutionary programming, cluster models and combined methods. Each method is introduced by solving a practical problem with a Data Mining tool. The course describes the basic concepts of data warehousing and the place of Data Mining in their architecture, and introduces the concepts of OLTP, OLAP, ROLAP and MOLAP. It discusses the data analysis process based on Data Mining technology and examines its stages in detail. Finally, it reviews the analytical software market, describes products from the leading Data Mining vendors and discusses their capabilities.
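Association, the first pattern type in that list, is usually measured with support (how often an itemset occurs) and confidence (how often the consequent occurs when the antecedent does). The following is a minimal brute-force sketch for single-item rules, written for this review rather than taken from the course; the function names, thresholds and sample baskets are assumptions, and practical tools use algorithms such as Apriori to avoid enumerating every candidate.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Brute-force search for rules {x} -> {y} over all pairs of items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for x, y in ((a, b), (b, a)):
            conf = confidence(transactions, {x}, {y})
            if conf >= min_confidence:
                rules.append((x, y, pair_support, conf))
    return rules


# Hypothetical shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.9))
# [('butter', 'bread', 0.5, 1.0)]
```

Note that confidence is not symmetric: every basket with butter also has bread, but only two of the three baskets with bread have butter.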

Go to the course page

Kdkeys.net

A Data Mining blog. Interesting material turns up occasionally, but not often. Most topics follow the pattern “I am very young, I need your help. Please do my work for me, I will be thankful” :)

kdkeys.net

BaseGroup Labs

It contains a lot of useful information on Data Mining. BaseGroup Labs develops Deductor, an analytical platform that supports Data Warehouse, ETL, OLAP, Knowledge Discovery in Databases and Data Mining technologies.

basegroup.ru

Other resources


Video materials


What is Data Mining in SQL Server 2008 - TechDays.ru
Introduction to Business Intelligence and Overview of the Microsoft Business Intelligence Platform - TechDays.ru

My humble contribution :)


Data Mining Source Code on Codeplex is an open-source project that collects Data Mining algorithms and methods (some gathered from the web, some implemented by me, some ported from other languages).

My Data Mining blog

I suspect this article does not cover every useful resource, so feel free to add your own suggestions to the list.

Thank you all for your attention!

P.S. The book descriptions are the books' original annotations.

Upd. Thanks to the commenters for mentioning Weka: it is indeed the best open-source Data Mining library available today.

Data Mining Software in Java
Weka program description
The book "Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)"

Upd2. A collection of books on Data Warehouse, Data Mining and OLAP technologies (in English)
A library of books on Data Mining and on working and programming in Stata and SAS

Source: https://habr.com/ru/post/66561/

