
Treemap, deputies' incomes, and the Processing language

After reading the Habr post “The State Duma presented the incomes of the deputies in a new form”, I decided that the data on deputies' incomes deserved a visualization. My acquaintance with data visualization began with Processing, so that is the tool I used. Below is a fragment of one of the first pictures; after it, you can learn what treemapping is and how to build a treemap in Processing.



For me, one of the best books on data visualization is Ben Fry's “Visualizing Data”, a short book by one of the creators of Processing. It kills two birds with one stone: it explains the principles of information visualization well, and it shows how to apply them using the Processing language. Processing has already been covered on Habr, here and here. Version 2.0a4 is now available (still an alpha release). In the first chapter, “The Seven Stages of Visualizing Data”, Ben Fry describes seven stages: acquiring the data, parsing it, filtering out unneeded data, mining it (for example, finding the minimum and maximum), choosing a representation, refining that representation, and finally adding interactivity. He also points out that the process is not always linear and that you often have to go back to earlier stages.


The first step, of course, is getting the data itself. After looking at the pages of several deputies on the Duma's website, it became clear that only a number in the URL changes. By trial and error I found that the required range is [23494, 23964], from http://www.duma.gov.ru/structure/deputies/23494/ to http://www.duma.gov.ru/structure/deputies/23964/. From each page I needed the deputy's faction and income information. I downloaded the data with a C# program (since I already had one), but it would not be hard to do with a script of your own. Oddly, when I visited the Duma's website the next day, the information about deputies' incomes was gone. I don't know what that was; when I checked from work, and on the following days, everything was in place. But I digress. The first step was done: the data was obtained. It was time to visualize it, and I decided to start with a bar chart, since it is quite simple to make.
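The author fetched the pages with an existing C# program. As a rough illustration only (an assumption, not the author's code), here is a Java sketch that builds the list of candidate URLs from the id range above and can download one page with the standard `java.net.http.HttpClient` (Java 11+). Extracting the faction and income from each page depends on the page's HTML, which the article does not show, so that step is left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java stand-in for the author's C# downloader; names are illustrative.
class DeputyPages {
    static final String BASE = "http://www.duma.gov.ru/structure/deputies/";

    // One URL per numeric id in the inclusive range [firstId, lastId].
    static List<String> buildUrls(int firstId, int lastId) {
        List<String> urls = new ArrayList<>();
        for (int id = firstId; id <= lastId; id++) {
            urls.add(BASE + id + "/");
        }
        return urls;
    }

    // Downloads one page as a string. Not called from main, so main needs no network.
    static String fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        List<String> urls = buildUrls(23494, 23964);
        System.out.println(urls.size() + " candidate pages");
        System.out.println(urls.get(0));
        System.out.println(urls.get(urls.size() - 1));
    }
}
```

Because the range is inclusive on both ends, it yields 471 candidate URLs.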

Deputies' incomes for 2010

The picture clearly shows that most deputies earn roughly the same amount and that some deputies earn an order of magnitude more. The drawback of this chart is that it is not well suited to comparing about 450 values. In addition, the wide spread of incomes makes the small values indistinguishable. You could, of course, make the scale logarithmic, but then the visual drama of comparing almost two billion with two million is lost. Fine: if comparing values by bar length works poorly, let's compare areas instead! We draw circles instead of bars:


The circles are not impressive, so we replace them with squares:

Treemap Example
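Why comparing areas helps: when a value is encoded as the area of a square (or circle), the side (or radius) has to grow with the square root of the value. A minimal plain-Java sketch, assuming a hypothetical 400-pixel maximum size: the roughly 1000:1 ratio between two billion and two million becomes a side ratio of only about 31.6:1 (the square root of 1000), so small incomes stay visible while the areas remain proportional.

```java
// Sketch of area encoding; the 400 px maximum is an illustrative assumption.
class AreaEncoding {
    // Side of the square for `value` when `maxValue` gets a square of side `maxSide`.
    static double squareSide(double value, double maxValue, double maxSide) {
        return maxSide * Math.sqrt(value / maxValue);
    }

    // Radius of the circle under the same area-proportional encoding.
    static double circleRadius(double value, double maxValue, double maxRadius) {
        return maxRadius * Math.sqrt(value / maxValue);
    }

    public static void main(String[] args) {
        double big = 2_000_000_000.0; // about two billion, the largest income
        double small = 2_000_000.0;   // about two million, a typical income
        double maxLength = 400.0;     // hypothetical size of the largest bar or square, px
        // Bar chart: lengths keep the 1000:1 ratio, so a 400 px bar shrinks to 0.4 px.
        System.out.println("bar length: " + maxLength * small / big);
        // Square: areas keep the 1000:1 ratio, but sides differ only about 31.6 times.
        System.out.println("square side: " + squareSide(small, big, maxLength));
    }
}
```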

That looks better. For some reason, the xkcd picture immediately came to mind. And then another option occurred to me: a treemap.

The treemap is the invention of information visualization guru Ben Shneiderman. In 1990, Shneiderman wondered which files took up the most space on the hard disk (in the lab where he worked, 14 users shared an 80 MB disk). He began to think about a compact visualization of the tree structure of folders and files, and that is how he came up with the diagram later called a treemap. You can read more about the history of treemapping here; at the end of that article there are many examples of this type of visualization. The book “Visualizing Data” also devotes an entire chapter to treemaps. The Processing distribution includes the examples for this book, and two of them are dedicated to treemaps. Both examples use the Treemap library, a version of a Java library modified to work with Processing. The library offers several algorithms for dividing the space into rectangles: slice-and-dice, pivot-by-middle, pivot-by-size, pivot-by-split-size, and squarified layout. You can compare how the different algorithms work here (a Java plugin is required). In my opinion, the squarified layout produces the most pleasing picture: the algorithm tries to build rectangles whose shape is as close to a square as possible. The squarified layout algorithm is described in the article by Jarke van Wijk. To start working with the library, you need to know three of its types: SimpleMapItem, MapModel, and Treemap. SimpleMapItem represents a single cell, MapModel holds the cells in an array, and the Treemap class builds the visualization itself, taking a MapModel object and the coordinates of the chart boundaries in its constructor. A Treemap also holds a MapLayout object, which provides the implementation of the splitting algorithm.
By default, Treemap uses the pivot-by-split-size algorithm, but nothing prevents you from specifying a different one with setLayout(MapLayout algorithm). So, in general, to get a treemap visualization in Processing, we create the classes DeputatItem and DeputatMap, which extend SimpleMapItem and implement MapModel, respectively. SimpleMapItem has a size field that determines the size of the cell's rectangle in the treemap; we assign to it the amount of money the deputy earned.
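To make the idea behind the squarified layout concrete, here is a compact, independent sketch of the algorithm in plain Java. It is not the treemap library's SquarifiedLayout code, and the class and method names are made up. Values are sorted in descending order and scaled so they sum to the rectangle's area; items keep being added to the current row along the shorter side as long as that does not worsen the row's worst aspect ratio, otherwise the row is fixed in place and a new row starts in the remaining space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Independent sketch of a squarified treemap layout; not the Processing library's code.
class Squarify {
    record Rect(double x, double y, double w, double h) {}

    // Lays out positive values inside (x, y, w, h); output is in descending value order.
    static List<Rect> layout(double[] values, double x, double y, double w, double h) {
        double[] sorted = values.clone();
        Arrays.sort(sorted); // ascending; iterated backwards below
        double scale = w * h / Arrays.stream(sorted).sum(); // areas must sum to w * h
        double[] free = {x, y, w, h}; // the still-empty rectangle
        List<Rect> out = new ArrayList<>();
        List<Double> row = new ArrayList<>();
        for (int i = sorted.length - 1; i >= 0; i--) {
            double area = sorted[i] * scale;
            double side = Math.min(free[2], free[3]); // rows run along the shorter side
            List<Double> candidate = new ArrayList<>(row);
            candidate.add(area);
            if (row.isEmpty() || worst(candidate, side) <= worst(row, side)) {
                row = candidate; // adding the item does not worsen the row
            } else {
                placeRow(row, free, out); // fix the current row and shrink `free`
                row = new ArrayList<>(List.of(area));
            }
        }
        if (!row.isEmpty()) placeRow(row, free, out);
        return out;
    }

    // Worst aspect ratio of a row of areas laid along a side of length `side`.
    static double worst(List<Double> row, double side) {
        double sum = 0, max = 0, min = Double.MAX_VALUE;
        for (double a : row) { sum += a; max = Math.max(max, a); min = Math.min(min, a); }
        double s2 = sum * sum, w2 = side * side;
        return Math.max(w2 * max / s2, s2 / (w2 * min));
    }

    // Places the row against the shorter side of `free` and cuts that strip off `free`.
    static void placeRow(List<Double> row, double[] free, List<Rect> out) {
        double sum = 0;
        for (double a : row) sum += a;
        if (free[2] >= free[3]) { // wide: the row becomes a column on the left edge
            double colW = sum / free[3];
            double cy = free[1];
            for (double a : row) { out.add(new Rect(free[0], cy, colW, a / colW)); cy += a / colW; }
            free[0] += colW;
            free[2] -= colW;
        } else { // tall: the row becomes a strip along the top edge
            double rowH = sum / free[2];
            double cx = free[0];
            for (double a : row) { out.add(new Rect(cx, free[1], a / rowH, rowH)); cx += a / rowH; }
            free[1] += rowH;
            free[3] -= rowH;
        }
    }

    public static void main(String[] args) {
        for (Rect r : layout(new double[]{6, 6, 4, 3, 2, 2, 1}, 0, 0, 6, 4)) System.out.println(r);
    }
}
```

For the values 6, 6, 4, 3, 2, 2, 1 in a 6×4 rectangle, the first row is the two 6s stacked in a 3-wide column on the left, each cell 3×2; the other algorithms listed above differ in how they choose these rows.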

We read the data and store each deputy in a DeputatItem object (our subclass of SimpleMapItem). Each object with a deputy's information is then added to the DeputatMap object (our implementation of MapModel).

```java
DeputatMap dMap = new DeputatMap();
String[] lines = loadStrings("parliamentV2010.csv");
for (int i = 0; i < lines.length; i++) {
  // ... extract id, party, name, money from lines[i]
  DeputatItem d = new DeputatItem(id, party, name, money);
  dMap.addDeputat(d);
}
dMap.finishAdd();
```
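The parsing step is elided in the sketch above. One way it might look, assuming (the article does not show the file) that each line of parliamentV2010.csv holds id, party, name, money separated by commas, with no quoted fields. This plain-Java version uses a hypothetical `Deputat` record in place of DeputatItem, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser; the real column order and format of parliamentV2010.csv are assumptions.
class DeputatCsv {
    // Plain-Java stand-in for DeputatItem, holding only the parsed fields.
    record Deputat(int id, String party, String name, double money) {}

    // Parses one "id,party,name,money" line; rejects lines with the wrong field count.
    static Deputat parseLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 4) throw new IllegalArgumentException("expected 4 fields: " + line);
        return new Deputat(Integer.parseInt(f[0].trim()), f[1].trim(), f[2].trim(),
                Double.parseDouble(f[3].trim()));
    }

    static List<Deputat> parseAll(List<String> lines) {
        List<Deputat> out = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) continue; // skip empty lines, e.g. a trailing newline
            out.add(parseLine(line));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only, not real deputies' data.
        List<String> sample = List.of("1,Party A,Deputy One,1500000", "2,Party B,Deputy Two,250000.50");
        for (Deputat d : parseAll(sample)) System.out.println(d);
    }
}
```

Inside the Processing sketch itself, Processing's own split(), int() and float() functions could do the same job on lines[i].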


We create a Treemap object, passing it the DeputatMap and the bounds of the rectangle in which to build the diagram.

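```java
Treemap treemap; // sketch-level field, so that draw() can reach it

// inside setup(), after dMap has been filled:
treemap = new Treemap(dMap, 0, 0, width - 1, height - 1);
// use the squarified layout instead of the default algorithm
MapLayout algorithm = new SquarifiedLayout();
treemap.setLayout(algorithm);
treemap.updateLayout();
```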


The code above runs in Processing's setup() function, and in Processing's draw() function we call the draw() method of the treemap object.

```java
void draw() {
  treemap.draw();
}
```


If we color each deputy's cell according to the party he belongs to, and print the deputy's details in the cells that are large enough, we get the following picture:
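The two rules in the paragraph above (color by party, label only the cells that are large enough) can be sketched as small helper functions. Everything here is hypothetical: the party names, the ARGB colors, the gray fallback and the padding are illustrative, since the article does not show the original drawing code.

```java
import java.util.Map;

// Hypothetical styling helpers; palette, fallback color and padding are assumptions.
class CellStyle {
    static final int DEFAULT_GRAY = 0xFF999999; // ARGB color for factions missing from the palette

    // Looks up the ARGB color for a party, falling back to gray.
    static int colorFor(String party, Map<String, Integer> palette) {
        return palette.getOrDefault(party, DEFAULT_GRAY);
    }

    // A label is drawn only if its text box plus padding on every side fits in the cell.
    static boolean labelFits(double cellW, double cellH, double textW, double textH, double pad) {
        return textW + 2 * pad <= cellW && textH + 2 * pad <= cellH;
    }

    public static void main(String[] args) {
        Map<String, Integer> palette = Map.of("Party A", 0xFF1F77B4, "Party B", 0xFFD62728);
        System.out.printf("%08X%n", colorFor("Party A", palette));
        System.out.println(labelFits(120, 30, 90, 12, 2)); // wide cell: label drawn
        System.out.println(labelFits(40, 30, 90, 12, 2));  // narrow cell: label skipped
    }
}
```

In Processing, textW would typically come from textWidth() and textH from textAscent() + textDescent().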


I decided to group deputies belonging to the same party, which requires adding another level of hierarchy: the party. Schematically, the tree built in the previous step is Duma -> deputies; adding parties gives Duma -> parties -> deputies. For this scheme we need two more classes: PartyItem (extending SimpleMapItem) and PartyMap (implementing MapModel). After the same steps to create the treemap, we get this picture:
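The extra hierarchy level amounts to grouping deputies by party and giving each party cell the total income of its members, which is what lets the deputies' rectangles tile the party's rectangle. A plain-Java sketch of that grouping using the standard `Collectors.groupingBy`; it is not the author's PartyItem/PartyMap code, and the record, method names and sample data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch of the Duma -> parties -> deputies hierarchy; names and data are illustrative.
class PartyGrouping {
    record Deputat(String party, String name, double money) {}

    // Second level of the tree: deputies bucketed by party (TreeMap keeps a stable order).
    static Map<String, List<Deputat>> byParty(List<Deputat> all) {
        return all.stream().collect(Collectors.groupingBy(Deputat::party, TreeMap::new, Collectors.toList()));
    }

    // Size of each party cell: the sum of its deputies' incomes.
    static Map<String, Double> partyTotals(List<Deputat> all) {
        return all.stream().collect(Collectors.groupingBy(Deputat::party, TreeMap::new,
                Collectors.summingDouble(Deputat::money)));
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Deputat> sample = List.of(
                new Deputat("Party A", "Deputy One", 3_000_000),
                new Deputat("Party B", "Deputy Two", 1_000_000),
                new Deputat("Party A", "Deputy Three", 2_000_000));
        System.out.println(partyTotals(sample));
        byParty(sample).forEach((party, members) ->
                System.out.println(party + ": " + members.size() + " deputies"));
    }
}
```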


The pictures make the overall structure of the income distribution clear, but unfortunately you cannot find a specific deputy in them. For that, you can use the interactive version, download the application for Windows, Linux, or Mac OS, or download the source code and run it in Processing.

P.S. A treemap can be built with tools other than Processing. Nathan Yau, the author of the FlowingData blog, promotes this approach to information visualization: work with the data in the statistical environment R, then polish the resulting image in Illustrator. Nathan recently released a book that describes this approach in detail. R has a dedicated function for building treemaps; if you're interested, see the corresponding post on Nathan's blog.

There are also JavaScript implementations:
Google Chart Tools Treemap;
d3.js treemap;
JavaScript InfoVis Toolkit Treemap.

Source: https://habr.com/ru/post/137338/

