
Building a keyword graph

In the last post I shared the results of my experiments on building a keyword graph, but I did not touch on the technical side of graph construction at all. In the comments I was asked to shed some light on it. On reflection, I decided to put the technical details in a separate note, since they apply to building any graph with the graphviz toolkit.

I will not explain what graphs are or what kinds exist; anyone can read about them on Wikipedia. I will only show how to build a simple undirected graph with the graphviz toolkit, using a keyword graph as the example.

First, a word on how keyword support is implemented in the media repository. The database design for tracking keywords uses two tables: the keyword table itself (kw) and a link table between keywords and files (file2kw). With the tables organized this way, each file can have its own set of keywords.

When a new file is added to the media repository, keywords are one of the items in its description. Suppose I add a photo of an aneroid barometer. A reasonable set of keywords for it would be: aneroid, barometer, physics, pressure, instrument. This is the primary set of keywords for the word "aneroid". However, other files in the media repository may also use the word "aneroid" in their descriptions.

Therefore, building a map of the keywords related to "aneroid" comes down to two tasks:
1. Extract the list of all files that use the word "aneroid".
2. Extract all keywords used to describe those files, except the word "aneroid" itself.
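
The two steps above can be sketched as follows. This is only an illustration, not the site's actual code: it uses Python with an in-memory SQLite database instead of MySQL, and the column names (id, name, file_id, kw_id) and the sample files are assumptions; only the table names kw and file2kw come from the description above.

```python
import sqlite3


def related_keywords(conn, word):
    """Return the keywords that share at least one file with `word`, sorted."""
    # Step 1: all files described with the given keyword.
    file_ids = [row[0] for row in conn.execute(
        "SELECT f.file_id FROM file2kw f JOIN kw k ON k.id = f.kw_id "
        "WHERE k.name = ?", (word,))]
    if not file_ids:
        return []
    # Step 2: every other keyword attached to those files.
    marks = ",".join("?" * len(file_ids))
    rows = conn.execute(
        "SELECT DISTINCT k.name FROM file2kw f JOIN kw k ON k.id = f.kw_id "
        f"WHERE f.file_id IN ({marks}) AND k.name <> ? ORDER BY k.name",
        (*file_ids, word))
    return [row[0] for row in rows]


def demo_db():
    """Build a tiny in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE kw (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE file2kw (file_id INTEGER, kw_id INTEGER);
    """)
    files = {
        1: ["aneroid", "barometer", "physics", "pressure", "instrument"],
        2: ["aneroid", "altimeter"],
        3: ["physics", "optics"],
    }
    for file_id, words in files.items():
        for w in words:
            conn.execute("INSERT OR IGNORE INTO kw (name) VALUES (?)", (w,))
            kw_id = conn.execute(
                "SELECT id FROM kw WHERE name = ?", (w,)).fetchone()[0]
            conn.execute("INSERT INTO file2kw VALUES (?, ?)", (file_id, kw_id))
    return conn


if __name__ == "__main__":
    print(related_keywords(demo_db(), "aneroid"))
```

Keeping the two steps as separate queries mirrors the description; the keyword "optics" is not returned because file 3 does not use the word "aneroid".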

This approach turned out to be very convenient because I decided to build the graph with a specialized tool, the graphviz set of utilities, whose input dot file has the following format (for more details about graphviz, see the official website or the Russian graphviz documentation):

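digraph kw {
    // "keyword" is a placeholder for an actual keyword
    keyword -> keyword -> keyword -> keyword -> keyword;
    keyword -> keyword;
    keyword -> keyword;
}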

Now little remains to be done: select the list of all keywords in use and build the list of related keywords for each of them. I could not solve this with a single SQL query in MySQL, although in PostgreSQL it is quite possible. Since the site runs on MySQL, I had to write a small PHP script that performs all the necessary selections from the database and formats the results as a dot file for graphviz.
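
The PHP script itself is not shown here. As a rough sketch of its formatting step only, written in Python rather than PHP, the function below turns a mapping of keywords to their related keywords into dot text. The function name to_dot, the shape of the input mapping, and the choice to emit each pair of keywords only once are assumptions made for illustration.

```python
def to_dot(related, name="kw"):
    """Render {keyword: [related keywords]} as dot text, one edge per pair."""
    def quote(word):
        # Inside a double-quoted dot ID only an embedded quote needs escaping.
        return '"' + word.replace('"', '\\"') + '"'

    seen = set()
    lines = [f"digraph {name} {{"]
    for source in sorted(related):
        for target in sorted(related[source]):
            pair = frozenset((source, target))
            if source == target or pair in seen:
                continue  # skip self-loops and pairs that were already emitted
            seen.add(pair)
            lines.append(f"    {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "aneroid": ["barometer", "physics"],
        "barometer": ["aneroid"],
    }
    print(to_dot(sample), end="")
```

Quoting every keyword keeps the output valid even when a keyword contains spaces or non-Latin characters, which bare dot identifiers do not allow in general.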

Graphviz can lay out graphs in different ways. In this particular case an undirected graph works best, and it can be laid out with either a rank-based (layered) approach or an energy-based (force-directed) one. After some experimenting, I found that the most convenient option here is the energy-based approach implemented by the fdp utility.

To produce a PNG image of the graph from the source dot file with fdp, type the following in a UNIX or Linux console:

fdp -Tpng -o media-kw.png media-kw.dot
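
If the image needs to be regenerated from a script rather than by hand, the same command can be wrapped in a few lines. This is a minimal sketch in Python, assuming graphviz is installed and fdp is on the PATH; the helper names fdp_command and render are hypothetical.

```python
import shutil
import subprocess


def fdp_command(dot_path, png_path):
    """Build the argument list for the fdp call shown above."""
    return ["fdp", "-Tpng", "-o", png_path, dot_path]


def render(dot_path, png_path):
    """Run fdp, failing early with a clear message if it is not installed."""
    if shutil.which("fdp") is None:
        raise RuntimeError("fdp not found; is graphviz installed?")
    # check=True raises CalledProcessError if fdp exits with an error.
    subprocess.run(fdp_command(dot_path, png_path), check=True)


if __name__ == "__main__":
    render("media-kw.dot", "media-kw.png")
```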

The result looks like this (keyword graph as of September 1, 2008):

Source: https://habr.com/ru/post/38574/

