
Family tree in Git

Happy Programmer's Day, everyone! I wish you more bright "commits" and accepted "pull requests", fewer unplanned "merges", and may your life branches stay relevant as long as possible. As a conceptual gift, I propose an implementation of a family tree using the Git version control system. Well... sounds like a plan!


Kochurkins


For those who got it right away, here is the source code of the generator, GenealogyTreeInGit, and the family trees themselves: mine and the US presidents'.


In addition, I implemented a simple social graph. It shows not only degrees of kinship but also the state of relationships between descendants: events such as weddings, divorces, and births, as well as the contributions of the various parties to a relationship.


Git


Let me remind you that Git is one of the most popular version control systems. It is powerful: you can commit changes, create and merge branches (checkout and merge), compare different versions of files (diff), find out who last changed each line (blame), and do many other things.

Fortunately or unfortunately, Git resembles the victors who write history: it lets you rewrite history, namely the dates, messages, and authors of commits. Here, though, this works in our favor: it lets us add family members as if they were the authors of events that happened on specific dates.


I started simple: I wrote a few commands and voilà, a fragment of the tree was ready. Great. Now all that remained was to do the same for the whole army of relatives. I would just love to write 200 lines of commands for them, which are easy to get lost in, and a full 10K for the presidents!



Already added me to your list of idiots? Cross me off. Of course, I automated the process and wrote an application that converts genealogical data into a sequence of Git commands. There are several formats for such data; I chose GEDCOM.


GEDCOM


GEDCOM is a format for describing family trees. It is fairly old, but text-based and generally simple. The format specification is well documented online. It is supported by almost all genealogy programs, so there are many sample files: the US presidents, a royal dynasty, Shakespeare.
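To show what the format looks like, here is a tiny hand-written GEDCOM fragment (the people and IDs are hypothetical) and a rough awk sketch that lists individuals with their birth dates. This is not the author's C# parser, only an illustration of the LEVEL [@XREF@] TAG [VALUE] line structure; real files have many more tags, continuation lines, and encodings that this sketch ignores.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Hypothetical GEDCOM fragment: two parents (I1, I2), a child (I3) and
# the family record F1 that links them. Each line: LEVEL [@XREF@] TAG [VALUE].
cat > sample.ged <<'EOF'
0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME Ivan /Petrov/
1 SEX M
1 BIRT
2 DATE 12 MAR 1971
0 @I2@ INDI
1 NAME Maria /Petrova/
1 SEX F
0 @I3@ INDI
1 NAME Anna /Petrova/
1 BIRT
2 DATE 5 JUN 1998
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
EOF

# Print "ID<TAB>NAME<TAB>BIRTH DATE" for every INDI record.
# A record ends when the next level-0 line (or the end of the file) is reached.
list_individuals() {
  awk '
    $1 == 0 {
      if (id != "") print id "\t" name "\t" birth
      id = ""; name = ""; birth = ""; in_birth = 0
      if ($3 == "INDI") id = $2
      next
    }
    $1 == 1 {
      in_birth = ($2 == "BIRT")
      if ($2 == "NAME") { $1 = ""; $2 = ""; sub(/^ +/, ""); name = $0 }
      next
    }
    $1 == 2 && in_birth && $2 == "DATE" { $1 = ""; $2 = ""; sub(/^ +/, ""); birth = $0 }
    END { if (id != "") print id "\t" name "\t" birth }
  ' "$1"
}

list_individuals sample.ged
```

The FAM record is what a generator would turn into merges: HUSB and WIFE become the parent branches, CHIL the new branch.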

I implemented all this on .NET Core: it is convenient and cross-platform. There are several C# libraries for parsing and processing GEDCOM, for example GeneGenie.Gedcom and gedcomx-csharp. I decided to write my own, based on GedcomParser. Well, because they have a fatal flaw... Actually, no: I wanted to understand the format better myself and get rid of all dependencies, which makes it easy to port the project to other languages if desired.


Command generation


Now it is time to traverse the extracted individuals, stored in a convenient form, and generate Git commands for them. I decided to sort all events chronologically and then create branches, merge them, and commit, moving in ascending order of dates. Unfortunately, not all events have dates, so I had to tinker quite a bit to sort all the events correctly. With Programmer's Day (2^(2^3) = 256, the 256th day of the year) just around the corner, I realized that this approach is not entirely right: with a depth-first traversal, I would not have had to bother with dates at all. I will fix it later (but that's not certain).
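A date-free ordering can be sketched with the standard tsort utility, which performs a topological sort: given "parent child" pairs, it prints every person after all of their parents, without looking at any dates. Note that tsort is a swapped-in tool rather than the depth-first traversal the author has in mind, and the IDs below are hypothetical.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Hypothetical "PARENT CHILD" edges, as they could be extracted from
# FAM records (HUSB/WIFE -> CHIL).
cat > edges.txt <<'EOF'
I1 I3
I2 I3
I3 I5
I4 I5
I5 I6
EOF

# tsort emits a topological order: each person appears only after all of
# their parents, so branches can be created and merged in that order.
tsort edges.txt > order.txt
cat order.txt
```

Events within a single person's branch still need some order, but that is a much smaller problem than sorting the whole tree by date.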


Initialization


All that is required at this stage is to initialize the repository:


mkdir Family
cd Family
git init

Events


In this part of the script, all events are processed and committed using three commands: checkout, merge, and commit.
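A minimal sketch of the three commands working together for a single birth. The branch names (I1, I2, I3), people, and dates are hypothetical, and the dates are kept after January 1, 1970 for the reason given further on. Starting the child's branch from one parent with checkout -b and merging in the other is an assumption of this sketch (Git will not merge more than one branch into a branch that has no commits yet), not necessarily the generator's exact sequence.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q Family
cd Family
# A committer identity is still required, even though --author overrides the author.
git config user.name "Generator"
git config user.email "generator@example.com"

# 1. checkout --orphan: the earliest known ancestors get branches with no parents.
git checkout -q --orphan I1
git commit -q --allow-empty -m "Birth 1971" \
  --date "1971-03-12 10:00:00 +0000" --author "Ivan Petrov <>"
git checkout -q --orphan I2
git commit -q --allow-empty -m "Birth 1973" \
  --date "1973-07-01 10:00:00 +0000" --author "Maria Petrova <>"

# 2. merge: start the child's branch from one parent and merge in the other.
#    The parents have unrelated histories, and --no-commit leaves the merge
#    open so that the commit below can set its own message, date and author.
git checkout -q -b I3 I1
git merge -q --allow-unrelated-histories --no-commit I2

# 3. commit: record the birth as a two-parent (merge) commit.
git commit -q --allow-empty -m "Birth 1998" \
  --date "1998-06-05 10:00:00 +0000" --author "Anna Petrova <>"

git log --graph --all --format="%s (%an, %ad)" --date=short
```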



The first, checkout, creates a branch for each person. The --orphan flag creates orphan branches, i.e. branches without parents. A branch is created as an orphan only once; on subsequent checkouts to it, the flag is omitted. In the end, almost all commits have parents, except those of the most distant ancestors, whose own ancestors are unknown.


The second command, merge, unites the parents and creates a child. In the commit message we write Birth with the corresponding year. We also pass the --allow-unrelated-histories and --no-commit flags, to be able to merge orphan branches and to commit the changes later. Some children are adopted, so for them we write Adopted instead. Funnily enough, Git allows group marriages, i.e. merging several branches at once. And branches have no sex, which will appeal to fans of "parent 1" and "parent 2".
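Merging more than two parents at once is what Git calls an octopus merge. A strategy-independent way to sketch such a multi-parent birth, marked as an adoption, is the plumbing command git commit-tree, which accepts any number of -p parents. This is a swapped-in technique, not necessarily what the generator does; the names, the three-parent family, and the "Adopted 2001" message are hypothetical, and the author data comes from Git's GIT_AUTHOR_* environment variables.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q Family
cd Family
git config user.name "Generator"
git config user.email "generator@example.com"

# Three hypothetical (foster) parents, each on their own orphan branch.
for p in I1 I2 I4; do
  git checkout -q --orphan "$p"
  git commit -q --allow-empty -m "Birth ($p)"
done

# Build the child's commit directly: one -p per parent, and the
# (empty) tree taken from I1, since no files are ever committed.
tree=$(git rev-parse "I1^{tree}")
child=$(GIT_AUTHOR_NAME="Anna Petrova" GIT_AUTHOR_EMAIL="anna@example.com" \
  GIT_AUTHOR_DATE="2001-09-01 10:00:00 +0000" \
  git commit-tree -p I1 -p I2 -p I4 -m "Adopted 2001" "$tree")

# Point a new branch for the child at that commit.
git branch I5 "$child"
git log -1 --format="%s, parents: %p" I5
```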


Finally, the third command, commit, creates a commit with a message (-m), a date (--date), and an author (--author). As already mentioned, Git lets you override the message, author, and date of a commit. Moreover, Git can create commits without files using the --allow-empty flag, and without a message using --allow-empty-message. The author also needs an email, but Git accepts an empty one: just pass <>. Unfortunately, Git does not respect its elders: commit timestamps are counted from January 1, 1970 (the Unix epoch), and earlier dates are not displayed correctly. That is not so bad, though: you can simply write the real date in the commit message. On the other hand, Git believes in the future and accepts future dates: take a look at my son's branch. Single mothers and single fathers can be created too, by the way.
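These commit options can be checked on a throwaway repository. This sketch (hypothetical branch name and person) records an expected birth with no files, an empty message, an empty author email, and a date in the future, then reads the values back with git log placeholders.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q Family
cd Family
git config user.name "Generator"
git config user.email "generator@example.com"

# No files (--allow-empty), no message (--allow-empty-message),
# an empty author email (<>) and a date in the future.
git checkout -q --orphan I7
git commit -q --allow-empty --allow-empty-message -m "" \
  --date "2030-01-15 09:00:00 +0000" --author "Future Son <>"

# %an = author name, %ae = author email, %ad = author date, %s = subject.
git log -1 --format="name=%an email=<%ae> date=%ad subject=[%s]" --date=short
# prints: name=Future Son email=<> date=2030-01-15 subject=[]
```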


Social graph


Besides births, the social graph records other events: baptism, change of residence, education, marriage, divorce, death, burial. After death, a branch goes to digital heaven: no further events can appear on it except the burial. On the server, such a branch can even be sealed, i.e. made a protected branch (don't worry: it can still be resurrected later if necessary).


The Wedding event has two parents: the spouses. Divorce has one parent: the preceding Wedding event. Family life takes as much work as raising children, so we can say that a wedding also produces a new descendant, the "relationship", which ends with a divorce. It may resume after the next wedding, and so on through wedding-divorce cycles. Several people can also be involved in one relationship (a merge of several branches).
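One way to sketch such a "relationship" branch: a Wedding commit that merges both spouses' branches, followed by a single-parent Divorce commit on the same branch. The branch name R1, the people, and the dates are hypothetical; the generator's actual naming may differ.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q Family
cd Family
git config user.name "Generator"
git config user.email "generator@example.com"

# Two spouses, each born on an orphan branch.
git checkout -q --orphan I1
git commit -q --allow-empty -m "Birth 1971" --author "Ivan Petrov <>"
git checkout -q --orphan I2
git commit -q --allow-empty -m "Birth 1973" --author "Maria Petrova <>"

# Wedding: the relationship R1 is a "descendant" with two parents, the spouses.
git checkout -q -b R1 I1
git merge -q --allow-unrelated-histories --no-commit I2
git commit -q --allow-empty -m "Wedding 1995" --date "1995-08-19 12:00:00 +0000"

# Divorce: a single-parent commit whose only parent is the Wedding commit.
git commit -q --allow-empty -m "Divorce 2005" --date "2005-02-01 12:00:00 +0000"

git log --graph --format="%s" R1
```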


Finalize


Add the cherry on top: back up the repository by uploading all the people to GitHub, GitLab, or any other server that supports Git. You could push the branches one by one, but a single magic command pushes them all, which is much faster and easier:


git remote add origin https://gitlab.com/KvanTTT/Family.git
git push origin --all -u
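To try the push without a hosting account, a local bare repository can stand in for the server. This sketch (hypothetical paths and branches) pushes every branch with --all, sets upstream tracking with -u, and then lists the heads that arrived on the "server".

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# A local bare repository plays the role of the hosting server.
git init -q --bare server.git
git init -q Family
cd Family
git config user.name "Generator"
git config user.email "generator@example.com"

git checkout -q --orphan I1
git commit -q --allow-empty -m "Birth 1971"
git checkout -q --orphan I2
git commit -q --allow-empty -m "Birth 1973"

# The same two commands, with a local path instead of the GitLab URL.
git remote add origin ../server.git
git push -q origin --all -u

# Every person's branch should now exist on the "server".
git ls-remote --heads origin
```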

To generate a regular family tree, pass the --only-birth-events flag when starting the generator. In this case, one commit (the birth) is created per person. Otherwise, a social graph is generated.


Examples


As a small example that should open anywhere, I created my own family tree, and as a large one, the tree of US presidents (2145 people). They are available in the Kochurkins and Presidents repositories, respectively. To build my own tree, I used geni.com, from which I exported it to GEDCOM. The generated script that creates the family repository is available as a Gist.


Presidents


Both GitHub and GitLab let you navigate through ancestors and descendants, much like the Familypedia or WeRelate genealogical wikis. In some ways, Git(Hub|Lab) are even more advanced: trees are easy to download from them (with git clone). Most importantly, you can open the entire graph at once. (Existing genealogy programs, for some reason, struggled to display even small graphs in full.) This can be done with various tools: the web service, Git Extensions, Sourcetree, GitKraken, and others. In addition, these services are free to use, unlike most genealogical ones.


Notably, Git hosting services even offer a semblance of analytics: you can find out whose life is the most eventful, or simply who is the most open. The Insights tab lists people in descending order of commit count.
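A rough local equivalent of these web views, assuming the repository has been cloned: git log --all --graph draws the whole graph at once, and git shortlog -sn --all lists authors (here, people) by number of commits, much like the Insights tab. The people and events below are hypothetical.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q Family
cd Family
git config user.name "Generator"
git config user.email "generator@example.com"

# Ivan has three life events, Maria has one.
git checkout -q --orphan I1
for event in "Birth 1971" "Education 1988" "Residence 1990"; do
  git commit -q --allow-empty -m "$event" --author "Ivan Petrov <>"
done
git checkout -q --orphan I2
git commit -q --allow-empty -m "Birth 1973" --author "Maria Petrova <>"

# The whole graph at once, across all branches.
git log --all --graph --oneline

# People sorted by number of commits (-n), showing only the counts (-s).
git shortlog -sn --all
```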


Pulse


Unfortunately, GitHub and GitLab do not display large trees correctly, but the trees are stored correctly: you can pull the repository and check for yourself. Here is what my tree looks like in the GitLab web interface:


Kochurkins GitLab


Problems


It is not clear how to extend the history at the roots, i.e. add ancestors later. So far, the whole repository has to be regenerated from the GEDCOM file. Perhaps this can be done with a clever rebase; feel free to experiment and share in the comments. It would also be nice to rewrite the code to be "commit-oriented" rather than "event-oriented", since that is more natural for Git: branches in it are really sequences of commits, not separate entities. I also thought about how to use tags and submodules, but so far I haven't figured out how best to do it.
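Instead of a rebase, one non-rewriting option (swapped in here, not something the author tried) is git replace --graft, which gives an existing root commit new parents through a replacement object while the original commits stay untouched. In this sketch, a newly discovered father (hypothetical branch I0) is attached below I1's root commit. Replacement refs live under refs/replace/ and are not pushed by git push --all, so they would have to be pushed separately.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q Family
cd Family
git config user.name "Generator"
git config user.email "generator@example.com"

# The existing tree: I1 was generated as an orphan (no known ancestors).
git checkout -q --orphan I1
git commit -q --allow-empty -m "Birth 1971"
git commit -q --allow-empty -m "Education 1988"

# Later, I1's father turns up in the GEDCOM data.
git checkout -q --orphan I0
git commit -q --allow-empty -m "Birth 1945"

# Graft: give I1's root commit a parent, the father's birth commit.
root=$(git rev-list --max-parents=0 I1)
git replace --graft "$root" I0

# History commands now show the father below I1's first event.
git log --format=%s I1
```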


Conclusion


If you take the idea of family trees further on developer web services, you can use issues to create various life-wide tasks and distribute them across milestones: childhood, adolescence, adulthood, old age.


Besides family trees, you can build other pointless-but-fun contraptions (the proverbial trolleybus made from a loaf of bread): encode in Git the family trees of programming languages (even more "coderish"), syntax trees, and, in general, any tree structures. Even housewives can master Git to map the relationships between the characters of Brazilian soap operas :)


Practical benefit: this warm-up helps to better understand how Git works, its commands, and the GEDCOM format for describing family trees.


The source of this article is available on GitHub: send a pull request there if you find errors or want to add content. The MarkConv library is used to convert it to the habr.com format.



Source: https://habr.com/ru/post/351158/

