The subtleties of building network models in Python

What is the main tool that a manager uses when managing a project? It is considered that the main tool of the project manager is the schedule, which is based on the network model of work on the project. Once I had a chance to implement a network model of work in Python (code and description here ). The following are lessons learned from the work done.

First create the works, then make connections

When building a network model, the question often arises in which order to create works and establish connections between them? The most obvious is the two-step approach - first all the model works are created, then links are established between them. This approach avoids KeyError: '101' errors that occur when these two operations are executed in parallel, when the system tries to establish a connection with work that has not yet been created.

Of course, the total execution time of two consecutive operations for creating jobs and establishing connections between them can be optimized by using an algorithm in which these operations are performed in parallel. However, even on large models with tens of thousands of jobs, sequential algorithms work fairly quickly. Therefore, under the conditions of time constraints on the project implementation, it is quite possible to get along with the classical two-stage approach.

Recount the model after its construction

Is it worth it to recalculate every time when a connection is established between works when building a network model? On the one hand, the constant recalculation allows keeping the model up to date. On the other hand, recalculation increases the time of its construction.
')
For comparison, two functions were implemented:

build_model_by_method () - build with model recalculation;
build_model_by_assignment () - build without recalculating the model.

After that, a comparison of the time of their execution on models of 100, 1000 and 10000 works.

network.py

from predict import Activity import xml.etree.ElementTree as ET import sys import timeit from timeit import Timer #     def get_child(child, activity_field): text = child.find(activity_field).text if text is None: return None return int(text) #       def build_model_by_method(filename): sys.setrecursionlimit(10000) f = open(filename,'r') tree = ET.parse(f) root = tree.getroot() schedule = {} next = {} for child in root.findall('Activity'): id = child.find('id').text start_date = get_child(child,'start_date') finish_date = get_child(child,'finish_date') duration = get_child(child,'duration') not_early_date = get_child(child,'not_early_date') a = Activity(id, start_date, finish_date, duration, not_early_date) schedule[id] = a next_activity = '' if child.find('next_activity').text is None else child.find('next_activity').text next[id] = next_activity for key in schedule: if next[key] != '': for next_id in next[key].split(';'): schedule[key].append_next(schedule[next_id]) sys.setrecursionlimit(1000) #       def build_model_by_assignment(filename): f = open(filename,'r') tree = ET.parse(f) root = tree.getroot() schedule = {} next = {} for child in root.findall('Activity'): id = child.find('id').text start_date = get_child(child,'start_date') finish_date = get_child(child,'finish_date') duration = get_child(child,'duration') not_early_date = get_child(child,'not_early_date') a = Activity(id, start_date, finish_date, duration, not_early_date) schedule[id] = a next_activity = '' if child.find('next_activity').text is None else child.find('next_activity').text next[id] = next_activity for key in schedule: if next[key] != '': for next_id in next[key].split(';'): schedule[key].next_activity.append(schedule[next_id]) #     print('Test for 100 activities:') t1 = Timer("build_model_by_method('data/activity_100.xml')", "from __main__ import build_model_by_method") print("build_model_by_method", t1.timeit(number = 1000)) t2 = Timer("build_model_by_assignment('data/activity_100.xml')", "from __main__ import build_model_by_assignment") print("build_model_by_assignment", t2.timeit(number = 1000)) print('Test for 1000 activities') t3 = Timer("build_model_by_method('data/activity_1000.xml')", "from __main__ import build_model_by_method") print("build_model_by_method", t3.timeit(number = 1000)) t4 = Timer("build_model_by_assignment('data/activity_1000.xml')", "from __main__ import build_model_by_assignment") print("build_model_by_assignment", t4.timeit(number = 1000)) print('Test for 10000 activities') t5 = Timer("build_model_by_method('data/activity_10000.xml')", "from __main__ import build_model_by_method") print("build_model_by_method", t5.timeit(number = 1000)) t6 = Timer("build_model_by_assignment('data/activity_10000.xml')", "from __main__ import build_model_by_assignment") print("build_model_by_assignment", t6.timeit(number = 1000))

The results of the comparison:

 $ python network.py Test for 100 activities: build_model_by_method 1.7820062519999738 build_model_by_assignment 1.426311435999878 Test for 1000 activities build_model_by_method 18.998158786999966 build_model_by_assignment 14.216093206999858 Test for 10000 activities build_model_by_method 249.93449528199994 build_model_by_assignment 148.85600239800033

As you can see, the more jobs, the slower the function of building a network model with recalculation compared to the function in which recalculation is not used.

Control the depth of recursion

The network model of the project consists of work and links between them. In the absence of additional information about the network, recursive algorithms are used to recalculate it. Such algorithms consist in successive passage through network operations and recalculation of their parameters (for example, duration, start and end dates). The more jobs in the network, the greater the depth of recursion.

It is known that in Python, the default limit is set for the recursion depth - not more than 1000 recursive calls. To get this limit, use the getrecursionlimit () method of the sys module.

 >>> import sys >>> sys.getrecursionlimit() 1000

When working with large networks, in which the number of jobs is measured in tens of thousands, the usual situation is to exceed the depth of recursion and, as a result, the occurrence of a RecursionError error : maximum recursion depth exceeded in comparison . The error leads to a halt of building or recalculating the model and dropping the system.

To prevent an error, to build and recalculate a large network, it is necessary to increase the limit on the depth of recursion. And the setrecursionlimit () method of the sys module will help us with this.

 >>> import sys >>> sys.setrecursionlimit(10000) >>> sys.getrecursionlimit() 10000

To what value do you want to increase the depth of recursion? The answer to this question depends on the structure of the network model. If the structure is unknown or rather complicated, I recommend setting a limit to a value equal to the number of jobs in the network. The recursive algorithm is traversed either in all or in part of the work. Therefore, the depth of recursion should not exceed the amount of work in the network.

And finally ...

Project management is fashionable. But fashion still does not mean effectively. There are software solutions for recalculating network models on the market, such as Microsoft Project. However, the algorithms embedded in them remain available only to the vendors of the corresponding software.

This article is written based on the experience of developing an open module for constructing and recalculating network models of projects. I hope that the lessons learned in this article will be useful to the reader both from a theoretical and from a purely practical point of view. If this article is of interest, I will share new knowledge that will arise in the future, as the module develops.

Source: https://habr.com/ru/post/310216/

All Articles

The subtleties of building network models in Python

First create the works, then make connections

Recount the model after its construction

Control the depth of recursion

And finally ...

More articles: