Recently, I was engaged in the deployment of projects based on applications such as Magento and Odoo (OpenERP). Both applications allow third-party developers to create modules / extensions embedded in the main application. I have less experience with Odoo, but with Magento I talked quite tightly. And the question that I had, I put in the title of the article.
I will say right away that I have no definite answer, and it cannot be in principle. Difficult questions do not have simple solutions, complex questions have many correct solutions, each of which can be optimal in its own "coordinate system" (evaluation criteria). I will try to describe my “coordinate system”, for which the issue of distributed design of data structures may be relevant.
Limits of applicability
Consider the community of developers of such web applications as Magento (WordPress, Drupal, Joomla, ...). There is a group of developers in the community developing the basic functionality of the application, there are many groups of developers creating their own extensions of the basic functionality, there are integrators who are trying to put together the basic functionality and all the necessary extensions according to the wishes of the end user.
The developers of the basic functionality and extensions, as a rule, manipulate data stored in relational databases. Relational databases are allowed to more strictly describe the data structures - the restrictions imposed (foreign keys, unique indexes) make it easier for developers to understand the subject area (which the author of a particular structure wanted to portray) and make it difficult to incorrectly use already existing structures. They do not have the same scalability and flexibility as key-value storages, but their capabilities are sufficient for projects of the level in which they are used (tens and hundreds of simultaneously working users, data volumes are limited by the physical server's disk subsystem) ordering is sacrificed, which allows to increase the final complexity of the development.
')
Business logic in such applications is implemented at the level of program code that works with the database through the ORM, the specific capabilities of the used DBMS (SQL procedures and functions) are usually not involved.
Extensions use the already existing data structures of the basic modules, and also create their own, closed to the basic structures. In some cases, some extensions may refer to other extensions (the presence of dependency managers such as
composer only contributes to the appearance of longer chains of modules dependent on each other).
Each module (basic or extension) goes its own way of development, on which the data structures it uses changes. For example, here is an excerpt from the list of files that change the data structure in the database for the Mage_Core Magento module:
install-1.6.0.0.php ... upgrade-1.6.0.1-1.6.0.2.php upgrade-1.6.0.2-1.6.0.3.php upgrade-1.6.0.3-1.6.0.4.php
Here, to get the final data structure of version 1.6.0.4, you must first apply the install-1.6.0.0.php script, and then in turn all the scripts from upgrade-1.6.0.1-1.6.0.2.php to upgrade-1.6.0.3-1.6.0.4 .php (why there is no upgrade-1.6.0.0-1.6.0.1.php - I do not know). Scripts through the mechanism of Magento ORM add new elements of the structure (tables, fields, indexes), change the existing, remove obsolete. Such scripts can be used in any Magento extension - the basic functionality will detect them and apply them in the desired sequence. Taking into account intermodular dependencies. In general, a completely successful approach, which allows you to build rather complex data structures (of the order of hundreds of tables) quite independently. For my part, I see two points that I would like to improve.
Namespaces
First, use namespaces (namespaces) not only at the level of modules (Mage_Core, Mage_Catalog, ...), but also at the level of data structures. With a sufficiently large number of extension developers, the probability of the occurrence of tables with the name, say,
group or
message in various modules becomes significantly non-zero. Most major or intelligent developers prefix their structures (aw_, amasty_, prxgt_, ...), but I have seen that some modules create tables with the names
cc and
csv . The presence of the namespace allows to “untie” between the modules not only the tables (entities of the projected subject area), but also the fields of the tables (attributes of entities).
Declarative description
Secondly, to create the data structure of the module, use not the program code (the imperative approach), but the xml description (the declarative approach) - like
Hibernate Mapping Files . In this case, the basic functionality has the ability to analyze xml descriptions for all modules involved in the project without changing the data structures in the database and issue a diagnostic message if any inconsistencies are found (for example, module B adds an attribute / field to an entity / table defined in module A, but in the current version of module A, this entity is no longer used).
Moreover, the basic functionality can restore the declarative description of an already existing data structure (the description can be stored in the meta-data of the structure or restored from the names of tables / fields / indexes). In this case, it is also possible to automatically diagnose the compliance of the existing data structure with the requirements of the newly installed extension (extension group).
Summary
My experience in deploying web-based projects using third-party extensions suggests the need for a tool that allows different, loosely coupled teams to jointly and consistently design the final database structure. At a minimum, such a tool should allow decoupling the same-name entities / tables of different modules through the namespace. Perhaps, in such a tool, a declarative description of the target domain (the whole or only its significant part) is used. As a maximum, such a tool weakly depends on the language in which the application itself is created - data processing in the same database can be conducted from various applications / utilities / consoles, or rather, can support several programming languages ​​(php, python, java, c #, javascript, ...).
I would appreciate links to these tools, if they exist.