A few months ago I needed a document database for a project I was planning in Python. I do not know how it happened, but while googling I managed to overlook every existing solution and concluded that the only way out was to write my own. Since I wanted a working version as quickly as possible, I decided to build on top of existing relational databases, so as not to suffer through implementing storage, search, transactions, and so on. In the end, my creation took on a decent shape, and I decided to write a note about it here - maybe it will come in handy for someone else.
Where to get
For the most impatient:
The database is packaged as a module and requires Python 3 (and Python 3.1 to run the unit tests - some very handy methods were added there). I apologize in advance to the purists who still cannot bring themselves to move even to 2.6 - I could, of course, have ported the module to 2.x, but I was too lazy.
As I already said, the module uses an ordinary relational database as its bottom layer. Currently supported are SQLite (the one that ships with Python) and PostgreSQL 8 (if py-postgresql is installed).
… and what to do with it
I assume document databases are already familiar to many, so I will be brief. The basic concept here is the object. An object has a unique identifier and some amount of data of various types. In my database, the data is a combination of simple types - int, float, str, bytes, and None - and complex types - dict and list. The nesting level is limited only by the relational engine; I will explain why later. An object's identifier can itself be stored in another (or the same) object. Objects can be created, deleted, and their contents modified arbitrarily; and, of course, you can search for objects matching given criteria.
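To make these type rules concrete, here is a small sketch of my own (not part of the module; in particular, the restriction to string dict keys is my assumption, inferred from the way paths work below) that checks whether a value fits this data model:

```python
def is_storable(value):
    """Check whether a value fits the database's type model:
    simple types (int, float, str, bytes, None) plus
    arbitrarily nested dicts and lists."""
    if value is None or isinstance(value, (int, float, str, bytes)):
        return True
    if isinstance(value, list):
        return all(is_storable(item) for item in value)
    if isinstance(value, dict):
        # String keys are an assumption here, based on how paths
        # address dict contents by string key.
        return all(isinstance(key, str) and is_storable(val)
                   for key, val in value.items())
    return False
```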
So, suppose you have already downloaded and installed the module. Or have simply imagined doing so - it does not matter. To illustrate working with the module, I will just copy a few examples here from the documentation (for those too lazy to follow the link above).
First, import the module and create a connection. For simplicity, the default relational engine (SQLite) and an in-memory database will be used.
>>> import brain
>>> conn = brain.connect(None, None)
Now create a couple of objects. Note the nested list in the second object.
>>> id1 = conn.create({'a': 1, 'b': 1.345})
>>> id2 = conn.create({'id1': id1, 'list': [1, 2, 'some_value']})
An object can be read in its entirety, or a specific section of it can be selected. In the latter case, a path to the required data is used - simply a list in which a string means a key in a dict and a number means an index in a list.
>>> print(conn.read(id1))
{'a': 1, 'b': 1.345}
>>> print(conn.read(id2, ['list']))
[1, 2, 'some_value']
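The path convention is easy to restate in plain Python. This resolve_path helper is my own illustration (it is not part of the module) of how the path argument of read() walks nested data:

```python
def resolve_path(data, path):
    """Follow a path (a list of dict keys and list indexes)
    into nested data, mirroring the path argument of read()."""
    for step in path:
        data = data[step]  # str -> dict key, int -> list index
    return data

doc = {'id1': 1, 'list': [1, 2, 'some_value']}
resolve_path(doc, ['list'])     # the whole nested list
resolve_path(doc, ['list', 2])  # 'some_value'
```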
The contents of the object can be changed.
>>> conn.modify(id1, ['a'], 2)
>>> print(conn.read(id1))
{'a': 2, 'b': 1.345}
And finally, the desired object can be found.
>>> import brain.op as op
>>> objs = conn.search(['list', 0], op.EQ, 1)
>>> print(objs == [id2])
True
The condition used reads as "the 0th element of the list stored under the 'list' key of the root dictionary equals 1".
Is that all?
Not quite. For lists, there is a special insert command (which works much like its Python equivalent). Objects and their parts can be deleted (including by masks). Search conditions can be combined with logical operators. Oh, and transactions are supported too, and there is a simple RPC server and a caching connection (a fairly naive one). All of this is covered in the documentation.
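Combining search conditions with logical operators can be sketched in plain Python. This is a toy in-memory matcher of my own, not the module's actual implementation - just an illustration of what an EQ condition joined by AND means:

```python
def resolve(data, path):
    """Follow a path of dict keys and list indexes into nested data."""
    for step in path:
        data = data[step]
    return data

def matches_eq(doc, path, value):
    """EQ condition: the value at `path` exists and equals `value`."""
    try:
        return resolve(doc, path) == value
    except (KeyError, IndexError, TypeError):
        return False

docs = {
    1: {'a': 1, 'b': 1.345},
    2: {'id1': 1, 'list': [1, 2, 'some_value']},
}

# AND of two EQ conditions, evaluated over all documents:
found = [obj_id for obj_id, doc in docs.items()
         if matches_eq(doc, ['list', 0], 1)
         and matches_eq(doc, ['list', 2], 'some_value')]
# found == [2]
```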
What's inside?
Inside, it's pretty simple. For each unique path a table with a corresponding name is created (paths that differ only in list indexes use the same table). For example, for data of the form {'key': [{'key2': 'val'}]}, the value 'val' will be stored in a table named "field.TEXT.key..key2" (the empty segment between the two dots indicates that the value under that key is a list). In addition, there is one large table which records, for each object, the fields it contains and their types. Thus, if something suddenly goes wrong, the data can be recovered with ordinary relational-database tools.
This results in a restriction on the nesting level of stored data structures: it depends on the maximum table-name length the relational engine supports.
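The naming scheme can be sketched roughly like this. Note that this is a hypothetical reconstruction from the single example above; the type-name strings other than TEXT (INT, FLOAT, BLOB) are my guesses, and the module's real code may well differ in details:

```python
# Assumed mapping of value types to name segments; only TEXT is
# confirmed by the article's example, the rest are guesses.
TYPE_NAMES = {str: 'TEXT', int: 'INT', float: 'FLOAT', bytes: 'BLOB'}

def table_name(path, value_type):
    """Build a table name like "field.TEXT.key..key2" from a path:
    dict keys appear literally, list positions become empty segments
    (hence the double dot), so paths differing only in list indexes
    share one table."""
    segments = ['' if isinstance(step, int) else step for step in path]
    return '.'.join(['field', TYPE_NAMES[value_type]] + segments)

table_name(['key', 0, 'key2'], str)  # 'field.TEXT.key..key2'
```

This also makes the nesting limit obvious: every extra level of nesting adds another segment to the name, and the relational engine caps the name's total length.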
So, what is next?
As I said, the current version of the database is quite sufficient for my purposes. But if someone thinks this creation could be useful to them as well, I will be happy to add the necessary features. And, of course, any constructive criticism of the code / architecture / documentation is welcome.