Multi-file storage of Java objects in XML format (part 2)
Introduction
In programming, we often face a choice: find and use a ready-made solution, or solve the problem ourselves. Although many specifications and implementations have been written, they do not always provide what a particular case requires. I recently found myself in exactly this situation again.
The task was to store objects in a file in XML format. That would not have been complicated if not for a few catches: there are many objects, they form a tree, and they are constantly being added, changed, and removed from different threads. Constantly rewriting and rereading one large XML file is expensive, especially when several threads work with the same data. This is how the idea of a multi-file XML object store was born.
In this article I will not go into the implementation itself. I will only present the main ideas and show how to use the library. If you want to dig deeper, you can download and view the source code.
The sources are available at the link:
xdstore-1.3
The source code differs slightly from the listings in this article. Exceptional situations are handled in more detail there: each operation, including reading, throws its own exception. The latest version also implements fragmentation.
The main idea of the development
The main idea is to store objects not in a single file but in a set of files, and to make the storage policy configurable for each class. One of the following policies can be set for a class:
- ParentObjectFile - objects of the class are saved in the owner object's file as child elements; this policy is applied by default;
- SingleObjectFile - each object of the class gets its own file, and only a link to the object (from here on, an object reference) is saved in the owner object's file; the files of all such objects are stored in a separate folder inside the store;
- ClassObjectsFile - all objects of the class are stored in one separate file, and only object references are saved in the owner objects' files.
An object reference is an object of the given class in which only one field, the identifier, is populated. In the XML file, only the class name and the identifier are saved instead of the object's complete data, so that the full data can be fetched later through this reference. Loading such objects resembles lazy initialization in Hibernate.
Stored objects must be implemented as JavaBeans, with get (or is) and set methods for every stored field.
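To make the JavaBeans requirement more concrete, here is a minimal sketch of how storable properties can be discovered with the standard java.beans.Introspector. This is not the library's code: the Planet class and the storableProperties helper are invented for the illustration, and the sketch assumes that only properties with both a getter and a setter can be stored.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BeanPropertySketch {

    // Hypothetical bean, used only for this illustration.
    public static class Planet {
        private String id;
        private boolean habitable;

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public boolean isHabitable() { return habitable; }
        public void setHabitable(boolean habitable) { this.habitable = habitable; }
        public int getMass() { return 42; } // read-only: a getter without a setter
    }

    // Returns the sorted names of properties that have both a read and a write method.
    static List<String> storableProperties(Class<?> beanClass) {
        try {
            // Stop at Object.class so that getClass() is not reported as a property.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot inspect " + beanClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(storableProperties(Planet.class)); // prints [habitable, id]
    }
}
```

The read-only mass property is skipped because it has no setter, so a value read from XML could not be written back into the object.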
One interesting challenge
To better understand the situation we end up in when implementing such a store, the problem has to be stated correctly. In database terms it sounds like this: a table contains two rows; two transactions start at the same time, and each modifies both rows; then the first transaction commits, and a third transaction begins, which also modifies the same two rows.
We are interested in what happens to the data in each of these transactions. In the current implementation of the library the behavior is as follows:
1) Since the data has already been modified by the first transaction, the second transaction's attempt to change it is rejected with an exception. The first and second transactions started at the same time and most likely worked with identical copies of the data, so the second one must be refused in order not to lose the changes made by the first.
2) The changes of the third transaction are accepted, since it started after the first transaction committed and works with the updated data.
Because this is a fairly simple implementation, no record locks are used; this avoids deadlocks and the need to roll back transactions on timeout. Instead, a conflict raises an exception, and the transaction should be rolled back in response.
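The scenario with three transactions can be reproduced with a small in-memory model. The sketch below is only an illustration of lock-free (optimistic) conflict detection under my own assumptions; it is not how the library is implemented, and the names OptimisticStoreSketch, Tx and ConflictException are hypothetical. Each transaction remembers the commit counter at its start, and a commit is refused if any record it touches was committed later than that.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny in-memory store with optimistic conflict detection.
class OptimisticStoreSketch {

    // Hypothetical exception; the library has its own exception types.
    static class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> lastCommit = new HashMap<>(); // id -> commit number
    private long commitCounter = 0;

    final class Tx {
        private final long startedAt; // commit counter when the transaction began
        private final Map<String, String> pending = new HashMap<>();

        private Tx(long startedAt) { this.startedAt = startedAt; }

        void put(String id, String value) { pending.put(id, value); }

        void commit() {
            synchronized (OptimisticStoreSketch.this) {
                // Refuse the commit if any touched record changed after this transaction began.
                for (String id : pending.keySet()) {
                    if (lastCommit.getOrDefault(id, 0L) > startedAt) {
                        throw new ConflictException("Record " + id + " was changed by another transaction");
                    }
                }
                commitCounter++;
                for (Map.Entry<String, String> e : pending.entrySet()) {
                    values.put(e.getKey(), e.getValue());
                    lastCommit.put(e.getKey(), commitCounter);
                }
            }
        }
    }

    synchronized Tx begin() { return new Tx(commitCounter); }

    synchronized String get(String id) { return values.get(id); }

    public static void main(String[] args) {
        OptimisticStoreSketch store = new OptimisticStoreSketch();
        Tx first = store.begin();
        Tx second = store.begin();
        first.put("row1", "A");
        first.put("row2", "A");
        second.put("row1", "B");
        second.put("row2", "B");
        first.commit();                 // accepted
        Tx third = store.begin();       // starts after the first commit
        try {
            second.commit();            // rejected: it worked with a stale copy
        } catch (ConflictException e) {
            System.out.println("second transaction rejected: " + e.getMessage());
        }
        third.put("row1", "C");
        third.put("row2", "C");
        third.commit();                 // accepted
        System.out.println(store.get("row1"));
    }
}
```

No record is ever locked here, so transactions cannot deadlock; the price is that the losing transaction has to be rolled back and repeated by the caller.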
Getting started
The purpose of this development is a simple and flexible library for saving objects in XML format. The resulting interface is therefore quite simple, and the requirements for stored objects are minimal. The basic requirement is that every saved object implements the simple interface IXmlDataStoreIdentifiable. It looks like this:
    public interface IXmlDataStoreIdentifiable {

        String getId();

        void setId(String id);
    }
As you can see, only two methods for working with the object identifier need to be implemented. This requirement exists because under some policies only references to objects are saved, and the remaining properties may later have to be restored (loaded) through them. A reference in the XML file looks like this:
<reference class="org.flib.xdstore.entities.XdGalaxy" id="cc74e3f2"/>
When such a reference is loaded, an object of the specified class is created and its identifier property is set. The remaining fields keep their default values, i.e. they are not loaded.
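Turning a reference into an object can be pictured with a few lines of reflection. This is a minimal sketch under my own assumptions, not the library's code: the Identifiable interface stands in for IXmlDataStoreIdentifiable so that the example compiles on its own, and Galaxy and createReference are hypothetical names.

```java
class ReferenceStubSketch {

    // Stand-in for the library's IXmlDataStoreIdentifiable, repeated so the sketch compiles alone.
    interface Identifiable {
        String getId();
        void setId(String id);
    }

    // Hypothetical entity with one field besides the identifier.
    public static class Galaxy implements Identifiable {
        private String id;
        private String name;

        public Galaxy() { }

        @Override public String getId() { return id; }
        @Override public void setId(String id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Creates an object that carries only the identifier; all other fields keep their defaults.
    static <T extends Identifiable> T createReference(Class<T> cl, String id) {
        try {
            T stub = cl.getDeclaredConstructor().newInstance();
            stub.setId(id);
            return stub;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cl.getName(), e);
        }
    }

    public static void main(String[] args) {
        Galaxy reference = createReference(Galaxy.class, "cc74e3f2");
        System.out.println(reference.getId());   // the identifier from the reference
        System.out.println(reference.getName()); // null until the object is actually loaded
    }
}
```

This also shows why a public no-argument constructor and the setId method are needed on every stored class.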
Now consider a simple example of setting up a store for objects of two classes, XdUniverse and XdGalaxy. First, we define the classes.
    package org.flib.xdstore.entities;

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;

    import org.flib.xdstore.IXmlDataStoreIdentifiable;

    public class XdUniverse implements IXmlDataStoreIdentifiable {

        private String id;

        // Initialized up front so that addGalaxy() and removeGalaxy() never hit a null collection.
        private Collection<XdGalaxy> galaxies = new ArrayList<XdGalaxy>();

        @Override
        public String getId() {
            return id;
        }

        @Override
        public void setId(final String id) {
            this.id = id;
        }

        public Collection<XdGalaxy> getGalaxies() {
            return galaxies;
        }

        public void setGalaxies(Collection<XdGalaxy> galaxies) {
            this.galaxies = galaxies;
        }

        public void addGalaxy(XdGalaxy galaxy) {
            galaxies.add(galaxy);
        }

        // Removes and returns the first galaxy, or returns null if there is none.
        public XdGalaxy removeGalaxy() {
            final Iterator<XdGalaxy> it = galaxies.iterator();
            XdGalaxy galaxy = null;
            if (it.hasNext()) {
                galaxy = it.next();
                it.remove();
            }
            return galaxy;
        }
    }
And a simple XdGalaxy class.
    package org.flib.xdstore.entities;

    import org.flib.xdstore.IXmlDataStoreIdentifiable;

    public class XdGalaxy implements IXmlDataStoreIdentifiable {

        private String id;

        @Override
        public String getId() {
            return id;
        }

        @Override
        public void setId(String id) {
            this.id = id;
        }
    }
Now we can set up the storage for these entities.
    final XmlDataStore store = new XmlDataStore("./teststore");
    store.setStorePolicy(XdUniverse.class, XmlDataStorePolicy.ClassObjectsFile);
    store.setStorePolicy(XdGalaxy.class, XmlDataStorePolicy.ClassObjectsFile);
With these settings, all objects of each class are stored in that class's own file, i.e. one file per class. Other settings are possible: for example, if no policy is specified for the XdGalaxy class, its objects are saved together with the XdUniverse objects.
As a result, after the objects are written with our settings, we get two files: XdUniverse.xml and XdGalaxy.xml. The first one looks like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <objects>
        <object isNull="false" class="org.flib.xdstore.entities.XdUniverse" id="002df141">
            <collection name="Galaxies" class="java.util.ArrayList">
                <reference class="org.flib.xdstore.entities.XdGalaxy" id="cc74e3f2"/>
                <reference class="org.flib.xdstore.entities.XdGalaxy" id="ca519d20"/>
            </collection>
            <object name="Id" isNull="false" class="java.lang.String" value="002df141"/>
        </object>
    </objects>
As the example shows, this file contains references to objects from the second file, XdGalaxy.xml, shown below.
    <?xml version="1.0" encoding="UTF-8"?>
    <objects>
        <object isNull="false" class="org.flib.xdstore.entities.XdGalaxy" id="cc74e3f2">
            <object name="Id" isNull="false" class="java.lang.String" value="cc74e3f2"/>
        </object>
        <object isNull="false" class="org.flib.xdstore.entities.XdGalaxy" id="ca519d20">
            <object name="Id" isNull="false" class="java.lang.String" value="ca519d20"/>
        </object>
    </objects>
So we now have a store of two files for our objects. If we do not need the XdGalaxy objects, we can load only the XdUniverse objects and work with them. If we do need the XdGalaxy objects, we simply load them through the references that are already loaded.
If we choose the SingleObjectFile policy, a folder is created in the root directory of the store, and the object files are saved in it.
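How the three policies could map an object to a file on disk is sketched below. The ClassObjectsFile case follows the file names shown above (one ClassName.xml per class). The folder and file names for SingleObjectFile are my assumption, since the article does not specify them, and the Policy enum and fileFor helper are stand-ins rather than the library's API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class StorePathSketch {

    // Local stand-in that mirrors the three policy names from the article.
    enum Policy { ParentObjectFile, SingleObjectFile, ClassObjectsFile }

    // Resolves the file in which an object would be stored under the given policy.
    static Path fileFor(Path storeRoot, Policy policy, Class<?> objectClass,
                        String objectId, Path ownerFile) {
        String className = objectClass.getSimpleName();
        switch (policy) {
            case ClassObjectsFile:
                // One file per class, e.g. teststore/XdGalaxy.xml
                return storeRoot.resolve(className + ".xml");
            case SingleObjectFile:
                // Assumed layout: one folder per class, one file per object identifier.
                return storeRoot.resolve(className).resolve(objectId + ".xml");
            case ParentObjectFile:
            default:
                // The object lives inside its owner's file.
                return ownerFile;
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get("teststore");
        Path owner = root.resolve("Owner.xml");
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + fileFor(root, policy, String.class, "cc74e3f2", owner));
        }
    }
}
```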
Saving and loading objects
Consider the part of the XmlDataStore class interface that concerns saving objects. It is quite simple and lets us save objects without specifying policies, since these are already set when the store is initialized.
    // Signatures only; method bodies are omitted.
    public class XmlDataStore {

        public XmlDataStoreTransaction beginTransaction();

        public void commitTransaction(final XmlDataStoreTransaction transaction);

        public void rollbackTransaction(final XmlDataStoreTransaction transaction);

        public <T extends IXmlDataStoreIdentifiable> boolean saveRoot(final T root) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean saveObject(final T object) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean saveObjects(final Collection<T> objects) throws XmlDataStoreException;
    }
The store was designed for multi-threaded use, and several resource objects can be involved in one unit of work, so it uses a transaction mechanism and provides the corresponding methods. A transaction can also be committed or rolled back through the methods of the transaction object itself.
Saving root objects differs slightly from saving child objects, so the methods that work with root objects form a separate group. The difference is that under the SingleObjectFile policy each root object gets its own file, and in addition one more file is created for all of them, in which references to them are stored. This makes it possible to load all root objects at once.
Now consider the save operation.
    final XmlDataStore store = initStore("./teststore");
    final XdUniverse universe = generateUniverse();
    final XmlDataStoreTransaction tx = store.beginTransaction();
    try {
        store.saveRoot(universe);
        store.saveObjects(universe.getGalaxies());
        tx.commit();
    } catch (XmlDataStoreException e) {
        tx.rollback();
    }
The example shows that saving objects is quite simple. Note only that, since the XdGalaxy objects are stored in a separate file, their save operation has to be performed explicitly. They can also be saved one at a time with the saveObject method described above. Objects are actually written to the file when the transaction is committed; until then, all operations work with the cache.
Now consider the part of the interface related to loading objects from the repository.
    // Signatures only; method bodies are omitted.
    public class XmlDataStore {

        public <T extends IXmlDataStoreIdentifiable> Map<String, T> loadRoots(final Class<T> cl) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> T loadRoot(final Class<T> cl, final String id) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean loadObject(final T reference) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> T loadObject(Class<T> cl, final String id) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean loadObjects(final Collection<T> references) throws XmlDataStoreException;
    }
As you can see, the store lets us load all roots of a given class at once or request a single root of a given class by identifier. Objects of any class can also be loaded by reference or by identifier. In our case, loading all the saved data looks like this.
    final XmlDataStore store = initStore("./teststore");
    final XmlDataStoreTransaction tx = store.beginTransaction();
    try {
        final Map<String, XdUniverse> roots = store.loadRoots(XdUniverse.class);
        for (final XdUniverse root : roots.values()) {
            final Collection<XdGalaxy> galaxies = root.getGalaxies();
            store.loadObjects(galaxies);
        }
        tx.commit();
    } catch (XmlDataStoreException e) {
        tx.rollback();
    }
The example shows that all the roots are loaded first, and then, for each root, all child objects are loaded through their object references.
Updating and deleting objects
Methods for updating (modifying) and deleting objects are presented below.
    // Signatures only; method bodies are omitted.
    public class XmlDataStore {

        public <T extends IXmlDataStoreIdentifiable> boolean updateRoot(final T root) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean deleteRoot(final T root) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean deleteRoot(final Class<T> cl, final String id) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean updateObject(final T object) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean deleteObject(final T reference) throws XmlDataStoreException;

        public <T extends IXmlDataStoreIdentifiable> boolean deleteObjects(final Collection<T> references) throws XmlDataStoreException;
    }
Note that all dependent objects stored in files separate from their owner must be updated or deleted explicitly. For example, in our case, when removing an XdGalaxy object from an XdUniverse object, you need to update the XdUniverse object and, in addition, explicitly delete the XdGalaxy object.
    final XmlDataStore store = initStore("./teststore");
    final XmlDataStoreTransaction tx = store.beginTransaction();
    try {
        final Map<String, XdUniverse> roots = store.loadRoots(XdUniverse.class);
        for (final XdUniverse root : roots.values()) {
            final Collection<XdGalaxy> galaxies = root.getGalaxies();
            store.loadObjects(galaxies);
        }
        if (roots.size() > 0) {
            final XdUniverse universe = roots.values().iterator().next();
            final XdGalaxy galaxy = universe.removeGalaxy();
            if (galaxy != null) {
                store.updateRoot(universe);
                store.deleteObject(galaxy);
            }
        }
        tx.commit();
    } catch (XmlDataStoreException e) {
        tx.rollback();
    }
If an object is added, the code looks like this.
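    final XmlDataStore store = initStore("./teststore");
    final XmlDataStoreTransaction tx = store.beginTransaction();
    try {
        final Map<String, XdUniverse> roots = store.loadRoots(XdUniverse.class);
        for (final XdUniverse root : roots.values()) {
            final Collection<XdGalaxy> galaxies = root.getGalaxies();
            store.loadObjects(galaxies);
        }
        if (roots.size() > 0) {
            final XdUniverse universe = roots.values().iterator().next();
            final XdGalaxy galaxy = initGalaxy();
            // The original listing breaks off at this point. The remaining lines are a
            // reconstruction (an assumption) by analogy with the removal example above.
            universe.addGalaxy(galaxy);
            store.updateRoot(universe);
            store.saveObject(galaxy);
        }
        tx.commit();
    } catch (XmlDataStoreException e) {
        tx.rollback();
    }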
If the storage policy is ParentObjectFile, there is no need to explicitly delete or save the child objects, since the necessary operation is performed automatically when the owner object is updated.
Complete cleaning of our storage will look like this:
    final XmlDataStore store = initStore(storedir);
    final XmlDataStoreTransaction tx = store.beginTransaction();
    try {
        final Map<String, XdUniverse> roots = store.loadRoots(XdUniverse.class);
        for (final XdUniverse root : roots.values()) {
            final Collection<XdGalaxy> galaxies = root.getGalaxies();
            store.deleteObjects(galaxies);
            store.deleteRoot(root);
        }
        tx.commit();
    } catch (XmlDataStoreException e) {
        tx.rollback();
    }
From the example it is clear that we did not even need to load XdGalaxy class objects before deleting. We just passed a collection of object references. This is possible because the object reference stores the object identifier.
A little about the implementation
To improve performance, the store uses caching that cannot be switched off. That is, when a transaction first works with a resource object (a file), all objects stored in it are loaded and cached. All other transactions work with the already cached data. The cache is dropped when the last transaction working with this resource object completes. All changes are kept in the cache and are not flushed to disk until the transaction is committed.
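The lifetime of such a cache can be modeled with a simple counter of active transactions. The sketch below is an illustration under my own assumptions and does not reproduce the library's classes: ResourceCacheSketch, acquire and release are hypothetical names, and the file contents are represented as a map from identifier to serialized object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative cache for one resource object (file), shared by all active transactions.
class ResourceCacheSketch {

    private final Supplier<Map<String, String>> loader; // reads the whole file: id -> object data
    private Map<String, String> cached;                 // null while no transaction uses the resource
    private int activeTransactions;
    private int loads;                                  // how many times the file was read

    ResourceCacheSketch(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Called when a transaction starts working with the resource.
    synchronized Map<String, String> acquire() {
        if (activeTransactions == 0) {
            cached = new HashMap<>(loader.get()); // the first transaction loads everything
            loads++;
        }
        activeTransactions++;
        return cached;
    }

    // Called when a transaction that used the resource is committed or rolled back.
    synchronized void release() {
        if (activeTransactions == 0) {
            throw new IllegalStateException("release() without a matching acquire()");
        }
        activeTransactions--;
        if (activeTransactions == 0) {
            cached = null; // the last transaction has finished, so the cache is dropped
        }
    }

    synchronized boolean isCached() { return cached != null; }

    synchronized int loadCount() { return loads; }

    public static void main(String[] args) {
        Map<String, String> file = new HashMap<>();
        file.put("cc74e3f2", "<object/>");
        ResourceCacheSketch cache = new ResourceCacheSketch(() -> file);
        cache.acquire();
        cache.acquire();                       // the second transaction reuses the cached data
        System.out.println(cache.loadCount()); // prints 1
        cache.release();
        cache.release();
        System.out.println(cache.isCached());  // prints false
    }
}
```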
Since any number of resource objects may be affected during a transaction, committing the changes is performed on each of them in turn. If an error occurs during this process, the integrity of the store is broken and an exception of type XmlDataStoreRuntimeException is thrown. Restoring the integrity of the store is not implemented in the current version; this is one of its major drawbacks.
Development plans
In the current implementation, with a large number of objects of one class and the ClassObjectsFile storage policy, the cost of read and write operations grows in direct proportion to the number of objects. To improve performance, we plan to implement fragmentation and an index file. Fragmentation means splitting a single file into fragments, each containing a limited number of objects; the index then holds entries that point to the fragment file in which each object is stored.
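The fragmentation idea can be sketched as an index that maps each object identifier to a fragment number. This is only a rough model of the plan described above, not the code from the library: the FragmentIndexSketch class, the file naming scheme and the choice to fill the first fragment with free space are all my assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory model of an index over fragment files of one class.
class FragmentIndexSketch {

    private final String className;
    private final int maxPerFragment;
    private final Map<String, Integer> fragmentOfId = new HashMap<>();    // the "index file"
    private final Map<Integer, Integer> sizeOfFragment = new HashMap<>(); // objects per fragment

    FragmentIndexSketch(String className, int maxPerFragment) {
        if (maxPerFragment < 1) {
            throw new IllegalArgumentException("maxPerFragment must be positive");
        }
        this.className = className;
        this.maxPerFragment = maxPerFragment;
    }

    // Registers an object and returns the fragment file it should be written to.
    String add(String id) {
        Integer existing = fragmentOfId.get(id);
        if (existing != null) {
            return fileName(existing);
        }
        int fragment = 0;
        while (sizeOfFragment.getOrDefault(fragment, 0) >= maxPerFragment) {
            fragment++; // skip fragments that are already full
        }
        fragmentOfId.put(id, fragment);
        sizeOfFragment.merge(fragment, 1, Integer::sum);
        return fileName(fragment);
    }

    // Returns the fragment file that holds the object, or null if the id is unknown.
    String fileOf(String id) {
        Integer fragment = fragmentOfId.get(id);
        return fragment == null ? null : fileName(fragment);
    }

    // Forgets the object and frees its slot in the fragment.
    void remove(String id) {
        Integer fragment = fragmentOfId.remove(id);
        if (fragment != null) {
            sizeOfFragment.merge(fragment, -1, Integer::sum);
        }
    }

    // Assumed naming scheme: ClassName-N.xml
    private String fileName(int fragment) {
        return className + "-" + fragment + ".xml";
    }

    public static void main(String[] args) {
        FragmentIndexSketch index = new FragmentIndexSketch("XdGalaxy", 2);
        System.out.println(index.add("a")); // prints XdGalaxy-0.xml
        System.out.println(index.add("b")); // prints XdGalaxy-0.xml
        System.out.println(index.add("c")); // prints XdGalaxy-1.xml
    }
}
```

With such an index, reading or rewriting one object touches only a fragment of bounded size instead of the whole class file.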
The plans also include restoring the integrity of the store after a failure while committing a transaction.
Triggers may also appear in a future version of the store; they would be called when the state of stored objects changes, that is, when objects are added, changed, or deleted.
Author: Beschastny Evgeny