When storing site or backend data, the first instinct of most sensible people is to reach for an SQL database.
But sometimes the data model simply does not fit SQL well: for example, when building search or a social graph, you need to query complex relationships between objects.
The worst case is working in a team where a colleague does not know how to write fast queries. How much time have you spent solving N+1 problems and building extra indexes just so the SELECT on the main page runs in reasonable time?
Another popular approach is NoSQL. A few years ago there was a lot of hype around it: MongoDB was deployed at every opportunity, and everyone was happy to get answers as JSON documents (by the way, how many workarounds did you have to add because of cyclic references in those documents?).
So why not try storing all the data in the application's memory, periodically saving it to an arbitrary store (a file, a remote database)?
Memory has become cheap, and all the data of most small and medium projects will fit into 1 GB of RAM. (For example, my favorite home project, a financial tracker that keeps daily statistics and a year and a half of history of my spending, balances, and transactions, uses only 45 MB of memory.)
Pros:
Cons:
The algorithm is as follows:
```
# Core package
Install-Package OutCode.EscapeTeams.ObjectRepository

# Storage backends: pick the one you need
Install-Package OutCode.EscapeTeams.ObjectRepository.File
Install-Package OutCode.EscapeTeams.ObjectRepository.LiteDb
Install-Package OutCode.EscapeTeams.ObjectRepository.AzureTableStorage

# Optional: job storage for Hangfire
Install-Package OutCode.EscapeTeams.ObjectRepository.Hangfire
```
Describe the entities that will be stored:

```csharp
public class ParentEntity : BaseEntity
{
    public ParentEntity(Guid id) => Id = id;
}

public class ChildEntity : BaseEntity
{
    public ChildEntity(Guid id) => Id = id;

    public Guid ParentId { get; set; }
    public string Value { get; set; }
}
```
Then wrap them in model classes:

```csharp
public class ParentModel : ModelBase
{
    public ParentModel(ParentEntity entity)
    {
        Entity = entity;
    }

    public ParentModel()
    {
        Entity = new ParentEntity(Guid.NewGuid());
    }

    // 1:Many navigation property to the child models
    public IEnumerable<ChildModel> Children => Multiple<ChildModel>(x => x.ParentId);

    protected override BaseEntity Entity { get; }
}

public class ChildModel : ModelBase
{
    private ChildEntity _childEntity;

    public ChildModel(ChildEntity entity)
    {
        _childEntity = entity;
    }

    public ChildModel()
    {
        _childEntity = new ChildEntity(Guid.NewGuid());
    }

    public Guid ParentId
    {
        get => _childEntity.ParentId;
        set => UpdateProperty(() => _childEntity.ParentId, value);
    }

    public string Value
    {
        get => _childEntity.Value;
        set => UpdateProperty(() => _childEntity.Value, value);
    }

    // Back-reference to the parent model
    public ParentModel Parent => Single<ParentModel>(ParentId);

    protected override BaseEntity Entity => _childEntity;
}
```
Create the repository itself:

```csharp
public class MyObjectRepository : ObjectRepositoryBase
{
    public MyObjectRepository(IStorage storage) : base(storage, NullLogger.Instance)
    {
        IsReadOnly = true; // optional: a read-only repository does not persist changes

        AddType((ParentEntity x) => new ParentModel(x));
        AddType((ChildEntity x) => new ChildModel(x));

        // If Hangfire is used with ObjectRepository as its job storage,
        // register the Hangfire scheme:
        // this.RegisterHangfireScheme();

        Initialize();
    }
}
```
Create an instance of ObjectRepository:
```csharp
var memory = new MemoryStream();
var db = new LiteDatabase(memory);
var dbStorage = new LiteDbStorage(db);

var repository = new MyObjectRepository(dbStorage);
await repository.WaitForInitialize();
```
If Hangfire is used, register ObjectRepository as its job storage:

```csharp
public void ConfigureServices(IServiceCollection services, ObjectRepository objectRepository)
{
    services.AddHangfire(s => s.UseHangfireStorage(objectRepository));
}
```
Insert a new object:
```csharp
var newParent = new ParentModel();
repository.Add(newParent);
```
In this call, the ParentModel object is added both to the local cache and to the queue for writing to the database. The operation therefore takes O(1), and the object can be worked with immediately.
For example, to find this object in the repository and make sure that the returned object is the same instance:
```csharp
var parents = repository.Set<ParentModel>();
var myParent = parents.Find(newParent.Id);
Assert.IsTrue(ReferenceEquals(myParent, newParent));
```
What happens under the hood? Set<ParentModel>() returns a TableDictionary<ParentModel>, which wraps a ConcurrentDictionary<ParentModel, ParentModel> and adds primary and secondary indexes on top of it. This makes it possible to look objects up by Id (or by arbitrary user-defined indexes) without scanning all objects.
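For illustration only, here is a minimal sketch of such an index-aware table. It is not the library's actual TableDictionary; the IndexedTable type and its API are invented for this example:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Sketch of a table with a primary index by Id plus user-defined
// secondary indexes, so lookups avoid scanning every object.
public class IndexedTable<T>
{
    private readonly Func<T, Guid> _primaryKey;
    private readonly ConcurrentDictionary<Guid, T> _byId =
        new ConcurrentDictionary<Guid, T>();
    private readonly Dictionary<string, Dictionary<object, List<T>>> _secondary =
        new Dictionary<string, Dictionary<object, List<T>>>();
    private readonly Dictionary<string, Func<T, object>> _selectors =
        new Dictionary<string, Func<T, object>>();

    public IndexedTable(Func<T, Guid> primaryKey) => _primaryKey = primaryKey;

    public void Add(T item)
    {
        _byId[_primaryKey(item)] = item;
        // Keep every secondary index in sync with the primary one.
        foreach (var kv in _selectors)
        {
            var key = kv.Value(item);
            if (!_secondary[kv.Key].TryGetValue(key, out var bucket))
                _secondary[kv.Key][key] = bucket = new List<T>();
            bucket.Add(item);
        }
    }

    // O(1) lookup by primary key.
    public T Find(Guid id) => _byId.TryGetValue(id, out var v) ? v : default;

    // Build a secondary index over an arbitrary property.
    public void AddIndex(string name, Func<T, object> selector)
    {
        _selectors[name] = selector;
        _secondary[name] = _byId.Values
            .GroupBy(selector)
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    // O(1) lookup by a secondary index key.
    public IReadOnlyList<T> Find(string indexName, object key) =>
        _secondary[indexName].TryGetValue(key, out var bucket)
            ? (IReadOnlyList<T>)bucket
            : new List<T>();
}
```

The point of the sketch is that every index is just another dictionary kept in sync on insert, which is why lookups cost a hash probe rather than a full scan.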
When objects are added to the ObjectRepository, a subscription to their property changes is created, so any property change also puts the object into the write queue.
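The pattern behind UpdateProperty can be sketched roughly like this. This is a simplified illustration, not the library's code: the TrackedModel type and the ref-based helper are assumptions for the example (the real library routes setters through an expression-based UpdateProperty, as shown in the model classes above):

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of the UpdateProperty pattern: every setter routes through
// a helper that applies the new value and enqueues the object for
// persistence.
public abstract class TrackedModel
{
    // Shared queue of objects awaiting a write to storage.
    public static readonly ConcurrentQueue<TrackedModel> WriteQueue =
        new ConcurrentQueue<TrackedModel>();

    protected void UpdateProperty<T>(ref T field, T value)
    {
        if (Equals(field, value)) return;   // ignore no-op writes
        field = value;
        WriteQueue.Enqueue(this);           // schedule this object for saving
    }
}

public class TrackedChild : TrackedModel
{
    private string _value;

    public string Value
    {
        get => _value;
        set => UpdateProperty(ref _value, value);
    }
}
```

Because the enqueue happens inside the setter, calling code looks like plain POCO property assignment while every change is still captured.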
Updating properties from the outside looks the same as working with a POCO object:
```csharp
myParent.Children.First().Value = "Updated value";
```
You can delete an object in the following ways:
```csharp
repository.Remove(myParent);
repository.RemoveRange(otherParents);
repository.Remove<ParentModel>(x => !x.Children.Any());
```
This also adds the object to the delete queue.
When tracked objects change (whether added, deleted, or modified via a property), the ObjectRepository raises a ModelChanged event to which the IStorage is subscribed. On each ModelChanged event, the IStorage implementation places the change into one of three queues: add, update, or delete.
In addition, IStorage implementations create a timer during initialization that flushes the pending changes to storage every 5 seconds.
There is also an API for forcing a save: ObjectRepository.Save().
Before each save, meaningless operations are first removed from the queues (for example, duplicate events when an object was changed twice, or objects that were quickly added and then deleted), and only then the save itself is performed.
In all cases, the object is saved as a whole, so objects may be saved in a different order than they were changed, and possibly in a newer version than at the moment they were queued.
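The dedup step before a save can be sketched as follows. This is assumed logic, not the library's exact implementation; the Op enum and QueueCompactor are invented names:

```csharp
using System;
using System.Collections.Generic;

// Sketch of queue compaction before a save. Changes are folded per
// object so that:
//  - multiple updates collapse into a single write of the latest state,
//  - an add followed by a delete cancels out entirely.
public enum Op { Add, Update, Delete }

public static class QueueCompactor
{
    public static Dictionary<Guid, Op> Compact(IEnumerable<(Guid Id, Op Op)> events)
    {
        var result = new Dictionary<Guid, Op>();
        foreach (var (id, op) in events)
        {
            if (!result.TryGetValue(id, out var prev))
            {
                result[id] = op;
            }
            else if (prev == Op.Add && op == Op.Delete)
            {
                result.Remove(id);   // added then deleted: nothing to save
            }
            else if (prev == Op.Add && op == Op.Update)
            {
                result[id] = Op.Add; // still an insert; latest state is written anyway
            }
            else
            {
                result[id] = op;     // duplicate updates collapse; delete wins
            }
        }
        return result;
    }
}
```

Collapsing per object is safe precisely because the whole current state of the object is written, so only the final operation per Id matters.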
Secondary indexes on arbitrary properties can be added and then used for lookups:

```csharp
repository.Set<ChildModel>().AddIndex(x => x.Value);
repository.Set<ChildModel>().Find(x => x.Value, "myValue");
```
Personally, I now use this approach in all my hobby projects, because it is convenient and does not require much effort to write a data access layer or deploy heavy infrastructure. For me, storing the data in LiteDB or in a file is usually enough.
But in the past, when we were building the EscapeTeams startup with my team (we hoped for money, but no, we got experience once again), we used Azure Table Storage for data storage.
One of the main disadvantages of this approach I would like to fix is horizontal scaling. That requires either distributed transactions (sic!), or a deliberate decision that the same data should never be modified from different instances, or letting it change on a "last writer wins" basis.
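A "last writer wins" merge could be sketched like this. This is purely hypothetical; the Versioned type is not part of the library, and a real implementation would also have to deal with clock skew between instances:

```csharp
using System;

// Sketch of "last writer wins" conflict resolution between instances.
// Each object version carries a write timestamp, and the newer version
// is kept on merge.
public class Versioned<T>
{
    public T Value { get; }
    public DateTimeOffset WrittenAt { get; }

    public Versioned(T value, DateTimeOffset writtenAt)
    {
        Value = value;
        WrittenAt = writtenAt;
    }

    // Keep whichever version was written later; ties go to the local copy.
    public static Versioned<T> Merge(Versioned<T> local, Versioned<T> remote) =>
        remote.WrittenAt > local.WrittenAt ? remote : local;
}
```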
From a technical point of view, I see the following scheme as possible:
Another problem that bothers me is cascade deletion, or rather detecting cases where a deleted object is still referenced by other objects.
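One possible safeguard, sketched here as an assumption rather than an existing feature of the library, is to refuse deletion while references to the object remain:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of detecting dangling references before deleting a parent:
// the check refuses to delete an object that other objects still
// point to, instead of silently leaving broken references behind.
public static class ReferenceGuard
{
    public static void SafeRemove(
        HashSet<Guid> parents,
        List<(Guid Id, Guid ParentId)> children,
        Guid parentToRemove)
    {
        if (children.Any(c => c.ParentId == parentToRemove))
            throw new InvalidOperationException(
                "Object is still referenced by child objects.");
        parents.Remove(parentToRemove);
    }
}
```

A full solution would either cascade the delete to the children or null out the references, but even a guard like this turns silent corruption into an explicit error.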
If you have read this far, all that remains is to read the code, which can be found on GitHub.
Source: https://habr.com/ru/post/452232/