Strongly typed representation of incomplete data

In the previous article, “Type Designing,” we described how to construct types that resemble classes. That approach separates the stored data from the meta-information and lets us focus on representing the properties of the entities themselves. However, it is rather complex because it relies on the HList type. As the approach evolved, it became clear that for many practical problems neither a linearly ordered sequence of properties nor the completeness of the property set is required. If these requirements are relaxed, the constructed types become much simpler and very convenient to use.

In the updated version of the synapse-frames library, hierarchical data structures are described very simply, and any subset of such a structure can be represented.


Two-sided typed relations

A property of an object is usually considered in relation to the object itself, and in this view the property has only a data type. That single type merely restricts the data the property can hold, so it seemed logical to represent a property as Slot[T]. However, a property is also tied, if less explicitly, to the type of the object in which it is declared. In the previous article, this link was established by constructing a new surrogate type from a set of properties.

If we instead express the link to the container type directly in the type of the property, we can avoid creating a surrogate type and use much more convenient tools. So we represent a property as a two-sided relation between two types:

 sealed trait Relation[-L, R]
 case class Rel[-L, R](name: String) extends Relation[L, R]

(The -L notation means contravariance: the property is also available for descendants of type L. The type R is declared invariant because we plan to use both getters and setters for the property.)

The Rel class allows us to describe the attributes available for type L. For example,

 class Box
 val width = Rel[Box, Int]("width")
 val height = Rel[Box, Int]("height")

(The same properties are also available for descendants of Box.)

Besides the name, any meta-information the application requires can be attached to a property: the database domain, a text description, a serializer / deserializer, a limit on the size of the stored data, the column width in a table, the display format (for dates), and so on. If needed, meta-information can also be attached externally, using a map keyed by the property.
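As an illustration of attaching meta-information externally, here is a minimal sketch. The ColumnMeta record, the columnMeta map and the descriptionOf helper are hypothetical names introduced only for this example and are not taken from the library.

```scala
object MetaDemo {
  sealed trait Relation[-L, R]
  case class Rel[-L, R](name: String) extends Relation[L, R]

  class Box
  val width = Rel[Box, Int]("width")
  val height = Rel[Box, Int]("height")

  // Hypothetical meta-information record (an assumption for this sketch).
  case class ColumnMeta(description: String, columnWidth: Int)

  // Meta-information kept outside the property, keyed by the property itself.
  val columnMeta: Map[Rel[Box, Int], ColumnMeta] = Map(
    width  -> ColumnMeta("Width of the box", 8),
    height -> ColumnMeta("Height of the box", 8)
  )

  // Looks up the description of a property, if any was registered.
  def descriptionOf(prop: Rel[Box, Int]): Option[String] =
    columnMeta.get(prop).map(_.description)
}
```

Because Rel is a case class, two Rel values with the same name compare equal, so the map lookup in this sketch effectively works by property name.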

For L we need some real type. In the previous version, this type was constructed as an HList over its properties. Here, any type available in Scala can serve as L: a primitive type, a type alias, a trait, an abstract or final class, or a singleton type (object.type). Because L is contravariant, the inheritance relation between the types used as property carriers carries over to the properties. It is convenient to express this inheritance as a set of abstract classes, traits and final classes that follows the logic of the subject area.

 abstract class Shape
 trait BoundingRectangle
 final class Rectangle extends Shape with BoundingRectangle
 final class Circle extends Shape with BoundingRectangle
 val width = Rel[BoundingRectangle, Int]("width")
 val height = Rel[BoundingRectangle, Int]("height")
 val radius = Rel[Circle, Int]("radius")
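The effect of contravariance on these declarations can be sketched as follows. The propertiesOf helper is a hypothetical function added only to make the compiler check visible; it is not part of the library.

```scala
object ShapeDemo {
  sealed trait Relation[-L, R]
  case class Rel[-L, R](name: String) extends Relation[L, R]

  abstract class Shape
  trait BoundingRectangle
  final class Rectangle extends Shape with BoundingRectangle
  final class Circle extends Shape with BoundingRectangle

  val width  = Rel[BoundingRectangle, Int]("width")
  val height = Rel[BoundingRectangle, Int]("height")
  val radius = Rel[Circle, Int]("radius")

  // Hypothetical helper: accepts only properties that are applicable to T.
  def propertiesOf[T](props: Rel[T, Int]*): Seq[String] = props.map(_.name)

  // width is declared on BoundingRectangle, and Circle is its descendant,
  // so Rel[BoundingRectangle, Int] is accepted where Rel[Circle, Int] is expected.
  val circleProps: Seq[String] = propertiesOf[Circle](width, height, radius)

  // radius is declared on Circle only, so this line would not compile:
  // val rectangleProps = propertiesOf[Rectangle](width, radius)
}
```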

A single attribute can be viewed as one step that takes you from a parent object to a child. If the child has attributes of its own, you can navigate further through any of them. Two such attributes can be combined into a path from “grandparent” to “grandchild”, which yields a new relation, Rel2(attr1, attr2):

  case class Rel2[-L, M, R](_1: Relation[L, M], _2: Relation[M, R]) extends Relation[L, R]

The `/` method, which constructs a Rel2, is added to the DSL and thus implements composition of relations.

Note also that such relations are an integral part of the triples that underlie RDF / OWL ontologies. Namely, a relation is the middle component of a triple:
(identifier of an object of type L, identifier of the property Relation[L, R], identifier of the property value of type R).
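One way such a `/` method could be provided is an implicit wrapper class. The following is only a sketch under that assumption: the name RelationOps and the Person / Address example types are hypothetical, and the library's actual DSL may be organized differently.

```scala
object ComposeDemo {
  sealed trait Relation[-L, R]
  case class Rel[-L, R](name: String) extends Relation[L, R]
  case class Rel2[-L, M, R](_1: Relation[L, M], _2: Relation[M, R]) extends Relation[L, R]

  // Hypothetical DSL wrapper: adds `/` to every relation.
  implicit class RelationOps[L, M](val left: Relation[L, M]) {
    // The right side must start at the type where the left side ends (M).
    def /[R](right: Relation[M, R]): Relation[L, R] = Rel2(left, right)
  }

  // Hypothetical example types.
  class Person
  class Address
  val address = Rel[Person, Address]("address")
  val city    = Rel[Address, String]("city")

  // A path from Person to the city name.
  val personCity: Relation[Person, String] = address / city

  // Mismatched ends are rejected by the compiler:
  // val bad = city / address
}
```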

Strongly Typed Ids

When an object is described incompletely, through a set of attributes, it becomes important to tell whether different attribute sets refer to the same instance. The identity of an instance must be expressed in some way. OOP can rely on the fact that the attribute values belong to the same object. Databases usually use some identification scheme: equal object identifiers indicate that the objects in question are the same.

We can likewise use identifiers to tie attribute sets to a single instance. Since attributes in our case are bound to the type of the object, the identifier must be bound to the same type. This lets the compiler check that the types of the identified object and of the attributes assigned to it are consistent.

In the simplest case, we could use this type of identifier:

 trait Id[T]

However, this method of identification is not universal. First, many objects are identified only within their parent objects; second, many types of objects can be identified in several ways at once. To reflect the first point, we can use the Rel[-L, R] type described above, treating it as a way to get from a parent object to a specific child instance. Since child objects are often grouped into typed collections, the child identifier is composite: first the collection is selected, and then an element of that collection is selected by an integer index:

  val children = Rel[Parent, Seq[Child]]("children")
  case class IntId[T](id: Int) extends Relation[Seq[T], T]
  val child123 = children / IntId(123)

(Here the DSL method `/` combines two relations into one, i.e. composes them.)

This method of identification lets you get unambiguously from the parent object to the desired child element. What if we want to use an alternative method of identification? For example, we may know that some property of a child object has a unique value within the parent object and can therefore be used to select the child. In this case, we can identify the child through an index:

  trait IndexedCollection[TId, T]
  case class Index[TId, T](keyProperty: Relation[T, TId]) extends Relation[Seq[T], IndexedCollection[TId, T]]
  case class IndexValue[TId, T](value: TId) extends Relation[IndexedCollection[TId, T], T]

For example:

  val name = Rel[Child, String]("name")
  val childByName = name.index
  val childVasya = parent / children / childByName / IndexValue("Vasya")

Thus the Rel[-L, R] type, supplemented with selection by position in a collection and by an indexed property of the child object, allows navigation through a hierarchical data structure.

To identify top-level objects that have no parent, we can introduce a special type Global that holds all collections of top-level objects:
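The pieces above can be put together in one self-contained sketch. The text uses `name.index` without showing its definition, so the `index` method below, like the RelationOps wrapper, is an assumed implementation; the paths start at `children` rather than at a concrete parent identifier, and type arguments are written out explicitly to keep inference simple.

```scala
object NavDemo {
  sealed trait Relation[-L, R]
  case class Rel[-L, R](name: String) extends Relation[L, R]
  case class Rel2[-L, M, R](_1: Relation[L, M], _2: Relation[M, R]) extends Relation[L, R]

  // Selection of a collection element by integer index.
  case class IntId[T](id: Int) extends Relation[Seq[T], T]

  // Selection of a collection element by the value of a key property.
  trait IndexedCollection[TId, T]
  case class Index[TId, T](keyProperty: Relation[T, TId])
    extends Relation[Seq[T], IndexedCollection[TId, T]]
  case class IndexValue[TId, T](value: TId) extends Relation[IndexedCollection[TId, T], T]

  // Assumed DSL wrapper: `/` composes relations, `index` builds an Index.
  implicit class RelationOps[L, M](val left: Relation[L, M]) {
    def /[R](right: Relation[M, R]): Relation[L, R] = Rel2(left, right)
    def index: Index[M, L] = Index[M, L](left)
  }

  class Parent
  class Child
  val children = Rel[Parent, Seq[Child]]("children")
  val name     = Rel[Child, String]("name")

  // Child number 123 within the parent's collection.
  val child123: Relation[Parent, Child] = children / IntId[Child](123)

  // The child whose name is "Vasya".
  val childByName: Index[String, Child] = name.index
  val childVasya: Relation[Parent, Child] =
    children / childByName / IndexValue[String, Child]("Vasya")
}
```

Both paths have the type Relation[Parent, Child], so they can be used interchangeably wherever a child has to be identified relative to its parent.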

  final class Global
  val persons = Rel[Global, Seq[Person]]("persons")
  val otherTopLevelObjects = Rel[Global, Seq[OtherTopLevelObject]]("otherTopLevelObjects")

Data schema

Relations are building blocks from which both the data structures and the schemas of the data can be built. To describe a data schema, you can use the relational, entity-relationship approach. In this case, the schema is a collection of entity descriptions and a collection of descriptions of relationships between entities. For each entity a set of attributes is specified, and for each relationship its cardinality: 1-0, 1-1, 1-*, *-*.

You can also use an object-oriented approach, which describes entities together with their properties and collections of child objects, for which properties and collections are described in turn.

The relational schema is, of course, well suited for representing data in a database, while the object-oriented one can be used to build object-oriented services (web services, perhaps).

To describe a type T in the object-oriented version of the schema, one of the descendants of Schema[T] is used:
SimpleSchema - for simple types that have no attributes;
RecordSchema - for composite types that contain the specified attributes;
CollectionSchema - for types Seq[T]; it binds the schema of the collection's elements.
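The text names these three schema types but does not show their definitions. A minimal sketch of how they could look is given below; the constructor parameters, the Field and PropertySchema helpers and the fieldNames function are assumptions made for this example, not the library's signatures.

```scala
object SchemaDemo {
  sealed trait Relation[-L, R]
  case class Rel[-L, R](name: String) extends Relation[L, R]

  sealed trait Schema[T]

  // A simple type without attributes (typeName is an assumed field).
  case class SimpleSchema[T](typeName: String) extends Schema[T]

  // Assumed helper: one attribute of T together with the schema of its value.
  sealed trait Field[T] { def name: String }
  case class PropertySchema[T, R](rel: Rel[T, R], schema: Schema[R]) extends Field[T] {
    def name: String = rel.name
  }

  // A composite type with the listed attributes.
  case class RecordSchema[T](fields: Seq[Field[T]]) extends Schema[T]

  // A collection whose elements follow the given schema.
  case class CollectionSchema[T](element: Schema[T]) extends Schema[Seq[T]]

  class Box
  val width  = Rel[Box, Int]("width")
  val height = Rel[Box, Int]("height")

  val intSchema = SimpleSchema[Int]("Int")
  val boxSchema: RecordSchema[Box] = RecordSchema[Box](Seq(
    PropertySchema[Box, Int](width, intSchema),
    PropertySchema[Box, Int](height, intSchema)
  ))
  val boxesSchema: Schema[Seq[Box]] = CollectionSchema[Box](boxSchema)

  // Names of the attributes described by a record schema.
  def fieldNames[T](schema: RecordSchema[T]): Seq[String] = schema.fields.map(_.name)
}
```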

Data storage

Meta-information by itself contains no data. Data must be stored in other structures, and which ones depends on the needs of the application:

ordinary classes with ordinary properties, accessed through reflection by property name;
special data-holding classes that also carry meta-information, the descendants of Instance[T] (SimpleInstance, RecordInstance, CollectionInstance). These types simplify work with data described by a schema, since the storage corresponds directly to the schema;
a linear tuple, a "list of lists" (List[Any]). The hierarchical structure of nested Records can be decomposed into a linear structure, a sequence of values of primitive types. Nested collections are converted into lists of lists of simple values. Such a representation can be used for transmission over the network and for interaction with a database (since the tuple corresponds directly to a table row). To convert Instances to flat lists and back, a pair of operations align / unalign (flatten) is used;
database tables, from which data is retrieved using a RecordSet;
JSON objects;
XML.

Data construction

When creating data instances, the most important constraint that we want checked at compile time is that a property can be set only on the types for which it is declared (this is why a property carries a generic type for the left side of the relation). It follows that an instance that conforms to the schema has to be created with special tools. For example:

  val b1 = empty[Box]
    .set(width, simple(10))
    .set(height, simple(20))

This uses the immutable type Instance[Box], to which (property, value) pairs are added. When there is little data, this approach is sufficient. If a lot of data has to be collected, it is more efficient to use a mutable builder, inside which the required set of attributes is accumulated step by step. When the build is finished, the builder is converted to an Instance[Box]:

 val boxBuilder = new Builder(boxSchema)
 boxBuilder.set(width, simple(10))
 boxBuilder.set(height, simple(20))
 val b1 = boxBuilder.toInstance

The builder also provides two runtime checks:

properties that are not part of the schema are rejected;
the completeness of the resulting object is verified.

To present data as rows of database tables, nested Records must be converted into a flat structure. The pair of methods align / unalign serves this purpose.
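The compile-time and runtime checks described in this section can be sketched as follows. This is a simplified model, not the library's code: values are passed directly instead of being wrapped with simple(...), the schema is reduced to a set of allowed property names, and the internals of Instance and Builder are assumptions.

```scala
object BuildDemo {
  sealed trait Relation[-L, R]
  case class Rel[-L, R](name: String) extends Relation[L, R]

  // Simplified schema (assumption): only the names of the allowed properties.
  case class RecordSchema[T](propertyNames: Set[String])

  // Immutable instance: values are stored untyped, access is typed through Rel.
  final class Instance[T](val values: Map[String, Any]) {
    // Compiles only if prop is declared for T (or for an ancestor of T).
    def set[R](prop: Rel[T, R], value: R): Instance[T] =
      new Instance[T](values + (prop.name -> value))
    def get[R](prop: Rel[T, R]): Option[R] =
      values.get(prop.name).map(_.asInstanceOf[R])
  }
  def empty[T]: Instance[T] = new Instance[T](Map.empty)

  // Mutable builder with the two runtime checks.
  final class Builder[T](schema: RecordSchema[T]) {
    private val values = scala.collection.mutable.Map.empty[String, Any]

    def set[R](prop: Rel[T, R], value: R): Builder[T] = {
      // Check 1: the property must be part of the schema.
      require(schema.propertyNames.contains(prop.name),
        "property is not in the schema: " + prop.name)
      values(prop.name) = value
      this
    }

    def toInstance: Instance[T] = {
      // Check 2: every property of the schema must have been set.
      val missing = schema.propertyNames -- values.keySet
      require(missing.isEmpty, "missing properties: " + missing.mkString(", "))
      new Instance[T](values.toMap)
    }
  }

  class Box
  val width  = Rel[Box, Int]("width")
  val height = Rel[Box, Int]("height")
  val depth  = Rel[Box, Int]("depth")
  val boxSchema = RecordSchema[Box](Set("width", "height"))

  val b1: Instance[Box] = empty[Box].set(width, 10).set(height, 20)
  val b2: Instance[Box] = new Builder(boxSchema).set(width, 10).set(height, 20).toInstance
}
```

In this sketch both runtime checks fail with an IllegalArgumentException thrown by require: setting depth, which is declared for Box but absent from boxSchema, and calling toInstance before height has been set.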
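The idea of flattening can be illustrated with an untyped model. The functions are named flatten and unflatten here on purpose, because the text does not say which of align / unalign goes in which direction; the instance and schema classes are simplified, untyped stand-ins (assumptions), not the library's types.

```scala
object FlattenDemo {
  // Simplified, untyped stand-ins for the instance types.
  sealed trait Instance
  case class SimpleInstance(value: Any) extends Instance
  case class RecordInstance(fields: List[(String, Instance)]) extends Instance
  case class CollectionInstance(items: List[Instance]) extends Instance

  // Simplified, untyped stand-ins for the schema types.
  sealed trait Schema
  case object SimpleSchema extends Schema
  case class RecordSchema(fields: List[(String, Schema)]) extends Schema
  case class CollectionSchema(element: Schema) extends Schema

  // Nested records are spliced into the row; a collection becomes
  // a single cell that holds a list of flattened rows.
  def flatten(instance: Instance): List[Any] = instance match {
    case SimpleInstance(v)         => List(v)
    case RecordInstance(fields)    => fields.flatMap { case (_, v) => flatten(v) }
    case CollectionInstance(items) => List(items.map(flatten))
  }

  // Rebuilds an instance from a flat row, guided by the schema.
  // Returns the instance together with the cells that were not consumed.
  def unflatten(schema: Schema, row: List[Any]): (Instance, List[Any]) = schema match {
    case SimpleSchema =>
      (SimpleInstance(row.head), row.tail)
    case RecordSchema(fields) =>
      val (built, rest) = fields.foldLeft((List.empty[(String, Instance)], row)) {
        case ((acc, remaining), (name, fieldSchema)) =>
          val (inst, left) = unflatten(fieldSchema, remaining)
          (acc :+ (name -> inst), left)
      }
      (RecordInstance(built), rest)
    case CollectionSchema(element) =>
      val rows = row.head.asInstanceOf[List[List[Any]]]
      (CollectionInstance(rows.map(r => unflatten(element, r)._1)), row.tail)
  }

  val personSchema = RecordSchema(List(
    "name"    -> SimpleSchema,
    "address" -> RecordSchema(List("city" -> SimpleSchema, "zip" -> SimpleSchema)),
    "phones"  -> CollectionSchema(SimpleSchema)
  ))

  val person = RecordInstance(List(
    "name"    -> SimpleInstance("Vasya"),
    "address" -> RecordInstance(List(
      "city" -> SimpleInstance("Moscow"),
      "zip"  -> SimpleInstance(101000))),
    "phones"  -> CollectionInstance(List(SimpleInstance("123"), SimpleInstance("456")))
  ))

  val row: List[Any] = flatten(person)
}
```

The flat row contains no property names, so converting it back is possible only with the schema at hand, which is why unflatten takes the schema as its first argument.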

Conclusion

The outlined approach makes it possible to:

describe complex subject areas while explicitly preserving meta-information;
operate on properties in a strongly typed way (with type checking at compile time);
represent arbitrary hierarchical data structures (JSON-like) with type checking at all levels;
represent incomplete data and verify its degree of completeness (for example, you can have smallSchema[T] and fullSchema[T] and check data instances against either).

In contrast to the approach described in the previous article, we drop the requirement that data completeness be checked at compile time. In return, we get a much simpler and more convenient approach. Whether a property may be used on a given type is checked by the compiler without constructing bulky surrogate types based on HList. At the same time, we are not constrained by the object-oriented approach in how data is represented or in how the set of entity attributes is restricted.

Source: https://habr.com/ru/post/229035/
