
Google Protocol Buffers: polymorphism, search

The article consists of two parts. The first part is a free retelling of an article about polymorphism in protobuf. The second part is a shameless plug for a homegrown "bicycle" (a self-written tool) for working with the framework.

Update: As was rightly noted in the comments, I had lumped inheritance and polymorphism together. To remove the most glaring errors (and add new ones), I changed the text a little. So if a comment seems to have nothing to do with the text, it most likely refers to the previous version. I apologize for the inconvenience.


NB: The article does not answer the question “what is Google Protocol Buffers” and is not tied to any particular programming language.

So, the statement of the problem:




NB: Below, a protobuf message is called either a “message” or a “structure”. In some sentences the word “message” simply sounds odd.


Part I: Polymorphism


Suppose our protocol contains three classes of objects: Square, Circle and Polygon. Suppose also that they all have a color field and an id field. That gives us good reason to derive them from a common ancestor, Shape, and get the desired polymorphism (or at least the ability to refer to an object through its base type). If we were writing in a language that supports inheritance, our code would probably look like this:
pseudocode
enum Color { RED, GREEN, BLUE }

struct Point {
    int x;
    int y;
}

struct Shape {
    int id;
    Color color;
}

struct Square extends Shape {
    Point corner;
    int width;
}

struct Circle extends Shape {
    Point center;
    int radius;
}

struct Polygon extends Shape {
    Point[] points;
}



Unfortunately, Google Protocol Buffers does not support inheritance hierarchies. Jon Parise considers three ways around this limitation.

Using optional fields

With this approach, we create a separate structure for each subclass, and the Shape structure contains one optional field per subclass.
This approach has several serious drawbacks:

geom-1.proto
// Discriminator for the concrete shape stored in a Shape message.
enum TYPE {
    SQUARE = 1;
    CIRCLE = 2;
    POLYGON = 3;
}

enum Color {
    RED = 1;
    GREEN = 2;
    BLUE = 3;
}

message Point {
    required fixed32 x = 1;
    required fixed32 y = 2;
}

message Square {
    required Point corner = 1;
    required fixed32 width = 2;
}

message Circle {
    required Point center = 1;
    required fixed32 radius = 2;
}

message Polygon {
    repeated Point points = 1;
}

message Shape {
    required TYPE type = 1;
    required fixed32 id = 2;
    optional Color color = 3;

    // One optional field per subclass; only the one matching `type` should be set.
    optional Square square = 4;
    optional Circle circle = 5;
    optional Polygon polygon = 6;
}
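With this layout, nothing in the schema itself requires that exactly one of square, circle and polygon is set, or that the populated field agrees with type, so that check ends up in application code. Below is a minimal Python sketch of such a check. The dataclasses are hand-written stand-ins for the classes protoc would generate, and the names `ShapeType` and `concrete` are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple, Union


class ShapeType(Enum):
    # Mirrors enum TYPE from the .proto listings.
    SQUARE = 1
    CIRCLE = 2
    POLYGON = 3


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Square:
    corner: Point
    width: int


@dataclass
class Circle:
    center: Point
    radius: int


@dataclass
class Polygon:
    points: Tuple[Point, ...] = ()


@dataclass
class Shape:
    # Stand-in for the generated Shape class (an assumption, not real protoc output).
    type: ShapeType
    id: int
    square: Optional[Square] = None
    circle: Optional[Circle] = None
    polygon: Optional[Polygon] = None


def concrete(shape: Shape) -> Union[Square, Circle, Polygon]:
    """Return the single populated subclass field, or raise if the message is inconsistent."""
    fields = {
        ShapeType.SQUARE: shape.square,
        ShapeType.CIRCLE: shape.circle,
        ShapeType.POLYGON: shape.polygon,
    }
    populated = [kind for kind, value in fields.items() if value is not None]
    # Exactly one field must be set, and it must be the one named by `type`.
    if populated != [shape.type]:
        names = [kind.name for kind in populated]
        raise ValueError(f"type is {shape.type.name}, but populated fields are {names}")
    return fields[shape.type]
```

A Shape with both square and circle set, or with none set, is rejected by `concrete`, while the schema on its own would accept it.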



Nested serialization

Another approach is to create a Shape structure with the fields common to all subclasses, plus one more field that holds the subclass's own fields in already serialized form.

This is not the best option either: the serializer will not unpack the contents of the subclass field automatically, so no integrity check is performed on it.
And it just isn't pretty.
geom-2.proto
enum TYPE {
    SQUARE = 1;
    CIRCLE = 2;
    POLYGON = 3;
}

enum Color {
    RED = 1;
    GREEN = 2;
    BLUE = 3;
}

message Point {
    required fixed32 x = 1;
    required fixed32 y = 2;
}

message Square {
    required Point corner = 1;
    required fixed32 width = 2;
}

message Circle {
    required Point center = 1;
    required fixed32 radius = 2;
}

message Polygon {
    repeated Point points = 1;
}

message Shape {
    required TYPE type = 1;
    required fixed32 id = 2;
    optional Color color = 3;

    // Serialized Square, Circle or Polygon, selected by `type`.
    required bytes subclass = 4;
}
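The two-pass decoding that this layout forces can be sketched in Python as follows. This is only an illustration of the idea: the fixed-width `struct` formats are a toy encoding standing in for real protobuf serialization, and every function name here is hypothetical.

```python
import struct
from typing import Callable, Dict, Tuple

SQUARE, CIRCLE = 1, 2  # same numbers as enum TYPE in geom-2.proto

# Toy encodings (an assumption): three little-endian unsigned 32-bit integers each.
SQUARE_FMT = "<III"  # corner.x, corner.y, width
CIRCLE_FMT = "<III"  # center.x, center.y, radius
HEADER_FMT = "<II"   # type, id; everything after the header is the opaque `subclass` field


def encode_shape(type_: int, id_: int, subclass: bytes) -> bytes:
    """Serialize the common fields and append the already serialized subclass."""
    return struct.pack(HEADER_FMT, type_, id_) + subclass


def decode_shape(data: bytes) -> Tuple[int, int, bytes]:
    """First pass: parse only the common fields; `subclass` stays raw bytes."""
    header_size = struct.calcsize(HEADER_FMT)
    type_, id_ = struct.unpack(HEADER_FMT, data[:header_size])
    return type_, id_, data[header_size:]


SUBCLASS_DECODERS: Dict[int, Callable[[bytes], tuple]] = {
    SQUARE: lambda payload: struct.unpack(SQUARE_FMT, payload),
    CIRCLE: lambda payload: struct.unpack(CIRCLE_FMT, payload),
}


def decode_subclass(type_: int, payload: bytes) -> tuple:
    """Second pass: the application has to choose the decoder itself."""
    return SUBCLASS_DECODERS[type_](payload)


# A well-formed circle survives both passes.
good = encode_shape(CIRCLE, 7, struct.pack(CIRCLE_FMT, 10, 20, 5))
kind, shape_id, payload = decode_shape(good)
circle = decode_subclass(kind, payload)

# A truncated payload passes the first pass unnoticed and fails only in the second.
bad = encode_shape(CIRCLE, 8, b"\x00")
bad_kind, bad_id, bad_payload = decode_shape(bad)
```

The last two lines show the missing integrity check: the outer Shape decodes without complaint, and the damage surfaces only when `decode_subclass(bad_kind, bad_payload)` is called.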



Nesting extensions

The third (recommended) approach is similar to the first, but nested extensions are used instead of optional fields. To guard against a "square-circle" (a Shape that carries more than one subclass at once), a type field is introduced.
geom-final.proto
enum TYPE {
    SQUARE = 1;
    CIRCLE = 2;
    POLYGON = 3;
}

enum Color {
    RED = 1;
    GREEN = 2;
    BLUE = 3;
}

message Point {
    required fixed32 x = 1;
    required fixed32 y = 2;
}

message Shape {
    required TYPE type = 1;
    required fixed32 id = 2;
    optional Color color = 3;

    // Field numbers from 4 upward are reserved for subclass extensions.
    extensions 4 to max;
}

message Square {
    extend Shape {
        // Extensions must be optional: a Shape carries at most one of them.
        optional Square shape = 5;
    }
    required Point corner = 1;
    required fixed32 width = 2;
}

message Circle {
    extend Shape {
        optional Circle shape = 6;
    }
    required Point center = 1;
    required fixed32 radius = 2;
}

message Polygon {
    extend Shape {
        optional Polygon shape = 7;
    }
    repeated Point points = 1;
}
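One property of this layout is that a new subclass only has to claim an unused extension number; the Shape definition itself does not change. The Python sketch below models that idea with a plain dictionary keyed by field number. It is not the generated protobuf API: `Extension`, `set_extension`, `has_extension` and `get_extension` are hypothetical names, and `Triangle` is an invented subclass used only to show the point.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

FIRST_EXTENSION_NUMBER = 4  # from `extensions 4 to max;` in Shape


@dataclass(frozen=True)
class Extension:
    """Hypothetical stand-in for an extension descriptor: a name and a field number."""
    name: str
    number: int


@dataclass
class Shape:
    # Stand-in for the generated Shape class (an assumption).
    type: str
    id: int
    extensions: Dict[int, Any] = field(default_factory=dict)

    def set_extension(self, ext: Extension, value: Any) -> None:
        if ext.number < FIRST_EXTENSION_NUMBER:
            raise ValueError(f"{ext.name}: number {ext.number} is outside the extension range")
        self.extensions[ext.number] = value

    def has_extension(self, ext: Extension) -> bool:
        return ext.number in self.extensions

    def get_extension(self, ext: Extension) -> Any:
        return self.extensions[ext.number]


# Numbers taken from the extend blocks in geom-final.proto.
SQUARE_SHAPE = Extension("Square.shape", 5)
CIRCLE_SHAPE = Extension("Circle.shape", 6)
POLYGON_SHAPE = Extension("Polygon.shape", 7)

# A later, hypothetical subclass: Shape is untouched, only a free number is claimed.
TRIANGLE_SHAPE = Extension("Triangle.shape", 8)

shape = Shape(type="CIRCLE", id=1)
shape.set_extension(CIRCLE_SHAPE, {"center": (0, 0), "radius": 5})
```

Code that knows nothing about `Triangle` can still read the common fields of such a Shape and simply finds none of its own extensions set.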


Let's take a closer look at the benefits of this approach.


Part II: Search


So, we have a file that describes the message structure (geom.proto). Our program has run and produced a large file containing the messages themselves. We would like to find particular information in it, but a simple text search is not always enough.

For example:

You will agree that plain grep will not help us here.

Of course, it is not hard to write a small program for each such task that scans the file for shapes with the given properties. But since we have structured data and a description of its format, why not write our own query language?

Let's return to the example tasks:


The homegrown bicycle can do all this and much more.

It can also:
  • Convert files from binary to text format and back.
  • Cut out and print only certain fields of the messages.
  • Use several messages in one query. For example, find all the squares that come immediately after circles.
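The article does not show the tool's query syntax, so the following is not how the bicycle works. It is only a minimal Python sketch of the kinds of operations listed above: filtering by a property, a query spanning two adjacent messages, and projecting selected fields. Decoded messages are modeled as plain dictionaries, and the names `select`, `following` and `project` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Message = Dict[str, object]           # stand-in for one decoded Shape message
Predicate = Callable[[Message], bool]


def select(messages: Iterable[Message], predicate: Predicate) -> Iterator[Message]:
    """Return an iterator over the messages that satisfy `predicate`."""
    return (message for message in messages if predicate(message))


def following(messages: Iterable[Message], first: Predicate, second: Predicate) -> Iterator[Message]:
    """Yield each message matching `second` that comes immediately after one matching `first`."""
    previous = None
    for message in messages:
        if previous is not None and first(previous) and second(message):
            yield message
        previous = message


def project(message: Message, fields: List[str]) -> Message:
    """Keep only the named fields that are present in the message."""
    return {name: message[name] for name in fields if name in message}


def is_circle(message: Message) -> bool:
    return message["type"] == "CIRCLE"


def is_square(message: Message) -> bool:
    return message["type"] == "SQUARE"


# Sample data invented for the illustration.
shapes = [
    {"type": "CIRCLE", "id": 1, "color": "RED"},
    {"type": "SQUARE", "id": 2, "color": "BLUE"},
    {"type": "SQUARE", "id": 3, "color": "RED"},
    {"type": "CIRCLE", "id": 4},
    {"type": "POLYGON", "id": 5},
]

red_ids = [m["id"] for m in select(shapes, lambda m: m.get("color") == "RED")]
squares_after_circles = [m["id"] for m in following(shapes, is_circle, is_square)]
trimmed = project(shapes[0], ["id", "color"])
```

The pairwise query is the part that plain text search handles poorly, because the match depends on two neighbouring records rather than on one line.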



I hope the bicycle finds its users, and that with them comes a need for a detailed post about its syntax and capabilities.
Thanks.

Source: https://habr.com/ru/post/226225/

