Solving data type problems in Ruby or Make data reliable again

In this article, I would like to talk about the problems with data types in Ruby, the ones I have run into myself, how to solve them, and how to make the data we work with reliable.

First, we need to decide what data types are. I find the definition given on HaskellWiki extremely apt:

Types are how you describe the data your program will work with.

But what is wrong with data types in Ruby? To describe the problem in a comprehensive way, I would like to highlight several reasons.

Reason 1. Problems with Ruby itself.

As you know, Ruby uses strong dynamic typing with support for so-called duck typing. What does this mean?

Strong typing requires explicit type conversion: the language does not perform the cast on its own, as happens, for example, in JavaScript. Therefore, the following Ruby code will fail:

1 + '1' - 1 #=> TypeError (String can't be coerced into Integer)
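To make the difference concrete, here is a small sketch in plain Ruby. The method names add_as_integers and implicit_addition_error are made up for this illustration; Integer() is the standard Kernel conversion method.

```ruby
# Hypothetical helper: converts both arguments explicitly before adding.
# Kernel#Integer raises ArgumentError for strings that are not numbers.
def add_as_integers(left, right)
  Integer(left) + Integer(right)
end

# Hypothetical helper: returns the class of the error raised when an
# Integer and a String are mixed without an explicit conversion.
def implicit_addition_error
  1 + '1'
rescue TypeError => e
  e.class
end

add_as_integers(1, '1')   # => 2
implicit_addition_error   # => TypeError
```

The conversion has to be spelled out by the programmer, and that is exactly what strong typing asks for.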

With dynamic typing, type checking happens at runtime, which lets us omit the types of variables and use the same variable to store values of different types:

    x = 123
    x = "123"
    x = [1, 2, 3]

The concept of "duck typing" is usually explained with the following statement: if it looks like a duck, swims like a duck, and quacks like a duck, then it is most likely a duck. That is, duck typing relies on the behavior of objects and gives us additional flexibility when writing our systems. In the example below, what matters to us is not the type of the collection argument but its ability to respond to the messages blank? and map:

    def process(collection)
      return if collection.blank?

      collection.map { |item| do_something_with(item) }
    end
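Here is a self-contained sketch of the same idea in plain Ruby. The Playlist class is hypothetical, empty? stands in for ActiveSupport's blank?, and upcasing stands in for do_something_with, so this is an illustration under those assumptions rather than the original code.

```ruby
# Hypothetical collection class: it is not an Array, but it responds to
# the two messages that `process` relies on.
class Playlist
  def initialize(*tracks)
    @tracks = tracks
  end

  def empty?
    @tracks.empty?
  end

  def map(&block)
    @tracks.map(&block)
  end
end

# Plain-Ruby variant of the method above: `empty?` replaces `blank?`,
# and upcasing replaces do_something_with.
def process(collection)
  return if collection.empty?

  collection.map { |item| item.to_s.upcase }
end

process(%w[a b])                 # => ["A", "B"]
process(Playlist.new('x', 'y'))  # => ["X", "Y"]
process([])                      # => nil
```

Neither call site cares about the class of the argument, only about the messages it understands.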

The ability to create such "ducks" is a very powerful tool. However, like any other powerful tool, it requires great care. Rollbar's research confirms this: they analyzed more than 1000 Rails applications and identified the most common errors. Two of the ten most frequent errors come down to an object that cannot respond to a specific message. So checking the behavior of an object, which is what duck typing gives us, may not be enough in many cases.
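In Ruby this kind of failure surfaces as a NoMethodError. A minimal sketch, with hypothetical method names item_names and safe_item_names:

```ruby
# Trusts the caller completely: anything that does not respond to `map`
# (nil, for example) raises NoMethodError.
def item_names(collection)
  collection.map(&:to_s)
end

# Guards the call with a behavior check. This avoids the exception,
# but every caller now has to remember to add such a guard.
def safe_item_names(collection)
  return [] unless collection.respond_to?(:map)

  collection.map(&:to_s)
end

safe_item_names([1, :a])  # => ["1", "a"]
safe_item_names(nil)      # => []
# item_names(nil) raises NoMethodError
```

The guard works, but it checks behavior only at the moment of the call and says nothing about what data the method was supposed to receive.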

We can observe how type checking is added to dynamic languages in one form or another:

TypeScript introduces type checking for JavaScript developers.
Type hints have been added in Python 3
Dialyzer copes well with the type checking task for Erlang / Elixir
Steep and Sorbet add type checking in Ruby 2.x

However, before talking about another tool for working with types in Ruby more effectively, let's look at two more problems for which I would like to find a solution.

Reason 2. A common problem for developers in various programming languages.

Let's remember the definition of data types, which I gave at the very beginning of the article:

Types are how you describe the data your program will work with.

That is, types are meant to help us describe the data of the domain in which our systems operate. However, instead of working with types that we created for our domain, we often use primitive types such as numbers, strings, and arrays, which say nothing about that domain. This problem is usually called Primitive Obsession.

Here is a typical example of Primitive Obsession:

    price = 9.99

    # vs

    Money = Struct.new(:amount_cents, :currency)
    price = Money.new(9_99, 'USD')

Instead of describing a data type for working with money, plain numbers are often used. Such a number, like any other primitive, says nothing about our domain. In my opinion, this is the biggest problem with using primitives instead of building our own type system, in which the types describe the data of our domain: we give up the advantages that types could give us.
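As a sketch of what a domain type can carry that a bare number cannot, the Money struct from the example can be extended with behavior. The + and to_s methods below are my own additions for illustration and are not part of the original listing.

```ruby
# A bare Float knows nothing about currency and suffers from rounding:
# 0.1 + 0.2 == 0.3 is false in Ruby.

Money = Struct.new(:amount_cents, :currency) do
  # Adding money in different currencies is a domain error, so refuse it.
  def +(other)
    unless currency == other.currency
      raise ArgumentError, "currency mismatch: #{currency} vs #{other.currency}"
    end

    self.class.new(amount_cents + other.amount_cents, currency)
  end

  # Human-readable form, e.g. "9.99 USD".
  def to_s
    format('%d.%02d %s', amount_cents / 100, amount_cents % 100, currency)
  end
end

total = Money.new(9_99, 'USD') + Money.new(1, 'USD')
total.to_s  # => "10.00 USD"
```

The rule "do not add dollars to euros" now lives in one place instead of being an unwritten convention around a number.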

I will talk about these advantages right after covering one more problem, one that we picked up from our beloved Ruby on Rails framework, thanks to which, I am sure, most of those present here came to Ruby.

Reason 3. The problem that the Ruby on Rails framework has taught us

Ruby on Rails, or rather the ActiveRecord ORM built into it, taught us that objects in an invalid state are normal. In my opinion, this is not the best idea, and I will try to explain why.

Take this example:

    class App < ApplicationRecord
      validates :platform, presence: true
    end

    app = App.new
    app.valid? # => false

It is easy to see that the app object is in an invalid state: the validation of the App model requires the platform attribute to be present, and for our object this attribute is empty.

And now we will try to pass this object in an invalid state to a service that expects an App object as an argument and performs some actions depending on the platform attribute of this object:

    class DoSomethingWithAppPlatform
      # @param [App] app
      #
      # @return [void]
      def call(app)
        # do something with app.platform
      end
    end

    DoSomethingWithAppPlatform.new.call(app)

In this case, even a type check would pass. However, since this attribute of an object is empty, it is not clear how the service will handle this case. In any case, having the ability to create objects in an invalid state, we doom ourselves to the need to constantly handle cases where invalid states have leaked into our system.

But let's think about a deeper problem. In general, why do we check the validity of the data? As a rule, to ensure that an unacceptable state does not leak into our systems. If it is so important to ensure that the invalid state is not allowed, then why do we allow the creation of objects with an invalid state? Especially when we deal with such important objects as the ActiveRecord model, which often belongs to the root business logic. In my opinion, this sounds like a very bad idea.
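For contrast, here is a minimal plain-Ruby sketch of an object that cannot be created in an invalid state. The ValidatedApp class and its list of platforms are assumptions made for this illustration; it is not an ActiveRecord model.

```ruby
# Hypothetical value object: the constructor rejects invalid input,
# so every instance that exists is valid by construction.
class ValidatedApp
  KNOWN_PLATFORMS = %w[android fire_os ios].freeze

  attr_reader :platform

  def initialize(platform:)
    unless KNOWN_PLATFORMS.include?(platform)
      raise ArgumentError, "invalid platform: #{platform.inspect}"
    end

    @platform = platform
    freeze # no changes after construction
  end
end

ValidatedApp.new(platform: 'ios').platform  # => "ios"
# ValidatedApp.new(platform: nil) raises ArgumentError
```

A service that receives a ValidatedApp no longer needs to ask what to do when platform is empty, because such an object cannot exist.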

So, summarizing all of the above, we get the following problems in working with data in Ruby / Rails:

in the language itself there is a mechanism for checking behavior, but not data
we, like developers in other languages, tend to use primitive data types instead of creating a type system of our domain
Rails taught us that having objects in an invalid state is normal, although this solution seems like a pretty bad idea.

How can you solve these problems?

I would like to consider one solution to the problems described above, using the implementation of a real feature at Appodeal as an example. While implementing the collection of Daily Active Users statistics (hereinafter DAU) for applications that use Appodeal for monetization, we arrived at the following data structure, which we need to collect:

    DailyActiveUsersData = Struct.new(
      :app_id,
      :country_id,
      :user_id,
      :ad_type,
      :platform_id,
      :ad_id,
      :first_request_date,
      keyword_init: true
    )

This structure has all the same problems that I wrote about above:

any type checking is completely missing, which makes it unclear what values attributes of this structure can take
there is no description of the data used in this structure, and instead of the types specific for our domain, primitives are used
the structure may exist in an invalid state
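These three points are easy to reproduce. The sketch below repeats the Struct definition so that it runs on its own; the build_suspicious_record helper and the values passed to it are made up for the demonstration.

```ruby
DailyActiveUsersData = Struct.new(
  :app_id, :country_id, :user_id, :ad_type,
  :platform_id, :ad_id, :first_request_date,
  keyword_init: true
)

# Hypothetical helper that builds an obviously wrong record.
def build_suspicious_record
  DailyActiveUsersData.new(app_id: 'not-an-id', platform_id: -5)
end

record = build_suspicious_record
record.app_id                    # => "not-an-id" (no type check)
record.country_id                # => nil (missing attributes are silently nil)
record.platform_id = 'anything'  # still mutable after creation
```

Nothing in this listing raises an error, which is precisely the problem.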

To solve these problems, we decided to use the dry-types and dry-struct libraries. dry-types is a simple and extensible type system for Ruby, useful for type casting, applying various constraints, defining complex structures, and more. dry-struct is a library built on top of dry-types that provides a convenient DSL for defining typed structures and classes.

To describe the data of our subject area used in the structure for collecting DAU, the following type system was created:

    module Types
      # Dry::Types.module is the pre-1.0 dry-types API;
      # dry-types 1.0 and later use `include Dry.Types()` instead.
      include Dry::Types.module

      AdTypeId   = Types::Strict::Integer.enum(AD_TYPES.invert)
      EntityId   = Types::Strict::Integer.constrained(gt: 0)
      PlatformId = Types::Strict::Integer.enum(PLATFORMS.invert)
      Uuid       = Types::Strict::String.constrained(format: UUID_REGEX)
      Zero       = Types.Constant(0)
    end

Now we have a description of the data used in our system, which we can use in the structure. As you can see, the EntityId and Uuid types have some constraints, and the AdTypeId and PlatformId types can only take values from a specific set. How do we work with these types? Consider PlatformId as an example:

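    # Hash used to build the enumerable type
    PLATFORMS = {
      'android' => 1,
      'fire_os' => 2,
      'ios' => 3
    }.freeze

    # A value can be obtained either by the value itself or by its key
    Types::PlatformId[1] == Types::PlatformId['android']

    # Given a key, the type returns the corresponding value
    Types::PlatformId['fire_os'] # => 2

    # An unknown value raises an error
    Types::PlatformId['windows'] # => Dry::Types::ConstraintError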
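To show what such a lookup amounts to, here is a hand-rolled approximation in plain Ruby. This is not how dry-types is implemented; PlatformIdSketch is a hypothetical stand-in, and it raises ArgumentError where dry-types raises Dry::Types::ConstraintError, only to keep the sketch free of dependencies.

```ruby
PLATFORMS = { 'android' => 1, 'fire_os' => 2, 'ios' => 3 }.freeze

# Hypothetical stand-in for an enum type with a key-to-value mapping.
module PlatformIdSketch
  def self.[](input)
    # The input is already one of the allowed values: return it as is.
    return input if PLATFORMS.value?(input)

    # Otherwise treat the input as a key and map it to its value.
    PLATFORMS.fetch(input) do
      raise ArgumentError, "#{input.inspect} is not a known platform"
    end
  end
end

PlatformIdSketch[1]          # => 1
PlatformIdSketch['fire_os']  # => 2
# PlatformIdSketch['windows'] raises ArgumentError
```

Whatever form the input arrives in, the rest of the system only ever sees one of the allowed integer values.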

So, the use of types is now clear. Let's apply them to our structure. As a result, we got this:

    class DailyActiveUsersData < Dry::Struct
      attribute :app_id,             Types::EntityId
      attribute :country_id,         Types::EntityId
      attribute :user_id,            Types::EntityId
      attribute :ad_type,            (Types::AdTypeId | Types::Zero)
      attribute :platform_id,        Types::PlatformId
      attribute :ad_id,              Types::Uuid
      attribute :first_request_date, Types::Strict::Date
    end

What do we see now in the data structure for DAU? By using dry-types and dry-struct we got rid of the problems caused by the lack of type checking and the lack of data descriptions. Now anyone looking at this structure and at the description of the types used in it can understand what values each attribute can take.

As for the problem of objects in an invalid state, dry-struct relieves us of it: if we try to initialize the structure with invalid values, we get an error. For cases where the correctness of the data is essential (and DAU collection is exactly such a case), getting an exception is, in my opinion, much better than trying to deal with invalid data. In addition, if the testing process is well established (and ours is), the code that produces such errors will most likely never reach the production environment.

And besides the inability to initialize objects in an invalid state, dry-struct also does not allow modifying objects after initialization. Thanks to these two factors, we get a guarantee that the objects of such structures will be in a valid state and you can safely rely on this data anywhere else in your system.

Summary

In this article, I tried to describe the problems you may encounter when working with data in Ruby and to talk about the tools we use to solve them. Since we introduced these tools, I have completely stopped worrying about the correctness of the data we work with. Isn't that great? Isn't that the goal of any tool: to make some aspect of our life easier? In my opinion, dry-types and dry-struct do an excellent job of this!

Source: https://habr.com/ru/post/433180/
