In this article I want to discuss the problems with data types in Ruby, the ones I ran into myself, how they can be solved, and how to make the data we work with something we can rely on.
First, we need to agree on what data types are. I find the definition given on HaskellWiki extremely apt:
Types are how you describe the data your program will work with.
But what is wrong with data types in Ruby? To describe the problem in a comprehensive way, I would like to highlight several reasons.
Reason 1. Problems with Ruby itself.
As you know, Ruby uses strong dynamic typing with support for so-called duck typing. What does this mean?
Strong typing requires explicit type conversion: the language does not perform the cast on its own, as happens, for example, in JavaScript. Therefore, the following line of Ruby fails:
1 + '1' - 1 #=> TypeError (String can't be coerced into Integer)
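To make "explicit conversion" concrete, here is a minimal sketch that uses only core Ruby. The helper name `add_quantity` is made up for illustration.

```ruby
# Hypothetical helper: the caller states the conversion explicitly.
def add_quantity(total, raw_quantity)
  # Integer() raises ArgumentError on input such as "abc",
  # whereas String#to_i would silently return 0.
  total + Integer(raw_quantity)
end

add_quantity(1, '1') # => 2
# add_quantity(1, 'abc') raises ArgumentError
```

Choosing `Integer()` over `to_i` is a design choice: it keeps the conversion strict, in the same spirit as the language refusing to coerce on its own.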
With dynamic typing, type checking happens at runtime, which lets us omit type declarations for variables and use the same variable to store values of different types:
    x = 123
    x = "123"
    x = [1, 2, 3]
Duck typing is usually explained with the following saying: if it looks like a duck, swims like a duck, and quacks like a duck, then it is most likely a duck. That is, duck typing relies on the behavior of objects, which gives us extra flexibility when designing our systems. In the example below, what matters is not the type of the collection argument but its ability to respond to the blank? and map messages:
    def process(collection)
      return if collection.blank?

      collection.map { |item| do_something_with(item) }
    end
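The same idea can be tried without Rails. The sketch below is an assumption-laden variant of the method above: it swaps ActiveSupport's `blank?` for core Ruby's `empty?`, and both `do_something_with` and the `Playlist` class are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for the article's do_something_with.
def do_something_with(item)
  item.to_s.upcase
end

# Same shape as the method above, using empty? instead of blank?.
def process(collection)
  return if collection.empty?

  collection.map { |item| do_something_with(item) }
end

# Not an Array and not Enumerable, yet it "quacks" the right way:
# it responds to empty? and map.
class Playlist
  def initialize(*tracks)
    @tracks = tracks
  end

  def empty?
    @tracks.empty?
  end

  def map(&block)
    @tracks.map(&block)
  end
end

process(%w[a b])                         # => ["A", "B"]
process(Playlist.new('intro', 'outro'))  # => ["INTRO", "OUTRO"]
process([])                              # => nil
```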
The ability to create such "ducks" is a very powerful tool. However, like any powerful tool, it must be used with great care.
Rollbar's research confirms this: they analyzed more than 1,000 Rails applications and identified the most common errors. Two of the ten most frequent errors occur because an object cannot respond to a particular message. So checking an object's behavior, which is all that duck typing gives us, may not be enough in many cases.
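A typical way to end up with an object that cannot respond to a message is an unexpected nil. The sketch below is illustrative only: `find_user_name` and the data are invented, and it is not taken from the Rollbar study.

```ruby
# Hypothetical lookup that returns nil when nothing is found.
def find_user_name(users, id)
  user = users.find { |u| u[:id] == id }
  user && user[:name]
end

users = [{ id: 1, name: 'alice' }]

find_user_name(users, 1).upcase # => "ALICE"

begin
  # No user with id 2, so the lookup returns nil,
  # and nil does not respond to upcase.
  find_user_name(users, 2).upcase
rescue NoMethodError => e
  warn "caught: #{e.class}"
end
```

The call site looks identical in both cases. Nothing in the code says that the second result may be nil until the program actually runs.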
We can see type checking being added to dynamic languages in one form or another:
- TypeScript introduces type checking for JavaScript developers
- Type hints have been added in Python 3
- Dialyzer copes well with the type checking task for Erlang / Elixir
- Steep and Sorbet add type checking in Ruby 2.x
However, before talking about yet another tool for working with types in Ruby more effectively, let's look at two more problems that I would like to solve.
Reason 2. A problem common to developers in many programming languages.
Let's recall the definition of data types that I gave at the very beginning of the article:
Types are how you describe the data your program will work with.
That is, types are meant to help us describe the data from the domain in which our systems operate. However, instead of working with data types that we have created for our domain, we often use primitive types such as numbers, strings, and arrays, which say nothing about that domain. This problem is usually called Primitive Obsession.
Here is a typical example of Primitive Obsession:
    price = 9.99

    # vs

    Money = Struct.new(:amount_cents, :currency)
    price = Money.new(9_99, 'USD')
Instead of describing a data type for working with money, plain numbers are often used. And such a number, like any other primitive, says nothing about our domain. In my opinion, this is the biggest problem with using primitives instead of building our own type system, in which the types describe data from our domain: we voluntarily give up the advantages that types can bring.
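One way to see what a domain type can add on top of a bare number is to give it behavior. The following is a sketch, not a complete money implementation: the `+` and `to_s` methods are additions made for illustration, and it ignores negative amounts and rounding.

```ruby
# A domain type for money, extending the Struct idea with behavior.
Money = Struct.new(:amount_cents, :currency) do
  # Adding money only makes sense within one currency.
  def +(other)
    unless other.is_a?(Money) && other.currency == currency
      raise ArgumentError, "cannot add #{other.inspect} to a #{currency} amount"
    end

    Money.new(amount_cents + other.amount_cents, currency)
  end

  def to_s
    format('%d.%02d %s', amount_cents / 100, amount_cents % 100, currency)
  end
end

price    = Money.new(9_99, 'USD')
shipping = Money.new(5_00, 'USD')

(price + shipping).to_s # => "14.99 USD"
# price + Money.new(5_00, 'EUR') raises ArgumentError
# price + 5                      raises ArgumentError
```

With a plain float, `9.99 + 5` silently succeeds whether the 5 is dollars, euros, or a quantity. The type makes that mistake visible.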
I will talk about these advantages right after covering one more problem: a habit we picked up from our beloved Ruby on Rails, the framework through which, I am sure, most of those present here came to Ruby.
Reason 3. A problem that the Ruby on Rails framework has taught us.
Ruby on Rails, or rather the ActiveRecord ORM built into it, has taught us that objects in an invalid state are normal. In my opinion, this is far from the best idea, and I will try to explain why.
Take this example:
    class App < ApplicationRecord
      validates :platform, presence: true
    end

    app = App.new
    app.valid? # => false
It is easy to see that the app object is in an invalid state: the validation of the App model requires the platform attribute to be present, and for our object this attribute is empty.
Now let's pass this object in an invalid state to a service that expects an App object as an argument and performs some actions depending on the platform attribute of this object:
    class DoSomethingWithAppPlatform
      # @param [App] app
      #
      # @return [void]
      def call(app)
        # do something with app.platform
      end
    end

    DoSomethingWithAppPlatform.new.call(app)
In this case, even a type check would pass. However, since the attribute is empty, it is unclear how the service will handle this case. In any event, by being able to create objects in an invalid state, we doom ourselves to constantly handling cases where an invalid state has leaked into our system.
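The leak can be reproduced without Rails. In this sketch, `App` is a plain-Ruby stand-in for the ActiveRecord model with a hand-written `valid?`, and calling `upcase` on the platform is an assumed example of "doing something" with it.

```ruby
# Plain-Ruby stand-in for the ActiveRecord model (hypothetical).
class App
  attr_accessor :platform

  def valid?
    !platform.nil? && !platform.empty?
  end
end

class DoSomethingWithAppPlatform
  def call(app)
    # The type check passes for valid and invalid apps alike.
    raise TypeError, 'expected an App' unless app.is_a?(App)

    app.platform.upcase
  end
end

app = App.new
app.valid? # => false

begin
  DoSomethingWithAppPlatform.new.call(app)
rescue NoMethodError => e
  # platform is nil, so the failure surfaces deep inside the service,
  # far from the place where the invalid object was created.
  warn "caught: #{e.class}"
end
```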
But let's think about a deeper problem. Why do we check the validity of data at all? As a rule, to make sure that an unacceptable state does not leak into our systems. If it is so important to keep invalid states out, then why do we allow objects with an invalid state to be created in the first place? Especially when we are dealing with objects as important as ActiveRecord models, which often belong to the core business logic. In my opinion, this sounds like a very bad idea.
So, summing up all of the above, we get the following problems with handling data in Ruby / Rails:
- the language itself has a mechanism for checking behavior, but not data
- like developers in other languages, we tend to use primitive data types instead of building a type system for our domain
- Rails has taught us that objects in an invalid state are normal, although this looks like a pretty bad idea
How can you solve these problems?
I would like to consider one solution to the problems described above, using the implementation of a real feature at Appodeal as an example. While implementing the collection of Daily Active Users (DAU) statistics for applications that use Appodeal for monetization, we arrived at the following data structure that we need to collect:
    DailyActiveUsersData = Struct.new(
      :app_id,
      :country_id,
      :user_id,
      :ad_type,
      :platform_id,
      :ad_id,
      :first_request_date,
      keyword_init: true
    )
This structure has all the problems that I wrote about above:
- type checking is completely absent, so it is unclear what values the attributes of this structure can take
- there is no description of the data used in this structure, and primitives are used instead of types specific to our domain
- the structure can exist in an invalid state
To solve these problems, we decided to use the dry-types and dry-struct libraries.
dry-types is a simple and extensible type system for Ruby, useful for type casting, applying various constraints, defining complex structures, and more.
dry-struct is a library built on top of dry-types that provides a convenient DSL for defining typed structures and classes.
To describe the domain data used in the structure for collecting DAU, we created the following type system:
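    # AD_TYPES, PLATFORMS and UUID_REGEX are constants defined elsewhere in the application.
    module Types
      # Older dry-types API; newer releases use `include Dry.Types()` instead.
      include Dry::Types.module

      AdTypeId = Types::Strict::Integer.enum(AD_TYPES.invert)
      EntityId = Types::Strict::Integer.constrained(gt: 0)
      PlatformId = Types::Strict::Integer.enum(PLATFORMS.invert)
      Uuid = Types::Strict::String.constrained(format: UUID_REGEX)
      Zero = Types.Constant(0)
    end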
We now have a description of the data used in our system, which we can use in the structure. As you can see, the EntityId and Uuid types carry constraints, while the AdTypeId and PlatformId types can only take values from a specific set. How do we work with these types? Consider PlatformId as an example:
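    # the hash behind the enumerable type
    PLATFORMS = {
      'android' => 1,
      'fire_os' => 2,
      'ios' => 3
    }.freeze

    # a value can be looked up both by its id and by its name
    Types::PlatformId[1] == Types::PlatformId['android']

    # if the platform is in the list, its id is returned
    Types::PlatformId['fire_os'] # => 2

    # otherwise an exception is raised
    Types::PlatformId['windows'] # => Dry::Types::ConstraintError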
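For readers without dry-types at hand, the lookup behavior shown above can be imitated in plain Ruby. The `EnumWithMapping` class below is a hypothetical, heavily simplified stand-in. It is not how dry-types is implemented and only mimics the three cases from the example.

```ruby
# Hypothetical stand-in for an enum type with a name-to-id mapping.
class EnumWithMapping
  class ConstraintError < StandardError; end

  def initialize(mapping)
    @by_name = mapping.dup.freeze    # e.g. 'android' => 1
    @ids     = mapping.values.freeze # e.g. [1, 2, 3]
  end

  def [](input)
    return input if @ids.include?(input)           # already a valid id
    return @by_name[input] if @by_name.key?(input) # known name

    raise ConstraintError, "#{input.inspect} is not a known value"
  end
end

PLATFORMS = { 'android' => 1, 'fire_os' => 2, 'ios' => 3 }.freeze
PlatformId = EnumWithMapping.new(PLATFORMS)

PlatformId[1]         # => 1
PlatformId['android'] # => 1
PlatformId['fire_os'] # => 2
# PlatformId['windows'] raises EnumWithMapping::ConstraintError
```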
So, we have figured out how to use the types. Now let's apply them to our structure. As a result, we got this:
    class DailyActiveUsersData < Dry::Struct
      attribute :app_id, Types::EntityId
      attribute :country_id, Types::EntityId
      attribute :user_id, Types::EntityId
      attribute :ad_type, (Types::AdTypeId | Types::Zero)
      attribute :platform_id, Types::PlatformId
      attribute :ad_id, Types::Uuid
      attribute :first_request_date, Types::Strict::Date
    end
What do we now see in the data structure for DAU? By using dry-types and dry-struct, we got rid of the problems caused by the lack of type checking and the lack of data descriptions. Now anyone looking at this structure and at the description of the types used in it can understand what values each attribute can take.
As for the problem of objects in an invalid state, dry-struct relieves us of it: if we try to initialize the structure with invalid values, we get an error. And in cases where the correctness of the data is essential (and DAU collection is exactly such a case), getting an exception is, in my opinion, much better than trying to deal with invalid data later. In addition, if the testing process is well established (and ours is), then code that produces such errors will most likely never reach the production environment.
Besides making it impossible to initialize objects in an invalid state, dry-struct also does not allow objects to be modified after initialization. Thanks to these two properties, we get a guarantee that objects of such structures are in a valid state, and this data can be safely relied on anywhere else in the system.
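The two guarantees, validation at construction time and no modification afterwards, can be approximated in plain Ruby. The `DauRecord` class below is a hypothetical sketch with only two attributes. It illustrates the idea and does not reproduce what dry-struct does internally.

```ruby
require 'date'

# Hypothetical plain-Ruby approximation of an always-valid, immutable record.
class DauRecord
  attr_reader :app_id, :first_request_date

  def initialize(app_id:, first_request_date:)
    unless app_id.is_a?(Integer) && app_id.positive?
      raise ArgumentError, "app_id must be a positive Integer, got #{app_id.inspect}"
    end
    unless first_request_date.is_a?(Date)
      raise ArgumentError, "first_request_date must be a Date, got #{first_request_date.inspect}"
    end

    @app_id = app_id
    @first_request_date = first_request_date
    freeze # no setters are defined, and freezing blocks other mutation too
  end
end

record = DauRecord.new(app_id: 1, first_request_date: Date.new(2019, 1, 1))
record.app_id  # => 1
record.frozen? # => true

# DauRecord.new(app_id: 0, first_request_date: Date.new(2019, 1, 1)) raises ArgumentError
# record.instance_variable_set(:@app_id, 2)                          raises FrozenError
```

Writing such checks by hand for every attribute quickly becomes repetitive, which is the boilerplate that a typed-struct DSL removes.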
Conclusion
In this article, I tried to describe the problems you may run into when working with data in Ruby, and to talk about the tools we use to solve them. Since we introduced these tools, I have completely stopped worrying about the correctness of the data we work with. Isn't that great? Isn't that the goal of any tool: to make some aspect of our life easier? In my opinion, dry-types and dry-struct do an excellent job of this!