Many to many link and upsert in Ecto 2.1

In the previous chapter, we talked about many_to_many associations and how to map external data into associated entities using Ecto.Changeset.cast_assoc/3 . Then we were forced to follow the rules imposed by the cast_assoc/3 function, but this is not always possible or desirable.

In this chapter, we will look at Ecto.Changeset.put_assoc/4 versus cast_assoc/3 and look at a few examples. We will also take a look at the upsert function, which will appear in Ecto 2.1.

put_assoc vs cast_assoc

Imagine that we are creating a blog application that has posts and each post can have a lot of tags. And also, each tag can refer to several posts. This is a classic scenario where you can use many_to_many communication. Our migration will look like this:

 create table(:posts) do add :title add :body timestamps() end create table(:tags) do add :name timestamps() end create unique_index(:tags, [:name]) create table(:posts_tags, primary_key: false) do add :post_id, references(:posts) add :tag_id, references(:tags) end

Notice that we added a unique index to the tag name, because we do not need tags with the same name in our database. It is important to add an index at the database level, instead of using validations, as there is always a chance that two tags with the same name will be validated and inserted at the same time, which will lead to duplicate entries.

Now, imagine also that the user enters tags as a list of words, separated by commas, like for example: “elixir, erlang, ecto”. When this data is received on the server, we will divide them into separate tags, link them to the desired post, and create those tags that are not yet in the database.

So far, the conditions above sound reasonable, but they will definitely cause problems when using cast_assoc/3 . We remember that cast_assoc/3 is a changeset function, created to obtain external parameters, and to compare them with the associated data in our model. Ecto requires tags to be sent as a list of maps (list of maps). However, in our case, we expect tags as a string, separated by a comma.

Moreover, cast_assoc/3 relies on the primary key value for each tag, in order to decide whether to insert it, update it, or delete it. Again, since the user simply sends the string, we do not have information about the primary key.

When we cannot deal with cast_assoc/3 , it is time to use put_assoc/4 . In put_assoc/4 , we get the Ecto structure or changeset, instead of parameters, which allows us to manipulate the data the way we want. Let's define a schema and a changeset function for a post that can accept tags as a string:

 defmodule MyApp.Post do use Ecto.Schema schema "posts" do add :title add :body many_to_many :tags, MyApp.Tag, join_through: "posts_tags" timestamps() end def changeset(struct, params \\ %{}) do struct |> Ecto.Changeset.cast(struct, [:title, :body]) |> Ecto.Changeset.put_assoc(:tags, parse_tags(params)) end defp parse_tags(params) do (params["tags"] || "") |> String.split(",") |> Enum.map(&String.trim/1) |> Enum.reject(& &1 == "") |> Enum.map(&get_or_insert_tag/1) end defp get_or_insert_tag(name) do Repo.get_by(MyApp.Tag, name: name) || Repo.insert!(MyApp.Tag, %Tag{name: name}) end end

In the changeset function above, we brought all tag processing into a separate function called parse_tags/1 , which checks for the existence of a parameter, divides it into values using String.split/2 , then deletes the extra spaces with String.trim/1 , deletes everything blank lines and at the end checks whether the tag exists in the database, and if not, creates it.

Function parse_tags/1 Returns a list of MyApp.Tag structures, which we throw into put_assoc/3 . Calling put_assoc/3 , we tell Ecto that from now on these tags will be linked to the post. If tags that were previously associated will not be received in put_assoc/3 , then Ecto will bother to remove the association between the post and the remote tag in the database.

And this is all that is needed to use the many_to_many connection with put_assoc/3 . The put_assoc/3 function works with has_many , belongs_to and all other types of associations. However, our code is not ready for production, let's see why.

Limitations and race conditions

Recall that we added a unique index to the column :name tag. We did this to protect ourselves from duplicate tags in the database.

Adding a unique index and then using get_by with insert! To get or insert a tag, we create a potential error in our application. If two posts go at the same time with the same tags, then there is a chance that a check for the presence of a tag will occur at the same time, and both processes will decide that such a tag does not exist in the database. When this happens, only one process will succeed, while the second will fail with an error. This is a race condition (your lane is race condition): your code will fall with an error from time to time when certain conditions are met. And these conditions will occur depending on the time.

Many developers believe that such errors never happen in practice, or if they do, they are insignificant. But in practice, the occurrence of these errors greatly impairs the user experience. I heard an example of “first hand” from a company that develops mobile games. In their game, players can complete quests, and each quest you choose a second player (guest) from a certain list to complete the quest with you. At the end of the quest, you can add a second player as a friend.

Initially, the guest list was random, but after a while, users began to complain that sometimes old accounts, often inactive, appear in the guest list. To remedy this situation, the developers began to sort the game list by the most active players in recent times. This meant that if you played recently, then there is a high chance someone will appear on the guest list.

However, when they made these changes, users began to see a lot of errors, and they, furious, came to the game forums. This happened because when they sorted the list by activity, as soon as two players log in, their characters will most likely appear on each other guest list. If these players choose each other, then only the first player, who will have time to add the second as a friend, will be able to do this, the second will not, because the relationship with this user will already exist! Not only is this a problem, but the fact is that all the progress made for the quest will be lost, because the server will not be able to properly save the results to the database. It is clear why users started complaining.

Long story short: we must deal with the state of the race.

Fortunately, Ecto provides us with a mechanism for handling database constraint errors.

Check for constraint errors

When our function get_or_insert_tag(name) falls due to the fact that the tag already exists in the database, we need to handle this script. Let's rewrite this function, keeping in mind the state of the race.

 defp get_or_insert_tag(name) do %Tag{} |> Ecto.Changeset.change(name: name) |> Ecto.Changeset.unique_constraint(:name) |> Repo.insert |> case do {:ok, tag} -> tag {:error, _} -> Repo.get_by!(MyApp.Tag, name: name) end end

Instead of writing the tag directly, we first compile a changeset that allows us to use the unique_constraint annotation. Now Repo.insert will not fall due to the error associated with the unique index on :name , but will return {:error, changeset} tuple. Therefore, if the Repo.insert is successful, it means that the tag has been saved, otherwise, if the tag exists, then we just get it with Repo.get_by! .

Although the mechanism described above repairs a race condition, it is quite expensive. We need to process two requests for each tag that already exists in the database: (unsuccessful) insert and then retrieve the tag from the repository. Given that this is a fairly common scenario, we can rewrite it all as follows:

 defp get_or_insert_tag(name) do Repo.get_by(MyApp.Tag, name: name) || maybe_insert_tag(name) end defp maybe_insert_tag(name) do %Tag{} |> Ecto.Changeset.change(name: name) |> Ecto.Changeset.unique_constraint(:name) |> Repo.insert |> case do {:ok, tag} -> tag {:error, _} -> Repo.get_by!(MyApp.Tag, name: name) end end

The code above creates one request for each existing tag, two requests for each new tag, and 3 requests in the race state. All this on average will work a little better, but Ecto 2.1 allows you to do better.

Upserts

Ecto 2.1 supports the so-called “upsert” command, which is an abbreviation for “update or insert”. The idea is that when we try to write to the database and get an error, for example due to a unique index, we can decide whether the database will throw an error (the default), ignore the error (no error), or update the conflicting entity.

The “upsert” feature in Ecto 2.1 works with the :on_conflict parameter. Let's rewrite the get_or_insert_tag(name) again using the :on_conflict parameter. Remember that “upsert” is a new feature of PostgreSQL 9.5, so make sure you have this version of the base.

Let's try to use :on_conflict with the parameter :nothing as below:

 defp get_or_insert_tag(name) do Repo.insert!(%MyApp.Tag{name: name}, on_conflict: :nothing) end

Although the function above will not cause an error, in the event of a conflict, it will also not update the resulting structure, and will return a tag without an ID. One solution is to force the update to occur even in the event of a conflict, even if the update involves changing the tag name. In this case, PostgreSQL also requires :conflict_target parameter, which indicates the column (or list of columns) in which we expect an error:

 defp get_or_insert_tag(name) do Repo.insert!(%MyApp.Tag{name: name}, on_conflict: [set: [name: name]], conflict_target: :name) end

And that's it! We try to write the tag to the database, and if it already exists, then we tell Ecto to update the name to the current value, update the tag and get its id. This decision is definitely one step ahead of the rest, it makes one request for each tag. If 10 tags are received, there will be absolutely 10 requests. How can we even improve it?

Upserts and insert_all

Ecto 2.1 adds :on_conflict parameter not only to the Repo.insert/2 , but also to the Repo.insert_all/3 function introduced in Ecto 2.0. This means that we can write down all the missing tags in one request, and one more get them all. Let's see how the Post scheme will look like after these changes:

 defmodule MyApp.Post do use Ecto.Schema # Schema is the same schema "posts" do add :title add :body many_to_many :tags, MyApp.Tag, join_through: "posts_tags" timestamps() end # Changeset is the same def changeset(struct, params \\ %{}) do struct |> Ecto.Changeset.cast(struct, [:title, :body]) |> Ecto.Changeset.put_assoc(:tags, parse_tags(params)) end # Parse tags has slightly changed defp parse_tags(params) do (params["tags"] || "") |> String.split(",") |> Enum.map(&String.trim/1) |> Enum.reject(& &1 == "") |> insert_and_get_all() end defp insert_and_get_all([]) do [] end defp insert_and_get_all(names) do maps = Enum.map(names, &%{name: &1}) Repo.insert_all MyApp.Tag, maps, on_conflict: :nothing Repo.all from t in MyApp.Tag, where: t.name in ^names) end end

Instead of trying to write and retrieve each tag individually, the code above works with all tags at once, first creating a list of maps (comment lane list of maps) that is passed to insert_all and then receives all tags with the necessary names. Now, despite how many tags we get, we always make only two requests (except for the option when no tags are received, then we will immediately return an empty list). This solution is possible due to the introduction of the parameter in Ecto 2.1 :on_conflict , which guarantees that insert_all will not fail with an error if the tag received already exists.

Finally, do not forget that we did not use transactions in any of the examples above. This decision was deliberate. Getting or writing tags is an idempotent operation, that is, we can repeat it as many times as you like, and always get the same result. Therefore, if we cannot write a post to the database due to a validation error, the user will try to submit the form again, and each time we will perform an operation to receive or write tags. In case this is not required, all operations can be wrapped in a transaction, or modeled using an Ecto.multi abstraction, which we will discuss in the following chapters.

» Original
»Author: Jose Valim

Source: https://habr.com/ru/post/317602/

All Articles