
Rust: using serde for serialization

Serializing data with serde. I recently wrote some Rust code to work with a third-party data source in TOML format. In other languages I would load the data with a TOML library and run my program directly on the result, but I had heard about serde, the Rust serialization library, so I decided to try it.




The basics


Below is a simplified example of the data I work with.


    manifest-version = "2"

    # ... other fields ...

    [renames.oldpkg]
    to = "newpkg"

This is a fairly simple data format, and it is fairly easy to write a Rust structure that could be serialized / deserialized.


    use serde::{Deserialize, Serialize};
    use std::collections::BTreeMap;

    #[derive(Serialize, Deserialize)]
    struct ThirdPartyData {
        #[serde(rename = "manifest-version")]
        manifest_version: String,

        // ... other fields ...

        renames: BTreeMap<String, BTreeMap<String, String>>,
    }

This structure mirrors the structure of the input data, and the only extra code I had to write is the serde(rename = "blah") attribute, because manifest-version is not a valid Rust identifier.


Improved structure


In communities around strongly typed languages, the saying "make incorrect states unrepresentable" is common. It means that if your program makes any assumptions about the nature of its data, you should use the type system to guarantee that those assumptions hold.


Take, for example, the manifest-version field. It is not part of the data I am interested in; it is metadata, information about the data I need. When serializing, this field should always be set to "2". When deserializing, if it is not "2", then the file must be in some other format that I do not support, so reading should stop. The code that uses the data never needs to touch this field, and if something did change its value, that would cause problems later. The best way to make sure that nobody reads or writes the field is to remove it from the structure entirely, which also avoids wasting memory on it.
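The version rule described above can be sketched in isolation, without serde. The names check_manifest_version and SUPPORTED_MANIFEST_VERSION and the String error type are hypothetical, chosen only for illustration; a real deserializer would report the failure through serde's own error type instead.

```rust
/// The only manifest version this sketch accepts (taken from the example data).
const SUPPORTED_MANIFEST_VERSION: &str = "2";

/// Hypothetical helper: accept the supported version, reject anything else.
fn check_manifest_version(found: &str) -> Result<(), String> {
    if found == SUPPORTED_MANIFEST_VERSION {
        Ok(())
    } else {
        // Any other value means a file format we do not understand.
        Err(format!(
            "unsupported manifest-version {:?}, expected {:?}",
            found, SUPPORTED_MANIFEST_VERSION
        ))
    }
}
```

Because the check runs once at the boundary, the rest of the program never needs to store or inspect the version.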


The renames field poses a different problem. This is definitely data I care about, but it arrives as oddly nested dictionaries. What would it mean for a key of the outer dictionary to map to an empty dictionary? The mapping "old name" => "new name" should simply be a BTreeMap<String, String>, in which such incorrect states cannot arise at all. In short, I want my Rust structure to look like this:


    #[derive(Serialize, Deserialize)]
    struct ThirdPartyData {
        // No `manifest_version` field!

        // ... other fields ...

        renames: BTreeMap<String, String>,
    }

Unfortunately, this does not do what I need: the derived code does not verify that manifest-version is set to the correct value.


1st attempt: do it yourself


If serde's derive macro cannot do this, we must do it by hand, right? So I wrote my own implementations of the serde::Serialize and serde::Deserialize traits for my ThirdPartyData structure. In short, it worked! However, the code was tedious to write and difficult to understand.


The serde documentation on serializing a structure is straightforward, and so is the process: write a serialize method for your structure that calls the right methods on the serde::Serializer, and you are done. The documentation on deserialization is much more involved: you not only have to implement Deserialize for your structure, you also need a helper structure that implements the serde::de::Visitor trait.


In the documentation, the long Deserialize example only shows how to write Deserialize for a primitive type like i32. Deserializing a structure gets a separate documentation page of its own, and the implementation is much more complicated.


As I said, I got it working, but I was not satisfied with the result when I committed this code to my project.


2nd attempt: field attributes


Part of the problem was that implementing Serialize and Deserialize by hand forced me to write code to process every field in my structure, although serde could handle most of them automatically.


As it turned out, one of the many field attributes serde provides is serde(with = "module"). This attribute names a module containing serialize and deserialize functions that will be used for that one field, while the rest of the structure is processed by serde as usual.


For the renames field this works great. I still had to put some effort into working with Visitor, but only for one field rather than for every field in the structure.


For the manifest-version field, however, this did not help. Since I did not want a manifest-version field at all, there was nothing to attach the attribute to.


So I sighed, deleted this code, and tried to solve the problem in another way.


Success: using intermediate structures


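Let's step back and look at the problem we are trying to solve: serde's derive gives us Rust structures that mirror the input format, but we want Rust structures that are convenient to use.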



I think you have already guessed what needs to be done: use serde to convert the input format into Rust structures that match that format exactly, then convert the data by hand into Rust structures that are convenient to use.


I use the version of ThirdPartyData that I sketched above, but the deserialization code now looks like this:


    use serde::Deserialize;
    use std::collections::BTreeMap;

    impl<'de> serde::Deserialize<'de> for ThirdPartyData {
        fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where
            D: serde::Deserializer<'de>,
        {
            use serde::de::Error;

            // Intermediate structure that mirrors the input format exactly.
            #[derive(Deserialize)]
            struct EncodedThirdPartyData {
                #[serde(rename = "manifest-version")]
                pub manifest_version: String,

                // ... other fields ...

                pub renames: BTreeMap<String, BTreeMap<String, String>>,
            }

            // Let the derived `Deserialize` implementation do the parsing.
            let input = EncodedThirdPartyData::deserialize(deserializer)?;

            // Reject any `manifest_version` other than "2".
            if input.manifest_version != "2" {
                return Err(D::Error::invalid_value(
                    serde::de::Unexpected::Str(&input.manifest_version),
                    &"2",
                ));
            }

            // Flatten the nested `renames` maps into a single map.
            let mut renames = BTreeMap::new();
            for (old_pkg, mut inner_map) in input.renames {
                let new_pkg = inner_map
                    .remove("to")
                    .ok_or_else(|| D::Error::missing_field("to"))?;
                renames.insert(old_pkg, new_pkg);
            }

            // Build the convenient structure from the converted data.
            Ok(ThirdPartyData { renames })
        }
    }

Our intermediate structure owns the deserialized data, so we can take it apart to build the convenient structure without additional memory allocations... Well, we do need to build a new BTreeMap to change the shape of the renames dictionary, but we don't need to copy the keys and values.
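The ownership point can be seen without serde at all. The following is a minimal sketch of the same flattening step; flatten_renames is a hypothetical helper name, and a String error stands in for serde's error type. Taking the nested map by value lets the keys and values be moved into the flat map rather than cloned.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: turn {"oldpkg": {"to": "newpkg"}} into {"oldpkg": "newpkg"}.
/// The nested map is taken by value, so its Strings are moved, not cloned.
fn flatten_renames(
    nested: BTreeMap<String, BTreeMap<String, String>>,
) -> Result<BTreeMap<String, String>, String> {
    let mut flat = BTreeMap::new();
    for (old_pkg, mut inner) in nested {
        // `remove` hands back the owned String stored under "to".
        let new_pkg = inner
            .remove("to")
            .ok_or_else(|| format!("rename of {:?} has no `to` key", old_pkg))?;
        flat.insert(old_pkg, new_pkg);
    }
    Ok(flat)
}
```

An inner table without a "to" key becomes an error at this boundary, so the flat map handed to the rest of the program cannot contain that state.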


To serialize the structure, we could use the same intermediate structure and work in reverse, but since that structure owns its data, we would have to either take our convenient structure apart to get at the data or clone it. Neither option is very attractive, so we use a different intermediate structure that replaces the String types with &str. serde serializes both in the same way, which means we can serialize without copying any strings.


    use serde::Serialize;
    use std::collections::BTreeMap;

    impl serde::Serialize for ThirdPartyData {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: serde::Serializer,
        {
            // Intermediate structure that mirrors the output format,
            // but borrows `&str` instead of owning `String`.
            #[derive(Serialize)]
            struct EncodedThirdPartyData<'a> {
                #[serde(rename = "manifest-version")]
                manifest_version: &'a str,

                // ... other fields ...

                renames: BTreeMap<&'a str, BTreeMap<&'a str, &'a str>>,
            }

            // Rebuild the nested `renames` maps, borrowing from `self`
            // rather than cloning keys and values.
            let mut renames = BTreeMap::new();
            for (old_pkg, new_pkg) in self.renames.iter() {
                let mut inner = BTreeMap::new();
                inner.insert("to", new_pkg.as_str());
                renames.insert(old_pkg.as_str(), inner);
            }

            let output = EncodedThirdPartyData {
                // The version is not stored in our structure,
                // so it is supplied here as a constant.
                manifest_version: "2",
                renames,
            };

            output.serialize(serializer)
        }
    }
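The borrowing step can also be tried on its own, without serde. This is only a sketch: nest_renames is a hypothetical helper name, and it reproduces just the map-building loop from the code above.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: rebuild the nested shape from the flat map, borrowing
/// every key and value as `&str` instead of cloning the Strings.
fn nest_renames<'a>(
    flat: &'a BTreeMap<String, String>,
) -> BTreeMap<&'a str, BTreeMap<&'a str, &'a str>> {
    let mut nested = BTreeMap::new();
    for (old_pkg, new_pkg) in flat {
        let mut inner = BTreeMap::new();
        // The "to" key is a string literal; the value borrows from `flat`.
        inner.insert("to", new_pkg.as_str());
        nested.insert(old_pkg.as_str(), inner);
    }
    nested
}
```

The returned maps hold only references, so the original structure stays intact and usable after serialization; the maps themselves are still freshly built, but no string data is copied.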

As a result, we get a structure with almost fully automated serialization and deserialization, plus a few lines of code that perform some checks and transformations.



Source: https://habr.com/ru/post/350956/

