Unicode is very exciting.

This story happened almost a month ago. A certain Egor messaged me on Skype.

Egor: Hello, are you looking for freelancers?)
Me: What can you do?
Egor: Actually, we can't really do much yet, and we want to work for the experience.)

Egor turned out to be a sharp lad, and I suggested that he test our library, cjCore.

I should explain what that is. We have a repository on GitHub where we dump our work, and cjCore is one of our C++ libraries.
Egor cloned the repository and tried to compile it, but no such luck. He had trouble compiling our Unicode string class.

As you know, standard C++ has no proper Unicode strings, so many people write their own classes for this. For example, Qt has its own QString based on the ICU library. We also decided to write our own string, but instead of ICU we built it on either Boost or the C++11 standard library, whichever you prefer.

Egor has the latest version of Ubuntu, but for some reason the C++11 variant of the string would not compile for him. He considered pulling in Boost for the sake of a single string class too expensive (complaining along the way about his weak Internet connection), so he decided to use a free library and eventually settled on utfcpp.sourceforge.net .

Egor failed to take this library by storm. He kept sending me the errors he ran into, I suggested things, but something kept going wrong ...

In the meantime, I went to Wikipedia (en.wikipedia.org/wiki/UTF-8) and found an interesting table there, “Conversion to UTF-8”:


And then a crazy thought came to me: write the UTF-8 -> UTF-32 conversion functions, and the reverse ones, in plain code, without any libraries!
Me: I'll send you something in a bit, if it works out
Egor: Ok

No heavy operations, just condition checks, additions and bit operations. 40 minutes and it's done!
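To give a feel for how little is involved, here is a minimal sketch of a UTF-32 -> UTF-8 conversion built only from comparisons, shifts and masks. It is not the code from the cjCore repository: the names `appendUtf8` and `utf32ToUtf8` are made up for illustration, and substituting U+FFFD for invalid input is an assumption of this sketch rather than something the original code is known to do.

```cpp
#include <string>

// Hypothetical helper (not the cjCore API): appends the UTF-8 encoding of one
// code point to `out`. Returns false, appending nothing, if `cp` is a
// surrogate (U+D800..U+DFFF) or lies above U+10FFFF.
bool appendUtf8(char32_t cp, std::string& out) {
    if (cp < 0x80) {                            // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                    // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        return false;
    }
    return true;
}

// Hypothetical wrapper: converts a whole UTF-32 string, writing U+FFFD
// (the replacement character) in place of every invalid code point.
std::string utf32ToUtf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (!appendUtf8(cp, out)) appendUtf8(0xFFFD, out);
    }
    return out;
}
```

Calling `utf32ToUtf8(U"\u20AC")` in this sketch yields the three bytes E2 82 AC, which matches the three-byte row of the Wikipedia table.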

Me: What are we here for? To quickly throw together our own, and everything works: github.com/sitev/cjCore/blob/master/src/test_utf32to8.cpp
Egor: And the reverse?
Me: I remember how long it took me to sort out the encodings in C++11 and Boost, probably a week

Another 30-40 minutes, and the reverse conversion was working.
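The reverse direction is a bit more work, because the input bytes have to be checked as well. Below is a minimal sketch of a UTF-8 -> UTF-32 decoder, again not the cjCore code: the name `utf8ToUtf32` is hypothetical, and the error policy (emit U+FFFD and move on by one byte) is an assumption chosen to keep the example short.

```cpp
#include <cstddef>
#include <string>

// Hypothetical decoder (not the cjCore API): converts UTF-8 bytes to UTF-32.
// On any malformed sequence it emits U+FFFD and skips a single byte.
std::u32string utf8ToUtf32(const std::string& in) {
    const char32_t kReplacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;        // payload bits collected so far
        std::size_t len = 0;    // sequence length announced by the lead byte
        char32_t minCp = 0;     // smallest value allowed for that length
        if (b0 < 0x80)                { cp = b0;        len = 1; minCp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; minCp = 0x10000; }
        else {                  // stray continuation byte or invalid lead byte
            out += kReplacement;
            ++i;
            continue;
        }

        bool ok = i + len <= in.size();   // reject truncated input
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;    // must look like 10xxxxxx
            else cp = (cp << 6) | (b & 0x3F);      // append 6 payload bits
        }
        // Reject overlong forms, surrogates and values above U+10FFFF.
        if (ok && (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }
        if (ok) { out += cp; i += len; }
        else    { out += kReplacement; ++i; }
    }
    return out;
}
```

The `minCp` check is what stops overlong encodings such as C0 80 from being accepted as U+0000. A decoder that skips this check still handles well-formed text, but it accepts byte sequences that the UTF-8 specification forbids.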

Egor got to testing and after a while sent a screenshot:


Me: So it sort of works?
Egor: Not "sort of", it works: line 1 is the source, line 2 is UTF-32, line 3 is the UTF-32 converted back to UTF-8.

Thanks to Egor, who optimized the resulting code. I wrapped it in a Utf class (github.com/sitev/cjCore/blob/master/src/utf.cpp) and asked Egor to test the speed.

The result was not long in coming: it is 2-2.5 times faster than utfcpp.sourceforge.net ! Unfortunately, we did not have time to compare the speed with other libraries, but try it yourself and post the results in the comments.

As it turned out, programming Unicode conversion back and forth is a fascinating business: it makes you hammer the keys with burning eyes and a racing heart ...

Source: https://habr.com/ru/post/319602/

