
Silverlight and encodings

Silverlight is quite convenient because it gives client applications an almost "full-fledged" .NET. If it were not for that "almost", everything would be great. I recently needed to use a .NET library. I started by retargeting its project to Silverlight and adding it to the main solution. The application compiled, and I was already pleased at how easy it was to reuse existing code, but it was too early to celebrate...

The application began to crash in the most unexpected places. Debugging showed that the library could not find the latin1 encoding it needed. I assumed the encoding simply had a slightly different name here and started googling. It turned out to be much worse: as Microsoft itself reports, the Silverlight core supports only 3 encodings (utf-8, utf-16LE, utf-16BE), while the library I needed required latin1 (and, in our part of the world, sometimes windows-1251).

Upd: the library could not be switched to Unicode, since its job was to read files from the client machine in whatever encoding they were stored in there.
I did not find any ready-made solutions, only similar complaints on the forums. So I decided to reinvent the wheel myself.

The source of the encoding tables for my homemade solution is the "full-fledged" desktop .NET. Since only single-byte encodings were required, all of their characters are easy to obtain: pass Encoding.GetChars a byte array filled with the values 0 to 255.
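The table extraction described above can be sketched roughly as follows. This is a minimal illustration meant to run on desktop .NET, not the author's actual generator; the class and method names (CharMapBuilder, Build) are made up for the example.

```csharp
using System;
using System.Text;

static class CharMapBuilder
{
    // Decodes the byte values 0..255 with the given single-byte encoding,
    // so that result[b] is the character that byte value b stands for.
    public static char[] Build(Encoding encoding)
    {
        var bytes = new byte[256];
        for (var i = 0; i < bytes.Length; i++)
            bytes[i] = (byte)i;
        return encoding.GetChars(bytes);
    }
}
```

For example, `CharMapBuilder.Build(Encoding.GetEncoding("iso-8859-1"))` yields the latin1 table; on the desktop framework `Encoding.GetEncoding(1251)` would give the windows-1251 one. The resulting 256 characters can then be pasted into the Silverlight project as a constant array.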

The first version of the GetString(byte[] bytes, int start, int count) method was thrown together quickly:

    // chars is the 256-entry table obtained from the desktop Encoding.GetChars
    var sb = new System.Text.StringBuilder(count) { Length = count };
    count += start; // count now holds the end index
    for (var i = start; i < count; i++)
        sb[i - start] = chars[bytes[i]];
    return sb.ToString();


Next, I wanted to squeeze out a little more performance, so the array lookup was replaced by a switch:

    var sb = new System.Text.StringBuilder(count) { Length = count };
    count += start; // count now holds the end index
    for (var i = start; i < count; i++) {
        char tmp;
        switch (bytes[i]) {
            case 0: tmp = '\u0000'; break;
            case 1: tmp = '\u0001'; break;
            ...
            default: tmp = '\u02D9'; break;
        }
        sb[i - start] = tmp;
    }
    return sb.ToString();


At the same time, I decided to settle my doubts about which way of working with the StringBuilder class is faster.
Option 3 (using .Append() instead of the indexer):

    var sb = new System.Text.StringBuilder(count);
    count += start; // count now holds the end index
    for (var i = start; i < count; i++) {
        switch (bytes[i]) {
            case 0: sb.Append('\u0000'); break;
            case 1: sb.Append('\u0001'); break;
            ...
            default: sb.Append('\u02D9'); break;
        }
    }
    return sb.ToString();


All three methods performed poorly: almost an order of magnitude slower than the built-in utf-8 implementation on files with English text (that is, when utf-8 also uses 1 byte per character).
Then I decided to use a plain char[] array:

    var sb = new char[count];
    for (var i = 0; i < sb.Length; i++) {
        switch (bytes[i + start]) {
            case 0: sb[i] = '\u0000'; break;
            case 1: sb[i] = '\u0001'; break;
            ...
            default: sb[i] = '\u02D9'; break;
        }
    }
    return new string(sb);


Update: a commenter suggested combining the first and the last method, which produced the following code (here charMap is the 256-entry lookup table and index is the start offset):

    var result = new char[count];
    for (var i = 0; i < result.Length; i++)
        result[i] = charMap[bytes[i + index]];
    return result;


I also tested similar code that used a direct cast to (char) for the first 128 characters, but it turned out to be slower (apparently because a lookup in a small array is cheaper than a comparison with a number plus a cast).
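To make the two variants concrete, here is a hedged sketch of both as complete methods. The class and method names are invented for this example, both methods return a string for convenience, and the cast variant assumes the encoding matches ASCII in its lower half (true for latin1 and windows-1251).

```csharp
static class SingleByteDecoder
{
    // Pure table lookup: every byte is used as an index into charMap.
    public static string DecodeWithTable(char[] charMap, byte[] bytes, int start, int count)
    {
        var result = new char[count];
        for (var i = 0; i < result.Length; i++)
            result[i] = charMap[bytes[i + start]];
        return new string(result);
    }

    // Cast variant: bytes below 128 are cast directly, the rest go through the table.
    // Only valid when the lower half of the encoding is identical to ASCII.
    public static string DecodeWithCast(char[] charMap, byte[] bytes, int start, int count)
    {
        var result = new char[count];
        for (var i = 0; i < result.Length; i++)
        {
            var b = bytes[i + start];
            result[i] = b < 128 ? (char)b : charMap[b];
        }
        return new string(result);
    }
}
```

Both methods produce the same string for an ASCII-compatible table; the second one simply adds a branch per byte, which is the extra cost mentioned above.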

Performance measurements:

Option                        Time, ms
utf-8 (built-in)              140-156
#1 (array lookup)             1340-1352
#2 (StringBuilder [])         1562-1578
#3 (StringBuilder.Append)     1344-1375
#4 (char[])                   451-468
#5 (char[] + array lookup)    306-319
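The article does not say how these numbers were obtained. A simple harness along the following lines would be one way to produce comparable figures; it is only a sketch, and the names (DecodeBenchmark, Measure) and the choice of Stopwatch are assumptions of this example.

```csharp
using System;
using System.Diagnostics;

static class DecodeBenchmark
{
    // Runs the decoder `iterations` times over the same buffer
    // and returns the elapsed wall-clock time in milliseconds.
    public static long Measure(Func<byte[], int, int, string> decode, byte[] data, int iterations)
    {
        decode(data, 0, data.Length); // warm-up call so JIT compilation is not timed
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            decode(data, 0, data.Length);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Each candidate GetString implementation can be passed in as the delegate and run over the same file contents, with the built-in decoder (`System.Text.Encoding.UTF8.GetString`) as the baseline.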

The result, I think, is obvious: I originally chose method No. 4 and, after the update, switched to method No. 5.

Thanks to everyone for the helpful comments! They made it possible to save a few more milliseconds and tens of kilobytes of generated code ;)

Unfortunately, these encodings cannot be plugged directly into the StreamReader / StreamWriter classes, but this solution was enough for my needs.
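One way to live without StreamReader is to read the raw bytes and push them through the table by hand. The sketch below is an illustration under that assumption, not the author's code; SingleByteStreamReader and ReadAllText are hypothetical names.

```csharp
using System.IO;
using System.Text;

static class SingleByteStreamReader
{
    // Reads the stream to the end and decodes it through a 256-entry table.
    // Each byte is exactly one character, so chunk boundaries never split a character.
    public static string ReadAllText(Stream stream, char[] charMap)
    {
        var text = new StringBuilder();
        var buffer = new byte[4096];
        var chars = new char[buffer.Length];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (var i = 0; i < read; i++)
                chars[i] = charMap[buffer[i]];
            text.Append(chars, 0, read);
        }
        return text.ToString();
    }
}
```

Because the decoding is stateless, the same loop works for any single-byte table, whether latin1 or windows-1251.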

For convenience, I made a small generator of "stand-ins" for encodings. Maybe someone will find it useful.

Source: https://habr.com/ru/post/75531/

