
How I ported a Java library to .NET


Long ago... how long? Yesterday! (c) That is, a couple of years ago, I ported one modest library from Java to .NET. And not just to .NET, but to version 1.1.

The approach is well known: take Sharpen (or the Java converter from Visual Studio 2003, whichever you prefer), and then finish the job by hand with a jigsaw.

I won't talk about the obvious things: iterators, structs ("System.Drawing.Size is not an object"), and streams are all banal. But a few of the surprises are worth a look.



Kultur-multur, who made you up?


Let's start with the simplest thing, as a warm-up: converting numbers to strings and back.

    // requires: using System.Globalization; using System.Threading;
    CultureInfo oldCulture = Thread.CurrentThread.CurrentCulture;
    CultureInfo oldCultureUI = Thread.CurrentThread.CurrentUICulture;
    Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
    Thread.CurrentThread.CurrentUICulture = Thread.CurrentThread.CurrentCulture;
    try
    {
        render_impl();
    }
    finally
    {
        // always restore the caller's cultures
        Thread.CurrentThread.CurrentCulture = oldCulture;
        Thread.CurrentThread.CurrentUICulture = oldCultureUI;
    }


So why this crutch?

To avoid writing the same thing 100,500 times, why else. That is, to avoid writing

    (float) System.Double.Parse(foo_string, NumberStyles.Float, new CultureInfo("en-US"));

and

    foo_number.ToString(new CultureInfo("en-US"));

at every single conversion.

Is the author lazy and a freeloader? You bet. What did you expect: the deadline was one week, and this much joy landed on me:

    find ~/pd4ml/sources -type f -print | xargs cat | wc -l
    420242


Why is this hack needed? Because the library has to generate PDF, and Acrobat Reader does not respect national commas as the separator between the integer and fractional parts. And locale-specific number formats never took root in HTML / CSS either, hehe.
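For individual conversions, the same effect can be had without touching the thread culture. The sketch below is not the author's code: it swaps in CultureInfo.InvariantCulture instead of "en-US", and the class name PdfNumbers and the "0.####" format are assumptions chosen for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: formats and parses numbers with '.' as the
// decimal separator regardless of the current thread culture.
public static class PdfNumbers
{
    // Up to four fractional digits, no trailing zeros, never a comma.
    public static string Format(double value)
    {
        return value.ToString("0.####", CultureInfo.InvariantCulture);
    }

    // Accepts text such as "0.25" even when the thread culture expects "0,25".
    public static double Parse(string text)
    {
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

The trade-off is the one the article describes: this has to be called at every conversion site, whereas the thread-culture swap covers a whole rendering pass in one place.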

Features of paranoia in architecture


The innards of the original library are arranged in an obvious way. If you need to produce a bunch of different file formats, and also print to a printer and draw on the screen, you inherit from java.awt.Graphics2D and end up with a set of PDFDevice, RTFDevice, and so on. Need I say that this approach does not work in .NET? Let's all say thank you to whoever stuck sealed wherever they could.

I had to invent my own abstract MegaDevice, pull the System.Drawing.Context contract into it, and inherit the devices from that. The time this took was simply horrendous. And that is despite the fact that output to the screen and to the printer just had to be thrown away.

A logical question: why take the .NET contract rather than the Java one?

The answer is simple: the converter masterfully turns

    g.drawString( prefix + index + " ", x, y );

into

    g.DrawString( prefix + index + " ",
                  SupportClass.GraphicsManager.manager.GetFont(g),
                  SupportClass.GraphicsManager.manager.GetBrush(g),
                  new PointF(x, y));

so I had to adapt to it. On principle I did not restructure the source code; otherwise you can no longer tell what you are looking at, glitches from the original or bugs you introduced yourself.
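The shape of such an abstraction can be sketched in a few lines. Everything here is hypothetical: the single-method contract, the Content property, and the PDF operators emitted are illustrative assumptions, and the real MegaDevice presumably mirrored far more of the drawing API (fonts, brushes, clipping) than this.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical minimal contract that every output back end implements.
public abstract class MegaDevice
{
    public abstract void DrawString(string text, float x, float y);
}

// Hypothetical PDF back end: appends text-showing operators to a buffer.
// Font selection is omitted to keep the sketch short.
public class PdfDevice : MegaDevice
{
    private readonly StringBuilder content = new StringBuilder();

    public string Content
    {
        get { return content.ToString(); }
    }

    public override void DrawString(string text, float x, float y)
    {
        // Escape the characters that are special inside a PDF literal string.
        string escaped = text.Replace("\\", "\\\\")
                             .Replace("(", "\\(")
                             .Replace(")", "\\)");
        content.Append("BT ")
               .Append(x.ToString(CultureInfo.InvariantCulture)).Append(' ')
               .Append(y.ToString(CultureInfo.InvariantCulture))
               .Append(" Td (").Append(escaped).Append(") Tj ET\n");
    }
}
```

Because the rendering code only sees MegaDevice, an RTF back end would be another subclass, and no sealed framework class has to be inherited from.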

State machines and signed / unsigned


The next rake turned up in the CSS parser. An exact equivalent of com.steadystate.* did not exist for .NET at the time of porting. Yes, I know: the right thing would have been to rewrite the grammar from JavaCC to ANTLR. That is a job of two or three hours, which had to be weighed against the deadline and the time already lost on the previous step.

But laziness suggested another option: convert the auto-generated wall of code itself. It is only about three hundred kilobytes, that should be easy, right? And that is when it all came flooding in.

For a start, not all bits are alike:

    private static long URShift(long number, int bits)
    {
        if (number >= 0)
            return number >> bits;
        else
            return (number >> bits) + (2L << ~bits);
    }

and then there is this: I must not forget to cast the signed long to an unsigned long, tidying up the mangled identifiers while I am at it:

    if (((ulong)active0 & (ulong)(0x8000103000000000L)) != 0L)
    {
        jjmatchedKind = 66;
        return 577;
    }
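URShift stands in for Java's unsigned right shift (the >>> operator), which C# did not have at the time. As a sketch, the emulation can be compared with a plain cast through ulong, which gives the same result; the class and method names below are assumptions for illustration, and both methods rely on the default unchecked arithmetic.

```csharp
public static class Shifts
{
    // Converter-style emulation: arithmetic shift, then add back 2^(64 - bits)
    // to cancel the sign bits that were shifted in. Relies on wrap-around.
    public static long URShiftEmulated(long number, int bits)
    {
        if (number >= 0)
            return number >> bits;
        return unchecked((number >> bits) + (2L << ~bits));
    }

    // Same result via an unsigned cast: shifting a ulong right fills with zeros.
    public static long URShiftViaCast(long number, int bits)
    {
        return unchecked((long)((ulong)number >> bits));
    }
}
```

For example, both methods map (-1, 60) to 15 and (-1, 1) to long.MaxValue.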

I refresh my memory of goto and break label, because the generated code scatters them everywhere:

    EOFLoop:
    for (; ; )
    {
        for (; ; )
        {
            ...
            else if ((jjtoSkip[URShift(jjmatchedKind, 6)] & (1L << (jjmatchedKind & 63))) != 0L)
            {
                if (jjnewLexState[jjmatchedKind] != -1)
                    curLexState = jjnewLexState[jjmatchedKind];
                goto EOFLoop;
            }
            ...
        }
    }

And I rewrite the FillBuf function, because streams also have their li-i-ittle differences, which make the state machine choke so that it never gets past the end of the file. To be completely precise, it reaches the end and keeps trying to read:

    int i;
    try
    {
        if (inputStream == null)
            throw new System.IO.IOException("EOF");
        i = inputStream.ReadBlock(buffer, maxNextCharInd, available - maxNextCharInd);
        if (i <= 0 /* was == -1 */)
        {
            inputStream.Close();
            inputStream = null;
            throw new IOException();
        }
        else
            maxNextCharInd += i;
        return;
    }
    catch (IOException e)
    {
        --bufpos;
        backup(0);
        if (tokenBegin == -1)
            tokenBegin = bufpos;
        throw e;
    }
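The difference behind the "was == -1" comment: Java's Reader.read(char[], int, int) returns -1 at end of input, while .NET's TextReader.ReadBlock returns 0, so a check for -1 never fires. A small sketch of an adapter that restores the Java convention (the class and method names are assumptions, not part of the ported code):

```csharp
using System.IO;

public static class EofAdapter
{
    // Reads up to 'count' chars into 'buffer' starting at 'offset'.
    // Returns the number of chars read, or -1 at end of input, Java-style.
    public static int ReadJavaStyle(TextReader reader, char[] buffer, int offset, int count)
    {
        if (count == 0)
            return 0; // nothing requested, so 0 does not mean end of input
        int n = reader.ReadBlock(buffer, offset, count);
        return n <= 0 ? -1 : n;
    }
}
```

With a StringReader over "ab" and a four-char buffer, the first call returns 2 and the second returns -1.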

That is how mass replacements in a text editor settled the question of porting the parser. So much for "easy": I did end up sitting in a puddle, yes. A day was killed, and killed on dreary work at that. "The expression crap was replaced with garbage 132 times, click OK."

Characters, numbers, and the non-obvious things around them


The next step was support for Unicode and right-to-left text. And here a surprise awaited me in the form of java.lang.Character. Not only does it have nothing in common with System.Char beyond the name; on top of that, in .NET a method like

    public static int digit(int codePoint, int radix)

is nowhere to be seen in the epsilon-neighborhood (and it is not the only one).

Fifteen minutes of googling did not turn up a replacement for this method, so the big stick came out. Namely, I ported the piece of java.lang.* that this function depends on: java.lang.Character and the adjacent java.lang.CharacterData* classes, with all their internal tables.
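For comparison, on later framework versions a rough approximation of that method can be built on CharUnicodeInfo.GetDecimalDigitValue, which as far as I know was not available in 1.1. This is not what the port did, and it is only a sketch: the class name is an assumption, it takes a char rather than a code point, and it skips the fullwidth Latin letters that Java also accepts.

```csharp
using System.Globalization;

public static class CharacterCompat
{
    // Returns the numeric value of 'ch' in the given radix, or -1 if the
    // character is not a valid digit in that radix.
    public static int Digit(char ch, int radix)
    {
        if (radix < 2 || radix > 36)
            return -1;

        // Unicode decimal digits (ASCII, Arabic-Indic, and so on).
        int value = CharUnicodeInfo.GetDecimalDigitValue(ch);
        if (value < 0)
        {
            // ASCII letters stand for the values 10 to 35.
            if (ch >= 'a' && ch <= 'z')
                value = ch - 'a' + 10;
            else if (ch >= 'A' && ch <= 'Z')
                value = ch - 'A' + 10;
        }
        return (value >= 0 && value < radix) ? value : -1;
    }
}
```

For instance, Digit('f', 16) yields 15, while Digit('9', 8) yields -1 because 9 is not an octal digit.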

(with irony) And who said that open-source Java is useless?

java.math.BigDecimal migrated by the same scenario. These little differences are no fun at all, especially when they crop up all over the code. Have I already mentioned that I am lazy and a freeloader? Well, here it is again.

From BigDecimal I needed the magic setScale and a faithful toString():

    BigDecimal d1 = new BigDecimal(currentLineThickness / 2 + x);
    BigDecimal d2 = new BigDecimal(currentLineThickness / 2 + y);
    d1 = d1.SetScale(4, BigDecimal.ROUND_UP);
    d2 = d2.SetScale(4, BigDecimal.ROUND_UP);
    buf.Append(d1.ToString()).Append(" ");
    buf.Append(d2.ToString()).Append(" m\n");
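What setScale(4, ROUND_UP) plus toString() provides is "round away from zero to four decimal places and print exactly four fractional digits". A sketch of the same idea on System.Decimal follows; it is a swapped-in technique, not the ported BigDecimal, the names are assumptions, and it can differ from the Java original for inputs built from a double, because new BigDecimal(double) keeps the exact binary expansion.

```csharp
using System;
using System.Globalization;

public static class PdfCoordinates
{
    // Rounds away from zero to 4 decimal places (ROUND_UP semantics) and
    // prints exactly 4 fractional digits with '.' as the separator.
    // Values near the limits of decimal would overflow on the multiplication.
    public static string Format4Up(decimal value)
    {
        decimal scaled = value * 10000m;
        decimal truncated = Math.Truncate(scaled);
        if (scaled != truncated)
            truncated += Math.Sign(scaled); // push any remainder away from zero
        return (truncated / 10000m).ToString("0.0000", CultureInfo.InvariantCulture);
    }
}
```

So 1.23451 becomes "1.2346", -1.23451 becomes "-1.2346", and 2.5 becomes "2.5000".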


Hashtable and ToString


The next item, which brought plenty of unpleasant moments, is the well-known commonplace about collections. Here, however, the Unknown Author (TM) made a masterful knight's move: he built a cache around HTML attributes and used ToString() as the key.

(thinking aloud) I don't like putting HashMaps into HashMaps, though many people do. Maybe enlightenment has not reached me yet, or maybe it is something else, but I don't like it.

Obviously, attributes are a set of name = value pairs. That is, a HashMap. What does Java do in its toString()? It prints the contents of the collection. And in .NET, well, you know: Hashtable.ToString() returns just the type name, so every attribute set ends up with the same key. Indian style coding detected?

The solution was obvious and simple.

    public static String ToString(Hashtable map)
    {
        String hta_str = "[";
        IEnumerator ie = map.Keys.GetEnumerator();
        while (ie.MoveNext())
        {
            Object o = map[ie.Current];
            if (o is Array)
            {
                // array values are printed element by element
                Array al = (Array)o;
                hta_str += ie.Current + "=[";
                for (int jjk = 0; jjk < al.Length; ++jjk)
                    hta_str += al.GetValue(jjk) + ",";
                hta_str += "]";
            }
            else
                hta_str += ie.Current + "=" + o + ",";
        }
        hta_str += "]";
        return hta_str;
    }


I didn't even bother to reproduce the format "as in Java": I just needed some textual representation of the contents instead of Hashtable@address.
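One caveat when such a string serves as a cache key: Hashtable does not guarantee enumeration order, so two tables with equal contents are not guaranteed to print identically. A sketch of a variant that sorts the keys first; it is not the author's code, the names are assumptions, it assumes all keys are strings, and it leaves out the array handling shown above.

```csharp
using System;
using System.Collections;
using System.Text;

public static class AttributeKeys
{
    // Builds a "[name=value,...]" string with keys in ordinal order, so that
    // equal tables always produce the same cache key.
    public static string ToKey(Hashtable map)
    {
        ArrayList keys = new ArrayList(map.Keys);
        keys.Sort(StringComparer.Ordinal); // assumes all keys are strings

        StringBuilder sb = new StringBuilder("[");
        foreach (object key in keys)
            sb.Append(key).Append('=').Append(map[key]).Append(',');
        return sb.Append(']').ToString();
    }
}
```

A table holding b=2 and a=1 yields "[a=1,b=2,]" whatever the insertion order, while the built-in ToString() of the same table yields "System.Collections.Hashtable".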

For dessert: working with fonts


And as the curtain falls, let me delight you with one more pearl: working with fonts. Half of the library does nothing but magic: substitutions, font selection, switching from left-to-right to right-to-left and back, Arabic, Chinese, automatic replacement of Arial with Mincho, and so on and so forth.

Digesting this without half a liter was really something.

By the way, it is immensely amusing that Windows itself gobbles up OpenType without a hitch, while .NET, alas, does not. Although its "sausage" variation, Silverlight, does. And that is in 4.0, so what can be said about the archaic 1.1! Let's wait for 5.0; maybe they will add it there?

Here, for example, I had to completely rewrite the code for getting fonts from the specified directory.

    private static void listFonts(DirectoryInfo dd, String mask, Hashtable listOfFontFaces)
    {
        FileInfo[] files = dd.GetFiles(mask);
        FontFamily newFN = null;
        bool TtC = mask.Equals("*.ttc");
        Hashtable foundFonts = new Hashtable();
        System.Drawing.Text.PrivateFontCollection tmpPfc = null;
        for (int jjk = 0; jjk < files.Length; ++jjk)
        {
            // load each file into its own collection to see which families it contains
            tmpPfc = new System.Drawing.Text.PrivateFontCollection();
            try
            {
                tmpPfc.AddFontFile(files[jjk].FullName);
            }
            catch (Exception /* e */) {}
            for (int jjv = 0; jjv < tmpPfc.Families.Length; ++jjv)
            {
                newFN = tmpPfc.Families[jjv];
                FontFamilySpec spec = (FontFamilySpec)foundFonts[newFN.Name];
                if (spec == null)
                {
                    spec = new FontFamilySpec(newFN.Name);
                    foundFonts[spec.Family] = spec;
                }
                String fn = files[jjk].Name;
                if (TtC)
                    fn += "_" + jjv; // index of the face inside a .ttc collection
                spec.Files.Add(fn);
                spec.Files.Sort();
            }
            tmpPfc.Dispose();
        }
        ...
    }

The code is, obviously, undocumented, but it works.

And here is another magic:

    char c = content[0];
    UnicodeCategory prevUB = Char.GetUnicodeCategory(c);
    int lastCutPosition = 0;
    for (int i = 1; i < content.Length; i++)
    {
        c = content[i];
        if (c == 0xAD || c == ' ') // soft hyphen
            continue;
        byte dirct = java.lang.Character.getDirectionality((int)c);
        if (dirct == java.lang.Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC
            || dirct == java.lang.Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || dirct == java.lang.Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING
            || dirct == java.lang.Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE)
            // sick of fighting with.
            // the only expected conflict is a combining of chinese and arabic in a single paragraph
            break;
        UnicodeCategory ub = Char.GetUnicodeCategory(c);
        if (ub != prevUB)
        {
            // collect every character that belongs to the previous category
            String pattern = "";
            for (int j = 0; j < content.Length; j++)
            {
                char ch = content[j];
                if (prevUB == Char.GetUnicodeCategory(ch))
                    pattern += ch;
            }
            ...
            prevUB = ub;
        }
    }
    ...

What came out here is a real compote: native .NET facilities side by side with code brazenly ported from the Java 6 sources, two of every creature. Very non-obvious tricks.

Taming this fish was not easy at all. I swear by my cocked hat! (c)

Results


Alas, to my shame I did not fit into the allotted week: I grabbed the weekend as well, plus a couple of days after it. And then I came back a couple more times to catch non-obvious rakes.

The client then sent my experiments to the authors, and an official .NET version appeared from them soon after. I have no connection to its further development and so on.

My overall impressions of the original code were very positive. In these days of universal WebKit-ization, writing your own HTML 4 renderer with some HTML 5 elements and CSS 2.1 support, all in pure Java, is real old school.

No animals were harmed during porting.

I hope this information will be useful to someone.

Source: https://habr.com/ru/post/151634/

