True TrueType font names and export to PDF

In Ursula Le Guin's “A Wizard of Earthsea”, magic requires knowing the “true name” of whatever the magician works with. I think any programmer will agree that the idea is sound: URLs, UUIDs, and other unique object identifiers are what we deal with all the time. And, just like in the wizarding world, these true names are not so easy to learn. At least not for fonts.

I needed to implement export of text blocks to PDF in our software product. The export uses proprietary libraries: the Adobe PDF Library (http://datalogics.com/products/pdfl/) and the DLI (Datalogics Library Interface) add-on. I will not delve into these libraries; I think they are of little interest to anyone. But I suppose the problem I ran into is common to any implementation of PDF export.

A typical font family (take Arial, for example) has four different styles: regular, bold, oblique, and bold oblique, i.e. Arial, Arial Bold, Arial Italic, and Arial Bold Italic. Each style is stored in a separate TTF file or in a separate section of a TTC file. If we want to output oblique or bold text to a PDF file, we must explicitly specify “Arial Italic” or “Arial Bold” in the call to the corresponding function. But the text block we export only says that its font is “Arial”, and the Bold and Italic attributes are set separately. And EnumFontFamiliesEx returns only the name “Arial” and that's it! How do we get the string “Arial Italic” that we need?

The obvious solution, simply appending the string “Italic” to the font name, does not always work. For example, it fails for the “Lucida Sans Typewriter” font: the PDF library reports an error if we pass “Lucida Sans Typewriter Italic”.
The key to the solution (pun intended) is HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts. A look at the contents of this key makes it clear that we should have passed “Lucida Sans Typewriter Oblique”. Then everything works.
The format of the entries in this key does not seem to be documented anywhere, but it looks obvious:

"Arial (TrueType)" = "arial.ttf"
"Arial Italic (TrueType)" = "ariali.ttf"
"Arial Bold (TrueType)" = "arialbd.ttf"
"Arial Bold Italic (TrueType)" = "arialbi.ttf"
"Batang & BatangChe & Gungsuh & GungsuhChe (TrueType)" = "batang.ttc"
...
"Mangal (TrueType)" = "mangal.ttf"
"Mangal Bold (TrueType)" = "mangalb.ttf"
"Meiryo & Meiryo Italic & Meiryo UI & Meiryo UI Italic (TrueType)" = "meiryo.ttc"
"Meiryo Bold & Meiryo Bold Italic & Meiryo UI Bold & Meiryo UI Bold Italic (TrueType)" = "meiryob.ttc"
"MS Gothic & MS PGothic & MS UI Gothic (TrueType)" = "msgothic.ttc"
...
"Lucida Sans Typewriter Regular (TrueType)" = "LTYPE.TTF"
"Lucida Sans Typewriter Bold (TrueType)" = "LTYPEB.TTF"
"Lucida Sans Typewriter Bold Oblique (TrueType)" = "LTYPEBO.TTF"
"Lucida Sans Typewriter Oblique (TrueType)" = "LTYPEO.TTF"

As you can see, for TTC collections the names of the fonts they contain are separated by “&”.
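As an illustration (not the original code, which was not published), here is a minimal sketch of splitting one value name from this key into individual face names. The function name splitRegistryFaceNames is hypothetical, and the sketch assumes that the type tag such as “(TrueType)” is always the last parenthesized group and that faces in a collection are joined by “ & ”, as in the listing above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a value name from the Fonts registry key into face names.
// Assumptions: a trailing tag such as " (TrueType)" is the last
// parenthesized group; faces in a collection are joined by " & ".
std::vector<std::string> splitRegistryFaceNames(std::string valueName) {
    // Drop the trailing " (TrueType)"-style tag, if present.
    if (!valueName.empty() && valueName.back() == ')') {
        const std::string::size_type open = valueName.rfind(" (");
        if (open != std::string::npos) valueName.erase(open);
    }

    std::vector<std::string> faces;
    const std::string sep = " & ";
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = valueName.find(sep, start);
        if (pos == std::string::npos) {
            faces.push_back(valueName.substr(start));  // last (or only) face
            break;
        }
        faces.push_back(valueName.substr(start, pos - start));
        start = pos + sep.size();
    }
    return faces;
}
```

For the meiryo.ttc entry this yields four names, “Meiryo”, “Meiryo Italic”, “Meiryo UI” and “Meiryo UI Italic”; for a plain TTF entry it yields a single name.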

The algorithm for matching the common font name to the face names is as follows: for each face name, we strip one word at a time from the end until the remainder matches one of the names obtained from EnumFontFamiliesEx. Along the way, we check each stripped word against “Bold”, “Italic”, “Semibold”, and “Oblique” and remember the corresponding attribute for this face. For example, for the “Lucida Sans Typewriter” family:

Lucida Sans Typewriter Regular -> Lucida Sans Typewriter
Lucida Sans Typewriter Bold -> Lucida Sans Typewriter
Lucida Sans Typewriter Oblique -> Lucida Sans Typewriter
Lucida Sans Typewriter Bold Oblique -> Lucida Sans Typewriter Bold -> Lucida Sans Typewriter
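The word-stripping procedure can be sketched roughly as follows. This is an illustration under assumptions, not the original implementation: FaceInfo and matchFaceToFamily are hypothetical names, the set of family names stands in for the output of EnumFontFamiliesEx, and treating “Semibold” as bold is a simplification of my own.

```cpp
#include <cassert>
#include <set>
#include <string>

struct FaceInfo {
    std::string family;   // matched family name; empty if nothing matched
    bool bold = false;
    bool italic = false;
};

// Strips words from the end of faceName until the remainder is a known
// family name, recording style words seen along the way.
FaceInfo matchFaceToFamily(const std::string& faceName,
                           const std::set<std::string>& families) {
    FaceInfo info;
    std::string rest = faceName;
    while (!rest.empty()) {
        // Check before stripping, so a family whose own name ends in a
        // style word is matched as a whole.
        if (families.count(rest) != 0) {
            info.family = rest;
            return info;
        }
        const std::string::size_type sp = rest.rfind(' ');
        if (sp == std::string::npos) break;          // no words left to strip
        const std::string word = rest.substr(sp + 1);
        if (word == "Bold" || word == "Semibold") info.bold = true;   // assumption: Semibold counts as bold
        if (word == "Italic" || word == "Oblique") info.italic = true;
        rest.erase(sp);
    }
    return FaceInfo{};  // no family matched; discard collected flags
}
```

Words such as “Regular” are simply stripped without setting any attribute, which matches the first line of the example above.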

Now, if we need to output the “Lucida Sans Typewriter” font in bold oblique, we know that the name “Lucida Sans Typewriter Bold Oblique” corresponds to this style, and we pass this name to the PDF library.

Here, however, another problem awaits. For example, the “Mangal” font has only a bold face (“Mangal Bold”) and no oblique one. We can still set the “oblique” attribute for this font, and Windows GDI will then slant the existing face itself when drawing on screen. When exporting to PDF, you have to do this yourself. The PDF library may let you specify a transformation matrix for text output. In my case, it looked like this:

ASFixedMatrix fontSkew;
if (bSimulateItalic)
{
    double angle = 15; // slant angle in degrees
    fontSkew.a = fixedOne;  // x scale
    fontSkew.b = fixedZero; // rotate & skew
    fontSkew.c = FloatToASFixed(tan(_PI * angle / 180)); // rotate & skew
    fontSkew.d = fixedOne;  // y scale
    fontSkew.h = 0; // x translation
    fontSkew.v = 0; // y translation
    dlpdfcontentfontskew(..., &fontSkew);
}

I did not find an elegant way to simulate bold. I simply output the string that should be bold several times with a slight offset. Visually everything looks fine, but it is frustrating that the text in the PDF file is duplicated.
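Deciding which real face to use and which attributes then have to be simulated could look roughly like this. It is only a sketch: FaceChoice, StyleMap and chooseFace are hypothetical names, and the preference order (keep a real bold face and simulate the slant rather than the other way round) is my assumption, chosen because the skew matrix is a cleaner trick than overprinting.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

struct FaceChoice {
    std::string faceName;        // empty if the family has no usable face
    bool simulateBold = false;   // bold was requested but must be faked
    bool simulateItalic = false; // italic was requested but must be faked
};

// Maps (bold, italic) to the real face name available for one family.
using StyleMap = std::map<std::pair<bool, bool>, std::string>;

FaceChoice chooseFace(const StyleMap& faces, bool bold, bool italic) {
    // Candidates in order of preference: exact match, then keep bold and
    // fake the slant, then keep italic and fake bold, then plain regular.
    const std::pair<bool, bool> candidates[] = {
        {bold, italic}, {bold, false}, {false, italic}, {false, false}};
    for (const auto& c : candidates) {
        const auto it = faces.find(c);
        if (it == faces.end()) continue;
        FaceChoice choice;
        choice.faceName = it->second;
        choice.simulateBold = bold && !c.first;
        choice.simulateItalic = italic && !c.second;
        return choice;
    }
    return FaceChoice{};
}
```

For “Mangal”, which has only regular and bold faces, a bold oblique request resolves to “Mangal Bold” with simulateItalic set, so only the skew matrix is needed.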

But this is not the end. Our product has a Japanese version, so special attention is paid to working correctly with Asian fonts. And here two more problems come up:

The font named “ＭＳＰゴシック” is not present in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts, so we cannot find out the face names for this font.
The PDF library does not understand Unicode font names at all.

Let's start with the first problem (historically everything started with the second, but the story is more coherent this way). Google tells us that the font “ＭＳＰゴシック” is in fact MS Gothic. It turns out that it takes on the Japanese name when the Japanese locale is set in the system. In the registry, of course, it still remains under the name MS Gothic. This turns out to be the normal behavior of EnumFontFamiliesEx. Here is a quote from its documentation: “The fonts for many East Asian languages have two typeface names: an English name and a localized name. EnumFonts, EnumFontFamilies, and EnumFontFamiliesEx return the English typeface name if the system locale does not match the language of the font.”

By the way, once we know that “ＭＳＰゴシック” is “MS Gothic”, the second problem is solved too, at least when the English name is stored in the registry. We simply pass the name “MS Gothic” to the PDF library and it works. It remains to establish this correspondence.
For most of the faces from HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts, we have already matched font names from EnumFontFamiliesEx. But for some faces no pair was found. Naturally: the registry contains “MS Gothic”, while EnumFontFamiliesEx returned “ＭＳＰゴシック”.
In this case, the only option left is to parse the TTF/TTC file ourselves and find the corresponding Japanese name there.

Parsing a TTC/TTF file is a simple task. For a working sample, you can take the source of the “ttf2eot” project: code.google.com/p/ttf2eot . The TTF/TTC format itself is well documented on the Microsoft website: www.microsoft.com/typography/otspec . Note that all data in TTF is stored in big-endian format, so all numbers and Unicode strings must be converted before use.
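The big-endian conversion can be done with a few small helpers. The names readU16BE, readU32BE and readUtf16BE are hypothetical; this is a sketch that reads from a byte buffer already loaded into memory, and it leaves bounds checking to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads a 16-bit big-endian unsigned integer from p[0..1].
inline std::uint16_t readU16BE(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Reads a 32-bit big-endian unsigned integer from p[0..3].
inline std::uint32_t readU32BE(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Converts byteLen bytes of big-endian two-byte Unicode into a
// native-endian UTF-16 string; a trailing odd byte is ignored.
std::u16string readUtf16BE(const unsigned char* p, std::size_t byteLen) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < byteLen; i += 2)
        out.push_back(static_cast<char16_t>(readU16BE(p + i)));
    return out;
}
```

Building the values byte by byte, rather than casting the buffer to an integer pointer, keeps the code independent of the host byte order and of alignment.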

Unfortunately, I do not have the right to publish my code, so I will just describe what to look for.

We are interested in the “name” table: www.microsoft.com/typography/otspec/name.htm . Select the records with:

nPlatformId = 3: Windows. I assume that if the font is installed under Windows, these records must be there. Maybe I am wrong, but let us first come across such a font and then deal with it.
nNameId = 1: Font Family name. Up to four fonts can share the Font Family name (regular, italic, bold, bold italic, as defined by the OS/2.fsSelection bit settings), i.e. this is exactly the name returned by EnumFontFamiliesEx.
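nEncodingId = 0 (Symbol) or 1 (Unicode BMP, a two-byte UCS-2 string). According to the specification, name strings for the Windows platform are stored as big-endian Unicode in both cases. The rest of the encodings can be ignored: the specification clearly requires that one of these two encodings is used: “When building a Unicode font for Windows, the platform ID should be 3 and the encoding ID should be 1. When building a symbol font for Windows, the platform ID should be 3 and the encoding ID should be 0.”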

One of the found names will match some name from EnumFontFamiliesEx.
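The record selection can be sketched as follows. This is not the original code: windowsFamilyNames and be16 are hypothetical names, the function expects a pointer to the start of an already located “name” table (finding it through the table directory is not shown), and it assumes the layout from the specification: a 6-byte header (format, count, string storage offset) followed by 12-byte records. It also assumes that strings for both encoding IDs are two-byte big-endian Unicode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

static std::uint16_t be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Collects all family names (name ID 1) stored for the Windows platform
// (platform ID 3, encoding ID 0 or 1) in a "name" table of `size` bytes.
std::vector<std::u16string> windowsFamilyNames(const unsigned char* table,
                                               std::size_t size) {
    std::vector<std::u16string> names;
    if (size < 6) return names;                      // header is 6 bytes
    const std::size_t count = be16(table + 2);       // number of records
    const std::size_t storage = be16(table + 4);     // start of string data
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t recPos = 6 + i * 12;       // records are 12 bytes
        if (recPos + 12 > size) break;               // truncated table
        const unsigned char* rec = table + recPos;
        const std::uint16_t platformId = be16(rec);
        const std::uint16_t encodingId = be16(rec + 2);
        const std::uint16_t nameId = be16(rec + 6);  // rec + 4 is the language ID
        const std::size_t length = be16(rec + 8);    // string length in bytes
        const std::size_t offset = be16(rec + 10);   // relative to storage
        if (platformId != 3 || nameId != 1 || encodingId > 1) continue;
        if (storage + offset + length > size) continue;  // string out of bounds
        const unsigned char* str = table + storage + offset;
        std::u16string name;
        for (std::size_t j = 0; j + 1 < length; j += 2)
            name.push_back(static_cast<char16_t>(be16(str + j)));
        names.push_back(name);
    }
    return names;
}
```

A font usually carries several such records, one per language ID, so the result typically contains both the English and the localized family name; each of them can then be compared with the names returned by EnumFontFamiliesEx.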

For example, for the “Meiryo Bold Italic” face, by examining meiryob.ttc we find that this face corresponds to the name “メイリオ” from EnumFontFamiliesEx.

It remains to determine whether this face is bold and oblique. The natural idea is to take this information from the font as well, but, as experiments showed, these attributes in the font file may be incorrect. Therefore, we take them from the face name (“Meiryo Bold Italic”), as was done above. Only now we strip words until the remainder matches one of the names extracted from the TTF file rather than from the output of EnumFontFamiliesEx.

Thus, if we need to export a text block set in bold oblique type with the font name “メイリオ”, we pass the name “Meiryo Bold Italic” to the PDF library. Profit!

Source: https://habr.com/ru/post/146786/
