
This is a short follow-up to my previous post about generating PDF from a WPF application using PDFSharp. As described in that article, the PDF is generated with a FlowDocument as the intermediate format. In a FlowDocument we can use Hyperlink to display different types of hyperlinks, but it turned out that the version of PDFSharp.Xps I used simply ignores the FixedPage_NavigateUri attribute attached to XpsElement elements.
I spent some time trying to understand the PDF 1.4 output format, but I have not yet worked out how to fix the output properly in the PdfContentWriter of the PDFSharp.Xps project.
Under the cut is a simpler solution: overlaying hyperlinks on the text in the form of a Link Annotation. At the end of the article you will also find the results of my research into the "proper" way to solve the problem, by writing the link primitives into the PDF output itself.
Solution via Link Annotation
Here is a link to the commit with the fix. As I wrote in the teaser, I added the creation of a Link Annotation to the PdfContentWriter code. I did this in the WritePath(...) method (see the code below).
// Check whether a link is attached to this Path
if (path.FixedPage_NavigateUri != null && !string.IsNullOrEmpty(path.FixedPage_NavigateUri.Trim()))
{
    var bounds = path.Data.GetBoundingBox();
    var xpsPage = path.Parent as FixedPage;
    if (xpsPage != null)
    {
        // Scale factor from XPS units to PDF points
        var pxToPtScale = xpsPage.PointHeight / xpsPage.Height;
        try
        {
            var uri = new Uri(path.FixedPage_NavigateUri);
            // The PDF y axis points up, so y values are subtracted from the page height
            page.AddWebLink(
                new PdfRectangle(
                    bounds.Left * pxToPtScale,
                    page.Height - bounds.Top * pxToPtScale,
                    bounds.Right * pxToPtScale,
                    page.Height - bounds.Bottom * pxToPtScale),
                uri.AbsoluteUri);
        }
        catch (Exception)
        {
            Debug.Assert(false, "WritePath(...) > Invalid URI string provided");
        }
    }
}
In this code, I simply take the bounds of the Path object that has just been added to the PDF page, and only for Paths that have a non-empty FixedPage_NavigateUri value. The coordinates are converted from screen pixels to points. I suspect that the conversion factor depends on the screen font resolution, so it is computed dynamically from the page dimensions. As it turned out, the vertical axis of a PDF page points in the opposite direction to the same axis in XPS, so the vertical coordinates of the bounding box are subtracted from the page height. The link attached to the Path is passed through the Uri class to verify that it is valid. There may be a more reliable, efficient or functional way to convert URIs, but for now this is the simplest one. If the link address is invalid, the code raises a debug assertion with a message. Logging code could also be added here.
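To make the coordinate conversion concrete, here is a minimal sketch of the same arithmetic outside of PDFSharp. The class `LinkRectDemo` and the helper `ToPdfRect` are hypothetical names, not part of the library. The sample numbers come from this article: the PDF page height of 295.98 pt, the rectangle `109.18 309.06 101.65 18.18 re` of the last Path in the content stream shown further down, and an XPS page height of 394.64 that I derived from the 0.75 scale factor in that stream (0.75 is also the ratio between a 1/96 inch WPF unit and a 1/72 inch PDF point).

```csharp
using System;

public static class LinkRectDemo
{
    // Hypothetical helper: converts an XPS bounding box (origin top-left, y axis down)
    // into the two corners of a PDF rectangle (origin bottom-left, y axis up, points).
    public static (double X1, double Y1, double X2, double Y2) ToPdfRect(
        double left, double top, double right, double bottom,
        double xpsPageHeight, double pdfPageHeight)
    {
        double scale = pdfPageHeight / xpsPageHeight;
        return (left * scale,
                pdfPageHeight - top * scale,
                right * scale,
                pdfPageHeight - bottom * scale);
    }

    public static void Main()
    {
        // "109.18 309.06 101.65 18.18 re" is x, y, width, height in XPS units.
        var r = ToPdfRect(109.18, 309.06, 109.18 + 101.65, 309.06 + 18.18,
                          394.64, 295.98);
        Console.WriteLine($"{r.X1:0.####} {r.Y1:0.####} {r.X2:0.####} {r.Y2:0.####}");
    }
}
```

Up to rounding, the four values agree with the `/Rect` entry of the annotation object that the patched converter writes.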
The output of the converter with this patch is shown in the picture in the teaser of the article. Note the black border around the link: this is the link annotation that was created. The black border is a problem, but it can be solved at least by post-processing the generated PDF, where the annotation block is present in unencoded form:
16 0 obj
<<
/Type /Annot
/NM (11aabcc9-2402-4718-8184-7ffb9bbb031c)
/M (D:20131119233814+04'00')
/Subtype /Link
/Rect [81.885 64.185 158.123 50.55]
/BS << /Type /Border >>
/Border [0 0 0]
/A << /S /URI /URI (http://habrahabr.ru/) >>
>>
endobj
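My first guess was that "/Border [0 0 0]" in this markup sets the RGB components of the border color. According to the PDF specification, however, this array holds the horizontal corner radius, the vertical corner radius and the border width. The /BS dictionary takes precedence over /Border, and its width entry /W defaults to 1, which is the likely source of the visible border.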
Investigation results
The solution through a link annotation was the obvious one. The only difficulty was determining the correct coordinates. But it is not the best solution. It would be more correct to fix the output of the primitives themselves instead of propping things up with an annotation laid over the rendered Path object. As you can see in the picture at the beginning of the article, by default this annotation is displayed with an ugly black border.
So I downloaded the PDF 1.4 specification, opened the PDFSharp and PDFSharp.Xps projects and began to study the code.
In the PdfLinkAnnotation class, I came across code like this:
internal override void WriteObject(PdfWriter writer)
{
    // ...
    switch (this.linkType)
    {
        // ...
        case LinkType.Web:
            //pdf.AppendFormat("/A<</S/URI/URI{0}>>\n", PdfEncoders.EncodeAsLiteral(this.url));
            Elements[Keys.A] = new PdfLiteral("<</S/URI/URI{0}>>", //PdfEncoders.EncodeAsLiteral(this.url));
                PdfEncoders.ToStringLiteral(this.url, PdfStringEncoding.WinAnsiEncoding, writer.SecurityHandler));
            break;
        // ...
    }
    // ...
}
Googling for the string /A<</S/URI/URI brought me to the Analyzing PFs page, where I saw roughly what the markup of a link block looks like.
6 0 obj
<<
/Type /Action
/S /URI
/URI (http://stinkeye.org)
>>
endobj
Opening the PDF that the converter produced, I found the following:
4 0 obj
<<
/Type /Page
/MediaBox [0 0 468 295.98]
/Parent 3 0 R
/Contents 5 0 R
/Resources
<<
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
/ExtGState
<<
/GS0 6 0 R
/GS1 15 0 R
>>
/Font
<<
/F0 10 0 R
/F1 14 0 R
>>
>>
/Annots [16 0 R]
/Group
<<
/CS /DeviceRGB
/S /Transparency
/I false
/K false
>>
>>
endobj
This is the object that describes the page. Its content stream follows:
5 0 obj
<<
/Length 1114
/Filter /FlateDecode
>>
stream
xњn 7} WҐ / $ P ђT; Ї
ў p} K ‹Zm # @ mx WgW + ieShN G ] C CH ћ 3 ®üg? ± ¶ј 3¶ј№ k k p і w x x x p pg ÀY№“ Brashch = v .....
.....
Џ k ~ „LA muw { l l lQQYyu!! BBjw $ d'bc K椦¤YPD¤ѓ $ · A syu˜Pђ": Ђl2i fY < ›w U` oSh odvђ¶n {1 1 † zHEЃ about <. dnW nYl yy> I \ H ѕ i i sp
endstream
endobj
The dots stand for text that the habrahabr markup cannot display: there are many non-printable characters here. This is the raw content of a binary stream (note the /FlateDecode filter), shown as WinAnsi text, the encoding into which the converter translates all PDF and Unicode text. So there is hardly anything interesting to see here. Let's go to the debugger.
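As an aside, a stream marked with /FlateDecode is zlib-compressed, so its bytes can also be inflated outside the debugger. The following is only a sketch under assumptions: it requires .NET 6 or later (for `ZLibStream`), the class `FlateDemo` and its methods are hypothetical names, and extracting the bytes between `stream` and `endstream` from the file is left out. The `Deflate` helper exists only to produce test data for the demo.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class FlateDemo
{
    // Inflates the bytes found between "stream" and "endstream"
    // of an object whose /Filter is /FlateDecode (zlib format).
    public static byte[] Inflate(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }

    // Only used to build sample data for the demo below.
    public static byte[] Deflate(byte[] raw)
    {
        using var output = new MemoryStream();
        using (var zlib = new ZLibStream(output, CompressionLevel.Optimal))
        {
            zlib.Write(raw, 0, raw.Length);
        }
        return output.ToArray();
    }

    public static void Main()
    {
        byte[] raw = Encoding.ASCII.GetBytes("q\n0.75 0 0 -0.75 0 295.98 cm\nQ\n");
        byte[] roundTrip = Inflate(Deflate(raw));
        Console.Write(Encoding.ASCII.GetString(roundTrip));
    }
}
```

Applied to the real stream bytes, `Inflate` should yield the same kind of operator text that the debugger shows.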
Put a breakpoint in PdfContentWriter.WritePath(Path path). Add the condition path.FixedPage_NavigateUri != null && !string.IsNullOrEmpty(path.FixedPage_NavigateUri) to this breakpoint, so that you do not have to keep pressing F5.
After we have parsed the template and clicked the Print button in the main window, we hit this breakpoint and can see the contents of the stream of primitives in text form. It will look something like this:
q % - BeginContent
0.75 0 0 -0.75 0 295.98 cm
-100 Tz
q % - begin Glyphs
0 0 0 rg
/GS0 gs
BT
/F0 -1 Tf
24 0 0 24 18.18 40.1867 Tm
0 0 Td <002B0048004F004F0052000F0003002B0044004500550044004B0044004500550004> Tj
ET
Q % - end Glyphs
q % - begin Glyphs
0 0 0 rg
/GS0 gs
BT
/F1 -1 Tf
16 0 0 16 18.18 87.3933 Tm
0 0 Td <0028005B005300480055004C005000480051> Tj
4.865 0 Td <0057> Tj
0.34 0 Td <004C0051004A0003005A004C> Tj
2.661 0 Td <0057> Tj
0.34 0 Td <004B000300470052> Tj
1.936 0 Td <0057> Tj
0.34 0 Td <002F004C00540058004C0047000F00030029004F0052005A0027005200460058005000480000> Tj
9.836 0 Td <0057> Tj
0.34 0 Td <000300440051004700030033002700290036004B004400550053> Tj
ET
Q % - end Glyphs
% ...%
q % - begin Canvas
1 0 0 1 18.18 145.44 cm
q % - begin path
1 0 0 1 5 10.4533 cm
0 0.204 0.506 rg
5 2.5 m
5 3.88 3.88 5 2.5 5 c
1.12 5 0 3.88 0 2.5 c
0 1.12 1.12 0 2.5 0 c
3.88 0 5 1.12 5 2.5 c
h
f*
Q % - end Path
q % - begin Glyphs
0 0.204 0.506 rg
/GS0 gs
BT
/F0 -1 Tf
14 0 0 14 20 17.8367 Tm
0 0 Td <00270052004600580050004800510057000300260052005100570048005B0057> Tj
ET
Q % - end Glyphs
% ...%
Q % - end Canvas
% ...%
q % - begin path
/GS1 gs
0 0 0 rg
109.18 309.06 101.65 18.18 re
f
Q % - end Path
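The nesting in this listing determines where each primitive ends up on the page: q saves the graphics state, Q restores it, and every cm operator concatenates a matrix onto the current transformation matrix (CTM). The sketch below is not PDFSharp code. `CtmDemo`, `Matrix`, `Prepend` and `Transform` are hypothetical names, and I am assuming that the Canvas block is nested inside BeginContent, since the listing is abridged. It replays the q/cm sequence for the first point of the Path inside the Canvas (`5 2.5 m`).

```csharp
using System;
using System.Collections.Generic;

public static class CtmDemo
{
    // Hypothetical value type for the six numbers "a b c d e f" of a cm operator.
    public readonly struct Matrix
    {
        public readonly double A, B, C, D, E, F;

        public Matrix(double a, double b, double c, double d, double e, double f)
        {
            A = a; B = b; C = c; D = d; E = e; F = f;
        }

        // cm concatenates in front of the current matrix: result = m x this.
        public Matrix Prepend(Matrix m) => new Matrix(
            m.A * A + m.B * C, m.A * B + m.B * D,
            m.C * A + m.D * C, m.C * B + m.D * D,
            m.E * A + m.F * C + E, m.E * B + m.F * D + F);

        // Maps a point from the current user space to page space.
        public (double X, double Y) Transform(double x, double y) =>
            (A * x + C * y + E, B * x + D * y + F);
    }

    public static (double X, double Y) FirstPathPointOnPage()
    {
        var saved = new Stack<Matrix>();
        var ctm = new Matrix(1, 0, 0, 1, 0, 0);           // identity

        saved.Push(ctm);                                   // q   BeginContent
        ctm = ctm.Prepend(new Matrix(0.75, 0, 0, -0.75, 0, 295.98));
        saved.Push(ctm);                                   // q   begin Canvas
        ctm = ctm.Prepend(new Matrix(1, 0, 0, 1, 18.18, 145.44));
        saved.Push(ctm);                                   // q   begin path
        ctm = ctm.Prepend(new Matrix(1, 0, 0, 1, 5, 10.4533));

        var point = ctm.Transform(5, 2.5);                 // 5 2.5 m

        ctm = saved.Pop();                                 // Q   end Path
        ctm = saved.Pop();                                 // Q   end Canvas
        return point;
    }

    public static void Main()
    {
        var p = FirstPathPointOnPage();
        Console.WriteLine($"{p.X:0.###} {p.Y:0.###}");
    }
}
```

Under these assumptions the point lands at roughly (21.135, 177.185) in page coordinates. The same bookkeeping would be needed to compute an annotation rectangle for a Path that sits inside a translated Canvas.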
What do we see here? The q and Q operators (PDF inherits its imaging model from PostScript) delimit graphics states: q saves the current state and Q restores it, and the pairs are nested inside one another. (Surely all of this is described in the PDF specification, but I have not had time to study it in depth yet.) What I have not figured out yet is how to embed the markup for a link block
<<
/Type /Action
/S /URI
/URI (http://stinkeye.org)
>>
into the output of a Path. The closest markup variant I found in the specification (p. 635, example 9.14):
/Link << /MCID 1 >> % Marked-content sequence 1 (link)
BDC % Begin marked-content sequence
0.7 w % Set line width
[] 0 d % Solid dash pattern
111.094 751.8587 m % Move to start of underline
174.486 751.8587 l % Draw underline
0.0 0.0 1.0 RG % Set stroking color to blue
S % Stroke underline
BT % Begin text object
14 0 0 14 111.094 753.976 Tm % Set text matrix
0.0 0.0 1.0 rg % Set nonstroking color to blue
(with a link) Tj % Show text of link
ET % End text object
EMC % End marked-content sequence
In this markup, I cannot understand what "<< /MCID 1 >>" is. It is also not entirely clear how and where this markup block should be placed.
I would be very grateful for help with implementing a proper fix. Thanks for your attention!