Our HtmlDecoder (New in 11.5) can render HTML files as images.
This has been a long asked for feature.
Dependencies
- You need to be using DotImage version 11.5.0.0 or newer
- You need a license for Atalasoft DotImage Document Imaging (DID2/DIDX2 serial)
- You need to reference Atalasoft.dotImage.CommonDecoders.dll
- You will need to ensure you add the Perceptive Filter Dlls (Shipped with DotImage) to your app bin folder (see below)
Perceptive Filters
The perceptive filters "have bitness" like the rest of DotImage.. so you need to pick which you're targeting (x86 or x64) and use the files from the appropriate source folder:
C:\Program Files (x86)\Atalasoft\DotImage 11.5\bin\PerceptiveDocumentFilters\intel-32
C:\Program Files (x86)\Atalasoft\DotImage 11.5\bin\PerceptiveDocumentFilters\intel-64
You can just copy the DLLs to your app bin folder or you can add to your project and set their build action to "Copy if newer"
So long as the 4 files end up in your final bin directory next to Atalasoft.dotImage.CommonDecoders.dll
Using the Decoder
Many long time DotImage users will probably assume the next step would be to add a new HtmlDecoder to RegisteredDecoders.Decoders collection in a static constructor.
The short answer is yes you can, but if you do, then you will find that HTML with img src tags that point to local files (such as <img src="foo.png">) will not render in the output HTML. However, those that reference web-reachable (by the program doing the decoding) images will (such as <img src="https://www.atalasupport.net/images/logo_atalasoft_from_tungsten.jpeg">)
So, if you know your HTML files will never need to render local / relative images, this may be OK for you.
If you do need to render images, see the section below named "Known Issue: img src local image files not rendered"
HtmlDecoder Properties
Defaults
The defaults of new HtmlDecoder are
- PageSize: Size = {0 , 0} (when set to 0,0,the decoder will end up outputting a US Letter size page at Resolution - lots to unpack see below)
- Resolution: 96
- RenderExternalImages: true
Explanations
RenderExternalImages
This setting tells the decoder whether it is allowed to make web requests so that it can render image tags that point to web-reachable images.
If this is set to true, then image tags that refer to fully qualified URLs will be parsed and the decoder will attempt to fetch the image to use in rendering. For this to work, the program running the decoder must be able to reach the URL in the image tag - it will work for HTTP or HTTPS requests, but the url must be reachable without further authentication
If this is set to false, then no external requests will be made and the default "image not found" rendering wil be used
PageSize
This setting tells the decoder what the target page size for the image render is...
IMPORTANT NOTE ABOUT PAGE SIZE:
Page Size value here is in points... meaning 1/72 of an inch. This may be unintuitive at first because if let 0,0 you will get a US letter page at whatever Resolution was set
for the default Resolution of 96, this would be Size = {Width = 816 Height = 1056}
Except if you plug new Size(816,1056) into PageSize and see Resolution 96, you might expect a output image with {Width = 816 Height = 1056} but you;ll actually get an output image that is Size = {Width = 1088 Height = 1408}
Agian this is because for the PageSize it's looking for POINT size of a virtual physical page you'd be looking to fit the object ont.
So,
PageSize = new Size( RealPhysicalPageWidth_in_Inches * 72, RealPhysicalPageHeight_in_Inches * 72);
- Us Letter (8.5x11) would be 612,792
- Us Letter Landscape (11x8.5) would be 792,612
- US Legal Portrait (8.5x14) would be 612, 1008)
NOTE that the Decoder will hard cut the page width, dropping content to the right.. if your horizontal content is getting cut off, make the width wider... Meanwhile the decoder will make a new page at the height boundary and make as many pages as needed
Resolution
This setting tells the decoder how to interpret the image.
The formula to calculate the output Image size is
ImageOutputSize = { Width = PageSize.Width / 72 * Resolution, Height= PageSize.Height / 72 * Resolution }
So, for simple sake, lets say PageSize is the default of 612 x 792 (US Letter):
- Default Resolution: (96) the image will be {Width = 816 Height = 1056}
- 200 DPI: the image will be {Width = 1700 Height = 2200 }
- 300 DPI: the image will be {Width = 2550 Height = 3300 }
Combining resolution and Page size and considering that the control will paginate vertically but not horizontally, the suggested settings are
HtmlDecoder htmlDecoder = new HtmlDecoder() { PageSize = new Size(1008,612), Resolution = 200 };
This will result in 2800 x 1700 pixel images which unless your content is super wide should fit well
Tweak the PageSize to widen or narrow for content, tweak the Resolution to make the output image have higher quality
Known Issue: img src local image files not rendered
However, HTML with <img src="local file here"> src attributes do not render.
The decoder has a RenderExternalImages property which if set to true will properly render reachable images (the decoder needs to make a web request and reach the url containing the image at the time of rendering) - set this to false to prevent any external requests at the cost of un-rendered images
However, if your HTML has img src tags pointing at local resolvable files, you may find that the images do not render properly if you use
AtalaImage img = new AtalaImage("htmlfile.htm");
or in an image source
or even
AtalaImage img = htmlDecoder.Read(stream ...)
This is because of a limitation with the Perceptive Filter used to render HTML.
The workaround is that it will render local image tags (where the path can find the image in question ) if and only if you use the HtmlDecoder.Read overload that takes the string filename as the first arg
AtalaImage img = htmlDecoder.Read("C:\\path-to\\example.htm", frameIndex, null);
HtmlImageSource
An ImageSource for HTML
Since this is not entirely intuitive and it's a common use case to want to render html as PDF pages, and the PdfEncoder requires a "one shot decoder" and works best with ImageSource, here is a HtmlImageSource that handles the issue fairly well
Attached to this article, find HtmlImageSouce.zip
It contains an HtmlImageSource class you can use
usage:
Make a PDF from an HTML file with 14"x8.5" pages
using (AtalasoftExamples.HtmlImageSource his = new AtalasoftExamples.HtmlImageSource("test.htm", new Size(1008,612), 200)
{
using (FileStream outStream = new FileStream("out.pdf", FileMode.Create))
{
PdfEncoder pdfEnc = new PdfEncoder(new Size(1088, 200));
pdfEnc.SizeMode = PdfPageSizeMode.FitToPage;
pdfEnc.Save(outStream, his, null);
}
}