Blogger :
Michael Freidgeims Blog
Category :
XML
Blogged date : 2008 Jul 19
I want to save an HTML file generated by ASP.NET as a PDF.
I was pointed to the iTextSharp open source project.
I found a few links discussing how to do it:
http://www.velocityreviews.com/forums/t72716-using-itextsharp-to-generate-pdf-from-aspnet.html
iTextSharp Tutorial Chapter 7: XML and (X)HTML
iTextSharp Demo (ASP.NET 2.0): http://rubypdf.com/itextsharp/tutorial01/ap07Chap0707.cs.html introduces HtmlParser.Parse.
We tried to use it.
HtmlParser.Parse does NOT necessarily throw an error, but the PDF file it generates can be blank.
If the HTML file has an invalid structure, the debug output shows messages from the parser.
This is a big problem: HtmlParser.Parse is very strict, and any minor mistake in the HTML causes either an exception or the almost silent creation of an empty PDF file.
The post "Creating pdf in .NET from html" has a lot of interesting comments, including a suggestion to use HTML Agility Pack.
We are going to test how tolerant HtmlParser.Parse is of HTML regenerated by HTML Agility Pack.
Another option is to always use XML-compliant HTML, verified by http://validator.w3.org/#validate_by_input, but it could take some time to tidy up the HTML generated by ASP.NET.
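A quick way to see whether a page is at least well-formed XML before handing it to a strict parser is to run it through the XML reader built into .NET. The sketch below is only an illustration under a few assumptions: the XhtmlCheck class and the FindXmlError method are names made up for this example, it assumes .NET 4 or later (for the DtdProcessing setting), and it checks well-formedness only, which is less than what the W3C validator checks. Named entities such as &nbsp; may also be reported as errors, because the DTD is skipped rather than loaded.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of iTextSharp or ASP.NET.
public static class XhtmlCheck
{
    // Returns null when the markup is well-formed XML,
    // otherwise the message of the first XmlException raised by the reader.
    public static string FindXmlError(string markup)
    {
        var settings = new XmlReaderSettings
        {
            // Skip any DOCTYPE declaration instead of downloading the DTD.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        string good = "<html><body><p>Hello<br /></p></body></html>";
        string bad = "<html><body><p>Hello<br></p></body></html>"; // <br> is never closed

        Console.WriteLine(FindXmlError(good) ?? "well-formed");
        Console.WriteLine(FindXmlError(bad) ?? "well-formed");
    }
}
```

The first call reports the markup as well-formed; the second prints the reader's error message, which includes the line and position of the mismatched tag. Running a check like this on the rendered page makes the failure visible up front instead of discovering it as an empty PDF.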
http://www.google.com.au/search?source=ig&hl=en&rlz=&q=HtmlParser.Parse&meta=
Links to other products:
"Generate PDF from ASP.NET" gives a few references to different products, including iTextSharp.
Dynamically Generating PDFs in .NET : http://www.developerfusion.co.uk/show/6623/
Another option is to try (and possibly buy) the commercial product abcpdf.
I saw a suggestion to use the command-line version of HTMLDoc (http://www.htmldoc.org/) to convert HTML to PDF, but it is not well suited to programmatic access.
