PDF Generation Revisited

July 15, 2005 – 8:30 am

I gather from some of the comments and email that my earlier post about batch generation of PDFs from web pages wasn’t clear, so here’s my second try. The Software Carpentry course that I’m writing for the Python Software Foundation has about 35 web pages. I’d like to add something to my Makefile so that I can say make pdf and have each page turned into a PDF. (Currently, I do this by hand about once a week; it ain’t the ten minutes of tedium that bothers me, but what my students would think if they found out I’m doing a repetitive task by hand.)
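For what it's worth, here is a minimal sketch of the kind of Python driver a make pdf target could call. It only works out which pages need rebuilding; the conversion step is passed in as a function, because which converter to use is exactly the open question. The names stale_pages, build_pdfs, and convert are made up for illustration, and the layout (one directory of .html files, one of .pdf files) is an assumption.

```python
from pathlib import Path


def stale_pages(html_dir, pdf_dir):
    """Return (html, pdf) path pairs whose PDF is missing or older than its HTML."""
    html_dir, pdf_dir = Path(html_dir), Path(pdf_dir)
    pairs = []
    for html in sorted(html_dir.glob("*.html")):
        pdf = pdf_dir / (html.stem + ".pdf")
        # Rebuild when there is no PDF yet, or the page changed after the PDF was made.
        if not pdf.exists() or pdf.stat().st_mtime < html.stat().st_mtime:
            pairs.append((html, pdf))
    return pairs


def build_pdfs(html_dir, pdf_dir, convert):
    """Call convert(html, pdf) for each stale page; return the PDF paths rebuilt.

    `convert` is a placeholder for whatever tool ends up doing the real work.
    """
    Path(pdf_dir).mkdir(parents=True, exist_ok=True)
    rebuilt = []
    for html, pdf in stale_pages(html_dir, pdf_dir):
        convert(html, pdf)
        rebuilt.append(pdf)
    return rebuilt
```

The Makefile rule would then just run this script, so the same target works on both Windows and Linux as long as the converter itself does.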

So, option #1 is an HTML to PDF converter. There are several out there, but none of the open source ones I’ve been able to find will respect my stylesheets. Yes, I could investigate XSL-FO, but (a) I have 185 other issues to deal with, and (b) I’m morally opposed to duplicating style information.

Option #2: write a script to make Firefox do what I’ve been doing by hand, i.e., load, print, change printer to PDFCreator, “print”, click “OK” for the document title, specify where to save it, “OK”, repeat. A couple of people have said, “Oh, you can do that on Windows with the win32 module and COM,” but haven’t responded to my follow-up question, “Great—do you have some sample code I can tweak?” (Also, my build currently runs on both Windows and Linux, and I’d like to keep it that way if I can…)

Suggestions?

5 Responses to “PDF Generation Revisited”

  2. I’m going to go out on a limb, and say that you can’t do that with Firefox. As far as I could tell, Firefox doesn’t export a COM interface, nor does it include the XPCOM that Mozilla uses in place of COM.

    Having said that, I’m pretty sure that PDFCreator doesn’t run under Linux, so there’s no way of achieving your parenthesized requirement using the tools listed in option 2.

    If you install Mozilla, you should be able to use PyXPCOM to talk to the component described here: http://lxr.mozilla.org/seamonkey/source/docshell/base/nsIContentViewerFile.idl#62 passing in true for the “silent” parameter. There seem to be some docs for how to do that here: http://www-128.ibm.com/developerworks/webservices/library/co-pyxp2.html

    Or, change the print.always_print_silent preference to true in your copy of Firefox, which should allow the JavaScript solution to print automatically. (Found from here: http://blog.andreashalter.ch/archives/13-silent-printing.html Read the warning there before you do that.)

    Finally, you say “Yes, I could investigate XSL-FO, but […snip…] ‘Great—do you have some sample code I can tweak?’” and it’s starting to sound to me like you are looking for the Internet to solve your problem. While I completely understand that you’re very busy, I suspect that most of the people who reply are also busy, and this isn’t their problem. Which is just a longer way of saying that I don’t think the Internet is going to be particularly useful for you this time, and I think you’ll have to buckle down and do much of the work yourself if you want to get this automated. I hope I’m wrong on this, or if I’m not, that the links I’ve provided turn out to be useful.

    Good luck,
    Blake.

    By Blake Winton on Jul 15, 2005

  3. I’ve found a potential alternate solution. There’s apparently a free service at http://www.pdfonline.com/convert_pdf.asp which you could post to from a Python program. I have no idea how good their CSS support is, but it might be worth a try.

    Later,
    Blake.

    By Blake Winton on Jul 18, 2005

  4. And just because I can’t leave this alone…

    http://www.primopdf.com/ is a possible alternate print driver which might require fewer (by which I mean no) clicks to launch. It’s probably worth a look, and a possible email to them to ask if they have or are planning to have a Batch-Mode without the dialog.

    Later,
    Blake.

    By Blake Winton on Jul 27, 2005

  5. i like your blog, great!

    powerful pdf converters:

    http://www.pdf-to-html-word.com/pdf-to-html/

    By pdf converter on Aug 11, 2006

  6. great blog!

    there is another pdf converter:

    http://www.sharewarecheap.com/business-finance-word-processing/pdf-export-kit5160-35.htm

    By maggie on Aug 21, 2006
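One of the suggestions above is to set the print.always_print_silent preference to true. A rough sketch of doing that from Python follows, assuming the preference is set by adding a line to the user.js file in the Firefox profile directory. The function name and the idea of passing the profile directory in are assumptions for illustration; where the profile lives varies by platform and install, so the caller has to supply it. The warning on the linked page applies just as much here.

```python
from pathlib import Path

# Firefox reads user.js from the profile directory at startup.
PREF_LINE = 'user_pref("print.always_print_silent", true);'


def enable_silent_printing(profile_dir):
    """Append the silent-printing preference to user.js in a Firefox profile.

    Returns True if the line was added, False if it was already present.
    `profile_dir` is supplied by the caller; this sketch does not locate it.
    """
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    if PREF_LINE in existing:
        return False
    # Keep any existing preferences and make sure ours starts on its own line.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    user_js.write_text(existing + PREF_LINE + "\n")
    return True
```

Using a dedicated profile for the build, rather than the everyday one, would keep silent printing from affecting normal browsing.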
