Tuesday, 15 February 2011

Sitting on the .DOC of the bay....

It seems I have a lot of files in various word processor formats (doc, docx, wp, etc.) which I would like to turn into one unified format, and since editing will not be required in the future, I chose PDF.
Now it would be perfectly possible to 'print-to-PDF' using Mac OS X, and for a small number of files that would be fine. Alternatively, I could use OpenOffice's PDF export function. However, I got to wondering whether there was a solution that could be used by people who have no access to any PDF functionality in their operating system.

Turns out there are plenty of "PDF servers" available, which would make sense for larger organisations. However it is also possible to use OpenOffice from the command line to do the same thing. Here's how:

  • I am already running OS X Snow Leopard, which gives me the Terminal access I need. Java is also installed and up to date, as is the latest version of OpenOffice.
  • First we start OpenOffice in "headless" mode using Terminal:
iMac:~ simon$ /Applications/OpenOffice.org.app/Contents/MacOS/soffice.bin -headless -nofirststartwizard -accept="socket,host=localhost,port=8100;urp;StarOffice.Service" &

Don't forget the ampersand at the end - it runs the service in the background, which returns control of the terminal to the user once the service has started.
  • Use ps and netstat, each filtered through grep, to check it is running:
iMac:~ simon$ ps aux | grep soffice
simon 1810 42.4 1.2 515704 49620 s000 S 1:14pm 2:27.21 /Applications/OpenOffice.org.app/Contents/MacOS/soffice.bin -headless -nofirststartwizard -accept=socket,host=localhost,port=8100;urp;StarOffice.Service
simon 1789 0.0 1.1 514160 47400 ?? S 1:13pm 0:01.07 /Applications/OpenOffice.org.app/Contents/MacOS/soffice.bin -headless -nofirststartwizard -accept=socket,host=localhost,port=8100;urp;StarOffice.Service
simon 45573 0.0 0.0 252104 88 s000 R 1:20pm 0:00.00 (soffice.bin)
simon 45571 0.0 0.0 2435116 524 s000 S+ 1:20pm 0:00.00 grep soffice

iMac:~ simon$ netstat -an | grep 8100
tcp4 0 0 *.* LISTEN
  • Now it is necessary to find something that will allow us to interact with the service. I found JODConverter, which can be downloaded from http://www.artofsolving.com/opensource/jodconverter. A version is available which includes a bundled version of Apache Tomcat server.
  • I had to change the default port from 8080 to 1234, as I had something already running on 8080, but there are clear instructions within the package on how to do this.
  • Now we simply run the startup.sh script in the package's bin directory:
iMac:bin simon$ ./startup.sh
Using CATALINA_BASE: /Users/simon/Downloads/jodconverter-tomcat-2.2.2
Using CATALINA_HOME: /Users/simon/Downloads/jodconverter-tomcat-2.2.2
Using CATALINA_TMPDIR: /Users/simon/Downloads/jodconverter-tomcat-2.2.2/temp
Using JRE_HOME: /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
  • Now we can find the web application at http://localhost:1234/
  • Simple - a web-based PDF conversion that anyone can use! For the more terminally-minded amongst us, there is also a JODConverter Java application which can convert a whole batch of files in one go.
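
For anyone who wants to try the batch route from Terminal, here is a rough sketch of how it might look. It is not taken from the JODConverter package: the jar location, the function names and the JODCONVERTER_JAR and OFFICE_PORT variables are all my own assumptions, so adjust them to wherever you unpacked the download. It repeats the netstat check from earlier, then hands each file to the command-line jar as an input/output pair, relying on the jar connecting to port 8100 by default.

```shell
#!/usr/bin/env bash
# Sketch only: batch-convert documents to PDF through the headless
# OpenOffice service started earlier.
# ASSUMPTION: this jar path is hypothetical; point JODCONVERTER_JAR at the
# command-line jar inside your own JODConverter download.
JODCONVERTER_JAR="${JODCONVERTER_JAR:-$HOME/Downloads/jodconverter-2.2.2/lib/jodconverter-cli-2.2.2.jar}"
# Port given to soffice in the -accept option.
OFFICE_PORT="${OFFICE_PORT:-8100}"

# Read "netstat -an" output on stdin and succeed if some socket is in the
# LISTEN state on the given port. Accepts both the BSD/OS X address form
# (127.0.0.1.8100) and the Linux form (127.0.0.1:8100).
port_in_netstat() {
    grep -q "[.:]$1[[:space:]].*LISTEN"
}

# Print the PDF file name for a document: same directory, same base name,
# with the last extension replaced by .pdf.
pdf_name_for() {
    local path=$1 dir base
    case $path in
        */*) dir=${path%/*}/; base=${path##*/} ;;
        *)   dir=;            base=$path ;;
    esac
    printf '%s%s.pdf\n' "$dir" "${base%.*}"
}

# Convert every file given as an argument, one java call per file.
# A failed conversion is reported and the loop carries on.
convert_all() {
    if ! netstat -an | port_in_netstat "$OFFICE_PORT"; then
        echo "Nothing is listening on port $OFFICE_PORT; start soffice first." >&2
        return 1
    fi
    local doc target
    for doc in "$@"; do
        target=$(pdf_name_for "$doc")
        echo "Converting $doc -> $target"
        java -jar "$JODCONVERTER_JAR" "$doc" "$target" || echo "Failed: $doc" >&2
    done
}
```

Saved as, say, convert.sh (again a made-up name), it could be loaded with "source convert.sh" and then run as "convert_all *.doc *.docx". Nothing happens until convert_all is called, so sourcing the file is harmless.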