TIFF as output format

Discussion about all of the open source Xpdf tools
Posts: 10
Joined: Thu Sep 24, 2020 9:53 am


Post by ledaniel2 »

I've adapted xpdf version 4.02 to allow generation of multiple TIFF files (one per page) in the style of pdftoppm/pdftopng. The utility is called, perhaps unsurprisingly enough, pdftotiff and is developed straight from pdftoppm.

If you're interested in using/testing, please find attached a patch which adds the source file pdftotiff.cc and modifies the build files as necessary. The prerequisites are the header files "tiffio.h" and "tiff.h" as well as "libtiff.so"/"libtiff.a"; the build system tries to locate this development package in the same way as for libpng. I used the version provided by "libtiff5-dev" under Debian Stable, which is "5.5.0" or #define TIFFLIB_VERSION 20191103; other source or binary versions may work too.

All of the output formats of pdftoppm are supported; however, to get separated (CMYK) output the flag SPLASH_CMYK needs to be enabled when building the library and the utility (as for pdftoppm). The make target is called "pdftotiff" and is built by "all" if libtiff is found by the build system.

There is no manpage at present although if there were sufficient interest I could look into writing one. Also, alpha channels are not (yet) supported; I'm not very experienced with libtiff and I'm guessing that the alpha channel would have to be interleaved with the image data after the call to displayPage() and before the call to TIFFWriteEncodedStrip().


Code:

mkdir xpdftiff && cd xpdftiff
tar zxf ../path/to/xpdf-4.02.tar.gz
cp -r xpdf-4.02 xpdf-4.02.orig
ln -s xpdf-4.02 a
ln -s xpdf-4.02.orig b
unzip ../path/to/pdftotiff-4.02-patch.zip 
patch -p0 < pdftotiff-xpdf-4.02.patch 
cd xpdf-4.02 && mkdir build && cd build
cmake ..
make pdftotiff
Attachment contents: pdftotiff-xpdf-4.02.patch (11059 bytes), 3.68 KiB download
Posts: 1040
Joined: Wed Apr 05, 2017 6:57 pm

Re: TIFF as output format

Post by derekn »

I decided a while back not to do any more of the image-format-specific tools. I'm a big fan of the Unix Philosophy. In this case, I maintain pdftoppm, and then anyone can use the netpbm tools to convert to whatever output format they like. Otherwise we end up with pdftojpeg, pdftogif, etc etc. (In addition to pdftopng, which is already included, breaking my own rule.)

Having said that, Xpdf is open source, and you're more than welcome to build tools that are useful to you.

Regarding transparency, there are two issues:

(1) PDF specifies a final step in rendering a page: composite the contents onto an opaque background (the "paper"). That means the final output of rasterizing a page won't have any transparency. Xpdf has a hook to allow skipping that step: look at pdftopng.cc, and search for "setNoComposite". The results will depend on the PDF content. For example, some PDF files draw a filled opaque white rectangle behind text columns, so the output doesn't have any useful transparency info.

(2) As you pointed out, you'll need to feed the alpha data to libtiff. I've worked with libtiff, but I haven't looked into its transparency support, so I'm afraid I don't have any information for you there. But the writePNGData function in pdftopng.cc includes code to interleave alpha -- you may be able to use something similar with libtiff.
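
As a rough illustration of the interleaving step mentioned in (2), the following C++ sketch merges one row of color samples with one row of alpha samples. It is not taken from pdftopng.cc or from the patch: the function name interleaveAlphaRow and the assumption of 8-bit, pixel-interleaved samples are made up for this example. With libtiff, a row built this way would presumably be handed to the row or strip writer, with the extra channel declared through the TIFFTAG_EXTRASAMPLES tag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: merge one row of color samples (nComps per pixel,
// e.g. 1 for gray, 3 for RGB, 4 for CMYK) with one row of alpha samples
// (1 per pixel) into a single row with nComps + 1 samples per pixel.
std::vector<std::uint8_t> interleaveAlphaRow(const std::vector<std::uint8_t>& color,
                                             const std::vector<std::uint8_t>& alpha,
                                             std::size_t nComps) {
    assert(nComps > 0);
    assert(color.size() == alpha.size() * nComps);

    std::vector<std::uint8_t> out;
    out.reserve(alpha.size() * (nComps + 1));
    for (std::size_t x = 0; x < alpha.size(); ++x) {
        // Copy the color samples of pixel x ...
        for (std::size_t c = 0; c < nComps; ++c) {
            out.push_back(color[x * nComps + c]);
        }
        // ... then append its alpha sample.
        out.push_back(alpha[x]);
    }
    return out;
}
```

With nComps = 3 this gives RGBA sample order, and with nComps = 4 it gives CMYKA. Working one row at a time keeps the extra buffer small.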

Re: TIFF as output format

Post by ledaniel2 »

Thanks for the detailed reply and pointers. I will definitely look into adding alpha support following the method used in pdftopng.

I appreciate what you say about the Unix philosophy and understand that you don't want to over-complicate the toolset. I may look into hosting this patch elsewhere, however the functionality of separated CMYK with transparency is not currently supported by the toolset I believe?

(Realised after I'd posted that this was the wrong forum, Admin: please could you move this thread?)

Re: TIFF as output format

Post by ledaniel2 »

Okay, I've looked into this and have partial success: I can write a TIFF with transparency in grayscale and RGB modes, but with CMYK the alpha channel is not recognized/interpreted by my image viewer or the GIMP.

I've Googled around and this seems to be a limitation of libtiff rather than the file format; none of what I've looked at indicates a solution with any of the available tools/libraries.

Please could someone look at the code and confirm if it is a limitation of libtiff rather than a coding error of mine? Many thanks.
Attachment contents: pdftotiff-xpdf-4.02.patch (12498 bytes), 3.96 KiB download

Re: TIFF as output format

Post by derekn »

ledaniel2 wrote (Tue Sep 29, 2020 9:17 am): "the functionality of separated CMYK with transparency is not currently supported by the toolset I believe?"
That's true -- CMYK (without alpha) is supported in pdftoppm, and RGB with alpha is supported in pdftopng, but there's no option to generate CMYK + alpha.

Just out of curiosity, what is your use case for this?

I wonder if it might be possible to work with two separate image files: one with the CMYK data, and another with the alpha data.
ledaniel2 wrote (Wed Sep 30, 2020 11:35 am): "Okay, I've looked into this and have partial success: I can write a TIFF with transparency in grayscale and RGB modes, but with CMYK the alpha channel is not recognized/interpreted by my image viewer or the GIMP.

I've Googled around and this seems to be a limitation of libtiff rather than the file format; none of what I've looked at indicates a solution with any of the available tools/libraries.

Please could someone look at the code and confirm if it is a limitation of libtiff rather than a coding error of mine? Many thanks."

I'm not familiar enough with libtiff to be able to answer that one. Maybe someone else will take a look.

ledaniel2 wrote (Tue Sep 29, 2020 9:17 am): "(Realised after I'd posted that this was the wrong forum, Admin: please could you move this thread?)"

Re: TIFF as output format

Post by ledaniel2 »

Thanks again for your reply.
derekn wrote: "what is your use case for this?"
I do have some work experience in print/finishing but this was just a personal idea - I don't have an angry or impatient customer waiting! I suppose it is two things: symmetry between output color formats, and not having to reduce embedded CMYK images to RGB if background transparency was required.
derekn wrote: "I wonder if it might be possible to work with two separate image files: one with the CMYK data, and another with the alpha data."
Interesting. I note that the library provides the hook writeAlphaPGMFile, which appears to output the mask (only). I may add this as an "-alpha" (only) option to pdftoppm.cc for my purposes.

Of course any image editor worth its salt would allow loading/editing of the mask separately from the image data.
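
For anyone curious what an alpha-only PGM amounts to, here is a minimal C++ sketch of the binary (P5) PGM layout applied to an 8-bit alpha plane. It is not the library's writeAlphaPGMFile and not the code in the patch: the name encodeAlphaAsPGM and the choice to return the file contents as a string are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize an 8-bit alpha plane (one sample per
// pixel, row-major) as a binary PGM ("P5") image held in memory.
std::string encodeAlphaAsPGM(const std::vector<std::uint8_t>& alpha,
                             std::size_t width, std::size_t height) {
    assert(alpha.size() == width * height);

    // PGM header: magic number, width, height, maximum gray value.
    std::string pgm = "P5\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";

    // Raw samples follow the header, one byte per pixel.
    pgm.reserve(pgm.size() + alpha.size());
    for (std::uint8_t a : alpha) {
        pgm.push_back(static_cast<char>(a));
    }
    return pgm;
}
```

Writing the returned string to a file opened in binary mode would give a mask image that can sit next to the CMYK data as a second file.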

I am now satisfied that the mask is being written correctly by libtiff; as an experiment I used the alpha channel from a grayscale render instead but it still was not recognized when loading. With option "-no-lzw" the files are about 25% bigger, which was what I was expecting. Therefore it must be a limitation of the GIMP, which surprises me.

I may post a CMYKA TIFF from my pdftotiff either here and/or to the gimp-dev mailing list and see what others make of it.

Re: TIFF as output format

Post by ledaniel2 »

So, I gave up on the GIMP as it seems to not support CMYK separated images at all and strips the alpha channel when converting to RGB.

However Krita (under Windows in fact) DOES load and display both images attached here "correctly". I say correctly because they are visibly different; this proves my point (and much of the motivation for the patch) that a (gamma corrected) CMYK is different from the source/sibling RGB, and repeatedly converting from one to the other loses color information. Both versions were created from a PDF saved by Inkscape and converted with pdftotiff using identical options, one with "-cmyk" and one without.

So maybe I have misunderstood, and CMYK is meant to be a non-editable output/device format only, rather like PostScript? If anyone following this thread has other photo-editing software, could you compare these two files when loaded and visible?
Attachment: image-rgba.tif (43.83 KiB): Three-channel RGB plus alpha 738x738px 150dpi no anti-alias
Attachment: image-cmyka.tif (43.86 KiB): Four-channel CMYK plus alpha 738x738px 150dpi no anti-alias

Re: TIFF as output format

Post by derekn »

This isn't a direct answer, because I'm not sure I have any software that can handle CMYK+alpha TIFF files. I just want to point out that CMYK is, in general, more sensitive to color management than RGB. The open source version of Xpdf does not include color management (but our commercially licensed tools do). For example, if your source PDF file includes RGB content, the conversion from RGB to CMYK uses a very simple transform.
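
To give a sense of what a "very simple transform" can look like, here is one common naive RGB-to-CMYK conversion with full black replacement, sketched in C++. This is an assumption for illustration rather than a copy of Xpdf's code, and the names Cmyk8 and naiveRgbToCmyk are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit CMYK pixel (0 = no ink, 255 = full ink).
struct Cmyk8 {
    std::uint8_t c, m, y, k;
};

// Naive conversion with no color management: invert each channel, move
// the common gray component into K, and subtract it from C, M and Y.
Cmyk8 naiveRgbToCmyk(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const int c = 255 - r;
    const int m = 255 - g;
    const int y = 255 - b;
    const int k = std::min({c, m, y});

    // After black replacement at least one of C, M, Y is zero.
    assert(c == k || m == k || y == k);

    return Cmyk8{static_cast<std::uint8_t>(c - k),
                 static_cast<std::uint8_t>(m - k),
                 static_cast<std::uint8_t>(y - k),
                 static_cast<std::uint8_t>(k)};
}
```

A transform like this ignores ink behavior and device profiles entirely, which is one reason CMYK output produced this way can look different from the RGB source.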

Additionally, any software that displays CMYK is implicitly converting it back to RGB, and that also requires color management. You can't fix the bad RGB-to-CMYK conversion at that stage, but the profile being used will affect the RGB output.
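
As a sketch of the implicit conversion on the display side, this is roughly what a viewer without any color management might do to turn CMYK samples back into RGB. The formula is a common naive one, assumed here for illustration; the names Rgb8, inkToLight and naiveCmykToRgb are hypothetical, and a color-managed viewer would use a device profile instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit RGB pixel.
struct Rgb8 {
    std::uint8_t r, g, b;
};

// Combine one ink channel with black, clamp to full coverage, and invert
// the result to get the corresponding additive (light) channel.
std::uint8_t inkToLight(std::uint8_t ink, std::uint8_t k) {
    const int covered = std::min(255, static_cast<int>(ink) + static_cast<int>(k));
    assert(covered >= 0 && covered <= 255);
    return static_cast<std::uint8_t>(255 - covered);
}

// Naive CMYK-to-RGB conversion with no profile involved.
Rgb8 naiveCmykToRgb(std::uint8_t c, std::uint8_t m, std::uint8_t y, std::uint8_t k) {
    return Rgb8{inkToLight(c, k), inkToLight(m, k), inkToLight(y, k)};
}
```

Two programs that pick different formulas or profiles at this step will show the same CMYK file with visibly different colors.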

The summary is: I'm not at all surprised that the CMYK and RGB outputs look different.

Re: TIFF as output format

Post by ledaniel2 »


To build pdftotiff with working CMYK/alpha support, download the patch pdftotiff-4.02-patch2.zip and run "make pdftotiff" after patching and configuring xpdf version 4.02.

A prebuilt version "pdftotiff.exe", cross-compiled from Linux with i586-mingw32msvc-g++, is here if you can't/don't want to compile from source. Download and use at your own risk (tested under Debian/Wine32 and Windows 7 64-bit). It includes two necessary DLLs and was uploaded because pdftotiff is unlikely to become part of the xpdf toolset.

Any issues, please post here.

Re: TIFF as output format

Post by ledaniel2 »

Hi, quick update to patch version 3, which is likely to be final. It now compiles correctly without SPLASH_CMYK and also writes TIFF output with an alpha channel line by line, saving memory. A patch to pdftoppm.cc adds an "-alpha" option to save the alpha channel (only) as a PGM file.
Attachment contents: pdftotiff-xpdf-4.02.patch (15279 bytes), 4.54 KiB download