Add support for high bit depth multichannel images #1888

Open
wiredfool opened this issue May 5, 2016 · 44 comments · May be fixed by #8224
Comments

@wiredfool
Member

wiredfool commented May 5, 2016

Pillow (and PIL) is currently able to open 8 bit per channel multi-channel images (such as RGB), but it can only open higher bit depth images (e.g. I16, I32, or Float32 images) if they are single channel (e.g., grayscale).

Previous References

This has been requested many times: #1828, #1885, #1839, #1602, and farther back.

Requirements

  • We should be able to support common GIS formats as well as high bit depth RGB(A) images.
  • At least 4 channels, but potentially more (see Add tests for opening 2-5 layer uint16 greyscale TIFFs #1839)
  • Different pixel formats, including I16, I32, and Float.
  • There should be definitions for the array interface to exchange images with numpy/scipy
  • There should be enough support to read and write TIFFs and raw image data.
  • Support for resize, crop, and convert operations at the very least.

Background Reference Info

The rough sequence for image loading is:

  • Image file is opened

  • Each of the ImagePlugin _accept functions has a chance to look at the first few bytes to determine if it should attempt to open the file

  • The *ImagePlugin._open method is called, giving the image plugin a chance to read more of the image and determine if it still wants to consider it a valid image of its particular type. If it does, it passes back a tile definition which includes a decoder and an image size.

  • If there is a successful _open call, at some point later *ImagePlugin._load may be called on the image, which runs the decoder producing a set of bytes in a raw mode. This is where things like compression are handled, but the output of the decoder is not necessarily what we're storing in our internal structures.

  • The image is unpacked (Unpack.c) from the raw mode (e.g. I16;BS) into a storage (Storage.c) mode (I).

  • It's now possible to operate on the image (e.g. crop, pixel access, etc)

    There are 3 (or 4) image data pointers, as defined in Imaging.h:

struct ImagingMemoryInstance {

    /* Format */
    char mode[IMAGING_MODE_LENGTH]; /* Band names ("1", "L", "P", "RGB", "RGBA", "CMYK", "YCbCr", "BGR;xy") */
    int type;       /* Data type (IMAGING_TYPE_*) */
    int depth;      /* Depth (ignored in this version) */
    int bands;      /* Number of bands (1, 2, 3, or 4) */
    int xsize;      /* Image dimension. */
    int ysize;

    /* Colour palette (for "P" images only) */
    ImagingPalette palette;

    /* Data pointers */
    UINT8 **image8; /* Set for 8-bit images (pixelsize=1). */
    INT32 **image32;    /* Set for 32-bit images (pixelsize=4). */

    /* Internals */
    char **image;   /* Actual raster data. */
    char *block;    /* Set if data is allocated in a single block. */

    int pixelsize;  /* Size of a pixel, in bytes (1, 2 or 4) */
    int linesize;   /* Size of a line, in bytes (xsize * pixelsize) */

    /* Virtual methods */
    void (*destroy)(Imaging im);
};

The only one that is guaranteed to be set is **image, which is an array of pointers to row data.
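
As a rough illustration of the unpack step and the row-pointer layout described above, the following pure-Python sketch converts raw I;16BS data (big-endian signed 16-bit samples) into 32-bit storage, and keeps the image as a list of per-row buffers in the same spirit as **image pointing at row data. The helper names (unpack_i16bs_row, unpack_image) are hypothetical and do not correspond to functions in Unpack.c or Storage.c; this is only a model of the idea, not Pillow's implementation.

```python
import struct
from array import array


def unpack_i16bs_row(raw: bytes, xsize: int) -> array:
    """Unpack one line of big-endian signed 16-bit samples into wider ints."""
    samples = struct.unpack(f">{xsize}h", raw)
    # "i" is a signed C int, which is 4 bytes on common platforms.
    return array("i", samples)


def unpack_image(raw: bytes, xsize: int, ysize: int) -> list:
    """Return a list of rows, mimicking the array of row pointers in **image."""
    raw_linesize = xsize * 2  # two bytes per raw sample
    return [
        unpack_i16bs_row(raw[y * raw_linesize:(y + 1) * raw_linesize], xsize)
        for y in range(ysize)
    ]


# A 2x2 image: samples 1, -1, 256, -32768 in big-endian byte order.
raw = bytes([0x00, 0x01, 0xFF, 0xFF, 0x01, 0x00, 0x80, 0x00])
rows = unpack_image(raw, xsize=2, ysize=2)

# linesize of the storage mode is xsize * pixelsize, as in the struct comment.
storage_linesize = 2 * rows[0].itemsize
```

The raw line is half the size of the stored line here, which is the point made above: the decoder output is not necessarily what ends up in the internal structures.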

Changes Required

  • Definitions for all of the modes that we're planning, and potentially a [format];MB[#bands] style generic mode.

Core Imaging Structure

  • The imaging structure has the fields required to add the additional channels. (type, bands, pixelsize, linesize)
  • The **image pointer can be used for any width of pixel.
  • We may or may not want to set the **image32 pointer.
  • Currently type of IMAGING_TYPE_INT32 and IMAGING_TYPE_FLOAT32 imply 1 band. This will change.
  • Consider promoting int16 to IMAGING_TYPE_INT16

Storage

  • Updates to Storage.c, Unpack.c, Pack.c, Access.c, PyAccess.py, and Convert.c

Ways to Help

We need a better definition of the format requirements. What are the various types of images that are used in GIS, Medical, or other fields that we'd want to interpret? We need small, redistributable versions of images that we can test against.

[in progress]

@terramars

I'm having the same problem with 16 bit single-channel paletted TIFFs, created by GDAL. It would be "really" nice if Pillow could play nicely with GIS and scientific image formats, as GDAL is a pain in the ass and I'd rather not use it.

tiffinfo as follows:

TIFFReadDirectory: Warning, Unknown field with tag 33550 (0x830e) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 33922 (0x8482) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 34735 (0x87af) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 34737 (0x87b1) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 42113 (0xa481) encountered.
TIFF Directory at offset 0x34293c6 (54694854)
Image Width: 10774 Image Length: 12577
Bits/Sample: 16
Sample Format: unsigned integer
Compression Scheme: LZW
Photometric Interpretation: palette color (RGB from colormap)
Samples/Pixel: 1
Rows/Strip: 1
Planar Configuration: single image plane
Color Map: (present)
Tag 33550: 4.999617,4.999789,0.000000
Tag 33922: 0.000000,0.000000,0.000000,679006.067110,9955209.915048,0.000000
Tag 34735: 1,1,0,7,1024,0,1,1,1025,0,1,1,1026,34737,22,0,2049,34737,7,22,2054,0,1,9102,3072,0,1,32736,3076,0,1,9001
Tag 34737: WGS 84 / UTM zone 36S|WGS 84|
Tag 42113: 0
Predictor: horizontal differencing 2 (0x2)

@bodokaiser

Any updates on this?

@wiredfool
Member Author

Unfortunately, no.

@vfdev-5

vfdev-5 commented Feb 20, 2018

@wiredfool what do you think about adding support for multichannel images as a sequence of Image objects? For example, a 4 channel uint16 image is represented (more or less equivalently) by
['<PIL.Image.Image image mode=I;16 size=... >', '<PIL.Image.Image image mode=I;16 size=...>', ..., '<PIL.Image.Image image mode=I;16 size=...>']. By that I mean, maybe, providing a class that inherits from Image and tuple and overrides all methods to work on a tuple of images... Sure, it looks like a hack, but it could unlock more features (and create issues :) ), at least while working with Image.fromarray.
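
A minimal sketch of that idea, assuming nothing about Pillow internals: a tuple subclass that holds one single-channel image per band and forwards a single-band operation to each of them. The BandStack class and the crop_rows helper are hypothetical names used only for illustration, and plain nested lists stand in for real mode I;16 images so the example runs without Pillow.

```python
class BandStack(tuple):
    """Hypothetical container: one single-channel image per band."""

    def apply(self, func):
        """Run the same single-band operation on every band."""
        return BandStack(func(band) for band in self)


def crop_rows(band, left, upper, right, lower):
    """Crop a band stored as a list of rows (stand-in for a real image)."""
    return [row[left:right] for row in band[upper:lower]]


# Four 3x2 bands; nested lists stand in for single-channel images.
bands = BandStack(
    [[c, c + 1, c + 2], [c + 10, c + 11, c + 12]] for c in range(4)
)
cropped = bands.apply(lambda band: crop_rows(band, 1, 0, 3, 1))
```

With real images the call would look something like bands.apply(lambda im: im.crop(box)), which only works for operations that treat each band independently.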

@wiredfool
Member Author

To do anything useful with it, we'd have to have support in the C layer, so it would have to be at the core imaging layer, and especially Unpack/Pack.

@vfdev-5

vfdev-5 commented Feb 21, 2018

@wiredfool following your "Ways to help",

We need a better definition of the format requirements. What are the various types of images that are used in GIS, Medical, or other fields that we'd want to interpret?

For GIS, as there is a huge number of different formats (for example, the gdal format list), this can be left to GIS libraries such as gdal, rasterio etc.
However, support in Image.fromarray for multi-channel (3,4,5,...) input arrays of dtype np.uint16 or np.float32 would be, imho, essential.

We need small, redistributable versions of images that we can test against.

For GIS imagery, this can be easily created manually with gdal, rasterio.

I would like to give a hand on this, so, feel free to ask me.

@edowson

edowson commented Jun 7, 2018

PIL cannot handle processing multi-channel images. They get truncated to 3-ch images if you perform any transformation using PIL. #3160

@akinuri

akinuri commented Jun 8, 2018

@bthorsted

What is the status of this issue? It has been almost three years since the first proposal. I am unfortunately unable to provide any help since I have zero experience with coding in C, but I am among the people who are awaiting support for e.g. multi-channel floating-point images (with the possibility of negative pixel values). This is especially useful in deep learning, where it is preferable to have all values normalized with zero mean. PIL has some really awesome ImageOps, which is one of the reasons for wanting this support.

@hugovk
Member

hugovk commented Feb 17, 2019

@bjtho08 No updates.


#2485 links to a multipage RGB TIFF containing float64 values.

@omaghsoudi

omaghsoudi commented Jul 5, 2019

Please fix the issue with multi-channel 16 bit images.
Thank you!

@aclark4life
Member

@cgohlke Does any of your code here potentially help us by way of example to implement high bit depth multichannel in Pillow? https://github.com/cgohlke/tifffile/blob/master/tifffile/_imagecodecs.py

Thanks for any info

@aclark4life
Member

Via @wiredfool , thanks!

  • I think that there's a good argument for planar image storage, i.e. r/g/b in separate arrays. Any single band calculation would just work, and the more complicated modes (e.g., channels with different bit depth) would be trivial to add, as they would essentially just be part of a list of planes. It would complicate the shufflers, and especially those image formats that currently just splat into an array without using the packer/unpacker. It's also less useful for luminance style calculations, though it's possible. There's definitely a tension in image formats on the interleaved vs planar approach, and I suspect it comes down to "one is easier for basic images, and one is more general."

  • I think there's a super strong argument for being able to have our storage be directly compatible with the arrow memory layout. I'm unclear if we could have arbitrary structs there, if we'd just want a linear array of one datatype, or if we'd want to do a tensor layout, or what the mechanics are for a dataframe style interop. Arrow + the evolution of the array interface would give us 0 copy interaction with polars/pandas2 and anything else in the new data space.

  • I think that interleaved storage with anything more than 1|3|4 channel x [list of pixel storage modes] is going to be a pain.

  • GIS is going to be a pain. I'd still recommend using gdal backed (e.g. rasterio) readers/writers for that, as we've got 0 support for pyramids, spatial metadata, and tiled tiffs. It's a huge field, and we're not even at square 1 for it.

So looking at that, I think there are two definite possibilities for progress.

  1. Planar Image Storage, in parallel with the current interleaved image storage. There's probably a couple of core bits here that would need to be in C, but most could probably be done at the Image.py layer.
  2. Arrow as a core storage interface. This is going to be all C, with a very small shim for the dataframe interface.
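
To make the interleaved vs planar distinction concrete, here is a small sketch using only the standard library array module. It assumes all bands share one sample type; the function names to_planar and to_interleaved are hypothetical and are not Pillow APIs.

```python
from array import array


def to_planar(interleaved: array, bands: int) -> list:
    """Split interleaved samples (c0 c1 c2 c0 c1 c2 ...) into one array per band."""
    if len(interleaved) % bands:
        raise ValueError("sample count is not a multiple of the band count")
    return [interleaved[b::bands] for b in range(bands)]


def to_interleaved(planes: list) -> array:
    """Merge equally sized planes of one sample type back into pixel order."""
    count = len(planes)
    out = array(planes[0].typecode, [0]) * (count * len(planes[0]))
    for b, plane in enumerate(planes):
        out[b::count] = plane  # every count-th slot belongs to band b
    return out


# Two RGB pixels with uint16 samples: (100, 200, 300) and (101, 201, 301).
pixels = array("H", [100, 200, 300, 101, 201, 301])
planes = to_planar(pixels, bands=3)
round_trip = to_interleaved(planes)
```

Single-band work only needs one entry of planes, whereas anything that reads whole pixels has to gather one sample from every plane, which is the tension described above.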

@aclark4life
Member

Also possibly of interest: https://github.com/girder/large_image

@wiredfool
Member Author

FWIW, some references on Arrow.

@aclark4life
Member

aclark4life commented May 29, 2024

Can anyone suggest some test data we can use to develop this feature? This event is happening tomorrow and it would be nice to have a success target in mind e.g. "If we can read/write this type of data …" https://www.meetup.com/dcpython/events/301086016/

@rbavery

rbavery commented May 30, 2024

I think that interleaved storage with anything more than 1|3|4 channel x [list of pixel storage modes] is going to be a pain.

In case it isn't too much pain to work with more than 4 bands, we host this example subset of Eurosat, here is an example image s3://wherobots-examples/data/eurosat_small/Highway/Highway_1.tif.

Each image is 13 bands, uint16, planar

>>> tiff_image = tifffile.TiffFile("Highway_1.tif")
>>> print(tiff_image.pages[0].tags['PlanarConfiguration'].value)
PLANARCONFIG.CONTIG
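
For reference when reading that output: in TIFF, a PlanarConfiguration of 1 (CONTIG) means the samples of one pixel are stored next to each other, and 2 (SEPARATE) means each band is stored as its own plane. The sketch below shows how the logical position of a sample differs between the two, ignoring strips, tiles and compression. The sample_index function and the 4x3 image size are assumptions made for illustration; only the 13 bands and uint16 sample type come from the comment above.

```python
CONTIG = 1    # samples of one pixel are adjacent (interleaved)
SEPARATE = 2  # each band is stored as its own plane


def sample_index(x, y, band, width, height, bands, planar_config):
    """Logical sample position, ignoring strips, tiles and compression."""
    if planar_config == CONTIG:
        return (y * width + x) * bands + band
    if planar_config == SEPARATE:
        return (band * height + y) * width + x
    raise ValueError(f"unknown PlanarConfiguration: {planar_config}")


BYTES_PER_SAMPLE = 2  # uint16
width, height, bands = 4, 3, 13  # hypothetical image size, 13 bands

# Where does band 1 of the top-left pixel live in each layout?
contig_offset = sample_index(0, 0, 1, width, height, bands, CONTIG) * BYTES_PER_SAMPLE
separate_offset = sample_index(0, 0, 1, width, height, bands, SEPARATE) * BYTES_PER_SAMPLE
```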

@aclark4life
Member

@wiredfool If we use Arrow that implies adding a dependency on pyarrow, ideally optionally via extras like pip install pillow[arrow], correct?

@wiredfool
Member Author

@aclark4life Maybe. There's definitely a C-only implementation (nanoarrow) that might be what we want, since all of our image allocations are in the C layer now. PyArrow might be easier for integration/interop at the high level, but my sense here is that it wouldn't necessarily be giving us a whole lot that we'd not already have with a C arrow implementation + our usual set of accessors.

@aclark4life
Member

aclark4life commented Jun 20, 2024

Folks interested in this issue, please test #8224 and give feedback, thanks all

@MyleneSimon

Hi, any updates on a potential merge of #8224? If I understand properly it would add support for reading pyramidal tiffs?
Our use case is that we deal with scientific images that are stored as pyramidal tiled tiff, and a lot of ML libraries and some image annotation platforms such as CVAT use Pillow as a reader or expect Pillow images as an input, and so we find ourselves having to "hack" some of the pipelines/code bases to support our images. Pyramidal tiff support would be really helpful.

@aclark4life
Member

@MyleneSimon No, but I haven't given up! May need to move ETA for more progress to Q1 2025. What we really need is some code reviews and opinions about which direction to go in… thanks for the interest.

@bigcat88
Contributor

Can the community somehow help with the work on this feature?

It may be possible to divide the work into minimal parts and start implementing and testing them, at least for the parts that can be separated, if any.

@wiredfool
Member Author

@MyleneSimon I don't think that the #8224 PR would add support for pyramids, that's an orthogonal issue to the high bit depth problem. It may be possible to read individual images in a pyramid now (assuming that they're a format compatible with our current platform), but I highly doubt that there's a way to save them. It would depend on being able to select the subimage, which may or may not be possible.

@wiredfool
Member Author

wiredfool commented Jan 24, 2025

After some further thought along the lines of #1888 (comment) and the work that I've been doing with the Arrow support, I'm pretty sure that the best/easiest way forward is to use planar image storage for complicated images.

  • Planar image storage is where the individual channels are stored separately, so that all of a channel's pixels are in one contiguous block.
  • Interleaved image storage is what we have now, where individual channels in a pixel are adjacent.

(There are also tiled images, which are very useful for large tiffs, where each nxn block is stored in a chunk. Ideally what we're doing here would support extensions in that direction, but it's not a primary goal.)

Planar image storage would get us:

  • Arbitrary number of image channels in uint8, uint16, int32 and F32
  • No storage issues mixing arbitrary channel formats, e.g. RGBxuint16 + 1 bit alpha channel, or CMYKA
  • No complicated dynamic C to support arbitrary image depth combinations, and no requirement to enumerate them all beforehand. (at least for storage)
  • Subsampling for individual channels should be easy to support, at least at the storage level.

The complexity would be any place where we have multi-channel ops that can't be decomposed into the same op on all channels independently. Shuffle, Un/pack, Luminance/Color Space Conversion are the ones that jump out at me.
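
A small sketch of both points, using only the standard library: planes with different sample types living side by side, a channel-independent op that runs on each plane on its own, and a luminance op that needs three planes at once. The dict-of-arrays layout and the invert and luminance helpers are assumptions for illustration, not a proposed Pillow API.

```python
from array import array

# Planes may differ in sample type: 16-bit colour channels plus an 8-bit alpha.
planes = {
    "R": array("H", [65535, 0, 0, 32768]),
    "G": array("H", [0, 65535, 0, 32768]),
    "B": array("H", [0, 0, 65535, 32768]),
    "A": array("B", [255, 255, 0, 128]),
}


def invert(plane: array) -> array:
    """Channel-independent op: needs only the plane itself."""
    max_value = (1 << (8 * plane.itemsize)) - 1
    return array(plane.typecode, (max_value - v for v in plane))


def luminance(r: array, g: array, b: array) -> array:
    """Multi-channel op: every output sample needs R, G and B together."""
    # ITU-R 601-2 luma weights, the ones Pillow documents for RGB to L.
    return array(
        r.typecode,
        ((299 * rv + 587 * gv + 114 * bv) // 1000 for rv, gv, bv in zip(r, g, b)),
    )


# Decomposes cleanly: the same function is applied to each plane in turn.
inverted = {name: invert(plane) for name, plane in planes.items()}

# Does not decompose: the caller has to know which planes mean what.
gray = luminance(planes["R"], planes["G"], planes["B"])
```

The invert call never has to know how many channels exist or what they mean, while luminance depends on channel labels, which is why labeling at the Python image object level matters for the multi-channel ops.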

As a general guide to how I think this could be done, I see three phases:

  1. Python level handling for assembling planar images out of arbitrary single channel core images.

    • Support for all core functions that are channel independent. e.g., everything like resize and kernel, but not color conversion.
    • Support for zerocopy dataframe version of fromarrow / __arrow_c_array__ for existing planar arrays corresponding to channels.
    • Potentially non-zerocopy of interleaved -> planar storage.
    • Channel labeling at the python image object level.
  2. Support for shuffle/unpack for appropriate file load/save

    • tiff
    • png
    • webp/jpeg2k/...
  3. Support for multichannel ops,

    • grayscale conversion
    • color management
    • img.paste

I'm breaking this out this way because I think 2 and 3 will be incremental, multi part things, but part 1 can be done in one chunk and be complete enough to be able to test on its own, where part 2 requires the basic infra to already be available.
