Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebP: Some images are throwing an ArrayIndexOutOfBoundsException when reading #739

Open
wladimirleite opened this issue Mar 16, 2023 · 18 comments

Comments

@wladimirleite
Copy link
Contributor

wladimirleite commented Mar 16, 2023

Describe the bug
I found a few WebP images that seem valid that are throwing an ArrayIndexOutOfBoundsException when reading.

Version information

  1. The version of the TwelveMonkeys ImageIO library in use:
    3.9.4. I also tested with the master branch.

  2. The exact output of java --version:
    openjdk version "11.0.13" 2021-10-19 LTS
    OpenJDK Runtime Environment (build 11.0.13+8-LTS)
    OpenJDK 64-Bit Server VM (build 11.0.13+8-LTS, mixed mode, sharing)

  3. Extra information about OS version, server version, standalone program or web application packaging, executable wrapper, etc.
    I tested in Windows 11.

To Reproduce
Run a simple standalone program (below), using the WebP ImageIO plugin.

Expected behavior
As the sample images I found seem to be valid (i.e. they are read without any trouble by ImageMagick, Google Chrome etc.), they should be read.

Example code

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class WebPReadTest {
    public static void main(String[] args) throws IOException {
        File input = new File("1-oob.webp"); // Input image
        File output = new File("1.png"); // Output image

        BufferedImage image = ImageIO.read(input);
        ImageIO.write(image, "png", output);
    }
}

Sample files
sample-webp-images.zip

Stack trace
Running with the test program using 3.9.4, I get the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
	at java.desktop/sun.awt.image.ByteInterleavedRaster.setDataElements(ByteInterleavedRaster.java:541)
	at com.twelvemonkeys.imageio.plugins.webp.lossless.VP8LDecoder.decodeBwRef(VP8LDecoder.java:307)
	at com.twelvemonkeys.imageio.plugins.webp.lossless.VP8LDecoder.decodeImage(VP8LDecoder.java:223)
	at com.twelvemonkeys.imageio.plugins.webp.lossless.VP8LDecoder.readVP8Lossless(VP8LDecoder.java:131)
	at com.twelvemonkeys.imageio.plugins.webp.WebPImageReader.readVP8Lossless(WebPImageReader.java:644)
	at com.twelvemonkeys.imageio.plugins.webp.WebPImageReader.readVP8Lossless(WebPImageReader.java:638)
	at com.twelvemonkeys.imageio.plugins.webp.WebPImageReader.read(WebPImageReader.java:429)
	at java.desktop/javax.imageio.ImageIO.read(ImageIO.java:1468)
	at java.desktop/javax.imageio.ImageIO.read(ImageIO.java:1315)
	at WebPReadTest.main(WebPReadTest.java:12)

Additional context
This seems to be a very rare issue, as it affects about 1/5000 of the test images I am using.

@wladimirleite
Copy link
Contributor Author

This seems to be a simple issue, and I already found a possible way to fix it.
I will try to find more images that cause the same exception and will submit a PR later.

@wladimirleite
Copy link
Contributor Author

wladimirleite commented Mar 16, 2023

After finding more images that cause the same (or a similar) exception, I realized that the issue is not as simple as I thought.
I will investigate it in more detail before proposing a fix.

@haraldk
Copy link
Owner

haraldk commented Mar 18, 2023

Thanks for looking into this! Feel free to share what you have found so far, maybe I can help solving the case! 😀

@Laserwolve
Copy link

I've got an 1024x1024 WEBP. When resized to 1019x1019 or below, I don't get this error, but when resized to 1020x1020 or above, I do. Both are included.

Due to the verbiage of AWT's setDataElements method I assumed it was purely image size related, however the included flower image resized from Google's WebP Image Gallery is 1024x1024 and causes no problems.

images.zip

I'm in way over my head but I'll continue investigating

@wladimirleite
Copy link
Contributor Author

Other 3 WEBP files that cause the issue.
more-webp-samples.zip

@Flyfish233
Copy link

Flyfish233 commented Dec 13, 2023

It only affects small webp images for my project, since my project uses several 1KB~10MB files, but ArrayIndexOutOfBoundsException mostly happens when reading less than ~700KB images, but I also find 2 large files. Found 878 errors on a nearly 1200 file project.
Hope it works, archive contains 25 small files that cause this problem.

error-samples.zip

large-error-sample.zip

@haraldk
Copy link
Owner

haraldk commented Dec 13, 2023

Thanks @Flyfish233 and @wladimirleite for samples! I'll investigate. 😀

@FriedrichF
Copy link

Something new on this issue?
I also encountered this for different files.

@haraldk
Copy link
Owner

haraldk commented Mar 12, 2024

@FriedrichF Sorry, no. I did some testing and can reproduce the problem. I have some ideas what this might be, wasn't able to fix the problem back then. I'm pretty busy these days, and my current funding is about 30min/month... 😛

So, if anyone can spend some quality time on this, that would be much appreciated!

@rabanti-github
Copy link

rabanti-github commented Mar 25, 2024

Hello,
I don't know whether this is somehow helpful, but I tried to track down the ArrayIndexOutOfBoundsException, using the sample images.

There seems to be two issues:

  1. decodeBwRef (Vp8LDecoder.java) has side effects on the parameters y and ySrc. Both can become bigger than the zero-based index of the image height (e.g. 512 instead of 511)
  2. decodeImage (Vp8LDecoder.java) has another side effect on y in line 229 --> y = y + ((x + length) / width); This causes an exception on decodeLiteral(raster, colorCache, curCodeGroup, rgba, y, x, code); because y is beyond the zero based index of the image height (e.g. 516 instead of 511)
    The second case only occurs after the first case is fixed by skipping or setting the index back to length -1, e.g.:
            if (y >= raster.getHeight()) {
                y = raster.getHeight() - 1;
            }
            if (ySrc >=raster.getHeight() ) {
                ySrc = raster.getHeight() - 1;
            }
                raster.getDataElements(xSrc++, ySrc, rgba);
                raster.setDataElements(x, y, rgba);

When trying to fix the second case by something before decodeLiteral(…), like:

                   if (y >= huffmanInfo.huffmanMetaCodes.getHeight()){
                       break;
                   }

… the function terminates without an exception.

However, doing these fixes, the resulting image has broken alpha. The alpha area is black instead of transparent and there are also blocky artefacts around the non-transparent pixels (see attachment). Other approaches to fix the indices lead to similar results, where the alpha area was scrambled and shifted into other areas of the image.

My current conclusion is that manipulating the passed y parameter in decodeBwRef is the main reason for the exceptions. The second reason is again the manipulation of y in the for loop of decodeImage.
This all seems to apply only if alpha is involved and if the image has a height of 512 or bigger. And it happens a lot, because the pipeline I use to automatically decode WebP images that where previously encoded by ImageIO, fails regularly.
Unfortunately, I cannot do a lot more for now, since I don't have the expertise on that format (there is a reason why this format was developed by probably top engineers of Google) and debugging is quite hard because of the convoluted logic.
test2-fail

@haraldk
Copy link
Owner

haraldk commented Mar 26, 2024

Thanks for looking into this!

This is pretty much where I'm at too, at the moment. I can make the exceptions go away (or have them pop up in other places), but it doesn't really fix the problem. We need to understand why the coordinates were manipulated in this way in the first place, to get the desired output...

Unfortunately, I cannot do a lot more for now, since I don't have the expertise on that format (there is a reason why this format was developed by probably top engineers of Google) and debugging is quite hard because of the convoluted logic.

Fully understandable. And absolutely agree. 😕

@rabanti-github
Copy link

rabanti-github commented Mar 26, 2024

I have some other ideas. Maybe I can find time to check the one or other:

  • One could reverse engineer the encoding process of the images (png --> webp). Since this process never seems to fail, maybe there is a hint, how to handle the last y-block correctly. However, maybe there is already a problem on encoding, because the issue 864 mentions a problem in the lower, mostly right area of some images (was also reported to me by users of our organization). Maybe other decoders are more resilient on unexpected block data
  • Maybe there is a reference implementation in C# or C++ (e.g. ImageMagick) that could give a hint about decoding of alpha
  • Most likely just stopping on out-of-bound indices (like tried) is not sufficient. Maybe the huffmanYSize is to split up into a final chunck when the full block size leads to out of bound. However, if not already handled in some way, this would probably fail a lot more, also on smaller images. But the image heights (e.g. 512) are a little bit suspicious regarding a mathematical mismatch
  • Maybe there is also an issue about a sub-raster that is not handled correctly. What comes into my mind is that I have seen images where the alpha and transparency values were modulated and not exactly overlapping (alpha raster with 100% value was bigger than the the RGB sub-raster with visible pixels)

These are just some thoughts

@rabanti-github
Copy link

Here is a small update.
I tried to read myself into the webp specs, and had a thorough look into the file header section.
What can be ruled out is the ALPH chunk. After some analysis, I though that the error comes from filtering in the alpha definition. I suspected color gradient filtering flailing on border values of an image.
However, looking at more failing samples showed that both formats "VP8X" (advanced) with "ALPH" definition and "VP8L" (lossless) without explicit "ALPH" definition are affected.
I try to look into the VP8L samples next, especially the alpha processing.

@rabanti-github
Copy link

I could check some further things.
It may be that the problem comes from the class HuffmanTable.java, since this is the foundation of lz77decode, what is used in decodeBwRef.
After some further searches, I stumbled across the following article: How to Detect BLASTPASS Inside a WebP File. It describes a bug about an overflow issue that sounds terribly similar to our problem here.
There is also the C implementation for Huffman trees, used in the WebP library linked and described how to detect the issue.
However, until now, I was not able to adapt the described check into HuffmanTable.java, since the C approach seems to be quite different from the Java approach in HuffmanTable.java.
Maybe somebody has better knowledge about that.

@haraldk
Copy link
Owner

haraldk commented Apr 11, 2024

While I don't think the errors we see are related to the CVE, and Java's memory handling should effectively guard against such attacks in our code, it's still an interesting read.

And yes, the problem might be in the Huffman decoding class, so it's worth debugging and perhaps looking for differences from the C implementation.

@dannyxu2015
Copy link

@wladimirleite the issue not fixed yet? found same issue when process some webp files

@Chris-SP365
Copy link

And yes, the problem might be in the Huffman decoding class, so it's worth debugging and perhaps looking for differences from the C implementation.

In my limited time I did some further investigations. This is a debug output from this lib (BW REF means Backward Reference) Kermit image:

DECODE LITERAL: raster: 16/16 code: 0 x/y: 0/0
DECODE LITERAL: raster: 16/16 code: 1 x/y: 1/0
BW REF DECODE:  raster: 16/16 code: 271 x/y: 2/0 width: 16 length: 206 distancePrefix: 1 distanceCode: 2
DECODE LITERAL: raster: 16/16 code: 0 x/y: 0/13
BW REF DECODE:  raster: 16/16 code: 263 x/y: 1/13 width: 16 length: 16 distancePrefix: 0 distanceCode: 1
BW REF DECODE:  raster: 16/16 code: 265 x/y: 1/14 width: 16 length: 31 distancePrefix: 1 distanceCode: 2
DECODE LITERAL: raster: 512/512 code: 255 x/y: 0/0
DECODE LITERAL: raster: 512/512 code: 0 x/y: 1/0
BW REF DECODE:  raster: 512/512 code: 279 x/y: 2/0 width: 512 length: 4095 distancePrefix: 1 distanceCode: 2
BW REF DECODE:  raster: 512/512 code: 279 x/y: 1/8 width: 512 length: 4095 distancePrefix: 1 distanceCode: 2
BW REF DECODE:  raster: 512/512 code: 279 x/y: 0/16 width: 512 length: 4095 distancePrefix: 1 distanceCode: 2
BW REF DECODE:  raster: 512/512 code: 279 x/y: 511/23 width: 512 length: 3584 distancePrefix: 1 distanceCode: 2

This is from the C implementation (please ignore x/y, It's always 0/0 here):

DECODE LITERAL: raster: 16/16 code: 0 x/y: 0/0
DECODE LITERAL: raster: 16/16 code: 1 x/y: 0/0
BW REF DECODE:  raster: 16/16 code: 271 x/y: 0/0 width: 16 length: 206 distancePrefix: 1 distanceCode: 2
DECODE LITERAL: raster: 16/16 code: 0 x/y: 0/0
BW REF DECODE:  raster: 16/16 code: 263 x/y: 0/0 width: 16 length: 16 distancePrefix: 16 distanceCode: 1
BW REF DECODE:  raster: 16/16 code: 265 x/y: 0/0 width: 16 length: 31 distancePrefix: 1 distanceCode: 2
DECODE LITERAL: raster: 512/512 code: 255 x/y: 0/0
DECODE LITERAL: raster: 512/512 code: 0 x/y: 0/0
BW REF DECODE:  raster: 512/512 code: 279 x/y: 0/0 width: 512 length: 4095 distancePrefix: 1 distanceCode: 2
BW REF DECODE:  raster: 512/512 code: 279 x/y: 0/0 width: 512 length: 4095 distancePrefix: 1 distanceCode: 2
BW REF DECODE:  raster: 512/512 code: 279 x/y: 0/0 width: 512 length: 4095 distancePrefix: 1 distanceCode: 2
ReadPackedSymbols: >= BITS_SPECIAL_MARKER
BW REF DECODE:  raster: 512/512 code: 279 x/y: 0/0 width: 512 length: 4095 distancePrefix: 1 distanceCode: 2

Notice the difference in the last line. Java reads a length: 3584 and C reads length: 4095 (and continues reading 4095 for around 50 times).

It happens after the C implementation calls ReadPackedSymbols the first time: https://github.com/webmproject/libwebp/blob/main/src/dec/vp8l_dec.c#L210

Since ReadPackedSymbols resets the bit stream position in both cases (if/else) I guess this is the difference to the Java implementation here.

But since this Hufmann implementation differs a lot from the original C implementation and I couldn't find a good WEBP reference regarding Packed Symbols and my time is limited, I can only write that down for others to pick up.

@haraldk
Copy link
Owner

haraldk commented Aug 5, 2024

Thanks @Chris-SP365!

I'll try to use your findings in further debugging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants