Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HMAC errors not reported when reading compressed and encrypted files (AE-2) #478

Open
cschreib-ibex opened this issue Jan 3, 2025 · 4 comments
Labels
bug libzip doesn't behave as expected. feedback Waiting for feedback from submitter.

Comments

@cschreib-ibex
Copy link

cschreib-ibex commented Jan 3, 2025

Describe the Bug
It appears that libzip is not fully checking for checksums / CRCs when reading data from compressed (DEFLATE) and encrypted (AES-256) files.

I wrote a test that creates a zip buffer in memory, then loops over each byte of that buffer, manually corrupting the byte, and trying to extract back the data. I expect either:

  • the data fails to extract because libzip reports an error
  • the data extracts without error and the extracted content is as expected

This works as expected for:

  • uncompressed and unencrypted
  • compressed and unencrypted
  • uncompressed and encrypted

But when enabling both compression and encryption, some corrupted bytes go through undetected (no reported error), leaving me with a seemingly good file with content that is actually corrupted.

Expected Behavior

Testing corruption without compression and without encryption
 --> all corrupted bytes were detected; PASSED.
Testing corruption with compression and without encryption
 --> all corrupted bytes were detected; PASSED.
Testing corruption without compression and with encryption
 --> all corrupted bytes were detected; PASSED.
Testing corruption with compression and with encryption
 --> all corrupted bytes were detected; PASSED.

Observed Behavior

Testing corruption without compression and without encryption
 --> all corrupted bytes were detected; PASSED.
Testing corruption with compression and without encryption
 --> all corrupted bytes were detected; PASSED.
Testing corruption without compression and with encryption
 --> all corrupted bytes were detected; PASSED.
Testing corruption with compression and with encryption
  byte 70: CRITICAL: mismatched content size
  byte 75: CRITICAL: mismatched content size
  byte 76: CRITICAL: mismatched content size
  byte 82: CRITICAL: mismatched content size
  byte 84: CRITICAL: mismatched content size
  byte 85: CRITICAL: mismatched content size
  byte 86: CRITICAL: mismatched content size
  byte 87: CRITICAL: mismatched content size
  byte 88: CRITICAL: mismatched content size
  byte 89: CRITICAL: mismatched content size
  byte 90: CRITICAL: mismatched content size
  byte 91: CRITICAL: mismatched content size
  byte 92: CRITICAL: mismatched content size
  byte 93: CRITICAL: mismatched content size
  byte 94: CRITICAL: mismatched content size
  byte 97: CRITICAL: mismatched content size
  byte 101: CRITICAL: mismatched content
  byte 102: CRITICAL: mismatched content
  byte 104: CRITICAL: mismatched content size
  byte 106: CRITICAL: mismatched content
  byte 107: CRITICAL: mismatched content
  byte 108: CRITICAL: mismatched content
 --> some corrupted bytes were not detected; FAILED.

To Reproduce

See source code below.

zip_test.c
#include <zip.h>
#include <stdlib.h>
#include <string.h>

void report_error(zip_error_t* error)
{
    printf("error: %s", zip_error_strerror(error));
    zip_error_fini(error);
}

typedef struct
{
    char* buffer;
    zip_int64_t length;
} sized_buffer;

const char* expected_file_content = "The quick brown fox jumps over the lazy dog";

int create_archive_buffer(int compress, const char* password, sized_buffer* output)
{
    // Create empty archive with source buffer.
    zip_error_t source_error;
    zip_error_init(&source_error);
    zip_source_t* archive_source = zip_source_buffer_create(NULL, 0, 0, &source_error);
    if (archive_source == NULL)
    {
        report_error(&source_error);
        return 1;
    }

    zip_t* archive = zip_open_from_source(archive_source, ZIP_TRUNCATE, &source_error);
    if (archive == NULL)
    {
        report_error(&source_error);
        zip_source_free(archive_source);
        return 1;
    }

    zip_source_keep(archive_source);

    // Add a single file with expected content.
    zip_source_t* file_source =
        zip_source_buffer_create(expected_file_content, strlen(expected_file_content), 0, &source_error);
    if (file_source == NULL)
    {
        report_error(&source_error);
        zip_close(archive);
        zip_source_free(archive_source);
        return 1;
    }

    zip_error_fini(&source_error);

    zip_int64_t index = zip_file_add(archive, "test.txt", file_source, ZIP_FL_OVERWRITE);
    if (index < 0)
    {
        report_error(zip_get_error(archive));
        zip_source_free(file_source);
        zip_close(archive);
        zip_source_free(archive_source);
        return 1;
    }

    int error = 0;
    if (compress)
    {
        error = zip_set_file_compression(archive, index, ZIP_CM_DEFLATE, 0);
        if (error != 0)
        {
            report_error(zip_get_error(archive));
            zip_close(archive);
            zip_source_free(archive_source);
            return 1;
        }
    }

    if (password != NULL)
    {
        // Set its encryption method and password.
        error = zip_file_set_encryption(archive, index, ZIP_EM_AES_256, password);
        if (error != 0)
        {
            report_error(zip_get_error(archive));
            zip_close(archive);
            zip_source_free(archive_source);
            return 1;
        }
    }

    // Close the archive.
    error = zip_close(archive);
    if (error != 0)
    {
        report_error(zip_get_error(archive));
        zip_source_free(archive_source);
        return 1;
    }

    archive = NULL;

    // Read the archive into a buffer.
    error = zip_source_open(archive_source);
    if (error != 0)
    {
        report_error(zip_source_error(archive_source));
        zip_source_free(archive_source);
        return 1;
    }

    error = zip_source_seek(archive_source, 0, SEEK_END);
    if (error != 0)
    {
        report_error(zip_source_error(archive_source));
        zip_source_close(archive_source);
        zip_source_free(archive_source);
        return 1;
    }

    const zip_int64_t archive_length = zip_source_tell(archive_source);
    output->buffer = malloc(archive_length);
    output->length = archive_length;

    error = zip_source_seek(archive_source, 0, SEEK_SET);
    if (error != 0)
    {
        report_error(zip_source_error(archive_source));
        zip_source_close(archive_source);
        zip_source_free(archive_source);
        return 1;
    }

    const zip_int64_t read_bytes = zip_source_read(archive_source, output->buffer, output->length);
    if (read_bytes != output->length)
    {
        report_error(zip_source_error(archive_source));
        zip_source_close(archive_source);
        zip_source_free(archive_source);
        return 1;
    }

    error = zip_source_close(archive_source);
    if (error != 0)
    {
        report_error(zip_source_error(archive_source));
        zip_source_free(archive_source);
        return 1;
    }

    // Free the source.
    zip_source_free(archive_source);
    return 0;
}

zip_int64_t read_file_content(const sized_buffer* input, const char* password, char* output, zip_int64_t output_length)
{
    // Open archive.
    zip_error_t source_error;
    zip_error_init(&source_error);
    zip_source_t* archive_source = zip_source_buffer_create(input->buffer, input->length, 0, &source_error);
    if (archive_source == NULL)
    {
        report_error(&source_error);
        return -1;
    }

    zip_t* archive = zip_open_from_source(archive_source, ZIP_RDONLY | ZIP_CHECKCONS, &source_error);
    if (archive == NULL)
    {
        report_error(&source_error);
        zip_source_free(archive_source);
        return -1;
    }

    zip_error_fini(&source_error);

    // Open file.
    zip_int64_t index = 0;
    zip_file_t* file = password != NULL ? zip_fopen_index_encrypted(archive, index, 0, password) :
                                          zip_fopen_index(archive, index, 0);

    if (file == NULL)
    {
        report_error(zip_get_error(archive));
        zip_close(archive);
        return -1;
    }

    // Read file content.
    char read_buffer[1024];
    zip_int64_t content_length = 0;
    while (1)
    {
        zip_int64_t read_bytes = zip_fread(file, read_buffer, sizeof(read_buffer));
        if (read_bytes < 0)
        {
            report_error(zip_file_get_error(file));
            zip_fclose(file);
            zip_close(archive);
            return -1;
        }

        if (read_bytes == 0) { break; }

        if (read_bytes > output_length)
        {
            printf("error: not enough space in output buffer");
            zip_fclose(file);
            zip_close(archive);
            return -1;
        }

        memcpy(output, read_buffer, read_bytes);
        output_length -= read_bytes;
        output += read_bytes;
        content_length += read_bytes;
    }

    // Close file.
    int error = zip_fclose(file);
    file = NULL;
    if (error != 0)
    {
        zip_error_t file_error;
        zip_error_init_with_code(&file_error, error);
        report_error(&file_error);
        zip_close(archive);
        return -1;
    }

    // Close the archive.
    error = zip_close(archive);
    if (error != 0)
    {
        report_error(zip_get_error(archive));
        return -1;
    }

    return content_length;
}

int test_corruption(int compressed, const char* password)
{
    printf(
        "Testing corruption %s compression and %s encryption\n",
        compressed ? "with" : "without",
        password != NULL ? "with" : "without");

    sized_buffer archive;
    if (create_archive_buffer(compressed, password, &archive) != 0) { return 1; }

    int bad = 0;
    char file_buffer[1024];
    for (int byte_index = 0; byte_index < archive.length; byte_index++)
    {
        // Create corrupted archive.
        sized_buffer corrupted_archive;
        corrupted_archive.buffer = malloc(archive.length);
        corrupted_archive.length = archive.length;
        memcpy(corrupted_archive.buffer, archive.buffer, archive.length);

        corrupted_archive.buffer[byte_index] ^= 85; // corrupt one byte

        // Attempt to read file content.
        printf("\33[2K\r  byte %d: ", byte_index);
        zip_int64_t content_length = read_file_content(&corrupted_archive, password, file_buffer, sizeof(file_buffer));
        if (content_length < 0)
        {
            // Error already reported.
        }
        else if (content_length != strlen(expected_file_content))
        {
            printf("CRITICAL: mismatched content size\n");
            bad = 1;
        }
        else if (memcmp(file_buffer, expected_file_content, content_length) != 0)
        {
            printf("CRITICAL: mismatched content\n");
            bad = 1;
        }
        else { printf("ok: content unaffected"); }

        free(corrupted_archive.buffer);
    }

    free(archive.buffer);

    if (bad)
    {
        printf("\33[2K\r --> some corrupted bytes were not detected; FAILED.\n");
        return 1;
    }
    else
    {
        printf("\33[2K\r --> all corrupted bytes were detected; PASSED.\n");
        return 0;
    }
}

int main()
{
    test_corruption(0, NULL);
    test_corruption(1, NULL);
    test_corruption(0, "1234");
    test_corruption(1, "1234");
}
libzip Version
1.11.1

With OpenSSL 3.3.2 and zlib 1.3.1

Operating System
Ubuntu 22.04

Test Files
None

Additional context

Digging through the source code and debugger as best I could, I first identified the root cause as this line:

crc_valid = false;

This marks CRCs as invalid, meaning they are not checked when reading the file content. Reading the ZIP spec, I understand this is expected for the WinZip AE-2 encryption, for which indeed the standard zip CRCs are not meant to be used and should be set to zero (NB: they are not -- I see valid non-zero CRCs in the zip).

The expectation in this situation is to check the authentication code (HMAC). As best I can tell, libzip is indeed reading and checking the HMAC, so I am not sure why this isn't able to catch all corrupted bytes.

Digging further with the debugger, I finally realize that the error is indeed caught, but not actually propagated because of this:

if (bytes_read == 0) {
return -1;
}
else {
return (zip_int64_t)bytes_read;
}

Bytes were actually read, so zip_source_read() does not return -1 which is normally used to indicate a read error. Therefore zip_fread() does not copy the source error into the file error structure, and the actual error goes undetected.

Edit: it's more complicated than that; the error reporting works well when compression is disabled, I assume because of the check for src->had_read_error. The issue is probably an interaction between the encryption and compression sources.

@cschreib-ibex cschreib-ibex added the bug libzip doesn't behave as expected. label Jan 3, 2025
@cschreib-ibex
Copy link
Author

Here's the sequence of events I observe in the debugger:

  1. verify_hmac() reports an error and stores it in ctx->error.
  2. Control returns to winzip_aes_decrypt(), which returns -1.
  3. Control returns to _zip_source_call(), which copies the error from ctx->error into src->error and returns -1.
  4. Control returns to zip_source_read(), which sets src->had_read_error and returns > 0 bytes read.
  5. Control returns to compress_read(), which ignores error, feeds the decrypted bytes to zlib (ctx->algorithm->input(...)) and continues with another iteration of the while loop.
  6. In that new iteration, ctx->algorithm->process() returns ZIP_COMPRESSION_END. NB: at this point ctx->end_of_input is false. The function then sets ctx->end_of_stream to true, and end to true. The loop stops, and the function returns > 0 bytes read.
  7. Control returns to compress_callback(), which forwards the > 0 return value.
  8. Control returns to zip_source_call(), which forwards the > 0 return value.
  9. Control returns to zip_source_read(), which records the bytes read and continues with another iteration.
  10. Control goes to zip_source_call(), then to compress_callback(), then to compress_read(). The function finds ctx->end_of_stream is true so returns 0.
  11. Control returns to zip_source_call(), which forwards the 0 return value.
  12. Control returns to zip_source_read(), which detects n == 0 and sets src->eof to 1, breaks from the loop, and returns > 0 bytes read (accumulated from the previous iteration).
  13. Control returns to zip_fread(), which forwards the > 0 return value.
  14. Control returns to read_file_content() in my code, which stores the returned bytes and continues with another iteration, calling zip_fread() again. At this point the error is still not reported.
  15. Control goes to zip_fread(), then zip_source_read(). Here, src->had_read_error is false (I guess it's a different source). _zip_source_eof() returns 0, which then propagates up to zip_fread(), and signals end of the file. Still no error reported.

@cschreib-ibex
Copy link
Author

cschreib-ibex commented Jan 3, 2025

My diagnostic is that it's a mistake for zip_source_read() to return > 0 when an error occurred. If anything, it's a violation of the API documentation:

Upon successful completion the number of bytes read is returned. When zip_source_read() is called after reaching the end of the file, 0 is returned. Otherwise, -1 is returned and the error information in source is set to indicate the error.

Changing the behavior of zip_source_read() to match the documented behavior seems like it would solve the problem (the sequence of events above would stop at step 5, with the error propagating up to zip_fread()), but I don't know if it would cause other problems.

The alternative would be to change compress_read() to trigger another call to zip_source_read(), for example at ZIP_COMPRESSION_END when ctx->end_of_input is false and expecting zero bytes; this would allow the had_read_error from the decryption source to be caught and propagate up.

@cschreib-ibex
Copy link
Author

cschreib-ibex commented Jan 3, 2025

Confirming that either of the two following patches fix the problem, and also pass all regression tests (though I have only tested the OpenSSL crypto implementation and the zlib compression algorithm; also the preload.test failed before and after the patches were applied, so I assume that is an unrelated failure).

Fix zip_source_read():

diff --git a/lib/zip_source_read.c b/lib/zip_source_read.c
index 910d4c3e..13e88f15 100644
--- a/lib/zip_source_read.c
+++ b/lib/zip_source_read.c
@@ -64,12 +64,7 @@ zip_source_read(zip_source_t *src, void *data, zip_uint64_t len) {
     while (bytes_read < len) {
         if ((n = _zip_source_call(src, (zip_uint8_t *)data + bytes_read, len - bytes_read, ZIP_SOURCE_READ)) < 0) {
             src->had_read_error = true;
-            if (bytes_read == 0) {
-                return -1;
-            }
-            else {
-                return (zip_int64_t)bytes_read;
-            }
+            return -1;
         }

         if (n == 0) {

Or fix compress_read():

diff --git a/lib/zip_source_compress.c b/lib/zip_source_compress.c
index 54387eca..7f10d91d 100644
--- a/lib/zip_source_compress.c
+++ b/lib/zip_source_compress.c
@@ -228,7 +228,20 @@ compress_read(zip_source_t *src, struct context *ctx, void *data, zip_uint64_t l
             ctx->end_of_stream = true;

             if (!ctx->end_of_input) {
-                /* TODO: garbage after stream, or compression ended before all data read */
+                if ((n = zip_source_read(src, ctx->buffer, sizeof(ctx->buffer))) < 0) {
+                    zip_error_set_from_source(&ctx->error, src);
+                    end = true;
+                    break;
+                }
+                else if (n != 0) {
+                    /* garbage after stream, or compression ended before all data read */
+                    zip_error_set(&ctx->error, ZIP_ER_INTERNAL, 0);
+                    end = true;
+                    break;
+                }
+
+                ctx->end_of_input = true;
+                ctx->algorithm->end_of_input(ctx->ud);
             }

             if (ctx->first_read < 0) {

dillof added a commit that referenced this issue Jan 8, 2025
Also, catch garbage after compressed data if checking for consistency.

Adresses issue #478
@dillof
Copy link
Member

dillof commented Jan 8, 2025

Thank you for the detailed analysis, we've reworked your patch slightly and applied it.

Can you please provide us with a short example file with an HMAC error, so we can include a test case?

@dillof dillof added the feedback Waiting for feedback from submitter. label Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug libzip doesn't behave as expected. feedback Waiting for feedback from submitter.
Projects
None yet
Development

No branches or pull requests

2 participants