This repository has been archived by the owner on Jan 16, 2025. It is now read-only.

Unable to decompress Snappy JSON file using golang/snappy #75

Open
raihan26 opened this issue Aug 22, 2023 · 1 comment

Comments


raihan26 commented Aug 22, 2023

I'm unable to decompress a Snappy-compressed JSON file with the golang/snappy library. The error I receive is "Failed to decompress content: snappy: corrupt input". However, the file itself is not corrupt: the snzip tool decompresses it successfully.

Steps to Reproduce:

  1. Write a JSON file from a Spark job with .option("compression", "snappy") and save it to S3.
  2. Attempt to decompress the file from S3 using the following Go code:
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/golang/snappy"
)

func main() {
	// Read the compressed file
	content, err := os.ReadFile("path_to_your_snappy_file.snappy")
	if err != nil {
		log.Fatalf("Failed to read file: %v", err)
	}

	// Decompress using golang/snappy
	decompressed, err := snappy.Decode(nil, content)
	if err != nil {
		log.Fatalf("Failed to decompress content: %v", err)
	}

	// Print the decompressed content
	fmt.Println(string(decompressed))
}

  3. Observe the error: Failed to decompress content: snappy: corrupt input.

Expected Behavior:

The Snappy compressed JSON file should be decompressed without errors.

Actual Behavior:

Received an error indicating the input is corrupt, even though other tools like snzip can decompress the file without issues.

Additional Information:

The Snappy compressed file is a JSON file where each line is a separate JSON object.
I've verified the integrity of the file by decompressing it using snzip.
The issue might be related to the specific Snappy format or framing used, but I'm not certain.
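
One quick way to narrow down the framing question is to look at the first bytes of the file. The sketch below assumes the Snappy framing format, whose streams open with a 10-byte stream identifier chunk (0xff 0x06 0x00 0x00 followed by "sNaPpY"). The helper name looksLikeFramedStream is hypothetical and not part of golang/snappy, and the sample inputs are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// streamMagic is the stream identifier chunk that opens data in the
// Snappy framing format: chunk type 0xff, a 3-byte little-endian
// length of 6, then the ASCII bytes "sNaPpY".
var streamMagic = []byte("\xff\x06\x00\x00sNaPpY")

// looksLikeFramedStream reports whether data begins with the
// framing-format stream identifier. Hypothetical helper for
// illustration; it only inspects the prefix and validates nothing else.
func looksLikeFramedStream(data []byte) bool {
	return bytes.HasPrefix(data, streamMagic)
}

func main() {
	// Illustrative input: the stream identifier followed by filler bytes.
	framed := append(append([]byte{}, streamMagic...), 0x00, 0x01, 0x02)

	// Illustrative input: bytes that do not start with the identifier.
	other := []byte{0x05, 0x10, 'h', 'e', 'l', 'l', 'o'}

	fmt.Println(looksLikeFramedStream(framed)) // true
	fmt.Println(looksLikeFramedStream(other))  // false
}
```

If the check returns false for a real file, the data may be a raw block or some other container format, so this only rules the framing format in or out.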

@klauspost
Contributor

You are using the block decompressor to decode what is probably a stream. These are distinct formats (a stream contains wrapped blocks). Try a Reader (snappy.NewReader) instead.
