Race Condition Between Segment Refresh Message and Offline to Online Transition #14838

Open
ankitsultana opened this issue Jan 18, 2025 · 0 comments

ankitsultana commented Jan 18, 2025

We are seeing segments go into ERROR state due to a race condition between the Segment Refresh message and the Offline to Online transition. This ultimately leads to the error: "Inconsistent data read. Index data file xyz/columns.psf is possibly corrupted."

I haven't taken a deeper look into this, but I can share that the following events happened around the same time (in no particular order):

  • There was a node replacement.
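  • There was a segment compaction job that processed this segment around the same time.

Server logs from around that time, showing two different Helix message handler threads (thread_33 and thread_43) working on the inverted index of the same segment within 6 ms of each other:
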
{"@timestamp":"2025-01-18T06:33:47.092+00:00","message":"Creating new inverted index for segment: some_table__13__2492__20250114T2318Z, column: lorem_ipsum","logger_name":"org.apache.pinot.segment.local.segment.index.loader.invertedindex.InvertedIndexHandler","thread_name":"HelixTaskExecutor-message_handle_thread_33","level":"INFO"}
{"@timestamp":"2025-01-18T06:33:47.098+00:00","message":"Need to create new inverted index for segment: some_table__13__2492__20250114T2318Z, column: foobar","logger_name":"org.apache.pinot.segment.local.segment.index.loader.invertedindex.InvertedIndexHandler","thread_name":"HelixTaskExecutor-message_handle_thread_43","level":"INFO"}

Update 19-Jan: The issue is not limited to compaction. I also saw this happen for one of the OFFLINE tables.
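
To illustrate the kind of serialization that would rule out the interleaving seen in the logs above, here is a minimal sketch of a per-segment lock. It is only an illustration under stated assumptions, not Pinot's actual code path or a proposed patch: the class `SegmentLockSketch` and the methods `withSegmentLock`, `buildIndex` and `maxConcurrentWriters` are hypothetical names, and `buildIndex` merely sleeps in place of real index creation. The two tasks stand in for the refresh handler and the Offline to Online handler touching the same segment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: serialize all index work on a segment behind one lock per segment name.
class SegmentLockSketch {
  // One lock per segment name, created lazily and shared by all handler threads.
  private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger maxInFlight = new AtomicInteger();

  // Runs the given work while holding the lock for this segment.
  public void withSegmentLock(String segmentName, Runnable work) {
    ReentrantLock lock = locks.computeIfAbsent(segmentName, k -> new ReentrantLock());
    lock.lock();
    try {
      work.run();
    } finally {
      lock.unlock();
    }
  }

  // Stand-in for index creation (assumption: no real files are touched here).
  // Records how many writers are active on the segment at the same time.
  public void buildIndex(String column) {
    int now = inFlight.incrementAndGet();
    maxInFlight.accumulateAndGet(now, Math::max);
    try {
      Thread.sleep(20);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    inFlight.decrementAndGet();
  }

  // Simulates two message handler threads hitting the same segment and
  // returns the highest number of writers observed concurrently.
  public static int maxConcurrentWriters() {
    SegmentLockSketch sketch = new SegmentLockSketch();
    String segment = "some_table__13__2492__20250114T2318Z";
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.submit(() -> sketch.withSegmentLock(segment, () -> sketch.buildIndex("lorem_ipsum")));
    pool.submit(() -> sketch.withSegmentLock(segment, () -> sketch.buildIndex("foobar")));
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return sketch.maxInFlight.get();
  }

  public static void main(String[] args) {
    System.out.println("max concurrent writers on one segment: " + maxConcurrentWriters());
  }
}
```

With the lock in place the two tasks never overlap on the segment, so the reported maximum is 1. Whether the real handlers already share such a lock, and where the actual gap is, still needs to be confirmed against the code.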
