Truncate seed if too large during seqio caching jobs.
PiperOrigin-RevId: 633765044
gauravmishra authored and SeqIO committed May 15, 2024
1 parent 20e5f45 commit f09ed20
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions seqio/beam_utils.py
@@ -136,6 +136,8 @@ def _emit_examples(self, shard: Tuple[int, str]):
     shard_preprocessors_seed = int.from_bytes(md5_digest, "little") + (
         self._preprocessors_seed or 0
     )
+    # Truncate if still a large number.
+    shard_preprocessors_seed %= self._int64_max

     ds = task.source.get_dataset(
         split=self._split,
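
Note on the change: an MD5 digest is 16 bytes, so reading it as an integer can yield a value up to 2**128 - 1, far beyond the signed 64-bit range. The sketch below reproduces the seed arithmetic from the hunk in isolation. It is an illustration under assumptions, not the code in seqio/beam_utils.py: the helper name `shard_seed`, the choice to hash a shard name string, and the value `2**63 - 1` for `INT64_MAX` (standing in for `self._int64_max`, whose definition is not shown in this diff) are all hypothetical.

```python
import hashlib
from typing import Optional

# Assumption: stands in for self._int64_max, which this diff does not define.
INT64_MAX = 2**63 - 1


def shard_seed(shard_name: str, preprocessors_seed: Optional[int] = None) -> int:
    """Hypothetical helper mirroring the seed arithmetic in the hunk."""
    # Assumption: the digest is taken over the shard name; the diff does not
    # show how md5_digest is actually computed.
    md5_digest = hashlib.md5(shard_name.encode("utf-8")).digest()
    # A 16-byte digest read as an integer can be as large as 2**128 - 1.
    seed = int.from_bytes(md5_digest, "little") + (preprocessors_seed or 0)
    # Truncate if still a large number, as in the commit.
    seed %= INT64_MAX
    return seed


if __name__ == "__main__":
    print(shard_seed("train-00000-of-00010", preprocessors_seed=42))
```

Because the reduction is a modulo by `INT64_MAX` rather than a bit mask, the result for a non-negative input lies in the range 0 to `INT64_MAX - 1`, and the same shard name and base seed always produce the same value.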
