
Union types break service serialization #323

Open
chris05atm opened this issue Apr 1, 2020 · 0 comments

What happened?

When we upgraded our conjure-python dependency, we ran into runtime PySpark serialization issues. We could previously serialize a service object, but after the conjure-python upgrade the same service was no longer serializable.

We suspect #320 or #221 broke serde behavior for us.

The PySpark error was:

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/worker.py", line 413, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/worker.py", line 68, in read_command
    command = serializer._read_with_length(file)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
    return self.loads(obj)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 697, in loads
    return pickle.loads(obj, encoding=encoding)
AttributeError: type object 'AlertFailureResponse' has no attribute '_service_exception'

This was thrown when passing our service through a map function. It occurred even with zero data passed along: the failure came from serializing the service code itself, which had previously worked.
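For illustration, here is a self-contained sketch of the failure mode we suspect. The class below only mimics the generated code (the real conjure-python output is more involved): if the private attribute names behind a __slots__-based class change between generator versions, a payload pickled against the old class fails to load against the new one with a similar AttributeError.

import pickle

# Stand-in for the old generated class; names mirror the Conjure definitions below.
class AlertFailureResponse:
    __slots__ = ['_service_exception']

    def __init__(self, service_exception):
        self._service_exception = service_exception

payload = pickle.dumps(AlertFailureResponse('boom'))

# Stand-in for newer generated code in which the private attribute was renamed.
class AlertFailureResponse:
    __slots__ = ['_serviceException']

# pickle resolves the class by name at load time and hits the renamed slot:
# AttributeError: 'AlertFailureResponse' object has no attribute '_service_exception'
pickle.loads(payload)

Note that the real traceback fails looking the attribute up on the type object itself, so the exact code path likely differs, but the shape of the failure (an attribute name the unpickler can no longer find) is the same.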

The relevant Conjure definitions:


      AlertResponse:
        union:
          failureResponse: AlertFailureResponse
          successResponse: AlertSuccessResponse

      AlertFailureResponse:
        fields:
          serviceException: ServiceException
      AlertSuccessResponse:
        fields:
          uuid: uuid

Our __conjure_generator_version__ is 3.12.1.

We mitigated the issue by building our Conjure service inside a mapPartitions function, which is likely better practice anyway.
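A sketch of that mitigation, with a hypothetical build_alert_service() factory standing in for our actual Conjure client setup. Because the service is constructed inside mapPartitions, only the factory function is pickled into the task closure, never the client object itself:

from pyspark import SparkContext

def build_alert_service():
    # Hypothetical factory; in our code this constructs the conjure-python client.
    class AlertService:
        def is_failure(self, value):
            return value % 2 == 0
    return AlertService()

def process_partition(rows):
    # Built once per partition on the executor, so the service is never pickled.
    service = build_alert_service()
    for row in rows:
        yield service.is_failure(row)

sc = SparkContext.getOrCreate()
print(sc.parallelize(range(10)).mapPartitions(process_partition).collect())

This also amortizes client construction across a whole partition instead of paying it per record.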

What did you want to happen?

We are not entirely sure why these new type definitions are not serializable. Our best guess is that the generated fields were renamed in a way that PySpark's pickle-based serialization can no longer resolve, but that is conjecture at this point.
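One way to test this conjecture, assuming the generated package is importable (the product.api path below is hypothetical), is to print the attribute names the current generator emits and diff them against the pre-upgrade generated code:

import inspect
from product.api import AlertFailureResponse  # hypothetical import path

# Slot names, if the generated class uses __slots__.
print(getattr(AlertFailureResponse, '__slots__', None))

# Any members whose names resemble the attribute from the traceback.
print([name for name, _ in inspect.getmembers(AlertFailureResponse)
       if 'service' in name.lower()])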
