What is the bug?
Unable to write to OpenSearch using Spark. The following exception is raised:
Traceback (most recent call last):
  File "/Users/zgaurash/Downloads/test.py", line 21, in <module>
    .save("pyspark_idx")
  File "/Users/zgaurash/.pyenv/versions/3.6.13/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 740, in save
  File "/Users/zgaurash/.pyenv/versions/3.6.13/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/zgaurash/.pyenv/versions/3.6.13/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/Users/zgaurash/.pyenv/versions/3.6.13/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o54.save.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.opensearch.spark.sql.DefaultSource15 could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
	at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
	at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
	at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
	at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
	at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:652)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
	at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NoClassDefFoundError: scala/$less$colon$less
	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2699)
	at java.lang.Class.getConstructor0(Class.java:3103)
	at java.lang.Class.newInstance(Class.java:412)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 32 more
Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 37 more
Spark version: 3.2.3
Scala version: 2.13
How can one reproduce the bug?
Create the following PySpark script (test.py):
from pyspark.sql import SparkSession

# Build a local Spark session and a small test DataFrame.
sparkSession = SparkSession.builder.appName("pysparkTest").getOrCreate()
df = sparkSession.createDataFrame([(1, "value1"), (2, "value2")], ["id", "value"])
df.show()

# Write the DataFrame to the "pyspark_idx" index in OpenSearch.
(df.write
    .format("org.opensearch.spark.sql")
    .option("inferSchema", "true")
    .option("opensearch.nodes", "127.0.0.1")
    .option("opensearch.port", "9200")
    .option("opensearch.net.http.auth.user", "admin")
    .option("opensearch.net.http.auth.pass", "admin")
    .option("opensearch.net.ssl", "true")
    .option("opensearch.net.ssl.cert.allow.self.signed", "true")
    .option("opensearch.batch.write.retry.count", "9")
    .option("opensearch.http.retries", "9")
    .option("opensearch.http.timeout", "18000")
    .mode("append")
    .save("pyspark_idx"))
Start a single-node OpenSearch container:
docker run --name opensearch -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:2.7.0
Submit the script with the connector jar:
spark-submit --jars ~/opensearch-hadoop/spark/sql-30/build/libs/opensearch-spark-30_2.13-1.2.0-SNAPSHOT.jar test.py
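To confirm which Scala version the Spark runtime itself was built with (the root cause suggested in the comment below), a quick check can be run from PySpark. Note that _jvm is an internal py4j handle, so this is a debugging sketch rather than a stable API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# scala.util.Properties.versionString() reports the Scala version on the
# JVM classpath, e.g. "version 2.12.15" for the stock Spark 3.2 download.
print(spark.sparkContext._jvm.scala.util.Properties.versionString())

The same information appears in the banner printed by spark-submit --version.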
What is the expected behavior?
The connector should be able to read from and write to OpenSearch. It should also not be limited to a specific combination of Spark and Scala versions.
What is your host/environment?
macOS, with Python 3.6.13 via pyenv (per the paths in the traceback above).
Comment:
This issue is due to a Scala version mismatch: the jar being submitted is the Scala 2.13 build of opensearch-spark (opensearch-spark-30_2.13), but the Spark cluster is likely not a Scala 2.13 distribution. The default distribution for Spark 3.2 uses Scala 2.12. The missing class in the trace, scala.$less$colon$less, is the JVM name of the <:< type class, which lives directly in the scala package only in the 2.13 standard library, so a jar built against 2.13 cannot load on a 2.12 runtime.
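A minimal sketch of the fix, assuming the build also produces a Scala 2.12 variant of the connector with the same naming pattern (the jar path below is hypothetical and must be adjusted to your build output):

from pyspark.sql import SparkSession

# Hypothetical path: the Scala 2.12 build of the connector, matching the
# Scala version of the default Spark 3.2 distribution. The "_2.12"/"_2.13"
# suffix of the jar must agree with the Scala version Spark ships with.
jar = "/path/to/opensearch-hadoop/spark/sql-30/build/libs/opensearch-spark-30_2.12-1.2.0-SNAPSHOT.jar"

spark = (SparkSession.builder
    .appName("pysparkTest")
    .config("spark.jars", jar)  # same effect as spark-submit --jars
    .getOrCreate())

Alternatively, keep the 2.13 jar and run it against a Spark distribution built for Scala 2.13.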