This archetype supports writing Scray-compatible jobs by creating:
- the directory structure
- a pom.xml that builds an uber-jar for the job
- a bin directory with a shell script to start the jobs
- a minimal job skeleton
| Java | Scala | Spark | Hadoop |
|------|-------|-------|--------|
| 17   | 2.13  | 3.5.0 | 3.4    |
To use the archetype artifacts, they must either be pulled from an archetype repository or installed in the local Maven repository:
```shell
mvn clean install
```
Archetypes are Maven plugins that can generate new projects. To use them, the usual Maven coordinates of the archetype must be provided:
```shell
mvn archetype:generate \
  -DarchetypeGroupId=org.scray \
  -DarchetypeArtifactId=scray-archetype \
  -DarchetypeVersion=1.1.6-SNAPSHOT
```
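If the coordinates of the project to be generated are known in advance, the interactive prompts can be skipped by running the plugin in batch mode (`-B`). A sketch; the `groupId`, `artifactId`, `version`, and `package` values below are placeholders, not defaults supplied by the archetype:

```shell
# Generate a project non-interactively; adjust the -D values to your project.
mvn archetype:generate -B \
  -DarchetypeGroupId=org.scray \
  -DarchetypeArtifactId=scray-archetype \
  -DarchetypeVersion=1.1.6-SNAPSHOT \
  -DgroupId=com.example \
  -DartifactId=my-scray-job \
  -Dversion=0.1.0-SNAPSHOT \
  -Dpackage=com.example.jobs
```

Afterwards the new project is available in the `my-scray-job` directory.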
A local Spark node will be started and the job will be executed on this node.
Example:

```shell
./bin/submit-job.sh \
  --local-mode \
  --master spark://<YOUR_LOCAL_IP>:7077 \
  --total-executor-cores 4 \
  -b \
  -m spark://127.0.0.1:7077
```
The runner script requires the option `--master` with the Spark master URL and the option `--total-executor-cores` with the number of executor cores.
For the URL of the master there are several options:
- `spark://<IP>:<Port>` (standalone master; default port 7077)
- `yarn-client` (run the job with a local client but execute on a Hadoop YARN cluster of Spark workers)
- `yarn-cluster` (run both the client and the workers of the Spark job on a Hadoop YARN cluster)
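The accepted URL shapes can be checked before submission. A minimal sketch in plain shell; the helper name `check_master_url` is hypothetical and not part of the archetype's runner script:

```shell
#!/bin/sh
# Return 0 if the given master URL matches one of the supported shapes:
# spark://<IP>:<Port>, yarn-client, or yarn-cluster.
check_master_url() {
  case "$1" in
    spark://*:[0-9]*)         return 0 ;;  # standalone master, e.g. spark://127.0.0.1:7077
    yarn-client|yarn-cluster) return 0 ;;
    *)                        return 1 ;;
  esac
}

# Example checks:
check_master_url "spark://127.0.0.1:7077" && echo "ok: standalone"
check_master_url "yarn-cluster"           && echo "ok: yarn-cluster"
check_master_url "mesos://host:5050"      || echo "rejected: unsupported"
```

Such a guard lets the wrapper fail fast with a readable message instead of handing a malformed URL to `spark-submit`.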
```shell
./bin/submit-job.sh \
  --local-mode \
  --master spark://127.0.0.1:7077 \
  --total-executor-cores 2 \
  -b \
  -m spark://127.0.0.1:7077
```
```shell
./bin/submit-job.sh \
  --local-mode \
  --master spark://127.0.0.1:7077 \
  --total-executor-cores 2 \
  -m spark://127.0.0.1:7077 \
  -t db-nifi \
  -k www.db.opendata.s-node.de:9092 \
  -p /tmp/
```