sparkmon

Description

sparkmon is a Python package to monitor Spark applications. You can see it as an advanced Spark UI that keeps track of all Spark REST API metrics over time, which makes it quite unique compared to other solutions (see the comparison below). It is especially useful for memory profiling, including Python UDF memory.

Features

Monitoring plot example:

Disclaimer: if you run Spark in local mode, some of the subplots will be empty. sparkmon is designed to analyse Spark applications running on a cluster.
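You can tell whether a session runs in local mode from its master URL. The following is a minimal sketch. The helper name `is_local_mode` is hypothetical and not part of sparkmon. With a live PySpark session, you would pass it `spark.sparkContext.master`.

```python
def is_local_mode(master: str) -> bool:
    """Return True for local-mode master URLs such as "local", "local[4]" or "local[*]".

    "local-cluster[...]" (a multi-process pseudo-cluster) is deliberately not
    treated as local mode here.
    """
    return master == "local" or master.startswith("local[")


# With a live session (requires pyspark):
# if is_local_mode(spark.sparkContext.master):
#     print("Running in local mode: some sparkmon subplots will be empty")
```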

- Logs executor metrics
- Plots monitoring data, displays it in a notebook, or exports it to a file
- Can monitor a remote Spark application
- Can run directly in your PySpark application, in a notebook, or via the command-line interface
- Logs to mlflow

Comparison with other solutions

This package provides much more information than the Spark UI or other packages. Here is a quick comparison:

sparkmonitor:
- Nice integration in notebook
- Doesn't provide more information than the Spark UI; in particular, no memory usage over time.
sparklint:
- Requires launching a server locally, which might be difficult on-premise. sparkmon does not need an accessible port.
- Monitors only CPU over time, whereas sparkmon monitors everything, including Java and Python memory, over time.
- No updates since 2018
Data Mechanics Delight:
- Really nice and complete
- But cannot work fully on-premise
- Is not fully open-source
Sparklens:
- Cannot work fully on-premise
- Is not fully open-source

Requirements

Python
Spark
mlflow (optional)

Installation

You can install sparkmon via pip from PyPI:

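$ pip install sparkmon

To also install the optional mlflow integration (the quotes keep shells such as zsh from expanding the brackets):

$ pip install "sparkmon[mlflow]"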
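If you plan to use `sparkmon.callbacks.log_to_mlflow`, you can first confirm that the optional `mlflow` package is importable. The following is a minimal sketch using the standard library's `importlib.util.find_spec`. The helper name `has_mlflow` is hypothetical and not part of sparkmon.

```python
import importlib.util


def has_mlflow() -> bool:
    """Return True if the optional mlflow package can be imported."""
    return importlib.util.find_spec("mlflow") is not None
```

You could then add `sparkmon.callbacks.log_to_mlflow` to the callbacks list only when `has_mlflow()` returns True.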

Usage

Simple use-case:

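import sparkmon
from pyspark.sql import SparkSession

# Get (or create) a Spark session
spark = SparkSession.builder.getOrCreate()

# Create and start the monitoring process via a Spark session
# (log_to_mlflow requires the optional mlflow extra)
mon = sparkmon.SparkMon(spark, period=5, callbacks=[
    sparkmon.callbacks.plot_to_image,
    sparkmon.callbacks.log_to_mlflow,
])
mon.start()

# ... run your Spark jobs here ...

# Stop monitoring
mon.stop()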
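`mon.stop()` should run even if a job fails, so you may want to wrap the start/stop pair in a context manager. The following is a minimal sketch using `contextlib.contextmanager`. The helper `monitoring` is hypothetical and not part of sparkmon. It only assumes that the object has `start()` and `stop()` methods, as `SparkMon` does in the example above.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class Startable(Protocol):
    """Anything with start()/stop() methods, e.g. a SparkMon instance."""

    def start(self) -> None: ...

    def stop(self) -> None: ...


@contextmanager
def monitoring(mon: Startable) -> Iterator[Startable]:
    """Start `mon` on entry and always stop it on exit, even if the body raises."""
    mon.start()
    try:
        yield mon
    finally:
        mon.stop()


# Hypothetical usage:
# with monitoring(sparkmon.SparkMon(spark, period=5, callbacks=[...])):
#     df.count()
```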

More advanced use-case:

import sparkmon

# Create an app connection, either
# via an existing Spark session (`spark`)
application = sparkmon.create_application_from_spark(spark)
# or via a remote Spark web UI link
application = sparkmon.create_application_from_link(index=0, web_url='http://localhost:4040')

# Create and start the monitoring process
mon = sparkmon.SparkMon(application, period=5, callbacks=[
    sparkmon.callbacks.plot_to_image,
    sparkmon.callbacks.log_to_mlflow,
])
mon.start()

# Stop monitoring
mon.stop()
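Spark serves its monitoring REST API under `/api/v1` on the web UI, which listens on port 4040 by default. `/applications` lists the applications, and `/applications/[app-id]/executors` returns the active executors. The following sketch builds the executors URL for the application at a given position in that list and fetches JSON with the standard library. The helper names are hypothetical. The sketch also assumes, without confirmation from sparkmon, that the `index` argument above refers to such a list position.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def executors_url(web_url: str, applications: List[Dict[str, Any]], index: int = 0) -> str:
    """Build the REST URL listing the executors of the application at `index`."""
    app_id = applications[index]["id"]
    return f"{web_url.rstrip('/')}/api/v1/applications/{app_id}/executors"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET `url` and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Against a running driver (not executed here):
# apps = fetch_json("http://localhost:4040/api/v1/applications")
# executors = fetch_json(executors_url("http://localhost:4040", apps, index=0))
```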

You can also use it from a notebook; see the Notebook Example.

There is also a command-line interface, see Command-line Reference for details.

How does it work?

SparkMon runs a Python thread in the background that queries the Spark web UI REST API and logs all executor information over time.
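The following standard-library sketch shows that polling pattern. A background thread calls a fetch function every `period` seconds, appends a timestamped snapshot to a history list, and then runs each callback. `Poller`, its `fetch` argument and the callback signature are hypothetical illustrations, not sparkmon's actual internals.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Sequence


class Poller:
    def __init__(self, fetch: Callable[[], Any], period: float,
                 callbacks: Sequence[Callable[["Poller"], None]] = ()):
        self.fetch = fetch            # e.g. a function querying the REST API
        self.period = period          # seconds between two polls
        self.callbacks = list(callbacks)
        self.history: List[Dict[str, Any]] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            # Record one timestamped snapshot, then notify every callback
            self.history.append({"timestamp": time.time(), "data": self.fetch()})
            for callback in self.callbacks:
                callback(self)
            # Event.wait returns early as soon as stop() sets the event
            self._stop.wait(self.period)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Using a `threading.Event` instead of `time.sleep` lets `stop()` return promptly instead of waiting out a full period.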

The callbacks list parameter lets you define what to do after each update, such as exporting the executors' historical information to a CSV file, plotting to a file, or plotting in your notebook.
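The following is a hypothetical callback in the spirit of the CSV export mentioned above. For illustration only, it assumes that the callback receives a list of flat per-executor records (dicts) and an output path. sparkmon's real callback signature may differ. `history_to_csv` is not part of sparkmon.

```python
import csv
from typing import Any, Dict, List


def history_to_csv(records: List[Dict[str, Any]], path: str) -> None:
    """Write flat executor records (one dict per executor snapshot) to a CSV file."""
    if not records:
        return
    # Union of keys across all records, in first-seen order, so no column is lost
    fieldnames: List[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="") as f:
        # Missing keys in a record are written as empty cells (restval="")
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```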

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, sparkmon is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.

stephanecollot/sparkmon
