Helm chart images: conda removed -> pip only, usage disclaimer added, minimized Dockerfile complexity #533

Merged
6 changes: 5 additions & 1 deletion continuous_integration/docker/base/Dockerfile
@@ -10,7 +10,11 @@ FROM centos:7
ARG python_version="3.10"
ARG go_version="1.18"

LABEL org.opencontainers.image.source="https://github.com/dask-gateway"
# Set labels based on the Open Containers Initiative (OCI):
# https://github.com/opencontainers/image-spec/blob/main/annotations.md#pre-defined-annotation-keys
#
LABEL org.opencontainers.image.source="https://github.com/dask/dask-gateway"
LABEL org.opencontainers.image.url="https://github.com/dask/dask-gateway/blob/HEAD/continuous_integration/docker/base/Dockerfile"

# Configure yum to error on missing packages
RUN echo "skip_missing_names_on_install=False" >> /etc/yum.conf
5 changes: 5 additions & 0 deletions continuous_integration/docker/hadoop/Dockerfile
@@ -4,6 +4,11 @@
#
FROM ghcr.io/dask/dask-gateway-ci-base:latest

# Set labels based on the Open Containers Initiative (OCI):
# https://github.com/opencontainers/image-spec/blob/main/annotations.md#pre-defined-annotation-keys
#
LABEL org.opencontainers.image.url="https://github.com/dask/dask-gateway/blob/HEAD/continuous_integration/docker/hadoop/Dockerfile"

# Notify dask-gateway tests that Yarn (part of Hadoop) is available
ENV TEST_DASK_GATEWAY_YARN true

5 changes: 5 additions & 0 deletions continuous_integration/docker/pbs/Dockerfile
@@ -4,6 +4,11 @@
#
FROM ghcr.io/dask/dask-gateway-ci-base:latest

# Set labels based on the Open Containers Initiative (OCI):
# https://github.com/opencontainers/image-spec/blob/main/annotations.md#pre-defined-annotation-keys
#
LABEL org.opencontainers.image.url="https://github.com/dask/dask-gateway/blob/HEAD/continuous_integration/docker/pbs/Dockerfile"

# Notify dask-gateway tests that PBS is available
ENV TEST_DASK_GATEWAY_PBS true

5 changes: 5 additions & 0 deletions continuous_integration/docker/slurm/Dockerfile
@@ -4,6 +4,11 @@
#
FROM ghcr.io/dask/dask-gateway-ci-base:latest

# Set labels based on the Open Containers Initiative (OCI):
# https://github.com/opencontainers/image-spec/blob/main/annotations.md#pre-defined-annotation-keys
#
LABEL org.opencontainers.image.url="https://github.com/dask/dask-gateway/blob/HEAD/continuous_integration/docker/slurm/Dockerfile"

# Notify dask-gateway tests that Slurm is available
ENV TEST_DASK_GATEWAY_SLURM true
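
The CI Dockerfiles above each declare an `org.opencontainers.image.url` label that points at their own path under `blob/HEAD/`, while only the base image also sets `org.opencontainers.image.source`. As a minimal sketch of that convention, the snippet below renders the same label lines from a repo-relative path. The helper `oci_labels` and the constant `REPO_URL` are hypothetical names used for illustration only; the PR writes these labels by hand.

```python
# Hypothetical helper, not part of the PR: renders the OCI label lines that
# the Dockerfiles above declare by hand.
REPO_URL = "https://github.com/dask/dask-gateway"


def oci_labels(dockerfile_path, include_source=False):
    """Return LABEL lines for a Dockerfile at a repo-relative path."""
    lines = []
    if include_source:
        # Points at the repository as a whole.
        lines.append(f'LABEL org.opencontainers.image.source="{REPO_URL}"')
    # Points at the specific Dockerfile the image was built from.
    lines.append(
        f'LABEL org.opencontainers.image.url="{REPO_URL}/blob/HEAD/{dockerfile_path}"'
    )
    return lines


for name in ("hadoop", "pbs", "slurm"):
    path = f"continuous_integration/docker/{name}/Dockerfile"
    print("\n".join(oci_labels(path)))
```

Passing `include_source=True` would reproduce the two-label form used in the base image.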

82 changes: 42 additions & 40 deletions dask-gateway-server/Dockerfile
@@ -1,45 +1,47 @@
FROM python:3.9-slim-bullseye as dependencies
LABEL MAINTAINER="Jim Crist-Harif"

# This Dockerfile and image, ghcr.io/dask/dask-gateway-server, is used by the
# dask-gateway Helm chart, by the api pod and the controller pod.
#
# The pods are started with different commands:
#
# - api pod command: dask-gateway-server ...
# - controller pod command: dask-gateway-server kube-controller ...
#
FROM python:3.10-slim-bullseye

# Set labels based on the Open Containers Initiative (OCI):
# https://github.com/opencontainers/image-spec/blob/main/annotations.md#pre-defined-annotation-keys
#
LABEL org.opencontainers.image.source="https://github.com/dask/dask-gateway"
LABEL org.opencontainers.image.url="https://github.com/dask/dask-gateway/blob/HEAD/dask-gateway-server/Dockerfile"

# Install tini and upgrade linux packages to patch known
# vulnerabilities.
RUN apt-get update \
&& apt-get install -y tini \
&& rm -rf /var/lib/apt/lists/*

&& apt-get upgrade -y \
&& apt-get install -y \
tini \
&& rm -rf /var/lib/apt/lists/*

# Create a non-root user to run as
RUN useradd --create-home --user-group --uid 1000 dask
USER dask:dask
ENV PATH=/home/dask/.local/bin:$PATH
WORKDIR /home/dask/

# Install dask-gateway-server
#
# The Golang proxy binary isn't built because the dask-gateway Helm chart
# relies on Traefik, running in its own dedicated pod, as the proxy instead.
#
COPY --chown=dask:dask . /srv/dask-gateway-server
# The requirements are installed first as passing the --install-option flag
# makes pip not use wheels when installing dependencies.
#
RUN pip install --no-cache-dir \
aiohttp==3.8.1 \
colorlog \
cryptography \
traitlets==5.1.1 \
pyyaml \
kubernetes-asyncio==18.20.0



# Build dask-gateway-server from source in a builder stage
FROM dependencies AS builder

RUN mkdir -p /tmp/workdir
RUN mkdir -p /tmp/install-prefix
COPY . /tmp/workdir/
WORKDIR /tmp/workdir/
RUN python setup.py install \
--no-build-proxy \
--single-version-externally-managed \
--record=record.txt \
--prefix /tmp/install-prefix



# Final image - merge dependencies and built dask-gateway
FROM dependencies

COPY --from=builder /tmp/install-prefix/bin/dask-gateway-server /usr/local/bin/
COPY --from=builder /tmp/install-prefix/lib /usr/local/lib/

# Create non-root user and working directory
WORKDIR /srv/dask-gateway
RUN useradd -m -U -u 1000 dask && chown dask:dask /srv/dask-gateway
USER 1000:1000
--requirement=/srv/dask-gateway-server/requirements.txt \
kubernetes-asyncio \
&& pip install --no-cache-dir --no-deps \
/srv/dask-gateway-server --install-option="--no-build-proxy"

ENTRYPOINT ["tini", "-g", "--"]
CMD ["dask-gateway-server", "--config", "/etc/dask-gateway/dask_gateway_config.py"]
9 changes: 9 additions & 0 deletions dask-gateway-server/requirements.txt
@@ -0,0 +1,9 @@
# These are dependencies for installing the dask-gateway-server package.
#
# NOTE: changes to the dependencies here must also be reflected in
# ../dev-environment.yaml
#
aiohttp
colorlog
cryptography
traitlets
10 changes: 2 additions & 8 deletions dask-gateway-server/setup.py
@@ -97,14 +97,8 @@ def run(self):
_clean.run(self)


# NOTE: changes to the dependencies here must also be reflected
# in ../dev-environment.yaml
install_requires = [
"aiohttp",
"colorlog",
"cryptography",
"traitlets",
]
with open("requirements.txt") as f:
    install_requires = [l for l in f.readlines() if not l.startswith("#")]
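
Note that `f.readlines()` keeps each line's trailing newline, and a blank line does not start with `#`, so the comprehension above passes both through to `install_requires`. The following is a minimal sketch of a stricter variant, not the code in this PR: `parse_requirements` is a hypothetical helper name, and the in-memory sample stands in for the real `requirements.txt`.

```python
import io


def parse_requirements(lines):
    """Return requirement specifiers, skipping comments and blank lines."""
    requirements = []
    for raw in lines:
        line = raw.strip()  # drop the trailing newline and stray whitespace
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements


# Stand-in for open("requirements.txt"); the entries mirror the file in this PR.
sample = io.StringIO(
    "# These are dependencies for installing the dask-gateway-server package.\n"
    "#\n"
    "aiohttp\n"
    "\n"
    "colorlog\n"
    "cryptography\n"
    "traitlets\n"
)
print(parse_requirements(sample))
# → ['aiohttp', 'colorlog', 'cryptography', 'traitlets']
```

In `setup.py` this would be called as `parse_requirements(f)` inside the existing `with open("requirements.txt") as f:` block.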

extras_require = {
# pykerberos is tricky to install and requires a system package to
117 changes: 31 additions & 86 deletions dask-gateway/Dockerfile
@@ -1,95 +1,40 @@
# ** A base miniconda image **
FROM debian:bullseye-slim as miniconda
LABEL MAINTAINER="Jim Crist-Harif"

# List of conda versions: https://repo.anaconda.com/miniconda/
# PURPOSE:
#
# This Dockerfile and image, ghcr.io/dask/dask-gateway, is used by the
# dask-gateway Helm chart. It acts as the sample image for scheduler and workers
# in Dask Clusters created by end users.
#
# Miniconda 4.9.2 has been chosen as it has a aarch64 release and hasn't
# introduced a bug described in:
# https://stackoverflow.com/questions/68213186/illegal-instruction-error-when-verifying-anaconda-miniconda-install
# The admin installing the dask-gateway Helm chart, or its end users, are meant
# to specify an image for the scheduler and worker pods that meets their needs
# for the Dask clusters they start up. Please build your own according to the
# documentation if this very limited image doesn't meet your needs.
#
# FIXME: Use micromamba or mambaforge instead, see
# https://github.com/mamba-org/mamba for more info
# See https://gateway.dask.org/install-kube.html#using-a-custom-image.
#
ARG CONDA_VERSION=py39_4.10.3
FROM python:3.10-slim-bullseye

# - Create user dask
RUN useradd -m -U -u 1000 dask
# Set labels based on the Open Containers Initiative (OCI):
# https://github.com/opencontainers/image-spec/blob/main/annotations.md#pre-defined-annotation-keys
#
LABEL org.opencontainers.image.source="https://github.com/dask/dask-gateway"
LABEL org.opencontainers.image.url="https://github.com/dask/dask-gateway/blob/HEAD/dask-gateway/Dockerfile"

# - Install tini
# - Install miniconda build dependencies
# Install tini and update linux packages to patch known vulnerabilities.
RUN apt-get update \
&& apt-get install -y tini wget bzip2 \
&& rm -rf /var/lib/apt/lists/*

# - Download and install miniconda
# - Configure conda to minimize automatic package updates
# - Cleanup conda files
# - Uninstall miniconda build dependencies
RUN ARCH=$(uname -m) \
&& wget --quiet "https://repo.anaconda.com/miniconda/Miniconda3-$CONDA_VERSION-Linux-$ARCH.sh" \
&& mv "Miniconda3-$CONDA_VERSION-Linux-$ARCH.sh" miniconda.sh \
&& sh ./miniconda.sh -b -p /opt/conda \
&& ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> /home/dask/.profile \
&& echo "conda activate base" >> /home/dask/.profile \
&& echo "always_yes: true" >> /home/dask/.condarc \
&& echo "changeps1: false" >> /home/dask/.condarc \
&& echo "auto_update_conda: false" >> /home/dask/.condarc \
&& echo "aggressive_update_packages: []" >> /home/dask/.condarc \
&& find /opt/conda/ -follow -type f -name '*.a' -delete \
&& /opt/conda/bin/conda clean -afy \
&& chown -R dask:dask /opt/conda

USER 1000:1000
ENV PATH="/opt/conda/bin:$PATH"
&& apt-get upgrade -y \
&& apt-get install -y \
tini \
&& rm -rf /var/lib/apt/lists/*

# Create a non-root user to run as
RUN useradd --create-home --user-group --uid 1000 dask
USER dask:dask
ENV PATH=/home/dask/.local/bin:$PATH
WORKDIR /home/dask/

ENTRYPOINT ["tini", "-g", "--"]



# ** An image with all of dask-gateway's dependencies **
FROM miniconda as dependencies

# Latest versions can be found at:
# - dask: https://anaconda.org/conda-forge/dask
# - distributed: https://anaconda.org/conda-forge/distributed
ARG DASK_VERSION=2022.02.0
ARG DISTRIBUTED_VERSION=2022.02.0
# Install dask-gateway
COPY --chown=dask:dask . /opt/dask-gateway
RUN pip install --no-cache-dir /opt/dask-gateway

# - Installs dask and dependencies
# - Cleans up conda files
# - Removes unnecessary static libraries
# - Removes unnecessary *.js.map files
# - Removes unminified bokeh js
RUN /opt/conda/bin/conda install -c conda-forge --freeze-installed -y \
"aiohttp==3.8.1" \
"click<8.1.0" \
"dask==$DASK_VERSION" \
"distributed==$DISTRIBUTED_VERSION" \
"numpy==1.21.4" \
"pandas==1.3.4" \
&& /opt/conda/bin/conda clean -afy \
&& find /opt/conda/ -follow -type f -name '*.a' -delete \
&& find /opt/conda/ -follow -type f -name '*.js.map' -delete \
&& find /opt/conda/lib/python*/site-packages/bokeh/server/static -follow -type f -name '*.js' ! -name '*.min.js' -delete



# ** Build dask-gateway from source in a temporary image **
FROM dependencies AS builder

RUN mkdir -p /tmp/workdir
RUN mkdir -p /tmp/install-prefix
COPY . /tmp/workdir/
WORKDIR /tmp/workdir/
RUN /opt/conda/bin/pip install . --no-cache-dir --no-deps --prefix /tmp/install-prefix



# ** The final image **
FROM dependencies

# Copy over the built dask-gateway
COPY --from=builder /tmp/install-prefix /opt/conda/
# Only set ENTRYPOINT, CMD is configured at runtime by dask-gateway-server
ENTRYPOINT ["tini", "-g", "--"]
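
The new Dockerfile switches to the non-root `dask` user before running `pip install` and prepends `/home/dask/.local/bin` to `PATH`. A plausible reading, stated here as an assumption rather than something the PR spells out, is that pip then performs a per-user install, so console scripts land in the user scripts directory instead of a system-wide one. The sketch below uses only the standard library to show where that directory is for the current user and whether it is on `PATH`.

```python
import os
import sysconfig

# Where console scripts go for a per-user install under the POSIX user scheme.
user_scripts = sysconfig.get_path("scripts", scheme="posix_user")
print(user_scripts)

# Check whether that directory is already on PATH for this process.
on_path = user_scripts in os.environ.get("PATH", "").split(os.pathsep)
print("user scripts directory on PATH:", on_path)
```

Run as the `dask` user inside the image, the first line would be expected to match the directory added by the `ENV PATH=...` instruction.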
16 changes: 16 additions & 0 deletions dask-gateway/requirements.txt
@@ -0,0 +1,16 @@
# These are dependencies for installing the dask-gateway package.
#
# NOTE: changes to the dependencies here must also be reflected in
# ../dev-environment.yaml
#
aiohttp
# FIXME: click 8.0.4 works, but 8.1.0-8.1.2 have been found to cause failures
# for currently unknown reasons.
#
# This is tracked in https://github.com/dask/dask-gateway/issues/522.
#
click<8.1.0
dask>=2.2.0
distributed>=2021.01.1
pyyaml
tornado
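
To make the `click<8.1.0` pin above concrete, here is a small sketch that checks the versions named in the FIXME comment against that upper bound. It is illustrative only: `parse_release` and `satisfies_click_pin` are hypothetical helpers that handle plain `X.Y.Z` version strings, whereas pip itself applies the full PEP 440 rules.

```python
def parse_release(version):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_click_pin(version, upper_bound="8.1.0"):
    """Return True if `version` is strictly below the upper bound."""
    return parse_release(version) < parse_release(upper_bound)


for candidate in ("8.0.4", "8.1.0", "8.1.2"):
    print(candidate, satisfies_click_pin(candidate))
# → 8.0.4 True
# → 8.1.0 False
# → 8.1.2 False
```

Only 8.0.4, the version the comment reports as working, passes the constraint.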
17 changes: 2 additions & 15 deletions dask-gateway/setup.py
@@ -7,21 +7,8 @@
exec(f.read(), {}, ns)
VERSION = ns["__version__"]

# NOTE: changes to the dependencies here must also be reflected
# in ../dev-environment.yaml
install_requires = [
"aiohttp",
# FIXME: click 8.0.4 works, but 8.1.0-8.1.2 have been found to cause failures
# for currently unknown reasons.
#
# This is tracked in https://github.com/dask/dask-gateway/issues/522.
#
"click<8.1.0",
"dask>=2.2.0",
"distributed>=2021.01.1",
"pyyaml",
"tornado",
]
with open("requirements.txt") as f:
    install_requires = [l for l in f.readlines() if not l.startswith("#")]

extras_require = {
"kerberos": [
7 changes: 3 additions & 4 deletions docs/source/install-kube.rst
@@ -216,9 +216,8 @@ There are no other requirements for images, any image that meets the above
should work fine. You may install any additional libraries or dependencies you
require.

To develop your own image, you may either base it on a compatible version of
``daskgateway/dask-gateway``, or use our `example dockerfile`_ as a reference
and develop your own.
We encourage you to maintain your own image for scheduler and worker pods as
this project only provides a `minimal image`_ for testing purposes.

Using ``extraPodConfig``/``extraContainerConfig``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -415,4 +414,4 @@ here for reference:
.. _preemptible nodes: https://cloud.google.com/blog/products/containers-kubernetes/cutting-costs-with-google-kubernetes-engine-using-the-cluster-autoscaler-and-preemptible-vms
.. _init process: https://en.wikipedia.org/wiki/Init
.. _tini: https://github.com/krallin/tini
.. _example dockerfile: https://github.com/dask/dask-gateway/blob/main/resources/helm/example-images/Dockerfile
.. _minimal image: https://github.com/dask/dask-gateway/blob/main/dask-gateway/Dockerfile