Instructions for system administrators to deploy the eWaterCycle platform
This repo contains (codified) instructions for deploying the eWaterCycle platform. The target audience of these instructions is system administrators. For more information on the eWaterCycle platform (and how to deploy it), see the eWaterCycle documentation.
The grading setup supports one class with one grader.
For instructions on how to use the machine as deployed by this repo, see the User guide.
These instructions assume you have some basic knowledge of Vagrant and Ansible.
The hardware environment used by the eWaterCycle platform development team is the SURF Research Cloud. Starting a machine on the SURF Research Cloud requires a research budget with SURF; for more information, see the SURF website. Once running, access to the machine can be shared with anyone.
The setup instructions in this repo create an eWaterCycle application (a sort of VM template) that, when started, creates a machine with:
- JupyterHub: to interactively generate forcings and perform experiments on hydrological models using the eWaterCycle Python package
- nbgrader for grading
- nbgitpuller to open a cloned git repository in JupyterLab from a URL
- ERA5 and ERA-Interim global climate data, which can be used to generate forcings
- Installed models and their example parameter sets
An application on the SURF Research Cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).
In addition to the standard VM storage, additional read-only datasets are mounted at /data/shared from a file server such as a Samba server or a dCache server. They may contain things like:
- climate data, see https://ewatercycle.readthedocs.io/en/latest/system_setup.html#download-climate-data
- observation
- parameter-sets
- singularity-images of hydrological models wrapped in grpc4bmi servers
See the File server chapter for more information on the file server.
Previously the eWaterCycle platform consisted of multiple VMs on the SURF HPC Cloud; see the v0.1.2 release for that code.
For developing the SURF Research Cloud applications locally, you can use the Vagrant instructions.
To register the eWaterCycle platform on the SURF Research Cloud, follow the instructions in the SURF Research Cloud developer documentation.
This chapter is intended for application deployers. A workspace is the name for a Virtual Machine (VM) on the SURF Research Cloud. The workspace is created with the eWaterCycle application from the catalog.
The eWaterCycle system setup requires a lot of data files.
Two eWaterCycle catalog items have been created:
- eWaterCycle dcache, uses dCache as the shared data source. High-capacity but high-latency storage, accessible via WebDAV from anywhere on the Internet. Useful for research.
- eWaterCycle samba, uses Samba as the shared data source. A low-capacity, low-latency file server that is only accessible from the private network of the SURF Research Cloud. Useful for teaching.
The shared data is mounted read-only at /data/shared on the workspaces.
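To confirm on a running workspace that the shared data really is mounted read-only, a check like the following could be used. It is a minimal sketch assuming a Linux/Unix host, where os.statvfs exposes the mount flags; the function names are hypothetical and not part of this repo.

```python
import os


def has_read_only_flag(f_flag: int) -> bool:
    """Return True if the statvfs flag bits include the read-only bit."""
    return bool(f_flag & os.ST_RDONLY)


def is_read_only_mount(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only."""
    return has_read_only_flag(os.statvfs(path).f_flag)
```

On a workspace, is_read_only_mount("/data/shared") would be expected to return True.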
In the following chapters you will need to choose which catalog item you want to use.
Depending on your choice, you need to do certain things.
Before you can create a workspace, complete the following steps:
- Log into SURF Research Cloud
- Make sure you are allowed to use the eWaterCycle catalog item
- Create a new storage item for home directories
  - To store user files
  - Use 50 GB for simple experiments, or more when an experiment requires it.
  - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
- If the shared data source is dcache, create a new storage item for the dCache cache
  - To store files cached from dCache by rclone
  - Use 50 GB as size
  - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
- If the shared data source is samba, create a new storage item for data
  - To store training material like parameter sets, ready-to-use forcings, raw forcings and Apptainer sif files for models.
  - This storage item should be used later in the Samba file server.
- If the shared data source is samba, create a private network
  - Name: file-storage-network
- On the https://portal.live.surfresearchcloud.nl/profile page, in Collaborative organizations:
  - Create a secret named samba_password with a strong random password as value
  - Create a secret named dcache_ro_token with a dCache read-only token as value
To become root on a VM, the user needs to be a member of the src_co_admin group on SRAM. See docs.
If you want to create an eWaterCycle machine (aka workspace) that uses a Samba file server (that is, the shared data source is samba), you need to create a Samba file server first.
Each collaborative organization should run a single file server. This file server will be used to store read-only shared data. Create the file server with the following steps:
- Create a new workspace
- Select the Samba Server application
- Select a size with 2 cores and 16 GB RAM
- Select the data storage item created in the previous section
- Select the private network
- Wait for the machine to be running
- Log in to the machine with ssh
- Become root with sudo -i
- Edit /etc/samba/smb.conf and, in the [samba-share] section, replace read only = no with read only = yes
- Restart the Samba server with systemctl restart smbd
- Populate the /data/volume_2/samba-share/ directory with training material. This directory will be shared with other machines.
See the data documentation on how to populate the file server.
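The smb.conf edit in the steps above can also be scripted. The sketch below assumes, as described above, that the share section is literally named [samba-share] and that the option is written as "read only = no"; the function name is hypothetical. It works on the file contents in memory, so the result can be reviewed before it is written back as root.

```python
import re


def set_share_read_only(conf_text: str, share: str = "samba-share") -> str:
    """Change 'read only = no' to 'read only = yes' inside [share] only."""
    section_re = re.compile(r"^\s*\[([^\]]+)\]\s*$")
    option_re = re.compile(r"^(\s*)read only\s*=\s*no\s*$", re.IGNORECASE)
    in_share = False
    lines = []
    for line in conf_text.splitlines():
        header = section_re.match(line)
        if header:
            # Entering a new section; only edit inside the requested share.
            in_share = header.group(1).strip() == share
        elif in_share:
            # Keep the original indentation when rewriting the option.
            line = option_re.sub(r"\1read only = yes", line)
        lines.append(line)
    trailing = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(lines) + trailing
```

Other sections, such as [global], are left untouched. Running Samba's testparm on the edited file before systemctl restart smbd is a cheap way to catch syntax mistakes.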
Steps to create an eWaterCycle workspace with dCache as the shared data source:
- Create a new workspace
- Select a collaborative organisation (CO), for example ewatercycle-tudelft
- Select the eWaterCycle dcache catalog item
- Select the size of the VM (CPUs/memory) based on the use case
- Select the storage item for home directories. Remember the item you picked as you will need it in the workspace parameters.
- Select the storage item for the dCache cache. Remember the item you picked as you will need it in the workspace parameters.
- Fill in all the workspace parameters. If you are not interested in grading, the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
- Wait for the machine to be running
- Visit the URL/IP
- When done, delete the machine
End users should be invited to the collaborative organization in SRAM, or created as students, so they can log in.
See the User guide for what users have to do to log in or to use the GitHub repository.
Steps to create an eWaterCycle workspace with Samba as the shared data source:
- Create a new workspace
- Select a collaborative organisation (CO), for example ewatercycle-tudelft
- Select the eWaterCycle samba catalog item
- Select the size of the VM (CPUs/memory) based on the use case
- Select the storage item for home directories. Remember the item you picked as you will need it in the workspace parameters.
- Select the private network
- Fill in all the workspace parameters. If you are not interested in grading, the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
- Wait for the machine to be running
- Visit the URL/IP
- When done, delete the machine
End users should be invited to the collaborative organization in SRAM, or created as students, so they can log in.
See the User guide for what users have to do to log in or to use the GitHub repository.
During creation you can set the students parameter to create local POSIX accounts for students.
The format of the parameter value is <username1>:<password1>,<username2>:<password2>.
Use an empty string for no students.
Make sure to use strong passwords, as anyone on the internet can access the machine.
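A minimal sketch of how a value in this format could be built and checked before it is pasted into the workspace parameters. The helper names are hypothetical; the only assumption beyond the format above is that ':' and ',' must not appear inside usernames or passwords, since they act as separators.

```python
def build_students_param(accounts: dict[str, str]) -> str:
    """Join username/password pairs into '<user1>:<pw1>,<user2>:<pw2>'."""
    for user, password in accounts.items():
        # ':' and ',' separate the fields, so they cannot appear inside them.
        if not user or not password or any(sep in user + password for sep in ":,"):
            raise ValueError(f"invalid entry for user {user!r}")
    return ",".join(f"{user}:{password}" for user, password in accounts.items())


def parse_students_param(value: str) -> dict[str, str]:
    """Split the parameter value back into a username -> password mapping."""
    if value == "":
        return {}  # an empty string means: no students
    accounts = {}
    for entry in value.split(","):
        user, password = entry.split(":", 1)
        accounts[user] = password
    return accounts
```

Parsing the value back and comparing it with the intended accounts is a quick way to catch a stray separator before the workspace is created.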
You can use the Python script create_student_passwords.py to generate passwords. To use it, create a file "usernames.txt" with one username on each line, then run the script. The generated passwords are stored in a new file called students.txt. See the docs in the script for more details. The passwords generated by the script should be distributed to the students.
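The internals of create_student_passwords.py are not described here. The following is only a hypothetical sketch of the same idea, using Python's secrets module for cryptographically strong randomness. Restricting the alphabet to letters and digits keeps ':' and ',' out of the passwords, so they fit the students parameter format. It does not reproduce the layout of the students.txt file written by the real script.

```python
import secrets
import string
from pathlib import Path

# Letters and digits only: no ':' or ',' that would break the students format.
ALPHABET = string.ascii_letters + string.digits


def read_usernames(path: Path) -> list[str]:
    """Read one username per line, skipping blank lines."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def generate_passwords(usernames: list[str], length: int = 16) -> dict[str, str]:
    """Return a random password of `length` characters for each username."""
    return {
        user: "".join(secrets.choice(ALPHABET) for _ in range(length))
        for user in usernames
    }
```

For example, generate_passwords(read_usernames(Path("usernames.txt"))) returns a username to password mapping for every line in the file.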
To get the example notebooks, end users should go to the machine's homepage and click one of the notebook links.
These links use nbgitpuller to sync a git repository and open a notebook in it.
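A sketch of how such a link could be built, following the query parameters documented by nbgitpuller (repo, urlpath, branch) and its hub/user-redirect/git-pull endpoint. It assumes JupyterHub is served at the root of the workspace URL and that repo_dir is the directory nbgitpuller clones into, typically the repository name. The hub URL, repository and notebook used with it are placeholders.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo: str, branch: str,
                     repo_dir: str, notebook: str) -> str:
    """Build a link that makes nbgitpuller sync `repo` and open `notebook` in JupyterLab."""
    query = urlencode({
        "repo": repo,
        # Path relative to the user's home, opened in JupyterLab after the pull.
        "urlpath": f"lab/tree/{repo_dir}/{notebook}",
        "branch": branch,
    })
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"
```

urlencode percent-encodes the nested URL and path, which is what lets the repository URL travel safely inside the query string.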
To restrict the memory and CPU usage of each Jupyter user, you can edit the /etc/jupyterhub/jupyterhub_config.py file on the workspace. Add the following lines to the file:
# Each user can use at most 4G of memory and 1 CPU
c.SystemdSpawner.mem_limit = '4G'
c.SystemdSpawner.cpu_limit = 1.0
See the JupyterHub SystemdSpawner docs for more information.
Apply the configuration by restarting JupyterHub with sudo systemctl restart jupyterhub.
By default, each user can use all the memory and CPU of the machine.
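The limits apply to each user separately and do not reserve memory, so the sum over all users can exceed the VM size. A back-of-the-envelope check like the sketch below shows how many users can reach their memory limit at the same time. The helper names are hypothetical. It assumes 1024-based suffixes (K, M, G, T), the convention of JupyterHub's byte specifications, and ignores memory used by the OS and JupyterHub itself.

```python
# 1024-based multipliers for size suffixes such as '4G'.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_bytes(spec: str) -> int:
    """Convert a size like '4G' or '512M' into bytes."""
    spec = spec.strip().upper()
    if spec[-1] in UNITS:
        return int(float(spec[:-1]) * UNITS[spec[-1]])
    return int(spec)


def max_users_at_limit(total_memory: str, mem_limit: str) -> int:
    """Number of users that can use their full memory limit simultaneously."""
    return parse_bytes(total_memory) // parse_bytes(mem_limit)
```

With mem_limit = '4G', a VM with 64G of memory could serve 16 users at their full limit. Beyond that, the users compete for the remaining memory.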
In the eWaterCycle project we make Docker images. The images are hosted on Docker Hub and GitHub Container Registry. A project member can create issues here to request permission to push images to Docker Hub or GitHub Container Registry.
The documentation/software code in this repository has been generated and/or refined using GitHub Copilot. All AI output has been verified for correctness, accuracy and completeness, adapted where needed, and approved by the author.