1st AtmoRep roadmap meeting

02-09-2024 1st AtmoRep roadmap meeting

General

Start: 15:15, end: 16:35
Participants: Christian, Michael L., Nikolay, Simon, Enxhi, Nishant, Michael T., Ankit, Ilaria, Julius, Kacper, David

Meeting notes

support for auto-regressive global rollout
- first forecasting prototype planned for the end of September
- approach:
  - forecasting using the latent space of the trained Multiformer
  - pretrained Multiformer probably kept frozen for the forecasting network/'engine'
  - latent embeddings of neighborhoods are patched together to form a global representation, which is then propagated into the future
  - latent embeddings of each local neighborhood are projected into a smaller latent representation with an adapter (otherwise the global representation would be too large)
  - 'read-out head' based on an attention mechanism
  - the representation is propagated forward in time with a Forecaster transformer tail network
  - overlapping tokens in the predictions to avoid spurious artefacts
- Christian will draw on his experience with forecasting from satellite observations (work pursued at ECMWF)
- hyperparameters: size of the latent space, tile overlap, input window size (along the time dimension)
- David raised potential normalization issues -> Christian does not expect relevant issues to arise from this
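The adapter and the attention-based read-out head described in the approach can be illustrated with a minimal NumPy sketch. It assumes that the frozen Multiformer yields one embedding per token of a local neighborhood, that the adapter is a single linear projection, and that the read-out head is a small set of query vectors attending over the adapted tokens. The function names, dimensions and random weights are hypothetical placeholders, not the planned AtmoRep design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adapter(tokens, w):
    # Hypothetical linear adapter: (n_tokens, d_model) -> (n_tokens, d_latent).
    return tokens @ w


def attention_readout(latents, queries):
    # Hypothetical read-out head: scaled dot-product attention of a few query
    # vectors over the adapted neighborhood tokens.
    d = latents.shape[-1]
    scores = queries @ latents.T / np.sqrt(d)  # (n_queries, n_tokens)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ latents, weights          # (n_queries, d_latent)


rng = np.random.default_rng(0)
n_tokens, d_model, d_latent, n_queries = 96, 1024, 64, 8  # hypothetical sizes
tokens = rng.standard_normal((n_tokens, d_model))  # stand-in for frozen Multiformer output
w_adapter = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
queries = rng.standard_normal((n_queries, d_latent))

compact, weights = attention_readout(adapter(tokens, w_adapter), queries)
print(compact.shape)  # (8, 64)
```

In this toy setting each neighborhood shrinks from 96 × 1024 to 8 × 64 values before it would be patched into the global representation; in training, the adapter and query weights would be learned while the Multiformer stays frozen.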
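The overlapping predictions could be merged into one global field roughly as follows. This sketch assumes predictions arrive as 2-D (latitude, longitude) tiles on a regular grid, treats longitude as periodic, and simply averages values where tiles overlap. `stitch_overlapping_tiles` is a hypothetical helper; the blending actually used for the Forecaster may differ (e.g., weights tapered toward tile edges).

```python
import numpy as np


def stitch_overlapping_tiles(tiles, origins, global_shape):
    """Blend overlapping (lat, lon) tiles into one global field by averaging.

    tiles:        list of 2-D arrays, one per local prediction
    origins:      list of (row, col) grid indices of each tile's first cell
    global_shape: (n_lat, n_lon) of the global grid; longitude is periodic
    """
    acc = np.zeros(global_shape)
    count = np.zeros(global_shape)
    n_lon = global_shape[1]
    for tile, (r0, c0) in zip(tiles, origins):
        h, w = tile.shape
        rows = np.arange(r0, r0 + h)
        cols = (c0 + np.arange(w)) % n_lon  # wrap around in longitude
        idx = np.ix_(rows, cols)
        acc[idx] += tile
        count[idx] += 1.0
    out = np.full(global_shape, np.nan)  # cells no tile covers stay NaN
    covered = count > 0
    out[covered] = acc[covered] / count[covered]
    return out


# Three 4x4 tiles on a 4x8 grid; the last one wraps across longitude index 0.
tiles = [np.full((4, 4), float(k)) for k in range(3)]
field = stitch_overlapping_tiles(tiles, [(0, 0), (0, 3), (0, 6)], (4, 8))
print(field[0].tolist())  # [1.0, 1.0, 0.0, 0.5, 1.0, 1.0, 1.5, 2.0]
```

Averaging means every cell covered by two tiles receives the mean of both predictions, so no single tile boundary dominates the stitched field.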
different/multiple data streams
- Kacper reported on his experience with FESOM data
  - the Multifield sampler currently cannot handle FESOM data -> move the data loader into an abstract parent class and write subclasses for specific datasets?
  - adaptation of the data loader should be coordinated
  - suggestion: write an abstract data loader class with child classes for specific datasets
- Christian: the current data loader should in principle support the integration of new datasets -> bugs and limitations in flexibility are possible
  - expected structure of data: (time, variable, levels, latitude, longitude)
- Christian: no need for an abstract data loader class; rather, a dataset-specific tokenizer is needed
- the time dimension is shared by all datasets (except static data) -> it is also the dimension used for chunking samples
  - chunking along the spatial dimensions may cause hiccups when loading sampled neighborhoods (I/O-bound, especially on JUST)
- how to handle other vertical level schemes, e.g. sub-surface and surface data
  - should already be supported with existing code
- AWI: non-regular grids require a separate tokenization and/or embedding network
- Kacper: keeping the normalization coefficients in the same file is inflexible -> it should be possible to switch them on the fly; Christian: ECMWF software developers disagree and would rather have two sets of files
- Kacper and Nikolay can meet with Ilaria and Christian to check the adaptation of the zarr conversion file
- David: requests a script for profiling the data loader; Nvidia provides comprehensive profilers (Simon and Michael T. are potentially in charge of this)
  - issue: nvidia-smi cannot distinguish between data loading and optimization
  - David plans to do this, but should collaborate with Simon
- Christian notes that the data loader does not need to be overly generalized right now in order to make progress in the coming months
- agreed target: generalize the Multifield data sampler to different datasets first, and later to multiple datasets at the same time
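Christian's suggestion of a dataset-specific tokenizer, together with the expected data layout (time, variable, levels, latitude, longitude) and chunking along the shared time dimension, can be sketched as follows. The class and function names, token size and array shapes are hypothetical. The sketch covers only regular latitude-longitude grids; non-regular grids such as FESOM's would need a separate subclass, as noted by AWI.

```python
from abc import ABC, abstractmethod

import numpy as np


class Tokenizer(ABC):
    """Hypothetical per-dataset tokenizer interface."""

    @abstractmethod
    def tokenize(self, field: np.ndarray) -> np.ndarray:
        """Map one variable's (time, level, lat, lon) block to tokens."""


class RegularGridTokenizer(Tokenizer):
    """Cuts a regular lat-lon block into non-overlapping space-time tokens."""

    def __init__(self, token_size):
        self.tt, self.tlat, self.tlon = token_size

    def tokenize(self, field):
        t, lev, lat, lon = field.shape
        if t % self.tt or lat % self.tlat or lon % self.tlon:
            raise ValueError("block shape must be divisible by the token size")
        x = field.reshape(t // self.tt, self.tt, lev,
                          lat // self.tlat, self.tlat,
                          lon // self.tlon, self.tlon)
        # -> (level, n_t, n_lat, n_lon, tt, tlat, tlon): one token per cell
        return x.transpose(2, 0, 3, 5, 1, 4, 6)


def time_chunks(n_times, window, stride):
    """Yield (start, stop) index pairs along the time axis shared by all streams."""
    for start in range(0, n_times - window + 1, stride):
        yield start, start + window


# Data laid out as (time, variable, levels, latitude, longitude).
data = np.arange(12 * 2 * 3 * 16 * 32, dtype=float).reshape(12, 2, 3, 16, 32)
tokenizer = RegularGridTokenizer(token_size=(3, 8, 8))  # hypothetical token size
for start, stop in time_chunks(n_times=12, window=6, stride=6):
    tokens = tokenizer.tokenize(data[start:stop, 0])  # variable 0 only
    print(start, stop, tokens.shape)
```

Keeping the loader generic over the time axis and pushing grid-specific logic into the tokenizer matches the point that the time dimension is common to all streams while the spatial layout is not.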
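Since nvidia-smi cannot separate data loading from optimization, a simple first step is to time the two phases of the training loop separately. The sketch below assumes the loader is any Python iterable and `train_step` is a callable; both, and the `PhaseTimer` helper, are hypothetical. On a GPU, kernel launches are asynchronous, so the device must be synchronized (e.g., `torch.cuda.synchronize()` in PyTorch) before the compute timer stops; otherwise compute time leaks into the next data phase. Nvidia's profilers such as Nsight Systems give a far more detailed picture.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase of a training loop."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def fractions(self):
        total = sum(self.totals.values())
        return {k: v / total for k, v in self.totals.items()} if total else {}


def profile_epoch(loader, train_step, timer):
    """Run one pass over `loader`, timing batch fetching and the step separately."""
    batches = iter(loader)
    sentinel = object()  # marks an exhausted loader
    while True:
        with timer.phase("data"):
            batch = next(batches, sentinel)
        if batch is sentinel:
            break
        with timer.phase("compute"):
            # On a GPU, synchronize the device at the end of train_step,
            # otherwise asynchronous kernels are billed to the next "data" phase.
            train_step(batch)


if __name__ == "__main__":
    def slow_loader(n=5):
        # Stand-in for an I/O-bound data loader.
        for i in range(n):
            time.sleep(0.01)
            yield i

    timer = PhaseTimer()
    profile_epoch(slow_loader(), lambda batch: sum(range(10_000)), timer)
    print({k: round(v, 2) for k, v in timer.fractions().items()})
```

A high "data" fraction points to the I/O-bound behavior discussed above, which can then be examined in detail with the Nvidia tools.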
brief update by Julius and David
- StratoRep:
  - plan: start a training run with the model as it is (current AtmoRep version)
  - data must first be downloaded and processed/converted to zarr
  - separate meeting between Julius, Enxhi and Michael to coordinate data retrieval and provision
  - note that a conversion script for ERA5 data is already available
- Feedback from David:
  - his team will support the development of the auto-regressive rollout from next month on
  - to-do for the upcoming weeks: get familiar with the code
  - a new postdoc will start in October

The AtmoRep Collaboration - last update: April 2024

Website: www.atmorep.org
arXiv: link
analysis: analysis code
