Several questions about GCHP. #467

Open
ChenBHXMU opened this issue Jan 8, 2025 · 20 comments

Comments

@ChenBHXMU

Your name

Baihua Chen

Your affiliation

Xiamen University

Please provide a clear and concise description of your question or discussion topic.

Dear @yantosca

Happy New Year!

I am trying to use GCHP (version 14.5.0). Following the quick-start guide (https://gchp.readthedocs.io/en/stable/getting-started/quick-start.html), I encountered the following error when executing ./setRestartLink.sh.
[Screenshot: 微信截图_20250108145344]
However, I can find the file in the Restarts folder.
[Screenshot: 微信截图_20250108145741]
Then, when executing the command mpirun -np 6 ./gchp, the following error occurs.
[Screenshot: 微信截图_20250108145848]
Moreover, the content of the cap_restart file is 20190701 000000. How should I set the end time of the simulation?

My server has three nodes, each with 96 cores. Does the standard GEOS-Chem not support multi-node parallel execution?

Thank you!

Best regards
Baihua Chen

@ChenBHXMU ChenBHXMU added the category: Question Further information is requested label Jan 8, 2025
@lizziel
Contributor

lizziel commented Jan 8, 2025

I will transfer this issue to the GCHP github.

@lizziel lizziel transferred this issue from geoschem/geos-chem Jan 8, 2025
@lizziel
Contributor

lizziel commented Jan 8, 2025

Hi @ChenBHXMU, regarding your first issue with setting the restart link, I suggest opening the script setRestartLink.sh and adding prints to get additional information leading up to the error message.

Regarding the second issue, the error has to do with the number of "PEs" (processing elements), in other words the number of cores you are using. Did you set the number of cores in setCommonRunSettings.sh? It needs to match the number of cores you use to run, or at least not exceed the number of cores you request (in this case 6).
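
A quick pre-flight check along these lines could catch a mismatch before launching. This is only a minimal sketch: it assumes setCommonRunSettings.sh contains a line of the form `TOTAL_CORES=<n>` (the variable name appears later in this thread, but the exact line format is an assumption), and the helper name `check_core_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the core count configured in a settings file
# with the number of MPI processes you intend to launch.
# Assumes the file has a line like "TOTAL_CORES=6" (format is an assumption).
check_core_count() {
    local settings_file=$1 launch_cores=$2 configured
    configured=$(grep -E '^TOTAL_CORES=' "$settings_file" | head -n 1 \
                 | cut -d= -f2 | tr -d '[:space:]"')
    if [[ -z "$configured" ]]; then
        echo "TOTAL_CORES not found in $settings_file"
        return 2
    fi
    if (( configured == launch_cores )); then
        echo "OK: configured=$configured launch=$launch_cores"
    else
        echo "MISMATCH: configured=$configured launch=$launch_cores"
        return 1
    fi
}

# Demo on a throwaway file so the sketch runs anywhere.
demo_settings=$(mktemp)
echo 'TOTAL_CORES=96' > "$demo_settings"
check_core_count "$demo_settings" 6 || true   # prints MISMATCH: configured=96 launch=6
check_core_count "$demo_settings" 96          # prints OK: configured=96 launch=96
rm -f "$demo_settings"
```

In a real run directory the call might look like `check_core_count setCommonRunSettings.sh 6`, with the second argument being whatever value is passed to `mpirun -np`.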

@lizziel
Contributor

lizziel commented Jan 8, 2025

Just saw your other questions. Rather than setting an end date in GCHP, you should set the run duration. This is set in setCommonRunSettings.sh. GCHP supports MPI runs across multiple nodes. I suggest looking at the example job scripts in the sampleRunScripts subdirectory of the run directory. Note that mpirun -np 6 ./gchp will use only 6 cores and thus will not take advantage of the multi-node capability.

@ChenBHXMU
Author

Dear @lizziel

I found that I actually lack the Restart file, as shown in the image below.
[Screenshot: 微信截图_20250109144411]

However, I have already used the following commands to download the data:

cd /ExtData                    # navigate to GEOS-Chem data
mkdir InputDataCatalogs        # new directory for catalog files
mkdir InputDataCatalogs/14.3   # for 14.3-*-specific catalogs (example)
cd InputDataCatalogs
wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/MeteorologicalInputs.csv
cd 14.3
wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/14.3.0/ChemistryInputs.csv
wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/14.3.0/EmissionsInputs.csv
wget http://geoschemdata.wustl.edu/ExtData/DataCatalogs/14.3.0/InitialConditions.csv

bashdatacatalog-fetch InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv

bashdatacatalog-list -am -r 2018-06-30,2019-01-01 InputDataCatalogs/*.csv InputDataCatalogs/**/*.csv

I am now using GCHP version 14.3.0.

Thank you for your help!

Best regards,
Baihua Chen

@lizziel
Contributor

lizziel commented Jan 9, 2025

Great, glad it works now.

@lizziel lizziel closed this as completed Jan 9, 2025
@ChenBHXMU
Author

Dear @lizziel

I'm sorry that my reply has caused you confusion. I'm currently missing the Restart file, but I don't know how to deal with it. I've tried downloading the data, but it still hasn't been resolved.

@lizziel lizziel reopened this Jan 9, 2025
@lizziel
Contributor

lizziel commented Jan 9, 2025

GEOS-Chem restart files are available for download from the following location:
http://geoschemdata.wustl.edu/ExtData/GEOSCHEM_RESTARTS/GC_14.5.0/

@lizziel
Contributor

lizziel commented Jan 9, 2025

If you are now using a different version then you can check to see what location the restart file symbolic link is pointing to. For example:

cd Restarts
file GEOSChem.Restart.20190701_0000z.c24.nc4

with result:
GEOSChem.Restart.20190701_0000z.c24.nc4: symbolic link to /n/holyscratch01/external_repos/GEOS-CHEM/gcgrid/gcdata/ExtData/GEOSCHEM_RESTARTS/GC_14.5.0/GEOSChem.Restart.fullchem.20190701_0000z.c24.nc4

Then download the restart files from the path shown, replacing the part before ExtData with the http://geoschemdata.wustl.edu/ address used in the link above.
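
The path substitution described here can be sketched in a few lines of bash. This is an illustration rather than part of GCHP: the helper name `extdata_url` is hypothetical, and it assumes the symlink target contains an `ExtData/` component, as in the example output above.

```shell
#!/usr/bin/env bash
# Map a local ExtData path (e.g. a restart symlink target) to the matching
# URL on the WashU server by swapping everything before "ExtData".
extdata_url() {
    local target=$1
    local base="http://geoschemdata.wustl.edu"
    if [[ "$target" != *ExtData/* ]]; then
        echo "path does not contain ExtData/: $target" >&2
        return 1
    fi
    # "${target#*ExtData/}" strips the shortest prefix ending in "ExtData/".
    echo "${base}/ExtData/${target#*ExtData/}"
}

# In a run directory the target could be read with, for example:
#   target=$(readlink Restarts/GEOSChem.Restart.20190701_0000z.c24.nc4)
target="/n/holyscratch01/external_repos/GEOS-CHEM/gcgrid/gcdata/ExtData/GEOSCHEM_RESTARTS/GC_14.5.0/GEOSChem.Restart.fullchem.20190701_0000z.c24.nc4"
url=$(extdata_url "$target")
echo "$url"
# The file could then be fetched with, for example: wget "$url"
```

For the target shown, the printed URL sits under the same GEOSCHEM_RESTARTS/GC_14.5.0/ directory as the download link given earlier.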

@ChenBHXMU
Author

Hi @lizziel

There is still a problem I can't solve.

I've downloaded the related files from http://geoschemdata.wustl.edu/ExtData/GEOSCHEM_RESTARTS/GC_14.3.0/. However, when I run the ./setRestartLink.sh script, there is still a file that can't be found, as shown in the image below. I'm not sure where I can download this file.
[Screenshot: 微信截图_20250110161228]

In addition, once this problem is solved, the next step is to run the following command: srun -n 60 -N 2 -m plane=30 --mpi=pmix ./gchp. Is this correct? When I run this command now, I get the following errors.
[Screenshot: 微信截图_20250110161301]

I also tried the command mpirun -np 90 ./gchp, but there were still errors.

[Screenshot: 微信截图_20250110161332]

Thank you!

@lizziel
Contributor

lizziel commented Jan 10, 2025

There is no such file as GEOSChem.Restart.EXPID:__Outpuz.c24.nc4. Somehow the setRestartLink.sh script is failing to get the time string from cap_restart. As I previously suggested, you can open the script and add prints to find out what it is doing.
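
One print worth adding is a check that cap_restart really holds a date and time. The sketch below is not part of setRestartLink.sh; the helper name `validate_cap_restart` is hypothetical, and the expected `YYYYMMDD HHmmSS` layout is taken from the example value 20190701 000000 quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the cap_restart text look like "YYYYMMDD HHmmSS"?
validate_cap_restart() {
    local contents=$1
    local re='^[0-9]{8} [0-9]{6}$'
    if [[ "$contents" =~ $re ]]; then
        echo "valid: [$contents]"
    else
        # Brackets make stray spaces or unexpected text easy to spot.
        echo "invalid: [$contents]"
        return 1
    fi
}

validate_cap_restart "20190701 000000"            # prints valid: [20190701 000000]
validate_cap_restart "EXPID:  OutputDir" || true  # prints invalid: [EXPID:  OutputDir]

# In a run directory the same check could be applied to the real file:
#   validate_cap_restart "$(head -n 1 cap_restart)"
```

The second call uses made-up contents purely to show what a failed check looks like.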

The pmix option is specific to the Harvard cluster. How you submit MPI jobs may depend on your particular libraries. I suggest asking your cluster's system administrator how to submit an MPI job. You can then edit the run script as needed.

@ChenBHXMU
Author

Hi @lizziel , I recently executed the ./setRestartLink.sh script, but unfortunately encountered an issue. The error message indicated that the gchp_restart.nc4 file is missing. I'm not sure where I can obtain this file. Your guidance on this matter would be greatly appreciated. Thank you in advance for your assistance.
[Screenshot: 微信截图_20250111151027]

@lizziel
Contributor

lizziel commented Jan 13, 2025

Hi @ChenBHXMU, the script setRestartLink.sh is a bash script that you can read and edit. The content is this:

rst_link_name=gchp_restart.nc4

# Get simulation start from cap_restart
if [ -f cap_restart ]; then
   start_str=$(sed 's/ /_/g' cap_restart)
else
   echo "ERROR: Unable to set ${rst_link_name} link because cap_restart does not exist! Create cap_restart containing simulation start date with format YYYYMMDD HHmmSS."
   exit 1
fi

# Set restart name, check that file exists, and set symlink
N=$(grep "CS_RES=" setCommonRunSettings.sh | cut -c 8- | xargs)
rst_target=./Restarts/GEOSChem.Restart.${start_str:0:13}z.c${N}.nc4
if [[ -f "${rst_target}" ]]; then
   ln -nsf ${rst_target} ${rst_link_name}
   echo "Restart symlink ${rst_link_name} set to ${rst_target}"
else
   echo "ERROR: Unable to set symlink ${rst_link_name} because file ${rst_target} does not exist! Create file or link with that name, or change start date in cap_restart and/or grid resolution in setCommonRunSettings.sh to match restart file that exists."
   exit 1
fi

The line ln -nsf ${rst_target} ${rst_link_name} is creating a symbolic link called gchp_restart.nc4. It does not get created if the restart file is not found (variable rst_target). Try going through the various lines in this script at the command line rather than running the script. This should help figure out where the issue is.
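
Stepping through the script's string handling by hand might look like the sketch below. It reuses the same `sed`, `cut`, `xargs` and substring steps as the script above, wrapped in two helper functions whose names are hypothetical. The second call feeds in made-up text, chosen only because it yields the malformed file name reported earlier in this thread; what cap_restart actually contained at that point is not known.

```shell
#!/usr/bin/env bash
# Rebuild the restart file name the way setRestartLink.sh does, but from
# strings passed in directly, so each step can be inspected.
restart_name() {
    local cap_contents=$1 cs_res=$2 start_str
    # Same substitution as the script: every space becomes an underscore.
    start_str=$(printf '%s\n' "$cap_contents" | sed 's/ /_/g')
    # Same substring as the script: keep the first 13 characters.
    echo "GEOSChem.Restart.${start_str:0:13}z.c${cs_res}.nc4"
}

# Same extraction as the script applies to the CS_RES= line.
parse_cs_res() {
    printf '%s\n' "$1" | cut -c 8- | xargs
}

res=$(parse_cs_res "CS_RES=24")
echo "$res"                                      # prints 24

restart_name "20190701 000000" "$res"
# prints GEOSChem.Restart.20190701_0000z.c24.nc4

# Hypothetical bad contents (an assumption, for illustration only):
restart_name "EXPID:  OutputDir/GEOSChem" "$res"
# prints GEOSChem.Restart.EXPID:__Outpuz.c24.nc4
```

If the name printed for your own cap_restart contents does not match a file in Restarts, either the file is missing or one of the two inputs (start date or CS_RES) is not what you expect.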

@ChenBHXMU
Author

Hi @lizziel

The problem with setRestartLink.sh is solved. Thank you. But I have other questions.

I ran the following script:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 96
#SBATCH -J GCHP
cd $SLURM_SUBMIT_DIR
module load gchp_libs
date
mpirun ./gchp
date

I encountered the following errors.
[Screenshot: 微信图片_20250114144458]
[Screenshot: 微信截图_20250114145032]

In addition, when I changed NUM_NODES to 2 and TOTAL_CORES to 192 in the setCommonRunSettings.sh file and ran ./setCommonRunSettings.sh, the following errors were shown.
[Screenshot: 微信图片_20250114145108]

Best regards
Baihua Chen

@lizziel
Contributor

lizziel commented Jan 14, 2025

Hi @ChenBHXMU, the error messages you are encountering explain what the issue is, such as "file not found" and "grid resolution is too low for core count requested".
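
To see why a core count can be too high for a given resolution, a rough pre-check can be sketched as below. The rules encoded here are assumptions, not taken from this thread: that the total core count must be a multiple of 6 (one share per cubed-sphere face), that each face is split into an nx by ny grid of tiles as squarely as possible, and that each tile needs at least 4 grid cells per side. The function name and the `MIN_CELLS` value are illustrative, and the real check in setCommonRunSettings.sh may differ.

```shell
#!/usr/bin/env bash
# Rough pre-check of a core count against a cubed-sphere resolution.
# Assumptions (not taken from this thread): cores are split evenly over the
# 6 faces, each face is tiled nx x ny as squarely as possible, and every
# tile needs at least MIN_CELLS grid cells per side.
MIN_CELLS=4

check_decomposition() {
    local cs_res=$1 total_cores=$2
    if (( total_cores % 6 != 0 )); then
        echo "FAIL: $total_cores cores is not a multiple of 6"
        return 1
    fi
    local per_face=$(( total_cores / 6 ))
    local nx=1 i
    # Find the largest divisor of per_face that is <= sqrt(per_face).
    for (( i = 1; i * i <= per_face; i++ )); do
        if (( per_face % i == 0 )); then
            nx=$i
        fi
    done
    local ny=$(( per_face / nx ))
    if (( cs_res / nx < MIN_CELLS || cs_res / ny < MIN_CELLS )); then
        echo "FAIL: c${cs_res} on ${total_cores} cores gives ${nx}x${ny} tiles per face, too few cells per tile"
        return 1
    fi
    echo "OK: c${cs_res} on ${total_cores} cores gives ${nx}x${ny} tiles per face"
}

check_decomposition 24 96            # prints OK: c24 on 96 cores gives 4x4 tiles per face
check_decomposition 24 192 || true   # prints FAIL: c24 on 192 cores gives 4x8 tiles per face, too few cells per tile
```

Under these assumptions, c24 on 192 cores fails because an 8-way split of a 24-cell face side leaves only 3 cells per tile. Lowering the core count or raising the grid resolution would avoid that.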

@ChenBHXMU
Author

@lizziel Excuse me for asking again, how can I download this missing file? I appreciate your patience!

@lizziel
Contributor

lizziel commented Jan 14, 2025

We recommend GCHP users download data via the bashdatacatalog. Documentation for this is at https://gchp.readthedocs.io/en/stable/user-guide/getting-input-data.html#download-data-catalogs. You can also download from WashU as you would data from an external server, e.g. using wget. Much of the input data is the same as GC-Classic so you could also use the dry run method with GC-Classic to get most of it, and then manually get the rest based on what files you see missing when you run. See also your previous issue about downloading data for more resources on data download and the input data archive: geoschem/geos-chem#2600.

@ChenBHXMU
Author

Hi @lizziel. I have downloaded data via the bashdatacatalog. I also used the dry run method with standard GEOS-Chem and HEMCO, but these errors are still not solved. The missing file is ship_plume_lut_02ms.txt, and the other error is about a start time that does not match. I found a previous discussion, and it seems that these two errors are bugs. The specific link is: [FEATURE REQUEST] All external data files should be read in through ExtData.rc #68. I don't know how to solve this problem and hope to receive your help again.

@lizziel
Contributor

lizziel commented Jan 15, 2025

We would like to eventually have all text files ported to netCDF for ExtData to read. For now, HEMCO reads these files. This is why HEMCO is throwing the file not found error for the ship plume file.

@ChenBHXMU
Author

Hi @lizziel, I have solved the problem of the missing files with the dry run method, although I had to change the LUT data format to txt in HEMCO_Config.rc when using GEOS-Chem for the dry run. However, other errors remain, as follows.
[Screenshot: 微信截图_20250116130829]
[Screenshot: 微信截图_20250116153627]

I also checked GCHP.rc according to this link (413) and found that WRITE_RESTART_BY_OSERVER is already set to NO. Do I therefore need netCDF with parallel support or the PnetCDF library? Or do you have any other suggestions? Thank you for your help.

@lizziel
Contributor

lizziel commented Jan 16, 2025

Please see the debugging page of the GCHP ReadTheDocs for tips on how to debug these error messages. Also note that you may need to download additional data for GCHP if you use the dry run option for GC-Classic. These files can be downloaded via wget if you are not using the bashdatacatalog.
