Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors in extraction of thalweg polygons in the Alaska domain #8

Open
BahramKhazaei-NOAA opened this issue Sep 13, 2023 · 28 comments
Open

Comments

@BahramKhazaei-NOAA
Copy link

@cuill After fixing the problem with impi, @SorooshMani-NOAA were trying to run the pyDEM package on a DEM in the Alaska domain which is extracted from GMRT. However, there was a problem with processing the DEM and his pyDEM script failed. We then tried another DEM that has coverage over the Cook Inlet and got another error that seems like is due to existence of missing values in the DEM (it has 'no data' value in some pixels). Here are the error messages in processing of each DEM:

Cook Inlet DEM:

(meshdev) [Soroosh.Mani@compute-0001 Bahram]$ python run_serial.py
./cook_1e7

A Priority-Flood+Epsilon
C Barnes, R., Lehman, C., Mulla, D., 2014. Priority-flood: An optimal depression-filling and watershed-labeling algorithm for digital elevation models. Computers & Geosciences 62, 117–127. doi:10.1016/j.cageo.2013.04.024

c topology = D8
p Setting up boolean flood array matrix...
p Adding cells to the priority queue...
p Performing Priority-Flood+Epsilon...
t succeeded in 2.66116 s
m Cells processed = 36056748
m Cells in pits = 30266840
W W In assigning negligible gradients to depressions, some depressions rose above the surrounding cells. This implies that a larger storage type should be used. The problem occured for 70 of 36056748.
total time=19.3s: ./cook_1e7
finish processing ./cook_1e7
---------compute watershed------------------------------------
---------sort watershed number--------------------------------
computing river network
Traceback (most recent call last):
  File "/lustre/meshing/river/extract/pyDEM_Samples/Bahram/run_serial.py", line 38, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2387, in write_shapefile
    yi,xi=self.get_coordinates(SF.data[i])
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2304, in get_coordinates
    indy,indx=unravel_index(sind,ds)
  File "<__array_function__ internals>", line 200, in unravel_index
TypeError: only int indices permitted

GMRT DEM:

(meshdev) [Soroosh.Mani@compute-0001 Bahram]$ python run_serial.py
./gmrt_1e7

A Priority-Flood+Epsilon
C Barnes, R., Lehman, C., Mulla, D., 2014. Priority-flood: An optimal depression-filling and watershed-labeling algorithm for digital elevation models. Computers & Geosciences 62, 117–127. doi:10.1016/j.cageo.2013.04.024

c topology = D8
p Setting up boolean flood array matrix...
p Adding cells to the priority queue...
p Performing Priority-Flood+Epsilon...
t succeeded in 7.31966 s
m Cells processed = 60838806
m Cells in pits = 44421818
W W In assigning negligible gradients to depressions, some depressions rose above the surrounding cells. This implies that a larger storage type should be used. The problem occured for 37223 of 60838806.
total time=36.3s: ./gmrt_1e7
finish processing ./gmrt_1e7
---------compute watershed------------------------------------
---------sort watershed number--------------------------------
computing river network
Traceback (most recent call last):
  File "/lustre/meshing/river/extract/pyDEM_Samples/Bahram/run_serial.py", line 38, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2402, in write_shapefile
    idxs=np.squeeze(np.argwhere(np.isnan(river[:,0])))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

These errors occur when using the sample code from pyDEM readme (run_serial.py just with updated paths for our DEMs). He tried to perform some pre-processing on both DEMs but even with fixed DEMs he got the exact same errors.

I will send you the GMRT file (too big to attach here). It would be great if you can take a look at this and see if you have a solution/fix for this.

@cuill
Copy link
Member

cuill commented Sep 13, 2023

@BahramKhazaei-NOAA The dem file has "nan" values, which may cause this error because pyDEM takes "-9999" as nodata. The gdal command to change nodata value cannot process this file:
gdal_calc.py -A dem_with_ocean.tif --calc="numpy.where(A<-100000, -9999, A)" --outfile dem_with_ocen_nodata.tif --NoDataValue=-9999

Do you have any method to change 'nan' to -9999? @BahramKhazaei-NOAA @SorooshMani-NOAA

@BahramKhazaei-NOAA
Copy link
Author

@cuill
Copy link
Member

cuill commented Sep 13, 2023

@BahramKhazaei-NOAA
Yes, just use the last rule:

# Set large negative values to -9999
#Z[Z<-9999]= -9999
# Choose your preference: (comment either rule)
#Z[Z==-9999]= np.nan
# Or 
Z[np.isnan(Z)]= -9999

@SorooshMani-NOAA
Copy link
Contributor

I'll try this out and let you know if we still see issues. Thanks @cuill

@BahramKhazaei-NOAA
Copy link
Author

I'll try this out and let you know if we still see issues. Thanks @cuill

Thanks, Soroosh.

@SorooshMani-NOAA
Copy link
Contributor

@cuill using GMRT DEM, I read the raster values and set all nan's to -9999, but I still get the same error as before:

Traceback (most recent call last):
  File "/lustre/meshing/river/extract/pyDEM_Samples/Bahram/run_serial.py", line 38, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2402, in write_shapefile
    idxs=np.squeeze(np.argwhere(np.isnan(river[:,0])))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed```

This is my raster now:

In [5]: with rio.open('./fixed_gmrt.tif', 'r') as ds:
   ...:     v = ds.read(1)
   ...:

In [6]: v
Out[6]:
array([[  -31.025713,   -31.001385,   -30.999046, ...,  1909.2639  ,
         1822.1892  ,  1735.6036  ],
       [  -30.996735,   -31.061066,   -31.002304, ...,  1906.1929  ,
         1839.0338  ,  1778.5045  ],
       [  -30.894072,   -30.888163,   -31.011707, ...,  1883.3633  ,
         1867.0518  ,  1846.1737  ],
       ...,
       [-5467.1333  , -5472.786   , -5457.6455  , ..., -3490.741   ,
        -3495.3052  , -3498.335   ],
       [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
        -9999.      , -9999.      ],
       [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
        -9999.      , -9999.      ]], dtype=float32)

@BahramKhazaei-NOAA
Copy link
Author

Thank you Soroosh for testing again. Looks like river[:,0] array is defined as a 2D array in dem.py while a 1D array is passed to the function.

@BahramKhazaei-NOAA
Copy link
Author

@cuill, any thoughts on this error message?

@cuill
Copy link
Member

cuill commented Sep 18, 2023

@BahramKhazaei-NOAA I just came back from vacation. I'll take a look at this error when I get a chance.

@BahramKhazaei-NOAA
Copy link
Author

@BahramKhazaei-NOAA I just came back from vacation. I'll take a look at this error when I get a chance.

Appreciate your help and quick responses as always. Looks like for this error something is not compatible wit the definition of the river array.

@cuill
Copy link
Member

cuill commented Sep 19, 2023

@BahramKhazaei-NOAA @SorooshMani-NOAA
I cannot reproduce this error but I got another error related to the package GEOS (see issue here shapely/shapely#1345) if using shapely2.0. So, please upgrade GEOS to 3.12.0.

I've pushed the version that worked on my side, please pull it. Here are the thalwegs extracted with a threshold of 1e4.
Screen Shot 2023-09-19 at 11 24 42 AM

@SorooshMani-NOAA
Copy link
Contributor

@cuill thanks for your feedback, so you just used the original DEMs and set all the nans to -9999 and ran pyDEM?

@cuill
Copy link
Member

cuill commented Sep 20, 2023

@SorooshMani-NOAA correct.

@BahramKhazaei-NOAA
Copy link
Author

Hi @cuill I was able to install pyDEM and run the both serial and parellel test cases provided here. However, when I try to run it based on my DEM it gives me the same error as Soroosh. Here is the error:

Traceback (most recent call last):
  File "/work2/noaa/nos-surge/bkhazaei/work/run_dir/Alaska_Modeling/pre_processing/grid/bearing_sea_ocsmesh/pyDEM/run_serial.py", line 47, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/work2/noaa/nos-surge/bkhazaei/envs/miniconda3/envs/RiverMesh3_10/lib/python3.10/site-packages/pyDEM/dem.py", line 2400, in write_shapefile
    idxs=np.squeeze(np.argwhere(np.isnan(river[:,0])))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I made sure that the nan values in my DEM are replaced with -9999 as we discussed above. I'm using run_serial.py that is provided for the savannah river test case and didn't change anything in that script except for the path to the DEM file. I'm pasting the script here:

#!/usr/bin/env python3
import sys
import glob
import time

from pylib import *
from pyDEM.dem import *

if __name__ == '__main__':
    #input tiff files
    PathToInputDEMs = '/work2/noaa/nos-surge/bkhazaei/work/run_dir/alaska_domain/mesh/ocsmesh/cook_inlet/DEMs_inputs'
    PathToDomainBoundary = PathToInputDEMs+"/GMRTv4_1_1_20230815topo_NaN_removed.tif"
    print('Path to DEM input:', str(PathToInputDEMs))
    
    files = glob.glob(PathToDomainBoundary)
    files.sort()

    t0 = time.time()

    for fname in files:
        names = [fname]

        #output filename
        sname = f"./{fname.split('.')[0].split('/')[-1]}_1e7"
        print(sname)

        #declaer a dem object
        S=dem()

        #read data
        if not os.path.exists('{}.npz'.format(sname)):
            S.proc_demfile(names,sname=sname,depth_limit=[-100,1000],subdomain_size=2e10)
        S.read_data('{}.npz'.format(sname))

        #compuate watershed information
        S.compute_watershed()

        #extract river network (area_limit: catchment area)
        S.compute_river(acc_limit=1e7)

        #write shapefile for river network
        gdf = S.write_shapefile(npt_smooth=3)
        gdf.to_file(f'{sname}.shp')
        print(f'It took {(time.time() - t0)/60} mins!')

Did you use the same script to get the result you shared above? If not, could you please share that with me?

@cuill
Copy link
Member

cuill commented Oct 19, 2023

@BahramKhazaei-NOAA Yes, I used the same script but with a smaller threshold (acc_limit=1e4). As you can see from the thalwegs above, the threshold of 1e4 has a relatively coarse thalwegs. I think your threshold of 1e7 results in zero thalwegs which causes the error. I will change the script to throw the error if this happens. At the time being, you can test with different threshold.

@BahramKhazaei-NOAA
Copy link
Author

@BahramKhazaei-NOAA Yes, I used the same script but with a smaller threshold (acc_limit=1e4). As you can see from the thalwegs above, the threshold of 1e4 has a relatively coarse thalwegs. I think your threshold of 1e7 results in zero thalwegs which causes the error. I will change the script to throw the error if this happens. At the time being, you can test with different threshold.

Thank you for your response. I'm going to try with your recommended acc_limit.

@BahramKhazaei-NOAA
Copy link
Author

@cuill I was able to reproduce your results with acc_limit=1e4. Thank you for your help.

@BahramKhazaei-NOAA
Copy link
Author

@cuill @feiye-vims I'm running the pyDEM serial script for this GMRT DEM in northern pacific and Bering Sea. It does a good job in extraction of main channels over land and nearshore, yet in most of underwater areas I get these segments of straight lines that doesn't seem to be natural. I tried to play with the parameters in the script (acc_limit and depth limits), however, I couldn't change these patterns. Do you have a solution for this?

image

@cuill
Copy link
Member

cuill commented Oct 24, 2023

@BahramKhazaei-NOAA Normally, we don't need thalwegs in the water areas (e.g., lakes, ponds) to guide meshing. We deleted these straight lines with the pre-defined polygons.

@cuill
Copy link
Member

cuill commented Oct 24, 2023

@BahramKhazaei-NOAA It was done in either ArcGIS or QGIS.

@BahramKhazaei-NOAA
Copy link
Author

@BahramKhazaei-NOAA Normally, we don't need thalwegs in the water areas (e.g., lakes, ponds) to guide meshing. We deleted these straight lines with the pre-defined polygons.

I see. By pre-defined polygons you mean a polygon that covers the area in which the thalwegs are not needed like the under water areas?

@cuill
Copy link
Member

cuill commented Oct 24, 2023

@BahramKhazaei-NOAA No, in the opposite way, a polygon that covers areas where thalwegs are needed. Then after clipping, thalwegs outside the polygon are removed.

@BahramKhazaei-NOAA
Copy link
Author

@BahramKhazaei-NOAA No, in the opposite way, a polygon that covers areas where thalwegs are needed. Then after clipping, thalwegs outside the polygon are removed.

Sounds good, thank you for your quick response.

@feiye-vims
Copy link
Member

feiye-vims commented Oct 24, 2023

When we created the RiverMeshTool, we already had polygons manually made that roughly delineate the shoreline of the ocean and large lakes/estuaries. These waterbodies are considered well resolved so the thalwegs inside them were clipped out.

The target of the RiverMeshTool are rivers that have a clear direction, so we always cut out lakes. However, even if you retain the thalwegs in the lakes, it won't do much harm for the final mesh.

I would always cut out the ocean part though.

@BahramKhazaei-NOAA
Copy link
Author

When we created the RiverMeshTool, we already had polygons manually made that roughly delineate the shoreline of the ocean and large lakes/estuaries. These waterbodies are considered well resolved so the thalwegs inside them were clipped out.

The target of the RiverMeshTool are rivers that have a clear direction, so we always cut out lakes. However, even if you retain the thalwegs in the lakes, it won't do much harm for the final mesh.

I would always cut out the ocean part though.

That is reasonable.

As a side note for future developments: by taking into account the open water areas, the tool can be used to inform mesh generation process for important underwater features like ridges and continental slopes. These features might not be of immediate importance for physical models but could be important for local or biological applications.

@feiye-vims
Copy link
Member

Saeed also mentioned this and I agree. I'm taking a note on this for future development.

I'm thinking maybe using the location of largest bathymetry gradient to identify the edge of an underwater dredged channel or ridges.

@BahramKhazaei-NOAA
Copy link
Author

Saeed also mentioned this and I agree. I'm taking a note on this for future development.

I'm thinking maybe using the location of largest bathymetry gradient to identify the edge of an underwater dredged channel or ridges.

That is probably the best strategy to start with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants