Pecan: Error in PEcAn.data.atmosphere::download.CRUNCEP()

Created on 13 Sep 2017  ·  21Comments  ·  Source: PecanProject/pecan

When using the latest develop branch I am getting this error when trying to use PEcAn.data.atmosphere::download.CRUNCEP()

> remotefunc <- function() {PEcAn.data.atmosphere::download.CRUNCEP(site_id=1000000105, lat.in=9.154, lon.in=-79.848, model=NULL, scenario=NULL, ensemble_member=NULL, overwrite=FALSE, outfolder='/data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_1-105/', start_date='1990-01-01', end_date='2004-12-31')}
> remoteout <- remotefunc()
2017-09-13 08:30:22 INFO   [PEcAn.logger::logger.info] : 
   Downloading 
   /data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_1-105//CRUNCEP.1990.nc 
2017-09-13 08:30:22 INFO   [PEcAn.logger::logger.info] : 
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_1990_v1.nc4 
2017-09-13 08:30:32 INFO   [PEcAn.logger::logger.info] : 
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1990_v1.nc4 
2017-09-13 08:30:40 INFO   [PEcAn.logger::logger.info] : 
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_press_1990_v1.nc4 
2017-09-13 08:30:47 INFO   [PEcAn.logger::logger.info] : 
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_swdown_1990_v1.nc4 
2017-09-13 08:30:52 INFO   [PEcAn.logger::logger.info] : 
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_uwind_1990_v1.nc4 
2017-09-13 08:30:59 INFO   [PEcAn.logger::logger.info] : 
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_vwind_1990_v1.nc4 
Error in Rsx_nc4_get_vara_double: NetCDF: Index exceeds dimension bound
Var: time  Ndims: 1   Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  : 
  C function R_nc4_get_vara_double returned error
Calls: remotefunc -> <Anonymous> -> <Anonymous> -> ncvar_get_inner
Execution halted

Description

Bug fix. I think the issue is the use of "verbose" in the call to ncdf4::ncvar_def

It should be verbose= TRUE/FALSE but currently in the function it is:

      var.list[[j]] <- ncdf4::ncvar_def(name = as.character(var$CF.name[j]),
                                 units = as.character(var$units[j]),
                                 dim = dim,
                                 missval = -999,
                                 verbose = verbose)

Which leads to this error:

>       var.list[[j]] <- ncdf4::ncvar_def(name = as.character(var$CF.name[j]),
+                                  units = as.character(var$units[j]),
+                                  dim = dim,
+                                  missval = -999,
+                                  verbose = verbose)
Error in ncdf4::ncvar_def(name = as.character(var$CF.name[j]), units = as.character(var$units[j]),  :
  object 'verbose' not found

But if you change to verbose = TRUE

>       var.list[[j]] <- ncdf4::ncvar_def(name = as.character(var$CF.name[j]),
+                                  units = as.character(var$units[j]),
+                                  dim = dim,
+                                  missval = -999,
+                                  verbose = TRUE)
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var eastward_wind"

But we probably want verbose = FALSE?

Context

Bug fix

Possible Implementation

NA

All 21 comments

With verbose on

Loading required package: PEcAn.benchmark
>  download.CRUNCEP(site_id=2000000003, lat.in=9.280225, lon.in=-79.975527, model=NULL, scenario=NULL, ensemble_member=NULL, overwrite=TRUE, outfolder='/data/Model_Output/p2017-09-13 09:16:10 INFO   [PEcAn.logger::logger.info] : 01/01', end_date='2002/12/31')
   Downloading
   /data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.2001.nc
2017-09-13 09:16:10 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:16:17 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var surface_downwelling_longwave_flux_in_air"

OK, so turing on verbose helped it through one year but then failed on the next?

2017-09-13 09:16:55 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_qair_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var specific_humidity"
2017-09-13 09:17:05 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_rain_2001_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var precipitation_flux"
2017-09-13 09:17:14 INFO   [PEcAn.logger::logger.info] :
   Downloading
   /data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.2002.nc
2017-09-13 09:17:14 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2002_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time  Ndims: 1   Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  :
  C function R_nc4_get_vara_double returned error

And if I go to the output I can see the files:

-rw-rw-r--. 1 apache test 59760 Sep 13 09:17 CRUNCEP.2001.nc
-rw-rw-r--. 1 apache test 59732 Dec  5  2016 CRUNCEP.2002.nc
-rw-rw-r--. 1 apache test 59732 Dec  5  2016 CRUNCEP.2003.nc
-rw-rw-r--. 1 apache test 59892 Dec  5  2016 CRUNCEP.2004.nc
-rw-rw-r--. 1 apache test 59732 Dec  5  2016 CRUNCEP.2005.nc
-rw-rw-r--. 1 apache test 59732 Dec  5  2016 CRUNCEP.2006.nc

Strange, if I try to check the next file, 2002 lwdown it does suggest it isn;t there?

> ncdf4::nc_open('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2002_v1.nc4')
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time  Ndims: 1   Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  :
  C function R_nc4_get_vara_double returned error

But on the web I can see the file?

screen shot 2017-09-13 at 9 23 17 am

Ugh! Wait now files that did work aren't workign

> ncdf4::nc_open('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2001_v1.nc4')
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time  Ndims: 1   Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  :
  C function R_nc4_get_vara_double returned error

OK, not really sure what the deal is but it seems fairly spotty

2017-09-13 09:26:18 INFO   [PEcAn.logger::logger.info] :
   Downloading
   /data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.2002.nc
2017-09-13 09:26:18 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_2002_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:26:24 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2002_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var surface_downwelling_longwave_flux_in_air"
2017-09-13 09:26:30 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_press_2002_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: press  Ndims: 3   Start: 0,161,200 Count: 1460,1,1
Error in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset,  :
  C function R_nc4_get_vara_double returned error

Do we need to add some sort of try/catch or delay? It seems like if there is a slight delay from THREDDS is crashes, perhaps?

Added a 3 second delay to the function:

      # This throws an error if file not found
      dap <- ncdf4::nc_open(dap_file, verbose=FALSE)
      Sys.sleep(3)

at line 101

That made it work for 2 years

-rw-rw-r--. 1 apache test 59760 Sep 13 09:36 CRUNCEP.2001.nc
-rw-rw-r--. 1 apache test 59760 Sep 13 09:37 CRUNCEP.2002.nc

OK, so there definitely is an issue, at least on modex, where a single year crashes but if I keep trying it will then work

[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:38:37 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time  Ndims: 1   Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  :
  C function R_nc4_get_vara_double returned error
>  download.CRUNCEP(site_id=2000000003, lat.in=9.280225, lon.in=-79.975527, model=NULL, scenario=NULL, ensemble_member=NULL, overwrite=TRUE, outfolder='/data/Model_Output/p2017-09-13 09:38:46 INFO   [PEcAn.logger::logger.info] : 01/01', end_date='2002/12/31')
   Downloading
   /data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.1999.nc
2017-09-13 09:38:46 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_1999_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time  Ndims: 1   Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  :
  C function R_nc4_get_vara_double returned error
>  download.CRUNCEP(site_id=2000000003, lat.in=9.280225, lon.in=-79.975527, model=NULL, scenario=NULL, ensemble_member=NULL, overwrite=TRUE, outfolder='/data/Model_Output/p2017-09-13 09:38:49 INFO   [PEcAn.logger::logger.info] : 01/01', end_date='2002/12/31')
   Downloading
   /data/Model_Output/pecan.output/dbfiles/CRUNCEP_site_2-3//CRUNCEP.1999.nc
2017-09-13 09:38:49 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_tair_1999_v1.nc4
[1] "ncvar_def: entering"
[1] "ncvar_def: prec= float"
[1] "ncvar_def: making ncvar object for var air_temperature"
2017-09-13 09:38:59 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.nc4

@robkooper @mdietze @ashiklom @jsimkins2 Any thoughts on this? Run-down, if I re-run it will eventually pull the data....but in some cases it might take a few attempts....there seems to be some issue with nc_open getting a delay via thredds and then it fails saying the file isn't there, when it actually is.....perhaps that is just a modex issue due to the firewall?

But if I keep re-running it seems it will eventually get the data that it was failing on

Perhaps we had some robust error handling to try and restart a few times before ultimately failing if indeed THREDDS is down?

I was also getting this error on the VM. And when I reran from rstudio I still got the same error.

Just saw this. We've run into this before here: #1546 and we would just rerun it and it would download without an error, making it hard to replicate and figure out what's wrong. We thought it could be an issue specific to our machine. Good to know it's not.

I agree with creating some robust error handling.

I think it has as much to do with the DOE THREDDS server either doing something like limiting max number of connections or preventing rapid number of calls to the server to prevent denial of service attacks or something. I know @jsimkins2 has had communication in the past with them about non reliable calls. I think the function isn't doing anything wrong.

One idea would be to add the sleep, detect error, and try three times before giving up?

@ankurdesai I agree with this assessment. I already add a Sys.sleep, which did help but I want to add a 10 retry to the function to give it enough chances to get a successful download.

@ankurdesai @serbinsh Does that mean there's a better time of day to try doing it? Like should I be downloading at night instead of in the middle of the day?

@araiho Not sure...I think it is an IT issue really. As Ankur mentioned, servers don't like rapid multi file requests as it can look like a DoS attack

What about something like:

retry <- function(expr, isError=function(x) "try-error" %in% class(x), maxErrors=5, sleep=0) {
  attempts = 0
  retval = try(eval(expr))
  while (isError(retval)) {
    attempts = attempts + 1
    if (attempts >= maxErrors) {
      msg = sprintf("retry: too many retries [[%s]]", capture.output(str(retval)))
      PEcAn.logger::logger.warn(msg)
      stop(msg)
    } else {
      msg = sprintf("retry: error in attempt %i/%i [[%s]]", attempts, maxErrors, 
                    capture.output(str(retval)))
      PEcAn.logger::logger.warn(msg)
      #warning(msg)
    }
    if (sleep > 0) Sys.sleep(sleep)
    retval = try(eval(expr))
  }
  return(retval)
}

where you call it as

dap <- retry(ncdf4::nc_open(dap_file, verbose=FALSE),maxErrors=10, sleep=2)

Produces something like

> retry(ncdf4::nc_open('https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time  Ndims: 1   Start: 0 Count: 1460
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  :
  C function R_nc4_get_vara_double returned error
2017-09-13 10:50:17 WARN   [#13: PEcAn.logger::logger.warn] :
   retry: error in attempt 1/10 [[Class 'try-error' atomic [1:1] Error in
   ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
   default_missval_ncdf4(), : ]] retry: error in attempt 1/10 [[ C function
   R_nc4_get_| __truncated__]] retry: error in attempt 1/10 [[ ..- attr(*,
   "condition")=List of 2]] retry: error in attempt 1/10 [[ .. ..$ message:
   chr "C function R_nc4_get_vara_double returned error"]] retry: error in
   attempt 1/10 [[ .. ..$ call : language
   ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
   default_missval_ncdf4(), verbose = verbose)]] retry: error in attempt
   1/10 [[ .. ..- attr(*, "class")= chr [1:3] "simpleError" "error"
   "condition"]]
File https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_1999_v1.nc4 (NC_FORMAT_CLASSIC):

     6 variables (excluding dimension variables):
        int crs[]
            semi_major_axis: 6370997
            grid_mapping_name: latitude_longitude
            inverse_flattening: 0
        double lat_bnds[nv,lat]
            _ChunkSizes: 360
             _ChunkSizes: 2
        double lon_bnds[nv,lon]
            _ChunkSizes: 720
             _ChunkSizes: 2
        double time_bnds[nv,time]
            _ChunkSizes: 1
             _ChunkSizes: 2
        float lwdown[lon,lat,time]
            cell_methods: area:mean time:sum
            grid_mapping: crs
            long_name: Incoming_Long_Wave_Radiation
            units: W/m2
            missing_value: -999
            _FillValue: -999
            _ChunkSizes: 1
             _ChunkSizes: 360
             _ChunkSizes: 720
        int mask[lon,lat]
            _ChunkSizes: 360
             _ChunkSizes: 720

     4 dimensions:
        time  Size:1460   *** is unlimited ***
            bounds: time_bnds
            long_name: time
            units: days since 1700-01-01T00:00:00Z
            standard_name: time
            calendar: gregorian
            _ChunkSizes: 1
        lat  Size:360
            units: degrees_north
            standard_name: latitude
            bounds: lat_bnds
            _ChunkSizes: 360
        lon  Size:720
            units: degrees_east
            standard_name: longitude
            bounds: lon_bnds
            _ChunkSizes: 720
        nv  Size:2

    4 global attributes:
        model_name: STOMATE-SECHIBA-LPJ
        history: Tue Mar  1 10:57:59 2011: ncks -d time_counter,143176,144635 ncep9_lwdown_1901_2010_2D.nc cruncepV4_leap_lwdown_1999.nc
        Conventions: CF-1.4
        DODS_EXTRA.Unlimited_Dimension: time
Warning message:
In eval(expr) : restarting interrupted promise evaluation
>

Seems to be working

2017-09-13 11:12:28 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_lwdown_2000_v1.nc4
Error in Rsx_nc4_get_vara_double: NetCDF: file not found
Var: time  Ndims: 1   Start: 0 Count: 1464
Error in ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id, default_missval_ncdf4(),  :
  C function R_nc4_get_vara_double returned error
In addition: Warning message:
In eval(expr) : restarting interrupted promise evaluation
2017-09-13 11:12:29 WARN   [PEcAn.logger::logger.warn] :
   retry: error in attempt 1/10 [[Class 'try-error' atomic [1:1] Error in
   ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
   default_missval_ncdf4(), : ]] retry: error in attempt 1/10 [[ C function
   R_nc4_get_| __truncated__]] retry: error in attempt 1/10 [[ ..- attr(*,
   "condition")=List of 2]] retry: error in attempt 1/10 [[ .. ..$ message:
   chr "C function R_nc4_get_vara_double returned error"]] retry: error in
   attempt 1/10 [[ .. ..$ call : language
   ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
   default_missval_ncdf4(), verbose = verbose)]] retry: error in attempt
   1/10 [[ .. ..- attr(*, "class")= chr [1:3] "simpleError" "error"
   "condition"]]

Its handling hiccups in the function and continuing to run, so far

@serbinsh where did you put that retry function?

Here is an example where it retry several times

In eval(expr) : restarting interrupted promise evaluation
2017-09-13 11:12:55 WARN   [PEcAn.logger::logger.warn] :
   retry: error in attempt 5/10 [[Class 'try-error' atomic [1:1] Error in
   ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
   default_missval_ncdf4(), : ]] retry: error in attempt 5/10 [[ C function
   R_nc4_get_| __truncated__]] retry: error in attempt 5/10 [[ ..- attr(*,
   "condition")=List of 2]] retry: error in attempt 5/10 [[ .. ..$ message:
   chr "C function R_nc4_get_vara_double returned error"]] retry: error in
   attempt 5/10 [[ .. ..$ call : language
   ncvar_get_inner(d$dimvarid$group_id, d$dimvarid$id,
   default_missval_ncdf4(), verbose = verbose)]] retry: error in attempt
   5/10 [[ .. ..- attr(*, "class")= chr [1:3] "simpleError" "error"
   "condition"]]
2017-09-13 11:13:10 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_swdown_2000_v1.nc4
2017-09-13 11:13:19 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_uwind_2000_v1.nc4
2017-09-13 11:13:32 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_vwind_2000_v1.nc4
2017-09-13 11:13:43 INFO   [PEcAn.logger::logger.info] :
   https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1220/mstmip_driver_global_hd_climate_qair_2000_v1.nc4

@araiho I put it in base/utils.R on my develop branch. I will send a PR soon for comments

Closing this, because I think it has been resolved by a combination @serbinsh 's retry code #1651 and by implementing an alternative, more reliable remote data access method (NetCDF subset, or <method>ncss</method> in the PEcAn XML) by me in #2153.

Was this page helpful?
0 / 5 - 0 ratings