Part I here
I hypothesized that, because the variable attribute default_data is uninitialized, the user’s code was not using the fms-io::write_data interface. Combing through the FV3GFS code base confirms my assumption: FV3GFS uses the following fms-io routines in its fv3gfs_io module:
- register_restart_field
- save_restart
- restore_state
default_data should be set in register_restart_field via the call to setup_one_field:
if (PRESENT(data_default)) then
   default_data = data_default
else
   default_data = MPP_FILL_DOUBLE
endif
Here, data_default is an optional argument; if it is not present, default_data is set to the parameter MPP_FILL_DOUBLE, which is defined in mpp_parameter.F90 as 9.9692099683868690e+36. I’m still not sure where default_data is getting set to a NaN, so it’s time for another date with ddt.
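To make the fallback concrete, here is a minimal, self-contained sketch of the optional-argument pattern that setup_one_field uses. The module and function names (default_fill_sketch, pick_default) are hypothetical, and MPP_FILL_DOUBLE is re-declared locally instead of being taken from mpp_parameter.F90. The point it illustrates: when data_default is absent, the only value this logic can produce is the fill value, so a NaN has to be coming from somewhere else.

```fortran
! Hypothetical sketch of the optional-argument fallback in setup_one_field.
! MPP_FILL_DOUBLE is re-declared here; in FMS it lives in mpp_parameter.F90.
module default_fill_sketch
  implicit none
  private
  public :: dp, MPP_FILL_DOUBLE, pick_default

  integer, parameter :: dp = kind(1.0d0)
  ! Same value as the MPP_FILL_DOUBLE quoted in the text
  real(dp), parameter :: MPP_FILL_DOUBLE = 9.9692099683868690e+36_dp

contains

  ! Returns data_default when the caller passes it, else MPP_FILL_DOUBLE.
  function pick_default(data_default) result(default_data)
    real(dp), intent(in), optional :: data_default
    real(dp) :: default_data

    if (present(data_default)) then
      default_data = data_default
    else
      default_data = MPP_FILL_DOUBLE
    end if
  end function pick_default

end module default_fill_sketch
```

For example, pick_default() returns MPP_FILL_DOUBLE and pick_default(0.5d0) returns 0.5d0. The literal 9.9692099683868690e+36 is also netCDF's default fill value for doubles (NC_FILL_DOUBLE).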
I set a breakpoint at the first call to register_restart_field in fv3gfs_io::sfc_prop_restart_write:
do num = 1, nvar2m
   var2_p => sfc_var2(:,:,num)
   if (trim(sfc_name2(num)) == 'sncovr') then
      id_restart = register_restart_field(Sfc_restart, fn_srf, sfc_name2(num), var2_p, domain=fv_domain, mandatory=.false.)
   else
      id_restart = register_restart_field(Sfc_restart, fn_srf, sfc_name2(num), var2_p, domain=fv_domain)
   endif
enddo
Stepping through setup_one_field shows that default_data is set to, you guessed it, NaN. I’m wondering if I just have to initialize default_data to something before assigning MPP_FILL_DOUBLE, so I set it to 0.0, recompile, and open the new executable in the debugger. No dice.
After some more mucking around, I am still unable to determine the exact cause of the NaN. I suspect it has to do with how Sfc_restart is declared: it is a module variable in fv3gfs_io rather than a local variable in each subroutine. That means the NaN could have originated in another call to register_restart_field that passes Sfc_restart as the fileObject argument.
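The shared-state suspicion can be illustrated with a toy module. This is not the fms_io restart type: toy_restart, register_toy, writer_a and writer_b are hypothetical stand-ins, and the sketch assumes only that registering a field appends an entry to an object that persists between calls. Because shared_obj is a module variable, whatever one routine registers is still there when a different routine uses the same object, so a bad value stored early would only surface later.

```fortran
! Hypothetical sketch: one module-level object shared by two writer routines.
module shared_restart_sketch
  implicit none
  private
  public :: toy_restart, shared_obj, writer_a, writer_b

  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: max_fields = 8

  ! Toy stand-in for a restart object; NOT the real fms_io type
  type :: toy_restart
    integer :: nfields = 0
    real(dp) :: default_data(max_fields)
  end type toy_restart

  ! Module variable: persists for the whole run, like Sfc_restart in fv3gfs_io
  type(toy_restart), save :: shared_obj

contains

  ! Appends one field's default_data to the object
  subroutine register_toy(obj, val)
    type(toy_restart), intent(inout) :: obj
    real(dp), intent(in) :: val
    if (obj%nfields >= max_fields) error stop "toy_restart is full"
    obj%nfields = obj%nfields + 1
    obj%default_data(obj%nfields) = val
  end subroutine register_toy

  ! Two different "writers" that both register into the same shared object
  subroutine writer_a(val)
    real(dp), intent(in) :: val
    call register_toy(shared_obj, val)
  end subroutine writer_a

  subroutine writer_b(val)
    real(dp), intent(in) :: val
    call register_toy(shared_obj, val)
  end subroutine writer_b

end module shared_restart_sketch
```

After writer_a(1.0d0) and writer_b(2.0d0), shared_obj holds both entries, so anything inspecting the object from writer_b's side also sees what writer_a left behind.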
Since I don’t want to mess with the FV3GFS code, I opt to implement a workaround in save_default_restart, in case the user just wants a short-term fix so he can continue his experiment. I declare a local variable called default_data and add the following lines ahead of the mpp_write calls in the loop over the restart variables:
! set default data attribute to MPP_FILL_DOUBLE if it is undefined
if (isnan(cur_var%default_data)) then
   default_data = MPP_FILL_DOUBLE
else
   default_data = cur_var%default_data
endif
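I then replace default_data=cur_var%default_data with default_data=default_data in the mpp_write calls, so the default_data keyword argument receives the sanitized local value instead of the raw attribute.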
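The isnan test relies on a compiler extension (gfortran and Intel Fortran both provide it). A minimal sketch of the same guard written against the standard ieee_arithmetic module follows; safe_default is a hypothetical helper name, and the sketch assumes the compiler supports IEEE NaNs for double precision.

```fortran
! Hypothetical standard-conforming version of the workaround's NaN check.
module nan_guard_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  private
  public :: safe_default

  integer, parameter :: dp = kind(1.0d0)

contains

  ! Returns fill when x is NaN; otherwise returns x unchanged.
  elemental function safe_default(x, fill) result(out)
    real(dp), intent(in) :: x, fill
    real(dp) :: out
    if (ieee_is_nan(x)) then
      out = fill
    else
      out = x
    end if
  end function safe_default

end module nan_guard_sketch
```

Assuming cur_var%default_data is double precision, default_data = safe_default(cur_var%default_data, MPP_FILL_DOUBLE) could stand in for the if/else block, with the same caveat: it hides the NaN rather than explaining it.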
Unfortunately, this solution is unsatisfactory: it papers over the NaN after the fact instead of revealing where it comes from. I consult with Boss R, who suggests that the NaN may indicate memory corruption. He points out where to put some print statements that will write the default_data value for each restart variable to the output logs. I’ll elaborate on the (hopefully) final steps I take to identify the NaN source in the last part of this debugging trilogy.
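Where exactly Boss R suggested placing the prints is a story for Part III. As a hypothetical stand-in, a diagnostic along these lines could write each restart variable's default_data to a log unit and count the NaNs; the module, the report_defaults routine and its arguments are illustrative names, not part of FMS.

```fortran
! Hypothetical diagnostic: log each variable's default_data and flag NaNs.
module default_report_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  private
  public :: report_defaults

  integer, parameter :: dp = kind(1.0d0)

contains

  ! Writes one line per variable to log_unit and counts NaN default_data values.
  subroutine report_defaults(names, defaults, log_unit, nan_count)
    character(len=*), intent(in) :: names(:)
    real(dp), intent(in) :: defaults(:)
    integer, intent(in) :: log_unit
    integer, intent(out) :: nan_count
    integer :: i

    if (size(names) /= size(defaults)) error stop "names/defaults size mismatch"

    nan_count = 0
    do i = 1, size(names)
      write(log_unit, '(a, a, es24.16)') trim(names(i)), ': default_data = ', defaults(i)
      if (ieee_is_nan(defaults(i))) then
        nan_count = nan_count + 1
        write(log_unit, '(3a)') '  ** NaN default_data for ', trim(names(i)), ' **'
      end if
    end do
  end subroutine report_defaults

end module default_report_sketch
```

Calling something like this once right after registration and again just before the mpp_write loop would show whether the NaN is already present when a field is registered or appears later, which is the distinction a memory-corruption hunt needs.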