Tales From the Help Desk: The NaN Trap Part II

Part I here

I hypothesized that the user’s code did not use the fms-io::write_data interface because the variable attribute default_data is uninitialized. Combing through the FV3GFS code base confirms my assumption; the FV3GFS uses the following fms-io routines in the FV3GFS-IO module:

  • register_restart_field
  • save_restart
  • restore_state

default_data should be set in register_restart_field via the call to setup_one_field:

if(PRESENT(data_default)) then
default_data = MPP_FILL_DOUBLE

Here, data_default is an optional argument; if it is not present, default data is set to the parameter MPP_FILL_DOUBLE, which is defined in mpp_parameter.F90  as 9.9692099683868690e+36. I’m still not sure where default_data is getting set to a NaN, so it’s time for another date with ddt.

I set a breakpoint in the first fv3gfs_io::sfc_prop_restart_write call to register_restart_field:

do num = 1,nvar2m
var2_p => sfc_var2(:,:,num)
if (trim(sfc_name2(num)) == 'sncovr') then
id_restart = register_restart_field(Sfc_restart, fn_srf,    sfc_name2(num), var2_p, domain=fv_domain, mandatory=.false.)
id_restart = register_restart_field(Sfc_restart, fn_srf, sfc_name2(num), var2_p, domain=fv_domain)

Stepping through setup_one_field shows that default_data is set to, you guessed it, NaN. I’m wondering if I just have to initialize default_data to something before assigning MPP_FILL_DOUBLE, so I set it to 0.0, recompile, and open the new executable in the debugger. No dice.

After some more mucking around, I am unable to determine the exact cause of the NaN, but suspect that it may have to do with the fact that the sfc_restart data type is declared as a module variable in fv3gfs_io, rather than locally in each of the subroutines, which means that the NaN could have originated from another call to register_restart_field that passes sfc_restart as the fileObject argument.

Since I don’t want to mess with the FV3GFS code, I opt to implement a workaround in save_default_restart in case the user just wants a short-term fix so he can continue his experiment. I declare a local variable called save_restart, and add the following lines before the loop that calls mpp_write on the restart variables:

! set default data attribute to MPP_FILL_DOUBLE if it is undefined
if (isnan(cur_var%default_data)) then
default_data = MPP_FILL_DOUBLE
default_data = cur_var%default_data

I then replace default_data=cur_var%default_data with default_data=default_data in the mpp_write calls.

Unfortunately, this solution is unsatisfactory, since it corrects the error after the fact. I consult with Boss R, who suggests that the NaN may indicate memory corruption. He points out where to put some print statements that will write the default_data values for each restart variable in the output logs. I’ll elaborate on the (hopefully) final steps I take to identify the NaN source in the final part of this debugging trilogy.

1 thought on “Tales From the Help Desk: The NaN Trap Part II

  1. Pingback: Tales from the Help Desk: The NaN Trap Part III | Legacy Code Keeper

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s