RT baseline check is failing for coastal_scituateharbor_atm2fvc #2

Open
uturuncoglu opened this issue Feb 14, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@uturuncoglu
Collaborator

@janahaddad @pvelissariou1 @saeed-moghimi-noaa I created baselines for the supported tests on Hercules and ran the tests against them. The coastal_scituateharbor_atm2fvc test fails as follows:

==> logs/log_hercules/rt_002_coastal_scituateharbor_atm2fvc_intel.log <==

baseline dir = /work2/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240126/coastal_scituateharbor_atm2fvc_intel
working dir  = /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_732961/coastal_scituateharbor_atm2fvc_intel
Checking test 002 coastal_scituateharbor_atm2fvc_intel results ....
 Comparing OUTPUT/sci_0001.nc ............ALT CHECK......NOT OK
 Comparing OUTPUT/sci_restart_0001.nc ............ALT CHECK......NOT OK


Test 002 coastal_scituateharbor_atm2fvc_intel FAIL

It seems that we have a reproducibility issue here that needs to be solved. I don't think this configuration has ever been tested against a baseline before.

@uturuncoglu uturuncoglu added the bug Something isn't working label Feb 14, 2024
@uturuncoglu uturuncoglu self-assigned this Feb 14, 2024
@uturuncoglu
Collaborator Author

uturuncoglu commented Feb 14, 2024

I also checked this one with NCAR's cprnc tool, and many fields differ. I could not check the mediator history files since they are not included in the baseline, but I'll change the test to add the mediator history files so they can be compared too.

@uturuncoglu
Collaborator Author

I checked it again with the mediator history files included, and they match the baseline. This shows that the data going to FVCOM is the same, but the answer changes on the model side. I need to check the model cap and the model internals to see what is going on there, but at least we can rule out the mediator and the data atmosphere.

baseline dir = /work2/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240126/coastal_scituateharbor_atm2fvc_intel
working dir  = /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_1112332/coastal_scituateharbor_atm2fvc_intel
Checking test 001 coastal_scituateharbor_atm2fvc_intel results ....
 Comparing OUTPUT/sci_0001.nc ............ALT CHECK......NOT OK
 Comparing OUTPUT/sci_restart_0001.nc ............ALT CHECK......NOT OK
 Comparing sci.cpl.hi.2022-06-22-00000.nc .........OK
 Comparing sci.cpl.hi.2022-06-23-00000.nc .........OK
 Comparing sci.cpl.hi.2022-06-24-00000.nc .........OK

@uturuncoglu
Collaborator Author

This test also fails against the baseline on Orion:

baseline dir = /work/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240126/coastal_scituateharbor_atm2fvc_intel
working dir  = /work/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_100793/coastal_scituateharbor_atm2fvc_intel
Checking test 002 coastal_scituateharbor_atm2fvc_intel results ....
 Comparing OUTPUT/sci_0001.nc ............ALT CHECK......NOT OK
 Comparing OUTPUT/sci_restart_0001.nc ............ALT CHECK......NOT OK
 Comparing sci.cpl.hi.2022-06-22-00000.nc .........OK
 Comparing sci.cpl.hi.2022-06-23-00000.nc .........OK
 Comparing sci.cpl.hi.2022-06-24-00000.nc .........OK

Binary file /work/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_100793/coastal_scituateharbor_atm2fvc_intel/out matches
Binary file /work/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_100793/coastal_scituateharbor_atm2fvc_intel/out matches

Test 002 coastal_scituateharbor_atm2fvc_intel FAIL

@janahaddad
Collaborator

@uturuncoglu I'm adding these FVCOM issues to the ufs-coastal-project backlog for now.

@uturuncoglu
Collaborator Author

@janahaddad Thanks. It sounds fine.

@uturuncoglu
Collaborator Author

I checked the model output at the first and second time steps. At the initial time step all the data is identical to the baseline, but at the second time step the ua and va fields show small differences:

 ua   (nele,time)  t_index =      1     1
          6    11153  (  1794,     1) (  7958,     1) ( 11033,     1) ( 11033,     1)
               11153   5.447403192520142E-01  -4.624237418174744E-01 1.9E-09  1.658797264099121E-02 4.5E-11  1.658797264099121E-02
               11153   5.447403192520142E-01  -4.624237418174744E-01          1.658797450363636E-02          1.658797450363636E-02
               11153  (  1794,     1) (  7958,     1)
          avg abs field values:    1.550020277500153E-02    rms diff: 2.2E-11   avg rel diff(npos):  4.5E-11
                                   1.550020277500153E-02                        avg decimal digits(ndif):  7.1 worst:  6.9
 RMS ua                               2.2055E-11            NORMALIZED  1.4229E-09
 va   (nele,time)  t_index =      1     1
          1    11153  (     1,     1) (  2066,     1) ( 11113,     1) ( 11113,     1)
               11153   2.148977994918823E+00  -4.877623021602631E-01 3.0E-08  3.256952166557312E-01 8.2E-12  3.256952166557312E-01
               11153   2.148977994918823E+00  -4.877623021602631E-01          3.256952464580536E-01          3.256952464580536E-01
               11153  (     1,     1) (  2066,     1)
          avg abs field values:    2.064534649252892E-02    rms diff: 2.8E-10   avg rel diff(npos):  8.2E-12
                                   2.064534649252892E-02                        avg decimal digits(ndif):  7.0 worst:  7.0
 RMS va                               2.8220E-10            NORMALIZED  1.3669E-08
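The NORMALIZED column above appears to be the RMS difference divided by the average absolute field value: for ua, 2.2055E-11 / 1.5500E-02 ≈ 1.4229E-09, and for va, 2.8220E-10 / 2.0645E-02 ≈ 1.3669E-08. A minimal sketch that recomputes these statistics for two flattened fields is below. It assumes that definition of "normalized" (inferred from the numbers, not taken from cprnc's source), and the function name `diff_stats` is hypothetical.

```python
import math
from typing import Sequence


def diff_stats(base: Sequence[float], test: Sequence[float]) -> dict:
    """Point-by-point comparison of two same-length fields.

    Assumption: 'normalized_rms' is the RMS difference divided by the mean
    absolute value of the baseline field, which reproduces the ratios in the
    cprnc output above; cprnc itself may compute it differently.
    """
    if len(base) != len(test) or not base:
        raise ValueError("fields must be non-empty and the same length")
    n = len(base)
    diffs = [t - b for b, t in zip(base, test)]
    n_diff = sum(1 for d in diffs if d != 0.0)          # points that differ
    max_abs_diff = max(abs(d) for d in diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / n)      # RMS over all points
    avg_abs = sum(abs(b) for b in base) / n             # mean |baseline value|
    normalized = rms / avg_abs if avg_abs > 0.0 else float("nan")
    return {"n_diff": n_diff, "max_abs_diff": max_abs_diff,
            "rms": rms, "avg_abs": avg_abs, "normalized_rms": normalized}


# Toy example (made-up values, not FVCOM data): one point differs by ~2e-9.
base = [0.5, -0.25, 0.01, 0.02]
test = [0.5, -0.25, 0.01 + 2e-9, 0.02]
stats = diff_stats(base, test)
print(stats)

# Arithmetic check against the reported ua numbers.
print(2.2055e-11 / 1.550020277500153e-02)  # close to 1.4229e-09
```

Feeding the full ua or va arrays from both runs into a function like this would give one number per field that is easy to track across time steps.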

@uturuncoglu
Collaborator Author

It seems that there is some precision loss here. The pointers returned from the ESMF fields are r8 (double precision), but r4 (single precision) temporary variables are used when those values are copied into the model's internal data structures. To force FVCOM to use double precision, -DDOUBLE_PRECISION=ON was added to the compile step of the test. Still testing.
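To get a feel for the size of an r8-to-r4 rounding step, the sketch below uses Python's standard `struct` module to round a double to IEEE-754 single precision and to count how many float32 steps (ULPs) separate two values. Applied to the largest ua and va differences in the cprnc output above (copied from the log as plain numbers), each pair comes out exactly one float32 ULP apart. That is consistent with, but does not prove, a single-precision rounding difference. The helper names `to_f32`, `f32_bits`, and `f32_ulps_apart` are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE-754 single, like an r8 -> r4 copy."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_bits(x: float) -> int:
    """Raw 32-bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_ulps_apart(a: float, b: float) -> int:
    """Number of float32 steps between a and b.

    Assumes finite values of the same sign; for those, the single-precision
    bit patterns are ordered the same way as the magnitudes.
    """
    if (a < 0) != (b < 0):
        raise ValueError("values must have the same sign")
    return abs(f32_bits(a) - f32_bits(b))


# A double loses digits when it passes through a single-precision temporary.
x = 0.1
print(x, to_f32(x), x - to_f32(x))

# Largest differences reported by cprnc above (baseline vs. new run).
print(f32_ulps_apart(1.658797264099121e-02, 1.658797450363636e-02))  # ua
print(f32_ulps_apart(3.256952166557312e-01, 3.256952464580536e-01))  # va
```

Once such one-ULP differences enter the time integration they can grow, which would fit the pattern of identical first-step output followed by differences from the second step on.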

@uturuncoglu
Collaborator Author

I ran the code with -DDOUBLE_PRECISION=ON, generated a double-precision baseline, and checked against it. The issue is still the same. I also set the dbug option to true and wrote the incoming wind fields as VTK just after FVCOM imports them; they look identical in all time steps. So the current hypothesis is that those fields change after they are imported from the coupler. This could happen in the output part of the code or when the model's internal data structures are set from the coupled fields. Needs more digging.
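One way to narrow this down is to dump the same field at several points in the code (for example right after import and again just before output) for both the baseline run and the new run, then report the first time step at which each dump diverges. The minimal sketch below does that for in-memory lists; the checkpoint labels ("import", "output"), the tolerance, and the data layout are hypothetical, and reading the actual VTK or netCDF dumps is left out.

```python
from typing import Dict, Optional, Sequence, Tuple

Field = Sequence[float]
Run = Sequence[Field]  # one flattened field per time step


def first_divergent_step(base: Run, test: Run, tol: float = 0.0) -> Optional[int]:
    """Index of the first time step whose fields differ by more than tol."""
    if len(base) != len(test):
        raise ValueError("runs have a different number of time steps")
    for step, (b, t) in enumerate(zip(base, test)):
        if len(b) != len(t):
            raise ValueError(f"step {step}: field sizes differ")
        if any(abs(x - y) > tol for x, y in zip(b, t)):
            return step
    return None  # identical within tol at every step


def locate(checkpoints: Dict[str, Tuple[Run, Run]],
           tol: float = 0.0) -> Dict[str, Optional[int]]:
    """Map each checkpoint label to its first divergent step (None = identical)."""
    return {name: first_divergent_step(b, t, tol)
            for name, (b, t) in checkpoints.items()}


# Toy data (made up): the imported wind matches at every step, but the
# field written to output starts to drift at step 1.
imported = [[1.0, 2.0], [1.5, 2.5], [1.7, 2.7]]
written_base = [[0.1, 0.2], [0.15, 0.25], [0.17, 0.27]]
written_test = [[0.1, 0.2], [0.15, 0.25 + 3e-8], [0.17, 0.27 + 5e-8]]

result = locate({"import": (imported, imported),
                 "output": (written_base, written_test)})
print(result)  # {'import': None, 'output': 1}
```

Adding checkpoints between import and output, one at a time, would then bracket the code section where the values first change.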

Projects
Status: Blocked
Development

No branches or pull requests

2 participants