Change in Earthdata API

sakvaka_env
Posts: 25
Joined: Mon Mar 07, 2022 9:54 am
company / institution: Finnish Environment Institute
Location: Helsinki, Finland

Post by sakvaka_env »

There has been a change in NASA's Earthdata API: if the user now tries to download a file that doesn't exist (say, an ozone product like GMAO_FP.20210311T060000.MET.NRT.nc), the server returns HTTP code 200 (OK) and a text page saying: 404 Error - File GMAO_FP.20210311T060000.MET.NRT.nc not found.

The error page is saved under the NetCDF file name and is not identified as broken ancillary data, so opening it as NetCDF fails later.
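One way to catch such a file before it is accepted is to check its leading bytes against the signatures of the expected container formats. The sketch below is only an illustration: the name `has_known_magic` is hypothetical and not part of Polymer, and it assumes the signature sits at offset 0, which is the usual case.

```python
# Leading bytes of the container formats commonly used for ancillary files.
MAGIC_PREFIXES = (
    b'CDF\x01', b'CDF\x02', b'CDF\x05',   # NetCDF classic / 64-bit offset / CDF-5
    b'\x89HDF\r\n\x1a\n',                 # HDF5, which NetCDF-4 is built on
    b'\x0e\x03\x13\x01',                  # HDF4
)


def has_known_magic(path):
    """Return True if the file starts with a NetCDF or HDF signature.

    Hypothetical helper: a text page such as '404 Error - File ... not found.'
    has none of these signatures and is therefore rejected.
    """
    with open(path, 'rb') as fp:
        head = fp.read(8)
    # bytes.startswith accepts a tuple of candidate prefixes
    return head.startswith(MAGIC_PREFIXES)
```

A positive check like this does not depend on the exact wording of the error page, so it would also reject other unexpected responses.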

Re: Change in Earthdata API

Post by sakvaka_env »

There doesn't seem to be an easy fix, but here is a suggestion. The problem is that we need to distinguish between non-fatal errors (like error 404) and fatal errors (like missing or non-functional Earthdata credentials, a wget error, etc.), and all temporary files need to be cleaned up in either case.

Code:

--- ancillary.original  2023-10-04 13:43:30.199431318 +0300
+++ ancillary.py        2023-10-04 14:25:13.401267533 +0300
@@ -360,31 +360,45 @@
 
         assert basename(url) == basename(target)
 
+        class NonFatalException(Exception):
+            def __init__(self, message="A non-fatal exception occurred"):
+                self.message = message
+                super().__init__(self.message)
+
         with LockFile(lock):
 
             # follows https://support.earthdata.nasa.gov/index.php?/Knowledgebase/Article/View/43/21/how-to-access-urs-gated-data-with-curl-and-wget
             cmd = 'wget -nv --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies --auth-no-challenge {} -O {}'.format(url, target+'.tmp')
             ret = system(cmd)
-            if ret == 0:
+            try:
+                if ret != 0:
+                    raise Exception(f'Wget returned a non-zero error code: {ret}')
                 # sanity check
                 # raise an error in case of authentication error
                 # check that downloaded file is not HTML
                 with open(target+'.tmp', 'rb') as fp:
+                    filehead = fp.read(100)
+
                     errormsg = 'Error authenticating to NASA EarthData for downloading ancillary data. ' \
                     'Please provide authentication through .netrc. See more information on ' \
                     'https://support.earthdata.nasa.gov/index.php?/Knowledgebase/Article/View/43/21/how-to-access-urs-gated-data-with-curl-and-wget'
-                    assert not fp.read(100).startswith(b'<!DOCTYPE html>'), errormsg
+                    assert not filehead.startswith(b'<!DOCTYPE html>'), errormsg
+
+                    # may be the case after Oct 2023 when NASA changed the APIs
+                    if filehead.startswith(b'404 Error'):
+                        raise NonFatalException(filehead.decode('utf-8'))
 
                 cmd = 'mv {} {}'.format(target+'.tmp', target)
                 system(cmd)
-
-            else:
+            except NonFatalException as e:
+                print(f"A non-fatal exception occurred: {e}")
+                ret = 1
+            finally:
                 if exists(target+'.tmp'):
                     system('rm {}'.format(target+'.tmp'))
 
         return ret
 
-
     
     def try_resource(self, pattern, date):
         """
lanzhiqishi
Posts: 44
Joined: Fri May 22, 2020 7:00 pm
company / institution: university of maine
Location: maine

Re: Change in Earthdata API

Post by lanzhiqishi »

Dear sakvaka_env,

Could you provide the fixed ancillary.py file? I tried to apply your patch as described, but it still failed.

Best, Binbin

Re: Change in Earthdata API

Post by lanzhiqishi »

Dear sakvaka_env


Here is my fixed ancillary.py file. Could you spot any errors? Thanks. The error is: line 375 try: IndentationError: expected an indented block.

Best, Binbin

assert basename(url) == basename(target)

class NonFatalException(Exception):
def __init__(self, message="A non-fatal exception occurred"):
self.message = message
super().__init__(self.message)


with LockFile(lock):

# follows https://support.earthdata.nasa.gov/inde ... l-and-wget
cmd = 'wget -nv --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies --auth-no-challenge {} -O {}'.format(url, target+'.tmp')
ret = system(cmd)
if ret == 0:
try:
if ret != 0:
raise Exception(f'Wget returned a non-zero error code: {ret}')
# sanity check
# raise an error in case of authentication error
# check that downloaded file is not HTML
with open(target+'.tmp', 'rb') as fp:
filehead = fp.read(100)

errormsg = 'Error authenticating to NASA EarthData for downloading ancillary data. ' \
'Please provide authentication through .netrc. See more information on ' \
'https://support.earthdata.nasa.gov/inde ... l-and-wget'
assert not fp.read(100).startswith(b'<!DOCTYPE html>'), errormsg
assert not filehead.startswith(b'<!DOCTYPE html>'), errormsg

# may be the case after Oct 2023 when NASA changed the APIs
if filehead.startswith(b'404 Error'):
raise NonFatalException(filehead.decode('utf-8'))

cmd = 'mv {} {}'.format(target+'.tmp', target)
system(cmd)

else:
except NonFatalException as e:
print(f"A non-fatal exception occurred: {e}")
ret = 1
finally:
if exists(target+'.tmp'):
system('rm {}'.format(target+'.tmp'))

return ret



def try_resource(self, pattern, date):
"""
Try to access pattern (string, like 'N%Y%j%H_MET_NCEP_1440x0721_f015.hdf')
at a given date
"""
target = date.strftime(join(self.directory, '%Y/%j/'+pattern))
if exists(target):
return target
url = date.strftime(self.url+pattern)

if not self.offline:
print('Trying to download', url, '... ')
sys.stdout.flush()
ret = self.download(url, target)
if ret == 0:
target = verify(target)
return target
else:
print('failure ({})'.format(ret))

return None


def find(self, date, patterns):
'''
Try to access offline or online resource defined by patterns,
at `date`
'''
for pattern in patterns:
res = [self.try_resource(pat, d) for pat, d in pattern(date)]

if None not in res:
return res
Last edited by lanzhiqishi on Thu Oct 12, 2023 11:32 am, edited 1 time in total.

Re: Change in Earthdata API

Post by lanzhiqishi »

Dear sakvaka_env


When I asked NASA about these files, they told me "it doesn't appear that files are available before 2022-07-11". We could use MERRA2 files in place of GMAO_FP; the details are at the link below. Could you fix the bug? Thanks.


Best, Binbin


https://forum.earthdata.nasa.gov/viewto ... 693#p16693

Re: Change in Earthdata API

Post by sakvaka_env »

lanzhiqishi wrote:
Tue Oct 10, 2023 1:25 pm
Dear sakvaka_env


Here is my fixed ancillary.py file. Could you spot any errors? Thanks. The error is: line 375 try: IndentationError: expected an indented block.

Best, Binbin
My patch was in patch-file (unified diff) syntax, i.e., deleted lines are marked with a minus sign as the first character and inserted lines with a plus sign as the first character. Also, you are asking for help with an IndentationError, but the code you pasted is not indented; in any case, I think general Python questions are outside this forum's scope.
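To illustrate the plus/minus convention, here is a minimal sketch that turns the body of one hunk into the resulting file lines. The function name `apply_hunk_lines` is made up for this example, and the sketch ignores the `@@` header that a real patch tool uses to locate the hunk.

```python
def apply_hunk_lines(hunk_lines):
    """Return the new-file lines described by the body of one unified-diff hunk.

    Lines starting with '-' are dropped, lines starting with '+' are
    inserted, and context lines (leading space) are kept. The one-character
    marker is removed in every case.
    """
    result = []
    for line in hunk_lines:
        if line.startswith('-'):
            continue                      # deleted from the original file
        if line.startswith(('+', ' ')):
            result.append(line[1:])       # strip the marker, keep the indentation
        else:
            result.append(line)           # tolerate empty context lines
    return result


# A few lines taken from the patch posted earlier in this thread.
hunk = [
    "             ret = system(cmd)",
    "-            if ret == 0:",
    "+            try:",
    "+                if ret != 0:",
    "+                    raise Exception(f'Wget returned a non-zero error code: {ret}')",
    "                 # sanity check",
]
print("\n".join(apply_hunk_lines(hunk)))
```

In practice a tool such as `patch` performs this step, which also preserves the indentation that is lost when the code is copied from the forum page.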

Polymer's current logic is to try to download the GMAO_FP ancillary data first. If it is missing, Polymer tries MET_NCEPR2 for winds and AURAOMI for ozone, and if these are missing, MET_NCEP is tried for winds and TOMSOMI for ozone, and so on. The current issue, which my patch addresses, is that Polymer cannot handle the dummy error files that Earthdata currently returns for non-existent files.
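The fallback order can be sketched as a simple priority chain. This is not Polymer's actual code: `first_available`, `make_fetch` and the `ANCILLARY/...` paths are placeholders, and treating winds and ozone as two independent chains is an assumption made for the illustration.

```python
def first_available(chain, fetch):
    """Return the first resource in `chain` that `fetch` can provide.

    `fetch(name)` returns a local path, or None when the resource is
    missing (for example after a 404 error page was detected).
    """
    for name in chain:
        path = fetch(name)
        if path is not None:
            return path
    return None


# Priority order as described above, highest priority first.
WIND_CHAIN = ['GMAO_FP', 'MET_NCEPR2', 'MET_NCEP']
OZONE_CHAIN = ['GMAO_FP', 'AURAOMI', 'TOMSOMI']


def make_fetch(available):
    """Build a stand-in for the download step from a set of existing products."""
    def fetch(name):
        return f'ANCILLARY/{name}' if name in available else None
    return fetch


# Pretend that GMAO_FP and AURAOMI are missing for the requested date.
fetch = make_fetch({'MET_NCEPR2', 'TOMSOMI'})
print(first_available(WIND_CHAIN, fetch), first_available(OZONE_CHAIN, fetch))
```

The chain only works if a missing file is reported as "not available" instead of aborting the run, which is why the download step has to separate non-fatal from fatal errors.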

The maintainers will probably fix this in the next release; I'm just a fellow user providing a temporary workaround for the community (use at your own risk).

Re: Change in Earthdata API

Post by lanzhiqishi »

Dear sakvaka_env,


Thank you for correcting me. I tried it, but the "AURAOMI" ozone files cannot be downloaded; the details are below. I also opened the link in a browser and it does not work either: "Status: 403 Forbidden Apparently, you did not have privileges to access the requested filename". This happens even when I use a VPN, which I need because I am in China. Thanks.


Best, Binbin



Initialize MSI projection EPSG:32649
/public3/home/sc73004/.conda/envs/polymer/lib/python3.8/site-packages/pyproj/crs/crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable ... -in-proj-6
in_crs_string = _prepare_from_proj_string(in_crs_string)
Trying to download https://oceandata.sci.gsfc.nasa.gov/cgi ... MET.NRT.nc ...
wget: /public3/soft/anaconda/3-wxl/lib/libuuid.so.1: no version information available (required by wget)
2023-10-12 23:14:53 URL:https://oceandata.sci.gsfc.nasa.gov/get ... MET.NRT.nc [62] -> "ANCILLARY/METEO/2020/003/GMAO_FP.20200103T030000.MET.NRT.nc.tmp" [1]
A non-fatal exception occurred: 404 Error - File GMAO_FP.20200103T030000.MET.NRT.nc not found.
failure (1)
Trying to download https://oceandata.sci.gsfc.nasa.gov/cgi ... MET.NRT.nc ...
wget: /public3/soft/anaconda/3-wxl/lib/libuuid.so.1: no version information available (required by wget)
2023-10-12 23:14:57 URL:https://oceandata.sci.gsfc.nasa.gov/get ... MET.NRT.nc [62] -> "ANCILLARY/METEO/2020/003/GMAO_FP.20200103T060000.MET.NRT.nc.tmp" [1]
A non-fatal exception occurred: 404 Error - File GMAO_FP.20200103T060000.MET.NRT.nc not found.
failure (1)
Trying to download https://oceandata.sci.gsfc.nasa.gov/cgi ... MI_24h.hdf ...
wget: /public3/soft/anaconda/3-wxl/lib/libuuid.so.1: no version information available (required by wget)
2023-10-12 23:15:00 URL:https://oceandata.sci.gsfc.nasa.gov/get ... MI_24h.hdf [18734] -> "ANCILLARY/METEO/2020/003/N202000300_O3_AURAOMI_24h.hdf.tmp" [1]
Traceback (most recent call last):

Re: Change in Earthdata API

Post by sakvaka_env »

Thanks for noticing that. It seems that for 2020/003 only TOAST ozone data is available, but the Earthdata API returns an HTML page "403 Forbidden" when Polymer tries AURAOMI. The page starts with <!doctype html> (lowercase), so it is missed by Polymer's assertion check, which looks for the uppercase form. Even if it were caught, this should be a non-fatal exception, handled as missing data and not as a missing-credentials issue.
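A rough sketch of a check that covers both cases is given below. It rests on assumptions that have not been verified against the server: that a 403 page mentions its status near the top of the file, and that any other HTML page is the Earthdata login page. The name `classify_download` and the return labels are invented for the example.

```python
def classify_download(head):
    """Classify the first bytes of a downloaded file.

    Returns 'ok', 'missing' (non-fatal: try the next data source) or
    'auth' (fatal: most likely a credentials problem).
    """
    text = head.lstrip().lower()          # make the doctype test case-insensitive
    if text.startswith(b'404 error'):
        return 'missing'                  # plain-text error page sent with HTTP 200
    if text.startswith(b'<!doctype html'):
        # Assumption: a "403 Forbidden" page names its status in the bytes read.
        if b'403' in text or b'forbidden' in text:
            return 'missing'
        return 'auth'                     # assumed to be the login page
    return 'ok'
```

With only 100 bytes read, the status text may not be included yet, so a larger prefix would probably be needed for the 403 test to be reliable.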

It reinforces my belief that there is no easy fix...

Re: Change in Earthdata API

Post by lanzhiqishi »

Dear sakvaka_env

Thank you for following up on the auxdata download issues. We hope a new version is coming soon.

Best, Binbin
ncapon
Posts: 1
Joined: Thu Oct 26, 2023 10:09 am
company / institution: HYGEOS
Location: Lille

Re: Change in Earthdata API

Post by ncapon »

Thank you for your feedback on this problem. We have revised the handling of exceptions from NASA data downloads. This will be available with the release of the new version of Polymer.
Thank you for your contribution.

Nathan - HYGEOS Team