batch processing error

yonisherman
Posts: 5
Joined: Fri Oct 01, 2021 8:19 pm
company / institution: The City College of New York
Location: New York

batch processing error

Post by yonisherman »

Hi,
I am having an issue processing multi-year data sets of OLCI. It was working for some time, and then I got the error "OSError: [Errno 24] Too many open files". Now every time I run it after the error, it gets through 36 files and errors out.

I see this was an issue for someone else in the past (see this thread: viewtopic.php?f=7&t=145&p=565&hilit=%5BErrno+24%5D#p565).
However, while the issue in that thread was resolved, the answer was never posted, only mentioned in broad terms.
I understand that there is a pileup of .tmp files somewhere, but I can't find them. As polymer runs, I see the .tmp file being created in the outdir I designate; then the .nc file is created and the .tmp file is removed (the recycle bin is empty the whole time). So I am not sure where the buildup of files occurs, or where they should be closed. I have also had a look in the root tmp folders and don't see anything there. Any suggestions or recommendations?
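
If it helps with diagnosis, this is the kind of snippet I would add to watch the open file descriptors between iterations (a rough sketch; it uses the third-party psutil package, which is my own addition and not part of polymer):

import psutil

def report_open_files(tag=''):
    # List the files the current Python process is holding open.
    proc = psutil.Process()
    files = proc.open_files()
    print(f'{tag}: {len(files)} open files')
    for f in files:
        print('   ', f.path)

Calling report_open_files(fname) at the end of each loop iteration should show whether the handle count climbs from one product to the next.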

Below is the script I use to run each year's worth of data.

Thank you,
Jonathan Sherman

def polymer_olci_batch(data_path):
    """
    Batch processing of Sentinel-3 OLCI
    """

    from polymer.main import run_atm_corr, Level1, Level2
    from polymer.level2_nc import Level2_NETCDF
    from polymer.level1_olci import Level1_OLCI
    import glob

    l1_path = f'{data_path}/l1'
    l2_path = f'{data_path}/l2/'

    flist = sorted(glob.glob(f'{l1_path}/*'))
    for fname in flist:
        run_atm_corr(Level1_OLCI(fname),
                     Level2_NETCDF(outdir=l2_path, ext='_polymer_L2.nc'),
                     multiprocessing=-1)
        # run_atm_corr(Level1_OLCI(fname),
        #              Level2_NETCDF(outdir=l2_path, ext='_polymer_L2.nc'))
fsteinmetz
Site Admin
Posts: 314
Joined: Fri Sep 07, 2018 1:34 pm
company / institution: Hygeos
Location: Lille, France

Re: batch processing error

Post by fsteinmetz »

Dear Jonathan,

There should be no pileup of tmp files. As you have noticed, a tmp file is created for each processing run, but it is moved to the final file at the end of the processing; this avoids leaving unfinished files behind in case processing stops for whatever reason.
The other issue (viewtopic.php?f=7&t=145) was related to garbage file *references*, not necessarily garbage files. This may happen, for example, if you keep a list of Level1 objects; Python would retain references to all of them, and the number of open files may increase over time.
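
For illustration, a (hypothetical) pattern like this would keep every product referenced for the whole loop:

# Hypothetical anti-pattern, not taken from your script: the list keeps
# every Level1 object alive, so any file each object holds open stays
# open until the entire loop has finished.
products = [Level1_OLCI(fname) for fname in flist]
for l1 in products:
    run_atm_corr(l1, Level2_NETCDF(outdir=l2_path, ext='_polymer_L2.nc'))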
I don't see exactly where this would happen in your case. My recommendations would be:
1) Move all imports to the top (module) level; they should not be inside the function.
2) Move the processing of a single file into a dedicated function such as process(fname), which would run only the run_atm_corr(...) call; see the sketch below.
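
For example, a restructured version of your script could look like this (a sketch keeping your calls and paths; I have not run it):

# Imports at module level (recommendation 1).
import glob
from polymer.main import run_atm_corr
from polymer.level2_nc import Level2_NETCDF
from polymer.level1_olci import Level1_OLCI

def process(fname, l2_path):
    # Process a single OLCI product (recommendation 2). The Level1
    # object only exists inside this function, so its reference (and
    # any file handle it holds) is released when the function returns.
    run_atm_corr(Level1_OLCI(fname),
                 Level2_NETCDF(outdir=l2_path, ext='_polymer_L2.nc'),
                 multiprocessing=-1)

def polymer_olci_batch(data_path):
    # Batch processing of Sentinel-3 OLCI, one product per call.
    l1_path = f'{data_path}/l1'
    l2_path = f'{data_path}/l2/'
    for fname in sorted(glob.glob(f'{l1_path}/*')):
        process(fname, l2_path)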

I hope this helps.
Cheers,
François