Hi,
Thanks for Polymer, it's a very nice package. I am running it against Copernicus imagery in a high-performance computing environment, so I run Polymer in many dozens of containers at once. In most cases it works perfectly, but in roughly 5% of cases or fewer I get an error from within the multiprocessing stack. It usually surfaces as a bus error or a segmentation fault followed by a broken pipe. It seems to occur more often in memory-restricted situations: up to 15% of runs when I use only a couple of cores with 32 GB of memory, down to about 5% of runs with 125 GB of memory and 32 cores. When I re-run the file it crashed on, it generally completes just fine. This is Polymer 4.9 on Ubuntu Xenial with Python 3.5. Is this something you have seen before? Info from faulthandler is below.
Thanks!
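(The dump below is faulthandler output. For reference, a minimal way to enable it is sketched here; I am not claiming this is exactly how my process2017.py does it.)

    import faulthandler
    faulthandler.enable()   # on SIGSEGV/SIGBUS etc., dump the Python stack of every thread to stderr

    # Equivalent alternatives: run with `python -X faulthandler process2017.py`
    # or set the PYTHONFAULTHANDLER environment variable.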
/spectral/derekja/pol4.9/polymer/level2_nc.py:112: MaskedArrayFutureWarning: setting an item on a masked array which has a shared mask will not copy the mask and also change the original mask array in the future.
Check the NumPy 1.11 release notes for more information.
  data[np.isnan(data)] = fill_value
Fatal Python error: Segmentation fault
Thread 0x00002b69973b9700 (most recent call first):
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 379 in _recv
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 407 in _recv_bytes
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 250 in recv
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 429 in _handle_results
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap
Current thread 0x00002b69971b8700 (most recent call first):
  File "/spectral/derekja/pol4.9/polymer/level1_olci.py", line 212 in read_band
  File "/spectral/derekja/pol4.9/polymer/level1_olci.py", line 264 in read_block
  File "/spectral/derekja/pol4.9/polymer/level1_olci.py", line 350 in blocks
  File "/spectral/derekja/pol4.9/polymer/main.py", line 422 in blockiterator
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 305 in <genexpr>
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 380 in _handle_tasks
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap
Thread 0x00002b6996fb7700 (most recent call first):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 367 in _handle_workers
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap
Thread 0x00002b698b343100 (most recent call first):
  File "/spectral/derekja/pol4.9/polymer/level2_nc.py", line 115 in write_block
  File "/spectral/derekja/pol4.9/polymer/level2.py", line 119 in write
  File "/spectral/derekja/pol4.9/polymer/main.py", line 506 in run_atm_corr
  File "./process2017.py", line 72 in <module>
Segmentation fault (core dumped)
Process ForkPoolWorker-50:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 125, in worker
    put((job, i, result))
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 398, in _send_bytes
    self._send(buf)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 130, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-51:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 125, in worker
    put((job, i, result))
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 397, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 130, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-49:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 125, in worker
    put((job, i, result))
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 397, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 130, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Re: occasional multiprocessing errors
Dear Derek,
Did you use the "multiprocessing" option? If so, I suggest you leave the default (multiprocessing=0), where Polymer should not use the multiprocessing module at all, and see if the problem still appears.
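For reference, the call I mean looks roughly like the sketch below; the Level1/Level2 arguments are placeholders, so adapt them to your own driver script rather than reading this as your exact configuration:

    # Sketch only: where the multiprocessing option goes (placeholder file names)
    from polymer.main import run_atm_corr, Level1, Level2

    run_atm_corr(
        Level1('S3A_OL_1_EFR____placeholder.SEN3'),  # input OLCI product (placeholder)
        Level2(filename='output.nc'),                # your trace shows the netCDF4 writer (level2_nc.py)
        multiprocessing=0,                           # default: do not use the multiprocessing module
    )
    # A non-zero value makes Polymer use the multiprocessing module to process blocks
    # in parallel; that is the mode where I have occasionally seen segfaults in
    # combination with netCDF4 output.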
I have indeed occasionally encountered segfaults when using the multiprocessing option *and* netCDF4 output (although these two features are not supposed to be related), but reproducibility was unpredictable and seemingly depended on particular library versions. I haven't encountered this crash for quite some time.
François
Re: occasional multiprocessing errors
The error does not occur when multiprocessing is disabled, but the cost in processing time is greater than that of simply re-running the instances that fail.
Thank you, François; I will investigate different netCDF versions and report back.
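For the record, here is roughly how I plan to check which netCDF versions each container actually picks up (a small sketch using the netCDF4 Python module):

    # Report the netCDF stack inside a container
    import netCDF4
    print('netCDF4 Python binding:', netCDF4.__version__)
    print('netcdf-c library:', netCDF4.getlibversion())   # version of the underlying C library

    import numpy
    print('numpy:', numpy.__version__)   # relevant to the MaskedArray warning above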
Re: occasional multiprocessing errors
OK, processing is indeed a lot faster with multiprocessing, but out of curiosity, do you think this option is still beneficial in an HPC environment?
For your information, I have attached a list of the Python packages I am currently using, with which I don't encounter this bug; you may use it to create an Anaconda environment.
						Re: occasional multiprocessing errors
Great, I can confirm that upgrading netCDF from 4.4.0 to 4.6.1 solved the issue. Thanks.
Normally I wouldn't care about multiprocessing and would just request single-core nodes. However, our job scheduler de-prioritizes jobs that run over 3 hours, which is often the case without multiprocessing. The geographers also sometimes submit a job with only a single image to process, for which multiprocessing is of course preferable.
Thanks for your help, François!
Re: occasional multiprocessing errors
Good news, thanks for your feedback!