Errors that occur while a worker is executing routines directly in the sim_f function currently cause the entire run to abort (see #73). In some cases these errors are random (see again #73) or isolated, and the run could continue productively if the error were handled.
Possible options for handling errors in the sim_f function:
1. Wrap the entire sim_f body in a try block and allow it to be re-run. The advantage is that we don't have to guess where things will fail. The downside is that some operations may be repeated, though a more complex implementation could log the job execution state and resume after the last completed job. Another significant downside is that this approach makes it very easy for systemic errors to trigger useless retries.
2. Set up decorators for likely failure locations, which would mostly be where I/O takes place. This avoids re-running simulations unnecessarily, but we would need to anticipate where failures might happen. If we go this way, there is also the question of whether to introduce an outside library such as https://github.com/jd/tenacity/tree/master/tenacity.
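To make the trade-off between the two options concrete, here is a minimal stdlib-only sketch of a retry decorator. The name `retry` and its `attempts`, `delay` and `retry_on` parameters are hypothetical (this is not a libEnsemble API). Applying it to a single I/O call with a narrow exception filter corresponds to option 2; applying it to the whole simulation with a broad filter corresponds to option 1, and shows how a systemic error gets retried uselessly.

```python
import functools
import time


def retry(attempts=3, delay=0.0, retry_on=(OSError,)):
    """Hypothetical retry decorator (not a libEnsemble API).

    Re-calls the wrapped function when it raises one of the exception types
    in ``retry_on``. Any other exception propagates immediately. After
    ``attempts`` failed calls the last error is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # out of retries: let the caller decide
                    time.sleep(delay)  # optional pause between attempts
        return wrapper
    return decorator


# Option 2: decorate only a likely failure point, e.g. a flaky I/O call.
io_calls = {"n": 0}


@retry(attempts=3, retry_on=(OSError,))
def flaky_read():
    io_calls["n"] += 1
    if io_calls["n"] < 3:
        raise OSError("transient I/O error")  # fails twice, then succeeds
    return "ok"


# Option 1: wrap the whole simulation with a broad filter. A systemic bug
# (here a ValueError that can never succeed) is retried uselessly.
sim_calls = {"n": 0}


@retry(attempts=3, retry_on=(Exception,))
def whole_sim_with_bug():
    sim_calls["n"] += 1
    raise ValueError("bad input")
```

Here `flaky_read()` returns `"ok"` on its third call, while `whole_sim_with_bug()` raises `ValueError` only after three wasted calls; with `retry_on=(OSError,)` it would raise on the first call. If the outside-library route is preferred, tenacity's `retry` decorator combined with `stop_after_attempt` and `retry_if_exception_type` expresses the same policy.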
As far as I know, libEnsemble doesn't currently provide tools for directly handling such issues. Another approach would be to simply allow workers to report an error and then shut down or restart. In any case, each of the approaches given here can end by returning the appropriate failure flag and letting the generator move on or try to re-run.
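A minimal sketch of the "return a failure flag" idea, assuming the usual libEnsemble sim_f signature `(H, persis_info, sim_specs, libE_info)` with an optional third return value used as the calc status. To keep it self-contained, `H` and `H_o` are plain dicts rather than NumPy structured arrays, and `SIM_OK`, `SIM_FAILED` and `run_simulation` are hypothetical placeholders; in libEnsemble the real status codes live in `libensemble.message_numbers`.

```python
# Placeholder status codes (assumption: real ones come from libEnsemble).
SIM_OK = 0
SIM_FAILED = 1


def run_simulation(x):
    """Hypothetical simulation step that fails on some inputs."""
    if x < 0:
        raise OSError("could not write input deck")
    return x ** 2


def sim_f(H, persis_info, sim_specs, libE_info):
    """Sketch of a sim_f that reports failure instead of aborting the run."""
    H_o = {"f": []}
    status = SIM_OK
    for x in H["x"]:
        try:
            H_o["f"].append(run_simulation(x))
        except OSError:
            # Store a NaN placeholder and flag the failure so the
            # generator can skip or resubmit this point.
            H_o["f"].append(float("nan"))
            status = SIM_FAILED
    return H_o, persis_info, status
```

Only the anticipated exception type is caught, so a genuine bug elsewhere still surfaces instead of being silently reported as a failed simulation.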