Handle errors during sim_f execution #74

Open
cchall opened this issue Sep 3, 2021 · 0 comments
Labels: enhancement (New feature or request)

cchall (Member) commented Sep 3, 2021

Currently, an error raised while a worker is executing routines directly in the sim_f function aborts the entire run (see #73). In some cases these errors are random (see again #73) or isolated, and the run could continue productively if the error were handled.

Possible options for handling errors in the sim_f function:

  • Wrap the entire sim_f body in a try block and allow it to be re-run. The advantage is that we don't have to guess where things will fail. The downside is that some operations may be repeated, though a more complex implementation could log the job execution state and retry from the end of the last completed job. Another significant downside is that systemic errors could very easily trigger useless retries with this approach.
  • Set up decorators for likely failure locations, which would mostly be where I/O takes place. This avoids re-running simulations unnecessarily, but we would need to anticipate where failures might happen. If we go this way, there is also the question of whether to introduce an outside library like https://github.com/jd/tenacity/tree/master/tenacity.

As far as I know, libEnsemble doesn't currently provide tools for directly handling such issues. Another approach would be to let workers report an error and then be shut down or restarted. In any case, each approach given here could also simply end by returning the appropriate failure flag and letting the generator move on or request a re-run.
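The failure-flag fallback can be sketched as follows. This is a minimal sketch under several assumptions: it follows libEnsemble's `sim_f(H, persis_info, sim_specs, libE_info)` signature and returns a calc status as an optional third value, but the status constants are placeholders standing in for codes that libEnsemble's example sim functions import from `libensemble.message_numbers` (such as `WORKER_DONE` and `TASK_FAILED`). `H` and the output are simplified to plain dicts instead of NumPy structured arrays, `run_simulation` is a hypothetical step, and storing messages under an `"errors"` key in `persis_info` is an illustrative choice.

```python
import math

# Placeholders for the status codes a real sim_f would import from
# libensemble.message_numbers; plain strings keep this sketch standalone.
WORKER_DONE = "WORKER_DONE"
TASK_FAILED = "TASK_FAILED"


def run_simulation(x):
    """Hypothetical simulation step that fails for some inputs."""
    if x < 0:
        raise ValueError(f"simulation diverged for x={x}")
    return math.sqrt(x)


def sim_f(H, persis_info, sim_specs, libE_info):
    """Evaluate one point, reporting a failure flag instead of aborting."""
    # Simplification: H and H_o are plain dicts here; libEnsemble uses NumPy
    # structured arrays with fields named in sim_specs["in"] / sim_specs["out"].
    H_o = {"f": math.nan}  # NaN marks "no valid result" if the run fails
    try:
        H_o["f"] = run_simulation(H["x"])
        calc_status = WORKER_DONE
    except ValueError as err:
        # Isolated failure: record it and let the generator decide whether
        # to skip this point or request a re-run.
        persis_info.setdefault("errors", []).append(str(err))
        calc_status = TASK_FAILED
    return H_o, persis_info, calc_status


ok = sim_f({"x": 4.0}, {}, {}, {})
failed = sim_f({"x": -1.0}, {}, {}, {})
```

Only the anticipated exception type is converted into a failure flag; any other error still propagates and aborts the run as it does today, which keeps systemic problems visible.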

cchall mentioned this issue Sep 4, 2021
cchall added the enhancement label Dec 30, 2021