Increasing the number of workers (to a number much greater than the number of available hosts) can generate a host unavailable error:
host 3 XXXXXX is unavailable
Your cluster has been reassigned or some nodes are down. Please contact support for help with this issue.
Reducing the number of workers makes this error significantly less likely. Here's the code used for the config file test_python.yml:
    codes:
        - python:
            parameters:
                ind:
                    min: 0
                    max: 49
                    start: 0
                    samples: 50
            setup:
                input_file: test_python_script.py
                function: main
                force_executor: True
                execution_type: rsmpi
                cores: 1
    options:
        run_dir: ./scan/
        software: mesh_scan
        nworkers: 50
        executor_options:
            hosts: [1,2,3,4,5]
And here's the code used for test_python_script.py:
    #run a simple python script
    import json

    def main(ind):
        #create a basic dictionary
        my_ind = int(ind)
        org = {}
        org[my_ind] = ind
        fn = 'my_ind.json'
        with open(fn, 'w') as file:
            json.dump(org,file)

    if __name__ == "__main__":
        main(ind)
I ran via rsopt sample configuration test_python.yml
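For anyone trying to reproduce this, it may help to see how oversubscribed the hosts are in the config above. The snippet below is only a rough sketch: `workers_per_host` is a hypothetical helper, not part of rsopt, and the dictionary mirrors the relevant part of test_python.yml on the assumption that `executor_options` sits under `options`.

```python
def workers_per_host(config):
    """Return the ratio of requested workers to listed hosts.

    Hypothetical helper for illustration; rsopt does not provide it.
    """
    options = config["options"]
    nworkers = options["nworkers"]
    hosts = options["executor_options"]["hosts"]
    return nworkers / len(hosts)


# Values copied from test_python.yml (nesting of executor_options is assumed).
config = {
    "options": {
        "nworkers": 50,
        "executor_options": {"hosts": [1, 2, 3, 4, 5]},
    }
}

ratio = workers_per_host(config)
print(ratio)  # 10.0 workers per host
```

With 50 workers and 5 hosts the ratio is 10, which is the regime where I see the error; lowering `nworkers` lowers the ratio.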
This looks like it is being caused by a change to libEnsemble job submission: Libensemble/libensemble#1468
Changing wait_on_start from a bool to the 8 seconds already used as the rsmpi submission fail_time allows failed submissions to be caught as expected.
@ncook882 you can install the rsopt/rsmpi_wait_on_start branch to resolve this issue for the time being.
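To illustrate why a timed wait at submission matters, here is a generic sketch. It is not the libEnsemble or rsopt implementation: `submit_and_watch` and `poll_interval` are hypothetical names, and only the 8 second `fail_time` comes from the discussion above. The idea is to poll a freshly launched job for a fixed window so that an immediate failure, such as an unavailable host, is noticed rather than missed.

```python
import subprocess
import sys
import time


def submit_and_watch(cmd, fail_time=8.0, poll_interval=0.1):
    """Launch cmd and watch it for up to fail_time seconds.

    Returns (process, failed_early). failed_early is True only when the
    process exits with a nonzero code inside the watch window.
    Hypothetical helper; it only mimics the idea of a timed wait on start.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + fail_time
    while time.monotonic() < deadline:
        code = proc.poll()
        if code is not None:
            # The job finished inside the window: report whether it failed.
            return proc, code != 0
        time.sleep(poll_interval)
    # Still running after the window: treat the submission as started.
    return proc, False


# A command that fails immediately, standing in for a rejected submission.
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
proc, failed_early = submit_and_watch(bad_cmd, fail_time=8.0)
print(failed_early)
```

A job that fails in the first few seconds is flagged, while one still running when the window closes is handed back as started.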
ncook882
cchall
evanrs0