Recover missing state after collaborator restart #1268

Merged


cloudnoize
Contributor

After a restart, the collaborator receives the list of tasks it still needs to finish in the current round. That list may be partial, because the collaborator had already executed some tasks before the restart, and those earlier tasks may have generated state that later tasks in the round depend on.
This PR recovers or initializes the state of those earlier tasks, enabling the collaborator to resume execution of the current round.
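
To illustrate the idea only, here is a minimal sketch of how a collaborator could work out which tasks it has already run and rebuild their state. It is not the code in this PR: `missing_prior_tasks`, `recover_state`, the hook registry, and the `state` dict are all hypothetical names, and the sketch assumes tasks within a round run in a fixed, known order.

```python
from typing import Callable, Dict, List

# Hypothetical type: a hook that rebuilds whatever state a finished task left behind.
RecoveryHook = Callable[[dict], None]


def missing_prior_tasks(round_tasks: List[str], assigned: List[str]) -> List[str]:
    """Return the tasks of the round that come before the first assigned task.

    Assumes tasks run in the order given by round_tasks (an assumption of
    this sketch, not something stated in the PR).
    """
    if not assigned:
        return []
    first = round_tasks.index(assigned[0])
    return round_tasks[:first]


def recover_state(
    round_tasks: List[str],
    assigned: List[str],
    hooks: Dict[str, RecoveryHook],
    state: dict,
) -> dict:
    """Run the recovery hook of every task that was completed before the restart."""
    for task in missing_prior_tasks(round_tasks, assigned):
        hook = hooks.get(task)
        if hook is not None:
            hook(state)  # recover or initialize the state this task would have produced
    return state


# Usage: the collaborator died after validation and is assigned only "train".
round_tasks = ["aggregated_model_validation", "train"]
hooks = {"aggregated_model_validation": lambda s: s.update(metrics_resolved=True)}
state = recover_state(round_tasks, ["train"], hooks, {})
```

Tasks without a registered hook are skipped, so only tasks that actually leave state behind need recovery logic.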

https://jira.devtools.intel.com/browse/FEDAI-1229

@cloudnoize cloudnoize force-pushed the elerer/collaborator_restart.2 branch from 47c2163 to 0222fd7 Compare January 14, 2025 12:32
else:
    batch_size = 1
# evaluation needed before metrics can be resolved
self.model.evaluate(self.data_loader.get_valid_loader(batch_size), verbose=1)

For other reviewers' reference: this is needed for the scenario where the collaborator dies between the aggregated_model_validation and train tasks. If the collaborator restarts, it will immediately be assigned train and will not be able to resolve the necessary metrics. The names of the needed metrics can be resolved by running the evaluate function on the model, but these metric values will not be sent to the aggregator.
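
The scenario above can be sketched with a toy model. This is a stand-in, not the real model class or the runner touched by this PR: `LazyMetricsModel` and `resolve_metric_names` are hypothetical, and the sketch assumes, as the comment implies, that metric names only become available after a first evaluation pass.

```python
from typing import Iterable, List


class LazyMetricsModel:
    """Toy model whose metric names are unknown until evaluate() has run once."""

    def __init__(self, configured_metrics: List[str]):
        self._configured = list(configured_metrics)
        self.metrics_names: List[str] = []  # empty until the first evaluation

    def evaluate(self, batches: Iterable) -> List[float]:
        for _ in batches:  # pretend to consume the validation data
            pass
        self.metrics_names = list(self._configured)
        return [0.0 for _ in self._configured]  # placeholder metric values


def resolve_metric_names(model: LazyMetricsModel, valid_batches: Iterable) -> List[str]:
    """Make sure metric names exist before a restarted collaborator runs train."""
    if not model.metrics_names:
        # The values are discarded on purpose: the pass only resolves the
        # names, and nothing from it is reported to the aggregator.
        model.evaluate(valid_batches)
    return model.metrics_names


# Usage: a freshly restarted collaborator that skipped validation.
model = LazyMetricsModel(["loss", "accuracy"])
names_before = list(model.metrics_names)
names_after = resolve_metric_names(model, [[1, 2, 3]])
```

Here `names_before` is empty and `names_after` holds the configured names, which matches the situation the extra `evaluate` call is meant to handle.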

@psfoley psfoley merged commit 3375609 into securefederatedai:develop Jan 15, 2025
20 checks passed