FederatedRuntime Workflow for CI Pipeline - 301 Watermarking notebook run #1267
Conversation
Overall, this is a promising starting point for integrating FederatedRuntime test cases into the CI test pipeline, as it effectively demonstrates multiple functionalities of the Workflow Interface.
During an internal discussion with @noopurintel, @payalcha, and @ishant162, a concern was raised regarding the challenges in automating FederatedRuntime test cases. Specifically, many parameters required for executing the Workflow Interface are currently hardcoded in the Jupyter notebook, limiting flexibility and scalability.
Proposed Solution
To address this issue partially, I developed a Python script that extracts the execution logic from the Jupyter notebook. This script provides a way to run the FederatedRuntime notebook outside of the Jupyter environment, making it easier to automate and integrate into CI pipelines.
Below is the experimental Python script:
#######################################################################
# Test script (experimental basis) to execute FederatedRuntime
# notebook from outside of Jupyter notebook
#######################################################################

# Instantiate FederatedRuntime
from openfl.experimental.workflow.runtime import FederatedRuntime

director_info = {
    'director_node_fqdn': 'localhost',
    'director_port': 50050,
}
collaborator_names = ["Portland", "Seattle"]

federated_runtime = FederatedRuntime(
    collaborators=collaborator_names,
    director=director_info,
    notebook_path='/home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/101_MNIST/workspace/MNIST_FederatedRuntime_reduced.ipynb'
)

# Check to make sure that envoys are connected
federated_runtime.get_envoys()

# Define a dummy flow to enable runtime execution
from openfl.experimental.workflow.interface import FLSpec
from openfl.experimental.workflow.placement import aggregator, collaborator


class AutomationFlow(FLSpec):
    """
    Flow to automate the FederatedRuntime execution
    NOTE: This flow will not execute on the Federation
    """
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


# Run the flow (in reality, the flow defined in the Jupyter notebook will run)
flflow = AutomationFlow()
flflow.runtime = federated_runtime
flflow.run()
The modified Jupyter notebook that works with the above script can be referenced at MNIST_FederatedRuntime.txt.
Next Steps
This script is provided as an experimental enhancement and may not address certain corner cases. I would recommend trying this approach; if it is successful, we can refine it and formally adopt it in subsequent PRs. WDYT?
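To illustrate the direction discussed above, here is a minimal sketch of how the experimental script could be parameterized so that the director address, collaborator names, and notebook path are supplied by the CI job instead of being hardcoded. This is not part of the PR; the flag names and defaults below are assumptions for illustration only.

# Hypothetical sketch: parameterize the experimental runner so CI can pass in
# values that are currently hardcoded. Flag names/defaults are illustrative.
import argparse

from openfl.experimental.workflow.runtime import FederatedRuntime
from openfl.experimental.workflow.interface import FLSpec


class AutomationFlow(FLSpec):
    """Dummy flow used only to trigger execution of the notebook's flow."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


def main():
    parser = argparse.ArgumentParser(description="Run a FederatedRuntime notebook for CI")
    parser.add_argument("--director-fqdn", default="localhost")
    parser.add_argument("--director-port", type=int, default=50050)
    parser.add_argument("--collaborators", nargs="+", default=["Portland", "Seattle"])
    parser.add_argument("--notebook-path", required=True)
    args = parser.parse_args()

    federated_runtime = FederatedRuntime(
        collaborators=args.collaborators,
        director={
            "director_node_fqdn": args.director_fqdn,
            "director_port": args.director_port,
        },
        notebook_path=args.notebook_path,
    )
    # Sanity check that envoys are connected before kicking off the run
    federated_runtime.get_envoys()

    # As in the experimental script above, the flow defined in the notebook runs
    flow = AutomationFlow()
    flow.runtime = federated_runtime
    flow.run()


if __name__ == "__main__":
    main()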
Thanks Noopur. Looks good!
LGTM, thanks @noopurintel! It's great to have an E2E test with the FederatedRuntime.
@noopurintel Please hold off on doubling the timeout.
I am testing a couple of alternatives to reduce the number of MNIST download calls. The average CI run takes about 6 minutes, and the 15-minute limit is a safeguard against anomalies. Setting it to 30 minutes blocks CI for other PRs for too long.
We recently had a situation where CI compute was blocked for more than 4 hours solely for this reason. With the current setting, failed actions can deallocate resources for "working" PRs faster.
@MasterSkepticista - the increased timeout is in any case a temporary fix to unblock the PRs and the testing process. Once the download issue/slowness is resolved, I will revert the timeout to its previous value (i.e., 15 minutes).
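One possible way to reduce repeated MNIST downloads in CI (a sketch only, not necessarily the alternative being tested by the maintainers) is to pre-fetch the dataset once into a directory that the CI runner caches between jobs, with retries to smooth over transient mirror failures. The cache path and retry policy below are illustrative assumptions.

# Illustrative sketch: pre-fetch MNIST into a cached directory with retries,
# so CI jobs do not hit the upstream mirrors on every run.
# The cache location and retry policy are assumptions, not the project's actual setup.
import os
import time

from torchvision.datasets import MNIST


def prefetch_mnist(root: str = os.path.expanduser("~/.cache/mnist"), retries: int = 3) -> None:
    """Download MNIST once into `root`; subsequent calls reuse the cached files."""
    for attempt in range(1, retries + 1):
        try:
            MNIST(root=root, train=True, download=True)
            MNIST(root=root, train=False, download=True)
            return
        except Exception as exc:  # e.g. HTTP errors from flaky mirrors
            if attempt == retries:
                raise
            print(f"MNIST download attempt {attempt} failed ({exc}); retrying")
            time.sleep(5 * attempt)


if __name__ == "__main__":
    prefetch_mnist()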
Summary of changes:
- Watermaking -> Watermarking at all relevant places.
- wf_federated_runtime_tests.py with a test test_federated_runtime_301_watermarking, plus end_to_end helper files (see the sketch below).
Verification:
- Successful run as part of the PR pipeline itself - https://github.com/securefederatedai/openfl/actions/runs/12802027154?pr=1267
- Successful run with display of output - https://github.com/noopurintel/openfl/actions/runs/12791676959
- Successful display of error, if any - https://github.com/noopurintel/openfl/actions/runs/12784011165/job/35636177976 (induced explicitly for the testing purpose)
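For context on what such a CI test could look like, below is a hypothetical sketch of an end-to-end test that executes the 301 watermarking notebook headlessly. The notebook path, timeout marker, and use of nbconvert are assumptions for illustration; they do not reflect the actual contents of wf_federated_runtime_tests.py or the end_to_end helpers.

# Hypothetical sketch of an end-to-end CI test that runs the watermarking notebook
# headlessly via nbconvert and fails if any cell errors out. All names and paths
# are illustrative assumptions, not the actual test implementation in this PR.
import subprocess
import sys
from pathlib import Path

import pytest

NOTEBOOK = Path(
    "openfl-tutorials/experimental/workflow/FederatedRuntime/"
    "301_MNIST_Watermarking/workspace/MNIST_Watermarking.ipynb"
)


# Assumes the pytest-timeout plugin; mirrors the 15-minute CI safeguard discussed above
@pytest.mark.timeout(15 * 60)
def test_federated_runtime_301_watermarking(tmp_path):
    """Execute the notebook end to end and fail the test if execution errors."""
    result = subprocess.run(
        [
            sys.executable, "-m", "jupyter", "nbconvert",
            "--to", "notebook", "--execute",
            "--output-dir", str(tmp_path),
            "--output", "executed.ipynb",
            str(NOTEBOOK),
        ],
        capture_output=True,
        text=True,
    )
    # Surface notebook output/errors in CI logs so failures are easy to diagnose
    print(result.stdout)
    print(result.stderr, file=sys.stderr)
    assert result.returncode == 0, "Notebook execution failed"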