Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FederatedRuntime Workflow for CI Pipeline - 301 Watermarking notebook run #1267

Merged
merged 29 commits into from
Jan 16, 2025

Conversation

noopurintel
Copy link
Contributor

@noopurintel noopurintel commented Jan 13, 2025

  1. Added GitHub workflow "Federated Runtime 301 MNIST Watermarking" to run the notebook with same name.
  2. Enabled it to run for PR pipelines - with 3 rounds. # 5 rounds will take more time leading to longer duration of CI pipeline run, thus used 3.
  3. Corrected Watermaking -> Watermarking at all relevant places.
  4. Added pytest file wf_federated_runtime_tests.py with a test test_federated_runtime_301_watermarking
  5. Multiple changes across end_to_end helper files.

Successful run as part of PR pipeline itself - https://github.com/securefederatedai/openfl/actions/runs/12802027154?pr=1267

Successful run with display of output - https://github.com/noopurintel/openfl/actions/runs/12791676959

Successful display of error, if any - https://github.com/noopurintel/openfl/actions/runs/12784011165/job/35636177976 (induced explicitly for the testing purpose)

@noopurintel noopurintel marked this pull request as ready for review January 13, 2025 12:54
Copy link
Contributor

@scngupta-dsp scngupta-dsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is a promising starting point for integrating FederatedRuntime test cases into the CI test pipeline, as it effectively demonstrates multiple functionalities of the Workflow Interface.

During an internal discussion with @noopurintel, @payalcha, and @ishant162, a concern was raised regarding the challenges in automating FederatedRuntime test cases. Specifically, many parameters required for executing the Workflow Interface are currently hardcoded in the Jupyter notebook, limiting flexibility and scalability.

Proposed Solution
To address this issue partially, I developed a Python script that extracts the execution logic from the Jupyter notebook. This script provides a way to run the FederatedRuntime notebook outside of the Jupyter environment, making it easier to automate and integrate into CI pipelines.

Below is the experimental Python script:

#######################################################################
# Test script (experimental basis) to execute FederatedRuntime 
# notebook from outside of Jupyter notebook 
#######################################################################

# Instantiate FederatedRuntime
from openfl.experimental.workflow.runtime import FederatedRuntime

director_info = {
    'director_node_fqdn':'localhost',
    'director_port':50050,
}

collaborator_names = ["Portland", "Seattle"]

federated_runtime = FederatedRuntime(
    collaborators=collaborator_names,
    director=director_info, 
    notebook_path='/home/scngupta/openfl_latest/openfl/openfl-tutorials/experimental/workflow/FederatedRuntime/101_MNIST/workspace/MNIST_FederatedRuntime_reduced.ipynb'
)

# Check to make sure that envoys are connected 
federated_runtime.get_envoys()

# Define a dummy flow to enable runtime execution 
from openfl.experimental.workflow.interface import FLSpec
from openfl.experimental.workflow.placement import aggregator, collaborator

class AutomationFlow(FLSpec):
    """
    Flow to automate the FederatedRuntime execution
    NOTE: This flow will not execute on the Federation  
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


# Run the flow (in reality the flow defined in jupyter notebook will run)
flflow = AutomationFlow()
flflow.runtime = federated_runtime
flflow.run()

Modified jupyter notebook that works with above can be referenced at
MNIST_FederatedRuntime.txt

Next Steps
This script is provided as an experimental enhancement and may not address certain corner cases. I would recommend to try this approach and if it is successful, we can refine and formally adopt it in subsequent PRs. WDYT ?

@noopurintel noopurintel marked this pull request as draft January 15, 2025 06:16
@noopurintel noopurintel marked this pull request as ready for review January 15, 2025 08:20
@noopurintel noopurintel marked this pull request as draft January 15, 2025 15:13
@noopurintel noopurintel marked this pull request as ready for review January 15, 2025 15:59
Copy link
Contributor

@scngupta-dsp scngupta-dsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Noopur. Looks good !

Copy link
Collaborator

@teoparvanov teoparvanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @noopurintel ! It's great to have an E2E test with the FederatedRuntime.

@teoparvanov teoparvanov merged commit b8e2c70 into securefederatedai:develop Jan 16, 2025
21 checks passed
Copy link
Collaborator

@MasterSkepticista MasterSkepticista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noopurintel Please hold off on doubling the timeout.

I am testing a couple of alternatives to reduce calls on downloading MNIST. Average CI time is 6 mins, and 15 mins is a safeguard on anomalies. Setting it to 30 mins causes CI to be blocked for long on other PRs.

We recently had a situation where CI compute was blocked for >4 hours solely for this reason. With current setting, failed actions can deallocate resources for "working" PRs faster.

@noopurintel
Copy link
Contributor Author

@MasterSkepticista - increased timeout is anyways a temporary fix to unblock the PRs and testing process. Once the download issue/slowness is resolved, will revert the timeout to its previous value (i.e. 15m).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants