This directory contains all files pertaining to our own implementation of an E2E testing framework for AgentBaker.
E2E testing for Linux is currently implemented using a Golang framework built from the ground up. Note that we plan to move Windows over to this testing framework soon as well.
The goal of E2E testing with AgentBaker is to ensure that the node bootstrapping artifacts generated and returned by the primary AgentBaker API not only contain expected content, but also contain correct content that can be used as-is to bootstrap real Azure VMs so they can join real AKS clusters.
At a high level, each E2E scenario makes a call out to the primary node-bootstrapping API GetLatestNodeBootstrapping with a set of parameters (represented by a NodeBootstrappingConfiguration) which define the given scenario to generate CSE and custom data. A new VMSS containing a single VM will then be created and associated with an AKS cluster that is already running in Azure. The CSE and custom data generated by AgentBaker will then be applied to the new VM so it can bootstrap and register itself with the apiserver of the running cluster. Liveness and health checks are then run to make sure the new VM's kubelet is posting NodeReady to the cluster's apiserver, and that workload pods can successfully be run on it. Lastly, a set of validation commands is remotely executed on the VM to ensure its live state (file existence, sysctl settings, etc.) is as expected.
Note: if you have changed code or artifacts used to generate custom data or custom script extension payloads, you should first run make generate from the root of the AgentBaker repository.
To run the Go implementation of the E2E test suite locally, simply use e2e-local.sh. This script will set up the go test command for you while also implementing defaulting logic for a set of required environment variables used to interact with Azure. These environment variables include:
- SUBSCRIPTION_ID - default: 8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8 (ACS Test Subscription)
- LOCATION - default: eastus
- AZURE_TENANT_ID - default: 72f988bf-86f1-41af-91ab-2d7cd011db47
SCENARIOS_TO_RUN may also optionally be set to specify a subset of the E2E scenarios to run during the testing session as a comma-separated list, for example:
SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
Furthermore, SCENARIOS_TO_EXCLUDE may also optionally be set to specify the set of scenarios which will be excluded from the testing session as a comma-separated list. If both SCENARIOS_TO_RUN and SCENARIOS_TO_EXCLUDE are specified, SCENARIOS_TO_RUN will take precedence.
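The precedence rule above can be sketched in Go as follows. This is a minimal illustration rather than the suite's actual filtering code: `splitList` and `filterScenarios` are hypothetical helper names, and the scenario names are just the ones used in the example command.

```go
package main

import (
	"fmt"
	"strings"
)

// splitList turns a comma-separated value such as "base,gpu" into a set.
func splitList(v string) map[string]bool {
	set := map[string]bool{}
	for _, name := range strings.Split(v, ",") {
		if name = strings.TrimSpace(name); name != "" {
			set[name] = true
		}
	}
	return set
}

// filterScenarios is a hypothetical helper (not the suite's real code).
// When toRun is non-empty it wins and toExclude is ignored entirely.
func filterScenarios(all []string, toRun, toExclude string) []string {
	run, exclude := splitList(toRun), splitList(toExclude)
	var selected []string
	for _, name := range all {
		switch {
		case len(run) > 0:
			if run[name] {
				selected = append(selected, name)
			}
		case !exclude[name]:
			selected = append(selected, name)
		}
	}
	return selected
}

func main() {
	all := []string{"base", "gpu", "ubuntu2204", "ubuntu2204-arm64"}
	// Both set: the run list takes precedence, so "gpu" still runs.
	fmt.Println(filterScenarios(all, "base,gpu", "gpu"))
	// Only the exclude list set: everything except "gpu" runs.
	fmt.Println(filterScenarios(all, "", "gpu"))
}
```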
KEEP_VMSS can also be optionally specified to have the test suite retain the bootstrapped VM(s) for further debugging. When this option is specified, the private SSH key used to connect to each VM will be included within the respective scenario's log bundle.
Note that when using e2e-local.sh, a timeout value of 90 minutes is applied to the go test command.
You may also run the test command with custom arguments yourself (assuming you've properly set up the required environment variables) from within the e2e/ directory like so:
go test -timeout 90m -v -run Test_All ./
The top-level package of the Golang E2E implementation is named e2e_test and is entirely separate from all AgentBaker packages.
The e2e_test package has a dependency on a subpackage located in the scenario directory. Package scenario is where all E2E scenarios are defined, each in its own separate file. This package also defines common types related to scenarios and scenario configuration, as well as the hard-coded list of SIG version IDs located in images.go used for testing different OS distros. Package scenario also contains the implementation of common cluster selectors and mutators within clusterconfiguration.go, though each scenario could define its own implementations if needed.
The primary testing function, which is run by go test ..., is located in suite_test.go.
When configuring E2E scenarios, a VHDSelector must be specified in order to tell the suite which particular VHD it should use to bootstrap the VM.
VHDSelectors select from a "base" VHD catalog, initialized from scenario/base_vhd_catalog.json as an embedding. Each entry in the catalog is represented as a VHD, which contains a resource ID that gets injected into the VMSS model when the given scenario is run. The aforementioned JSON file contains configurations for the current set of default catalog entries. At any given time, those default entries will point to VHDs stored within our testing subscription, guarded by resource deletion locks.
For example, scenario_ubuntu2204.go defines the Ubuntu 2204 scenario, which specifies the Ubuntu2204Gen2Containerd VHD selector. This selector will always select the Ubuntu2204/gen2 VHD catalog entry from the base catalog. If running the suite using some arbitrary VHD build for testing, then the selector will take the corresponding Ubuntu2204/gen2 VHD from the given build instead of the default entry.
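To make the relationship between the catalog and a selector concrete, here is a minimal sketch. The JSON layout, the struct shapes, and the resource ID strings are assumptions made for illustration; the real schema of base_vhd_catalog.json and the real types in the scenario package may differ. The catalog is parsed from a string literal here so the example runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical catalog content; the real base_vhd_catalog.json may be laid out differently.
const baseCatalogJSON = `{
  "ubuntu2204": {
    "gen2": {"resourceId": "/example/default/ubuntu2204-gen2", "artifactName": "example-artifact"}
  }
}`

// VHD is a simplified stand-in for a catalog entry.
type VHD struct {
	ResourceID   string `json:"resourceId"`
	ArtifactName string `json:"artifactName"`
}

// Ubuntu2204 is a simplified stand-in for a subcatalog.
type Ubuntu2204 struct {
	Gen2 VHD `json:"gen2"`
}

// VHDCatalog is a simplified stand-in for the base catalog.
type VHDCatalog struct {
	Ubuntu2204 Ubuntu2204 `json:"ubuntu2204"`
}

// VHDSelector picks one catalog entry for a scenario (assumed signature).
type VHDSelector func() VHD

// Ubuntu2204Gen2Containerd always returns the Ubuntu2204/gen2 entry.
func (c *VHDCatalog) Ubuntu2204Gen2Containerd() VHD {
	return c.Ubuntu2204.Gen2
}

// loadCatalog parses the catalog JSON into a VHDCatalog.
func loadCatalog(raw string) (*VHDCatalog, error) {
	var c VHDCatalog
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return nil, fmt.Errorf("unmarshal base VHD catalog: %w", err)
	}
	return &c, nil
}

func main() {
	catalog, err := loadCatalog(baseCatalogJSON)
	if err != nil {
		panic(err)
	}
	var selector VHDSelector = catalog.Ubuntu2204Gen2Containerd
	fmt.Println(selector().ResourceID) // default entry

	// Simulate a test build overwriting the entry: the same selector now
	// resolves to the VHD from that build instead of the default.
	catalog.Ubuntu2204.Gen2.ResourceID = "/example/build/ubuntu2204-gen2"
	fmt.Println(selector().ResourceID)
}
```

Because the selector reads from the catalog at call time, scenarios do not need to know whether an entry came from the defaults or from a test build.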
To update the set of default VHD catalog entries to point towards new VHDs, simply update the resourceId field of the respective VHD within scenario/base_vhd_catalog.json. If you're making this change as a part of a PR, you need to make sure to lock the new VHDs with resource deletion locks to ensure they're always available going forward. Note that if you run the suite in a region other than eastus, you'll need to make sure the VHDs you point the suite towards are appropriately replicated in the given region as well.
If you'd like to run the E2E suite using a set of VHDs built from some arbitrary run of the VHD build pipeline in the MSFT tenant, you can do so by specifying the ID of the build. This is an alternative to manually updating the set of default VHD catalog entries. If a given scenario is run which selects a VHD that was not built as a part of the specified VHD build, the selector will select the corresponding default catalog entry instead.
NOTE: This feature can only be used with test VHD builds; using builds from the official build pipeline is not supported.
VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
NOTE: To utilize this feature, you'll also need to provide the suite with an ADO PAT (personal access token) with which it can access the ADO resources to download the appropriate build artifacts.
To specify your PAT, simply set the ADO_PAT environment variable accordingly:
ADO_PAT=<secret> VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
or:
export ADO_PAT=<secret>
VHD_BUILD_ID=123456789 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
VHD_BUILD_ID=234567891 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
...
VHD_BUILD_ID=345678912 SCENARIOS_TO_RUN=base,gpu,ubuntu2204,ubuntu2204-arm64 ./e2e-local.sh
When adding a new scenario which uses a VHD that doesn't currently have an associated entry in the base catalog, please make sure to follow these steps to register it with the suite:
1. Build and delete-lock the underlying image version to be referenced in the base catalog
2. Update base_vhd_catalog.json with a new entry, referencing the resource ID of the new VHD built in the previous step, as well as the VHD's artifact name. The artifact name is used when downloading publishing info artifacts from VHD builds in ADO. To determine this value:
   - Navigate to the latest run of the [TEST All VHDs] AKS Linux VHD Build - Msft Tenant build which has built the SKU you'd like to register (or queue a new build which includes the particular SKU).
   - Navigate to the particular run's published artifacts and identify the publishing-info-<artifactName> artifact for your SKU. The suffix of this string after publishing-info- is the name of the artifact.
   - Alternatively, you can get this value by navigating to .vsts-vhd-builder-release.yaml, identifying the corresponding build stage for your SKU, and looking at the value of artifactName specified when calling the .builder-release-template.yaml template.
3. Within scenario/vhd.go, update the corresponding subcatalog struct (e.g. Ubuntu2204, AzureLinuxV2) with the new entry, and correctly add its corresponding JSON tag used to unmarshal from base_vhd_catalog.json
4. Also within scenario/vhd.go, add a corresponding case block to the switch statement within addEntryFromPublishingInfo() to make sure the VHD's name (parsed from the publishing info file) is associated with the new subcatalog entry added in the previous step. This is to ensure that catalog entries are properly overwritten when using VHDs from arbitrary testing builds
5. Add a new VHDSelector within scenario/vhd.go in the form of a method on the *VHDCatalog type, which returns the new entry of the given subcatalog added in step 3
6. Reference the new VHDSelector added in the previous step when defining the new E2E scenario(s).
Example PR: TODO(cameissner)
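As a rough illustration of the code-side steps (the new subcatalog entry, the switch case, and the selector method), consider the sketch below. Everything in it is an assumption made for the example: the flattened `VHDCatalog` fields, the `publishingInfo` shape, the VHD names, and the body and signature of `addEntryFromPublishingInfo` are invented stand-ins, not the actual contents of scenario/vhd.go.

```go
package main

import "fmt"

// VHD is a simplified stand-in for a catalog entry.
type VHD struct {
	ResourceID string
}

// VHDCatalog is flattened here for brevity; the real catalog uses subcatalog structs.
type VHDCatalog struct {
	Ubuntu2204Gen2 VHD
	NewSKUGen2     VHD // hypothetical new entry being registered
}

// publishingInfo is a hypothetical view of a parsed publishing info file.
type publishingInfo struct {
	VHDName    string
	ResourceID string
}

// addEntryFromPublishingInfo overwrites the catalog entry matching the VHD's
// name. The signature and body are assumptions made for this sketch.
func (c *VHDCatalog) addEntryFromPublishingInfo(info publishingInfo) error {
	switch info.VHDName {
	case "ubuntu2204-gen2":
		c.Ubuntu2204Gen2.ResourceID = info.ResourceID
	case "newsku-gen2": // the case block added for the new entry
		c.NewSKUGen2.ResourceID = info.ResourceID
	default:
		return fmt.Errorf("unrecognized VHD name %q", info.VHDName)
	}
	return nil
}

// NewSKUGen2Containerd is the hypothetical selector for the new entry.
func (c *VHDCatalog) NewSKUGen2Containerd() VHD {
	return c.NewSKUGen2
}

func main() {
	catalog := &VHDCatalog{
		Ubuntu2204Gen2: VHD{ResourceID: "/example/default/ubuntu2204-gen2"},
		NewSKUGen2:     VHD{ResourceID: "/example/default/newsku-gen2"},
	}
	info := publishingInfo{VHDName: "newsku-gen2", ResourceID: "/example/build/newsku-gen2"}
	if err := catalog.addEntryFromPublishingInfo(info); err != nil {
		panic(err)
	}
	// The selector now resolves to the VHD from the test build.
	fmt.Println(catalog.NewSKUGen2Containerd().ResourceID)
}
```

Without the matching case block, a test build's VHD for the new SKU would never replace the default entry, which is why that step is easy to miss and worth checking in review.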
Minimally, each E2E scenario is parameterized with a set of "mutators" that change/set various properties of a base NodeBootstrappingConfiguration struct. This struct is then fed into GetLatestNodeBootstrapping to generate CSE and custom data. The most commonly mutated property of this struct across all scenarios is the OS distro. This is primarily because each scenario currently uses a separate VHD corresponding to the respective distro.
E2E scenarios can also be configured with VMSS configuration mutators that change/set properties on the VMSS model used to deploy the new VM to be bootstrapped. This is primarily useful when testing out different VM SKUs, especially for GPU-enabled scenarios which affect which code paths AgentBaker will use to generate CSE and custom data.
Further, in order to support E2E scenarios which test different underlying AKS cluster configurations, such as the cluster's network plugin, each E2E scenario has its own "cluster selector" and "cluster mutator". Cluster selectors determine whether or not the given live AKS cluster is viable for running the given scenario, while cluster mutators will mutate a base AKS cluster model such that the model represents a cluster which is viable for running the given scenario. For example, a scenario meant to run on an AKS cluster configured with the kubenet network plugin would have a cluster selector which selects on the NetworkProfile.NetworkPlugin property specifically for kubenet, while its cluster mutator would set this property to kubenet so a new cluster can be created for it to run on.
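The kubenet example can be sketched as a selector/mutator pair. The `Cluster` and `NetworkProfile` types below are small hypothetical stand-ins for the AKS cluster model, and `chooseCluster` is an invented helper that shows one way the two functions could be combined; the suite's actual types and selection logic may differ.

```go
package main

import "fmt"

// NetworkProfile and Cluster are hypothetical stand-ins for the AKS cluster model.
type NetworkProfile struct {
	NetworkPlugin string
}

type Cluster struct {
	Name           string
	NetworkProfile *NetworkProfile
}

// kubenetSelector reports whether a live cluster is viable for a kubenet scenario.
func kubenetSelector(c *Cluster) bool {
	return c != nil && c.NetworkProfile != nil && c.NetworkProfile.NetworkPlugin == "kubenet"
}

// kubenetMutator adjusts a base cluster model so it describes a viable cluster.
// The profile is copied first so a shared base profile is never modified.
func kubenetMutator(c *Cluster) {
	profile := NetworkProfile{}
	if c.NetworkProfile != nil {
		profile = *c.NetworkProfile
	}
	profile.NetworkPlugin = "kubenet"
	c.NetworkProfile = &profile
}

// chooseCluster returns the first viable live cluster, or a mutated copy of
// the base model plus true when a new cluster would have to be created.
func chooseCluster(live []*Cluster, base Cluster, sel func(*Cluster) bool, mut func(*Cluster)) (*Cluster, bool) {
	for _, c := range live {
		if sel(c) {
			return c, false
		}
	}
	mut(&base)
	return &base, true
}

func main() {
	live := []*Cluster{{Name: "existing", NetworkProfile: &NetworkProfile{NetworkPlugin: "azure"}}}
	base := Cluster{Name: "new-cluster"}
	c, needsCreate := chooseCluster(live, base, kubenetSelector, kubenetMutator)
	fmt.Println(c.Name, c.NetworkProfile.NetworkPlugin, needsCreate)
}
```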
Lastly, E2E scenarios also consist of a list of live VM validators. Each live VM validator consists of a description, a bash command which will actually be run on the newly bootstrapped VM, and an "asserter" function that will perform assertions on the contents of both the stdout and stderr streams that result from the execution of the command. The validators can be used to assert on numerous types of properties of the live VM, such as the live file system and kernel state.
You can find all implemented scenarios in the scenario package within files prefixed with scenario_. The Scenario struct definition can be found in scenario/types.go.
To implement a new scenario, you need to do the following:
- Create a new file in the scenario package directory named scenario_<scenario-name>.go
- Within this new file, implement a private function with a representative name which returns a *Scenario representing the scenario's configuration
- Add a call to the newly implemented function within the return value of the scenarios() function defined in scenarios/init.go
- Implement any additional logic in the testing framework required by the new scenario
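The first three steps might look something like the sketch below. The `Scenario` and `NodeBootstrappingConfiguration` types are heavily simplified stand-ins (the real definitions live in scenario/types.go and the AgentBaker data model), and the field names, the `exampleDistro` function, and the distro value are all hypothetical.

```go
package main

import "fmt"

// NodeBootstrappingConfiguration is a hypothetical, heavily reduced stand-in.
type NodeBootstrappingConfiguration struct {
	Distro string
}

// Scenario is a hypothetical, reduced stand-in for the real Scenario struct.
type Scenario struct {
	Name                   string
	Description            string
	BootstrapConfigMutator func(*NodeBootstrappingConfiguration)
}

// exampleDistro would live in scenario_example-distro.go and returns the
// configuration of one scenario.
func exampleDistro() *Scenario {
	return &Scenario{
		Name:        "example-distro",
		Description: "bootstraps a node using a hypothetical example distro",
		BootstrapConfigMutator: func(nbc *NodeBootstrappingConfiguration) {
			nbc.Distro = "example-distro-image"
		},
	}
}

// scenarios lists every scenario; a new scenario is registered by adding a
// call to its function here.
func scenarios() []*Scenario {
	return []*Scenario{
		exampleDistro(),
	}
}

func main() {
	for _, s := range scenarios() {
		// Start from a base configuration and let the scenario mutate it.
		nbc := &NodeBootstrappingConfiguration{}
		s.BootstrapConfigMutator(nbc)
		fmt.Printf("%s -> distro %s\n", s.Name, nbc.Distro)
	}
}
```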
Each E2E scenario will generate its own logs after execution. Currently, these logs consist of:
- cluster-provision.log - CSE execution log, retrieved from /var/log/azure/aks/cluster-provision.log (collected in success and CSE failure cases)
- kubelet.log - the kubelet systemd unit's logs, retrieved by running journalctl -u kubelet on the VM after bootstrapping has finished (collected in success and CSE failure cases)
- vmssId.txt - a single-line text file containing the unique resource ID of the VMSS created by the respective scenario, mainly collected for the purposes of post hoc resource deletion (collected in all cases where the VMSS is able to be created)
These logs will be uploaded in a bundle of the format:
└── scenario-logs
    └── <scenario>
        ├── cluster-provision.log
        ├── kubelet.log
        └── vmssId.txt
After a PR is created in AgentBaker's repo on GitHub, a pipeline calculating code coverage changes will automatically run.
We are utilizing coveralls to display the coverage report. The coverage report will be available in the PR's description. You can also view previous runs for the AgentBaker repo here.
We calculate code coverage for both unit tests and E2E tests.
To generate E2E coverage reports, we use code coverage changes introduced in Go 1.20.
The coverage report is generated by running AgentBaker's API server locally as a binary built with the -cover flag. E2E tests are then run against that binary.
The following packages are used during calculation of coverage for E2E tests:
- github.com/Azure/agentbaker/apiserver
- github.com/Azure/agentbaker/cmd
- github.com/Azure/agentbaker/cmd/starter
- github.com/Azure/agentbaker/pkg/agent
- github.com/Azure/agentbaker/pkg/agent/datamodel
- github.com/Azure/agentbaker/pkg/templates
You can generate an E2E coverage report while running the E2E tests locally. To do so, follow the steps below:
- Build the AgentBaker server binary with -cover flag:
cd cmd
go build -cover -o baker -covermode count
- Create directory for coverage report files
mkdir -p covdatafiles
- Run the binary
GOCOVERDIR=covdatafiles ./baker start &
- Run the E2E tests locally
/bin/bash e2e/e2e-local.sh
- Stop the binary - once the tests finish executing, you have to stop the binary with exit code 0 to generate the report. See the docs here.
kill $(pgrep baker)
- Display the coverage report within the terminal
go tool covdata percent -i=./cmd/covdatafiles