Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional load testing recommendations #12974

Merged
merged 5 commits into from
Jan 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/continous-integration.yml
Original file line number Diff line number Diff line change
Expand Up @@ -290,6 +290,7 @@ jobs:
- name: Prevent race condition in poetry build
# More context about race condition during poetry build can be found here:
# https://github.com/python-poetry/poetry/issues/7611#issuecomment-1747836233
if: needs.changes.outputs.backend == 'true'
run: |
poetry config installer.max-workers 1

Expand Down
14 changes: 14 additions & 0 deletions docs/docs/monitoring/load-testing-guidelines.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -12,12 +12,26 @@ In order to gather metrics on our system's ability to handle increased loads and
In each test case we spawned the following number of concurrent users at peak concurrency using a [spawn rate](https://docs.locust.io/en/1.5.0/configuration.html#all-available-configuration-options) of 1000 users per second.
In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-api) and the [Locust](https://locust.io/) open source load testing tool.


| Users | CPU | Memory |
|--------------------------|----------------------------------------------|---------------|
| Up to 50,000 | 6vCPU | 16 GB |
| Up to 80,000 | 6vCPU, with almost 90% CPU usage | 16 GB |


### Some recommendations to improve latency
- Sanic Workers must be mapped 1:1 to CPU for both Rasa Pro and Rasa Action Server
- Create `async` actions to avoid any blocking I/O
- `enable_selective_domain: true` : Domain is only sent for actions that needs it. This massively trims the payload between the two pods.
- Consider using compute efficient machines on cloud which are optimized for high performance computing such as the C5 instances on AWS.
However, as they are low on memory, models need to be trained lightweight.


| Machine | RasaPro | Rasa Action Server |
|--------------------------------|------------------------------------------------|--------------------------------------------------|
| AWS C5 or Azure F or Gcloud C2 | 3-7vCPU, 10-16Gb Memory, 3-7 Sanic Threads | 3-7vCPU, 2-12Gb Memory, 3-7 Sanic Threads |


### Debugging bot related issues while scaling up

To test the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-api) ability to handle a large number of concurrent user activity we used the Rasa Pro [tracing](./tracing.mdx) capability
Expand Down
Loading