Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mysterious rapid memory growth on long-running rosbridge instance #989

Open
jay-sridharan opened this issue Jan 4, 2025 · 0 comments
Open
Labels

Comments

@jay-sridharan
Copy link

jay-sridharan commented Jan 4, 2025

Description

On a long-running rosbridge server instance, I am seeing occasional rapid increases in memory usage that seem to not be correlated to any apparent event.

  • Library Version: 2.0.0-1jammy.20241008.092533
  • ROS Version: Humble
  • Platform / OS: Ubuntu Linux 22.04

Steps To Reproduce

I have had rosbridge server running on my robotic system for 26 days. I finally killed it because rosbridge had eaten up nearly 80% of the physical memory on the machine, and messages sent via the web-socket were no longer appearing in ROS.

Monitoring

The monitoring was performed using atop. This tool records metrics on a per-process basis and archives the data, which I later extracted to produce the following graphs

High Level Data View

image
image

The above graphs show the memory usage by rosbridge starting on December 5, going on until I killed the process on January 2.
Both graphs present the same data - the second graph just has the RMEM and PMEM series removed so that the jumps can be seen more clearly.

The data series are:

  • VMEM: The total virtual memory usage consumed by this process (details)
  • RMEM: The total resident memory usage consumed by this process (details)
  • PMEM: The proportional memory size of this process (details)

As you can see, there are occasional rapid memory growths, which are zoomed into in the following sections. Each of the following sections includes a graph of that particular memory event as well as logs generated by rosbridge before, during and after the time of memory growth.

Memory Event 12/09

image

Log

INFO: 2024-12-09 09:25:58,409: [INFO] Client disconnected. 3 clients total.
INFO: 2024-12-09 10:08:42,530: [INFO] [Client 7fe2d246-1f56-49d6-9a10-bdf229f4890a] Subscribed to <topic>
INFO: 2024-12-09 10:08:42,539: [INFO] [Client 7fe2d246-1f56-49d6-9a10-bdf229f4890a] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,180: [INFO] Calling services in existing thread
INFO: 2024-12-09 10:09:43,183: [INFO] Client connected. 4 clients total.
INFO: 2024-12-09 10:09:43,202: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,213: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,223: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,237: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,246: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,255: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,268: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,280: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,294: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,309: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,314: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,330: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,337: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,345: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,356: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,362: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,371: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,387: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,400: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,409: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,419: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,428: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,430: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,432: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,445: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,460: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,466: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,474: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,475: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>>
INFO: 2024-12-09 10:09:43,477: [ERROR] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] [id: advertise:<topic>>:40] advertise: Unable to import msg class attr from package std_msgs. Caused by module 'std_msgs.msg' has no attribute 'MyAttr'
INFO: 2024-12-09 10:09:43,484: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,494: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,495: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,499: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,508: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,517: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,519: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,520: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,532: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,539: [ERROR] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] [id: advertise:<topic>>:50] advertise: Unable to import msg class attr from package std_msgs. Caused by module 'std_msgs.msg' has no attribute 'MyAttr'
INFO: 2024-12-09 10:09:43,541: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,543: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,555: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,562: [ERROR] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] [id: advertise:<topic>>:54] advertise: Unable to import msg class attr from package std_msgs. Caused by module 'std_msgs.msg' has no attribute 'MyAttr'
INFO: 2024-12-09 10:09:43,563: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,585: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,591: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:09:43,598: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:15:35,734: [INFO] [Client ac435ea4-5e3f-4489-a4e5-d1dea8fea1be] Subscribed to <topic>
INFO: 2024-12-09 10:15:35,740: [INFO] [Client 13b602a7-0bb1-464a-ad08-1a49c9f0df60] Subscribed to <topic>
INFO: 2024-12-09 10:15:35,765: [INFO] [Client 30e9ef46-9b8f-4228-a41c-687e7aac8ca4] Subscribed to <topic>
INFO: 2024-12-09 10:24:32,897: [INFO] Client disconnected. 3 clients total.

Notes:

  • The rosbridge log shows a connection event at 10:09:43 and a disconnection event at 10:24:32
  • The memory growth elapsed between 10:09:36 and 10:24:46 (+-5 seconds due to atop sampling resolution).
  • The periods of rapid memory growth seem to allocate about 700Kb/s

Memory Event 12/13

image

Log

INFO: 2024-12-13 15:01:52,828: [INFO] Client connected. 2 clients total.
INFO: 2024-12-13 15:01:52,912: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,926: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,935: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,942: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,951: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,958: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,973: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,981: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,990: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:52,997: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,002: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,009: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,010: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,016: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,025: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,028: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,029: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,036: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,037: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,046: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,054: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,060: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,062: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,064: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,075: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,085: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,086: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,174: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,188: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,200: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,204: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,217: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 15:01:53,231: [INFO] [Client dd4a1c80-a623-486d-b40b-17b36b5da602] Subscribed to <topic>
INFO: 2024-12-13 16:11:42,679: [INFO] Client disconnected. 1 clients total.

Notes:

  • The rosbridge log does not show any event at the start of memory growth, but shows a disconnect event at the time the memory growth stops.
  • The memory growth elapsed between 15:58:47 and 16:11:37 (+-5 seconds due to atop sampling resolution).
  • The periods of rapid memory growth seem to allocate about 500Kb/s

Memory Event 12/18

image

Log

INFO: 2024-12-18 15:27:38,174: [INFO] [Client 167dcbde-1018-4dd0-9068-cbb6cfab38ab] Subscribed to <topic>
INFO: 2024-12-18 15:27:38,504: [INFO] [Client 7cf84bc2-9987-41b7-8183-c20044e6abcb] Subscribed to <topic>
INFO: 2024-12-18 15:27:53,988: [INFO] [Client 167dcbde-1018-4dd0-9068-cbb6cfab38ab] Subscribed to <topic>
INFO: 2024-12-18 15:27:53,994: [INFO] [Client 167dcbde-1018-4dd0-9068-cbb6cfab38ab] Subscribed to <topic>
INFO: 2024-12-18 15:40:14,030: [INFO] Client disconnected. 3 clients total.
INFO: 2024-12-18 15:45:31,507: [INFO] [Client 8012807e-676d-4070-b36a-88ecff2caf35] Subscribed to <topic>

Notes:

  • The rosbridge log does not show any event at the start of memory growth, but shows a disconnect event at the time the memory growth stops.
  • The memory growth elapsed between 15:38:07 and 15:40:07 (+-5 seconds due to atop sampling resolution).

Memory Event 01/02

image

Log

INFO: 2025-01-02 08:37:17,375: [INFO] [Client 7d302d11-0096-40dd-a780-6f60ff0c3567] Subscribed to /stargate/welder/out/state/welder_controller
INFO: 2025-01-02 08:39:25,683: [INFO] [Client b4f57c05-0f52-45a7-a719-653aaf780cb2] Subscribed to /stargate/endeffector/out/state/controller
INFO: 2025-01-02 08:39:25,701: [INFO] [Client 7d302d11-0096-40dd-a780-6f60ff0c3567] Subscribed to /stargate/endeffector/out/state/controller
INFO: 2025-01-02 10:39:53,864: [INFO] Client disconnected. 1 clients total.

Notes:

  • The memory growth starts at 09:04:02, continues steadily until 10:13:52, briefly stops, starts again at 10:18:22, and continues until 10:26:12
  • There does not seem to be any indication of activity in the rosbridge logs during that time
  • The periods of rapid memory growth seem to allocate about 700Kb/s

Questions & Thoughts

  • The memory events on 12/09 and 01/02 (which finally rendered the process unusable), both allocated memory at a rate of 700kbps, quite linearly. Perhaps this is a hint on what is causing the allocations / leaks.
  • I am a bit stuck on the best way to continue drilling down this issue. Clearly, its quite intermittent, but a memory leak is certainly present.
  • I am unsure if this is caused by system conditions, or a bug in rosbridge, or likely both, but the symptom presents itself clearly as runaway memory consumption by rosbridge server.
  • Does anyone have any thoughts on what the potential cause may be, or how I can go about collecting other useful data?

Thank you for the help!

@jay-sridharan jay-sridharan changed the title Mysterious Memory Growth on long-running rosbridge instance Mysterious rapid memory growth on long-running rosbridge instance Jan 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant