fix: don't memory-leak promises passed to waitUntil #75041

Merged on Jan 17, 2025 (2 commits)

lubieowoce (Member) commented Jan 17, 2025

Overview

`next start` stores pending promises passed to `waitUntil` so that we can await them before a (graceful) server shutdown and avoid interrupting work that's still running.
The problem here was that this set lives a long time (the whole lifetime of the server), but we weren't removing resolved promises from this set, so we'd end up holding onto them forever and leaking memory.

This bug would be triggered by any code using `waitUntil` in `next start` (notably, a recent change: #74164)
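
For readers unfamiliar with the pattern, here is a minimal sketch of tracking in-flight work so that it can be awaited before shutdown. `PendingWork`, `track`, and `drain` are hypothetical names chosen for illustration; this is not the Next.js implementation.

```typescript
// Hypothetical sketch: `PendingWork`, `track`, and `drain` are illustrative
// names, not part of Next.js.
class PendingWork {
  private pending = new Set<Promise<unknown>>()

  // Remember a promise until it settles.
  track(promise: Promise<unknown>): void {
    this.pending.add(promise)
    const forget = () => {
      this.pending.delete(promise)
    }
    // Remove the entry on fulfilment or rejection so the set cannot grow forever.
    promise.then(forget, forget)
  }

  get size(): number {
    return this.pending.size
  }

  // Resolve once everything tracked at call time has settled.
  // Rejections are swallowed here; a real server would likely report them.
  async drain(): Promise<void> {
    const ignore = () => {}
    await Promise.all(Array.from(this.pending, (p) => p.then(ignore, ignore)))
  }
}
```

A shutdown handler could call `await work.drain()` before closing the server. Because the set lives as long as the server does, each entry has to be removed once its promise settles.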

Details

The bug here is that `AwaiterMulti` would hold onto resolved promises indefinitely. `AwaiterMulti` ends up being used by `NextNodeServer.getInternalWaitUntil`, which is what provides `waitUntil` in `next start`.

The code was already attempting to clean up promises to avoid leaks. But the problem was that we weren't adding `promise` to `this.promises`, we were adding `promise.then(...)` which is a completely different object.
So the `this.promises.delete(promise)` that `cleanup` did was effectively doing nothing, and `this.promises` would keep growing.
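
The identity mismatch can be reproduced in isolation. The following is a simplified sketch, not the actual `AwaiterMulti` source: `LeakyAwaiter`, `FixedAwaiter`, and `demo` are made-up names, and the fixed variant is only one plausible way to make the stored object and the deleted object the same.

```typescript
// Simplified sketch with hypothetical names; not the actual Next.js source.
class LeakyAwaiter {
  promises = new Set<Promise<unknown>>()

  waitUntil(promise: Promise<unknown>): void {
    const cleanup = () => {
      // Bug: `promise` itself was never added to the set, so this is a no-op.
      this.promises.delete(promise)
    }
    // `.then(...)` returns a new promise object, and that is what gets stored.
    this.promises.add(promise.then(cleanup, cleanup))
  }
}

class FixedAwaiter {
  promises = new Set<Promise<unknown>>()

  waitUntil(promise: Promise<unknown>): void {
    const cleanup = () => {
      this.promises.delete(promise)
    }
    // Store the same object that `cleanup` will later delete.
    this.promises.add(promise)
    // Attach cleanup separately, for both fulfilment and rejection.
    promise.then(cleanup, cleanup)
  }
}

async function demo(): Promise<[number, number]> {
  const leaky = new LeakyAwaiter()
  const fixed = new FixedAwaiter()
  for (let i = 0; i < 3; i++) {
    leaky.waitUntil(Promise.resolve(i))
    fixed.waitUntil(Promise.resolve(i))
  }
  // Yield once so the already-queued cleanup callbacks run first.
  await Promise.resolve()
  return [leaky.promises.size, fixed.promises.size]
}

// demo() resolves to [3, 0]: the leaky set never shrinks, the fixed one empties.
```

In the leaky variant the set's size only ever goes up, which matches the "would keep growing" behaviour described above.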

ijjk added the "created-by: Next.js team" and "type: next" labels Jan 17, 2025
lubieowoce changed the title from "fix: don't leak promises in AwaiterMulti.waitUntil" to "fix: don't memory-leak promises passed to waitUntil" Jan 17, 2025
ijjk (Member) commented Jan 17, 2025

Tests Passed

ijjk (Member) commented Jan 17, 2025

Stats from current PR

Default Build
General Overall increase ⚠️
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
buildDuration 17.7s 15.4s N/A
buildDurationCached 14.4s 12.1s N/A
nodeModulesSize 418 MB 418 MB ⚠️ +820 B
nextStartRea..uration (ms) 394ms 395ms N/A
Client Bundles (main, webpack)
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
5306-HASH.js gzip 54 kB 54 kB N/A
8276.HASH.js gzip 169 B 168 B N/A
8377-HASH.js gzip 5.44 kB 5.44 kB N/A
bccd1874-HASH.js gzip 52.9 kB 52.9 kB
framework-HASH.js gzip 57.5 kB 57.5 kB N/A
main-app-HASH.js gzip 240 B 242 B N/A
main-HASH.js gzip 34.3 kB 34.3 kB N/A
webpack-HASH.js gzip 1.71 kB 1.71 kB N/A
Overall change 52.9 kB 52.9 kB
Legacy Client Bundles (polyfills)
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
polyfills-HASH.js gzip 39.4 kB 39.4 kB
Overall change 39.4 kB 39.4 kB
Client Pages
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
_app-HASH.js gzip 193 B 193 B
_error-HASH.js gzip 193 B 193 B
amp-HASH.js gzip 512 B 510 B N/A
css-HASH.js gzip 343 B 342 B N/A
dynamic-HASH.js gzip 1.84 kB 1.84 kB
edge-ssr-HASH.js gzip 265 B 265 B
head-HASH.js gzip 363 B 362 B N/A
hooks-HASH.js gzip 393 B 392 B N/A
image-HASH.js gzip 4.57 kB 4.57 kB N/A
index-HASH.js gzip 268 B 268 B
link-HASH.js gzip 2.35 kB 2.34 kB N/A
routerDirect..HASH.js gzip 328 B 328 B
script-HASH.js gzip 397 B 397 B
withRouter-HASH.js gzip 323 B 326 B N/A
1afbb74e6ecf..834.css gzip 106 B 106 B
Overall change 3.59 kB 3.59 kB
Client Build Manifests
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
_buildManifest.js gzip 749 B 747 B N/A
Overall change 0 B 0 B
Rendered Page Sizes
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
index.html gzip 524 B 522 B N/A
link.html gzip 539 B 536 B N/A
withRouter.html gzip 520 B 519 B N/A
Overall change 0 B 0 B
Edge SSR bundle Size
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
edge-ssr.js gzip 129 kB 129 kB N/A
page.js gzip 208 kB 208 kB N/A
Overall change 0 B 0 B
Middleware size
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
middleware-b..fest.js gzip 670 B 667 B N/A
middleware-r..fest.js gzip 155 B 156 B N/A
middleware.js gzip 31.3 kB 31.3 kB N/A
edge-runtime..pack.js gzip 844 B 844 B
Overall change 844 B 844 B
Next Runtimes
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
274-experime...dev.js gzip 322 B 322 B
274.runtime.dev.js gzip 314 B 314 B
app-page-exp...dev.js gzip 374 kB 374 kB
app-page-exp..prod.js gzip 130 kB 130 kB
app-page-tur..prod.js gzip 143 kB 143 kB
app-page-tur..prod.js gzip 139 kB 139 kB
app-page.run...dev.js gzip 362 kB 362 kB
app-page.run..prod.js gzip 126 kB 126 kB
app-route-ex...dev.js gzip 37.6 kB 37.6 kB
app-route-ex..prod.js gzip 25.6 kB 25.6 kB
app-route-tu..prod.js gzip 25.6 kB 25.6 kB
app-route-tu..prod.js gzip 25.4 kB 25.4 kB
app-route.ru...dev.js gzip 39.2 kB 39.2 kB
app-route.ru..prod.js gzip 25.4 kB 25.4 kB
pages-api-tu..prod.js gzip 9.69 kB 9.69 kB
pages-api.ru...dev.js gzip 11.6 kB 11.6 kB
pages-api.ru..prod.js gzip 9.68 kB 9.68 kB
pages-turbo...prod.js gzip 21.8 kB 21.8 kB
pages.runtim...dev.js gzip 27.6 kB 27.6 kB
pages.runtim..prod.js gzip 21.8 kB 21.8 kB
server.runti..prod.js gzip 916 kB 916 kB
Overall change 2.47 MB 2.47 MB
build cache
vercel/next.js canary vercel/next.js lubieowoce/fix-next-start-waituntil-leak Change
0.pack gzip 2.1 MB 2.1 MB N/A
index.pack gzip 75.1 kB 74.5 kB N/A
Overall change 0 B 0 B
Diff details
Diff for main-HASH.js

Diff too large to display

Commit: f19be4e

lubieowoce force-pushed the lubieowoce/fix-next-start-waituntil-leak branch from fa7aee2 to f44a931 January 17, 2025 15:38

lubieowoce (Member, Author) commented Jan 17, 2025

This is a potential fix for #74855, which started happening in Next.js 15.1.4, where we shipped #74164. That PR started using waitUntil in a pretty common codepath, which would make this leak much more likely to become noticeable -- previously, it'd only come up when using after().

lubieowoce enabled auto-merge (squash) January 17, 2025 16:29
lubieowoce merged commit ab8c6ce into canary Jan 17, 2025
130 checks passed
lubieowoce deleted the lubieowoce/fix-next-start-waituntil-leak branch January 17, 2025 16:41

hdodov (Contributor) commented Jan 18, 2025

Could this possibly fix the eventual CPU spikes in #74129?

The issue there looks just like a memory leak: CPU usage increases until it suddenly drops (e.g. when the garbage collector runs).

lubieowoce added a commit that referenced this pull request Jan 20, 2025

DonikaV commented Jan 22, 2025

I just switched all server-side requests to axios because I had memory-leak issues. Can I switch back?

lubieowoce (Member, Author) commented Jan 22, 2025

@DonikaV if switching to axios fixed the leak, then it's almost certainly unrelated to this PR -- that's a completely different area. but please open an issue if you have time!

DonikaV commented Jan 22, 2025

I haven't tested axios on prod, but we had an issue similar to the one other people had with Next.js 15*
[image]
So I believe it's related.

6 participants