-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving Compile Times And/Or Add Pluggable Deserializable Types #481
Comments
Another option is to look at crates like https://github.com/dtolnay/miniserde or https://github.com/not-fl3/nanoserde. Disabling the deserializer derives on the |
I definitely agree with you. The reason why
You can already do this by using the lambda_runtime directly. If you don't care about about having an
Feel free to bring ideas and suggestions to the |
Something else to point out, is that the apigw events in |
Brilliant! Somehow I missed that possibility. I'll take a look to see how much that helps. Thank you.
I didn't know that, that's great. Should provide a lot more flexibility when it comes to optimizing.
That's a shame. Is there any way The A more sane option would be to make the
Absolutely, happy to help. Modifying P.S. For anyone who is curious on digging into this for themselves, here are some tangential issues/posts I found while investigating the problem. Most of them aren't actual bugs per se but they include very useful scripts/information on how to profile this stuff:
There are also some existing PRs to serde/serde_json by Marwes/dtolnay that might help with reducing the amount of generated IR, if merged: |
cc @matklad. It seems you were on a similar crusade last year. I'm curious to know what conclusions you came to regarding speeding up |
Yeah, in rust-analyzer we also noticed that serde adds a lot of code. For us, that’s not a problem as this happens in the leaf crate, not in the API crate.
I think proc macros are evaluated once, eagerly, in the crate which defines the struct. But if proc macro generates a bunch of generic functions, those fns than are monimorphised lazily, with duplication across leaf crates. So another avenue for exploration is to find a trick to force monomorohisstion happen upstream somehow, something along these lines
If the consumer of the above crate used Foo via Serialize interface, compile times will suffer. Using Foo2json non-generic function should be fine though! |
Also, I feel that "JSON serialization takes a lot time to compile" is a problem with hits pretty-much the whole Rust web services ecosystem and, subjectively, it has smaller visibility than it deserves. So I am wondering if some t-compiler folks at AWS might want to take a look here? cc @pnkfelix |
Before we go though the route of big optimizations, a short term, specific, work that we can do, is to identify exactly which fields are required in those events, and which are optional, and remove as much serde attributes as possible. The reason why there are so many is because the Rust events are based on the Go events, and the language semantics are very different. See this issue for example: We could look into other event packages, like the Java one, which might be more stricts, and strip as many serde attributes as possible. |
Sorry I've been MIA. I'd love to see what the dumbest / straightforward path of just expanding the macros and checking in the generated code before we get fancy. |
hey @indiv0, I got curious this morning, so I took a look at how long it takes to compile for me locally. I noticed that for a simple app, we're bringing all the features from Tokio and Tower, which might not be necessary. The codegen phase for aws_lambda_events is only about 2 seconds on my laptop. There is definitely room for improvement, and we might be able to reduce compilation time with some adjustments in the dependencies that we're bringing. Can you share your cargo timings information so I can take a look? I put my timings output in this URL so you can take a look: https://starlit-buttercream-cf1ff0.netlify.app/ |
I did a basic experiment over the weekend to measure how much time These are some ideas that we could explore:
|
@calavera , this 2-way Ser/Deser is one of those bugs that were there for long enough to become a feature. This PR added the same for Context to enable stubbing out the lambda and running it locally via an SQS proxy - the payload itself already had both. If we are to make it one way only, it would be great to still have the other way as an optional feature. A few of my clients rely on it for development. If you do remove it we should still be able to use remote derive, but that can be painful. |
Yes, that should definitely be available. I was thinking something in the line of how Serde itself works, were you have to import |
I realized this morning that there is a much simple solution, implemented in #497. #497 introduces feature flags to remove the event types that you're not interested in. By choosing one event type warm compilation time drops from 18 to 4.5 seconds. |
Hey! First let me say that I love this project. Clearly a lot of thoughtful design work has been put in. Kudos to all the contributors. Sorry about the wall of text in advance.
My scenario is this: I’ve noticed that even debug incremental recompiles of an absolutely barebones
lambda_http
app take upwards of 15 seconds. That doesn’t sound like much but multiply this by 10x lambdas in a workspace and it gets out of hand pretty quick.I did some cursory investigation (the results of which I can post later). My investigation indicates that the slowness comes during the
codegen_crate
and subsequent LLVM optimize steps of the build.Checking the amount of emitted LLVM IR, I see that it’s in the 4M+ lines range! This is a lot of code for LLVM to optimize, so it explains why both the steps take so long.
I took a look at why the IR is being generated but I couldn’t find any obvious pathological cases. It appears that the majority of the IR is
serde::Deserialize
impls on theaws_lambda_events
types used by this crate.I took a look at those types, and they have a heck of a lot of
#[serde(deserialize_with=“…”)]
on their fields. That attribute isn’t too expensive on its own, but multiplied across every field of every struct -- that’s a lot of codegen. There was also some duplicate code because#[derive(Deserialize)]
implementsvisit_seq
andvisit_map
. The former is unnecessary because we’re dealing with JSON and it is strictly invalid in most cases to allow users to deserialize a JSON array into a struct.(Side-note: there was a proposal to share the
deserialize_with
wrappers betweenvisit_seq
andvisit_map
, but it was closed aswontfix
)So I just cut the
visit_seq
code out entirely via a custom#[derive(Deserialize)]
impl. That plus stripping down this crate’sLambdaRequest
enum to only cover the cases I care about let me bring the LLVM IR count down from 4.5M to about 3.5.That was it for the low-hanging fruit. The remaining deserialization code seems necessary, but is a bit bloated, purely because serde is very generous with error handling logic. I could probably write it out by hand to be more efficient, or improve the existing codegen of the
#[derive(Deserialize)]
macro.But I’ve been rethinking my approach. The problem is that we’re re-running the
#[derive(Deserialize)]
macro every time the lambda binary compiles. There are some efforts to disable derives when running cargo check. This is clever for the purposes of improving type hint resolution times, etc. But I don’t think it’s sufficient. What’s needed is a true way to cache the results of proc macro evaluations -- if said proc macros are pure. This finally brings me to my first question: Does anyone know of a way to cache the results of a proc-macro evaluation between builds?Alternatively: Is it possible to force rustc to evaluate a proc macro for a struct, before said struct is used in a leaf crate?
cc @LegNeato @calavera as maintainers of the aws-lambda-events crate, any ideas? Perhaps we could Selectively cargo-expand the
Deserialize
impls in the aws-lambda-events crate? It would result in a much larger source file, but could cut out at least some of thecrate_codegen
time in incremental builds. Not sure how much but it’d be a decent chunk of the 40% it seems to take now.cc @calavera @nmoutschen as maintainers of this crate, any ideas? Perhaps we could refactor things a bit so that users can provide their own APIGW types and deserialize on their end? That way there’s a bit more flexibility. For example I could fork aws-lambda-events, strip it to just the types I need, and hand roll
Deserialize
impls. At the moment I’m locked in to the types this crate provides. I understand that this crate does this because it handles a lot of the boilerplate when it comes to talking to AWS, but I’m fine with my crate handling some of that.The text was updated successfully, but these errors were encountered: