Reduce smart pointer copies in destructors #1983

Merged
merged 3 commits into from
Jan 20, 2025

Contributor

@mcm001 mcm001 commented Jan 20, 2025

#1441 introduced a new recursive destructor to avoid stack overflows. We can improve runtime a bit by avoiding copying shared pointers around when we don't need to. I tested this using a robot stationary at 0,0,0 looking at an AprilTag 2.5 meters in front of it, for a total of 50,000 observations, updating ISAM with each observation. This decreases mean runtime by 17% in this particular toy problem (see below).

With this optimization applied and for this -particular- toy problem, I'm still seeing ~half my runtime spent in the BayesTree destructor (see attached perf data from a release build):

image
perf.zip

IsamBenchmark.cpp (program that generates the output below): https://github.com/mcm001/gtsam/blob/44d36c1e173fe9c9c3752eb0dab376059610272e/examples/IsamBenchmark.cpp

| | Without patch | With patch |
|---|---|---|
| Run times, microseconds | 6.01E+07 | 6.28E+07 |
| | 8.22E+07 | 7.07E+07 |
| | 8.02E+07 | 8.14E+07 |
| | 8.32E+07 | 7.25E+07 |
| | 8.24E+07 | 6.72E+07 |
| | 8.61E+07 | 5.99E+07 |
| | 8.39E+07 | 6.51E+07 |
| | 8.28E+07 | 6.29E+07 |
| | 8.44E+07 | 6.27E+07 |
| | 8.49E+07 | 6.34E+07 |
| Mean, seconds | 81.01 | 66.86 |
| p-value, 2-tailed t-test, equal variance | 2.62E-04 | |
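
For anyone who wants to re-derive the summary numbers, here is a minimal sketch that recomputes the means, the relative decrease, and the pooled-variance t statistic from the run times in the table above. The helper names (`mean`, `sampleVariance`, `pooledT`, `relativeDecrease`) are made up for this sketch and are not part of GTSAM or of IsamBenchmark.cpp. Small differences from the table are expected because the listed run times are rounded to three significant figures.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Runs = std::array<double, 10>;

// Run times in microseconds, copied from the table above.
constexpr Runs kWithoutPatch = {6.01e7, 8.22e7, 8.02e7, 8.32e7, 8.24e7,
                                8.61e7, 8.39e7, 8.28e7, 8.44e7, 8.49e7};
constexpr Runs kWithPatch = {6.28e7, 7.07e7, 8.14e7, 7.25e7, 6.72e7,
                             5.99e7, 6.51e7, 6.29e7, 6.27e7, 6.34e7};

// Arithmetic mean of one column.
inline double mean(const Runs& xs) {
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum / static_cast<double>(xs.size());
}

// Unbiased sample variance (divides by n - 1).
inline double sampleVariance(const Runs& xs) {
  const double m = mean(xs);
  double ss = 0.0;
  for (double x : xs) ss += (x - m) * (x - m);
  return ss / static_cast<double>(xs.size() - 1);
}

// Student's t statistic for two equal-sized samples, assuming equal variance.
inline double pooledT(const Runs& a, const Runs& b) {
  const double n = static_cast<double>(a.size());
  const double pooled = 0.5 * (sampleVariance(a) + sampleVariance(b));
  assert(pooled > 0.0);  // guards the division below
  return (mean(a) - mean(b)) / std::sqrt(pooled * 2.0 / n);
}

// Fraction by which the mean run time dropped, e.g. 0.17 for 17%.
inline double relativeDecrease(const Runs& before, const Runs& after) {
  return 1.0 - mean(after) / mean(before);
}
```

`relativeDecrease(kWithoutPatch, kWithPatch)` lands near the 17% quoted above. Turning the t statistic (18 degrees of freedom here) into the quoted p-value needs a Student's t CDF, which the C++ standard library does not provide, so the sketch stops at the statistic.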

I suspect that there are further opportunities for optimization in these destructors, but I'll leave that to people smarter than I :)

Also fixes some stray ifdef/ifs for boost builds that slipped in with #1948 and #1789 (see 6697452). Perhaps this is an indication that we need to add a no-boost build without Boost installed in CI?
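
As general background on why a stray `#ifdef` versus `#if` can matter, here is a minimal sketch of the pitfall. It is not the actual change from 6697452, and `DEMO_USE_BOOST` is a hypothetical macro invented for this example, not a GTSAM option. The assumption is a feature macro that is always defined, with the value 0 meaning "off".

```cpp
// Hypothetical feature macro: always defined, 0 means "disabled".
#define DEMO_USE_BOOST 0

// #ifdef only asks "is the macro defined?", so a value of 0 still passes.
constexpr bool seenByIfdef() {
#ifdef DEMO_USE_BOOST
  return true;
#else
  return false;
#endif
}

// #if evaluates the macro's value, so 0 correctly disables the branch.
constexpr bool seenByIf() {
#if DEMO_USE_BOOST
  return true;
#else
  return false;
#endif
}

// With the macro defined to 0, the two checks disagree.
static_assert(seenByIfdef() && !seenByIf(),
              "#ifdef and #if differ for a macro defined to 0");
```

A build where the library is still installed can hide this kind of mismatch, because code in the wrongly enabled branch may still find its headers and compile.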

-- ===============================================================	
-- ================  Configuration Options  ======================	
--  CMAKE_CXX_COMPILER_ID type                       : GNU	
--  CMAKE_CXX_COMPILER_VERSION                       : 11.4.0	
--  CMake version                                    : 3.31.4	
--  CMake generator                                  : Ninja	
--  CMake build tool                                 : /usr/bin/ninja	
-- Build flags	
--  Build Tests                                      : Enabled	
--  Build examples with 'make all'                   : Enabled	
--  Build timing scripts with 'make all'             : Disabled	
--  Build shared GTSAM libraries                     : Enabled	
--  Put build type in library name                   : Enabled	
--  Build libgtsam_unstable                          : Enabled	
--  Build GTSAM unstable Python                      : Enabled	
--  Build MATLAB Toolbox for unstable                : Disabled	
--  Build for native architecture                    : Disabled	
--  Build type                                       : Release	
--  C compilation flags                              :  -O3 -DNDEBUG	
--  C++ compilation flags                            :  -O3 -DNDEBUG	
--  Enable Boost serialization                       : OFF	
--  GTSAM_COMPILE_FEATURES_PUBLIC                    : cxx_std_17	
--  GTSAM_COMPILE_OPTIONS_PUBLIC                     :	
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC                 :	
--  GTSAM_COMPILE_OPTIONS_PUBLIC_RELEASE             :	
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC_RELEASE         :	
--  Use System Eigen                                 : OFF (Using version: 3.4.0)	
--  Use System Metis                                 : OFF	
--  Using Boost version                              :	
--  Use Intel TBB                                    : Yes (Version: 2021.5.0)	
--  Eigen will use MKL                               : MKL not found	
--  Eigen will use MKL and OpenMP                    : OpenMP found but GTSAM_WITH_EIGEN_MKL is disabled	
--  Default allocator                                : TBB	
--  Cheirality exceptions enabled                    : YES	
--  Build with ccache                                : Yes	
-- Packaging flags	
--  CPack Source Generator                           : TGZ	
--  CPack Generator                                  : TGZ	
-- GTSAM flags	
--  Quaternions as default Rot3                      : Disabled	
--  Runtime consistency checking                     : Disabled	
--  Build with Memory Sanitizer                      : Disabled	
--  Rot3 retract is full ExpMap                      : Enabled	
--  Pose3 retract is full ExpMap                     : Enabled	
--  Enable branch merging in DecisionTree            : Enabled	
--  Enable timing machinery                          : Disabled	
--  Allow features deprecated in GTSAM 4.3           : Enabled	
--  Metis-based Nested Dissection                    : Enabled	
--  Use tangent-space preintegration                 : Enabled	
-- MATLAB toolbox flags	
--  Install MATLAB toolbox                           : Disabled	
-- Python toolbox flags	
--  Build Python module with pybind                  : Disabled	
-- ===============================================================	

@calcmogul
Contributor

calcmogul commented Jan 20, 2025

> We can improve runtime a bit by avoiding copying shared pointers around when we don't need to.

Well, the underlying pointers are still getting copied around. The difference is that moving the shared pointer instead of copying it avoids incrementing and then immediately decrementing the refcount.
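
To make the refcount point concrete, here is a minimal sketch comparing the two ways of handing a `std::shared_ptr` to a queue. `Node`, `pushByCopy`, and `pushByMove` are hypothetical names for illustration, not GTSAM types or functions. The `use_count()` values are only reliable here because the sketch is single-threaded.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical node type, for illustration only (not GTSAM's clique class).
struct Node {
  int id = 0;
};

using NodeQueue = std::queue<std::shared_ptr<Node>>;

// Copying: the queue gets its own reference, so the refcount is incremented
// here and decremented again when the parameter `p` goes out of scope.
inline long pushByCopy(NodeQueue& q, std::shared_ptr<Node> p) {
  q.push(p);
  return p.use_count();  // counts both `p` and the queue's copy
}

// Moving: ownership is transferred to the queue, so the refcount is never
// touched; only the raw pointers inside the shared_ptr are copied.
inline long pushByMove(NodeQueue& q, std::shared_ptr<Node> p) {
  q.push(std::move(p));
  assert(p == nullptr);  // a moved-from shared_ptr is guaranteed to be empty
  return q.back().use_count();  // only the queue's element owns the node
}
```

Called with a freshly created `std::make_shared<Node>()`, the copy version reports two owners at the point of return and the move version reports one, which is the increment/decrement pair being avoided.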

Since the std::queue is emptied before the next for loop iteration, you could try hoisting the declaration out of the for loop to hopefully reuse the underlying storage.
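
A rough sketch of what that hoisting could look like, assuming a simplified tree of `std::shared_ptr` nodes. `Node`, `makeChain`, and `releaseForest` are hypothetical and do not reproduce the BayesTree destructor. Whether any storage is actually reused depends on the `std::deque` implementation behind `std::queue`, which may free blocks as elements are popped.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical tree node, for illustration only.
struct Node {
  std::vector<std::shared_ptr<Node>> children;
};

// Builds a chain of `depth` nodes iteratively (helper for trying this out).
inline std::shared_ptr<Node> makeChain(std::size_t depth) {
  std::shared_ptr<Node> head;
  for (std::size_t i = 0; i < depth; ++i) {
    auto parent = std::make_shared<Node>();
    if (head) parent->children.push_back(std::move(head));
    head = std::move(parent);
  }
  return head;
}

// Releases every tree in `roots` breadth-first, without recursion.
// Returns the number of nodes visited.
inline std::size_t releaseForest(std::vector<std::shared_ptr<Node>>& roots) {
  std::size_t visited = 0;
  std::queue<std::shared_ptr<Node>> bfs;  // declared once, outside the loop
  for (auto& root : roots) {
    if (!root) continue;
    bfs.push(std::move(root));  // move: no refcount traffic
    while (!bfs.empty()) {
      std::shared_ptr<Node> current = std::move(bfs.front());
      bfs.pop();
      ++visited;
      // Only detach children if nobody else still shares this node
      // (a design choice of this sketch).
      if (current.use_count() == 1) {
        for (auto& child : current->children) {
          if (child) bfs.push(std::move(child));
        }
        current->children.clear();
      }
      // `current` is destroyed here; it has no children left, so its
      // destructor cannot recurse down the tree.
    }
    assert(bfs.empty());  // the queue is empty again before the next root
  }
  roots.clear();
  return visited;
}
```

Because each node is stripped of its children before it is destroyed, even a very deep chain is released with constant stack depth.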

Member

@dellaert dellaert left a comment

Wow. LGTM. Note some of the #if changes were also made in the concurrent #1984.

@dellaert
Member

dellaert commented Jan 20, 2025

PS I agree with you on having a CI flow that catches it - I guess the issue is that we compile without Boost but Boost is still installed, and that masks issues?

@dellaert dellaert merged commit ee7616e into borglab:develop Jan 20, 2025
33 checks passed
@dellaert dellaert mentioned this pull request Jan 20, 2025
@mcm001 mcm001 deleted the bayestree-dtor-std-move branch January 20, 2025 17:05
@mcm001
Contributor Author

mcm001 commented Jan 20, 2025

Thanks for getting this merged this morning! And yeah those boost build system changes look helpful as well in #1984. I'll give Tyler's suggestion a shot this week and see if there's any more low-hanging fruit.

3 participants