bump the default leafsize from 10 to 25 #198

Merged
merged 1 commit into from
Jun 29, 2024

KristofferC (Owner) commented Jun 29, 2024

Since this package was written, CPUs have in general become faster, and they have done so at a faster rate than memory access has. The choice of a good default leafsize might therefore be worth reevaluating. Remember that with a larger leafsize we traverse fewer nodes in the tree, but we have to explicitly check the distance to more points. As CPUs get faster relative to memory, it is generally expected that the leafsize can be increased.

The checks / benchmarks below are artificial, but in the absence of real benchmark data they are the best I have.

First, let's look at the number of flops required for a leafsize of 10 vs 25 (here using a BallTree):

julia> using StaticArrays, NearestNeighbors, StableRNGs

julia> using NearestNeighbors: HyperSphere

julia> V = SVector{3, Float64}

julia> data = rand(StableRNG(1), 3, 10^4);

julia> r = 0.01

julia> v = rand(StableRNG(2), 3);

julia> ball = HyperSphere(convert(V, v), convert(eltype(V), r));

julia> using GFlops

julia> x = BallTree(data; leafsize=10); @count_ops NearestNeighbors.inrange_kernel!(x, 1, v, ball, Int[])
Flop Counter: 1464 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │     438 │
│ sub │     468 │
│ mul │     558 │
└─────┴─────────┘

julia> x = BallTree(data; leafsize=25); @count_ops NearestNeighbors.inrange_kernel!(x, 1, v, ball, Int[])
Flop Counter: 1475 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │     434 │
│ sub │     484 │
│ mul │     557 │
└─────┴─────────┘

So the flop counts are roughly equal (1464 vs 1475). However, rather than traversing a bunch of nodes as in the leafsize=10 case, with leafsize=25 we just blast through and evaluate points (which, if the tree is reordered, all lie next to each other in memory):

julia> using BenchmarkTools

julia> vs = rand(StableRNG(2), data, 3, 10^3);

julia> x = BallTree(data; leafsize=10); @btime inrange(x, vs, 0.05);
  1.237 ms (2074 allocations: 173.05 KiB)

julia> x = BallTree(data; leafsize=25); @btime inrange(x, vs, 0.05);
  975.084 μs (2074 allocations: 173.05 KiB)
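
To make the "blast through and evaluate points" step concrete, here is a minimal sketch of a linear scan over one leaf. It is not the package's kernel: `scan_leaf!` and its arguments are hypothetical names, and the sketch only assumes that a leaf corresponds to a contiguous range of columns in the point matrix.

```julia
# Hypothetical leaf scan (illustration only, not NearestNeighbors' internal code):
# test every point in a contiguous column range against the query ball.
function scan_leaf!(hits, points, range, query, r)
    r2 = r^2                          # compare squared distances, no sqrt needed
    for i in range
        d2 = 0.0
        for k in axes(points, 1)      # loop over the coordinates of point i
            d2 += (points[k, i] - query[k])^2
        end
        d2 <= r2 && push!(hits, i)    # record the index if the point is inside
    end
    return hits
end

# Four 3D points stored as columns, so a "leaf" is just a column range.
points = [0.0 0.1 0.5 0.9;
          0.0 0.1 0.5 0.9;
          0.0 0.1 0.5 0.9]

hits = scan_leaf!(Int[], points, 1:4, [0.0, 0.0, 0.0], 0.2)
println(hits)
```

The inner loop has no branches on tree structure and reads memory sequentially, which is the kind of work a modern CPU handles well; a larger leafsize shifts more of the query into this loop.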

A tree with a larger leafsize also occupies less memory (about 27% less here, not counting the point data itself):

julia> x = BallTree(data; leafsize=25);

julia> Base.summarysize(x) - sizeof(x.data)
105792

julia> x = BallTree(data; leafsize=10);

julia> Base.summarysize(x) - sizeof(x.data)
144192
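
The size of that difference can be reproduced with a back-of-the-envelope model. This is a sketch under stated assumptions, not a description of the package's documented layout: it assumes a binary tree with `ceil(n / leafsize)` leaves and one fewer internal nodes, and one bounding sphere per node made of four `Float64` values (three center coordinates plus a radius).

```julia
# Assumed model (hypothetical, for illustration): one bounding sphere per tree node.
n_points = 10^4
sphere_bytes = 4 * sizeof(Float64)          # 3 center coordinates + 1 radius

# Leaves plus internal nodes of a binary tree with cld(n_points, leafsize) leaves.
n_nodes(leafsize) = 2 * cld(n_points, leafsize) - 1

nodes10 = n_nodes(10)
nodes25 = n_nodes(25)
saved_bytes = (nodes10 - nodes25) * sphere_bytes

println("nodes with leafsize=10: ", nodes10)
println("nodes with leafsize=25: ", nodes25)
println("bytes saved under this model: ", saved_bytes)
println("measured difference:          ", 144192 - 105792)
```

Under these assumptions the model and the measured `Base.summarysize` difference agree, which suggests that the saving comes almost entirely from storing fewer nodes.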

Also, let's spot check a benchmark for KDTree:

julia> x = KDTree(data; leafsize=10); @btime inrange(x, vs, 0.05);
  532.292 μs (2074 allocations: 173.11 KiB)

julia> x = KDTree(data; leafsize=25); @btime inrange(x, vs, 0.05);
  470.084 μs (2074 allocations: 173.11 KiB)

Here the difference is smaller (roughly 12%, versus roughly 21% for the BallTree), albeit still noticeable.
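
Since these benchmarks are artificial, it may be worth repeating the comparison on your own data. The following is a minimal sketch of such a sweep, assuming NearestNeighbors, BenchmarkTools and StableRNGs are installed; the candidate leaf sizes, the random data and the helper name `sweep_leafsizes` are placeholders chosen for illustration, and timings will differ between machines.

```julia
using NearestNeighbors, BenchmarkTools, StableRNGs

# Placeholder data: swap in your own point set and query points.
data = rand(StableRNG(1), 3, 10^4)
queries = rand(StableRNG(2), 3, 10^3)

# Hypothetical helper: time a batch range query for each candidate leafsize.
function sweep_leafsizes(data, queries, radius, leafsizes)
    timings = Dict{Int,Float64}()
    for leafsize in leafsizes
        tree = KDTree(data; leafsize=leafsize)
        # @belapsed returns the minimum elapsed time in seconds.
        timings[leafsize] = @belapsed inrange($tree, $queries, $radius)
    end
    return timings
end

timings = sweep_leafsizes(data, queries, 0.05, (5, 10, 25, 50, 100))
for leafsize in sort(collect(keys(timings)))
    println("leafsize = ", leafsize, ": ", round(timings[leafsize] * 1e6; digits=1), " μs")
end
```

The best value depends on the dimension, the point distribution and the query radius, so a sweep like this is only a starting point for choosing an explicit `leafsize` instead of relying on the default.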

@KristofferC KristofferC merged commit 1a5c45f into master Jun 29, 2024
7 checks passed
@KristofferC KristofferC deleted the kc/leafsize branch June 29, 2024 16:24