---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
options(width = 120)
```

# bench

<!-- badges: start -->

[![CRAN status](https://www.r-pkg.org/badges/version/bench)](https://cran.r-project.org/package=bench)
[![R-CMD-check](https://github.com/r-lib/bench/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/r-lib/bench/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/r-lib/bench/graph/badge.svg)](https://app.codecov.io/gh/r-lib/bench)

<!-- badges: end -->

The goal of bench is to benchmark code, tracking execution time, memory allocations and garbage collections.

## Installation

You can install the release version from [CRAN](https://cran.r-project.org/) with:

```{r, eval = FALSE}
install.packages("bench")
```

Or you can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("r-lib/bench")
```

## Features

`bench::mark()` benchmarks one or more expressions. We feel it has a number of advantages over [alternatives](#alternatives).

-   Always uses the highest precision APIs available for each operating system (often nanoseconds).
-   Tracks memory allocations for each expression.
-   Tracks the number and type of R garbage collections per expression iteration.
-   Verifies equality of expression results by default, to avoid accidentally benchmarking inequivalent code.
-   Has `bench::press()`, which allows you to easily perform and combine benchmarks across a large grid of values.
-   Uses adaptive stopping by default, running each expression for a set amount of time rather than for a specific number of iterations.
-   Expressions are run in batches and summary statistics are calculated after filtering out iterations with garbage collections. This allows you to isolate the performance and effects of garbage collection on running time (for more details see [Neal 2014](https://radfordneal.wordpress.com/2014/02/02/inaccurate-results-from-microbenchmark/)).
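
The adaptive stopping and garbage collection filtering in the list above can be tuned through arguments to `bench::mark()`. The following is a minimal sketch, assuming the `min_time`, `max_iterations` and `filter_gc` arguments described in `?bench::mark`; the vector `x` and the chosen limits are illustrative only.

```r
library(bench)

# Illustrative input; any numeric vector would do.
x <- runif(1e4)

res <- bench::mark(
  sqrt(x),
  x^0.5,
  min_time = 0.1,        # target running time per expression, in seconds
  max_iterations = 1000, # hard cap on iterations, even if min_time is not reached
  filter_gc = TRUE       # summarise only iterations without a garbage collection
)

# n_itr counts the iterations kept for the summary,
# n_gc counts the garbage collections observed while benchmarking.
res[, c("expression", "min", "median", "n_itr", "n_gc")]
```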

The times and memory usage are returned as custom objects which have human readable formatting for display (e.g. `104ns`) and comparisons (e.g. `x$mem_alloc > "10MB"`).
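
As a small illustration of these objects, the sketch below builds time and byte values by hand, assuming the exported helpers `bench::as_bench_time()` and `bench::as_bench_bytes()`. The variable names and values are made up for the example.

```r
library(bench)

# Parse human readable strings into bench's time and byte classes.
fast <- bench::as_bench_time("104ns")
slow <- bench::as_bench_time("2ms")
mem <- bench::as_bench_bytes("20MB")

# Printing shows the values with units attached.
fast
mem

# Times compare like ordinary numbers.
fast < slow

# Byte sizes can be compared directly against a size written as a string.
mem > "10MB"
```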

There is also full support for plotting with [ggplot2](https://ggplot2.tidyverse.org/) including custom scales and formatting.

## Usage

### `bench::mark()`

Benchmarks can be run with `bench::mark()`, which takes one or more expressions to benchmark against each other.

```{r example, cache = TRUE}
library(bench)
set.seed(42)

dat <- data.frame(
  x = runif(10000, 1, 1000),
  y = runif(10000, 1, 1000)
)
```

`bench::mark()` will throw an error if the results are not equivalent, so you don't accidentally benchmark inequivalent code.

```{r example-1, error = TRUE, cache = TRUE, dependson = "example"}
bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 499), ],
  subset(dat, x > 500)
)
```
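
If the results are expected to differ, the equality check can be switched off. This is a minimal sketch, assuming the `check` argument documented in `?bench::mark`; the data frame is redefined so the snippet stands alone.

```r
library(bench)
set.seed(42)

dat <- data.frame(
  x = runif(10000, 1, 1000),
  y = runif(10000, 1, 1000)
)

# check = FALSE skips the comparison of results, so expressions that
# return different values can still be timed side by side.
res <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 499), ],
  check = FALSE
)

res[, c("expression", "median", "mem_alloc")]
```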

Results are easy to interpret, with human readable units.

```{r example-2, cache = TRUE, dependson = "example"}
bnch <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500)
)
bnch
```

By default the summary uses absolute measures. Relative results can be obtained by passing `relative = TRUE` to `bench::mark()` or by calling `summary()` with `relative = TRUE` on the results.

```{r example-3, cache = TRUE, dependson = "example-2"}
summary(bnch, relative = TRUE)
```
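
The other route, passing `relative = TRUE` in the call itself, might look like the sketch below. It is self-contained and rebuilds the same illustrative data frame; the object name `rel` is arbitrary.

```r
library(bench)
set.seed(42)

dat <- data.frame(
  x = runif(10000, 1, 1000),
  y = runif(10000, 1, 1000)
)

rel <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500),
  relative = TRUE
)

# Each summary column is scaled by its smallest value, so the smallest
# entry in a column is 1 and the others are multiples of it.
rel[, c("expression", "min", "median")]
```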

### `bench::press()`

`bench::press()` is used to run benchmarks against a grid of parameters.
Provide setup and benchmarking code as a single unnamed argument, then define sets of values as named arguments.
Every combination of the values is expanded, and the resulting benchmarks are *pressed* together into a single result.
This allows you to benchmark a set of expressions across a wide variety of input sizes, perform replications and other useful tasks.

```{r example2, cache = TRUE}
set.seed(42)

create_df <- function(rows, cols) {
  # Sample from 1 to 1000 so that the `x > 500` filters below keep about half the rows.
  out <- replicate(cols, runif(rows, 1, 1000), simplify = FALSE)
  out <- setNames(out, rep_len(c("x", letters), cols))
  as.data.frame(out)
}

results <- bench::press(
  rows = c(1000, 10000),
  cols = c(2, 10),
  {
    dat <- create_df(rows, cols)
    bench::mark(
      min_iterations = 100,
      bracket = dat[dat$x > 500, ],
      which = dat[which(dat$x > 500), ],
      subset = subset(dat, x > 500)
    )
  }
)

results
```
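
Replications can be expressed as one more grid parameter. The sketch below is a hedged, smaller example: the parameter names `size` and `replication` are hypothetical, and `replication` is deliberately unused inside the block, serving only to repeat each benchmark.

```r
library(bench)

res <- bench::press(
  size = c(100, 1000), # input sizes to compare
  replication = 1:2,   # repeats each combination; not used in the code below
  {
    x <- runif(size)
    bench::mark(
      min_time = 0.05,
      sqrt = sqrt(x),
      power = x^0.5
    )
  }
)

# One row per expression for every combination of size and replication;
# the grid values appear as ordinary columns next to the timings.
res[, c("expression", "size", "replication", "median")]
```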

## Plotting

`ggplot2::autoplot()` can be used to generate an informative default plot.
This plot is colored by gc level (0, 1, or 2) and faceted by parameters (if any).
By default it generates a [beeswarm](https://github.com/eclarke/ggbeeswarm#geom_quasirandom) plot, but you can also specify other plot types (`jitter`, `ridge`, `boxplot`, `violin`).
See `?autoplot.bench_mark` for full details.

```{r autoplot, message = FALSE, warning = FALSE, cache = TRUE, dependson = "example2", dpi = 300}
ggplot2::autoplot(results)
```
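
Choosing one of the other plot types might look like the following sketch. It assumes the `type` argument described in `?autoplot.bench_mark` and that ggplot2 and tidyr are installed; the benchmark itself is a throwaway example.

```r
library(bench)

# A small illustrative benchmark to have something to plot.
x <- runif(1e4)
res <- bench::mark(sqrt(x), x^0.5, min_time = 0.1)

# type selects the geometry; "boxplot" is one of the listed alternatives.
p <- ggplot2::autoplot(res, type = "boxplot")
p
```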

You can also produce fully custom plots by un-nesting the results and working with the data directly.

```{r custom-plot, message = FALSE, cache = TRUE, dependson = "example2", dpi = 300}
library(tidyverse)

results %>%
  unnest(c(time, gc)) %>%
  filter(gc == "none") %>%
  mutate(expression = as.character(expression)) %>%
  ggplot(aes(x = mem_alloc, y = time, color = expression)) +
  geom_point() +
  scale_color_bench_expr(scales::brewer_pal(type = "qual", palette = 3))
```

## `system_time()`

**bench** also includes `system_time()`, a higher precision alternative to [system.time()](https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/system.time).

```{r system-time, cache = TRUE}
bench::system_time({
  i <- 1
  while (i < 1e7) {
    i <- i + 1
  }
})

bench::system_time(Sys.sleep(.5))
```
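
The result can also be stored and inspected element by element. A minimal sketch, assuming the two elements are named `process` and `real` as described in `?bench::system_time`:

```r
library(bench)

elapsed <- bench::system_time(Sys.sleep(0.1))

# The elements are named "process" (CPU time) and "real" (wall-clock time).
names(elapsed)

# Sleeping uses almost no CPU, so most of the time shows up as real time.
elapsed[["real"]]
elapsed[["process"]]
```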

## Alternatives {#alternatives}

-   [rbenchmark](https://cran.r-project.org/package=rbenchmark)
-   [microbenchmark](https://cran.r-project.org/package=microbenchmark)
-   [tictoc](https://cran.r-project.org/package=tictoc)
-   [system.time()](https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/system.time)