Docs for v17 incremental backup and recovery #1524

Merged
merged 18 commits into from
Jul 12, 2023
17 changes: 17 additions & 0 deletions content/en/docs/17.0/reference/features/recovery.md
Expand Up @@ -6,6 +6,23 @@ aliases: ['/docs/recovery/pitr','/docs/reference/pitr/']

## Point in Time Recovery

Vitess supports incremental backups and recoveries, also known as point-in-time recoveries. `v17` offers restore-to-position functionality, and `v18` is slated to add restore-to-timestamp functionality.

Point in time recoveries are based on full and incremental backups. It is possible to recover a database to a position that is _covered_ by some backup.

See [Backup Types](../../../user-guides/operating-vitess/backup-and-restore/overview/#backup-types) and [Restore Types](../../../user-guides/operating-vitess/backup-and-restore/overview/#restore-types) for an overview of incremental backups and restores.

See the user guides for how to [Create an Incremental Backup](../../../user-guides/operating-vitess/backup-and-restore/creating-a-backup/#create-an-incremental-backup-with-vtctl) and how to [Restore to a position](../../../user-guides/operating-vitess/backup-and-restore/bootstrap-and-restore/#restore-to-a-point-in-time).

### Supported Databases
- MySQL 5.7, 8.0

### Notes

This functionality replaces the legacy functionality, which is based on binlog servers and transient binary logs.

## Point in Time Recovery: legacy functionality based on binlog server

Comment on lines +24 to +25
Collaborator

Should we remove this entire section in the 18.0 docs?

Contributor Author

Per @deepthi, we should keep it for a while.

### Supported Databases
- MySQL 8.0

Expand Up @@ -4,7 +4,8 @@ weight: 3
aliases: ['/docs/user-guides/backup-and-restore/']
---

## Restoring a backup
Restores can be done automatically by way of seeding/bootstrapping new tablets, or they can be invoked manually on a tablet to restore a full backup or do a point-in-time recovery.
## Auto restoring a backup on startup

When a tablet starts, Vitess checks the value of the `--restore_from_backup` command-line flag to determine whether to restore a backup to that tablet. Restores will always be done with whichever engine was used to create the backup.

Expand Down Expand Up @@ -32,3 +33,41 @@ Bootstrapping a new tablet is almost identical to restoring an existing tablet.
```

The bootstrapped tablet will restore the data from the backup and then apply changes, which occurred after the backup, by restarting replication.

## Manual restore

A manual restore is done on a specific tablet. The tablet's MySQL server is shut down and its data is wiped out.

### Restore a full backup

To restore the tablet from the most recent full backup, run:

```shell
vtctldclient --server=<vtctld_host>:<vtctld_port> RestoreFromBackup <tablet-alias>
```

Example:

```shell
vtctldclient --server localhost:15999 --alsologtostderr RestoreFromBackup zone1-0000000101
```

If successful, the tablet's MySQL server rejoins the shard's replication stream, eventually catching up so that it can serve traffic.

### Restore to a point-in-time

`v17` supports incremental restore, or restoring to a specific _position_:

```shell
vtctlclient -- RestoreFromBackup --restore_to_pos <position> <tablet-alias>
```

Example:

```shell
vtctlclient -- RestoreFromBackup --restore_to_pos "MySQL56/0d7aaca6-1666-11ee-aeaf-0a43f95f28a3:1-60" zone1-0000000102
```

Collaborator

It would be a nice enhancement in 18.0 if we expected a valid MySQL GTID set value. i.e. not requiring MySQL56/. That is unnecessary anyway as we only support MySQL (5.7 and 8.0) today. An argument to leave that thought would be that we could re-add support for MariaDB at some point. We can also infer this based on the format of the value though.

Collaborator

Catching up on email after vacation and I now see that you're ahead of me 🙂

vitessio/vitess#13415

Contributor Author

Agreed

This restore method assumes backups have been taken that cover the specified position. The restore process will first determine a restore path: a sequence of backups, starting with a full backup followed by zero or more incremental backups, that when combined, include the specified position. See more on [Restore Types](../overview/#restore-types) and on [Taking Incremental Backup](../creating-a-backup/#create-an-incremental-backup-with-vtctl).
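
To make "cover the specified position" concrete, here is a minimal sketch of such a check. It is not how Vitess implements it. It assumes each position holds a single server UUID with one interval starting at 1, as in the example above; real GTID sets may contain several UUIDs and intervals. All function names are hypothetical helpers.

```sh
#!/bin/sh
# Illustrative helpers only; these names are not part of Vitess.
# Assumption: positions look like "MySQL56/<uuid>:1-<n>", with one UUID and
# one interval starting at 1. Real GTID sets can be more complex.

# gtid_uuid POSITION: print the server UUID of the position.
gtid_uuid() {
  pos="${1#MySQL56/}"   # drop the flavor prefix if present
  echo "${pos%%:*}"     # keep everything before the first colon
}

# gtid_last_seq POSITION: print the last transaction number of the interval.
gtid_last_seq() {
  interval="${1##*:}"     # e.g. "1-60"
  echo "${interval##*-}"  # e.g. "60"
}

# position_covers BACKUP_POS TARGET_POS: succeed if BACKUP_POS includes TARGET_POS.
position_covers() {
  [ "$(gtid_uuid "$1")" = "$(gtid_uuid "$2")" ] &&
    [ "$(gtid_last_seq "$1")" -ge "$(gtid_last_seq "$2")" ]
}
```

Under these assumptions, a backup ending at `...:1-60` covers a requested position of `...:1-53`, but not the other way around.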

`v18` will support restoring to a given timestamp.
Expand Up @@ -4,6 +4,12 @@ weight: 2
aliases: ['/docs/user-guides/backup-and-restore/']
---

## Choosing the backup type

As described in [Backup types](../overview/#backup-types), you can choose to run a full backup (the default) or an incremental backup.

Full backups will use the backup engine chosen in the tablet's [configuration](#configuration). Incremental backups will always copy MySQL's binary logs, irrespective of the configured backup engine.

## Using xtrabackup

The default backup implementation is `builtin`, however we strongly recommend using the `xtrabackup` engine as it is more robust and allows for non-blocking backups. Restores will always be done with whichever engine was used to create the backup.
Expand Down Expand Up @@ -75,11 +81,11 @@ I0310 12:49:32.279773 215835 backup.go:163] I0310 20:49:32.279485 xtrabackupeng

To continue with risk: Set `--xtrabackup_backup_flags=--no-server-version-check`. Note this occurs when your MySQL server version is technically unsupported by `xtrabackup`.

## Create backups with vtctl
## Create a full backup with vtctl

__Run the following vtctl command to create a backup:__

``` sh
```sh
vtctldclient --server=<vtctld_host>:<vtctld_port> Backup <tablet-alias>
```

Expand All @@ -89,10 +95,31 @@ If the engine is `xtrabackup`, the tablet can continue to serve traffic while th

__Run the following vtctl command to backup a specific shard:__

``` sh
```sh
vtctldclient --server=<vtctld_host>:<vtctld_port> BackupShard [--allow_primary=false] <keyspace/shard>
```

## Create an incremental backup with vtctl

An incremental backup requires additional information: the point from which to start the backup. An incremental backup is taken by supplying `--incremental_from_pos` to the `Backup` command. The argument may be either a valid position or the value `auto`. Examples:

```sh
vtctlclient -- Backup --incremental_from_pos="MySQL56/0d7aaca6-1666-11ee-aeaf-0a43f95f28a3:1-53" zone1-0000000102

vtctlclient -- Backup --incremental_from_pos="auto" zone1-0000000102
```

When `--incremental_from_pos="auto"`, Vitess chooses the position of the last successful backup as the starting point for the incremental backup. This is a convenient way to ensure a sequence of contiguous incremental backups.

An incremental backup backs up one or more MySQL binary log files. These binary log files may begin with the requested position, or with an earlier position. They will necessarily include the requested position. When the incremental backup begins, Vitess rotates the MySQL binary logs on the tablet, so that it does not back up an active log file.

An incremental backup fails in these scenarios:

- It is unable to find binary log files that cover the requested position. This can happen if the binary logs were purged before the incremental backup was taken. It essentially means there is a gap in the changelog events. **Note** that while the binary logs may be missing on one tablet, another tablet may still have binary logs that cover the requested position.
- There is no change to the database since the requested position, i.e. the GTID position has not changed since.
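
Because another tablet may still hold the needed binary logs, one possible approach is to retry the incremental backup on other tablets of the same shard. The following is a minimal sketch under stated assumptions, not a recommended procedure: `BACKUP_RUNNER` is a hypothetical override for dry runs and testing, and the function name is made up for illustration. Note that the second failure scenario (no change since the requested position) would fail on every tablet, so retrying does not help there.

```sh
#!/bin/sh
# Sketch: try an incremental backup on each candidate tablet in turn.
# BACKUP_RUNNER is a hypothetical override used for dry runs and testing;
# by default the vtctlclient binary is invoked as in the examples above.

# try_incremental_backup TABLET_ALIAS...
# Prints the alias of the first tablet whose backup succeeded.
# Returns non-zero if every attempt failed.
try_incremental_backup() {
  for tablet_alias in "$@"; do
    # Send the command's own output to stderr so stdout carries only the alias.
    if "${BACKUP_RUNNER:-vtctlclient}" -- Backup --incremental_from_pos="auto" "$tablet_alias" >&2; then
      echo "$tablet_alias"
      return 0
    fi
  done
  return 1
}
```

For example, `try_incremental_backup zone1-0000000101 zone1-0000000102` would attempt the backup on the first tablet and fall back to the second only if the first attempt fails.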

`v17` only supports `--incremental_from_pos` in the `Backup` command, not in `BackupShard`. Also, only `vtctlclient` supports the flag, whereas `vtctldclient` does not. `v18` is expected to support incremental backups for `BackupShard` and for `vtctldclient`.
Collaborator

Ah, I guess that's why we're using vtctlclient in the examples. IMO we should not add new features w/o vtctldclient support as we've been telling people not to use vtctlclient now for 2 releases. Was the feature we're documenting here already merged? Who is going to add the vtctldclient support and when? There's already supposed to be parity for everything but OnlineDDL and VReplication AFAIK, which are themselves supposed to be done for v18.

Looks like the PRs are already merged, so we should create an issue for the vtctldclient work, if we have not already, and mark that as required for the v18 milestone.

Collaborator

As I'm catching up on email after vacation, I see that you're ahead of me (again) 🙂

vitessio/vitess#13416

Contributor Author

Agreed, we should have everything in vtctldclient; I will add that support.

> Was the feature we're documenting here already merged?

It was.

## Backing up Topology Server

The Topology Server stores metadata (and not tablet data). It is recommended to create a backup using the method described by the underlying plugin:
Expand Up @@ -26,6 +26,36 @@ The engine is the technology used for generating the backup. Currently Vitess has
* Builtin: Shutdown an instance and copy all the database files (default)
* XtraBackup: An online backup using Percona's [XtraBackup](https://www.percona.com/software/mysql-database/percona-xtrabackup)

### Backup types

Vitess supports full backups as well as incremental backups, and their respective counterparts: full restores and point-in-time restores.

* A full backup contains the entire data in the database. The backup represents a consistent state of the data, i.e. it is a snapshot of the data at some point in time.
* An incremental backup contains a changelog, or a transition of data from one state to another. Vitess implements incremental backups by making a copy of MySQL binary logs.

Generally speaking, and on most workloads, a full backup costs more than an incremental backup. Creating a full backup takes significant time, so it is impractical to take full backups at very short intervals. Moreover, a full backup consumes the disk space needed for the entire dataset. Incremental backups, on the other hand, are quick to run and have very little impact, if any, on the running servers. They only contain the changes between two points in time, and on most workloads are more compact.

Full and incremental backups are expected to be interleaved. For example: one would create a full backup once per day, and incremental backups once per hour.
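
The daily/hourly interleaving above could be scripted roughly as follows. This is a sketch under stated assumptions, not a Vitess feature: the `backup_command` function, the midnight policy, and the `VTCTLD_SERVER` value are hypothetical. The two commands are the same forms shown in the backup guide, and the function only prints the command so it can be reviewed before being run.

```sh
#!/bin/sh
# Sketch only: prints the backup command for a given hour instead of running it.
# VTCTLD_SERVER and the "full backup at midnight" policy are assumptions.
VTCTLD_SERVER="localhost:15999"

# backup_command HOUR TABLET_ALIAS
# Hour 0 gets a full backup; every other hour gets an incremental backup
# that continues from the last successful backup ("auto").
backup_command() {
  hour="$1"
  tablet_alias="$2"
  case "$hour" in
    0|00)
      echo "vtctldclient --server=${VTCTLD_SERVER} Backup ${tablet_alias}"
      ;;
    *)
      echo "vtctlclient -- Backup --incremental_from_pos=\"auto\" ${tablet_alias}"
      ;;
  esac
}
```

An hourly scheduler entry could call `backup_command "$(date +%H)" zone1-0000000102` and execute the printed command once the output has been checked.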

Full backups are simply states of the database. Incremental backups, however, need to start at some point and end at some point. The common practice is for an incremental backup to continue from the point of the last good backup, which can be a full or incremental backup. An incremental backup in Vitess ends at the point in time of its execution.

The identity of the tablet on which a full backup or an incremental backup is taken is immaterial. It is possible to take a full backup on one tablet and incremental backups on another. It is possible to take full backups on two different tablets. It is also possible to take incremental backups, independently, on two different tablets, even though the contents of those incremental backups overlap. Vitess uses MySQL GTID sets to determine positioning and prune duplicates.

### Restores

Restores are the counterparts of backups. A restore uses the engine utilized to create a backup. One may run a restore from a full backup, or a point-in-time restore (PITR) based on additional incremental backups.

A Vitess restore operates on a tablet. The restore process completely wipes out the data in the tablet's MySQL server and repopulates the server with the data from the backup(s). The MySQL server is shut down during the process. As a safety mechanism, Vitess by default prevents a restore onto a `PRIMARY` tablet. Any non-`PRIMARY` tablet is otherwise eligible to restore.

### Restore Types

Vitess supports full restores and incremental (AKA point-in-time) restores. The two serve different purposes.

* A full restore loads the dataset from a full backup onto a non-`PRIMARY` tablet. Once the data is loaded, the restore process starts the MySQL service and makes it join the replication stream. It is expected that a freshly restored server will lag behind the shard's `PRIMARY` for a period of time.
The full restore flow is useful for seeding new replica tablets. It may also be used to fix replicas that have been corrupted.
Collaborator

Do we mean REPLICA tablet types here or really non-PRIMARY tablets?

Contributor Author

You can use any non-PRIMARY type to restore a backup.

Collaborator

I'd say that then to avoid any confusion (in particular for RDONLY tablets).

Contributor Author

Done.

* An incremental, or a point-in-time restore, restores a tablet/MySQL up to a specific position or time. This is done by first loading a full backup dataset, followed by applying the changelog captured in zero or more incremental backups. Once that is complete, the tablet type is set to `DRAINED` and the tablet does _not_ join the replication stream.
The common purpose of a point-in-time restore is to recover data from an accidental write or deletion. If the database administrator knows approximately when the accidental write took place, they can restore a replica tablet to a point in time shortly before it. Since the server does not join the replication stream, its data remains static, and the administrator may review or copy the data as they please. Finally, it is possible to change the tablet type back to `REPLICA` and have it join the shard's replication.
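
The point-in-time recovery flow described above can be outlined as a short command plan. This is a hedged sketch: `pitr_plan` is a hypothetical helper that only prints the steps, the restore command is the `v17` form from the restore guide, and the use of `ChangeTabletType` for the final step is an assumption that is not taken from this document.

```sh
#!/bin/sh
# Sketch only: prints the steps of a point-in-time recovery, runs nothing.
# pitr_plan is a hypothetical helper, not a Vitess command.

# pitr_plan POSITION TABLET_ALIAS
pitr_plan() {
  position="$1"
  tablet_alias="$2"
  # Step 1: restore the tablet up to the requested position.
  echo "vtctlclient -- RestoreFromBackup --restore_to_pos \"${position}\" ${tablet_alias}"
  # Step 2: manual work while the tablet is DRAINED and its data is static.
  echo "# inspect or copy data from ${tablet_alias} while it is DRAINED"
  # Step 3 (assumed command): return the tablet to the replication stream.
  echo "vtctlclient -- ChangeTabletType ${tablet_alias} REPLICA"
}
```

Printing the plan first keeps the destructive restore step explicit, since a restore wipes the tablet's existing data.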

## Vtbackup, VTTablet and Vtctld

Vtbackup, VTTablet, and Vtctld may all participate in backups and restores.
17 changes: 17 additions & 0 deletions content/en/docs/18.0/reference/features/recovery.md
Expand Up @@ -6,6 +6,23 @@ aliases: ['/docs/recovery/pitr','/docs/reference/pitr/']

## Point in Time Recovery

Vitess supports incremental backups and recoveries, also known as point-in-time recoveries. `v17` offers restore-to-position functionality, and `v18` is slated to add restore-to-timestamp functionality.

Point in time recoveries are based on full and incremental backups. It is possible to recover a database to a position that is _covered_ by some backup.

See [Backup Types](../../../user-guides/operating-vitess/backup-and-restore/overview/#backup-types) and [Restore Types](../../../user-guides/operating-vitess/backup-and-restore/overview/#restore-types) for an overview of incremental backups and restores.

See the user guides for how to [Create an Incremental Backup](../../../user-guides/operating-vitess/backup-and-restore/creating-a-backup/#create-an-incremental-backup-with-vtctl) and how to [Restore to a position](../../../user-guides/operating-vitess/backup-and-restore/bootstrap-and-restore/#restore-to-a-point-in-time).

### Supported Databases
- MySQL 5.7, 8.0

### Notes

This functionality replaces the legacy functionality, which is based on binlog servers and transient binary logs.

## Point in Time Recovery: legacy functionality based on binlog server

### Supported Databases
- MySQL 8.0

Expand Up @@ -4,7 +4,8 @@ weight: 3
aliases: ['/docs/user-guides/backup-and-restore/']
---

## Restoring a backup
Restores can be done automatically by way of seeding/bootstrapping new tablets, or they can be invoked manually on a tablet to restore a full backup or do a point-in-time recovery.
## Auto restoring a backup on startup

When a tablet starts, Vitess checks the value of the `--restore_from_backup` command-line flag to determine whether to restore a backup to that tablet. Restores will always be done with whichever engine was used to create the backup.

Expand Down Expand Up @@ -32,3 +33,41 @@ Bootstrapping a new tablet is almost identical to restoring an existing tablet.
```

The bootstrapped tablet will restore the data from the backup and then apply changes, which occurred after the backup, by restarting replication.

## Manual restore

A manual restore is done on a specific tablet. The tablet's MySQL server is shut down and its data is wiped out.

### Restore a full backup

To restore the tablet from the most recent full backup, run:

```shell
vtctldclient --server=<vtctld_host>:<vtctld_port> RestoreFromBackup <tablet-alias>
```

Example:

```shell
vtctldclient --server localhost:15999 --alsologtostderr RestoreFromBackup zone1-0000000101
```

If successful, the tablet's MySQL server rejoins the shard's replication stream, eventually catching up so that it can serve traffic.

### Restore to a point-in-time

`v17` supports incremental restore, or restoring to a specific _position_:

```shell
vtctlclient -- RestoreFromBackup --restore_to_pos <position> <tablet-alias>
```

Example:

```shell
vtctlclient -- RestoreFromBackup --restore_to_pos "MySQL56/0d7aaca6-1666-11ee-aeaf-0a43f95f28a3:1-60" zone1-0000000102
```

This restore method assumes backups have been taken that cover the specified position. The restore process will first determine a restore path: a sequence of backups, starting with a full backup followed by zero or more incremental backups, that when combined, include the specified position. See more on [Restore Types](../overview/#restore-types) and on [Taking Incremental Backup](../creating-a-backup/#create-an-incremental-backup-with-vtctl).

`v18` will support restoring to a given timestamp.
Expand Up @@ -4,6 +4,12 @@ weight: 2
aliases: ['/docs/user-guides/backup-and-restore/']
---

## Choosing the backup type

As described in [Backup types](../overview/#backup-types), you can choose to run a full backup (the default) or an incremental backup.

Full backups will use the backup engine chosen in the tablet's [configuration](#configuration). Incremental backups will always copy MySQL's binary logs, irrespective of the configured backup engine.

## Using xtrabackup

The default backup implementation is `builtin`, however we strongly recommend using the `xtrabackup` engine as it is more robust and allows for non-blocking backups. Restores will always be done with whichever engine was used to create the backup.
Expand Down Expand Up @@ -75,11 +81,11 @@ I0310 12:49:32.279773 215835 backup.go:163] I0310 20:49:32.279485 xtrabackupeng

To continue with risk: Set `--xtrabackup_backup_flags=--no-server-version-check`. Note this occurs when your MySQL server version is technically unsupported by `xtrabackup`.

## Create backups with vtctl
## Create a full backup with vtctl

__Run the following vtctl command to create a backup:__

``` sh
```sh
vtctldclient --server=<vtctld_host>:<vtctld_port> Backup <tablet-alias>
```

Expand All @@ -89,10 +95,31 @@ If the engine is `xtrabackup`, the tablet can continue to serve traffic while th

__Run the following vtctl command to backup a specific shard:__

``` sh
```sh
vtctldclient --server=<vtctld_host>:<vtctld_port> BackupShard [--allow_primary=false] <keyspace/shard>
```

## Create an incremental backup with vtctl

An incremental backup requires additional information: the point from which to start the backup. An incremental backup is taken by supplying `--incremental_from_pos` to the `Backup` command. The argument may be either a valid position or the value `auto`. Examples:

```sh
vtctlclient -- Backup --incremental_from_pos="MySQL56/0d7aaca6-1666-11ee-aeaf-0a43f95f28a3:1-53" zone1-0000000102

vtctlclient -- Backup --incremental_from_pos="auto" zone1-0000000102
```

When `--incremental_from_pos="auto"`, Vitess chooses the position of the last successful backup as the starting point for the incremental backup. This is a convenient way to ensure a sequence of contiguous incremental backups.

An incremental backup backs up one or more MySQL binary log files. These binary log files may begin with the requested position, or with an earlier position. They will necessarily include the requested position. When the incremental backup begins, Vitess rotates the MySQL binary logs on the tablet, so that it does not back up an active log file.

An incremental backup fails in these scenarios:

- It is unable to find binary log files that cover the requested position. This can happen if the binary logs were purged before the incremental backup was taken. It essentially means there is a gap in the changelog events. **Note** that while the binary logs may be missing on one tablet, another tablet may still have binary logs that cover the requested position.
- There is no change to the database since the requested position, i.e. the GTID position has not changed since.

`v17` only supports `--incremental_from_pos` in the `Backup` command, not in `BackupShard`. Also, only `vtctlclient` supports the flag, whereas `vtctldclient` does not. `v18` is expected to support incremental backups for `BackupShard` and for `vtctldclient`.

## Backing up Topology Server

The Topology Server stores metadata (and not tablet data). It is recommended to create a backup using the method described by the underlying plugin: