Docs for v17 incremental backup and recovery #1524
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,8 @@ weight: 3 | |
aliases: ['/docs/user-guides/backup-and-restore/'] | ||
--- | ||
|
||
## Restoring a backup | ||
Restores can be done automatically by way of seeding/bootstrapping new tablets, or they can be invoked manually on a tablet to restore a full backup or do a point-in-time recovery. | ||
## Auto restoring a backup on startup | ||
|
||
When a tablet starts, Vitess checks the value of the `--restore_from_backup` command-line flag to determine whether to restore a backup to that tablet. Restores will always be done with whichever engine was used to create the backup. | ||
|
||
|
@@ -32,3 +33,41 @@ Bootstrapping a new tablet is almost identical to restoring an existing tablet. | |
``` | ||
|
||
The bootstrapped tablet will restore the data from the backup and then apply the changes that occurred after the backup by restarting replication. | ||
|
||
## Manual restore | ||
|
||
A manual restore is done on a specific tablet. The tablet's MySQL server is shut down and its data is wiped out. | ||
|
||
### Restore a full backup | ||
|
||
To restore the tablet from the most recent full backup, run: | ||
|
||
```shell | ||
vtctldclient --server=<vtctld_host>:<vtctld_port> RestoreFromBackup <tablet-alias> | ||
``` | ||
|
||
Example: | ||
|
||
```shell | ||
vtctldclient --server localhost:15999 --alsologtostderr RestoreFromBackup zone1-0000000101 | ||
``` | ||
|
||
If successful, the tablet's MySQL server rejoins the shard's replication stream and eventually catches up so that it can serve traffic. | ||
|
||
### Restore to a point-in-time | ||
|
||
`v17` supports incremental restore, or restoring to a specific _position_: | ||
|
||
```shell | ||
vtctlclient -- RestoreFromBackup --restore_to_pos <position> <tablet-alias> | ||
``` | ||
|
||
Example: | ||
|
||
```shell | ||
vtctlclient -- RestoreFromBackup --restore_to_pos "MySQL56/0d7aaca6-1666-11ee-aeaf-0a43f95f28a3:1-60" zone1-0000000102 | ||
``` | ||
It would be a nice enhancement in 18.0 if we expected a valid MySQL GTID set value. i.e. not requiring
Catching up on email after vacation and I now see that you're ahead of me 🙂
Agreed
|
||
This restore method assumes backups have been taken that cover the specified position. The restore process will first determine a restore path: a sequence of backups, starting with a full backup followed by zero or more incremental backups, that when combined, include the specified position. See more on [Restore Types](../overview/#restore-types) and on [Taking Incremental Backup](../creating-a-backup/#create-an-incremental-backup-with-vtctl). | ||
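
The position in the example above carries a flavor prefix (`MySQL56/`) in front of the GTID set. As a minimal sketch, assuming the prefix is always `MySQL56` as in that example, a small shell helper can add it to a raw GTID set (for instance one copied from MySQL's `gtid_executed` variable). The function name `to_vitess_pos` is hypothetical and not part of Vitess.

```shell
# Hypothetical helper, not part of Vitess: prefix a raw MySQL GTID set with
# the flavor name used in the example above, unless a prefix is already there.
to_vitess_pos() {
  case "$1" in
    */*) printf '%s\n' "$1" ;;          # already has a flavor prefix
    *)   printf 'MySQL56/%s\n' "$1" ;;  # assumption: the MySQL56 flavor
  esac
}

# Usage (not run here):
#   vtctlclient -- RestoreFromBackup --restore_to_pos "$(to_vitess_pos "$gtid_set")" zone1-0000000102
```

The check relies on GTID sets never containing a `/`, so a value that already includes the prefix passes through unchanged.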
|
||
`v18` will support restoring to a given timestamp. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,12 @@ weight: 2 | |
aliases: ['/docs/user-guides/backup-and-restore/'] | ||
--- | ||
|
||
## Choosing the backup type | ||
|
||
As described in [Backup types](../overview/#backup-types), you can run either a full backup (the default) or an incremental backup. | ||
|
||
Full backups will use the backup engine chosen in the tablet's [configuration](#configuration). Incremental backups will always copy MySQL's binary logs, irrespective of the configured backup engine. | ||
|
||
## Using xtrabackup | ||
|
||
The default backup implementation is `builtin`; however, we strongly recommend using the `xtrabackup` engine as it is more robust and allows for non-blocking backups. Restores will always be done with whichever engine was used to create the backup. | ||
|
@@ -75,11 +81,11 @@ I0310 12:49:32.279773 215835 backup.go:163] I0310 20:49:32.279485 xtrabackupeng | |
|
||
To continue with risk: Set `--xtrabackup_backup_flags=--no-server-version-check`. Note this occurs when your MySQL server version is technically unsupported by `xtrabackup`. | ||
|
||
## Create backups with vtctl | ||
## Create a full backup with vtctl | ||
|
||
|
||
__Run the following vtctl command to create a backup:__ | ||
|
||
``` sh | ||
```sh | ||
vtctldclient --server=<vtctld_host>:<vtctld_port> Backup <tablet-alias> | ||
``` | ||
|
||
|
@@ -89,10 +95,31 @@ If the engine is `xtrabackup`, the tablet can continue to serve traffic while th | |
|
||
__Run the following vtctl command to backup a specific shard:__ | ||
|
||
``` sh | ||
```sh | ||
vtctldclient --server=<vtctld_host>:<vtctld_port> BackupShard [--allow_primary=false] <keyspace/shard> | ||
``` | ||
|
||
## Create an incremental backup with vtctl | ||
|
||
|
||
An incremental backup requires additional information: the point from which to start the backup. An incremental backup is taken by supplying `--incremental_from_pos` to the `Backup` command. The argument may either indicate a valid position, or the value `auto`. Examples: | ||
|
||
```sh | ||
vtctlclient -- Backup --incremental_from_pos="MySQL56/0d7aaca6-1666-11ee-aeaf-0a43f95f28a3:1-53" zone1-0000000102 | ||
|
||
|
||
vtctlclient -- Backup --incremental_from_pos="auto" zone1-0000000102 | ||
``` | ||
|
||
When `--incremental_from_pos="auto"`, Vitess chooses the position of the last successful backup as the starting point for the incremental backup. This is a convenient way to ensure a sequence of contiguous incremental backups. | ||
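
Building on `auto`, a scheduled job could take a full backup once a day and contiguous incremental backups the rest of the time. The following is only a sketch: the function name `backup_command`, the `VTCTLD_SERVER` variable, and the midnight schedule are assumptions for illustration, and the function prints the command for review rather than running it.

```sh
# Hypothetical scheduler helper: print the backup command for a given hour.
# Assumption: one full backup at hour 00, incremental backups at other hours.
backup_command() {
  hour="$1"
  tablet="$2"
  # Assumption: vtctld address comes from VTCTLD_SERVER, with a local default.
  server="${VTCTLD_SERVER:-localhost:15999}"
  if [ "$hour" = "00" ]; then
    # Full backup, using the tablet's configured backup engine.
    echo "vtctldclient --server=$server Backup $tablet"
  else
    # Incremental backup continuing from the last successful backup.
    echo "vtctlclient -- Backup --incremental_from_pos=auto $tablet"
  fi
}

# Usage (hourly job): backup_command "$(date +%H)" zone1-0000000102
```

Printing instead of executing keeps the sketch safe to try; a real job would run the chosen command and check its exit status.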
|
||
An incremental backup backs up one or more MySQL binary log files. These binary log files may begin with the requested position, or with an earlier position. They will necessarily include the requested position. When the incremental backup begins, Vitess rotates the MySQL binary logs on the tablet, so that it does not back up an active log file. | ||
|
||
An incremental backup fails in these scenarios: | ||
|
||
- It is unable to find binary log files that cover the requested position. This can happen if the binary logs were purged before the incremental backup was taken. It essentially means there's a gap in the changelog events. **Note** that while the binary logs may be missing on one tablet, another tablet may still have binary logs that cover the requested position. | ||
- There is no change to the database since the requested position, i.e. the GTID position has not changed since. | ||
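
Because another tablet may still hold the binary logs that one tablet has purged, a wrapper can try several tablets in turn. This is a sketch under assumptions: the function name `incremental_backup_any` and the `VTCTLCLIENT` override are hypothetical, and it assumes the `Backup` command exits with a non-zero status when the incremental backup fails. It does not help in the second scenario, where there is nothing new to back up.

```sh
# Hypothetical wrapper: try an incremental backup on each tablet in turn and
# stop at the first success. VTCTLCLIENT can be overridden (e.g. for testing).
# Assumption: the command exits non-zero when the incremental backup fails.
incremental_backup_any() {
  pos="$1"
  shift
  for tablet in "$@"; do
    if "${VTCTLCLIENT:-vtctlclient}" -- Backup --incremental_from_pos="$pos" "$tablet"; then
      echo "incremental backup succeeded on $tablet"
      return 0
    fi
    echo "incremental backup failed on $tablet, trying next" >&2
  done
  return 1
}

# Usage: incremental_backup_any auto zone1-0000000101 zone1-0000000102
```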
|
||
`v17` only supports `--incremental_from_pos` in the `Backup` command, not in `BackupShard`. Also, only `vtctlclient` supports the flag, whereas `vtctldclient` does not. `v18` is expected to support incremental backups for `BackupShard` and for `vtctldclient`. | ||
Ah, I guess that's why we're using vtctlclient in the examples. IMO we should not add new features w/o vtctldclient support as we've been telling people not to use vtctlclient now for 2 releases. Was the feature we're documenting here already merged? Who is going to add the vtctldclient support and when? There's already supposed to be parity for everything but OnlineDDL and VReplication AFAIK, which are themselves supposed to be done for v18. Looks like the PRs are already merged, so we should create an issue for the vtctldclient work, if we have not already, and mark that as required for the v18 milestone.
As I'm catching up on email after vacation, I see that you're ahead of me (again) 🙂
Agreed, we should have everything in
It was. |
||
|
||
## Backing up Topology Server | ||
|
||
The Topology Server stores metadata (and not tablet data). It is recommended to create a backup using the method described by the underlying plugin: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,36 @@ The engine is the techology used for generating the backup. Currently Vitess has | |
* Builtin: Shutdown an instance and copy all the database files (default) | ||
* XtraBackup: An online backup using Percona's [XtraBackup](https://www.percona.com/software/mysql-database/percona-xtrabackup) | ||
|
||
### Backup types | ||
|
||
Vitess supports full backups as well as incremental backups, and their respective counterparts: full restores and point-in-time restores. | ||
|
||
* A full backup contains the entire data in the database. The backup represents a consistent state of the data, i.e. it is a snapshot of the data at some point in time. | ||
* An incremental backup contains a changelog, or a transition of data from one state to another. Vitess implements incremental backups by making a copy of MySQL binary logs. | ||
|
||
Generally speaking and on most workloads, the cost of a full backup is higher, and the cost of incremental backups is lower. The time it takes to create a full backup is significant, and it is therefore impractical to take full backups at very short intervals. Moreover, a full backup consumes the disk space needed for the entire dataset. Incremental backups, on the other hand, are quick to run, and have very little impact, if any, on the running servers. They only contain the changes between two points in time, and on most workloads are more compact. | ||
|
||
Full and incremental backups are expected to be interleaved. For example: one would create a full backup once per day, and incremental backups once per hour. | ||
|
||
Full backups are simply states of the database. Incremental backups, however, need to start at some point and end at some point. The common practice is for an incremental backup to continue from the point of the last good backup, which can be a full or incremental backup. An incremental backup in Vitess ends at the point in time of its execution. | ||
|
||
The identity of the tablet on which a full backup or an incremental backup is taken is immaterial. It is possible to take a full backup on one tablet and incremental backups on another. It is possible to take full backups on two different tablets. It is also possible to take incremental backups, independently, on two different tablets, even though the contents of those incremental backups overlap. Vitess uses MySQL GTID sets to determine positioning and prune duplicates. | ||
|
||
### Restores | ||
|
||
Restores are the counterparts of backups. A restore uses the engine utilized to create a backup. One may run a restore from a full backup, or a point-in-time restore (PITR) based on additional incremental backups. | ||
|
||
A Vitess restore operates on a tablet. The restore process completely wipes out the data in the tablet's MySQL server and repopulates the server with the data from the backup(s). The MySQL server is shut down during the process. As a safety mechanism, Vitess by default prevents a restore onto a `PRIMARY` tablet. Any non-`PRIMARY` tablet is otherwise eligible to restore. | ||
|
||
### Restore Types | ||
|
||
Vitess supports full restores and incremental (AKA point-in-time) restores. The two serve different purposes. | ||
|
||
* A full restore loads the dataset from a full backup onto a non-`PRIMARY` tablet. Once the data is loaded, the restore process starts the MySQL service and makes it join the replication stream. It is expected that a freshly restored server will lag behind the shard's `PRIMARY` for a period of time. | ||
The full restore flow is useful for seeding new replica tablets. It may also be used to fix replicas that have been corrupted. | ||
Do we mean
You can use any non-
I'd say that then to avoid any confusion (in particular for RDONLY tablets).
Done. |
||
* An incremental, or a point-in-time restore, restores a tablet/MySQL up to a specific position or time. This is done by first loading a full backup dataset, followed by applying the changelog captured in zero or more incremental backups. Once that is complete, the tablet type is set to `DRAINED` and the tablet does _not_ join the replication stream. | ||
The common purpose of point-in-time restore is to recover data from an accidental write/deletion. If the database administrator knows at about what time the accidental write took place, they can restore a replica tablet to a point in time shortly before the accidental write. Since the server does not join the replication stream, its data then remains static, and the administrator may review or copy the data as they please. Finally, it is then possible to change the tablet type back to `REPLICA` and have it join the shard's replication. | ||
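
The recovery flow described above can be sketched as two commands: a restore to a position, followed later by a tablet type change. This is a minimal sketch, not a definitive procedure: the function name `pitr_plan` and the `VTCTLD_SERVER` variable are hypothetical, the restore command follows the form used on the restore page of this change, and the use of `vtctldclient ChangeTabletType` for the second step is an assumption. The function only prints the commands so they can be reviewed before anything is run.

```sh
# Hypothetical helper: print the two steps of a point-in-time recovery flow.
pitr_plan() {
  pos="$1"
  tablet="$2"
  # Assumption: vtctld address comes from VTCTLD_SERVER, with a local default.
  server="${VTCTLD_SERVER:-localhost:15999}"
  # Step 1: restore the tablet up to the given position; it stays out of replication.
  echo "vtctlclient -- RestoreFromBackup --restore_to_pos \"$pos\" $tablet"
  # Step 2 (after reviewing or copying data): rejoin the shard as a replica.
  echo "vtctldclient --server=$server ChangeTabletType $tablet REPLICA"
}

# Usage: pitr_plan "MySQL56/0d7aaca6-1666-11ee-aeaf-0a43f95f28a3:1-60" zone1-0000000102
```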
|
||
## Vtbackup, VTTablet and Vtctld | ||
|
||
Vtbackup, VTTablet, and Vtctld may all participate in backups and restores. | ||
|
Should we remove this entire section in the 18.0 docs?
Per @deepthi , we should keep it for a while.