This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Innodb Page Compression #21

Open
artfiedler opened this issue Jan 15, 2021 · 9 comments

Comments

@artfiedler

It seems one of the .ibd files I'm trying to recover has a mix of compressed and uncompressed pages. (I altered the table thinking I was doing a "create like", so some columns got dropped.) I'm able to extract some of the rows that I can visibly see in the .ibd file, which I believe were written before I turned page compression on a year ago. Everything since then is not being pulled out, and I believe that's because of the page compression. The 36 MB file contains a lot of compressed-looking data (zlib, I believe), but the extracted data comes to only about 5.5 MB.

Does this tool support decompressing pages? Will it?

@artfiedler
Author

Here is some information on the page compression: https://mariadb.com/kb/en/innodb-page-compression/

@akuzminsky
Member

stream_parser cannot handle compressed pages. It doesn't understand that format, and the size of a page is different (less than 16k).
You may want to look at https://bazaar.launchpad.net/~akuzminsky/percona-data-recovery-tool-for-innodb/decompress/view/head:/page_parser.c .
It's an experimental branch. AFAIR, if page_parser sees a compressed page, it will uncompress it and save it as a separate file.
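
To check whether a tablespace contains such pages before running stream_parser, a rough sketch like the one below can tally the page-type field of every page. It assumes the file is laid out in 16 KiB slots and that the 2-byte page type sits at offset 24 of each page header. The value 17855 for index pages and 10 for BLOB pages are standard InnoDB values; the value 34354 for MariaDB page-compressed pages is an assumption that should be verified against the MariaDB source. The function name `count_page_types` is made up for this illustration and is not part of the tool.

```python
import struct
from collections import Counter

PAGE_SIZE = 16 * 1024        # default InnoDB page size (assumed for this file)
FIL_PAGE_TYPE_OFFSET = 24    # 2-byte big-endian page type in the page header

# Page type values. The MariaDB entry is an assumption to verify.
PAGE_TYPE_NAMES = {
    17855: "index",
    10: "blob",
    34354: "mariadb page compressed (assumed value)",
}

def count_page_types(data: bytes, page_size: int = PAGE_SIZE) -> Counter:
    """Tally the page type of every full page in an .ibd image."""
    counts = Counter()
    # Step through whole pages only; a trailing partial page is ignored.
    for start in range(0, len(data) - page_size + 1, page_size):
        (page_type,) = struct.unpack_from(">H", data, start + FIL_PAGE_TYPE_OFFSET)
        counts[PAGE_TYPE_NAMES.get(page_type, f"other ({page_type})")] += 1
    return counts
```

Calling `count_page_types(open(path, "rb").read())` on a copy of the file would give a quick picture of how many pages fall outside the types stream_parser understands.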

@artfiedler
Author

artfiedler commented Jan 16, 2021

Well, I modified (hacked with an axe) your stream_parser, and it now seems to support MariaDB's InnoDB page compression. Previously, out of a 36 MB file, 372 pages were uncompressed and yielded about 5.5 MB of extracted data. With page compression support added, I'm able to extract another 1345 pages, for 27 MB of extracted data. There also appear to be some other "mysql" compressed pages, which are skipped until I find some information on that format.
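
The idea of inflating a page-compressed page with zlib can be sketched as follows. This is not the code from the modified stream_parser. The layout is an assumption: page type 34354 at offset 24, a 2-byte big-endian payload length at offset 38, and a zlib stream starting at offset 40. All three should be checked against the MariaDB source before relying on them, and `try_inflate_page` is a hypothetical helper name.

```python
import struct
import zlib

FIL_PAGE_TYPE_OFFSET = 24                 # 2-byte page type in the page header
FIL_PAGE_DATA = 38                        # end of the standard page header
ASSUMED_COMPRESSED_TYPE = 34354           # assumed MariaDB page-compressed type
ASSUMED_PAYLOAD_OFFSET = FIL_PAGE_DATA + 2  # assumed start of the zlib stream

def try_inflate_page(page: bytes):
    """Return the inflated payload, or None if the page does not look compressed."""
    (page_type,) = struct.unpack_from(">H", page, FIL_PAGE_TYPE_OFFSET)
    if page_type != ASSUMED_COMPRESSED_TYPE:
        return None
    # Assumed: the compressed length is stored right after the header.
    (payload_len,) = struct.unpack_from(">H", page, FIL_PAGE_DATA)
    end = ASSUMED_PAYLOAD_OFFSET + payload_len
    payload = page[ASSUMED_PAYLOAD_OFFSET:end]
    try:
        return zlib.decompress(payload)
    except zlib.error:
        # Not a valid zlib stream: another algorithm, or a damaged page.
        return None
```

Returning None instead of raising lets a scanner keep going and count the page as ignored, which matters when the file mixes formats as described here.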

However, I then ran into c_parser erroring at sql_parser.y line 149 (this was due to a wrong field name for the primary key in the table create script). Now I'm getting a segmentation fault, which I think means it's hitting pages that have only 3 fields versus the 9 fields of the original table (the alter table dropped some columns). Hopefully I'll be able to whack this with a branch and get c_parser to either output 2 different schemas or skip pages that don't match the current schema, and then just run it twice.

@artfiedler
Author

Score! I was able to extract the data I needed and only lost about 0.001% (a handful of rows), which won't be worth my time recovering. Those rows are probably still there, just in that MySQL compression format instead of the MariaDB page compression format.

A few problems I ran into generating the data with c_parser:

  • datetime in MariaDB has an optional microsecond precision. When the create script left it out, c_parser consumed more bytes from the pages, so everything was offset by about 3-4 bytes. Setting the column to datetime(0) in the create script made it work correctly.
  • The column character set for DB_TRX_ID or the other internal field was producing a segmentation fault, but it seems it was only in the debug printing, so I just commented out that line. This was before I corrected the datetime(0), so maybe this issue would have gone away by itself; not sure.
  • The tab-separated SQL format would not load, so I added an option -s to generate insert values() lines instead, which worked fine.
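
One possible explanation for the datetime offset in the list above, offered as a guess rather than something confirmed in this thread: in the MySQL 5.6-style temporal format, DATETIME(n) takes 5 bytes plus 0 to 3 bytes for fractional seconds, while the older format takes 8 bytes. If a parser treats a bare datetime as the older format, each value would be read 3 bytes too long. The helper below is a hypothetical illustration of that arithmetic, not code from c_parser.

```python
def datetime_storage_bytes(fsp: int = 0, legacy: bool = False) -> int:
    """Storage size in bytes of a DATETIME column.

    legacy=True models the older 8-byte format; otherwise the
    MySQL 5.6-style format is assumed: 5 bytes plus fractional part.
    """
    if legacy:
        return 8
    if not 0 <= fsp <= 6:
        raise ValueError("fractional seconds precision must be 0..6")
    # Fractional seconds need 1 byte per two digits of precision.
    return 5 + (fsp + 1) // 2

# Difference between reading a value as legacy and as datetime(0).
offset = datetime_storage_bytes(legacy=True) - datetime_storage_bytes(0)
```

Under these assumptions `offset` is 3, which is in line with the 3-4 byte shift described above.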

I'll remove the debugging junk I threw in and attach the updated files here. You may want to organize the code a little differently; I was all about getting it done as fast as possible.

@artfiedler
Author

artfiedler commented Jan 16, 2021

Attached are the modified files. They need zlib. It should be easy to add support for other compression formats: you just need the references and a call to the library's decompress function. I rarely write C/C++, so a data type may need fixing here or there, but I'm not sure it matters; it works!

After I removed my forced debug for c_parser, it no longer seg faults on the debug print. Anyway, see attached; merge if you would like.
modified.zip

@akuzminsky
Member

Thank you for your contribution!

@bmakan

bmakan commented Feb 24, 2021

@artfiedler I'm trying to compile your modified code, but it fails with "fatal error: zlib/zlib.h: No such file or directory" even after I installed the zlib-devel library (CentOS). Do I need to do anything else to compile this besides running make?

Edit:
Sorry for bothering you. I managed to do it. I had to replace the quoted include with the angle-bracket include.

Turns out it didn't help in my case. The parsed data is still missing InnoDB pages, and even the parsed rows have weird values in some columns (usually the first few are fine).
The log always shows a lot of ignored pages:

Stream contained 0 blob, 5 innodb, 0 mysql compressed, 0 mariadb compressed and 1567 ignored page read attempts

I suppose my data is corrupted beyond recovery.

@artfiedler
Author

artfiedler commented Feb 24, 2021 via email

@xinxinfly

Attached at the modified files, needs zlib, it should be easy to add other compression support just need the references and add the call to the libraries decompress function. I rarely write c/c++ so may need to fix a data type here or there, not sure it matters it works!

After I removed my forced debug for the c_parser it doesn't seg fault on the debug print... anyway see attached, merge if you would like. modified.zip

Hey buddy, I tried it, but it does not work for a compressed table.

voltageek added a commit to voltageek/undrop-for-innodb that referenced this issue Jul 27, 2024
- Handles compressed pages (based on twindb#21)
- Additional logging