diff --git a/Readme.md b/Readme.md
index c3adbc6..58907d6 100644
--- a/Readme.md
+++ b/Readme.md
@@ -1,25 +1,87 @@
 # Flat files decoder for firehose
 
-## Usage
+[![CI status](https://github.com/semiotic-ai/flat-files-decoder/workflows/ci/badge.svg)][gh-ci]
+
+
+This crate is designed to decompress and decode headers from [binary files, which are called flat files,](https://github.com/streamingfast/firehose-ethereum/blob/develop/proto/sf/ethereum/type/v2/type.proto) generated from Firehose. Flat files store all information necessary to reconstruct the transaction and receipt tries. It also checks the validity of
+receipt roots and transaction roots present in the block headers by recalculating them via the block body data. Details of the implementation can be found [here](https://github.com/streamingfast/dbin?tab=readme-ov-file).
+
+This tool was first presented as a means to enhance the performance and verifiability of The Graph protocol. However,
+it turns out it could be used as a solution for the EIP-4444 problem of full nodes ceasing to provide historical data older than one year.
+The idea is that the flat files that this crate can decode could also be used as an archival format similar to era1 files, especially
+if they can be verified.
+
+## Getting Started
 
 ### Prerequisites
-- Rust installed
-- Cargo installed
-- [protoc installed](https://grpc.io/docs/protoc-installation/)
+- [Rust (stable)](https://www.rust-lang.org/tools/install)
+- Cargo (comes with Rust by default)
+- [protoc](https://grpc.io/docs/protoc-installation/)
 - Firehose dbin files to decode
   - An example file is provided `example0017686312.dbin`
 
-### Running
-- Run `cargo run --release` in the root directory of the project
-- The program will decode all files in the input_files directory
-  - It will verify the receipt root & transaction root matches the computed one for all blocks
+## Running
+
+### Commands
+
+The tool provides the following commands for various operations:
+
+- `stream`: Stream data continuously.
+- `decode`: Decode files from input to output.
+- `help`: Print this message or the help of the given subcommand(s).
 
 ### Options
-- `--input `: Specify a directory or single file to read from (default: input_files)
-- `--output `: Specify a directory to write output files to (if missing it will not write to disk)
 
-### Benchmarks
+You can use the following options with the commands for additional functionality:
+
+- `-h, --help`: Print help information about a specific command and its options.
+- `-V, --version`: Print the version information of the tool.
+
+
+#### NOTICE: whether streaming or reading from a directory, the tool verifies that the receipt root & transaction root match the computed ones for all blocks
+
+## Usage Examples
+
+Here are some examples of how to use the commands:
+
+1. To stream data continuously from `stdin`:
+
+   ```bash
+   # simply turning on stream stdin reading
+   cargo run stream
+
+   # or from files into stdin
+   cat example0017686312.dbin | cargo run stream
+   ```
+
+This will output decoded header records as bytes into `stdout`.
+
+2. To check a folder of dbin files:
+
+```bash
+cargo run decode --input ./input_files/
+```
+
+This will store the block headers in JSON format in the output folder.
+By passing `--headers-dir`, a folder of assumed valid block headers can be provided to compare
+with the input flat files. Valid headers can be pulled from the [sync committee subprotocol](https://github.com/ethereum/annotated-spec/blob/master/altair/sync-protocol.md) for post-merge data.
+
+
+**NOTICE:** For pre-merge data, another approach using [header accumulators](https://github.com/ethereum/portal-network-specs/blob/8ad5bc33cb0d4485d2eab73bf2decc43e7566a8f/history-network.md#the-header-accumulator) is necessary since
+sync committees will not provide these headers.
+
+## Goals
+
+We hope that the flat files decoder will be able to handle
+both post-merge and pre-merge data. Post-merge data can be validated
+using the Consensus Layer via the sync committee subprotocol. Pre-merge data requires
+header accumulators, so another step besides decoding the flat files is necessary.
+
+## Benchmarking
 - Run `cargo bench` in the root directory of the project
 - Benchmark results will be output to the terminal
 - Benchmark time includes reading from disk & writing output to disk
 - Results can be found in `target/criterion/report/index.html`
+
+For proper benchmarking of future improvements, fixes and features, please compare baselines.
+Refer to [the end of this section of Criterion documentation](https://bheisler.github.io/criterion.rs/book/user_guide/command_line_options.html) for more information on creating and comparing baselines.
\ No newline at end of file
diff --git a/src/lib.rs b/src/lib.rs
index 09beebb..1d45bcb 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -48,6 +48,7 @@ pub enum DecodeInput {
     Path(String),
     Reader(Box),
 }
+
 /**
  * Decode & verify flat files from a directory or a single file.
  * Input can be a directory or a file.
@@ -210,9 +211,8 @@ pub fn extract_blocks(mut reader: R) -> Result, DecodeError>
 ///
 /// # Arguments
 ///
-/// * `end_block`: Header Accumulator solution is expensive. For blocks after the merge,
-/// Ethereum consensus should be used in this scenario. This zis why the default block
-/// for this variable is the MERGE_BLOCK (block 15537393)
+/// * `end_block`: For blocks after the merge, the Ethereum sync committee should be used. This is why the default block
+/// for this param is the MERGE_BLOCK (block 15537393)
 /// * `reader`: where bytes are read from
 /// * `writer`: where bytes written to
 pub async fn stream_blocks(
diff --git a/src/main.rs b/src/main.rs
index 9251915..4a7da53 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -13,17 +13,23 @@ struct Cli {
 enum Commands {
     /// Stream data continuously
     Stream {
+        /// decompress .dbin files if they are compressed with zstd
         #[clap(short, long, default_value = "false")]
         decompress: bool,
+        /// the block to end streaming
         #[clap(short, long)]
         end_block: Option,
     },
     /// Decode files from input to output
     Decode {
+        /// input folder where flat files are stored
         #[clap(short, long)]
         input: String,
         #[clap(long)]
+        /// folder where valid headers are stored so decoded blocks can be validated against
+        /// their headers.
         headers_dir: Option,
+        /// output folder where decoded headers will be stored as .json
         #[clap(short, long)]
         output: Option,
     },