Create RFC-0009-native-tpcds-connector.md
Showing 1 changed file with 46 additions and 0 deletions.
# **RFC-0009 for Presto**

## Presto - Native TPC-DS connector

Proposers

* Pratik Joseph Dabre
* Pramod Satya

## [Related Issues]

Related issues: https://github.com/prestodb/presto/issues/22361

Related PRs: https://github.com/prestodb/presto/pull/23067

## Summary

This RFC proposes a native TPC-DS connector that generates data in memory on the fly.

## Background

Currently, Presto does not have a native implementation of the TPC-DS connector. This RFC proposes adding one so that it can be used as a Presto - Native catalog.

### [Optional] Goals

1. Add a TPC-DS connector to generate TPC-DS data in Presto native.
2. Write end-to-end tests in Presto native with TPC-DS tables and conduct microbenchmarks in Velox.

### [Optional] Non-goals

## Proposed Implementation

The Presto - Native TPC-DS connector will wrap dsdgen, the data generator written in C and distributed by the TPC organization. This means our implementation must behave exactly like the C implementation. DuckDB already has its own TPC-DS connector, in which the C files have been wrapped into C++ files; we are going to use these C++ files in our implementation.

The C++ implementation uses two types of tables for generation: source tables and target tables. Source table files are prefixed with "s_", while target table files are prefixed with "w_". For instance, there may be files like "s_call_center.c" and "w_call_center.c". Source tables appear to be used only when dsdgen is run with an update flag; the exact function of this flag and the purpose of the source tables have not yet been explored. Currently, our focus is solely on implementing functionality for the target tables (the w_ tables).

The target table files (prefixed with "w_") rely on helper functions that we need to implement ourselves, namely `append_row_start` and `append_row_end`, which help with row generation. In addition, there are `append_` functions for each data type to be appended, and which of them a table uses depends on its schema.

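The row-building helpers described above could be sketched roughly as follows. This is a minimal illustration only: the `TableAppender` type, the `Cell` alias, the function signatures, the `append_integer`, `append_varchar`, and `append_null` helpers, and the sample values are all assumptions made for this example, not the signatures used by dsdgen, DuckDB, or Velox.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical cell representation: null, integer, or string.
using Cell = std::variant<std::monostate, int64_t, std::string>;

// Hypothetical in-memory sink that the generator code writes into.
struct TableAppender {
  std::vector<std::vector<Cell>> rows;  // completed rows
  std::vector<Cell> current;            // row currently being built
  bool inRow = false;
};

// Called before the first column of a row is appended.
void append_row_start(TableAppender& appender) {
  appender.current.clear();
  appender.inRow = true;
}

// Called after the last column of a row; commits the row.
void append_row_end(TableAppender& appender) {
  appender.rows.push_back(std::move(appender.current));
  appender.current.clear();
  appender.inRow = false;
}

// One append_ function per data type (names are illustrative).
void append_integer(TableAppender& appender, int64_t value) {
  appender.current.emplace_back(value);
}

void append_varchar(TableAppender& appender, const std::string& value) {
  appender.current.emplace_back(value);
}

void append_null(TableAppender& appender) {
  appender.current.emplace_back(std::monostate{});
}

// Usage: how a w_ table generator might emit one three-column row.
TableAppender generateSampleRow() {
  TableAppender appender;
  append_row_start(appender);
  append_integer(appender, 1);
  append_varchar(appender, "sample name");
  append_null(appender);
  append_row_end(appender);
  return appender;
}
```

In a real connector the sink would presumably write into columnar buffers rather than a vector of variants; the row-oriented structure here is only meant to show the call order.
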
A new TPC-DS config, `tpcds.toggle-char-to-varchar`, will be added to toggle char columns to varchar, addressing the lack of support for the char data type in Presto - Native. Enabling it when required ensures consistency between Presto - Java and Presto - Native.

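As a rough sketch of the effect of `tpcds.toggle-char-to-varchar`, the type mapping might look like the following. The `TypeKind`, `ColumnType`, `resolveColumnType`, and `toString` names are hypothetical and do not come from Presto or Velox. The example also assumes that the declared length is carried over unchanged (char(n) becomes varchar(n)), which the RFC does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical type model for illustration only.
enum class TypeKind { kChar, kVarchar, kBigint };

struct ColumnType {
  TypeKind kind;
  int32_t length;  // character length; 0 for non-character types
};

// Assumption: with the toggle on, char(n) is reported as varchar(n);
// all other types are returned unchanged.
ColumnType resolveColumnType(ColumnType declared, bool toggleCharToVarchar) {
  if (toggleCharToVarchar && declared.kind == TypeKind::kChar) {
    return ColumnType{TypeKind::kVarchar, declared.length};
  }
  return declared;
}

// Renders a type the way it would appear in a table schema.
std::string toString(const ColumnType& type) {
  switch (type.kind) {
    case TypeKind::kChar:
      return "char(" + std::to_string(type.length) + ")";
    case TypeKind::kVarchar:
      return "varchar(" + std::to_string(type.length) + ")";
    case TypeKind::kBigint:
      return "bigint";
  }
  return "unknown";
}
```

With the toggle off, the declared types pass through untouched.
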
## Adoption Plan

## Test Plan

Native end-to-end tests are added in https://github.com/prestodb/presto/pull/23067.
Future enhancements will include adding SpeedTest and ConnectorTest to the Velox repository.