pdabre12 authored and Pratik Joseph Dabre committed Sep 17, 2024
1 parent 9d62ffe commit de2d633
Showing 1 changed file with 46 additions and 0 deletions.
46 changes: 46 additions & 0 deletions RFC-0009-native-tpcds-connector.md
# **RFC-0009 for Presto**

## Presto - Native TPC-DS connector

Proposers

* Pratik Joseph Dabre
* Pramod Satya

## [Related Issues]

Related issues: https://github.com/prestodb/presto/issues/22361

Related PRs: https://github.com/prestodb/presto/pull/23067

## Summary

A native TPC-DS connector capable of generating in-memory data on the fly is proposed.

## Background

Currently, Presto does not have a native implementation of the TPC-DS connector. This RFC proposes adding one, so that TPC-DS can be used as a catalog in Presto - Native.

### [Optional] Goals

1. Add a TPC-DS connector to generate TPC-DS data in Presto native.
2. Write end-to-end tests in Presto native with TPC-DS tables and conduct microbenchmarks in Velox.

### [Optional] Non-goals

## Proposed Implementation

The Presto - Native TPC-DS connector will wrap dsdgen, the data generator written in C and distributed by the TPC organization. Our implementation must therefore behave exactly like the C implementation. DuckDB already has a TPC-DS connector of its own, for which the C files were wrapped into C++ files; we will use these C++ files in our implementation.

In the C++ implementation, there are two types of tables: source tables and target tables used for generation. Source table files are prefixed with "s_", while target table files are prefixed with "w_". For instance, there may be files like "s_call_center.c" and "w_call_center.c". Source tables appear to be used only when dsdgen is run with an update flag; the exact function of this flag and the purpose of the source tables have not yet been explored. For now, we focus solely on implementing functionality for the target tables (w_ tables).

The target table files prefixed with "w_" rely on helper functions named `append_row_start` and `append_row_end`, which mark the beginning and end of each generated row; we need to implement these ourselves. For each column value, a type-specific `append_` function is used, chosen according to the column's data type in the table schema.
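The shape of these helpers can be illustrated with a minimal sketch. Only the names `append_row_start` and `append_row_end` come from this RFC; the `AppendInfo` struct, the `append_integer`, `append_varchar` and `append_null` helpers, their signatures, and the `mk_sample_row` stand-in for a `w_` table generator are assumptions made for illustration, not the actual connector or dsdgen API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical in-memory sink for generated rows (not part of the RFC).
using Value = std::variant<std::monostate, std::int64_t, std::string>;
using Row = std::vector<Value>;

struct AppendInfo {
  std::vector<Row> rows;  // completed rows
  Row current;            // row currently being built
  bool inRow = false;     // true between append_row_start and append_row_end
};

// Called before the first column of a row is appended.
void append_row_start(AppendInfo& info) {
  info.current.clear();
  info.inRow = true;
}

// Called after the last column of a row is appended.
void append_row_end(AppendInfo& info) {
  assert(info.inRow);
  info.rows.push_back(std::move(info.current));
  info.current.clear();
  info.inRow = false;
}

// Assumed type-specific helpers: one per column data type.
void append_integer(AppendInfo& info, std::int64_t value) {
  assert(info.inRow);
  info.current.emplace_back(value);
}

void append_varchar(AppendInfo& info, const std::string& value) {
  assert(info.inRow);
  info.current.emplace_back(value);
}

void append_null(AppendInfo& info) {
  assert(info.inRow);
  info.current.emplace_back(std::monostate{});
}

// Hypothetical stand-in for a w_ table row generator with the
// schema (integer key, varchar name).
void mk_sample_row(AppendInfo& info, std::int64_t key) {
  append_row_start(info);
  append_integer(info, key);
  append_varchar(info, "center_" + std::to_string(key));
  append_row_end(info);
}
```

Calling `mk_sample_row` once per key fills `info.rows` with one two-column row per call. In a real connector the sink would more likely write into columnar vectors than into a vector of variants; the variant is used here only to keep the sketch short.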

A new TPC-DS config, `tpcds.toggle-char-to-varchar`, will be added to convert char columns to varchar, since Presto - Native does not support the char data type. Applying the conversion only when required keeps results consistent between Presto - Java and Presto - Native.
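As a rough sketch of how such a toggle might affect column types, the function below maps `char(n)` to `varchar(n)` when the flag is on and leaves every other type unchanged. The `TypeKind` enum, the `ColumnType` struct, and the `resolveColumnType` and `toString` functions are hypothetical names introduced here, and the assumption that a `true` value means "convert" is not confirmed by this RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified column type descriptor (not the Presto or Velox type system).
enum class TypeKind { kInteger, kChar, kVarchar };

struct ColumnType {
  TypeKind kind;
  std::size_t length;  // n for char(n) / varchar(n); 0 for other types
};

// Returns the type to expose for a column. When toggleCharToVarchar is true
// (assumed meaning of tpcds.toggle-char-to-varchar=true), char(n) becomes
// varchar(n); all other types pass through unchanged.
ColumnType resolveColumnType(ColumnType declared, bool toggleCharToVarchar) {
  // A char(n) column is expected to carry a positive length.
  assert(declared.kind != TypeKind::kChar || declared.length > 0);
  if (toggleCharToVarchar && declared.kind == TypeKind::kChar) {
    return ColumnType{TypeKind::kVarchar, declared.length};
  }
  return declared;
}

// Renders a type in SQL-like notation, for display or debugging.
std::string toString(const ColumnType& type) {
  switch (type.kind) {
    case TypeKind::kInteger:
      return "integer";
    case TypeKind::kChar:
      return "char(" + std::to_string(type.length) + ")";
    case TypeKind::kVarchar:
      return "varchar(" + std::to_string(type.length) + ")";
  }
  return "unknown";
}
```

With the toggle on, a column declared as `char(16)` would be reported as `varchar(16)`; with it off, the declared type is kept as is.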

## Adoption Plan

## Test Plan

Native end-to-end tests are added in https://github.com/prestodb/presto/pull/23067.
Future enhancements will include adding SpeedTest and ConnectorTest to the Velox repository.
