TiDB: A Raft-based HTAP Database #22

Open
mrdrivingduck opened this issue May 17, 2023 · 6 comments
Assignees
Labels
area/database-management-system DBMS area/distributed-system Distributed system area/storage-system Storage topic/htap Hybrid Transactional and Analytical Processing

Comments

@mrdrivingduck
Owner

p3072-huang.pdf

A quick skim for technical reference. I'll dig into the details later when I have time.

@mrdrivingduck mrdrivingduck added area/database-management-system DBMS topic/htap Hybrid Transactional and Analytical Processing area/storage-system Storage area/distributed-system Distributed system labels May 17, 2023
@mrdrivingduck mrdrivingduck self-assigned this May 17, 2023
@mrdrivingduck
Owner Author

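To achieve HTAP, a good approach is to isolate OLTP and OLAP processing so that they do not interfere with each other. The most direct way to get that isolation is to maintain two copies of the data. Maintaining two copies, however, raises the questions of data consistency and freshness.

TiDB splits data by range across TiKV, the row-based store. All TiKV nodes that store a given data range form a Raft group, made up of one Leader TiKV and a number of Follower TiKVs. In addition, TiDB extends the Raft protocol with a Learner role, filled by TiFlash, the column-based store. TiFlash replicates the Raft log from the Leader TiKV asynchronously and converts the data from row format to column format. Learners take part in neither Raft log commit nor leader election.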
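The role split above can be sketched in a few lines. This is a minimal illustration, not TiDB's implementation: `Role`, `Replica`, `RaftGroup`, and `acked_index` are hypothetical names, and real Raft also checks terms before advancing the commit index. It only shows that a lagging Learner does not hold back the commit point, because the quorum is computed over voters alone.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    LEADER = "leader"
    FOLLOWER = "follower"
    LEARNER = "learner"  # e.g. a TiFlash replica; receives the log but never votes


@dataclass
class Replica:
    name: str
    role: Role
    acked_index: int = 0  # highest log index this replica has acknowledged


@dataclass
class RaftGroup:
    replicas: list

    def voters(self):
        # Learners are excluded from both log commit and leader election.
        return [r for r in self.replicas if r.role is not Role.LEARNER]

    def commit_index(self):
        # An entry is committed once a majority of voters has acknowledged it.
        acked = sorted((r.acked_index for r in self.voters()), reverse=True)
        quorum = len(acked) // 2 + 1
        return acked[quorum - 1]


group = RaftGroup([
    Replica("tikv-1", Role.LEADER, acked_index=10),
    Replica("tikv-2", Role.FOLLOWER, acked_index=10),
    Replica("tikv-3", Role.FOLLOWER, acked_index=7),
    Replica("tiflash-1", Role.LEARNER, acked_index=3),  # asynchronous, lagging
])
print(group.commit_index())
```

If the Learner were counted as a voter, the quorum would grow to three of four replicas and the commit point in this example would fall back to the slower follower's index.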

@mrdrivingduck
Owner Author

4.1 Row-based Storage (TiKV)

@mrdrivingduck
Owner Author

4.2 Column-based Storage (TiFlash)

@mrdrivingduck
Owner Author

5.1 Transactional Processing

@mrdrivingduck
Owner Author

5.2 Analytical Processing

@mrdrivingduck
Owner Author

5.3 Isolation and Coordination

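To avoid resource contention between OLTP and OLAP, TiKV and TiFlash are deployed on separate servers. OLTP workloads mainly access TiKV, and OLAP workloads mainly access TiFlash.

Because the data in TiKV and TiFlash can be regarded as consistent, the query optimizer in the SQL engine can choose among the following access paths: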

  • TiKV row scan
  • TiKV index scan
  • TiFlash column scan

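The three options differ in cost and in the ordering properties of the data they return. In general, the optimizer picks the cheapest of the three plans. The cost of each plan is computed as follows:

  • Row scan cost: tuple width * number of tuples * I/O cost + number of partitions * FileSeek
  • Column scan cost: the sum, over every column to be scanned, of (column width * number of tuples * I/O cost + number of column partitions * FileSeek)
  • Index scan cost: index width * number of tuples * I/O cost + number of partitions * FileSeek + cost of looking the rows back up in the table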
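The three formulas above translate directly into code. The sketch below is illustrative only: the function names, the `CostParams` type, and all numeric values are assumptions made for the example, not constants from TiDB or the paper.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    io_cost: float    # assumed cost of reading one unit of data
    file_seek: float  # assumed cost of one seek per partition


def row_scan_cost(tuple_width, n_tuples, n_partitions, p):
    return tuple_width * n_tuples * p.io_cost + n_partitions * p.file_seek


def column_scan_cost(columns, n_tuples, p):
    # columns: list of (column_width, n_column_partitions), one pair per scanned column
    return sum(width * n_tuples * p.io_cost + parts * p.file_seek
               for width, parts in columns)


def index_scan_cost(index_width, n_tuples, n_partitions, lookup_cost, p):
    # lookup_cost: cost of fetching the full rows back from the table
    return index_width * n_tuples * p.io_cost + n_partitions * p.file_seek + lookup_cost


def cheapest(costs):
    # costs: mapping from access-path name to estimated cost
    return min(costs, key=costs.get)


params = CostParams(io_cost=1.0, file_seek=10.0)
costs = {
    "TiKV row scan": row_scan_cost(100, 1000, 4, params),
    "TiKV index scan": index_scan_cost(16, 1000, 4, 50000.0, params),
    "TiFlash column scan": column_scan_cost([(8, 2), (4, 2)], 1000, params),
}
print(cheapest(costs), costs)
```

With these made-up numbers the query reads two narrow columns out of a wide row, so the column scan comes out cheapest. A query that touches few tuples through a selective index would shift the balance toward the index scan.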

Because the data in TiKV and TiFlash is consistent, a single query can also use both kinds of scan at once, for example when joining two tables.
