Doris Chapter4
缪翎
Speaker introduction
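• SQL statements all look much the same, so why are some fast and some slow?
• How is data computed step by step across a distributed cluster, and how is the result returned?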
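Course outline
1. Generating the query plan
• Reading the query plan printed by Explain
2. Executing the query
• How the data flow is transformed step by step by the computation and returned to the Client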
Generating the query plan
SELECT i_category, sum(ss_sales_price)
FROM item JOIN store_sales ON
ss_item_sk = i_item_sk
GROUP BY i_category
ORDER BY sum(ss_sales_price)
[Figure: plan tree for this query, topped by SortNode (ORDER BY sum(ss_sales_price)) and labeled with Plan fragments 0, 2 and 3.]
Generating the query plan
PlanNode = logical operator
PlanNodeTree = logical execution plan
[Figure: two OlapScanNode leaves (Table: item, Table: store_sales); Plan fragment 1 with a ResultSink above a SortNode.]
Generating the query plan
[Figure: the PlanNodeTree (SortNode, AggregationNode, JoinNode on `i_item_sk` = `ss_item_sk`, OlapScanNodes on Table: item and Table: store_sales) shown beside its distributed form, which adds ExchangeNodes and DataSinks between the nodes and uses a HashJoinNode for the join.]
Generating the query plan
2. Data transfer
[Figure: the two OlapScanNodes (Table: item, Table: store_sales) are placed on BE 2 and BE 3, with BE 1 above them.]
[Figure: a DataSink is attached on top of each OlapScanNode.]
DataSink + ExchangeNode
[Figure: the plan is labeled Plan Fragment 1 (top) and Plan Fragments 2 and 3 (each an OlapScanNode with its DataSink).]
Generating the query plan
SQL
SELECT i_category, sum(ss_sales_price)
FROM item JOIN store_sales ON
ss_item_sk = i_item_sk
GROUP BY i_category
ORDER BY sum(ss_sales_price)
PlanNodeTree
SortNode
  ORDER BY sum(ss_sales_price)
AggregationNode
  GROUP BY `i_category`
  sum(`ss_sales_price`)
JoinNode
  `i_item_sk` = `ss_item_sk`
OlapScanNode (Table: item)   OlapScanNode (Table: store_sales)
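A minimal sketch of how the PlanNodeTree above can be modeled, assuming each PlanNode only carries a name, a detail string and its children. The Python class `PlanNode` and the function `render` are illustrative names and are not Doris's own classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Illustrative stand-in for a logical operator (not the Doris class)."""
    name: str
    detail: str = ""
    children: List["PlanNode"] = field(default_factory=list)


def render(node: PlanNode, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, parent before children."""
    label = f"{node.name} [{node.detail}]" if node.detail else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The tree for the example query of this section.
plan = PlanNode("SortNode", "ORDER BY sum(ss_sales_price)", [
    PlanNode("AggregationNode", "GROUP BY i_category", [
        PlanNode("JoinNode", "i_item_sk = ss_item_sk", [
            PlanNode("OlapScanNode", "Table: item"),
            PlanNode("OlapScanNode", "Table: store_sales"),
        ]),
    ]),
])
```

Calling `print("\n".join(render(plan)))` prints one operator per line, with the scans indented deepest because they are the leaves.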
Generating the query plan
[Figure: the same SQL and PlanNodeTree, now divided into fragments. Plan fragment 0: SortNode. Plan fragment 1: AggregationNode and JoinNode. Plan fragment 2: OlapScanNode (Table: item). Plan fragment 3: OlapScanNode (Table: store_sales).]
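The split into fragments can be sketched as cutting the tree wherever a node's output has to travel to another machine. The sketch below assumes the cut points are marked by hand with a hypothetical `ships_output` flag; how a real planner chooses them is not covered here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # Hypothetical flag: True when this node's output is shipped to its
    # parent over the network (DataSink below, ExchangeNode above).
    ships_output: bool = False


def split_into_fragments(root: Node) -> Dict[int, List[str]]:
    """Assign every node to a fragment id; a new fragment starts at each cut."""
    fragments: Dict[int, List[str]] = {}
    next_id = 1  # fragment 0 is reserved for the root

    def visit(node: Node, frag_id: int) -> None:
        nonlocal next_id
        if node.ships_output:
            frag_id = next_id
            next_id += 1
        fragments.setdefault(frag_id, []).append(node.name)
        for child in node.children:
            visit(child, frag_id)

    visit(root, 0)
    return fragments


tree = Node("SortNode", [
    Node("AggregationNode", ships_output=True, children=[
        Node("JoinNode", [
            Node("OlapScanNode(item)", ships_output=True),
            Node("OlapScanNode(store_sales)", ships_output=True),
        ]),
    ]),
])
```

With these hand-placed cuts, `split_into_fragments(tree)` groups the nodes the same way as the fragment labels in this section.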
Viewing the query plan
• Desc graph
• Explain
• Desc verbose
DESC GRAPH
SELECT i_category, sum(ss_sales_price)
FROM item JOIN store_sales ON
ss_item_sk = i_item_sk
GROUP BY i_category
ORDER BY sum(ss_sales_price);
EXPLAIN
SELECT i_category, sum(ss_sales_price)
FROM item JOIN store_sales ON
ss_item_sk = i_item_sk
GROUP BY i_category
ORDER BY sum(ss_sales_price);
Course outline
1. Generating the query plan
2. Executing the query
• Assignment and dispatch
• The complete data flow
Executing the query
1. Assignment and dispatch
2. The complete data flow
Coordinator.java
1. prepare
2. scheduler
• computeScanRange
• assignFragment
3. send
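The three Coordinator steps can be sketched as follows. This is a simplified illustration, not the logic of Coordinator.java: the function names only echo the step names above, and the tablet locations, BE names and the rule "scan fragments run where their tablets live" are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def prepare(fragment_ids: List[int]) -> Dict[int, List[str]]:
    """1. prepare: create an empty execution record (list of BEs) per fragment."""
    return {fid: [] for fid in fragment_ids}


def compute_scan_range(fragment_tablets: Dict[int, List[str]],
                       tablet_location: Dict[str, str]) -> Dict[int, Dict[str, List[str]]]:
    """2a. For each scan fragment, group its tablets by the BE that stores them."""
    ranges: Dict[int, Dict[str, List[str]]] = {}
    for fid, tablets in fragment_tablets.items():
        per_be: Dict[str, List[str]] = {}
        for tablet in tablets:
            per_be.setdefault(tablet_location[tablet], []).append(tablet)
        ranges[fid] = per_be
    return ranges


def assign_fragment(params: Dict[int, List[str]],
                    ranges: Dict[int, Dict[str, List[str]]],
                    default_be: str) -> None:
    """2b. Scan fragments go to the BEs holding their data; others to default_be."""
    for fid in params:
        params[fid] = sorted(ranges[fid]) if fid in ranges else [default_be]


def send(params: Dict[int, List[str]]) -> List[Tuple[str, int]]:
    """3. send: stand-in for the RPC step, listing (BE, fragment) pairs to dispatch."""
    return sorted((be, fid) for fid, bes in params.items() for be in bes)


def coordinate(fragment_ids, fragment_tablets, tablet_location, default_be):
    params = prepare(fragment_ids)
    ranges = compute_scan_range(fragment_tablets, tablet_location)
    assign_fragment(params, ranges, default_be)
    return send(params)
```

For example, with fragment 2 reading tablet `t1` on `BE2` and fragment 3 reading `t2` on `BE3` and `t3` on `BE2`, fragment 3 is dispatched to both BEs while fragment 1 goes to the default BE.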
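Executing the query
1. Assignment and dispatch
[Figure: the FE at the top; the data flow starts from the disks at the bottom.]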
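Executing the query
1. Execution flow of a single Fragment
2. Data exchange between Fragments
3. Data exchange between the FE and the top Fragment
4. The FE returns the data to the front end for display
[Figure: data flows from the disks through Plan fragments 2 and 3 into Plan fragment 1, and from there to the FE.]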
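Executing the query
1. Execution flow of a single Fragment
PlanFragmentExecutor
1. prepare
2. open
3. close
[Figure: in Plan fragment 1, PlanFragmentExecutor holds _sink (the DataSink) and _plan (the Hash Join Node over Exchange Node1 and Exchange Node2); results leave the fragment through send.]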
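The prepare / open / close lifecycle can be sketched as below. It assumes that `open` is where the executor pulls batches from the plan root and pushes them to the sink with `send`; the classes `ListSource` and `CollectSink` are hypothetical helpers for the example, and none of this is the BE source code.

```python
class ListSource:
    """Hypothetical plan root that hands out pre-built batches."""

    def __init__(self, batches):
        self._batches = list(batches)
        self.is_open = False

    def open(self):
        self.is_open = True

    def get_next(self):
        # Return the next batch, or None when the input is exhausted.
        return self._batches.pop(0) if self._batches else None

    def close(self):
        self.is_open = False


class CollectSink:
    """Hypothetical sink that records every batch it is sent."""

    def __init__(self):
        self.sent = []

    def send(self, batch):
        self.sent.append(batch)


class FragmentExecutor:
    def prepare(self, plan, sink):
        # 1. prepare: wire up the plan tree (_plan) and the data sink (_sink).
        self._plan = plan
        self._sink = sink

    def open(self):
        # 2. open: open the plan, then pull batches and send them downstream.
        self._plan.open()
        while True:
            batch = self._plan.get_next()
            if batch is None:
                break
            self._sink.send(batch)

    def close(self):
        # 3. close: release the plan's resources.
        self._plan.close()
```

The pull loop is the point of the design: the executor only asks the root for the next batch, and each node in turn asks its children.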
Executing the query
1. Execution flow of a single Fragment
• HashJoinNode->get_next()
open: preparation
get_next: returns one batch of results
1. Read one batch from the left child
2. Use the hash table to find the matching rows
3. Concatenate the left and right rows into out_row and put it into out_batch
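The three get_next steps can be sketched in a few lines. This sketch assumes that the "preparation" done in `open` is building the hash table from the right child, and it represents rows as Python dicts; the class name `HashJoinSketch` is illustrative.

```python
from collections import defaultdict


class HashJoinSketch:
    def __init__(self, left_batches, right_rows, left_key, right_key):
        self._left_batches = list(left_batches)
        self._right_rows = list(right_rows)
        self._left_key = left_key
        self._right_key = right_key
        self._table = defaultdict(list)

    def open(self):
        # Assumed preparation: build the hash table from the right child.
        for row in self._right_rows:
            self._table[row[self._right_key]].append(row)

    def get_next(self):
        # 1. Read one batch from the left child (None means no more input).
        if not self._left_batches:
            return None
        batch = self._left_batches.pop(0)
        out_batch = []
        for left_row in batch:
            # 2. Probe the hash table for rows with the same join key.
            for right_row in self._table.get(left_row[self._left_key], []):
                # 3. Concatenate left and right rows into out_row.
                out_row = {**left_row, **right_row}
                out_batch.append(out_row)
        return out_batch
```

A left batch with no matching rows yields an empty out_batch, while `None` is reserved for the end of the input.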
Executing the query
2. Data exchange between Fragments
[Figure: two Exchange Nodes, each with its own receiver.]
Main logic of send
1. Compute the hash value of the row
2. Put the row into the corresponding channel
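The two steps of send can be sketched as hash partitioning. The sketch assumes the channel is chosen as the hash of the row's key modulo the number of channels, and it uses `zlib.crc32` purely as a stand-in hash function; the function names are illustrative.

```python
import zlib
from typing import Dict, List


def channel_for(key_value: str, num_channels: int) -> int:
    # 1. Compute the hash value of the row (crc32 is only a stand-in).
    return zlib.crc32(key_value.encode("utf-8")) % num_channels


def send(rows: List[Dict], key: str, num_channels: int) -> List[List[Dict]]:
    channels: List[List[Dict]] = [[] for _ in range(num_channels)]
    for row in rows:
        # 2. Put the row into the channel selected by its hash.
        channels[channel_for(str(row[key]), num_channels)].append(row)
    return channels
```

Because the channel depends only on the key, rows with the same join key always reach the same receiver, which is what lets each receiving instance join its share of the data independently.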
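Executing the query
1. Execution flow of a single Fragment
2. Data exchange between Fragments
3. Data exchange between the FE and the top Fragment
4. The FE returns the data to the front end for display
BE, Top Fragment: Result Sink
1. MysqlResultWriter puts the batch into the row buffer, where it is cached.
FE: Coordinator
1. ResultReceiver gets the next batch from the BE
2. The batch is put into the mysql channel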
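The hand-off between the BE and the FE can be sketched as a producer and a consumer sharing a bounded buffer. This is only an analogy run inside one process: the real exchange happens over the network, and the function names, the `None` end marker and the buffer size are assumptions made for the example.

```python
import queue
import threading
from typing import List


def result_writer(batches: List[list], row_buffer: "queue.Queue") -> None:
    """BE side: cache each result batch in the row buffer."""
    for batch in batches:
        row_buffer.put(batch)  # blocks while the buffer is full
    row_buffer.put(None)       # assumed end-of-stream marker


def result_receiver(row_buffer: "queue.Queue", mysql_channel: list) -> None:
    """FE side: get the next batch and put its rows into the mysql channel."""
    while True:
        batch = row_buffer.get()
        if batch is None:
            break
        mysql_channel.extend(batch)


def run_result_path(batches: List[list], buffer_size: int = 2) -> list:
    row_buffer: "queue.Queue" = queue.Queue(maxsize=buffer_size)
    mysql_channel: list = []
    writer = threading.Thread(target=result_writer, args=(batches, row_buffer))
    writer.start()
    result_receiver(row_buffer, mysql_channel)
    writer.join()
    return mysql_channel
```

The bounded buffer means the writer pauses when the receiver falls behind, so results are streamed rather than accumulated in full.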
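Summary
1. Generating the query plan
• The logical query plan is a PlanNodeTree; each PlanNode represents one kind of operation.
2. Executing the query
[Figure: Plan fragments 2 and 3 feed Plan fragment 1, which returns the result to the FE.]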
Summary
1. Generating the query plan
2. Executing the query
• FE
  • PlanNode and its subclasses
  • PlanFragment
  • Coordinator
  • MysqlChannel
  • StmtExecutor
• BE
  • PlanNode and its subclasses
  • PlanFragmentExecutor
  • PlanFragmentMgr
  • DataSink and its subclasses
  • MysqlResultWriter
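Back to the original questions
• SQL statements all look much the same, so why are some fast and some slow?
  • Run Explain on the query: different query plans execute at different speeds.
• How is data computed step by step across a distributed cluster, and how is the result returned?
  • Generate the query plan, then execute the query plan.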
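Advanced courses
• Implementing a simple execution operator
• Simple query planning optimization
• Vectorized execution engine
• Query optimizer
• Query Profile analysis
• Query optimization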
Apache Doris live course group · Apache Doris WeChat official account
Thank You