query plan optimization
The goal of query plan optimization is to find an optimal query plan within the space of equivalent query plans.
Course material
Jens Dittrich's videos
- logical optimizations: https://www.youtube.com/watch?v=Fqi0rEMSYH0
- cost-based optimizations: https://www.youtube.com/watch?v=Kb_A_RM8lVM
- playlist: https://www.youtube.com/playlist?list=PLC4UZxBVGKtcZgLCrIUenuano53pbPpf1
Query plan enumeration
The point is to explore the space of query plans.
The thesis of Amela Fejza (DBLP:phd/hal/Fejza23) should contain a good introduction to the subject.
Join ordering enumeration
bottom-up: DBLP:conf/vldb/OnoL90
top-down: DBLP:conf/sigmod/DeHaanT07
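To make the bottom-up idea concrete, here is a minimal sketch of dynamic programming over sets of relations. It is not the algorithm of either cited paper. The cost model (sum of intermediate result sizes), the function name `best_join_order`, the example cardinalities and the selectivities are all assumptions made for illustration. Cross products are allowed and get selectivity 1.

```python
from itertools import combinations

def best_join_order(cards, sel):
    """Bottom-up enumeration of bushy join trees (illustrative sketch).

    cards: {relation name: cardinality}
    sel:   {frozenset({a, b}): selectivity of the join predicate between a and b}
    Returns (cost, cardinality, plan) for the cheapest plan joining all relations.
    """
    rels = sorted(cards)
    # best[S] = (cost, cardinality, plan) of the cheapest plan found for the set S
    best = {frozenset([r]): (0.0, float(cards[r]), r) for r in rels}
    for size in range(2, len(rels) + 1):          # small sets first: bottom-up
        for subset in combinations(rels, size):
            s = frozenset(subset)
            # The result cardinality depends only on the set, not on the join order.
            card = 1.0
            for r in subset:
                card *= cards[r]
            for pair, f in sel.items():
                if pair <= s:
                    card *= f
            # Try every split of S into two non-empty parts already solved.
            candidates = []
            for k in range(1, size):
                for left in combinations(subset, k):
                    l = frozenset(left)
                    r = s - l
                    # Assumed cost model: sum of intermediate result sizes.
                    cost = best[l][0] + best[r][0] + card
                    candidates.append((cost, (best[l][2], best[r][2])))
            cost, plan = min(candidates, key=lambda c: c[0])
            best[s] = (cost, card, plan)
    return best[frozenset(rels)]

# Hypothetical statistics, chosen only for the example.
example_cards = {"A": 1000, "B": 10, "C": 100}
example_sel = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.05}
example_cost, example_card, example_plan = best_join_order(example_cards, example_sel)
```

The sketch enumerates every split of every subset, so it is exponential in the number of relations. The cited papers study how to enumerate this space efficiently.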
Related techniques
Multiple query optimization
The aim of multiple query optimization is to optimize the execution of several queries by factorizing the work that is common to several query executions.
While it offers very nice theoretical results, it is nearly impossible to apply them in practice to execute the UCQ (union of conjunctive queries) obtained from query rewriting, according to François Goasdoué, who worked on this unsuccessfully with Damian Bursztyn and Michael Thomazo for one year.
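As a toy illustration of what "factorizing common work" means, the sketch below evaluates plan trees with a cache shared across queries, so a sub-plan that appears in several queries is computed once. The plan encoding (nested tuples), the three operators and the example tables are assumptions made for this sketch. Real multiple query optimizers also have to decide which sub-plans are worth sharing, which this ignores.

```python
def evaluate(plan, tables, cache, stats):
    """Evaluate a plan tree, reusing results of sub-plans already in `cache`.

    Plans are hashable nested tuples (hypothetical encoding):
      ("scan", table_name)
      ("select", column_index, value, child_plan)
      ("join", left_plan, right_plan)   # last column of left = first column of right
    """
    if plan in cache:
        return cache[plan]
    stats["evaluated"] += 1  # count operators that are actually executed
    op = plan[0]
    if op == "scan":
        rows = list(tables[plan[1]])
    elif op == "select":
        _, col, value, child = plan
        rows = [t for t in evaluate(child, tables, cache, stats) if t[col] == value]
    elif op == "join":
        _, left, right = plan
        lrows = evaluate(left, tables, cache, stats)
        rrows = evaluate(right, tables, cache, stats)
        rows = [l + r[1:] for l in lrows for r in rrows if l[-1] == r[0]]
    else:
        raise ValueError(f"unknown operator: {op}")
    cache[plan] = rows
    return rows

# Two queries that share the same join sub-plan.
tables = {"R": [(1, "x"), (2, "y")], "S": [("x", 10), ("y", 20)]}
shared = ("join", ("scan", "R"), ("scan", "S"))
q1 = ("select", 0, 1, shared)
q2 = ("select", 2, 20, shared)

# One cache for both queries: the join is executed once.
shared_stats = {"evaluated": 0}
shared_cache = {}
r1 = evaluate(q1, tables, shared_cache, shared_stats)
r2 = evaluate(q2, tables, shared_cache, shared_stats)

# A fresh cache per query: the join is executed twice.
separate_stats = {"evaluated": 0}
evaluate(q1, tables, {}, separate_stats)
evaluate(q2, tables, {}, separate_stats)
```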
Implementations
- Orca: DBLP:conf/sigmod/SolimanAREGSCGRPWNKB14
- Apache Calcite: https://calcite.apache.org, DBLP:conf/sigmod/BegoliCHML18
Plan optimization for data integration
Cost estimation
The cost of moving data from the relation (source) is sometimes neglected (e.g. when the cost is taken to be the size of the intermediate results), but it is important in a mediation scenario. A bind-join can help reduce this cost.
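The following sketch illustrates why a bind-join can reduce the transfer cost. The `RemoteSource` class is a hypothetical stand-in for a remote source that counts how many tuples it ships. It is compared against a plain hash join that first fetches the whole remote relation. The batch size and the data are arbitrary, and the cost of sending the keys to the source is ignored here.

```python
class RemoteSource:
    """Hypothetical remote relation; the join key is the first column."""

    def __init__(self, rows):
        self.rows = rows
        self.transferred = 0  # number of tuples shipped to the mediator

    def fetch_all(self):
        self.transferred += len(self.rows)
        return list(self.rows)

    def fetch_by_keys(self, keys):
        # Stands for a query such as "... WHERE key IN (k1, k2, ...)".
        result = [r for r in self.rows if r[0] in keys]
        self.transferred += len(result)
        return result

def _probe(left, fetched):
    """Join left tuples (key = last column) with fetched tuples (key = first column)."""
    index = {}
    for r in fetched:
        index.setdefault(r[0], []).append(r)
    return [l + r[1:] for l in left for r in index.get(l[-1], [])]

def fetch_all_join(left, source):
    """Baseline: ship the whole remote relation, then join locally."""
    return _probe(left, source.fetch_all())

def bind_join(left, source, batch_size=2):
    """Ship only the remote tuples whose key occurs in the current left batch."""
    out = []
    for i in range(0, len(left), batch_size):
        batch = left[i:i + batch_size]
        keys = {l[-1] for l in batch}
        out.extend(_probe(batch, source.fetch_by_keys(keys)))
    return out

left_rows = [("a", 1), ("b", 2), ("c", 1)]
remote_rows = [(k, f"v{k}") for k in range(1, 101)]

full_source = RemoteSource(remote_rows)
full_result = fetch_all_join(left_rows, full_source)

bind_source = RemoteSource(remote_rows)
bind_result = bind_join(left_rows, bind_source)
```

The bind-join pays for this with more round trips, and a key that occurs in several batches is fetched several times. A cost model for mediation would need to account for both effects.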
Related work:
- CostFed: cost-based query optimization for SPARQL endpoint federation, DBLP:conf/i-semantics/0002PSHN18
- a survey of robust query optimization methods with respect to estimation errors, DBLP:journals/sigmod/YinHM15
Join operations
Pipelined hash join
Jens Dittrich's video on the operator: https://www.youtube.com/watch?v=3-aHFDyUUc8
- Is there a way to have pipelined hash join as a worst-case optimal join algorithm?
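For reference, here is a minimal sketch of the operator, assuming that "pipelined hash join" refers to the symmetric (double-pipelined) variant, in which both inputs are hashed and probed as tuples arrive. The tagged-stream input format and the function name are assumptions made for this sketch. It does not address the worst-case optimality question above.

```python
from collections import defaultdict

def symmetric_hash_join(stream, key_left, key_right):
    """Join two inputs that arrive interleaved, emitting results as early as possible.

    stream yields ("L", tuple) or ("R", tuple) in arrival order (assumed format).
    Each arriving tuple is inserted into its own side's hash table, then probed
    against the other side's table, so no input has to be fully read first.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    keys = {"L": key_left, "R": key_right}
    for side, t in stream:
        k = keys[side](t)
        tables[side][k].append(t)                 # build step for this tuple
        other = "R" if side == "L" else "L"
        for match in tables[other].get(k, []):    # probe step against the other side
            yield (t, match) if side == "L" else (match, t)

arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
first_column = lambda t: t[0]
joined = list(symmetric_hash_join(arrivals, first_column, first_column))
```

Since the function is a generator, the first result is produced as soon as the first matching pair has arrived. The price is that both hash tables are kept in memory.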