Note: The project repo is kept private per the course requirements. If you want the code, please contact me via email.
Specification
The BusTub DBMS is used as the skeleton code. In this project, I add support for query execution to the database system by creating executors that take query plans and execute them. The following operations are supported:
- Access Methods: Sequential Scan
- Modifications: Insert, Update, Delete
- Miscellaneous: Nested Loop Join, Hash Join, Aggregation, Limit, Distinct
We use the iterator query processing model (the Volcano model). In this model, every query plan executor implements a `Next` function. When the DBMS invokes an executor's `Next` function, it returns either a single tuple or an indicator that no more tuples are available. In the implementation, each executor implements a loop that keeps calling `Next` on its children to retrieve tuples and process them one by one. In BusTub's implementation, the `Next` function returns a record identifier (`RID`) along with the tuple. The correctness of this project depends on our previous implementations of the buffer pool manager and the extendible hash table.
This project assumes a single-threaded context.
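To make the iterator contract concrete, below is a minimal sketch of the executor interface, modeled on BusTub's `AbstractExecutor` (simplified; the real class also carries an executor context and exposes the output schema):

```cpp
// Simplified Volcano-style executor interface (after BusTub's AbstractExecutor).
class AbstractExecutor {
 public:
  virtual ~AbstractExecutor() = default;

  // Prepare internal state, e.g. initialize children and position iterators.
  virtual void Init() = 0;

  // Produce the next tuple and its record identifier.
  // Returns true if a tuple was emitted, false once the executor is exhausted.
  virtual bool Next(Tuple *tuple, RID *rid) = 0;
};

// A parent executor (or the ExecutionEngine) drives execution with a pull loop:
//   executor->Init();
//   Tuple tuple;
//   RID rid;
//   while (executor->Next(&tuple, &rid)) { /* consume one tuple at a time */ }
```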
Task 1 - Executors
We implement nine executors. For each query plan operator type, there is a corresponding executor object that implements the `Init` and `Next` methods. The `Init` method initializes the internal state of the operator. The `Next` method provides the iterator interface that returns a tuple and its corresponding RID on each invocation.
The interfaces of executors are defined in the following header files:
src/include/execution/executors/seq_scan_executor.h
src/include/execution/executors/insert_executor.h
src/include/execution/executors/update_executor.h
src/include/execution/executors/delete_executor.h
src/include/execution/executors/nested_loop_join_executor.h
src/include/execution/executors/hash_join_executor.h
src/include/execution/executors/aggregation_executor.h
src/include/execution/executors/limit_executor.h
src/include/execution/executors/distinct_executor.h
Several other executors, such as `index_scan_executor` and `nested_index_join_executor`, already exist in the repository.
Plan nodes are the individual elements that compose a query plan. Each plan node defines information specific to the operator that it represents. Each executor is responsible for processing a single plan node type. The plan nodes are defined in the following header files:
src/include/execution/plans/seq_scan_plan.h
src/include/execution/plans/insert_plan.h
src/include/execution/plans/update_plan.h
src/include/execution/plans/delete_plan.h
src/include/execution/plans/nested_loop_join_plan.h
src/include/execution/plans/hash_join_plan.h
src/include/execution/plans/aggregation_plan.h
src/include/execution/plans/limit_plan.h
src/include/execution/plans/distinct_plan.h
The `ExecutionEngine` helper class converts the input query plan to a query executor and executes it until all results have been produced. It also catches any exceptions thrown during the execution of a query plan.
Some executors may modify the table as well as the index. We use the extendible hash table implementation from Project 2 as the underlying data structure for all indexes in this project.
BusTub maintains an internal catalog to keep track of meta-data about the database. In this project, some executors will interact with the system catalog to query information regarding tables, indexes, and their schemas. The catalog implementation is in `src/include/catalog.h`.
Sequential Scan
The `SeqScanExecutor` iterates over a table and returns its tuples one at a time (per the Volcano model's requirement). A sequential scan operation is specified by a `SeqScanPlanNode`. The plan node specifies the table over which the scan is performed (identified by a `table_oid_t`, which is defined in the system catalog file), along with an optional predicate (e.g., any filter predicate from a WHERE clause, such as `a > 100`).
Each time `Next` is called, a single record is scanned and the optional predicate is evaluated against it; records that fail the predicate are skipped.
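As an illustration, here is a minimal sketch of how `SeqScanExecutor::Next` can implement this. The member names `iter_` (a `TableIterator` positioned by `Init()`) and `table_info_` (the catalog's `TableInfo` for the scanned table) are assumptions of the sketch, not necessarily the actual implementation:

```cpp
// Sketch of SeqScanExecutor::Next: advance the table iterator until a tuple
// passes the (optional) predicate, or the table is exhausted.
bool SeqScanExecutor::Next(Tuple *tuple, RID *rid) {
  const AbstractExpression *predicate = plan_->GetPredicate();
  while (iter_ != table_info_->table_->End()) {
    Tuple raw = *iter_;
    RID raw_rid = raw.GetRid();
    ++iter_;
    // A null predicate accepts every tuple.
    if (predicate == nullptr ||
        predicate->Evaluate(&raw, &table_info_->schema_).GetAs<bool>()) {
      *tuple = raw;
      *rid = raw_rid;
      return true;
    }
  }
  return false;  // Table exhausted: no more tuples.
}
```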
Insert
The `InsertExecutor` inserts tuples into a table and updates its indexes. The system supports two types of insertions.
- The first type is a raw insert, in which the values to be inserted are embedded directly inside the plan node.
- The second type takes the values to be inserted from a child executor. For example, we can have an `InsertPlanNode` with a `SeqScanPlanNode` as its child to implement an `INSERT INTO ... SELECT ...`
We assume that the `InsertExecutor` is always at the root of the query plan and does not produce a result set; it only modifies the table and indexes in place.
Multiple rows may be inserted sequentially in a single execution plan. We perform the insertion first, followed by the index update.
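A minimal sketch of this flow follows; the member names (`table_info_`, `child_executor_`) and the single-call structure are assumptions of the sketch rather than the exact implementation:

```cpp
// Sketch of InsertExecutor::Next: perform every insertion, then report that
// no result tuples are produced (the executor is the root of the plan).
bool InsertExecutor::Next(Tuple *tuple, RID *rid) {
  Transaction *txn = exec_ctx_->GetTransaction();
  auto insert_one = [&](const Tuple &t) {
    RID new_rid;
    // Insert into the table heap first...
    table_info_->table_->InsertTuple(t, &new_rid, txn);
    // ...then update every index on the table.
    for (IndexInfo *index : exec_ctx_->GetCatalog()->GetTableIndexes(table_info_->name_)) {
      Tuple key = t.KeyFromTuple(table_info_->schema_, index->key_schema_,
                                 index->index_->GetKeyAttrs());
      index->index_->InsertEntry(key, new_rid, txn);
    }
  };

  if (plan_->IsRawInsert()) {
    // Raw insert: the values are embedded in the plan node itself.
    for (const std::vector<Value> &values : plan_->RawValues()) {
      insert_one(Tuple(values, &table_info_->schema_));
    }
  } else {
    // Otherwise pull every tuple to insert from the child executor.
    Tuple child_tuple;
    RID child_rid;
    while (child_executor_->Next(&child_tuple, &child_rid)) {
      insert_one(child_tuple);
    }
  }
  return false;  // An insert produces no result set.
}
```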
Update
The `UpdateExecutor` modifies existing tuples in a specified table and updates the indexes. It always has a child executor that tells it which tuples to update (identified by `RID`). In our implementation, the `UpdatePlanNode` has at most one child, a `SeqScanPlanNode`.
The `GenerateUpdateTuple` method defined in the `UpdateExecutor` class helps construct an updated tuple based on the provided update attributes.
The workflow for the `UpdateExecutor` is as follows (see the sketch after the list):
- Initialize the `child_executor_`
- Run the `child_executor_` and gather the results
- Generate updated tuples based on the `update_attrs` given in the `UpdatePlanNode`
- Update the records by invoking the `UpdateTuple` method on the specified `TableHeap`
- Update the indexes: there is no direct way to update an index entry in place, so delete the old entry first, then insert the new one
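A minimal sketch of this workflow (member names such as `table_info_` are assumptions; `GenerateUpdateTuple` is the helper described above):

```cpp
// Sketch of UpdateExecutor::Next: update each tuple identified by the child,
// then fix up every index via delete-then-insert.
bool UpdateExecutor::Next(Tuple *tuple, RID *rid) {
  Transaction *txn = exec_ctx_->GetTransaction();
  Tuple old_tuple;
  RID old_rid;
  while (child_executor_->Next(&old_tuple, &old_rid)) {
    // Build the updated tuple from the plan node's update attributes.
    Tuple new_tuple = GenerateUpdateTuple(old_tuple);
    // Update the record in place in the table heap.
    table_info_->table_->UpdateTuple(new_tuple, old_rid, txn);
    // No in-place index update exists: delete the old entry, insert the new one.
    for (IndexInfo *index : exec_ctx_->GetCatalog()->GetTableIndexes(table_info_->name_)) {
      Tuple old_key = old_tuple.KeyFromTuple(table_info_->schema_, index->key_schema_,
                                             index->index_->GetKeyAttrs());
      Tuple new_key = new_tuple.KeyFromTuple(table_info_->schema_, index->key_schema_,
                                             index->index_->GetKeyAttrs());
      index->index_->DeleteEntry(old_key, old_rid, txn);
      index->index_->InsertEntry(new_key, old_rid, txn);
    }
  }
  return false;  // Like Insert, Update produces no result set.
}
```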
Delete
The `DeleteExecutor` deletes tuples from a table and removes their entries from all table indexes. Tuples to be deleted are pulled from a child executor; in our implementation, this is a `SeqScanExecutor` instance.
We assume that the `DeleteExecutor` is always the root of the query plan.
Note: As we learn in a database course, most modern DBMSs implement record deletion lazily, meaning that the disk space is not freed immediately. Instead, the records are marked as deleted and the space is reclaimed afterwards.
The workflow for the `DeleteExecutor` is as follows (see the sketch after the list):
- Initialize the `child_executor_`
- Run the `child_executor_` and gather the results
- Delete records from the table, identified by `RID`, by invoking the `MarkDelete` method of the `TableHeap` object
- Delete the corresponding entries from the indexes by invoking the `DeleteEntry` method of each `Index` object
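A minimal sketch of this workflow (same assumptions on member names as the sketches above):

```cpp
// Sketch of DeleteExecutor::Next: mark each tuple identified by the child as
// deleted, then remove its entries from all indexes.
bool DeleteExecutor::Next(Tuple *tuple, RID *rid) {
  Transaction *txn = exec_ctx_->GetTransaction();
  Tuple del_tuple;
  RID del_rid;
  while (child_executor_->Next(&del_tuple, &del_rid)) {
    // Lazy deletion: mark the record deleted rather than freeing space now.
    table_info_->table_->MarkDelete(del_rid, txn);
    // Remove the corresponding entry from every index on the table.
    for (IndexInfo *index : exec_ctx_->GetCatalog()->GetTableIndexes(table_info_->name_)) {
      Tuple key = del_tuple.KeyFromTuple(table_info_->schema_, index->key_schema_,
                                         index->index_->GetKeyAttrs());
      index->index_->DeleteEntry(key, del_rid, txn);
    }
  }
  return false;
}
```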
Nested Loop Join
The `NestedLoopJoinExecutor` implements a basic nested loop join that combines the tuples from its two child executors.
For each tuple in the outer table, the algorithm checks every tuple in the inner table and emits an output tuple if the join predicate (the `ON` clause) is satisfied. A pair of tuples is joined if `predicate(tuple) = true` or if `predicate == nullptr`.
The workflow for the `NestedLoopJoinExecutor` is as follows (see the sketch after the list):
- Initialize the `left_executor_`
- Initialize the `right_executor_`
- Run `left_executor_->Next()`; for each tuple fetched, iterate through all of the tuples produced by the `right_executor_`
- For each pair of tuples, evaluate the `Predicate` of the query plan. The predicate of a join query corresponds to the `ON` clause, such as:

```sql
select *
from table1 inner join table2
on table1.colA = table2.colA;
```

- If the predicate evaluates to true, compute the output tuple: iterate through all the columns of the query plan's output schema and, for each column, evaluate its expression to produce the value for that column. Finally, concatenate all column values together to form a single output tuple.
- On each call to `Next`, advance an iterator to fetch the next tuple, until the end.
Note: We don't compute all the join results before emitting. Instead, each `Next` call searches for the next pair of tuples satisfying the join predicate, so that the join can be pipelined.
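A minimal sketch of such a pipelined `Next` (the bookkeeping members `left_tuple_`, `left_rid_`, and `left_valid_` are assumptions of the sketch, initialized in `Init()` with `left_valid_ = false`):

```cpp
// Sketch of NestedLoopJoinExecutor::Next: emit one matching pair per call.
bool NestedLoopJoinExecutor::Next(Tuple *tuple, RID *rid) {
  Tuple right_tuple;
  RID right_rid;
  while (true) {
    // Advance the inner scan; when it is exhausted (or on the very first
    // call), pull the next outer tuple and rewind the inner side.
    while (!left_valid_ || !right_executor_->Next(&right_tuple, &right_rid)) {
      if (!left_executor_->Next(&left_tuple_, &left_rid_)) {
        return false;  // Outer side exhausted: the join is done.
      }
      left_valid_ = true;
      right_executor_->Init();  // Restart the inner scan for this outer tuple.
    }
    const AbstractExpression *predicate = plan_->Predicate();
    if (predicate == nullptr ||
        predicate->EvaluateJoin(&left_tuple_, left_executor_->GetOutputSchema(),
                                &right_tuple, right_executor_->GetOutputSchema())
            .GetAs<bool>()) {
      // Build the output tuple column-by-column from the output schema.
      std::vector<Value> values;
      for (const Column &col : GetOutputSchema()->GetColumns()) {
        values.push_back(col.GetExpr()->EvaluateJoin(
            &left_tuple_, left_executor_->GetOutputSchema(),
            &right_tuple, right_executor_->GetOutputSchema()));
      }
      *tuple = Tuple(values, GetOutputSchema());
      return true;
    }
  }
}
```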
Hash Join
The `HashJoinExecutor` implements a hash join operation that combines the tuples from two child executors using a hash table.
In our implementation, we make the simplifying assumption that the hash table fits entirely in memory, which means that we don’t need to worry about spilling temporary partitions of the build-side table to disk.
The `HashJoinPlanNode` uses the `LeftJoinKeyExpression()` and `RightJoinKeyExpression()` accessors to get the join keys for the hash table. The left join key and right join key are both of type `ColumnValueExpression`.
Recall that the build phase of a hash join is a pipeline breaker, meaning that the whole hash table has to be built before the probe phase can start and `Next` can be called. So the build phase has to be performed in `HashJoinExecutor::Init()`. The probe phase, however, can be pipelined.
There are two phases for the basic hash join algorithm:
- Phase #1: Build
  - Scan the outer relation and populate a hash table, using the hash function on the join attributes.
  - Each hash table entry is a `<Key, Value>` pair, where the `Key` is the hashed value of the attributes to be joined on, and the `Value` may vary per implementation. In my implementation, it is a collection of full tuples. We could instead store a tuple identifier (an `RID`, for example) to reduce the space overhead; however, that adds an extra lookup into the outer table for each comparison.
- Phase #2: Probe
  - Scan the inner relation and use the same hash function on each tuple to jump to a location in the previously built hash table, then do a brute-force search within that bucket to find a match.
Optimization approach: We may use a Bloom filter to pre-filter keys before actually probing.
Note: Currently, my implementation can only perform a hash join on a single join attribute. **It is not my fault. That is due to the signature of the `HashJoinPlanNode` constructor, which accepts only two `AbstractExpression` pointers instead of vectors of `AbstractExpression`.**
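Putting the two phases together, here is a minimal sketch. The member `hash_table_` (assumed to be a `std::unordered_map<hash_t, std::vector<Tuple>>`) and the collision handling are illustrative choices, not the only possible design:

```cpp
// Phase #1 (Build) is a pipeline breaker, so it runs entirely in Init().
void HashJoinExecutor::Init() {
  left_executor_->Init();
  right_executor_->Init();
  Tuple tuple;
  RID rid;
  // Hash every build-side tuple on its join key and keep the full tuples.
  while (left_executor_->Next(&tuple, &rid)) {
    Value key = plan_->LeftJoinKeyExpression()->Evaluate(
        &tuple, left_executor_->GetOutputSchema());
    hash_table_[HashUtil::HashValue(&key)].push_back(tuple);
  }
}

// Phase #2 (Probe) is pipelined: each call pulls probe-side tuples until one
// finds a build-side match.
bool HashJoinExecutor::Next(Tuple *tuple, RID *rid) {
  Tuple right_tuple;
  RID right_rid;
  while (right_executor_->Next(&right_tuple, &right_rid)) {
    Value probe_key = plan_->RightJoinKeyExpression()->Evaluate(
        &right_tuple, right_executor_->GetOutputSchema());
    auto bucket = hash_table_.find(HashUtil::HashValue(&probe_key));
    if (bucket == hash_table_.end()) {
      continue;  // No build-side tuple hashes to this key.
    }
    // Brute-force search within the bucket, re-comparing the actual key
    // values to rule out hash collisions.
    for (const Tuple &left_tuple : bucket->second) {
      Value build_key = plan_->LeftJoinKeyExpression()->Evaluate(
          &left_tuple, left_executor_->GetOutputSchema());
      if (build_key.CompareEquals(probe_key) == CmpBool::CmpTrue) {
        // ... construct the output tuple from the output schema, exactly as
        // in the nested loop join. A full implementation must also remember
        // its position within the bucket so that one probe tuple can emit
        // several matches across consecutive Next() calls.
        return true;
      }
    }
  }
  return false;
}
```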
Aggregation
The `AggregationExecutor` implements `COUNT`, `SUM`, `MIN`, and `MAX` aggregations. It also supports `GROUP BY` and `HAVING` clauses. The executor supports grouping by multiple columns and at most one `HAVING` clause, as well as multiple aggregations in a single query. The aggregation is performed over the results produced by a child executor.
An example of an aggregation query:
```sql
SELECT count(col_a), col_b, sum(col_c) FROM test_1 GROUP BY col_b HAVING count(col_a) > 100;
```
We use a hash table to implement aggregations.
In our implementation, we make the simplifying assumption that the aggregating hash table fits entirely in memory, which means that we don't need to worry about implementing the two-stage (Partition, Rehash) strategy, and may assume that all of the aggregation results can reside in an in-memory hash table.
In the context of a query plan, aggregations are pipeline breakers: nothing can be emitted to downstream plan nodes until the aggregation has been run over all tuples in the table. This means that the build phase of the aggregation must be performed in `AggregationExecutor::Init()`.
The workflow for the `AggregationExecutor` is as follows (see the sketch after the list):
- Initialize the `child_executor_`
- Run the `child_executor_` and gather the results
- For each tuple produced by `child_executor_->Next()`, compute a running aggregation depending on the aggregation type, performing the group-by if necessary
- Query a result in the `Next` method by checking whether the current tuple pointed to by the `SimpleAggregationHashTable::Iterator` satisfies the `HAVING` predicate. If it does, reconstruct an output tuple and emit the result; if not, check the next tuple, until reaching the end
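A minimal sketch of this workflow, using the `SimpleAggregationHashTable` member `aht_` and its iterator `aht_iterator_` from the skeleton (the exact member names may differ):

```cpp
// The build phase runs in Init() because aggregation is a pipeline breaker.
void AggregationExecutor::Init() {
  child_executor_->Init();
  Tuple tuple;
  RID rid;
  while (child_executor_->Next(&tuple, &rid)) {
    // Fold each input tuple into the running aggregates for its group.
    aht_.InsertCombine(MakeAggregateKey(&tuple), MakeAggregateValue(&tuple));
  }
  aht_iterator_ = aht_.Begin();
}

// Emit one group per call, skipping groups that fail the HAVING predicate.
bool AggregationExecutor::Next(Tuple *tuple, RID *rid) {
  while (aht_iterator_ != aht_.End()) {
    AggregateKey key = aht_iterator_.Key();
    AggregateValue val = aht_iterator_.Val();
    ++aht_iterator_;
    if (plan_->GetHaving() == nullptr ||
        plan_->GetHaving()
            ->EvaluateAggregate(key.group_bys_, val.aggregates_)
            .GetAs<bool>()) {
      // Reconstruct the output tuple from the output schema's expressions.
      std::vector<Value> values;
      for (const Column &col : GetOutputSchema()->GetColumns()) {
        values.push_back(
            col.GetExpr()->EvaluateAggregate(key.group_bys_, val.aggregates_));
      }
      *tuple = Tuple(values, GetOutputSchema());
      return true;
    }
  }
  return false;
}
```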
Limit
The `LimitExecutor` constrains the number of output tuples produced by its child executor. If the number of tuples produced by the child executor is less than the limit specified in the `LimitPlanNode`, this executor has no effect and yields all of the tuples that it receives.
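For completeness, a minimal sketch (the counter member `count_`, reset to zero in `Init()`, is an assumption of the sketch):

```cpp
// Sketch of LimitExecutor::Next: pass tuples through until the limit is hit.
bool LimitExecutor::Next(Tuple *tuple, RID *rid) {
  if (count_ >= plan_->GetLimit()) {
    return false;  // Limit reached: stop pulling from the child.
  }
  if (!child_executor_->Next(tuple, rid)) {
    return false;  // Child exhausted before the limit: the limit has no effect.
  }
  ++count_;
  return true;
}
```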
Distinct
The `DistinctExecutor` eliminates duplicate tuples received from its child executor.
We use a hash table to implement distinct operators.
In our implementation, we make the simplifying assumption that the hash table fits entirely in memory, which means that we don’t need to worry about implementing the two-stage (Partition, Rehash) strategy and moving blocks forward and backward between memory and disk.
We should note that the hash distinct operation can be pipelined: the only requirement is that the operator never emit a tuple that it has already produced from the child executor's output.
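A minimal sketch of the pipelined distinct. The member `seen_` (assumed to be a `std::unordered_set<hash_t>`) keys on a combined hash of all column values; a complete implementation should also compare the actual values to guard against hash collisions:

```cpp
// Sketch of DistinctExecutor::Next: emit a tuple immediately unless an equal
// tuple has already been produced (hash distinct is fully pipelined).
bool DistinctExecutor::Next(Tuple *tuple, RID *rid) {
  const Schema *schema = child_executor_->GetOutputSchema();
  while (child_executor_->Next(tuple, rid)) {
    // Combine the hashes of all column values into one key for the tuple.
    hash_t h = 0;
    for (uint32_t i = 0; i < schema->GetColumnCount(); i++) {
      Value v = tuple->GetValue(schema, i);
      h = HashUtil::CombineHashes(h, HashUtil::HashValue(&v));
    }
    if (seen_.insert(h).second) {
      return true;  // First occurrence: emit it.
    }
    // Duplicate: skip it and keep pulling from the child.
  }
  return false;
}
```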
Testing
Warning! Warning! Warning!
`cmake -DCMAKE_BUILD_TYPE=Debug ..` enables AddressSanitizer, which may produce false positives for overflow on STL containers. Setting the environment variable `ASAN_OPTIONS=detect_container_overflow=0` helps resolve the issue!
Run the following command from the CLI if you encounter this issue:

```bash
$ export ASAN_OPTIONS=detect_container_overflow=0
```
GTest is used to test each individual component of the assignment. The test file for this assignment is located at:
- `Executor`: `test/execution/executor_test.cpp`
Compile and run each test individually:

```bash
$ mkdir build
$ cd build
$ cmake ..
$ make executor_test
$ ./test/executor_test
```
Or use `make check-tests` to run all of the test cases. Disable tests in GTest by adding a `DISABLED_` prefix to the test name.
Memory Safety
Use Valgrind on a Linux machine to check for memory leaks:

```bash
$ make -j executor_test
$ valgrind --leak-check=full ./test/executor_test
```
Formatting
The code is compliant with the Google C++ Style Guide.
To lint, run the following commands:

```bash
$ make -j format
$ make -j check-lint
$ make -j check-clang-tidy
```
Debugging
Instead of using `printf`, log output using the `LOG_*` macros:

```cpp
LOG_INFO("# Pages: %d", num_pages);
```
To enable logging, do the following configuration:

```bash
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make
```
Add `#include "common/logger.h"` to any file in which you want to make use of the logging infrastructure.
Gradescope
Make a submission:
```bash
$ zip project3-submission.zip \
```
Known Issues
None found yet.