Note: The project repo is kept private per the course requirements. If you want the code, please contact me via email.
Content
- Specification
- Task 1 - Lock Manager && Task 2 - Deadlock Prevention
- Task 3 - Concurrent Query Execution
- Testing
- Memory Safety
- Formatting
- Debugging
- Gradescope
- Known Issues
Specification
BusTub DBMS is used as the skeleton code. In this project, I implement a lock manager and use it to support concurrent query execution. The lock manager is responsible for keeping track of the tuple-level locks issued to transactions, granting and releasing shared and exclusive locks according to isolation levels. This project uses pessimistic concurrency control based on two-phase locking (2PL) with read/write locks.
The project is comprised of the following three tasks:
- Task 1 - Lock Manager
- Task 2 - Deadlock Prevention
- Task 3 - Concurrent Query Execution
The implementation is thread-safe.
Task 1 - Lock Manager && Task 2 - Deadlock Prevention
The DBMS uses a lock manager (LM) to control when transactions are allowed to access data items. The LM maintains an internal data structure that tracks the locks currently held by active transactions. Transactions issue lock requests to the LM before they are allowed to access a data item. The LM either grants the lock to the calling transaction, blocks that transaction, or aborts it.
In this implementation there is a single global LM for the entire system. The TableHeap (the in-memory representation of a table on disk) and Executor (query executor) classes use the LM to acquire locks on tuple records (by record id RID) whenever a transaction wants to access or modify a tuple.
The lock manager uses the WOUND-WAIT algorithm for deciding which transactions to abort.
I implement a tuple-level LM that supports the three common isolation levels:
READ_UNCOMMITTED
READ_COMMITTED
REPEATABLE_READ
The LM grants or releases locks according to a transaction's isolation level. A failed lock operation leads to an ABORTED transaction state and throws an exception. The transaction manager then catches this exception and rolls back the write operations executed by the transaction.
I implement the LockManager class, which is located in the following files:
concurrency/lock_manager.cpp
concurrency/lock_manager.h
The following functions are implemented:
- LockShared(Transaction, RID): Transaction txn tries to take a shared lock on record id rid. The call blocks while waiting and returns true when the lock is granted. It returns false if the transaction is rolled back (aborted).
- LockExclusive(Transaction, RID): Transaction txn tries to take an exclusive lock on record id rid. The call blocks while waiting and returns true when the lock is granted. It returns false if the transaction is rolled back (aborted).
- LockUpgrade(Transaction, RID): Transaction txn tries to upgrade its shared lock on record id rid to an exclusive lock. The call blocks while waiting and returns true when the lock is granted. It returns false if the transaction is rolled back (aborted). This function also aborts the transaction and returns false if another transaction is already waiting to upgrade its lock on the same record.
- Unlock(Transaction, RID): Unlocks the record identified by the given record id that is held by the transaction.
There are four transaction states: GROWING, SHRINKING, COMMITTED, and ABORTED, among which the SHRINKING state is only used in 2PL.
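For context, the per-RID bookkeeping declared in concurrency/lock_manager.h looks roughly like the sketch below. Member names such as request_queue_, cv_, upgrading_, and lock_table_ follow the course skeleton, but treat this as an illustrative outline rather than the exact header.

```cpp
#include <condition_variable>
#include <list>
#include <mutex>
#include <unordered_map>

#include "common/config.h"  // txn_id_t
#include "common/rid.h"

namespace bustub {

class LockManager {
 public:
  enum class LockMode { SHARED, EXCLUSIVE };

  // One pending or granted lock request from a single transaction.
  class LockRequest {
   public:
    LockRequest(txn_id_t txn_id, LockMode lock_mode) : txn_id_(txn_id), lock_mode_(lock_mode), granted_(false) {}
    txn_id_t txn_id_;
    LockMode lock_mode_;
    bool granted_;
  };

  // All requests (FIFO) for a single RID, plus a condition variable used to
  // wake transactions blocked on that RID.
  class LockRequestQueue {
   public:
    std::list<LockRequest> request_queue_;
    std::condition_variable cv_;
    bool upgrading_ = false;  // is an upgrade request currently pending?
  };

  // ... LockShared / LockExclusive / LockUpgrade / Unlock declarations ...

 private:
  std::mutex latch_;                                      // protects lock_table_
  std::unordered_map<RID, LockRequestQueue> lock_table_;  // one request queue per RID
};

}  // namespace bustub
```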
LockShared
We assume that no lock/unlock requests are issued once a transaction is in its COMMITTED stage. This logic is enforced at the application level, i.e., the client does not perform any reads or writes after deciding to commit the transaction.
The workflow is as follows (a condensed code sketch follows the list):
- If the current transaction is in the TransactionState::ABORTED state, do nothing and return false.
- If the isolation level is REPEATABLE_READ or READ_UNCOMMITTED and the current transaction is in the TransactionState::SHRINKING state, abort the transaction and roll back.
- If the current transaction already has a LockRequest for this RID, do nothing and return true (we do not allow duplicate requests in the same locking mode).
- If the current isolation level is READ_UNCOMMITTED, abort the transaction and roll back. We should not grant a read lock for READ_UNCOMMITTED because we are using 2PL: if a read lock were granted, uncommitted writes could not be read before the locks are released (right before the commit stage).
- Otherwise, acquire the request queue on this RID.
- Insert the lock request into the request queue in LockMode::SHARED mode.
- Populate the current transaction's shared lock set.
- Iterate through the request queue; if a younger LockMode::EXCLUSIVE request is found before the current one, abort it and roll back the corresponding transaction (WOUND-WAIT).
- If any transaction was aborted during the previous step, call notify_all to wake up all waiting threads.
- Check whether the current lock request can be granted in a loop (busy waiting is undesirable; we wait on a condition variable inside a while loop instead).
- If the locking condition is met, grant the lock by adding it to the current transaction's shared lock set and return true.
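The sketch below condenses this workflow as it might appear in concurrency/lock_manager.cpp. It is illustrative rather than the verbatim implementation: the members latch_, lock_table_, and cv_ are as sketched earlier, TransactionManager::GetTransaction is used to look up wounded victims, and the exact AbortReason enumerators may differ from the skeleton you use.

```cpp
// Requires "concurrency/transaction_manager.h" for TransactionManager::GetTransaction.
bool LockManager::LockShared(Transaction *txn, const RID &rid) {
  // 1. Already aborted: nothing to do.
  if (txn->GetState() == TransactionState::ABORTED) {
    return false;
  }
  // 2. 2PL: no new locks once the transaction is shrinking.
  if (txn->GetState() == TransactionState::SHRINKING &&
      (txn->GetIsolationLevel() == IsolationLevel::REPEATABLE_READ ||
       txn->GetIsolationLevel() == IsolationLevel::READ_UNCOMMITTED)) {
    txn->SetState(TransactionState::ABORTED);
    throw TransactionAbortException(txn->GetTransactionId(), AbortReason::LOCK_ON_SHRINKING);
  }
  // 3. Duplicate request: the lock is already held.
  if (txn->IsSharedLocked(rid) || txn->IsExclusiveLocked(rid)) {
    return true;
  }
  // 4. READ_UNCOMMITTED never takes shared locks.
  if (txn->GetIsolationLevel() == IsolationLevel::READ_UNCOMMITTED) {
    txn->SetState(TransactionState::ABORTED);
    throw TransactionAbortException(txn->GetTransactionId(), AbortReason::LOCKSHARED_ON_READ_UNCOMMITTED);
  }

  // 5-7. Enqueue the request and record it in the shared lock set.
  std::unique_lock<std::mutex> guard(latch_);
  auto &queue = lock_table_[rid];
  queue.request_queue_.emplace_back(txn->GetTransactionId(), LockMode::SHARED);
  txn->GetSharedLockSet()->emplace(rid);

  // 8-9. WOUND-WAIT: abort younger (larger txn id) exclusive requests, then wake everyone.
  bool wounded = false;
  for (auto &req : queue.request_queue_) {
    if (req.lock_mode_ == LockMode::EXCLUSIVE && req.txn_id_ > txn->GetTransactionId()) {
      TransactionManager::GetTransaction(req.txn_id_)->SetState(TransactionState::ABORTED);
      wounded = true;
    }
  }
  if (wounded) {
    queue.cv_.notify_all();
  }

  // 10. Wait until no live exclusive request precedes ours in the queue.
  auto blocked = [&] {
    for (auto &req : queue.request_queue_) {
      if (req.txn_id_ == txn->GetTransactionId()) {
        return false;  // everything ahead of us is compatible
      }
      Transaction *other = TransactionManager::GetTransaction(req.txn_id_);
      if (req.lock_mode_ == LockMode::EXCLUSIVE && other->GetState() != TransactionState::ABORTED) {
        return true;
      }
    }
    return false;
  };
  while (blocked()) {
    queue.cv_.wait(guard);
    if (txn->GetState() == TransactionState::ABORTED) {
      return false;  // we were wounded by an older writer while waiting
    }
  }

  // 11. Grant the lock.
  for (auto &req : queue.request_queue_) {
    if (req.txn_id_ == txn->GetTransactionId()) {
      req.granted_ = true;
      break;
    }
  }
  return true;
}
```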
LockExclusive
Similar to LockShared, except that the request is inserted in LockMode::EXCLUSIVE mode and any other request ahead of it in the queue conflicts with it.
LockUpgrade
Similar to the previous two; there can be different implementations depending on how you define fairness (see the sketch below).
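One possible policy, sketched below, allows at most one pending upgrade per RID and aborts a second upgrader with UPGRADE_CONFLICT; the upgrader then waits until it is the only live lock holder and flips its request to exclusive. This is only one of several reasonable fairness choices, and the wounding of younger holders (as in LockShared) is omitted for brevity.

```cpp
bool LockManager::LockUpgrade(Transaction *txn, const RID &rid) {
  if (txn->GetState() == TransactionState::ABORTED) {
    return false;
  }
  std::unique_lock<std::mutex> guard(latch_);
  auto &queue = lock_table_[rid];

  // Fairness policy used here: at most one pending upgrade per RID.
  if (queue.upgrading_) {
    txn->SetState(TransactionState::ABORTED);
    throw TransactionAbortException(txn->GetTransactionId(), AbortReason::UPGRADE_CONFLICT);
  }
  queue.upgrading_ = true;

  // Locate this transaction's granted SHARED request (the caller is assumed to hold it).
  auto it = queue.request_queue_.begin();
  while (it != queue.request_queue_.end() && it->txn_id_ != txn->GetTransactionId()) {
    ++it;
  }

  // Wait until no other live transaction holds a lock on this RID.
  auto blocked = [&] {
    for (auto &req : queue.request_queue_) {
      if (req.txn_id_ == txn->GetTransactionId()) {
        continue;
      }
      Transaction *other = TransactionManager::GetTransaction(req.txn_id_);
      if (req.granted_ && other->GetState() != TransactionState::ABORTED) {
        return true;
      }
    }
    return false;
  };
  while (blocked()) {
    queue.cv_.wait(guard);
    if (txn->GetState() == TransactionState::ABORTED) {
      queue.upgrading_ = false;
      return false;
    }
  }

  // Upgrade in place: flip the request mode and move the RID between lock sets.
  it->lock_mode_ = LockMode::EXCLUSIVE;
  it->granted_ = true;
  txn->GetSharedLockSet()->erase(rid);
  txn->GetExclusiveLockSet()->emplace(rid);
  queue.upgrading_ = false;
  return true;
}
```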
Unlock
Unlock releases a lock held by a transaction. The workflow is as follows (a condensed sketch follows the list):
- Check the state of the transaction: if it is in the TransactionState::GROWING state and the isolation level is either REPEATABLE_READ or READ_COMMITTED, change the transaction state to TransactionState::SHRINKING. The isolation level READ_UNCOMMITTED does not have a shrinking stage, so that all of its uncommitted writes can be read by other transactions.
- Acquire the request queue on the specified RID.
- Find all requests in the request queue whose transaction id equals the specified transaction id.
- Release the lock by removing those requests from the request queue and erasing the RID from the transaction's shared/exclusive lock sets.
- Explicitly wake up all other threads waiting on this request queue.
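A compact sketch of Unlock that follows this workflow, using the same assumed members as the earlier sketches:

```cpp
bool LockManager::Unlock(Transaction *txn, const RID &rid) {
  // Enter the shrinking phase under 2PL; READ_UNCOMMITTED has no shrinking stage.
  if (txn->GetState() == TransactionState::GROWING &&
      (txn->GetIsolationLevel() == IsolationLevel::REPEATABLE_READ ||
       txn->GetIsolationLevel() == IsolationLevel::READ_COMMITTED)) {
    txn->SetState(TransactionState::SHRINKING);
  }

  std::unique_lock<std::mutex> guard(latch_);
  auto &queue = lock_table_[rid];

  // Remove this transaction's request(s) for the RID from the queue.
  for (auto it = queue.request_queue_.begin(); it != queue.request_queue_.end();) {
    if (it->txn_id_ == txn->GetTransactionId()) {
      it = queue.request_queue_.erase(it);
    } else {
      ++it;
    }
  }

  // Erase the RID from the transaction's lock sets.
  txn->GetSharedLockSet()->erase(rid);
  txn->GetExclusiveLockSet()->erase(rid);

  // Explicitly wake up everyone still waiting on this RID.
  queue.cv_.notify_all();
  return true;
}
```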
Task 3 - Concurrent Query Execution
During concurrent query execution, executors are required to lock/unlock tuples appropriately to achieve different isolation levels.
We add support for concurrent query execution in the following executors:
src/execution/seq_scan_executor.cpp
src/execution/insert_executor.cpp
src/execution/update_executor.cpp
src/execution/delete_executor.cpp
SeqScanExecutor
- For any tuple to be read from a table, acquire a read lock (we do not need to worry about isolation levels when acquiring locks if they are handled properly in the lock manager).
- When reading finishes, whether to release the lock depends on the isolation level (a code sketch follows the list):
  - If in IsolationLevel::REPEATABLE_READ, do not release the read lock; otherwise the transaction could not perform any later read/write on the tuple unless it had known in advance that it would need one and had acquired a write lock beforehand. However, you cannot predict what a transaction will do in the future!
  - If in IsolationLevel::READ_COMMITTED, release the lock and do not enter the SHRINKING stage, because we want other transactions to be able to acquire write locks on the tuple and we want to read what they have committed. Keeping the lock would prevent unrepeatable reads, which this isolation level does not require. Releasing the read lock and letting the current transaction enter the SHRINKING stage would forbid it from any future reads on the tuple, which is undesirable because we do not know whether the transaction will read again.
  - If in IsolationLevel::READ_UNCOMMITTED, as illustrated above, we do not take read locks on any record at this isolation level; the underlying data structure is only protected by OS latches (so race conditions are handled at the system level, not the transaction level). This is the lowest isolation level, in which many anomalies may happen. Since we hold no read locks, there is nothing to unlock when reading finishes.
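The sketch below shows how this per-tuple locking might look inside SeqScanExecutor::Next. The members iter_ and table_heap_ are assumed names for the executor's table iterator and table heap, predicate evaluation and output-schema projection are omitted for brevity, and a failed lock call throws, so its return value is not checked here.

```cpp
bool SeqScanExecutor::Next(Tuple *tuple, RID *rid) {
  Transaction *txn = exec_ctx_->GetTransaction();
  LockManager *lm = exec_ctx_->GetLockManager();

  while (iter_ != table_heap_->End()) {
    *rid = iter_->GetRid();

    // READ_UNCOMMITTED takes no read locks at all.
    if (txn->GetIsolationLevel() != IsolationLevel::READ_UNCOMMITTED &&
        !txn->IsSharedLocked(*rid) && !txn->IsExclusiveLocked(*rid)) {
      lm->LockShared(txn, *rid);
    }

    *tuple = *iter_;  // read the tuple while the lock (if any) is held
    ++iter_;

    // READ_COMMITTED releases the read lock right after the read;
    // REPEATABLE_READ keeps it until commit/abort.
    if (txn->GetIsolationLevel() == IsolationLevel::READ_COMMITTED && txn->IsSharedLocked(*rid)) {
      lm->Unlock(txn, *rid);
    }

    // (predicate evaluation and projection to the output schema omitted)
    return true;
  }
  return false;
}
```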
InsertExecutor
- Before insertion, the record is not present in the table, so we do nothing before the insertion actually happens at the table/index level.
- After inserting a record into the table, acquire a write lock.
- Append the change to the current transaction's table write set and index write set, which are used for rollbacks.
- When writing finishes, whether to release the lock depends on the isolation level (a code sketch follows the list):
  - If in IsolationLevel::REPEATABLE_READ, do not release the write lock, for the same reason as before.
  - If in IsolationLevel::READ_COMMITTED, do not release the write lock, because we do not want dirty writes to happen (a write of the current transaction being overwritten by another uncommitted transaction, which is not allowed at any isolation level).
  - If in IsolationLevel::READ_UNCOMMITTED, do not release the write lock, for the same reason: dirty writes must not happen.
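The sketch below shows the per-row work in the insert path under these rules. InsertRow is a hypothetical helper (in practice this logic sits inside InsertExecutor::Next), table_info_ is an assumed member caching the target table's TableInfo, and the table/catalog/index calls assume the course skeleton's API, so the exact signatures may need adjusting.

```cpp
// Hypothetical helper; in practice this logic lives inside InsertExecutor::Next.
void InsertExecutor::InsertRow(const Tuple &tuple) {
  Transaction *txn = exec_ctx_->GetTransaction();
  LockManager *lm = exec_ctx_->GetLockManager();
  RID rid;

  // 1. Insert into the table heap first: the RID does not exist before this call.
  table_info_->table_->InsertTuple(tuple, &rid, txn);

  // 2. Take the exclusive lock on the freshly assigned RID. It is held until
  //    commit/abort under every isolation level (no dirty writes).
  if (!txn->IsExclusiveLocked(rid)) {
    lm->LockExclusive(txn, rid);
  }

  // 3. Maintain indexes and record the changes for rollback. In the skeleton,
  //    TableHeap::InsertTuple already appends a TableWriteRecord to the table
  //    write set; each index entry inserted below should likewise be recorded
  //    in the index write set via Transaction::AppendIndexWriteRecord.
  for (IndexInfo *index : exec_ctx_->GetCatalog()->GetTableIndexes(table_info_->name_)) {
    Tuple key = tuple.KeyFromTuple(table_info_->schema_, index->key_schema_, index->index_->GetKeyAttrs());
    index->index_->InsertEntry(key, rid, txn);
    // txn->AppendIndexWriteRecord(...) with this RID, the table/index oids, WType::INSERT, and the tuple.
  }
}
```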
DeleteExecutor
Similar to InsertExecutor
UpdateExecutor
Similar to InsertExecutor
Note: the phantom phenomenon can still occur at all isolation levels because gap locks are not implemented.
Testing
Warning! Warning! Warning!
cmake -DCMAKE_BUILD_TYPE=Debug .. enables AddressSanitizer, which may produce false positives for container overflow on STL containers. Setting the environment variable ASAN_OPTIONS=detect_container_overflow=0 helps resolve the issue!
Run the following command from the CLI if you encounter this:
$ export ASAN_OPTIONS=detect_container_overflow=0
GTest is used to test each individual component of the assignment. The test files for this assignment are located in:
- LockManager: test/concurrency/lock_manager_test.cpp
- Transaction: test/concurrency/transaction_test.cpp
Compile and run each test individually by:
$ mkdir build
$ cd build
$ make lock_manager_test
$ ./test/lock_manager_test
Or use make check-tests to run all of the test cases. Individual tests can be disabled in GTest by adding the DISABLED_ prefix to the test name.
Memory Safety
Use Valgrind on a Linux machine to check for memory leaks:
$ make -j lock_manager_test
$ valgrind ./test/lock_manager_test
Formatting
The code complies with the Google C++ Style Guide.
To format and lint, run the following commands:
$ make -j format
$ make -j check-lint
$ make -j check-clang-tidy
Debugging
Instead of using printf, log output using the LOG_* macros:
LOG_INFO("# Pages: %d", num_pages);
To enable logging, reconfigure the project as follows:
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make
Add #include "common/logger.h"
to any file in which you want to make use of the logging infrastructure.
Gradescope
Make a submission:
$ zip project4-submission.zip \
Known Issues
None found yet.