Note: The project repo is held private according to the course requirements. If you really want the code, please contact me via my email.
Content
- Specification
- Task 1 - LRU Replacement Policy
- Task 2 - Buffer Pool Manager Instance
- Task 3 - Parallel Buffer Pool Manager
- Testing
- Formatting
- Debugging
- Hints
- Gradescope
- Known Issues
Specification
BusTub DBMS is used as the skeleton code. In this project, a buffer pool is implemented in the storage manager. The buffer pool is responsible for moving physical pages back and forth from main memory to disk. It uses the idea of virtual memory / page mapping that allows a DBMS to support databases that are larger than the amount of memory that is available to the system. The buffer pool provides an abstraction layer to the system so that users/programmers do not need to interact with physical storage directly.
The implementation is thread-safe. Multiple threads can access the internal data structures simultaneously and concurrency control is properly handled.
The following components in the storage manager are implemented in this project:
- LRU Replacement Policy
- Buffer Pool Manager Instance
- Parallel Buffer Pool Manager
Task 1 - LRU Replacement Policy
The LRUReplacer
component is responsible for tracking page usage in the buffer pool. The maximum number of pages for LRUReplacer
is the same as the size of the buffer pool since it contains placeholders for all of the frams in the BufferPoolManager
.
The list
container in C++ is used to implement this component. APIs of LRUReplacer
are listed below:
Victim(frame_id_t*)
: Remove the object that was accessed least recentlyPin(frame_id_t)
: Called after a frame is pinned (marked as in use) to a frame in the buffer poolUnpin(frame_id_t)
: Called when thepin_count
of a page in the buffer pool becomes 0Size()
: Return the number of frames that are currently in theLRUReplacer
Task 2 - Buffer Pool Manager Instance
The BufferPoolManagerInstance
is responsible for fetching database pages from the DiskManager
and storing them in memory. The BufferPoolManagerInstance
should also flush dirty pages out to disk when it is explicitly instructed to do so or when it needs to evict a page to make space for a new page.
All in-memory pages in the system are represented by Page
objects. Each Page
object acts as a container that the DiskManager
will use as a location to copy the contents of a physical page that it reads from disk. Page
can be reused as it moves back and forth to disk, which means that the same Page
object may contain a different physical page throughout the life of the system. The page_id
is an identifier for each page that keeps track of the location of the physical page. The page_id
field must be set to INVALID_PAGE_ID
if it does not contain a physical page.
Each Page
object also maintains a counter for the number of threads that have ‘pinned’ that page. A pinned page is not allowed to be freed. Each page object also keeps track of whether it is dirty or not. A dirty page must be written to disk before it can be unpinned.
The BufferPoolManagerInstance
object uses LRUReplacer
to keep track of when Page
objects are accessed so that it can decide which one to evict when necessary. It also uses a free list to keep track of the current available slots in the buffer pool.
The following functions are implemented:
FetchPgImp(page_id)
: Fetch the requested page from the buffer pool, identified bypage_id
. The target page must exist on disk- Search the page table, if exists, pin it and return it
- Otherwise if the page does not exist, find a replacement page from either the free list or the replacer
- If both the free list and the replacer are not available, return nullptr, indicating that this page cannot be swapped into the memory
- If a replaced page is found via the replacer, flush it onto disk if it is dirty
- Delete the replaced page from the page table and add a new entry for the new page
- Update the metadata for the new page and read the contents from disk
- Return the results
UnpinPgImp(page_id, is_dirty)
: Unpin the target page from the buffer pool- Check whether the page exists in the buffer pool, if not, do nothing
- Modify the dirty flag of the unpinned page
- Check the
pin_count
of the target page to be unpinned, if already 0, which means that this page has been unpinned previously and no actions are needed, return - Otherwise decrement the
pin_count
, if it decreases to 0, call replacer’sUnpin
function to notify it that this page can be replaced - Return the results
FlushPgImp(page_id)
: Flushes the target page to disk regardless of its pin status- Check whether the page exits in the buffer pool, if not, do nothing
- Otherwise call
disk_manager_->WritePage
to flush the page onto disk
NewPgImg(page_id)
: Creates a new page in the buffer pool. The target page does not exist on disk- Check whether all the pages in the buffer pool are pinned, if they are, do nothing and return nullptr
- Otherwise pick a victim page from either the freelist or the replacer
- Flush the old page onto disk if necessary, then update the page’s metadata, return the results
DeletePgImg(page_id)
: Delete a page from the buffer page- Search the page table, if cannot find the target page, do nothing and return true
- If find the target page in the page table, which means that the target page is in the buffer pool, check the
pin_count
for the page - If the
pin_count
is non-zero, which means that someone is using the page, do nothing and return false - Otherwise flush the page onto disk if dirty, and excute subroutines to reset and deallocate the page
FlushAllPagesImpl()
: Flush all the pages in the buffer pool onto disk- Only need a
for
loop to flush every page onto disk
- Only need a
Pay attention to the implementation of NewPgImp
and FetchPgImp
, especially some boundary checking. Other functions are trivial.
Task 3 - Parallel Buffer Pool Manager
To support parallelism, we can use multiple buffer pools, each with its own latch.
In this task, we implement a ParallelBufferPoolManager
class that holds multiple BufferPoolManagerInstance
s. We use page_id mod num_instances
to map a request to some page to some instance of BufferPoolManagerInstance
.
The following functions are implemented:
ParallelBufferPoolManager(num_instances, pool_size, disk_manager, log_manager)
: Constructor, allocate and create individual BufferPoolManagerInstances~ParallelBufferPoolManager()
: Destructor, destruct all BufferPoolManagerInstances and deallocate any associated memoryGetPoolSize()
: Get size of the buffer pool (sum of the buffer pool size for eachBufferPoolManagerInstance
)- As simple as
num_instances_ * pool_size_
- As simple as
GetBufferPoolManager(page_id)
: Retrieve a pointer to the BufferPoolManager responsible for handling given page id, determine bypage_id mod num_instances
FetchPgImp(page_id)
: Fetch the requested page from the buffer poolUnpinPgImg(page_id, is_dirty)
: Unpin the target page from the buffer poolFlushPgImg(page_id)
: Flushes the target page to diskNewPgImg(page_id)
: Creates a new page in the buffer poolDeletePgImp(page_id)
: Deletes a page from the buffer poolFlushAllPagesImpl()
: Flushes all the pages in the buffer pool to disk
Task 3 is just a just trivial wrap up of task 2, don’t need to do any concurrency control as long as you have done it in part 2.
Testing
GTest is used to test each individual components of the assignment. The test file for this assignment locates in:
LRUReplacer
:test/buffer/lru_replacer_test.cpp
BufferPoolManagerInstance
:test/buffer/buffer_pool_manager_instance_test.cpp
ParallelBufferPoolManager
:test/buffer/parallel_buffer_pool_manager_test.cpp
Compile and run each test individually by:
1 | # take LRUReplacer test for example |
Or use make check-tests
to run all of the test cases. Disable tests in GTest by adding DISABLE_
prefix to the test name.
Formatting
The code is compliant with Google C++ Style Guide.
To lint, run the following commands:
1 | $ make format |
Debugging
Instead of using printf
, logging the output using LOG_*
macros:
1 | LOG_INFO("# Pages: %d", num_pages); |
To enable logging, do the following configuration:
1 | $ mkdir build |
Add #include "common/logger.h"
to any file in which you want to make use of the logging infrastructure.
Hints
To speed up the build process, pass the -j
flag to enable multi-threading compiling:
1 | $ make -j 4 |
Gradescope
Known Issues
Not found yet