A Mimer SQL table is stored in a highly optimized, balanced tree structure called a B*-tree. This method offers fast access to all data for both indexed and sequential searches. By using only one access method for all types of data, the Mimer SQL DBMS has been highly optimized and always provides optimal performance.
The B*-tree structures are reorganized automatically by Mimer SQL so as to always employ a perfect balance giving continuous optimal performance.
Fundamental to any database system is how the data is stored. In Mimer SQL, information is stored in Databanks. A databank is a standard Operating System (OS) direct access file and contains an arbitrary number of tables. A Mimer SQL database may contain an arbitrary number of databanks at the discretion of the systems administrator.
The use of standard OS files allows the databanks to be expanded dynamically when required; it also allows the use of OS facilities such as disk striping and RAID systems.
Within a UNIX environment, databanks can also be set up as ‘Raw Device’ files which are often more efficient than ordinary direct access files.
Mimer SQL performs data transfers between disk and the bufferpool on a page basis. The page size used depends on the record size of the table, but will be either 4, 32, or 128 Kbytes (from Mimer SQL version 11). Each databank has a bitmap to indicate which pages are used and which are available. The bitmap can be regarded as a directory of space utilization. If a table requires a new page, this page is marked as in use in the bitmap. If a page is ‘released’ (e.g. when deleting data), it is then marked as free.
Mimer SQL uses a 64-bit integer for page pointers. This allows the databank files to be unlimited of size. In practice, the limitation is always set by the maximum file-size as defined by the file-system used.
The use of bitmaps allows the internal structure of the databank to change, without any requirement for a table to be allocated a fixed size in the databank.
When creating a new databank, Mimer SQL automatically creates the bitmap and the root page. The root page is effectively a master directory of all the tables in the databank. If the initial root page is not large enough to contain the entire table directory then additional root pages are automatically created and linked to the initial page.
In the same way, as the databank grows, additional bitmap pages are created. All other pages in a databank are either free or used to store index or data pages.
Internal Structure of a Databank
The B*-tree consists of an index section and a data section. The index section (the ‘non-leaf nodes’) is essentially a map to enable rapid physical location of the required page on the next level. The rows of the table are stored in the data section (i.e. the ‘leaf nodes’ of the tree).
The top-level index for a table is identified via the root page of the databank containing the table. The root page is the directory of the tables in the databank and contains page number references to the B*-tree structure.
Rows in a Mimer SQL table are identified by the values of their primary keys, which are comprised of one or more columns in the table. It is these values which are used in the index pages in the B*-tree.
Table Structure (B*-tree)
Inside all pages including the data section, data is kept sorted according to the key values. This makes it possible to use a fast binary search algorithm to find a row, or to find the page where a row should be inserted. By holding the data section pages in sorted order the rows, are automatically clustered by the key.
When inserting new rows, the tree grows and when deleting rows, the tree gets smaller. Sometimes, this will mean that a page will become full and have to be split, or will be empty enough for the data in it to be able to be merged with adjacent pages and the page to be released. Splitting and merging are known as reorganization.
The algorithm used for reorganization is a pre-emptive, top-down scheme using a ‘careful replacement’ technique. If an insertion procedure encounters a full node while searching for the insertion position, this node is split and the node at the previous level is updated to reflect the split. The higher node cannot itself be full (otherwise it would have been split when the insertion procedure first encountered it), so the splitting effect does not propagate upwards through the tree. When the insertion search reaches the leaf node, the row can be inserted there and the operation is completed. A similar algorithm is used for deletions.
The ‘careful replacement’ protocol ensures that the permanent storage holds a consistent version of the B*-tree structure at all stages during the reorganization. The split versions of the node are written to new pages taken from the free pages. When these writes to disk are complete, the node at the higher level is updated to reflect the change and written to disk. After this is completed, the old version of the node that required splitting is marked as free. Even if there is a machine failure during the execution of the reorganization, the careful replacement protocol ensures that the disk version of the tree will never be inconsistent.
Mimer SQL performs reorganizations automatically. This totally eliminates the need to perform manual reorganizations while still ensuring that the tree is always kept perfectly balanced giving continuous optimal performance. The reorganizations are always small, involving only the branch of the tree that is being traversed so there is no noticeable effect on response time for the user.
To further improve space utilization, Mimer SQL may compress data before it is stored in the B*-tree structure. This also improves performance, since disk I/O is an expensive operation, and when the data pages are compressed, more records can be stored in one page. In this way, more records are read from (or written to) disk in one I/O operation, this has a positive effect on performance.
The B*-tree structure can handle very large quantities of data in relatively shallow trees. Also, it does not suffer from any fragmentation of free space within either data or index pages.
As the tree is always perfectly balanced, the number of index levels to traverse is always the same for all the rows in a table. There are never any overflow chains, which is a common problem in competing products and can lead to needing a large number of I/Os to access a specific row.
Having the database automatically reorganized continuously means that this is not a task that has to be considered when the system is put into production. The run-time cost can be kept low.
B-Trees: Balanced Tree Data Structures, Peter Neubauer »
Or, enter “b-tree” as search criteria into any Internet search engine.