
![]()
This monograph describes the techniques used for disk space allocation within Progress databases in versions 6, 7, and 8. This is a complex topic and understanding it fully requires serious effort.
The data in a Progress database can be stored either in a single file or in multiple files. A single-file database can be changed into a multi-file database.
Blocks
A Progress database is divided into fixed-size units called database blocks or database pages. The database manager performs all disk i/o operations in units of one or more pages. Before version 8.2A, the database block size is fixed but differs among the various operating systems Progress is available for. See appendix A for a list of specific block sizes by operating system. In version 8.2A and later, the block size can be selected from one the values 1024, 2048, 4096, or 8192 bytes at the time the database is created.
Each block has a standard 16 byte header that contains the information show below.
There are a number of different types of blocks. Each type of block is used to store a different kind of information. The table below lists the various block types.
Block Type
Contents
1 - Master Block
Database status and other information
2 - Index Block
Index entry data
3 - Record Block
Record data
4 - Free Block
Free space (previously used)
5 - Index Anchor Block
Index root block pointers
6 - Sequence Block
Sequence values
7 - Empty Block
Empty space (never used)
Block Addressing
Each database block is uniquely identified by a 32 bit quantity called a dbkey. A block's dbkey is its address in the database. It is an encoding of the block's exact location within the database. Blocks are numbered sequentially, beginning with block number 1. To compute the dbkey of a block, multiply the block number by 32 (64 if the block size is 8192).
When the database block size is less than 8192 bytes, then the block number field of the dbkey is 26 bits, allowing a database to contain up to 67,107,839 blocks.
When the database block size is 8192 bytes, then the block number field of the dbkey is 25 bits, allowing a database to contain up to 33,554,431 blocks.
The dbkey layout in combination with the block size determines the maximum possible database size. The table below gives the maximum database size in bytes for each block size.
Database Block Size (bytes)
Maximum Database Size (bytes)
1024
68,719,475,712
2048
137,438,951,424
4096
274,877,902,848
8192
274,877,898,752
There are several different types of blocks in the database, each identified according to the type of data they contain. The first block in a database is always the master block, which has a dbkey of 32 (64 if the block size is 8192). This is followed by other blocks of various types.
Single-file databases
When the database is a single-file database, all data blocks are stored in a single file named <something>.db, where <something> is the database name. There are also a few other files that are part of the database.They are:
<something>.db (this is the single file that contains data)
<something>.bi (this is the single file that contain the required before-image log)
<something>.ai (this is the single file that contain the optional after-image log)
<something>.lg (this is a file that contains startup, login, error, and shutdown messages)
The first block in the .db file is the first block of the database and the remaining blocks follow in sequence up to the last block. As data are added to the database, the .db file will grow (as described later). A single file database can grow until the data file is at most 2 GB long.
Multi-file databases
A multi-file database consists of a file named <something>.db, where <something> is the name of the database, and multiple data files, having names of the form <something.dnn>. Like single-file databases, there are a number of other files that are also part of the database. The <something>.db file contains a header with information about the database, and a list of the names and locations of all the other files that are part of that database.
<something>.db (contains status information and a list of all the other files)
<something>.dnn (these contain the data)
<something>.bnn (these contain the required before-image log)
<something>.ann (these contain the optional after-image log)
<something>.lg (this text file contains startup, login, error, and shutdown messages)
<something>.st (this text file contains the database structure definition that was used to create it)
When a database is stored in multiple files, each file is called an extent. Except for the last, extents are fixed size. Each extent has a header that does not contain data (and which is not counted when counting blocks), and then some number of database blocks. Blocks are numbered sequentially across extents. The first extent contains blocks 1 through n1, the second blocks n1+1 through n1+n2, the third blocks n1+n2+1 through n1+n2+n3, and so on (where n1, n2, and n3 are the number of blocks in extent 1, 2, and 3 respectively).
Multi-file databases have the following advantages over single-file databases:
- Your database can grow larger than 2 gigabytes even though individual files cannot.
- You can put your database on more than one disk.
- You can put your transaction logs on more than one disk and their locations are remembered automatically.
Single-file databases and multi-file databases are pretty much the same from the standpoint of how the database manager functions during normal operation. The only interesting differences are that a multi-file database can contain large quantities of unused space that has already been allocated by the operating system, and a multi-file database can be much larger than a single file database.
Regardless of whether a database is composed of one or multiple files, no individual file can be larger than 2 GB. This limit stems from the fact that the lseek() C library function, used to provide random access positioning for files, cannot address more because the file offset is specified as a signed 32-bit integer value. The same 2 GB limit applies to all Progress implementations, even if the underlying operating system allows larger files, and to all files accessed by Progress, including files accessed by 4gl programs and the 4gl runtime system.
The database manager allocates disk space primarily to store lists of key-value/rowid pairs (index entries) and to store row values for tables. Index entries are stored as compressed b-trees. Row values are stored as variable-length structures that can span multiple blocks if necessary.
Database blocks that contain index entries are called index blocks and database blocks that contain row values are called RM blocks (for "record manager").
When the database manager needs a block to store something in, it first examines a list of unused blocks, called the free chain. If the free chain has one or more blocks on it, the first block in the chain is used. Blocks on the free chain, called free blocks, are blocks that were once used to store something, but now are no longer needed. Note that the way this works is different from Version 6 (see below).
![]()
If the free chain is exhausted, the database is logically extended by using a block that has never been used before. These blocks are called empty blocks and do not contain any valid data. They may contain random garbage.
If there are any empty blocks at the end of the database, then the high-water mark (the last block that contains useful data) is increased by one and the next empty block is used and converted into a block of the desired type. Once an empty block has been used, it will never again become an empty block. If it should no longer be needed, it will be made into a free block and placed on the free chain. Version 6 databases do not have empty blocks.
If there are no empty blocks in the database, then the database is physically expanded by growing the last extent, provided it is variable-length. Each time the database is physically expanded, a multiple of 16 pages are added. At the first expansion of a session, 16 pages are added, then 32 the next time, then 48, then 64, and so on, up to 128. Each time the database is expanded thereafter, 128 pages at a time are added. The newly added pages become empty blocks above the high-water mark. After the expansion, the first empty block is used, as described above.
If the last extent is fixed-length, then the database cannot be physically expanded and the database operation currently being performed fails and the session is terminated (i.e. the database is shut down).
Note that when a new multi-volume structure is created, all the space within the fixed extents become empty blocks. If a database is loaded into the structure (using procopy or prorest), some or all of the space will be used by the database being loaded in. Any unused space remains empty blocks.
When the database manager needs a block to store something in, it first examines the free chain. If the free chain has one or more blocks on it, the first block in the chain is used.
If there are no free blocks, then the database is physically expanded by growing the last extent, provided it is variable-length. Each time the database is physically expanded, a multiple of 16 pages are added. At the first expansion of a session, 16 pages are added, then 32 the next time, then 48, then 64. Each time the database is expanded thereafter, 64 pages at a time are added. The newly added pages become free blocks and they are added to the the free chain. After the expansion, the first free block is used, as described above.
If the last extent is fixed-length, then the database cannot be expanded and the database operation currently being performed fails and the session is terminated (i.e. the database is shut down).
Note that when a new multi-volume structure is created, all the space within all fixed extents is formatted as free blocks and linked to the free chain. If a database is loaded into the structure (using procopy or prorest), some or all of the space will be used by the database being loaded in. Any remaining unused space remains on the database's free chain.
RM blocks are used to store rows (records) from one or more tables. They have a variable size block header that is approximately 20 + (2 * number of records in the block) bytes long.
For database block sizes below 8192 bytes, up to 32 records can be stored in a block, if they are small enough to all fit. For database block sizes of 8192 bytes or larger, up to 64 records can be stored in a block. If a record is too large to fit in one block, it is split into two or more fragments that are chained together. Each fragment is stored in a different block.
Records (and the field values they are composed of) are stored as variable length structures. The length is determined by the amount of space required to store the values of each field, plus a small amount of overhead for field lengths, record lengths, etc. The details of how each data type is stored are not described in this article.
Row Addressing
Each record in the database is uniquely identified by a 31 bit quantity called a rowid. A rowid is an encoding of the relative position where the record is stored within the database. Rowid's have two parts, consisting of a 25 or 26 bit database block number part and a 5 or 6 bit record number part that identifies the particular record within the block. The record number within the block is sometimes referred to as a record slot number.
When the database block size is less than 8192 bytes, then the block number part is 26 bits, allowing a database to contain up to 67,108,863 blocks. The row number within the block is 5 bits, allowing up to 32 records to be stored in a block. Assuming there were no indexes or schema in the database, up to 2,147,483,520 records can be addressed.
When the database block size is 8192 bytes, then the block number part is 25 bits, allowing a database to contain up to 33,554,431 blocks. The row number within the block is 6 bits, allowing up to 64 records to be stored in a block. Assuming there were no indexes or schema in the database, up to 2,147,483,392 records can be addressed.
Once a record has been assigned a rowid, it remains constant for the life of the record. If the record is eventually deleted, once the deleting transaction is committed, the rowid can be reused when a new record is created later.
Creating Records
When the database manager creates a new record, it must choose a location to store it. A side-effect of choosing the storage location is assigning the rowid.
Before describing the algorithm, three quantities that affect the outcome of the algorithm must be described.
Record allocation constants
The first quantity is the expansion space constant. The database manager tries to leave some unused space in each record block to allow expanding records without having to split them into multiple fragments too frequently. The expansion space constant is fixed at 75 bytes for all systems.
The second quantity is the free space constant. Record blocks which have the same or more unused space than the value of the free space constant are kept on a linked list of partially filled record blocks called the "RM chain". This is done so that the unused space can be easily found and used later.
![]()
The value of the free space constant is dependent on the database block size. See the block size table at the end for the precise values used.
The third quantity is the size of the relative row number in a recid. It is 5 bits long. So a block can contain at most 32 records, no matter how much unused space it has.
How space is allocated for records
To find space to store a record, or a record fragment, the database manager first looks at the RM chain to see if an existing record block with unused space is available. If the block at the head of the RM chain contains enough space to store the fragment while still leaving expansion space, and the block has unused record slots, then that block is used. The record is copied to the block and the amount of unused space in the block is updated.
If the block at the head of the RM chain cannot be used to store the record, and it has less unused space than the free space constant, or if all 32 record slots have been used, it is removed from the RM chain. Otherwise it is moved to the tail end of the chain. In either case, the next block on the front of the RM chain is considered and the process is repeated until sufficient space has been found or the search limits have been reached.
To limit the search time, no more than 100 blocks on the RM chain will be examined and removed from the chain at a time. No more than 3 blocks at a time will be examined and moved to the tail end. If either of these limits is exceeded during a search for space, the search for reusable space is abandoned and a free block is used.
If no existing record block can be used, an unused block is allocated as described previously. The block is formatted as an RM block, and the record is stored in it.
From the foregoing, one can see that the recid that will be assigned to a new record cannot be predicted easily. In a database that has had no deletions or updates, recids will generally be assigned monotonically increasing values. If the database has had many updates or deletions, recid's will become progressively more random. When several transactions are executing concurrently, they will consume space from the same pools and each will affect the recids that are used by the others.
After a record has been created, the amount of unused space in the block will have changed and the algorithm described later for dealing with this situation is preformed.
Updating records
When a transaction updates a record, a new record replaces an existing one. The new record may be the same size, smaller, or larger than the existing one.
If the new record is the same size, then the new record simply replaces the old one.
If the new record is smaller, then the new record replaces the old one, the extra space is compressed so that all the free space in the block is contiguous, and the amount of free space in the block is updated. If the original record was fragmented, then any unneeded fragments are deleted using the algorithm described below for deleting records, except that record space locks are not created.
If the new record is larger and the block containing the original record does not have enough free space to hold the entire new record, the new record will be fragmented. The first fragment will replace the original record (or original first fragment) and consume the remaining free space in the block. The rest of the record is divided into one or more additional fragments and space is allocated using the same algorithm described above for creating new records.
After a record (or fragment) has been updated, if the amount of free space in the block has changed, the algorithm described later for dealing with this situation is performed.
Deleting Records
When a transaction deletes a record, the record is replaced by a recid lock or place holder. These locks are sometimes called hold notes. This is done so that the original record can be restored to the same position, allowing it to retain its recid, if the deleting transaction rolls back. The lock prevents another transaction from using the same recid to store a different record. Once the deleting transaction has committed, the lock can be removed. But it is inefficient and time consuming to remove them at commit time. Instead, they are left there and removed later if the block is modified for some other reason.
After a record has been deleted, the amount of free space in the block will probably have changed and the algorithm described below for dealing with this situation is performed.
When the Free Space In an RM Block Changes
The amount of free space or the number of used record slots in a block can change whenever a record is created, updated, or deleted. After any of these operations, both these quantities are checked to determine if any further action is required.
First, the block is examined to see if any obsolete recid locks need to be removed. Any recid lock whose originating transaction is no longer active can be removed.
The block is then examined to see if it should be added to or removed from the RM chain. If the block is not on the RM chain, has unused record slots, and has at least as much free space as the free space constant, it is added to the RM chain. If the block has more than 4 unused slots, it is added to the front of the chain. Otherwise it is added to the end of the chain.
If the block is already on the RM chain, but now has less free space than the free space constant, or all the record slots are filled, then the block should be removed from the RM chain. But, since the RM chain is linked only in the forward direction, it can only be removed if it is the first block on the chain. If it is not the first block, it will be left in the chain. It will be removed later while searching for record space, if it should become the first block on the chain during the search.
Note that a record block whose records have all been deleted does not become a free block. It will be kept on the RM chain for later reuse. This is sensible because when a record is deleted, it is replaced by a recid lock which is not removed until the block that contains it is being examined to see if it contains sufficient free space to store a new record. Since they will contain at least the recid locks for deleted records, record blocks are never really completely empty.
Index entries are ordered lists of pairs of key values and recids. By searching an index to find a matching key value, the recid of a record that contains the key value can be found. By traversing an index or part of an index, records containing a range of key values (or all key values) can be found, in the order determined by the ordering of key values.
Database blocks that are used to store index entries are called index blocks or index pages. Each index block contains entries from only one index, arranged in order by key value. Since an index will probably be too large to fit into a single block, the Progress database manager uses an arrangement of multiple index blocks called a compressed B-tree.
Compressed B-trees have the following advantages:
- On the average, performance is very good for most operations.
- Any leaf block (the lowest level in the tree) can be accessed with the same number of block accesses as any other leaf block.
- The tree is self-balancing with respect to blocks.
- In the best case, an entry for a non-unique index can be represented by a single bit.
- The disk space consumed by indices is used extremely efficiently. Indices can be very dense.
A B-tree consists of a balanced, hierarchical arrangement of blocks of several different types. At the lowest level are leaf blocks. These are the blocks that contain key value/recid pairs.
![]()
To keep track of the multiple leaf blocks, the database manager keeps an index of them in some other blocks that contain key-value/dbkey pairs. Since there can be many of these also, it keeps indexes of them as well. The result is a tree with some number of leaf blocks, then a smaller number of blocks which point to the leaf blocks, then a still smaller number of blocks which point to blocks which point to leaf blocks, etc.
The blocks which contain pointers to other index blocks are called interior or intermediate blocks. At the highest level, there is a single block of pointers which is called the root block of the index.
Each index block has a header that describes the block, how many entries are in it, how much unused space there is, etc. The header is followed by a variable number of index entries.
In version 7.2A and later, the index entries in a block are compressed to eliminate as much redundant information as possible. The compression scheme is extremely efficient. In the best possible case, an index entry can be represented by only a single bit.
Creating Index Entries
Since index entries are ordered, when a new entry is created, there is exactly one location (in a unique index) where a new entry must be inserted, between a key value that is smaller than the new value, and a key value that is larger. The correct location is found by traversing the index from the root to the appropriate leaf block. The new entry is inserted between two existing entries in the leaf block. If the index block has enough unused space, this is simple and straightforward to do. But what if there isn't enough room?
The block will be split into two blocks. Approximately half the entries are moved from the existing block to a new block. Then an entry pointing to the new block is inserted in the existing leaf block's parent. This may require splitting the parent if it does not have enough free space to hold the new entry, and the splitting process is repeated as far up the tree as needed.
Updating Index Entries
Index entries are updated by deleting the existing entry and then creating a new one, exactly as described in the sections on creating and deleting.
Deleting Index Entries
In a non-unique index, a transaction deletes an index entry by removing it from the index block that contains it and recompressing the following entries, if necessary.
When the last entry in a leaf block of an index is removed and block becomes empty, its parent (the intermediate level index block that points to it) must be updated to delete the pointer. The empty leaf block is then returned to the free chain. When the pointer in the parent block is deleted, it too may become empty, so the pointer in its parent must be deleted and that block returned to the free list as well. This can repeat all the way up to the root of the tree.
In a unique index, a transaction deletes an index entry by replacing it with an "index entry lock" so that another transaction cannot create an entry with the same key value until the deleting transaction commits. This is necessary to ensure that the deleting transaction will be able to restore the original entry if it rolls back.
Index entry locks must eventually be removed from the tree. This is done by collecting blocks that have index entry locks in linked lists, called "index delete chains". that are owned by each active transaction. An index block with a newly created index entry lock in it is placed on that transaction's index delete chain unless it is already on another transaction's index delete chain. At the end of the transaction, the locks are deleted by traversing the chain of index blocks and deleting each lock.
In all releases prior to 8.2A, the database block size was fixed when we built Progress for a particular system. The sizes we used in the past are:
Operating System
Block Size
Free Space Constant
Expansion Constant
MS-DOS
512
150
75
Novell NLM
512
150
75
OS/2
512
150
75
Windows 3.1
512
150
75
VMS
1024
250
75
Solaris
1024
250
75
HP-UX
1024
250
75
DG-UX
1024
250
75
AIX
1024
250
75
SCO Unix
1024
250
75
Unix System V.4
1024
250
75
Pyramid
2048
250
75
Sequent PTX
2048
250
75
Sequent Dynix
2048
250
75
VAX Ultrix
2048
250
75
Nixdorf
2048
250
75
Windows NT
4096
250
75
Windows 95
4096
250
75
Any Unix not mentioned above
1024
250
75
Those sizes are now the defaults for version 8.2A.
You can use
"proutil dbname -C dbanalys"to get information about how much space is being used by various things in the database.
In version 7.3A and later, you can use
"proutil dbname -C tabanalys"to get lots of information about each table (number of rows, average row size, amount of frgamentation, etc.).
In version 7.3C and later you can also use
"proutil dbname -C ixanalys"to get lots of information about indexes, listed by table name and index name.
![]()
Copyright 1997, Progress Software Corp., All Rights Reserved