|
|
0. Contents
- Introduction
- Row Identifiers
- Index Cursors
- Index Brackets
- Row Retrieval Process
- Index Selection
- Using the Compiler's xref
- Cost of Adding an Index
- Design Considerations
1.
Introduction
The Progress RDBMS uses indexes to rapidly locate a
specific row or a group of rows in a table. When a query is
executed, one or more indexes will be selected to be used to
retrieve the requested data. Understanding index selection
is very important in the design of a database system.
Improper use of index may cause a significant bottleneck in
the performance of a system. It could lead to performance
degradation not only for the process using the improper
index, but can also affect the performance of the entire
system.
Indexes are used for the following purposes:
- Fast access and retrieval of a specific row or set of
rows.
- To retrieve rows in specific order.
- To enforce uniqueness of column values.
- To allow fast location of rows that contain a
specific word or phrase
This article applies to Progress Versions 7 and 8.
2. Row
Identifiers
In order to understand how Progress retrieves rows, it is
important to understand row identifiers and how they are
used as a part of an index. A "rowid" is a row identifier
that uniquely encodes the exact disk location of a row
within the database. In other words, each row in a database
has one unique row identifier. As soon as a row is created,
a row identifier is assigned to it and stays the same for
the row's entire life. Even when the primary index key
values are changed, the rowid remains the same. Once a row
is deleted, another row with the same identifier may be
created.
When an index is added to a table, Progress automatically
adds the row identifier as a component to the index. Even
when no index is added to a table, Progress automatically
creates a default primary index with the row identifier as
its component.
In order for Progress to retrieve any row, it needs its
row identifier. The following will explain the role of row
identifiers and index selection criteria:
- Before Progress can retrieve any row, it must first
obtain its row identifier. Even for a sequential read of
a table without a WHERE, BY, or USE-INDEX
clause, Progress will obtain the row identifier of
each of the rows using the primary index. Therefore
sequential reading of a table involves reading the
primary index sequentially, obtaining the row identifier,
and then fetching the row using its row identifier.
- Using a specific row identifier to retrieve a row
does not require the use of any index. A row identifier
is a unique identifier for a row. It is used to directly
retrieve a row without using any index.
- Using the USE-INDEX clause ensures that
Progress will use the specified index. Only one index can
be specified with a USE-INDEX clause.
- In all other cases, Progress determines the indexes
to be used at compile time, using the conditions
specified in the WHERE or BY clauses.
3. Index
Cursors
An "index cursor" is information that is used to maintain
a position within an index. It can be moved to the next, the
previous, or a specific index entry, and it can be used to
read the row at the current entry.
Index cursors are managed by the server on behalf of the
clients. A cursor is opened, closed, repositioned, and used
to fetch rows for a specific client at the request of
client.
Multiple index cursors can be opened on one or more
indexes at any time.
4. Index
Brackets
- An "index bracket" is a set of consecutive entries in
an index. A bracket is defined by an index identifier
(index number), a low-key value, and a high-key value.
All the index entries starting with low-key value and
ending with high-key value are included in the bracket. A
"bracket scan" is an operation which examines index
entries from the low-key value to high-key value.
There are two types of brackets: equality brackets and
range brackets. Equality brackets define a set of
consecutive index entries that have an equality match on a
key. The low-key value and high-key values are the same for
an equality match. Range brackets define a set of
consecutive index entries from the low-key value to high-key
value.
Except when you are using a find by rowid, Progress
always uses at least one index bracket to retrieve a row or
rows.
5. Row
Retrieval Process
To retrieve rows, Progress first uses the index brackets.
The client opens a new index cursor or uses an already open
one. It then sends the server a request that includes the
cursor identifier and the bracket range, which are the
low-key and high-key values, and asks for the next,
previous, first or last rows in the bracket.
Once a row is retrieved using the index brackets, further
selection process continues by applying any remaining
selection conditions specified in the WHERE clause.
It is important to note that there are some selection
operations that the server can not do, either because they
require access to program variables on the client side, or
because they are not implemented on the server side, such as
can-find. In such cases the server will send the rows in the
bracket to the client and the client will apply the
remaining selection conditions.
6. Index
Selection
The following section describes the index selection
process. All the examples use CUSTOMER table with
following indexes:
|
Index Name
|
Index Columns
|
Unique
|
Cust-Num (Primary)
|
Cust-Num
|
Yes
|
Name
|
Name
|
No
|
Sales-Rep
|
Sales-Rep
|
No
|
Country-Post
|
Country, Postal-code
|
No
|
Comments
|
Comments
|
Word Index
|
6.1 Using ROWID
If you retrieve a row using ROWID, Progress will
directly retrieve the row by using the rowid and will not
use an index.
|
Statement
|
Index Used
|
Remarks
|
find customer
where (rowid(customer) = rowid-customer).
|
None
|
The row is directly retrieved
using the rowid. This
is the only way a row can be retrieved without
using any index at all.
|
6.2 Using USE-INDEX
When you use the USE-INDEX option, Progress will
use the specified index instead of those the compiler might
have chosen.
|
Statement
|
Index Used
|
Remarks
|
find customer
where (cust-num = 45) use-index cust-num.
|
Cust-num
|
The specified index
cust-num will be used.
|
find customer
where (cust-num = 45) use-index name.
|
Name
|
The specified index
name will be used. Since the WHERE clause is
using cust-num, the
net result will be bracket on the whole index
name. This statement will result in a full
index scan to retrieve the row.
|
6.3 Single Index Selection
When a query does not use ROWID or
USE-INDEX, a single index or multiple indexes may
be used to retrieve the requested data, based on the
conditions specified in the WHERE or BY
clause. The following rules are used by the compiler for
single index retrievals:
- Select an index which is unique and all the
components are involved in the equality matches.
- Select the index with more active equality matches.
- Select the index with more active range matches.
- Select the index with more active sort matches.
- Select the index that is the primary index.
- Select the first index alphabetically by index name.
|
Statement
|
Index Used
|
Remarks
|
for each customer:
|
Cust-num
|
Bracket on the whole index
cust-num. The primary index cust-num is
used since there is no WHERE clause.
|
for each customer where
(state = "MD"):
|
Cust-num
|
Bracket on the whole index
cust-num. The primary index cust-num is
used since there is no index on state. This
will result in a full index scan to retrieve the
rows.
|
find customer where
(cust-num = 12).
|
Cust-num
|
Single bracket on one
index.
|
for each customer where
(sales-rep = "John"):
|
Sales-rep
|
Single bracket on one
index.
|
for each customer where
(sales-rep begins "J"):
|
Sales-rep
|
Single bracket on one
index.
|
for each customer where
(cust-num > 20) and (cust-num < 40):
|
Cust-num
|
Single bracket on one
index.
|
for each customer where
(cust-num > 56) by name:
|
Cust-num
|
Single bracket on
cust-num
index. The index on name will not
be used for sorting, instead internal sorting will
be done once the rows are retrieved.
|
for each customer
by name:
|
Name
|
Bracket on the whole index
name.
|
If a function or expression is used for
the components of an index, an index or bracket will not be
used.
|
Statement
|
Index Used
|
Remarks
|
for each customer where
(substring (name, 1, 1) = "A"):
|
Cust-num
|
The index on name will not
be used, instead primary index on Cust-num will
be used. This expression will result in a full
index scan to retrieve the rows.
|
for each customer where
(if rowid-customer <> ? then
rowid (customer) = rowid-customer
else true):
|
Cust-num
|
In this case row will not be
retrieved directly using the rowid. Because
Progress selects the index at compile time, it will
not be able to evaluate the if statement. Therefore
the primary index on cust-num will
be selected, resulting in full index scan to
retrieve the row.
|
for each customer where
(name matches "A*"):
|
Cust-num
|
The index on name will not
be used, instead primary index on cust-num will
be used. This expression will result in a full
index scan to retrieve the rows.
|
6.4 Multiple Index
Selection
When the WHERE clause uses AND or
OR clauses and indexes are available for both sides
of the AND or OR, more than one index can
be used. The following rules and examples describe how the
compiler uses multiple indexes:
It is important to note that multiple indexes are not
used for queries using FIND. Find statements can
only use one index.
When the selection criteria includes the use of
AND, more than one index will be used when all the
components of each index are involved in equality matches,
and the indexes are not unique.
|
Statement
|
Index Used
|
Remarks
|
for each customer where
(name = "Scott") and
(sales-rep = "Jim"):
|
Name
SalesRep
|
Two indexes used, because both
indexes are non-unique and all components are
involved in equality matches.
|
for each customer where
(name > "scott") and
(sales-rep > "Jim"):
|
Name
|
Only one index used, because the
match is not equality match.
|
for each customer where
(country = "USA" and
postal-code = "21000") and
(sales-rep = "Jim"):
|
Country-Post
Sales-rep
|
Two indexes used, because both
indexes are non-unique and all components are
involved in equality matches.
|
for each customer where
(country = "USA" and
sales-rep = "Jim"):
|
Sales-rep
|
Only one index used, since not
all the components of country-post
are involved in the equality match.
|
for each customer where
(cust-num = 65) and
(sales-rep = "Jim"):
|
Cust-num
|
Only one index used, because
cust-num is unique.
|
When an OR is used in the WHERE clause
and both the left and right side of the OR contain
at least the lead component of an index using either the
equality or range matches, then multiple indexes are used.
|
Statement
|
Index Used
|
Remarks
|
for each customer where
(country = "USA" and
postal-code = "21000") OR
(sales-rep = "Jim"):
|
Country-post
Sales-rep
|
Two indexes are used, since the
selection criteria on both sides of OR use at
least the leading components of both
indexes.
|
for each customer where
(country = "USA" and
postal-code = "21000") OR
(sales-rep > "Jim"):
|
Country-post
Sales-rep
|
Two indexes are used, since the
selection criteria on both sides of OR use at
least the leading components of both
indexes.
|
for each customer where
(country = "USA") OR
(sales-rep = "Jim"):
|
Country-post
Sales-rep
|
Two indexes are used, since the
selection criteria on both sides of OR use at
least the leading components of both
indexes.
|
for each customer where
(postal-code = "21000") OR
(sales-rep = "Jim"):
|
Cust-num
|
Since the postal code is
not the leading component, two indexes are not
selected. In addition since an OR is used,
Progress selects the primary index in this
case.
|
for each customer where
(name begins "J") OR
(country = "USA"):
|
Name
Country-post
|
Two indexes used, since both the
matches are on the leading components of the
index.
|
for each customer where
(cust-num = 99) OR
(cust-num = 187):
|
Cust-num
Cust-num
|
One index and two brackets
used.
|
for each customer where
(cust-num < 99) OR
(name = "John") OR
(name = "Scott"):
|
Cust-num
Name
Name
|
Two indexes and three brackets
used.
|
When a CONTAINS clause with a column that has a
word index is used in the WHERE clause, the word
index, as well as other indexes, will be used.
|
Statement
|
Index Used
|
Remarks
|
for each customer where
(comments contains "amount") and
sales-rep = "Jim"):
|
Comments
Sales-rep
|
Comments is word index, therefore it is selected.
Also AND is used
with the equality match, therefore sales-rep is
also selected.
|
for each customer where
(comments contains "amount" and
name = "John") or
(country = "USA" and
postal-code = "21000"):
|
Comments
Name
Country-Post
|
Comment index is word index, therefore it is
selected. The AND after the
comments contains an equality match therefore
selected and the selection on the right of
OR uses the lead components of the index,
therefore it is selected.
|
7. Using the
Compiler's xref
To learn what index or indexes the compiler will use,
compile with the xref option. This creates a
cross-reference listing with a SEARCH label
indicating which indexes will be used. The presence of
multiple SEARCH labels for the same
statement indicates that multiple indexes or brackets will
be used for the query.
|
Statement
|
Xref Listing
|
Remarks
|
find customer where
(rowid(customer) =
rowid-customer).
|
SEARCH tmp.Customer RECID
|
Recid (Rowid) is used directly.
|
find customer where
(cust-num = 46)
use-index cust-num.
|
SEARCH tmp.Customer
Cust-Num
|
Index Cust-Num is
used.
|
find customer where
(cust-num = 46)
use-index name.
|
SEARCH tmp.Customer Name
WHOLE-INDEX
|
Index Name is
used.
WHOLE-INDEX
after the index
name indicates that no bracketing is possible.
No bracketing also means that the entire index is
bracketed. This will result in a full index scan,
reading every index entry, to retrieve the
rows.
|
for each customer:
|
SEARCH tmp.Customer Cust-Num
WHOLE-INDEX
|
Index Cust-Num is
used. WHOLE-INDEX
after the index name
indicates that no bracketing is possible.
|
for each customer where
(name = "Scott") and
(Sales-rep = "Jim"):
|
SEARCH tmp.Customer Name
SEARCH tmp.Customer Sales-Rep
|
Two entries with
the
SEARCH
label appear in the xref
listing for the same listing statement number. This
indicates that two indexes Name and
Sales-Rep are being used.
|
for each customer where
(cust-num < 99) or
(name = "John") or
(name = "Scott"):
|
SEARCH tmp.Customer Cust-Num
SEARCH tmp.Customer Name
SEARCH tmp.Customer Name
|
Three entries with
the
SEARCH
label appear in the xref
listing for the same listing statement number. In
this case index Cust-num and
Name are used. Name
appears twice indicating
two separate brackets on name are
used.
|
8. Cost of
Adding an Index
Although indexes usually provide performance gains in the
retrieval of specific row or set of rows, there is a cost in
adding indexes to a table:
- Slower create and delete operations on rows. This is
because index entries will be added or deleted when rows
are added or deleted. The more indexes you have, the more
index entries will have to be added or deleted.
- Slower updates of index columns. This is because the
index entries must be updated whenever a column that is a
component of an index is updated with a new value.
- Additional storage space. This is because the index
entries will occupy additional disk space.
- Additional administration and maintenance. This is
because the increase in number of indexes will increase
the time to rebuild indexes. In addition, the increase in
size of the database will result in more time to manage
the functions such as backup and restore.
9. Design
Considerations
As there are advantages and costs involved in adding
indexes, therefore it is very important to understand the
need for indexes and avoid unnecessary or redundant indexes.
It is also a good idea to review the components of each
index during design time along with the type of queries that
will be performed on the table. This can help in reducing
the number of indexes needed by organizing them to take
advantage of multiple index selection. The following should
be considered when evaluating the need for an index:
- How many rows are in the table?
- How often the rows will accessed using this
index?
- How many rows will be accessed using this index?
- Are there existing indexes which could be used
instead?
- Is the row access required during on-line transaction
processing or is it required for nightly batch runs? If
it required for nightly batch runs, is the index really
required?
|