Selection

Engine Crew Monograph Number 15.
Last Updated August 1, 1998

Arvind Agrawal, Corporate Consulting,
Progress Software Corporation
Assistance from Lori Lashway and Amnon Waisman

0. Contents

  1. Introduction
  2. Row Identifiers
  3. Index Cursors
  4. Index Brackets
  5. Row Retrieval Process
  6. Index Selection
  7. Using the Compiler's xref
  8. Cost of Adding an Index
  9. Design Considerations

1. Introduction

The Progress RDBMS uses indexes to rapidly locate a specific row or a group of rows in a table. When a query is executed, one or more indexes will be selected to be used to retrieve the requested data. Understanding index selection is very important in the design of a database system. Improper use of index may cause a significant bottleneck in the performance of a system. It could lead to performance degradation not only for the process using the improper index, but can also affect the performance of the entire system.

Indexes are used for the following purposes:

  • Fast access and retrieval of a specific row or set of rows.
  • To retrieve rows in specific order.
  • To enforce uniqueness of column values.
  • To allow fast location of rows that contain a specific word or phrase

This article applies to Progress Versions 7 and 8.

2. Row Identifiers

In order to understand how Progress retrieves rows, it is important to understand row identifiers and how they are used as a part of an index. A "rowid" is a row identifier that uniquely encodes the exact disk location of a row within the database. In other words, each row in a database has one unique row identifier. As soon as a row is created, a row identifier is assigned to it and stays the same for the row's entire life. Even when the primary index key values are changed, the rowid remains the same. Once a row is deleted, another row with the same identifier may be created.

When an index is added to a table, Progress automatically adds the row identifier as a component to the index. Even when no index is added to a table, Progress automatically creates a default primary index with the row identifier as its component.

In order for Progress to retrieve any row, it needs its row identifier. The following will explain the role of row identifiers and index selection criteria:

  • Before Progress can retrieve any row, it must first obtain its row identifier. Even for a sequential read of a table without a WHERE, BY, or USE-INDEX clause, Progress will obtain the row identifier of each of the rows using the primary index. Therefore sequential reading of a table involves reading the primary index sequentially, obtaining the row identifier, and then fetching the row using its row identifier.
  • Using a specific row identifier to retrieve a row does not require the use of any index. A row identifier is a unique identifier for a row. It is used to directly retrieve a row without using any index.
  • Using the USE-INDEX clause ensures that Progress will use the specified index. Only one index can be specified with a USE-INDEX clause.
  • In all other cases, Progress determines the indexes to be used at compile time, using the conditions specified in the WHERE or BY clauses.

3. Index Cursors

An "index cursor" is information that is used to maintain a position within an index. It can be moved to the next, the previous, or a specific index entry, and it can be used to read the row at the current entry.

Index cursors are managed by the server on behalf of the clients. A cursor is opened, closed, repositioned, and used to fetch rows for a specific client at the request of client.

Multiple index cursors can be opened on one or more indexes at any time.

4. Index Brackets

An "index bracket" is a set of consecutive entries in an index. A bracket is defined by an index identifier (index number), a low-key value, and a high-key value. All the index entries starting with low-key value and ending with high-key value are included in the bracket. A "bracket scan" is an operation which examines index entries from the low-key value to high-key value.

There are two types of brackets: equality brackets and range brackets. Equality brackets define a set of consecutive index entries that have an equality match on a key. The low-key value and high-key values are the same for an equality match. Range brackets define a set of consecutive index entries from the low-key value to high-key value.

Except when you are using a find by rowid, Progress always uses at least one index bracket to retrieve a row or rows.

5. Row Retrieval Process

To retrieve rows, Progress first uses the index brackets. The client opens a new index cursor or uses an already open one. It then sends the server a request that includes the cursor identifier and the bracket range, which are the low-key and high-key values, and asks for the next, previous, first or last rows in the bracket.

Once a row is retrieved using the index brackets, further selection process continues by applying any remaining selection conditions specified in the WHERE clause.

It is important to note that there are some selection operations that the server can not do, either because they require access to program variables on the client side, or because they are not implemented on the server side, such as can-find. In such cases the server will send the rows in the bracket to the client and the client will apply the remaining selection conditions.

6. Index Selection

The following section describes the index selection process. All the examples use CUSTOMER table with following indexes:

Index Name

Index Columns

Unique

Cust-Num (Primary)
Cust-Num

Yes

Name
Name

No

Sales-Rep
Sales-Rep

No

Country-Post
Country, Postal-code

No

Comments
Comments

Word Index

6.1 Using ROWID

If you retrieve a row using ROWID, Progress will directly retrieve the row by using the rowid and will not use an index.

Statement

Index Used

Remarks

find customer
    where (rowid(customer) = rowid-customer).

None

The row is directly retrieved using the rowid. This is the only way a row can be retrieved without using any index at all.

6.2 Using USE-INDEX

When you use the USE-INDEX option, Progress will use the specified index instead of those the compiler might have chosen.

Statement

Index Used

Remarks

find customer
    where (cust-num = 45) use-index cust-num.

Cust-num

The specified index cust-num will be used.

find customer
    where (cust-num = 45) use-index name.

Name

The specified index name will be used. Since the WHERE clause is using cust-num, the net result will be bracket on the whole index name. This statement will result in a full index scan to retrieve the row.

6.3 Single Index Selection

When a query does not use ROWID or USE-INDEX, a single index or multiple indexes may be used to retrieve the requested data, based on the conditions specified in the WHERE or BY clause. The following rules are used by the compiler for single index retrievals:

  • Select an index which is unique and all the components are involved in the equality matches.
  • Select the index with more active equality matches.
  • Select the index with more active range matches.
  • Select the index with more active sort matches.
  • Select the index that is the primary index.
  • Select the first index alphabetically by index name.

Statement

Index Used

Remarks

for each customer:

Cust-num

Bracket on the whole index cust-num. The primary index cust-num is used since there is no WHERE clause.

for each customer where
    (state = "MD"):

Cust-num

Bracket on the whole index cust-num. The primary index cust-num is used since there is no index on state. This will result in a full index scan to retrieve the rows.

find customer where
    (cust-num = 12).

Cust-num

Single bracket on one index.

for each customer where
    (sales-rep = "John"):

Sales-rep

Single bracket on one index.

for each customer where
    (sales-rep begins "J"):

Sales-rep

Single bracket on one index.

for each customer where
    (cust-num > 20) and (cust-num < 40):

Cust-num

Single bracket on one index.

for each customer where
    (cust-num > 56) by name:

Cust-num

Single bracket on cust-num index. The index on name will not be used for sorting, instead internal sorting will be done once the rows are retrieved.

for each customer
    by name:

Name

Bracket on the whole index name.

If a function or expression is used for the components of an index, an index or bracket will not be used.

Statement

Index Used

Remarks

for each customer where
    (substring (name, 1, 1) = "A"):

Cust-num

The index on name will not be used, instead primary index on Cust-num will be used. This expression will result in a full index scan to retrieve the rows.

for each customer where
    (if rowid-customer <> ? then
        rowid (customer) = rowid-customer
     else true):

Cust-num

In this case row will not be retrieved directly using the rowid. Because Progress selects the index at compile time, it will not be able to evaluate the if statement. Therefore the primary index on cust-num will be selected, resulting in full index scan to retrieve the row.

for each customer where
    (name matches "A*"):

Cust-num

The index on name will not be used, instead primary index on cust-num will be used. This expression will result in a full index scan to retrieve the rows.

6.4 Multiple Index Selection

When the WHERE clause uses AND or OR clauses and indexes are available for both sides of the AND or OR, more than one index can be used. The following rules and examples describe how the compiler uses multiple indexes:

It is important to note that multiple indexes are not used for queries using FIND. Find statements can only use one index.

When the selection criteria includes the use of AND, more than one index will be used when all the components of each index are involved in equality matches, and the indexes are not unique.

Statement

Index Used

Remarks

for each customer where
    (name = "Scott") and
    (sales-rep = "Jim"):

Name
SalesRep

Two indexes used, because both indexes are non-unique and all components are involved in equality matches.

for each customer where
    (name > "scott") and
    (sales-rep > "Jim"):

Name

Only one index used, because the match is not equality match.

for each customer where
    (country = "USA" and
     postal-code = "21000") and
    (sales-rep = "Jim"):

Country-Post
Sales-rep

Two indexes used, because both indexes are non-unique and all components are involved in equality matches.

for each customer where
    (country = "USA" and
     sales-rep = "Jim"):

Sales-rep

Only one index used, since not all the components of country-post are involved in the equality match.

for each customer where
    (cust-num = 65) and
    (sales-rep = "Jim"):

Cust-num

Only one index used, because cust-num is unique.

When an OR is used in the WHERE clause and both the left and right side of the OR contain at least the lead component of an index using either the equality or range matches, then multiple indexes are used.

Statement

Index Used

Remarks

for each customer where
    (country = "USA" and
     postal-code = "21000") OR
    (sales-rep = "Jim"):

Country-post
Sales-rep

Two indexes are used, since the selection criteria on both sides of OR use at least the leading components of both indexes.

for each customer where
    (country = "USA" and
     postal-code = "21000") OR
    (sales-rep > "Jim"):

Country-post
Sales-rep

Two indexes are used, since the selection criteria on both sides of OR use at least the leading components of both indexes.

for each customer where
    (country = "USA") OR
    (sales-rep = "Jim"):

Country-post
Sales-rep

Two indexes are used, since the selection criteria on both sides of OR use at least the leading components of both indexes.

for each customer where
    (postal-code = "21000") OR
    (sales-rep = "Jim"):

Cust-num

Since the postal code is not the leading component, two indexes are not selected. In addition since an OR is used, Progress selects the primary index in this case.

for each customer where
    (name begins "J") OR
    (country = "USA"):

Name
Country-post

Two indexes used, since both the matches are on the leading components of the index.

for each customer where
    (cust-num = 99) OR
    (cust-num = 187):

Cust-num
Cust-num

One index and two brackets used.

for each customer where
    (cust-num < 99) OR
    (name = "John") OR
    (name = "Scott"):

Cust-num
Name
Name

Two indexes and three brackets used.

When a CONTAINS clause with a column that has a word index is used in the WHERE clause, the word index, as well as other indexes, will be used.

Statement

Index Used

Remarks

for each customer where
    (comments contains "amount") and
     sales-rep = "Jim"):

Comments
Sales-rep

Comments is word index, therefore it is selected. Also AND is used with the equality match, therefore sales-rep is also selected.

for each customer where
    (comments contains "amount" and
     name = "John") or
    (country = "USA" and
     postal-code = "21000"):

Comments
Name
Country-Post

Comment index is word index, therefore it is selected. The AND after the comments contains an equality match therefore selected and the selection on the right of OR uses the lead components of the index, therefore it is selected.

7. Using the Compiler's xref

To learn what index or indexes the compiler will use, compile with the xref option. This creates a cross-reference listing with a SEARCH label indicating which indexes will be used. The presence of multiple SEARCH labels for the same statement indicates that multiple indexes or brackets will be used for the query.

Statement

Xref Listing

Remarks

find customer where
    (rowid(customer) =
        rowid-customer).

SEARCH tmp.Customer RECID

Recid (Rowid) is used directly.

find customer where
    (cust-num = 46)
    use-index cust-num.

SEARCH tmp.Customer Cust-Num

Index Cust-Num is used.

find customer where
    (cust-num = 46)
    use-index name.

SEARCH tmp.Customer Name
WHOLE-INDEX

Index Name is used. WHOLE-INDEX after the index name indicates that no bracketing is possible. No bracketing also means that the entire index is bracketed. This will result in a full index scan, reading every index entry, to retrieve the rows.

for each customer:

SEARCH tmp.Customer Cust-Num
WHOLE-INDEX

Index Cust-Num is used. WHOLE-INDEX after the index name indicates that no bracketing is possible.

for each customer where
    (name = "Scott") and
    (Sales-rep = "Jim"):

SEARCH tmp.Customer Name
SEARCH tmp.Customer Sales-Rep

Two entries with the SEARCH label appear in the xref listing for the same listing statement number. This indicates that two indexes Name and Sales-Rep are being used.

for each customer where
    (cust-num < 99) or
    (name = "John") or
    (name = "Scott"):

SEARCH tmp.Customer Cust-Num
SEARCH tmp.Customer Name
SEARCH tmp.Customer Name

Three entries with the SEARCH label appear in the xref listing for the same listing statement number. In this case index Cust-num and Name are used. Name appears twice indicating two separate brackets on name are used.

8. Cost of Adding an Index

Although indexes usually provide performance gains in the retrieval of specific row or set of rows, there is a cost in adding indexes to a table:

  • Slower create and delete operations on rows. This is because index entries will be added or deleted when rows are added or deleted. The more indexes you have, the more index entries will have to be added or deleted.
  • Slower updates of index columns. This is because the index entries must be updated whenever a column that is a component of an index is updated with a new value.
  • Additional storage space. This is because the index entries will occupy additional disk space.
  • Additional administration and maintenance. This is because the increase in number of indexes will increase the time to rebuild indexes. In addition, the increase in size of the database will result in more time to manage the functions such as backup and restore.

9. Design Considerations

As there are advantages and costs involved in adding indexes, therefore it is very important to understand the need for indexes and avoid unnecessary or redundant indexes. It is also a good idea to review the components of each index during design time along with the type of queries that will be performed on the table. This can help in reducing the number of indexes needed by organizing them to take advantage of multiple index selection. The following should be considered when evaluating the need for an index:

  • How many rows are in the table?
  • How often the rows will accessed using this index?
  • How many rows will be accessed using this index?
  • Are there existing indexes which could be used instead?
  • Is the row access required during on-line transaction processing or is it required for nightly batch runs? If it required for nightly batch runs, is the index really required?

     

 

Copyright 1998, Progress Software Corp., All Rights Reserved