4GL Query Concepts and Usage

Engine Crew Monograph No. 1

Last updated August 10, 1997

G Bjorklund, Tim Sargent, Gabriel Steinberg, Mary Szekely, Amnon Waisman,

Thanks to Greg Higgins for HTML assistance

Contents

1. Introduction

This document describes the functionality and behavior of PROGRESS 4gl queries. By understanding how queries work, you can make more effective use of them in your applications. We concentrate on those features that affect the PROGRESS server, discussing 4gl semantics and client features to the extent that they are needed to understand how queries are executed. We do not describe the complete query semantics as they are manifested in 4gl programs. We do not address PROGRESS SQL queries or queries for foreign data managers (DataServers).

It is assumed that the reader is somewhat familiar with the concept of indexes, index cursors, client buffers, and some other basic terms related to the PROGRESS database. There is a lot of fairly complex material presented here. Please don't fall asleep, fall out of your chair, and injure yourself.

2. What are 4gl queries ?

4gl queries are data manipulation operations that are used to retrieve or change data stored in the database. In PROGRESS version 6 and earlier, 4gl queries came in two main flavors: FIND queries and block oriented queries (FOR EACH and PRESELECT). In version 7, the OPEN QUERY/GET flavors of queries were added, with the expectation that they would replace the FIND queries in most cases.

GET queries were added because although FIND queries provide very useful functionality, they are very difficult to optimize exactly because of that functionality. There was a need to provide better performance, but also to maintain existing functionality, and the best way to achieve that was by adding new flavors of queries with somewhat different functionality.

The 4gl language constructs used with the three query types are:

 
FOR EACH / PRESELECT
 
FOR EACH
REPEAT PRESELECT
DO PRESELECT
 
FIND
 
FIND [ FIRST | NEXT | LAST | PREV ]
FIND
FIND CURRENT
FIND record WHERE ROWID (record) = expression.
CAN-FIND ( [ FIRST | LAST ] )
CAN-FIND (record WHERE ...)
 
GET
 
DEFINE QUERY
OPEN QUERY
GET [FIRST | NEXT | LAST | PREV]
GET CURRENT

Other 4gl language constructs related to queries are:

3. The elements of query execution

3.1 Index cursors

An index cursor is a structure which is used to maintain a position within an index. It can be moved to the next index entry, the previous entry or a specific entry, and can be used to read the record at the current entry.

Index cursors are managed by the server on behalf of clients. A cursor is opened, closed, repositioned and used to fetch records for a specific client, at the client's request.

Multiple index cursors can be opened on one or more indexes at any time. Any interaction between them (such as the repositioning of related cursors) is initiated by the client; as far as the server is concerned there is no relationship between them.

3.2 Index brackets

An index bracket is a set of consecutive entries in an index. A bracket is defined by the index identifier (index number), a lowest key value (the low limit) and a highest key value (the high limit). All the index entries starting with the low value and going up to the high value are included in the bracket. Abracket scan is an operation in which all the index entries from the low limit to the high limit are examined. A bracket scan is a fundamental method for retrieving data and uses an index cursor.

There are two classes of brackets: equality brackets and range brackets. Equality brackets define a set of consecutive index entries that have an equality match on a key (or on part of a key). The low and high limit key values are the same.

Range brackets define a set of entries from the low key value to the high key value. The entries in a range bracket often have many different key values.

To evaluate a query, Progress always uses at least one index bracket (except in the case of a find by rowid, when the record is fetched without using an index at all). The 4gl compiler analyzes the query to see if it can use the elements of a WHERE clause, OF, USING, etc., the available indexes and key components to form brackets. The most restrictive brackets will be the most efficient to apply first. If the query does not define any brackets, the compiler will supply a default bracket, usually a table's primary index. The expression

(name BEGINS "M")

describes a set of entries where the name starts with the letter "M", which is part of the name key. This forms an equality bracket. The expression

((name > "Off The Wall") and (name < "Quick Toss Lacrosse"))

describes a set of consecutive entries by the name index. This forms a range bracket.

3.3 Using one index bracket to retrieve records

To retrieve records using an index bracket, the client opens a new index cursor, or uses an already open one. It then sends the server a request that includes the cursor identifier and the bracket range (low and high key values), and asks for the next, previous, first or last records in the bracket. The following queries each use the default bracket on the entire index:
 
        for each customer: /* entire table - no where clause */
        end.
 
        for each customer by state: /* sort, no index on state */
        end.
 
        for each customer
            where (state = "OH"): /* no index on state */
        end.

The following queries each use a single bracket on part of one index:

        for each customer
            where (zip = 12345):
        end.
 
        for each customer
            where (name = "Off The Wall"):
        end.
 
        for each customer
            where (name begins "B"):
        end.
 
        for each customer
            where (zip > 50000) and (zip < 60000):
        end.
 
        for each customer
            where (name = "Off The Wall") and (zip > 50000):
        end.

3.4 Using more than one index bracket to retrieve records

Multi-bracket queries are used by the GET, FOR EACH and PRESELECT statements in version 7 and later. They provide functionality similar to that of index cursors, but can use multiple index brackets and multiple index cursors and provide better performance. They combine index brackets by applying algorithms which take advantage of the index structure, and improve performance for queries with OR and AND operators. For example, if a single index bracket were to be used to execute the query

        for each customer
            where (cust-num <= 10) or (name = "Mary"):
        end.

the whole customer table would have to be scanned, because no smaller bracket will include all the desired records. By using two brackets, many fewer records need to be accessed.

The following queries use two brackets on the zip code index.

for each customer
    where (zip = 12345) or (zip > 40000):
end.
 
for each customer
    where ((zip > 50000) and (zip < 60000)) or
          ((zip > 70000) and (zip < 80000)):
end.

The following query uses three brackets, two on the name index and one on the zip code index.

for each customer
    where (name = "Off The Wall") or
          (zip = 17030) or
          (name = "Sticky Wicket Cricket"):
end.

The following query uses four brackets, two on the name index and two on the zip code index.

for each customer
    where ((name = "Off The Wall") and
       (zip = 01824)) or
       ((name = "Sticky Wicket Cricket") and
       (zip = 22070)):
end.

Multi-bracket queries are typically more efficient (faster) than single index cursors in retrieving records, but they can not be repositioned like index cursors: multi-bracket queries can only be moved forward to the next record, and operations such as GET PREV must be supported by the client via a result list; single-bracket queries can be repositioned by repositioning the underlying index cursor, a feature which is accessible to the 4gl via the INDEXED-REPOSITION option.

3.5 Record selection

Record selection means determining whether a record satisfies a query by evaluating an expression involving its contents rather than finding it via an index. The checking is done by evaluating an expression that is in the form of "E-code" - which returns a result of true or false - against the record. For example, in the query

for each customer
     where (cust-num < 10) and (city = "Boston"):
end.

where the cust-num field is indexed and the city field is not, PROGRESS does the following:

It is important to note that there are some selection operations that the server cannot do, either because they require access to program variables in the client, or because they are not implemented on the server (the most important such function is CAN-FIND which is not yet implemented on the server). In such a case, the server sends the records to the client along with an indication that it cannot perform the selection, and the client must do it.

3.6 Query by words (word-indexes)

Query by words is supported through the 4gl CONTAINS clause. It uses word-indexes, which have the same structure as regular indexes, but contain an entry for each word in a character field rather than one entry for the whole field.

Evaluating CONTAINS clauses differs from evaluating other index expressions in two ways:

3.7 Finding records by ROWID

A query of the form

find customer where (rowid (cust) = <some rowid value>).

does not require the use of any index. The ROWID identifies the exact location of the record in the database, and the server can retrieve it without using any index bracket.

3.8 Unique FIND queries

A FIND query without a NEXT, PREV, FIRST or LAST performs an operation which is different from other FIND operations: it not only locates a record, but also ensures that it is the only record which satisfies the query. If more than one record is found, it returns an error. In order to achieve this, PROGRESS must find the first record, and then look for the next record and not find one. The 2 nd step can be very costly in some cases: for example, if the city field is not indexed, the query

find customer where (city = "Boston")

requires PROGRESS to scan the customer table until such a record is found, and then continue scanning the rest of the table to ensure that no other such record exists.

Whenever possible, the verification is done by the server, but if the server cannot execute the query, the client must verify the record's uniqueness in addition to performing the selection.

3.9 Joins

A join is an operation that combines two tables into one by comparing column values in one table with those in another and including those that are equal. PROGRESS translates joins into multiple single-table queries, which are the only types of queries the server can execute today. For example, the query

for each customer, each order of customer:

is executed by the client by opening two queries: one for cust and one for order. After each customer record is received, the client opens the 2 nd query for orders where order.cust-num = cust.cust-num, and retrieves all the orders for the customer. When it is done with the orders it goes on to the next customer, and so on.

3.10 Field lists

The FIELDS and EXCEPT clauses instruct PROGRESS to retrieve only selected fields from a record. This can reduce the number of bytes sent over the network when accessing a remote server.

Field lists do not reduce the amount of database I/O performed by a query. Records are always read into the server's buffers in their entirety. The server then discards any unnecessary fields from it before sending it to the client.

In single-user or self-serving mode, reducing record sizes does not improve performance. Nevertheless, PROGRESS does discard the unwanted fields in order to help detect 4gl programming errors which would occur when running the same program with a remote server.

The example below illustrates the use of FIELDS.

for each customer fields (name balance):
  ....
end.

The example below illustrates the use of EXCEPT.

for each customer except (name balance):
  ....
end.

Note that PROGRESS will extend the field list with additional fields automatically when they are needed. For instance, the example below, customer.cust-num will be added to the field list because it is needed for the join.

for each customer fields (name),
  each order fields (order-num sales-rep)
    of customer:
    ....
end.

3.11 Prefetch - multiple records into network messages

In order to reduce network traffic, a remote PROGRESS server can package multiple records into each network message when executing a query. It packages up to the size of one network message, which is controlled by the value of the -Mm startup parameter.

This is called prefetch, because it requests the server to fetch additional records before the client requests them.

The initiative for this operation comes from the client. When the client sends a query request to the server, it can set two flags in the request: the first indicates that it would prefer to receive more than one record in each message when possible; the second indicates that, in addition, it wants each message to contain as many records as will fit. If only the first flag is set, the server will balance the goal of reducing the number of network messages with the need to provide reasonable response time, and may send less than full messages. If both are set, the server will not respond until it completely fills a message packet.

The client sets both flags when executing a PRESELECT or when presorting query results. For example, in:

do preselect each customer:
    * * *
end.

the client reads all the customer records and stores their rowids in a result list before entering the loop. It does not proceed until it has all the records, so it doesn't benefit if the first message comes back faster and is only partially full.

The client sets the first flag when executing a FOR EACH or scrolling query with NO-LOCK (unless it contains a NO-PREFETCH qualifier). For example,

for each customer no-lock
    * * *
end.

This is limited to NO-LOCK because it requires fetching and locking more records than the 4gl asked for: in this example, when the FOR EACH loop asks for the 1 st record, the client might get back 4 records. If a lock is used, the client will have obtained locks on the extra records, which are not yet available to the 4gl - they are stored in the network buffer - and if one of them causes a deadlock, the client cannot release it, because its 4gl program isn't aware that it got it.

3.12 Result lists

Result lists are lists of rowids of a query's retrieved records. They are kept by a client and updated as the results of a query are received from a server. They are used to allow the client 4gl program to execute operations such as GET PREV on multi-index bracket queries, where the server can only do GET NEXT. In addition, they are used by PRESELECT queries, and by queries where the results must be sorted (see sorting, below).

3.13 Sorting query results

When a BY clause is used with a query, PROGRESS must produce the query results in the specified order. It does so in one of two ways:

  1. If an index that is in the desired order exists, and a single bracket on it can be used to execute the query while retrieving the minimum number of records, it is used.
  2. Otherwise, the query is executed in a two pass process. First every record is fetched, and the fields required to determine the order are retrieved, along with the ROWIDs. These values are placed in a result list, which is then sorted in the desired order. In the second pass records are retrieved again, using the sorted ROWIDs in the result list.

3.14 Query repositioning

Queries with result lists can be repositioned via the REPOSITION statement. Most such operations are executed by the client using the result list, and don't require server support. For example:

define query q for customer scrolling.
open query q preselect each customer.
 
/* the next statement will simply reposition the query within the result list */

reposition q to row 3.
 
/* the next statement will read customer 3 via its rowid stored in the result list */
 
get next q.

The REPOSITION statement may be handled by the server only when INDEXED-REPOSITION is used on a single index bracket, single table query. For example:

define query q for customer scrolling.
open query q for each customer indexed-reposition.
find customer where ... xxx = rowid (customer).

/* the next statement will cause the server to reposition the index cursor used by the query */

reposition q to rowid xxx.

3.15 The query cache

A query cache is a client mechanism that keeps the most recently read records of a query in the client's buffer pool. This speeds up browsing, when GET PREVIOUS and GET NEXT requests are issued repeatedly.

A query cache is created by specifying the CACHE option in the DEFINE QUERY statement. It is created by default for queries which are browsed by the 4gl browser.

The following program fragment shows how it can be used:

define query q for customer cache 10.
open query q for each customer no-lock.

/* get customer 1 */

get next q.

/* get customer 2, leave customer 1 in the cache */

get next q.

/* get customer 1 from the cache, without access to db or server */

get prev q.

Query caches can only be used with queries that use NO-LOCK.

4. Execution strategy

For almost all queries, there will be a variety of different ways to retrieve the desired data. Among the choices that must be made are:

Query execution strategy is established by the PROGRESS 4gl compiler when a program is compiled. Data structures describing the execution strategy are stored in the generated r-code. For example, if the cust-num and zip fields are indexed and the city field is not, the query

for each customer
    where ((cust-num <= 10) and (city = "Boston"))
           or (zip > 01824):

end.

is compiled by generating the following data and storing it into the r-code:

Whenever possible, the compiler will use multiple index brackets.

For WHERE clauses that contain sub expressions connected by an OR operator, multiple brackets will be used when there are usable brackets on both sides of the OR.

For WHERE clauses that contain sub expressions connected by an AND operator, multiple brackets will be used when all the key components are used in equality matches, and the index on one side is not unique.

The compiler uses the following weights for ranking the available indexes when analyzing brackets and deciding which indexes to make use of:

  1. An index that was specified in USE-INDEX
  2. A unique index when all key components are used for equality matches
  3. The index with the most equality matches
  4. The index with the most range matches
  5. Word indexes referenced by the contains operator
  6. The index with the most sort matches (by)
  7. A table's primary index
  8. The index that comes first alphabetically, by index name

You can determine which indexes will be used by a particular query by examining the output produced by the cross-reference (XREF) option of the 4gl compiler. Each index bracket that will be used will produce one line marked with the tag "SEARCH". When a default bracket on an entire index is being used, it will be marked "WHOLE-INDEX". Sorts required for a BY column when there is no suitable index will be marked with the tag "SORT-ACCESS".

Below is an extract from the cross-reference produced by compiling some of the examples used in this section. Only those lines relevant to queries are shown.

1 COMPILE p3.p
1 CPINTERNAL iso8859-1
1 CPSTREAM ibm850
3 SEARCH demo.customer cust-num WHOLE-INDEX
6 SEARCH demo.customer cust-num WHOLE-INDEX
6 SORT-ACCESS demo.customer Phone
9 SEARCH demo.customer cust-num WHOLE-INDEX
15 SEARCH demo.customer zip
19 SEARCH demo.customer name
23 SEARCH demo.customer name
27 SEARCH demo.customer zip
31 SEARCH demo.customer name
37 SEARCH demo.customer zip
37 SEARCH demo.customer zip
41 SEARCH demo.customer zip
41 SEARCH demo.customer zip
48 SEARCH demo.customer name
48 SEARCH demo.customer zip
48 SEARCH demo.customer name
56 SEARCH demo.customer name
56 SEARCH demo.customer zip
56 SEARCH demo.customer name
56 SEARCH demo.customer zip

As you can see, each line begins with a line number followed by an identifying "tag". Several different tags are used. The ones of interest at the moment are those that provide information about how a query will be executed and what access strategy the compiler has selected. The list below is a summary of the various query related tags.

SEARCH - Indicates an index bracket or look up will be used. The logical database name, table name, and index names are listed. When multiple brackets and indexes are used for the same query, you will see one search line for each bracket.

SEARCH ... WHOLE-INDEX - Indicates that a suitable bracket could not be constructed and an index scan over the entire table will be performed using the index noted.

SORT-ACCESS - Indicates that the query result is to be ordered by a particular column value and no suitable index exists so a sort of the query result on the noted column value is required.

Other tags related to database access are listed below.

ACCESS - Indicates that the specified table and field value is used at this point in the program.

CREATE - Indicates that a record is created at this location in the program.

DELETE - Indicates that a record is deleted at this location in the program.

UPDATE - Indicates that the specified field value of a table is updated at this location in the program.

5. Characteristics of FOR EACH Queries

5.1 Execution

In version 6, FOR EACH queries use a single index cursor and a single index bracket, and their performance is similar to that of FIND queries. In version 7 and later, they use server queries, and utilize multiple index brackets and multiple indexes when possible.

There are two ways to force FOR EACH queries in version 7 to behave like they did in version 6: by using the -v6q session flag, or by using the USE-INDEX clause.

5.2 Block-orientation

FOR EACH queries are executed within an iterating 4gl block; the next record is always fetched at the top of the block.

5.3 Record ordering

The order of retrieved records is "random" when multiple indexes are used. Otherwise it is based on the index that was chosen. If multiple indexes are possible, the BY phrase and USE-INDEX can be used to guarantee a particular order. In version 6, with USE-INDEX or when using -v6q the order is guaranteed to be that of the index used.

5.4 Positioning

FOR EACH queries always move forward. The next record is fetched at the top of each iteration, and no navigation is allowed. There is, however, an exception to this rule in version 6, or when using USE-INDEX or -v6q in later versions: the index cursor can be repositioned via a FIND into the same buffer in a sub-procedure; see the later section about FIND queries.

5.5 Variable binding

FOR EACH queries evaluate variables used in the WHERE clause once, before entering the block. For example, the program fragment

i = 5.
/* the value of i is taken here, only once */
for each customer where (cust-num > i):
    display cust-num.
    i = 1.
end.

gets all customers where cust-num > 5. The fact that the value of the variable i is changed inside the loop doesn't affect the WHERE clause or the outcome of the query.

5.6 Query cache

FOR EACH queries do not use the query cache. A cache is useful only when accessing records more than once, and FOR EACH queries normally do not.

5.7 Prefetch

With NO-LOCK, prefetch is the default for FOR EACH queries. It can be turned off by specifying NO-PREFETCH. If a presort is done, the presort always uses prefetch and field lists, retrieving only the necessary fields and using network messages containing as many records as possible.

6. Characteristics of PRESELECT Queries

6.1 Execution

As far as the server is concerned, PRESELECT queries execute exactly like FOR EACH queries.

On the client, PRESELECT queries work in a two pass sequence, similar to that of a sorting FOR EACH query: first all the records that satisfy the query are read, and a complete result list, with all the ROWIDs, is prepared, and, if necessary, sorted. Then, in response to FIND statements, the records are read again, this time via their ROWIDs from the result list, and passed to the 4gl program. For example:

/* prepare the complete result list */
do preselect each customer:

    /* gets a rowid from list, then reads record */
    find next customer.

    /* gets last rowid from list, reads record */
    find last customer.
end.

6.2 Block-orientation

A PRESELECT query is specified in the header of a 4gl block - either a REPEAT block or a DO block. The preselect pass (the first pass) is completed before the block is entered. Records are read within the block via FIND statements.

6.3 Record ordering

The order of retrieved records is not guaranteed unless a BY or USE-INDEX clause is used. In version 6, with USE-INDEX or when using -v6q, the order is guaranteed to be that of the index used.

6.4 Positioning

A PRESELECT query can move to the next, previous, first, or last record in the result list. The query cannot be positioned by any other means.

6.5 Variable binding

PRESELECT queries evaluate variables used in the WHERE clause once, before entering the block, exactly like a FOR EACH.

6.6 Query cache

Not used.

6.7 Prefetch

Prefetch is used in the first pass, reading the ROWID and, if necessary, the sort fields, with messages that contain as many records as possible. It cannot be turned off.

7. Characteristics of FIND Queries

7.1 Execution

FIND queries always use a single index cursor and a single index bracket. The WHERE clause, and optionally a USE-INDEX clause, determine which index is used. The index and the buffer used by the query define which index cursor is used. Thus, if two queries use the same buffer and the same index, they use the same index cursor. For example, the following three queries all use the same index cursor:

find first customer where (cust-num > 10) use-index cust-num.
 
find next customer where (name = "Mary") use-index cust-num.
 
find prev customer where (cust-num < 100).

Thus, an index cursor is not owned by a specific FIND query, and can be used by multiple FINDs. As a result, a FIND query can change the position of an index cursor used by another query and thus affect the result returned by the other query.

7.2 Block-orientation

Cursors used by find queries are scoped to the scope of the associated buffer.

7.3 Record ordering

The order of returned records is determined by the index used, which can be specified using a USE-INDEX clause. A BY clause is not allowed.

7.4 Positioning

In addition to allowing FIND NEXT, PREV, FIRST and LAST, FIND queries also get repositioned automatically by other queries: when a FIND or FOR EACH query fetches a record, all the index cursors which are used by FIND queries for the same buffer are positioned to the same record. For example, if a customer table contains rows with the following values for the columns cust-num and name:

7.5 Variable binding

In FIND queries, variables used in the WHERE clause are evaluated each time before they are executed. For example, the program fragment

i = 5.
repeat:
    find next customer where (cust-num > i):
    display cust-num.
    i = 53.
end.

displays customers 6 and 54. Since the value of i in "cust-num > i" is evaluated each time the FIND is executed, the query looks for customers with cust-num > 53 in the 2 nd iteration.

7.6 Query cache

Not used.

7.7 Prefetch

Not used.

8. Characteristics of GET Queries

8.1 Execution

GET queries use multi-bracket queries, and utilize multiple index brackets when possible.

8.2 Block-orientation

None. GET queries can span multiple blocks, procedures and modules.

8.3 Record ordering

Not guaranteed unless a BY or USE-INDEX clause is used.

8.4 Positioning

In addition to allowing GET NEXT, PREV, FIRST and LAST, GET queries also support the REPOSITION statement, which changes the current query position.

GET NEXT statements are supported for any query.

GET PREV, GET LAST and GET FIRST statements are supported if either of the following is true:

The REPOSITION statement is supported only if SCROLLING is defined, since it requires a result list. It can move the query some number of rows forward or backward, to a specific row in the result list, or to a record with a specific ROWID.

Most of these operations are performed by the client, using the result list. The server can perform the following operations:

It should be noted that the result list operations are sometimes time-consuming. For example, if the result list is not complete, a GET LAST on a multi-index query requires fetching all the remaining records in the query to complete the result list. When used correctly, however, the result list provides excellent navigation capabilities for browsing.

8.5 Variable binding

Variables referenced in the WHERE clause or a GET query are evaluated once, when the query is opened. For example, the program fragment

i = 5.
open query q for each customer where (cust-num > i).
get next q.
display cust-num.
i = 1.
get next q.
display cust-num.

displays customers 6 and 7. The change in the value of i doesn't affect the query.

8.6 Query cache

Used when the CACHE option is specified, or, by default, when a browser is defined on the query.

8.7 Prefetch

When SCROLLING (or a CACHE) is specified, and NO-LOCK is used, prefetch is the default, but can be turned off via a NO-PREFETCH option. If a presort or PRESELECT is done, the presort always uses prefetch and field lists, retrieving only the necessary fields with messages that contain as many records as possible.

9. When To Use Various Types Of Queries

9.1 FOR EACH Queries

FOR EACH queries are useful for sequential processing of records or joins, where every record needs to be visited only once. Typical examples are:

9.2 PRESELECT Queries

DO and REPEAT PRESELECT queries provide two capabilities which FOR EACH queries do not:

9.3 FIND Queries

FIND queries have the following advantages:

These advantages are very useful for interactive application such as browsing. An example of how this might be used is to browse a customer table in cust-num order, while providing a dialog box to allow the user to enter a customer name, moving the display so that the specified customer is displayed in the middle of the screen. Normal browsing - line up, line down, first screen, next screen, etc. - is done via FIND FIRST, NEXT, LAST and PREV statements. Repositioning to the specific customer is done as follows:

FIND queries have the following disadvantages:

9.4 GET Queries

GET queries were added in version 7 with the intention that they would replace FIND queries in most usage. To a large extent they have taken the place of FIND queries for browsing, especially since the PROGRESS browser requires them. They are also used to support SQL in PROGRESS. Their advantages are:

GET queries are not always the best type of query to use. They have the following disadvantages:

Go to monograph index

Copyright 1997, Progress Software Corp., All Rights Reserved