One-To-N Relationships

The automatic sets of the Enterprise Database Server can represent one-to-N relationships between two data sets or on a single data set using two automatic sets—one for each direction the relation can be traversed.

One-To-N Relationships on a Single Data Set

The system file naming structure gives rise to a logical tree structure where each node of the tree has an associated identifier with a maximum of 17 EBCDIC characters. It is desirable to physically store the actual identifiers in such a way as to conserve storage, because eventually there is a very large number of file names stored. Therefore, instead of using a data set of file titles, consider using a data set of nodes containing the file identifiers. There is a relationship between the nodes of this data set that give the arcs in the tree‑of‑file titles. This situation can be represented by Figure D–4.

Figure 54. One-To-N Relationships on a Single Data Set

One-To-N Relationships on a Single Data Set

A single file identifier can appear at several different nodes. This can easily be seen from the two file titles A/A and A/C/A. The identifier A appears at three different nodes in the tree structure for the two titles as shown in Figure D–5.

Figure 55. Single File Identifier Tree Structure

Single File Identifier Tree Structure

Therefore, it is natural to introduce a unique node number called NODENUM.

By convention, let there be two virtual nodes that are not actually stored and that have node numbers 0 and 1. Node 0 is the ultimate parent node of all system files, and node 1 is the ultimate parent node of all usercode files. A file title is the concatenation of all the node identifiers on the path from the root node to a leaf node of the tree.

It is preferable to avoid storing an identifier twice if it is both a directory and a final identifier of a file title. For example, for the two file titles A and A/B you want to store A only once. (Note that with the file titles A, A/B, and A/B/A, identifier A is stored twice.) This can be accomplished by introducing two flags in the nodes record. DIRFLAG indicates whether the identifier of the node is part of a longer file title or not, and FILEFLAG indicates whether the identifier is the final identifier of a file title or not. When both flags are on in a node record, you have eliminated the need for storing the identifier a second time. (Actually, only FILEFLAG is necessary. However, DIRFLAG can make lookups more efficient. If DIRFLAG is reset, then there are no more nodes below the current one; thus, an unnecessary call on the FIND operator can be avoided for leaf nodes, since a NOTFOUND exception would result.)

Besides its own unique identifier, NODENUM, each node has a unique (immediate) parent node, called PARENTNODENUM.

You can represent the preceding relationship with a DASDL description as follows:

NODES DATA SET
 (
  NODENUM        FIELD(24);
  PARENTNODENUM  FIELD(24);
  ID             ALPHA(17);
  FLAGS FIELD
   (
    DIRFLAG;
    FILEFLAG;
   );
 );
 
Note: Field items have been used in the DASDL description to maximize efficiency in storage space utilization.

For example, suppose you store the following file titles, in order, and assign unique node numbers consecutively starting at 2:

*A
*A/B/C
*A/B/D
*A/Z
*X

The following records are stored in the NODES data set:

NODENUM

PARENT NODENUM

ID

DIRFLAG

FILEFLAG

2

0

A

TRUE

TRUE

3

2

B

TRUE

FALSE

4

3

C

FALSE

TRUE

5

3

D

FALSE

TRUE

6

2

Z

FALSE

TRUE

7

0

X

FALSE

TRUE

This situation represents the following tree, where only NODENUM and ID are shown for each node for the sake of clarity:

   0,*
  /   \
7,X    2,A
      /   \
    6,Z   3,B
         /\
       4,C   5,D
Note: The node with NODENUM=0 is implicit and is not stored in the NODES data set.

The arcs relationship can be represented with the following two sets:

ARCSET SET OF NODES KEY(PARENTNODENUM, ID);
NODESET SET OF NODES KEY(NODENUM);

By default the two sets are index sequential structures with no duplicate keys allowed. Retrievals using index sequential structures are very efficient if the selection expression is FIRST, NEXT, PRIOR, or LAST. When a key condition is specified, the retrievals are very efficient if the key condition allows at least a partial binary search on the index. A binary search on the index results if, and only if, the following four conditions are met when a key condition is specified:

  • The first N of the M key items are used once each in Boolean primaries (which are either equalities or inequalities).

  • All the primaries are connected by the logical connective AND, and the connectives NOT and OR do not appear.

  • The primaries for the first N‑1 key items are equalities.

  • The primary for the Nth key item is as follows:

  • If the key item is ascending and the operation is a FIND FIRST or a FIND NEXT, or if the key item is descending and the operation is a FIND LAST or a FIND PRIOR, then the relational operator is one of equal to (=), greater than (>), or greater than or equal to (>=).

  • If the key item is descending and the operation is a FIND FIRST or a FIND NEXT, or if the key item is ascending and the operation is a FIND LAST or a FIND PRIOR, then the relational operator is one of equal to (=), less than (<), or less than or equal to (<=).

For example, the following selection expressions specifying key conditions on ARCSET are the most efficient. To find the first or the next ARCSET at the specified position, use any of the following expressions:

PARENTNODENUM = V1 AND ID = V2
PARENTNODENUM = V1 AND ID > V2
PARENTNODENUM = V1 AND ID GEQ V2
PARENTNODENUM = V1
PARENTNODENUM > V1
PARENTNODENUM GEQ V1

To find the last or the prior ARCSET at the specified position, use any of the following expressions:

PARENTNODENUM = V1 AND ID = V2
PARENTNODENUM = V1 AND ID < V2
PARENTNODENUM = V1 AND ID LEQ V2
PARENTNODENUM = V1
PARENTNODENUM < V1
PARENTNODENUM LEQ V1

If ID is a descending key item, the same selection expressions would be optimized except that > would be replaced by <, < by>, GEQ by LEQ, and LEQ by GEQ.

Thus, given a node X, the nodes immediately below it can be found simply and efficiently (in alphabetical order of identifier) as follows:

SET ARCSET TO BEGINNING;
    DO BEGIN
       FIND NEXT ARCSET AT PARENTNODENUM=X : RSLT;
       IF NOT RSLT THEN
          BEGIN
          [PROCESS NODES RECORD]
          END;
       END UNTIL RSLT;
    IF REAL(RSLT).DMERROR NEQ NOTFOUND THEN
    DMTERMINATE(RSLT);

And given a node Y, its parent can be found as follows:

GET NODES(PARENTNODENUM : =PARENTNODENUM);
FIND NODESET AT NODENUM=PARENTNODENUM : RSLT;

The unique node numbers are created by the user-written update program. Because of creations and deletions over time, the maximum node number is continually increased. In theory, any fixed field size for the node number is eventually exceeded. To avoid this, it is desirable to reuse node numbers. This can be accomplished by introducing a flag in the node record called INUSEFLAG. INUSEFLAG=1 indicates that the node is in use. INUSEFLAG=0 indicates that the node is not in use, but it is not to be deleted in the Enterprise Database Server sense. Introduction of INUSEFLAG results in the following DASDL description:

NODES DATA SET
 (
  NODENUM        FIELD(24);
  PARENTNODENUM  FIELD(24);
  ID             ALPHA(17);
  FLAGS FIELD
   (
    DIRFLAG;
    FILEFLAG;
   );
  INUSEFLAG      NUMBER(1);
 );
 ARCSET SET OF NODES KEY(INUSEFLAG, PARENTNODENUM, ID);
 NODESET SET OF NODES KEY(INUSEFLAG, NODENUM);

The nodes immediately below a node X can be found as follows:

SET ARCSET TO BEGINNING;
DO BEGIN
   FIND NEXT ARCSET AT INUSEFLAG=1 AND
   PARENTNODENUM=X : RSLT;
   IF NOT RSLT THEN
   BEGIN
      [PROCESS ARCS RECORD]
   END;
END UNTIL RSLT;
IF REAL(RSLT).DMERROR NEQ NOTFOUND THEN
DMTERMINATE(RSLT);

The parent node of a node Y can be found as follows:

GET NODES(PARENTNODENUM : = PARENTNODENUM);
FIND NODESET AT INUSEFLAG=1 AND NODENUM=PARENTNODENUM;

A new node can be allocated as follows:

LOCK FIRST NODESET AT INUSEFLAG=0 : RSLT;
IF RSLT THEN
   BEGIN
     IF REAL(RSLT).DMERROR NEQ NOTFOUND THEN
     DMTERMINATE(RSLT);
     LOCK LAST NODESET : RSLT;
     IF RSLT THEN
        BEGIN
          IF REAL(RSLT).DMERROR NEQ NOTFOUND THEN
          DMTERMINATE(RSLT);
          [NODES DATA SET ENTIRELY EMPTY.
          STORE FIRST NODE WITH NODENUM=1.]
        END ELSE
        BEGIN
          GET NODES(X : = NODENUM);
          [ALL NODES IN USE.  CREATE A NEW NODE WITH
          NODENUM=X+1.]
        END;
   END ELSE
   BEGIN
     [THIS NODE NOT IN USE. REUSE THIS NODE.
     CHANGE INUSEFLAG FROM 0 TO 1, BUT KEEP
     SAME VALUE FOR NODENUM.]
   END;
Note: To mark a node as not in use or in use, the INUSEFLAG value has to be changed. Since no duplicates are allowed on the two sets, this cannot be done with a lock operation, changing the INUSEFLAG value, and a store operation in series. It has to be done with a delete operation, a re-create operation, changing the INUSEFLAG value, and a store operation in series. Alternatively, duplicates could be allowed on the sets, but this would negate the automatic enforcement of no duplicates by the Enterprise Database Server.

The following procedure can be used to search for a file title. Whether the file title is found or not, it indicates the last node at which there is a common identifier. This facilitates storing the new title if it is not found.

EBCDIC ARRAY E[0: 13, 0: 16]; % E[n,*] holds the nth
                            % identifier of the title
                            % (n > 0).
PROCEDURE SEARCH(E, N, LASTCOMNODE, LASTLEVMATCHED);
     VALUE N;
     EBCDIC ARRAY E[0,0]; % holds identifiers
     REAL
        LASTCOMNODE, % node num of last common node
        LASTLEVMATCHED, % index of last identifier
        MATCHED
        N;           % number of identifiers
 BEGIN
   REAL I, NODENUM;
   BOOLEAN DIRFLAG, RSLT;
   DIRFLAG : = TRUE;
   I : = 1;
   DO BEGIN
      FIND ARCSET AT INUSEFLAG=1 AND
      PARENTNODE=NODENUM AND
      ID=E[I,0] : RSLT;
      IF NOT RSLT THEN
         GET NODES
         (NODENUM : = NODENUM
         ,DIRFLAG : = DIRFLAG
         );
      END UNTIL (I : =  *+1) > N OR RSLT OR NOT DIRFLAG;
      LASTLEVMATCHED : = I-1;
      LASTCOMNODE : = NODENUM;
      IF RSLT THEN
         IF REAL(RSLT).DMERROR NEQ NOTFOUND THEN
         DMTERMINATE(RSLT);
END;

One-To-N Relationships Between Two Data Sets

In an archive database it is necessary to store information about where each file is backed up. This can be done by using a data set called BACKUPINFO. Each record of BACKUPINFO contains information about one backup copy of a file. The unique node number of the last node of the file title can be used to relate the file to its backup information in a one-to-N fashion. In addition, it would be desirable to be able to obtain a list of all the files backed up on a particular serial number tape or pack. Finally, it should be easy to obtain the backup information for all files from a particular pack family or from disk (in case the entire family is lost, for example). The file structure can be represented by Figure D–6.

Figure 56. One-To-N Relationships Between Two Data Sets

One-To-N Relationships Between Two Data Sets

In addition to the data set nodes and its sets, the following data set description would be added to the archive DASDL description:

BACKUPINFO DATA SET
 (
  NODENUM        FIELD(24);
  SERIALNUM      ALPHA(6);
  DATETIME GROUP
   (
    YEAR           NUMBER(2);
    MONTH          NUMBER(2);
    DAY            NUMBER(2);
    HOUR           NUMBER(2);
    MINUTE         NUMBER(2);
    SECOND         NUMBER(4,2);
   );
  FILEKIND       ALPHA(17);
  CYCLE          NUMBER(4);
  VERSION        NUMBER(2);
  FAMILYID       ALPHA(17);
  MEDIATYPE      ALPHA(6);
  FAMILYINDEX    NUMBER(3);
  BASESERIAL     ALPHA(6);
  NEXTSERIAL     ALPHA(6);
  MEDIADATE GROUP
   (
    MEDIAYEAR      NUMBER(2);
    MEDIAMONTH     NUMBER(2);
    MEDIADAY       NUMBER(2);
   );
 );
FILEINFO SET OF BACKUPINFO KEY(NODENUM,
DATETIME DESCENDING);
SERIALSET SET OF BACKUPINFO KEY(SERIALNUM) DUPLICATES;
FAMILYSET SET OF BACKUPINFO KEY(FAMILYID) DUPLICATES;

DATETIME has been included in the FILEINFO set as a descending key item to facilitate retrieval of the backup information about a file in reverse time order.

Duplicates have been allowed for SERIALSET and FAMILYSET but not specified as first or last, because that is more efficient for STORE and DELETE operations if the number of duplicates that can occur is large. However, the use of duplicates costs one extra word for each key entry.