Understanding Job Restart Failure

Any of the following circumstances can prevent a WFL job from restarting after a halt/load:

  • The system switches to a different job description file after the halt/load. The operator can cause this switch by using system commands. For further information, refer to the discussion of the job description file in the System Operations Guide.

  • The operator physically transfers the pack containing the job description file to an incompatible type of system and attempts to make it the new job description file for that system.

    • If the operator uses the DL JOBS ON <family> command to mark the pack as the location of the next job description file, then after the next halt/load, the system attempts to restart the jobs from the specified job description file. The jobs should restart successfully, provided that the pack was transferred to a compatible type of system.

    • If the operator transfers a job description file between incompatible systems, then the restart of each job fails with the error INCOMPATIBLE SYSTEM TYPE. Additionally, transferring the job description file between incompatible systems can cause the system to halt/load again.

  • The operating system option AUTORECOVERY is reset. The operator can reset this option using the OP (Options) system command. Resetting AUTORECOVERY causes the mix limit for each job queue to be set to zero after a halt/load. Any job that would have restarted will instead remain in a job queue until the operator uses the MQ (Make or Modify Queue) system command to assign a new mix limit to the job queue.

    Resetting AUTORECOVERY also prevents automatic halt/loads in some situations. For details, refer to the System Commands Reference.

  • The operator changes the job queue definitions after the job is initiated, but before the halt/load. For example, the job attribute list of a job might set CLASS = 10 and MAXPROCTIME = 60, and the definition of job queue 10 might include a PROCESSTIME limit of 120. The job is originally submitted through job queue 10. While the job is executing, the operator might use the MQ (Make or Modify Queue) system command to lower the PROCESSTIME limit for that job queue to 30. Then a halt/load might occur. After the halt/load, the job cannot restart because its MAXPROCTIME value (60) is greater than the PROCESSTIME limit (30) that is now defined for job queue 10. The job terminates abnormally with a queue violation.

  • A task of the job executed a checkpoint and then was terminated by the halt/load. In this case, the job is suspended after the halt/load and appears in the W (Waiting Entries) system command display. For information about operator responses to this situation, refer to Restarting a Checkpointed Task later in this section.
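
The outcomes described in the preceding list can be summarized as a small decision model. The following Python sketch is illustrative only and is not the logic the operating system actually uses. The names Job, SystemState, Outcome, and restart_outcome are hypothetical, the order in which the conditions are tested is an assumption, and the case of a switch to a different job description file is not modeled. The example values (CLASS = 10, MAXPROCTIME = 60, PROCESSTIME lowered from 120 to 30) are taken from the job queue example in the list.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    RESTARTED = "job restarts"
    INCOMPATIBLE_SYSTEM_TYPE = "restart fails: INCOMPATIBLE SYSTEM TYPE"
    HELD_IN_QUEUE = "job remains in its job queue (mix limit is zero)"
    QUEUE_VIOLATION = "job terminates abnormally with a queue violation"
    SUSPENDED = "job is suspended and appears in the W display"


@dataclass
class Job:
    job_class: int                   # CLASS job attribute
    maxproctime: int                 # MAXPROCTIME job attribute
    checkpointed_task: bool = False  # a task took a checkpoint before the halt/load


@dataclass
class SystemState:
    compatible_system: bool   # job description file came from a compatible system
    autorecovery_set: bool    # AUTORECOVERY option is set (not reset)
    processtime_limits: dict  # job queue number -> PROCESSTIME limit


def restart_outcome(job: Job, system: SystemState) -> Outcome:
    """Return the modeled outcome for one job after a halt/load.

    The order of the tests below is an assumption made for illustration.
    """
    if not system.compatible_system:
        return Outcome.INCOMPATIBLE_SYSTEM_TYPE
    if not system.autorecovery_set:
        # Mix limits are zero until the operator assigns new ones with MQ.
        return Outcome.HELD_IN_QUEUE
    limit = system.processtime_limits.get(job.job_class)
    if limit is not None and job.maxproctime > limit:
        return Outcome.QUEUE_VIOLATION
    if job.checkpointed_task:
        return Outcome.SUSPENDED
    return Outcome.RESTARTED


if __name__ == "__main__":
    job = Job(job_class=10, maxproctime=60)
    original = SystemState(True, True, {10: 120})
    lowered = SystemState(True, True, {10: 30})
    print(restart_outcome(job, original).name)
    print(restart_outcome(job, lowered).name)
```

With the original queue definition the model reports that the job restarts. After the PROCESSTIME limit is lowered to 30, the same job is reported as a queue violation, because 60 is greater than 30.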