The advanced job management training focuses on array jobs, job dependencies, checkpointing and restart, file stage-in/out, and troubleshooting job submission issues. It will help users run their jobs more efficiently by making the best use of the hardware.
Speaker: Manjunath Doddam (Altair)
- Introduction
- Job management and project info in brief
- Job exit codes
- MPI jobs in batch mode
- “mpiprocs” parameter
- MPI tight integration
- Multithreaded jobs and OMP_NUM_THREADS in batch mode
- Details on memory enforcement
- Job Arrays
- Job dependencies
- PBS Reservations
- Using checkpointing
- File stage-in/out
- Using IME
- Troubleshooting
- Lab Session
- Using Compute Manager
- Using Display Manager
- A valid user account on the NSCC system, ASPIRE1
- An SSH client such as PuTTY or MobaXterm pre-installed on the user’s laptop to connect to ASPIRE1
- Basic understanding of Linux commands.
- File management
- “vi” editor
- Using “modules” in Linux
- Process management
- Basic PBS Pro job management
By the end of this course, participants will have a fair understanding of advanced job management topics such as array jobs, reservations, job dependencies, file stage-in/out, IME, Compute Manager and Display Manager.