# Capacity Computing

## Introduction

In many cases, it is useful to submit a huge number of computational jobs into the Slurm queue system.
A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations,
achieving the best runtime, throughput, and computer utilization. This is called **Capacity Computing**.

However, executing a huge number of jobs via the Slurm queue may strain the system. This strain may
result in slow response to commands, inefficient scheduling, and overall degradation of performance
and user experience for all users.

We **recommend** using [**Job arrays**][1] or [**HyperQueue**][2] to execute many jobs.
    
There are two primary scenarios:
    
1. Number of jobs < 1500, **and** the jobs are able to utilize one or more **full** nodes:

    Use [**Job arrays**][1].

    A job array allows you to submit and control up to 1500 jobs (tasks) in one packet. Several job arrays may be submitted. A submission sketch is shown after this list.
    
2. Number of jobs >> 1500, **or** the jobs only utilize a **few cores/accelerators** each:

    Use [**HyperQueue**][2].

    HyperQueue can help efficiently load balance a very large number of jobs (tasks) among the available computing nodes.

    HyperQueue may also be used if you have dependencies among the jobs. A submission sketch is shown after this list.
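As a minimal sketch of scenario 1, the job script below uses Slurm's standard `--array` option and the `SLURM_ARRAY_TASK_ID` environment variable; the script name `myjob.sh`, the program `my_program`, and the input file naming are hypothetical placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=capacity-array
#SBATCH --nodes=1               # each task occupies one full node
#SBATCH --time=01:00:00
#SBATCH --array=1-1500          # up to 1500 tasks in one packet

# Each task selects its own input file based on the array index.
./my_program input.${SLURM_ARRAY_TASK_ID}
```

Submit the whole packet with `sbatch myjob.sh`; Slurm then schedules each task as an independent job.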
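For scenario 2, a minimal HyperQueue sketch might look as follows. The `hq server start`, `hq worker start`, and `hq submit` commands are part of the HyperQueue CLI; `task.sh`, the task count, and the per-task core count are hypothetical values:

```console
$ hq server start &                             # run the HyperQueue server, e.g. on a login node
$ sbatch --nodes=1 --wrap "hq worker start"     # start a worker inside a Slurm allocation
$ hq submit --array=1-10000 --cpus=2 ./task.sh  # 10000 small tasks, 2 cores each
```

Inside `task.sh`, the `HQ_TASK_ID` environment variable identifies the current task, analogous to `SLURM_ARRAY_TASK_ID` in job arrays.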
    
[1]: job-arrays.md
[2]: hyperqueue.md