Available Partitions

Slurm divides resources into partitions, sometimes called queues. Each partition targets specific hardware or workloads.

DSAI Partition Summary

Partition   | # Nodes | CPU cores / node | Memory / core (MB) | GPUs / node               | Time limit (hh:mm:ss) | Key features
------------+---------+------------------+--------------------+---------------------------+-----------------------+------------------------------------------------------
cpu         | 80      | 108              | 4 000              | N/A                       | 72:00:00              | Intel Xeon Platinum 8480+ (56-core) dual-socket nodes
interactive | 4       | 88               | 10 000             | 8 × NVIDIA A100 80 GB     | 72:00:00              | AMD EPYC 7443 (24-core) + A100 GPUs
l40s        | 8       | 124              | 6 000              | 8 × NVIDIA L40S 48 GB     | 72:00:00              | AMD EPYC 9534 (64-core) + high-mem L40S GPUs
a100        | 11      | 88               | 10 000             | 8 × NVIDIA A100 80 GB     | 72:00:00              | AMD EPYC 7443 (24-core) + A100 GPUs
h100        | 16      | 124              | 12 000             | 4 × NVIDIA H100 80 GB     | 72:00:00              | AMD EPYC 9534 (64-core) + H100 GPUs
nvl         | 16      | 124              | 12 000             | 4 × NVIDIA H100-NVL 96 GB | 72:00:00              | AMD EPYC 9534 (64-core) + H100-NVL GPUs

Partition Descriptions

cpu

  • No GPUs; ideal for CPU-only jobs.

interactive

  • For short interactive sessions such as hands-on debugging or exploratory runs (not for long production jobs).

  • Up to 1 node per job; MaxTime = 3 days (72:00:00).

  • Runs on A100 nodes (c012–c015), same chassis as the a100 partition.
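As a rough sketch of how a session on this partition might be requested, the helper below only prints an srun command line for review instead of running it. The function name interactive_cmd is hypothetical, and requesting GPUs with --gres=gpu:N is an assumption about how this cluster is configured; a QOS or account flag may also be required.

```shell
# Hypothetical helper: print an srun command for an interactive session.
#   $1 = number of GPUs to request (assumes GPUs are requested via --gres)
#   $2 = time limit as hh:mm:ss (the partition allows at most 72:00:00)
interactive_cmd() {
  printf 'srun --partition=interactive --nodes=1 --gres=gpu:%s --time=%s --pty bash\n' "$1" "$2"
}

# One GPU for one hour; --nodes=1 matches the one-node-per-job limit.
interactive_cmd 1 01:00:00
```

Running the printed command on a login node should open a shell on one of the interactive nodes once resources are free.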

l40s

  • 8 × L40S 48 GB per node.

a100

  • 8 × A100 80 GB per node.

h100

  • 4 × H100 80 GB per node.

  • Nodes are connected via Mellanox NDR InfiniBand, which may benefit multi-node parallel GPU jobs.

nvl

  • 4 × H100-NVL 96 GB per node.

  • Nodes are connected via Mellanox NDR InfiniBand, which may benefit multi-node parallel GPU jobs.

GPU core-billing ratios

Partition  | Billed CPU cores per GPU
-----------+-------------------------
l40s       | 14
a100       | 10
h100 / nvl | 30

Request only the GPUs you truly need: each extra GPU multiplies your billed core-hours and may increase queue time.
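To make the effect of the ratios concrete, the sketch below estimates billed core-hours as GPUs × ratio × wall-clock hours. This formula is an assumption for illustration: it ignores any CPU or memory requests beyond the per-GPU ratio, and the function names are hypothetical. The ratios themselves are copied from the table above.

```shell
# Billed CPU cores per GPU for each partition (values from the table above).
billed_cores_per_gpu() {
  case "$1" in
    l40s)     echo 14 ;;
    a100)     echo 10 ;;
    h100|nvl) echo 30 ;;
    *)        echo "unknown partition: $1" >&2; return 1 ;;
  esac
}

# Estimate billed core-hours: ratio x GPUs x hours.
#   $1 = partition, $2 = number of GPUs, $3 = wall-clock hours
billed_core_hours() {
  ratio=$(billed_cores_per_gpu "$1") || return 1
  echo $(( ratio * $2 * $3 ))
}

billed_core_hours a100 2 12   # 10 * 2 * 12 = 240
billed_core_hours h100 4 12   # 30 * 4 * 12 = 1440
```

Under this estimate, doubling the GPU count doubles the billed core-hours for the same run time.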

Viewing Partition Configuration

You can view details about any partition with the scontrol command. This is helpful to check limits, available nodes, default memory settings, and which QoS values are allowed or denied.

  • Use scontrol show partition without any arguments to see all partitions.

  • To find which QoS values are allowed or blocked in a partition, look at AllowQos= and DenyQos=.

Example:

scontrol show partition=h100

Sample Output:

PartitionName=h100
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=3 MaxTime=3-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=128 MaxCPUsPerSocket=UNLIMITED
   Nodes=h[01-16]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:4
   OverTimeLimit=NONE PreemptMode=REQUEUE
   State=UP TotalCPUs=2048 TotalNodes=16 SelectTypeParameters=NONE
   JobDefaults=DefCpuPerGPU=32
   DefMemPerCPU=12000 MaxMemPerCPU=12000
   TRES=cpu=1984,mem=24752000M,node=16,billing=24320,gres/gpu=64,gres/gpu:h100=64
   TRESBillingWeights=CPU=10,Mem=0.83G,GRES/gpu=380

Key Fields to Note

  • MaxTime: The maximum wall-clock time allowed for jobs in this partition.

  • DefMemPerCPU: The default memory per allocated core, in MB (can be overridden with --mem or --mem-per-cpu).

  • Nodes: The physical nodes available for this partition.

  • OverSubscribe: Indicates if jobs can share nodes.

  • DenyQos: QOS values that are explicitly blocked from this partition.

  • TRES: Trackable resources (CPUs, memory, nodes, GPUs) assigned to this partition.
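To pull a single field out of this kind of output, a small filter along these lines may help. The function name part_field is hypothetical; the sample text is an abridged copy of two lines from the output above, so the snippet runs without access to the cluster.

```shell
# Print the value of KEY from "scontrol show partition" style text on stdin.
# The output consists of whitespace-separated KEY=VALUE fields.
part_field() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++)
      if (index($i, key "=") == 1)          # field starts with KEY=
        print substr($i, length(key) + 2)   # everything after the first "="
  }'
}

# Abridged lines from the sample output above.
sample='   MaxNodes=3 MaxTime=3-00:00:00 MinNodes=0 LLN=NO
   DefMemPerCPU=12000 MaxMemPerCPU=12000'

printf '%s\n' "$sample" | part_field MaxTime        # prints 3-00:00:00
printf '%s\n' "$sample" | part_field DefMemPerCPU   # prints 12000
```

On the cluster, the same filter could be fed directly, for example with scontrol show partition h100 | part_field MaxTime.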

Helpful Tips

  • You can view the current load on each partition with:

    [root@dsailogin ~]$ sinfo -s
    PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
    l40s*        up 3-00:00:00          7/1/0/8 l[01-08]
    a100         up 3-00:00:00        14/0/1/15 c[001-015]
    nvl          up 3-00:00:00        14/2/0/16 n[01-16]
    h100         up 3-00:00:00        16/0/0/16 h[01-16]
    cpu          up 3-00:00:00       2/62/16/80 cpu[001-080]
    Secondary    up 3-00:00:00          3/1/0/4 c015,h16,l08,n16
    

    This summarizes each partition’s usage and availability. The NODES(A/I/O/T) column counts allocated, idle, other, and total nodes.

  • To see the list of available partitions and their state:

    sinfo -o "%P %.5D %.10t %.10l %.6c %.10m"
    

    This will output:

    • Partition name

    • Node count

    • State (idle/alloc/mix)

    • Max time

    • CPUs per node

    • Memory per node (MB)

Partition Best Practices

  • Use --partition= to explicitly request a partition in your batch script.

  • Avoid defaulting to GPU partitions unless your job needs GPUs; this helps ensure fair usage.

  • Read memory policies carefully (e.g., shared nodes have 4 GB/core).

  • Always pair GPU partitions with the appropriate QOS and allocation account.
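The practices above can be combined in a batch script along the following lines. This is a minimal sketch: my_account and my_qos are placeholders for your own allocation account and QOS, the file name h100_job.sh is arbitrary, and requesting GPUs with --gres=gpu:N is an assumption about this cluster's configuration.

```shell
# Write an example batch script to h100_job.sh (nothing is submitted here).
cat > h100_job.sh <<'EOF'
#!/bin/bash
# Explicit partition, paired with an account and QOS (placeholders).
#SBATCH --job-name=h100-test
#SBATCH --partition=h100
#SBATCH --account=my_account
#SBATCH --qos=my_qos
# One node and one GPU: request only what the job needs.
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
# Wall-clock limit, well under the 72:00:00 partition maximum.
#SBATCH --time=02:00:00

# Replace with your actual workload.
hostname
EOF
```

After replacing the placeholders, the script would be submitted with sbatch h100_job.sh.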