Cluster Configuration¶
This guide explains how to configure CraneSched through the /etc/crane/config.yaml file to set up your cluster topology, partitions, and scheduling policies.
Info
The configuration file must be identical on all nodes (control and compute nodes). Any changes require restarting the affected services.
Quick Start Example¶
A minimal configuration for a 4-node cluster:
# Cluster identification
ControlMachine: crane01
ClusterName: my_cluster
# Database configuration
DbConfigPath: /etc/crane/database.yaml
# Node definitions
Nodes:
  - name: "crane[01-04]"
    cpu: 4
    memory: 8G

# Partition definitions
Partitions:
  - name: compute
    nodes: "crane[01-04]"
    priority: 5
DefaultPartition: compute
Prerequisite: Crane System User¶
Control node only: the crane system user is required when cranectld is started via systemd (package installations create it automatically):
sudo groupadd --system crane 2>/dev/null || true
sudo useradd --system --gid crane --shell /usr/sbin/nologin --create-home crane 2>/dev/null || true
Note
- Compute nodes run craned as root, so no crane user is needed there
- Running the binaries directly uses the current user, so no crane user is needed
- When using centralized user management, ensure the crane user has write access to the CraneBaseDir directory
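To confirm the setup, check that the account exists and that the base directory is writable by it (a sketch assuming the default CraneBaseDir of /var/crane/):

# Check that the crane account exists
id crane
# Create the base directory owned by the crane user if it does not exist yet
sudo install -d -o crane -g crane /var/crane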
Essential Configuration¶
Cluster Settings¶
Define basic cluster information:
# Hostname of the node running cranectld (control node)
ControlMachine: crane01
# Name of this cluster
ClusterName: my_cluster
# Path to database configuration file
DbConfigPath: /etc/crane/database.yaml
# Base directory for CraneSched data and logs
CraneBaseDir: /var/crane/
- ControlMachine: Must be the actual hostname of your control node
- ClusterName: Used for identification in multi-cluster environments
- CraneBaseDir: All relative paths are based on this directory
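A quick sanity check on the control node is to compare its hostname against the configured value:

# ControlMachine must match the control node's actual hostname
hostname
grep ControlMachine /etc/crane/config.yaml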
Node Definitions¶
Specify compute node resources:
Nodes:
  # Node range notation
  - name: "crane[01-04]"
    cpu: 4
    memory: 8G

  # Individual nodes
  - name: "crane05"
    cpu: 8
    memory: 16G

  # Nodes with GPUs
  - name: "crane[06-07]"
    cpu: 8
    memory: 32G
    gres:
      - name: gpu
        type: a100
        DeviceFileRegex: /dev/nvidia[0-3]
Node Parameters:
- name: Hostname or range (e.g., node[01-10])
- cpu: Number of CPU cores
- memory: Total memory (supports K, M, G, T suffixes)
- gres: Generic resources such as GPUs (optional)
Node Range Notation:
- crane[01-04] expands to: crane01, crane02, crane03, crane04
- cn[1-3,5] expands to: cn1, cn2, cn3, cn5
Partition Configuration¶
Organize nodes into partitions:
Partitions:
  # CPU partition
  - name: CPU
    nodes: "crane[01-04]"
    priority: 5

  # GPU partition
  - name: GPU
    nodes: "crane[05-08]"
    priority: 3
    DefaultMemPerCpu: 4096   # 4GB per CPU (in MB)
    MaxMemPerCpu: 8192       # 8GB maximum per CPU
# Default partition for job submission
DefaultPartition: CPU
Partition Parameters:
- name: Partition identifier
- nodes: Node range belonging to this partition
- priority: Higher values = higher priority (affects scheduling)
- DefaultMemPerCpu: Default memory per CPU in MB (0 = let the scheduler decide). With DefaultMemPerCpu: 4096, for example, a job that requests 4 CPUs and no explicit memory is allocated 4 × 4096 MB = 16 GB.
- MaxMemPerCpu: Maximum memory per CPU in MB (0 = no limit)
Scheduling Policy¶
Configure job scheduling behavior:
# Scheduling algorithm
# Options: priority/basic, priority/multifactor
PriorityType: priority/multifactor
# Favor smaller jobs in scheduling
PriorityFavorSmall: true
# Maximum age for priority calculation (days-hours format)
PriorityMaxAge: 14-0
# Priority factor weights
PriorityWeightAge: 500 # Job wait time weight
PriorityWeightFairShare: 10000 # Fair share weight
PriorityWeightJobSize: 0 # Job size weight (0=disabled)
PriorityWeightPartition: 1000 # Partition priority weight
PriorityWeightQoS: 1000000 # QoS priority weight
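As a rough sketch of how these weights combine (assuming each factor is normalized to the range 0-1, as in other multifactor schedulers; this is illustrative, not the exact CraneSched formula):

priority = PriorityWeightAge       * age_factor
         + PriorityWeightFairShare * fairshare_factor
         + PriorityWeightJobSize   * jobsize_factor
         + PriorityWeightPartition * partition_factor
         + PriorityWeightQoS       * qos_factor

With the values above, QoS differences dominate, followed by fair share, then partition priority and wait time; job size is ignored.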
Network Settings¶
Control Node (cranectld)¶
# Listening address and ports for cranectld
CraneCtldListenAddr: 0.0.0.0
CraneCtldListenPort: 10011
CraneCtldForInternalListenPort: 10013
Compute Nodes (craned)¶
# Listening address and port for craned
CranedListenAddr: 0.0.0.0
CranedListenPort: 10010
# Health check settings
Craned:
  PingInterval: 15       # Ping cranectld every 15 seconds
  CraneCtldTimeout: 5    # Timeout (in seconds) for the cranectld connection
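These ports must be reachable between the control and compute nodes. If a host firewall is active, open them; a sketch for firewalld (adapt to your firewall of choice):

# Control node: cranectld ports
sudo firewall-cmd --permanent --add-port=10011/tcp --add-port=10013/tcp
# Compute nodes: craned port
sudo firewall-cmd --permanent --add-port=10010/tcp
sudo firewall-cmd --reload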
Advanced Options¶
TLS Encryption¶
Enable encrypted communication between nodes:
TLS:
  Enabled: true
  InternalCertFilePath: /etc/crane/tls/internal.pem
  InternalKeyFilePath: /etc/crane/tls/internal.key
  ExternalCertFilePath: /etc/crane/tls/external.pem
  ExternalKeyFilePath: /etc/crane/tls/external.key
  CaFilePath: /etc/crane/tls/ca.pem
  AllowedNodes: "crane[01-10]"
  DomainSuffix: crane.local
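Certificates can come from your site CA; for testing, a self-signed CA and node certificate can be generated with OpenSSL. A minimal sketch (subject names and SAN requirements may differ for your deployment; the file names follow the paths above):

# Create a self-signed CA
openssl genrsa -out ca.key 4096
openssl req -x509 -new -key ca.key -days 3650 -subj "/CN=crane-ca" -out ca.pem
# Create a key and a CA-signed certificate for internal traffic
openssl genrsa -out internal.key 2048
openssl req -new -key internal.key -subj "/CN=*.crane.local" -out internal.csr
openssl x509 -req -in internal.csr -CA ca.pem -CAkey ca.key -CAcreateserial -days 365 -out internal.pem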
Gres Configuration¶
Define generic (device) resources such as GPUs, NPUs, and other accelerators:
Nodes:
  - name: "gpu[01-02]"
    cpu: 16
    memory: 64G
    gres:
      - name: gpu
        type: a100
        # Regex matching device files
        DeviceFileRegex: /dev/nvidia[0-3]
        # Additional device files per GPU
        DeviceFileList:
          - /dev/dri/renderer[0-3]
        # Environment injector for runtime
        EnvInjector: nvidia
Gres Parameters:
- name: The resource type, such as GPU, NPU, etc.
- type: The resource model, such as A100, 3090, etc.
- DeviceFileRegex: The device files under /dev that correspond to the resource. Use this when one physical device corresponds to one device file; each matching file becomes one Gres resource in the system. Supports regex. This covers common devices such as Nvidia, AMD, Hygon DCU, and Ascend.
- DeviceFileList: Use this when one physical device corresponds to multiple device files under /dev; each group of files becomes one Gres resource in the system. Supports regex.

Choose either DeviceFileRegex or DeviceFileList. The referenced device files must exist on the node; otherwise craned reports an error and exits during startup.

- EnvInjector: The environment-variable injection scheme the device needs at runtime. Supported values and the environment variables they set:
  - nvidia: CUDA_VISIBLE_DEVICES
  - hip: HIP_VISIBLE_DEVICES
  - ascend: ASCEND_RT_VISIBLE_DEVICES
Common vendor device file paths and related configurations
| Vendor | Device File Path | EnvInjector |
|---|---|---|
| Nvidia | /dev/nvidia0 ... | nvidia |
| AMD/Hygon DCU | /dev/dri/renderer128... | hip |
| Ascend | /dev/davinci0 ... | ascend |
| Iluvatar | /dev/iluvatar0 ... | nvidia |
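Because craned exits at startup when a configured device file is missing, confirm the files exist on each node before restarting, for example:

# Should list the devices matched by DeviceFileRegex / DeviceFileList above
ls -l /dev/nvidia[0-3]
ls -l /dev/dri/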
Queue Limits¶
Control job queue size and scheduling behavior:
# Maximum pending jobs in queue (max: 900000)
PendingQueueMaxSize: 900000
# Jobs to schedule per cycle (max: 200000)
ScheduledBatchSize: 100000
# Reject jobs when queue is full
RejectJobsBeyondCapacity: false
Partition Access Control¶
Restrict partition access by account:
Partitions:
  - name: restricted
    nodes: "special[01-04]"
    priority: 10
    # Only these accounts can use this partition
    AllowedAccounts: project1,project2

  - name: public
    nodes: "compute[01-20]"
    priority: 5
    # All accounts except these can use this partition
    DeniedAccounts: banned_account
Warning
AllowedAccounts and DeniedAccounts are mutually exclusive. If AllowedAccounts is set, DeniedAccounts is ignored.
Logging and Debugging¶
Configure log levels and locations:
# Log levels: trace, debug, info, warn, error
CraneCtldDebugLevel: info
CranedDebugLevel: info
# Log file paths (relative to CraneBaseDir)
CraneCtldLogFile: cranectld/cranectld.log
CranedLogFile: craned/craned.log
# Run in foreground (useful for debugging)
CraneCtldForeground: false
CranedForeground: false
CraneCtld:
  # Maximum log file size for cranectld
  MaxLogFileSize: 50M
  # Maximum number of log files kept for cranectld
  MaxLogFileNum: 3

Craned:
  # Maximum log file size for craned
  MaxLogFileSize: 50M
  # Maximum number of log files kept for craned
  MaxLogFileNum: 3

Supervisor:
  # Maximum log file size for supervisor
  MaxLogFileSize: 50M
  # Maximum number of log files kept for supervisor
  MaxLogFileNum: 3
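With the default CraneBaseDir of /var/crane/, the resolved log files can be followed directly, e.g.:

tail -f /var/crane/cranectld/cranectld.log   # control node
tail -f /var/crane/craned/craned.log         # compute node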
Supervisor Configuration¶
Supervisor is CraneSched's job execution management component, responsible for controlling job steps on compute nodes.
Supervisor:
  # Path to the supervisor executable
  Path: /usr/libexec/csupervisor
  # Supervisor log level: trace, debug, info, warn, error
  DebugLevel: trace
  # Log directory (relative to CraneBaseDir)
  LogDir: supervisor
Supervisor Parameters:
- Path: Full path to the supervisor executable. The default, /usr/libexec/csupervisor, is normally set correctly during installation.
- DebugLevel: Controls the verbosity of supervisor logs. Available values are trace (most verbose), debug, info, warn, and error (least verbose). For production environments, info or warn is recommended.
- LogDir: Directory for supervisor log files, relative to CraneBaseDir. These logs are helpful for diagnosing job execution issues.
Tip
When troubleshooting job execution problems, you can temporarily set DebugLevel to debug or trace for more detailed log information.
Container Support¶
CraneSched supports running jobs in containers through CRI (Container Runtime Interface):
Container:
  # Enable container support (experimental)
  Enabled: false
  # Temporary directory for container data (relative to CraneBaseDir)
  TempDir: supervisor/containers/
  # Path to the container runtime socket
  RuntimeEndpoint: /run/containerd/containerd.sock
  # Path to the image service socket (usually the same as RuntimeEndpoint)
  ImageEndpoint: /run/containerd/containerd.sock
Experimental Feature
Container support is currently experimental. There may be limitations and problems.
Requirements:
- CRI-compatible runtime (containerd or CRI-O) installed on compute nodes
- Runtime socket accessible with appropriate permissions
- Container images available or accessible from registries
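If crictl is installed on a compute node, a quick way to confirm the runtime socket is reachable (using the containerd socket path from the example above):

sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock info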
Applying Changes¶
After modifying the configuration:
1. Distribute the updated file to all nodes.
2. Restart the affected services.
3. Verify that the changes took effect.

Example commands for all three steps are sketched below.
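A minimal sketch, assuming the packaged systemd units are named cranectld and craned and that the CraneSched client tools (e.g. cinfo) are installed; the host names and copy method (scp, pdsh, Ansible, ...) are placeholders to adapt:

# 1. Copy the configuration to every node (crane02-04 are example hosts)
for host in crane02 crane03 crane04; do
    scp /etc/crane/config.yaml "$host":/etc/crane/config.yaml
done

# 2. Restart the services
sudo systemctl restart cranectld    # on the control node
sudo systemctl restart craned       # on each compute node

# 3. Verify that nodes and partitions look correct
cinfo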
Troubleshooting¶
Nodes not appearing: Check that the ControlMachine value matches the control node's actual hostname.
Configuration mismatch warnings: Ensure /etc/crane/config.yaml is identical on all nodes.
Jobs not scheduling: Verify partition configuration and node membership.
Resource limits: Check that requested resources don't exceed node definitions.