
CraneSched vs. Slurm Feature Comparison

CraneSched is China's first open-source, domestically developed compute scheduling system supporting both HPC and AI workloads. Benchmarked comprehensively against Slurm, the leading international scheduler, it surpasses Slurm in scheduling performance, container support, and domestic hardware compatibility.


Performance Comparison

CraneSched significantly outperforms Slurm and OpenPBS in scheduling throughput. Measured results:

Average Jobs Scheduled Per Minute

Scheduler Avg. Jobs/Min Relative to CraneSched
CraneSched 105,538 1x
OpenPBS 11,136 9.4x slower
Slurm 4,259 24.7x slower

Peak Jobs Scheduled Per Minute

Scheduler Peak Jobs/Min Relative to CraneSched
CraneSched 122,427 1x
OpenPBS 20,541 6x slower
Slurm 4,551 26.9x slower
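
For reference, the "Relative to CraneSched" columns are simply the ratio of CraneSched's rate to each competitor's rate. A quick check using the figures from the two tables above:

```python
# Ratios behind the "Relative to CraneSched" columns, using the published figures.
avg_jobs_per_min = {"CraneSched": 105_538, "OpenPBS": 11_136, "Slurm": 4_259}
peak_jobs_per_min = {"CraneSched": 122_427, "OpenPBS": 20_541, "Slurm": 4_551}

for label, table in (("average", avg_jobs_per_min), ("peak", peak_jobs_per_min)):
    base = table["CraneSched"]
    for scheduler in ("OpenPBS", "Slurm"):
        ratio = base / table[scheduler]
        print(f"{label}: {scheduler} is about {ratio:.1f}x slower")
# Prints roughly 9.5x and 24.8x (average), 6.0x and 26.9x (peak),
# consistent with the tables above up to rounding/truncation.
```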

Key Performance Metrics

Metric CraneSched
Scheduling throughput 5–20x faster than Slurm
Cluster scale Supports 100,000+ nodes
Job throughput Real-time scheduling of 10,000+ jobs/sec; hourly throughput exceeds 38 million jobs
Concurrency 2,000,000+ concurrent jobs
Response latency Millisecond-level

Scheduling Feature Comparison

Basic Scheduling
  • Backfill Scheduling: Run short jobs in idle time windows to improve utilization
  • Fair-Share Scheduling: Fair scheduling policy based on historical usage
  • Priority Scheduling: Multi-factor priority calculation (see the sketch after this list)
  • FIFO Scheduling: Basic first-in, first-out scheduling

Resource Management
  • Preemption: High-priority jobs preempt resources from lower-priority ones
  • Reservation: Reserve resource time windows for specific users or jobs
  • TRES Fine-Grained Tracking: Trackable resource types (CPU, memory, GPU, etc.)
  • QOS Management: Differentiated service level control
  • Resource Escape Protection: Prevent jobs from exceeding allocated resources

Job Management
  • Job Dependencies: Control dependency relationships between jobs
  • Job Arrays: Batch submission of parameterized jobs
  • Job Steps: Multi-step management within a job
  • Interactive Jobs: Real-time interactive computing

Energy Saving & Efficiency
  • Power Saving Scheduling: Automatically shut down idle nodes under low load
  • AI Job Runtime Prediction (ORA): LLM-based job runtime prediction; 41% accuracy improvement
  • Smart Fair-Share (TSMF): In-house two-stage multi-factor algorithm; utilization improved to 97.3%
  • Automated Power Saving (EcoSched): Automated power control; 78.64% energy reduction under low load

Account & Permissions
  • Hierarchical Account Management: Tree-structured user/account management
  • RBAC Access Control: Role-based access control

High Availability
  • Automatic Fault Recovery: Automatic recovery after control node failure
  • Distributed Fault Tolerance: No single point of failure
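
The "Multi-factor priority calculation" and fair-share items above follow the same general idea as Slurm's multifactor priority plugin: each pending job receives a weighted sum of normalized factors (queue age, historical fair-share usage, QOS, partition, and so on), and the highest-scoring job runs first. The sketch below is a generic illustration of that idea only; the factor names, weights, and formula are assumptions, not CraneSched's actual priority or TSMF algorithm.

```python
from dataclasses import dataclass

# Illustrative only: a generic weighted multi-factor priority in the spirit of the
# "Priority Scheduling" and "Fair-Share Scheduling" items above. Factor names and
# weights are assumptions, not CraneSched's formula.
@dataclass
class JobFactors:
    age: float         # normalized time spent in queue, 0..1
    fair_share: float  # 1 - (historical usage / allocated share), clamped to 0..1
    qos: float         # normalized QOS weight, 0..1
    partition: float   # normalized partition weight, 0..1

WEIGHTS = {"age": 1_000, "fair_share": 10_000, "qos": 5_000, "partition": 2_000}

def priority(f: JobFactors) -> float:
    """Weighted sum of normalized factors; higher priority runs first."""
    return (WEIGHTS["age"] * f.age
            + WEIGHTS["fair_share"] * f.fair_share
            + WEIGHTS["qos"] * f.qos
            + WEIGHTS["partition"] * f.partition)

jobs = {
    "job-a": JobFactors(age=0.9, fair_share=0.2, qos=0.5, partition=0.5),
    "job-b": JobFactors(age=0.1, fair_share=0.8, qos=0.5, partition=0.5),
}
for name, f in sorted(jobs.items(), key=lambda kv: priority(kv[1]), reverse=True):
    print(name, priority(f))
```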

Container Support Comparison

CraneSched and Slurm take different technical approaches to container support:

Technical approach
  • CraneSched: built on the CRI RPC interface underlying Kubernetes (the de facto cloud-native standard)
  • Slurm: OCI container compatibility via the CLI
Container runtime
  • CraneSched: containerd / CRI-O (via the CRI interface)
  • Slurm: Singularity / Enroot (via the CLI)
Image management
  • CraneSched: images are pulled automatically; no manual handling needed
  • Slurm: users must download and convert image formats themselves
Dedicated CLI
  • CraneSched: ccon command with a Docker CLI-inspired design, easy to learn
  • Slurm: native Slurm commands (sbatch/srun), different from Docker usage
Network isolation
  • CraneSched: CNI-based multi-tenant network isolation (Calico Underlay)
  • Slurm: minimal support
Filesystem isolation
  • CraneSched: full user/network/mount namespace isolation
  • Slurm: limited
Fake Root
  • CraneSched: user-namespace based; root experience inside the container
  • Slurm: relies on Singularity's fakeroot
RDMA support
  • CraneSched: SR-IOV shared RNIC and direct passthrough
  • Slurm: limited
Operations tools
  • CraneSched: mature ecosystem tools (crictl, nerdctl, ctr, etc.)
  • Slurm: relies on community tools

Unique Advantages of CraneSched Containers

  • Ease of use: No manual image pulling; dedicated CLI (ccon) designed for Docker users with no Slurm experience
  • Complete network isolation: CNI support allows admins to implement various container networking strategies including multi-tenant isolation
  • RDMA network support: Supports mid-to-large-scale RoCE networks (SR-IOV) and large-scale AI training clusters (Spine-Leaf architecture)
  • Pod/Job concept mapping: Maps K8s Pod/Container concepts to Job/Step, enabling imperative orchestration
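
As a rough illustration of that last point, the sketch below maps a Pod-like spec (with containers) onto a Job with Steps. All type and field names here are hypothetical and chosen for illustration; they are not CraneSched's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration: how a K8s-style Pod (with containers)
# could map onto a scheduler Job (with steps), per the bullet above.
@dataclass
class Container:
    name: str
    image: str
    command: list[str]

@dataclass
class PodSpec:
    name: str
    containers: list[Container] = field(default_factory=list)

@dataclass
class Step:
    name: str
    image: str
    command: list[str]

@dataclass
class Job:
    name: str
    steps: list[Step] = field(default_factory=list)

def pod_to_job(pod: PodSpec) -> Job:
    """Map Pod -> Job and each Container -> Step (illustrative, 1:1)."""
    return Job(name=pod.name,
               steps=[Step(c.name, c.image, c.command) for c in pod.containers])

pod = PodSpec("train-0", [Container("trainer", "pytorch/pytorch:latest",
                                    ["python", "train.py"])])
print(pod_to_job(pod))
```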

Command Compatibility

CraneSched provides an in-house Slurm & LSF Wrapper that is fully compatible with Slurm and LSF command-line syntax:

Slurm Command CraneSched Native Function
sbatch cbatch Submit batch jobs
squeue cqueue View job queue
srun crun Run interactive jobs
salloc calloc Allocate interactive resources
sinfo cinfo View cluster information
sacct cacct View job history
sacctmgr cacctmgr Account management
scancel ccancel Cancel jobs
scontrol ccontrol System control

Zero migration cost: Via the Slurm Wrapper, users can switch from Slurm to CraneSched transparently without modifying any scripts or workflows. Peking University's Weiming Teaching Cluster No.2 has successfully completed a transparent migration from Slurm to CraneSched, supporting hundreds of user software packages.
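
Conceptually, the wrapper is a thin translation layer: a Slurm-style invocation is looked up by command name and forwarded to the corresponding CraneSched binary. Below is a minimal sketch of that idea using only the name mapping from the table above; it is not the actual wrapper implementation, and it omits option and batch-script directive translation.

```python
import subprocess
import sys

# Slurm -> CraneSched command-name mapping, taken from the table above.
SLURM_TO_CRANE = {
    "sbatch": "cbatch", "squeue": "cqueue", "srun": "crun",
    "salloc": "calloc", "sinfo": "cinfo", "sacct": "cacct",
    "sacctmgr": "cacctmgr", "scancel": "ccancel", "scontrol": "ccontrol",
}

def forward(argv: list[str]) -> int:
    """Forward a Slurm-style invocation to the CraneSched equivalent."""
    if not argv:
        raise SystemExit("usage: wrapper.py <slurm-command> [args...]")
    slurm_cmd, args = argv[0], argv[1:]
    crane_cmd = SLURM_TO_CRANE.get(slurm_cmd)
    if crane_cmd is None:
        raise SystemExit(f"no CraneSched equivalent known for {slurm_cmd!r}")
    # NOTE: a real wrapper must also translate flags and #SBATCH-style directives;
    # this sketch only swaps the command name.
    return subprocess.call([crane_cmd, *args])

if __name__ == "__main__":
    sys.exit(forward(sys.argv[1:]))
```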


Heterogeneous Hardware Support

CraneSched fully supports mainstream domestic and international hardware platforms:

Architecture Support

  • x86
  • ARM
  • RISC-V

CPU Compatibility

Category Supported Brands
International Intel, AMD
Domestic Phytium, Hygon, Huawei Kunpeng

Accelerator Compatibility

Category Supported Brands
International Nvidia GPU, AMD GPU
Domestic Huawei Ascend, Hygon DCU, Cambricon MLU, Iluvatar CoreX, Kunlunxin, Metax, Moore Threads

Operating System Compatibility

Category Supported Systems
International CentOS, Ubuntu, Rocky Linux
Domestic OpenEuler, KylinOS

CraneSched has received product compatibility certifications from multiple vendors including Inspur, Phytium, Hygon, and Kunlunxin.


Summary

Dimension CraneSched Advantage
Performance 5–20x faster than Slurm
Features Full Slurm feature coverage plus AI prediction, intelligent power saving, and more
Containers Native CRI/CNI support, multi-tenant network isolation, RDMA support
Compatibility Fully compatible with Slurm/LSF commands, zero migration cost
Domestic hardware Full support for domestic CPUs, GPUs/NPUs, and operating systems
Convergence HPC + AI integration; full Storage·Compute·Usage convergence