CraneSched¶
A distributed scheduling system for HPC and AI workloads — built for performance, scale, and simplicity.
Why CraneSched?¶
- Performance: over 100k scheduling decisions per second with fast job–resource matching.
- Scalability: a design proven for million-core clusters and large-scale deployments.
- Usability: a clean, consistent CLI for users and admins (cbatch, cqueue, crun, calloc, cinfo…).
- Security: RBAC and encrypted communication out of the box.
- Resilience: automatic job recovery, no single point of failure, and fast state restoration.
- Open Source: community-driven and extensible through a pluggable architecture.
Quick Start¶
- Deploy Backend (Rocky Linux 9): recommended for production.
- Configure Cluster: database, partitions, nodes, and policies.
- Deploy Frontend: user tools and services (CLI, cfored, cplugind).
- Run Your First Job: see the example below.
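After deployment, a quick way to verify the cluster is to submit a small batch job with cbatch and watch it with cqueue. The script below is a minimal sketch: the #CBATCH directives follow the Slurm-style conventions the CLI is modeled on, so confirm the exact flag names with `cbatch --help` on your installation.

```bash
#!/bin/bash
#CBATCH --job-name=hello     # illustrative directives; confirm flag names
#CBATCH --nodes=1            # with `cbatch --help` on your cluster
#CBATCH --time=00:05:00
#CBATCH --output=hello.out

echo "Hello from $(hostname)"
```

Submit and track it:

```bash
cbatch hello.sh    # submit the script
cqueue             # watch it run
cat hello.out      # view the output once it finishes
```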
Architecture¶

CraneSched introduces a Resource Manager to support both HPC and AI workloads:
- HPC jobs: the Cgroup Manager allocates resources and provides cgroup-based isolation.
- AI jobs: the Container Manager uses Kubernetes for resource allocation and container lifecycle management.
CLI Reference¶
- User commands: cbatch, cqueue, crun, calloc, cinfo
- Admin commands: cacct, cacctmgr, ceff, ccontrol, ccancel
- Exit codes: see the exit code reference.
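A typical session strings these commands together as sketched below; the job ID is a placeholder and the ccontrol subcommand is illustrative, so check each command's `--help` for the exact syntax.

```bash
cinfo                 # inspect partition and node states
cqueue                # list pending and running jobs
ccancel 123           # cancel a job by ID (123 is a placeholder)
ccontrol show node    # admin: inspect node details (subcommand is illustrative)
```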
Links¶
- Demo cluster: https://hpc.pku.edu.cn/demo/cranesched
- Backend: https://github.com/PKUHPC/CraneSched
- Frontend: https://github.com/PKUHPC/CraneSched-FrontEnd
License¶
CraneSched is dual-licensed under AGPLv3 and a commercial license. See LICENSE or contact mayinping@pku.edu.cn for commercial licensing.