CraneSched

A distributed scheduling system for HPC and AI workloads — built for performance, scale, and simplicity.


Why CraneSched?

  • Performance


    Over 100k scheduling decisions per second with fast job–resource matching.

  • Scalability


    Proven design for million-core clusters and large-scale deployments.

  • Usability


    Clean, consistent CLI for users and admins (cbatch, cqueue, crun, calloc, cinfo…).

  • Security


    RBAC and encrypted communication out of the box.

  • Resilience


    Automatic job recovery, no single point of failure, fast state restoration.

  • Open Source


    Community-driven and extensible with a pluggable architecture.


Quick Start

  • Deploy Backend (Rocky Linux 9)


    Recommended for production.

  • Configure Cluster


    Database, partitions, nodes, and policies.

    DatabaseConfig

  • Deploy Frontend


    User tools and services (CLI, cfored, cplugind).

  • Run Your First Job


    Batch: cbatch • Interactive: crun, calloc


Architecture

[Figure: CraneSched architecture]

CraneSched introduces a Resource Manager to support both HPC and AI workloads:

  • HPC jobs: the Cgroup Manager allocates resources and provides cgroup-based isolation.
  • AI jobs: the Container Manager uses Kubernetes for resource allocation and container lifecycle management.
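The split described above can be pictured as a simple routing step. The following is a minimal illustrative sketch, not CraneSched's actual code: every class, method, and field name here (`ResourceManager`, `CgroupManager`, `ContainerManager`, `dispatch`, `launch`, `Job`, `JobKind`) is a hypothetical stand-in, and the backends only return a description of what a real component would do.

```python
from dataclasses import dataclass
from enum import Enum


class JobKind(Enum):
    HPC = "hpc"
    AI = "ai"


@dataclass
class Job:
    job_id: int
    kind: JobKind
    cpus: int


class CgroupManager:
    """Hypothetical stand-in for the component that isolates HPC jobs with cgroups."""

    def launch(self, job: Job) -> str:
        # A real implementation would create a cgroup and apply resource limits here.
        return f"job {job.job_id}: cgroup with {job.cpus} CPUs"


class ContainerManager:
    """Hypothetical stand-in for the component that hands AI jobs to Kubernetes."""

    def launch(self, job: Job) -> str:
        # A real implementation would create a container and track its lifecycle here.
        return f"job {job.job_id}: container with {job.cpus} CPUs"


class ResourceManager:
    """Routes each job to the backend that matches its workload type."""

    def __init__(self) -> None:
        self._backends = {
            JobKind.HPC: CgroupManager(),
            JobKind.AI: ContainerManager(),
        }

    def dispatch(self, job: Job) -> str:
        # Look up the backend by job kind and delegate the launch to it.
        return self._backends[job.kind].launch(job)


if __name__ == "__main__":
    rm = ResourceManager()
    print(rm.dispatch(Job(1, JobKind.HPC, cpus=4)))
    print(rm.dispatch(Job(2, JobKind.AI, cpus=8)))
```

Keeping the two backends behind one dispatch call is what lets a single scheduler front end serve both workload types; the sketch assumes a job carries an explicit kind, which may not match how CraneSched distinguishes jobs in practice.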


License

CraneSched is dual-licensed under AGPLv3 and a commercial license. See LICENSE or contact mayinping@pku.edu.cn for commercial licensing.