CraneSched Container Support
CraneSched Container Support provides a containerized runtime environment for cluster users alongside traditional process-based jobs. Using a "Container Job + Container Step" model, it allows resource reuse within a single job, submission of container steps, and collaborative execution with traditional batch/interactive steps.
Positioning
CraneSched is not based on Kubernetes but is an independently implemented HPC + AI hybrid scheduling system. CraneSched Container Support focuses on "batch and compute job scheduling," distinct from Kubernetes's "service-oriented orchestration and autonomy."
Features
- Containerized Execution: Container images and commands are specified together, ensuring consistent and reproducible environments.
- Unified Scheduling: Leverages existing partition, account, QoS, and reservation scheduling policies.
- In-Job Reuse: After a container job starts its Pod, multiple container steps can be appended.
- Mixed Steps: Batch scripts, container steps, and interactive steps can coexist within the same job.
- Runtime Interaction: Debug and troubleshoot container steps using Attach/Exec.
- Security Isolation: UID Mapping and Idmapped Mount provide a secure Fake Root experience for regular users.
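To make the Fake Root idea concrete, the sketch below shows how a UID mapping translates an ID seen inside a container to the real host ID. It assumes the mapping follows the standard Linux user-namespace `uid_map` layout (inside start, outside start, length); the names `IdMapEntry` and `to_host_id` and the example values are hypothetical and are not taken from CraneSched.

```python
from typing import NamedTuple, Optional, Sequence


class IdMapEntry(NamedTuple):
    """One uid_map-style range: inside start, outside (host) start, length."""
    inside_start: int
    outside_start: int
    length: int


def to_host_id(container_id: int, id_map: Sequence[IdMapEntry]) -> Optional[int]:
    """Translate an ID seen inside the container to the host ID, or None if unmapped."""
    for entry in id_map:
        offset = container_id - entry.inside_start
        if 0 <= offset < entry.length:
            return entry.outside_start + offset
    return None


# Hypothetical mapping: container root (UID 0) is backed by regular host UID 1000.
example_map = [IdMapEntry(inside_start=0, outside_start=1000, length=1)]

print(to_host_id(0, example_map))  # container "root" is host UID 1000
print(to_host_id(1, example_map))  # unmapped IDs have no host identity
```

Under such a mapping, a process that appears to run as root inside the container holds only the privileges of an ordinary user on the host.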
Basic Concepts
- Container Job: Allocates resources, creates and maintains a Pod to host subsequent container steps. Container jobs can also include non-container steps (such as batch scripts or interactive steps).
- Container Step: The actual execution unit appended within a container job, corresponding to at least one container in the Pod. Each container step can specify its own image, command, environment variables, and mount configuration.
- Pod Metadata: Job-level container configuration (e.g., DNS, ports) created when submitting a container job, defining the Pod's overall properties.
- Container Metadata: Step-level container configuration (e.g., command, environment variables, mounts) specified when submitting a container step, defining the container's specific behavior.
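The relationship between these four concepts can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, and example images are hypothetical and do not reflect CraneSched's internal structures or its command-line options.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PodMetadata:
    """Job-level configuration, fixed when the container job is submitted."""
    dns_servers: List[str] = field(default_factory=list)  # hypothetical field
    ports: List[int] = field(default_factory=list)        # hypothetical field


@dataclass
class ContainerMetadata:
    """Step-level configuration, supplied with each container step."""
    image: str
    command: List[str]
    env: Dict[str, str] = field(default_factory=dict)
    mounts: Dict[str, str] = field(default_factory=dict)  # host path -> container path


@dataclass
class Step:
    name: str
    container: Optional[ContainerMetadata] = None  # None for batch/interactive steps

    @property
    def is_container_step(self) -> bool:
        return self.container is not None


@dataclass
class ContainerJob:
    """Owns the Pod settings and hosts every step appended to the job."""
    pod: PodMetadata
    steps: List[Step] = field(default_factory=list)

    def append_step(self, step: Step) -> Step:
        self.steps.append(step)
        return step


# One job, one Pod configuration, three steps with independent settings.
job = ContainerJob(pod=PodMetadata(dns_servers=["10.0.0.2"], ports=[8080]))
job.append_step(Step(name="prepare.sh"))  # non-container step
job.append_step(Step("train", ContainerMetadata(
    image="example/train:latest", command=["python", "train.py"], env={"EPOCHS": "3"})))
job.append_step(Step("report", ContainerMetadata(
    image="example/report:latest", command=["python", "report.py"])))

for step in job.steps:
    kind = "container" if step.is_container_step else "non-container"
    print(f"{step.name}: {kind}")
```

The point of the split is that Pod Metadata is set once per job, while each container step carries its own Container Metadata and can therefore use a different image or command inside the same allocation.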
Entry Points
Container Support is accessed through the ccon and cbatch commands. For detailed usage, refer to the ccon Command Manual and the cbatch Command Manual.
- cbatch: Creates a container job using a batch script as the entry point. When the job starts, a Pod is automatically launched on the allocated nodes. The script runs as the Primary Step and can append container steps for complex container orchestration, mixed with non-container steps as needed, providing a Slurm-like batch script experience.
- ccon: Creates a container job with a container step as the Primary Step. This suits simple jobs that contain only container steps and provides a Docker/Kubernetes-like command-line experience.
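The order of events for the two entry points can be sketched as a simple timeline. This is a conceptual illustration only: `job_timeline` is a hypothetical helper, the step names are made up, and nothing here calls or models the actual ccon or cbatch command lines.

```python
from typing import List


def job_timeline(primary_step: str, appended_container_steps: List[str]) -> List[str]:
    """Return the assumed order of events for a container job."""
    events = [
        "allocate resources",
        "launch Pod on allocated nodes",
        f"run Primary Step: {primary_step}",
    ]
    # Container steps are appended only after the Pod exists.
    events += [f"append container step: {name}" for name in appended_container_steps]
    return events


# Batch-script entry point: the script is the Primary Step and appends container steps.
batch_timeline = job_timeline("batch script", ["train", "evaluate"])

# Container-step entry point: the container step itself is the Primary Step.
container_timeline = job_timeline("container step", [])

for event in batch_timeline:
    print(event)
```

In both cases the Pod comes up before any step runs; the entry points differ only in what kind of step takes the Primary Step role.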