Core Concepts¶
This page introduces the object model and lifecycle of CraneSched Container Support. After reading, you will understand the relationship between container jobs and container steps, the roles of Pod and container metadata, and resource allocation and inheritance mechanisms.
Basic Terminology¶
| Term | Description |
|---|---|
| Container Job | A job with TaskType=Container, creates a Pod at startup, hosts container and non-container steps |
| Container Step | A step executed within a container job, carries container metadata, corresponds to a container in the Pod |
| Pod Metadata | Job-level configuration defining Pod name, namespace options, port mappings, etc. |
| Container Metadata | Step-level configuration defining image, command, environment variables, mounts, etc. |
| CRI | Container Runtime Interface, CraneSched interacts with runtimes like containerd via CRI |
| CNI | Container Network Interface, CraneSched configures container networks via CNI plugins |
Container Job¶
A Container Job is the resource allocation unit for CraneSched Container Support. After submitting a container job, the scheduler completes resource allocation, and the node creates and maintains the Pod until the job ends.
Container jobs have the following characteristics:
- Resource Hosting: Job-level requests for CPU, memory, GPU, etc.; subsequent steps run within this allocation.
- Pod Lifecycle: Pod is created when the job starts and destroyed when it ends. All container steps run within the Pod.
- Mixed Steps: Container jobs can include both container steps and non-container steps (e.g., batch scripts, interactive commands).
When creating a container job, you can use ccon to submit a container step as the Primary Step, or use cbatch --pod to submit a batch script as the Primary Step, then use ccon within the script to append container steps.
Job Type Recognition
Container jobs display type as Container. Other job types do not allow calling ccon to submit container steps.
Container Step¶
A Container Step is the execution unit within a container job, corresponding to a container in the Pod. Each container step carries independent container metadata and can specify different images, commands, and mount configurations.
Container steps are similar to interactive steps submitted by crun. If you specify multiple nodes during submission, a corresponding container step instance will be created on each node.
Container steps follow the general step types:
| Type | Step ID | Description |
|---|---|---|
| Daemon Step | 0 | Daemon step, creates Pod and runs continuously |
| Primary Step | 1 | The first step produced by the job entry |
| Common Step | ≥2 | Appended steps, can be dynamically created during job execution |
Role of Pod
- Pod is created during Daemon Step at job startup and destroyed when the job ends.
- Pod provides unified network namespace and resource isolation environment for containers, without performing any actual computation tasks.
- Users neither need to nor can directly operate Pod.
Container-Related Metadata¶
CraneSched Container Support separates configuration into two layers:
flowchart LR
subgraph Job Level
PM[Pod Metadata]
end
subgraph Step Level
CM1[Container Metadata 1]
CM2[Container Metadata 2]
end
PM --> CM1
PM --> CM2
Pod Metadata¶
Pod Metadata is job-level configuration specified when submitting a container job, defining the Pod's overall runtime environment.
| Field | Description |
|---|---|
name |
Pod name, used to generate container hostname |
namespace |
Namespace options (network, PID, IPC, etc.) |
userns |
Whether to enable user namespace (Fake Root) |
run_as_user / run_as_group |
User/Group ID to run containers as |
ports |
Port mapping configuration |
Container Metadata¶
Container Metadata is step-level configuration specified when submitting a container step, defining the container's specific runtime behavior.
| Field | Description |
|---|---|
image |
Container image and pull policy |
command / args |
Container startup command and arguments |
workdir |
Working directory inside the container |
env |
Environment variables |
mounts |
Directory mount mappings |
tty / stdin |
Terminal and stdin configuration |
detached |
Whether to run in background |
Configuration Timing¶
| Entry Point | Pod Metadata | Container Metadata |
|---|---|---|
cbatch --pod |
Specified at job submission | Not needed for Primary Step; specified when appending steps |
ccon run (new job) |
Specified at job submission | Specified at job submission |
ccon run (append step) |
Inherited from job | Specified at step submission |
Resource Model¶
Container jobs follow a "job-level allocation, step-level inheritance" resource model, consistent with non-container jobs.
Job-Level Allocation¶
When submitting a container job, request resources using these parameters:
- Node count (
-N) - CPU (
-c/--cpus-per-task) - Memory (
--mem) - GPU and other devices (
--gres) - Time limit (
-t)
The scheduler allocates resources based on partition, account, QoS, and other policies.
Step-Level Inheritance¶
When appending container steps, resource handling follows these rules:
| Scenario | Behavior |
|---|---|
| Resources not specified | Inherit job-level request |
| Resource subset specified | Use specified values, must not exceed job allocation |
| Node list specified | Must be within the job's allocated node set |
Constraints¶
- Container step resource requests must not exceed job allocation (returns
ERR_STEP_RES_BEYOND) - Node selection must be within job allocation range (returns
ERR_NO_ENOUGH_NODE) - Container steps must maintain the same user identity as the job
Lifecycle¶
The container job lifecycle includes the following phases:
stateDiagram-v2
state "Failed" as FailedStartup
state "Failed" as FailedRuntime
[*] --> Pending: Submit
Pending --> Configuring: Scheduled
Configuring --> Starting: Pod startup
Configuring --> FailedStartup: Pod startup failed
Starting --> Running: Container startup
Starting --> FailedStartup: Container startup failed
Running --> Completing: Steps finished
Completing --> Completed: Pod cleanup
Running --> FailedRuntime: Execution failed
Running --> Cancelled: User cancelled
Running --> ETL: Time limit exceeded
Completed --> [*]
FailedStartup --> [*]
FailedRuntime --> [*]
Cancelled --> [*]
ETL --> [*]
Lifecycle Phase Description:
- Pending: Job enters queue awaiting scheduling.
- Configuring: Scheduling complete, node is creating Pod and performing necessary configuration (network, mounts, namespaces, etc.).
- Starting: Pod created, container runtime is pulling image and starting container; image pull may take some time.
- Running: Resource allocation complete, container started and executing, Primary Step begins running.
- Completing: All steps finished, awaiting Pod cleanup.
- Completed / Failed / Cancelled / ExceedTimeLimit: Job terminal states.
Runtime Interaction¶
Container steps support runtime interaction operations:
| Operation | Command | Description |
|---|---|---|
| Attach | ccon attach JOBID.STEPID |
Connect to container's stdin/stdout |
| Exec | ccon exec JOBID.STEPID COMMAND |
Execute command inside container |
| Logs | ccon logs JOBID.STEPID |
View container logs |
These operations are forwarded through CraneCtld to the Craned node running the container, which then interacts with the container runtime via CRI.
Mixed Steps¶
Container jobs allow mixing different types of steps within the same job:
flowchart TD
Job[Container Job] --> Pod[Pod]
Pod --> PS[Primary Step: Batch Script]
PS --> CS1[Container Step: Training]
PS --> CS2[Container Step: Inference]
PS --> NS[Non-container Step: Data Processing]
Use cases:
- Run core computation in containers while using host environment for pre/post-processing
- Complete containerized training and bare-metal debugging within the same resource allocation
- Use scripts to orchestrate the execution order of multiple container tasks
Architecture Overview¶
Container Support is implemented through coordination between the scheduling control plane and node execution plane:
flowchart LR
subgraph User Side
CLI[ccon / cbatch]
end
subgraph Control Plane
Ctld[CraneCtld]
end
subgraph Execution Plane
Craned[Craned]
CSuper[CSupervisor]
CRI[CRI Runtime]
end
CLI -->|gRPC| Ctld
Ctld -->|Task Dispatch| Craned
Craned -->|Dispatch & Coordinate| CSuper
CSuper -->|CRI Calls / cgroup Management| CRI
| Component | Responsibility |
|---|---|
| CraneCtld | Receives submission requests, schedules resources, validates permissions and parameters |
| Craned | Node-side agent, receives task dispatch and cooperates with CSupervisor, manages node-level resources and CSupervisor creation |
| CSupervisor | Monitoring and management component running on nodes, monitors lifecycle and cgroup of each Step, communicates with CRI to execute container operations |
| CRI Runtime | Container runtime (e.g., containerd), executes container operations upon CSupervisor's invocation |