ccon - Container Job Management
ccon is the container job management tool for CraneSched, used to create, manage, and monitor containerized jobs. ccon's design is inspired by Docker CLI, enabling users to operate containers in a familiar way.
Prerequisites
Before using ccon, ensure the cluster has container support enabled. See Container Deployment.
Quick Start
Run a simple container job:
# Run alpine container in CPU partition
ccon -p CPU run alpine:latest -- echo "Hello from container"
Command Overview
| Command | Description |
|---|---|
| run | Create and run a new container |
| ps | List containers (steps) |
| pods | List container Pods (jobs) |
| stop | Stop a running container |
| wait | Wait for all container steps in the current job to complete |
| logs | View container logs |
| attach | Connect to a running container |
| exec | Execute a command inside a running container |
| inspect | Show container step details |
| inspectp | Show Pod details |
| login | Log in to a container image registry |
| logout | Log out from a container image registry |
Global Options
- -C, --config=<path>: Configuration file path. Default: /etc/crane/config.yaml.
- --debug-level=<level>: Set debug output level. Available levels: trace, debug, info. Default: info.
- --json: Output results in JSON format.
- -h, --help: Display help information.
- -v, --version: Display version information.
run Command
Create and run a new container job.
Crane Options Placement
Crane options (like -p, -N, --mem) must be placed between ccon and run, not after run.
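
Placement matters because several short flags are reused: -p is the partition before run but a port mapping after it, -t is the time limit before run but a pseudo-TTY after it, and -w is the node list before run but the working directory after it. The sketch below assembles one command from both groups and prints it for review. The partition name, image, and option values are illustrative assumptions.

```shell
# Crane (scheduling) options: placed between `ccon` and `run`.
crane_opts="-p CPU -t 0:30:00 -c 2"
# Run (container) options: placed after `run`, before the image.
run_opts="-p 8080:80 -w /srv -e MODE=demo"
image="alpine:latest"

cmd="ccon $crane_opts run $run_opts $image -- env"
# Print the assembled command; run it with `eval "$cmd"` once it looks right.
echo "$cmd"
```

Here the first -p selects the CPU partition, while the second -p publishes host port 8080 to container port 80.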
Crane Options (Resource Scheduling)
These options control job resource allocation and scheduling behavior:
- -N, --nodes=<num>: Number of nodes required. Default: 1.
- -c, --cpus-per-task=<ncpus>: Number of CPU cores per task. Default: 1.
- --ntasks-per-node=<ntasks>: Number of tasks to invoke on each node. Default: 1.
- --mem=<size>: Maximum real memory. Supports units: GB (G, g), MB (M, m), KB (K, k), Bytes (B). Default unit: MB.
- --gres=<list>: Generic resources per task. Format: gpu:a100:1 or gpu:1.
- -p, --partition=<partition>: Requested partition.
- -A, --account=<account>: Account for job submission.
- -q, --qos=<qos>: Quality of Service (QoS) for the job.
- -t, --time=<time>: Time limit, format: [day-]hours:minutes:seconds.
- -w, --nodelist=<nodes>: Nodes to allocate to job (comma-separated list).
- -x, --exclude=<nodes>: Exclude specific nodes from allocation (comma-separated list).
- -r, --reservation=<name>: Use reserved resources.
- --exclusive: Request exclusive node resources.
- -H, --hold: Submit job in held state.
- --extra-attr=<json>: Extra job attributes (JSON format).
- --mail-type=<type>: Mail notification type. Supported: NONE, BEGIN, END, FAIL, TIMELIMIT, ALL.
- --mail-user=<email>: Email address for notifications.
- --comment=<string>: Job comment.
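
As a worked example of the [day-]hours:minutes:seconds time format, the helper below converts a time limit to seconds. time_to_seconds is a hypothetical helper, not part of ccon, and it assumes all three colon-separated fields are present.

```shell
# Convert [day-]hours:minutes:seconds to a number of seconds.
# Splitting on "-" and ":" yields 4 fields with a day part, 3 without.
time_to_seconds() {
    echo "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        { exit 1 }'   # any other shape is rejected
}

time_to_seconds 1-12:30:00   # prints 131400 (1 day, 12 hours, 30 minutes)
time_to_seconds 0:30:00      # prints 1800
```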
Run Options (Container Configuration)
These options configure container runtime parameters:
- --name=<name>: Specify container name.
- -e, --env=<KEY=VALUE>: Set environment variable. Can be used multiple times.
- -v, --volume=<host:container>: Bind mount volume. Format: host_path:container_path. Can be used multiple times.
- -p, --ports=<host:container>: Publish container port to host. Format: host_port:container_port. Can be used multiple times.
- -d, --detach: Run container in background and output container ID.
- -i, --interactive: Keep stdin available to container process.
- -t, --tty: Allocate pseudo-TTY for container.
- --entrypoint=<cmd>: Override image default entrypoint.
- -u, --user=<uid[:gid]>: Run container with specified UID. When --userns=false, only allows current user and accessible groups.
- --userns: Enable user namespace. Default: true (container user mapped to root).
- --network=<mode>: Container network mode. Supports host (use host network) and default (default Pod network).
- -w, --workdir=<dir>: Working directory inside container.
- --pull-policy=<policy>: Image pull policy. Supported: Always, IfNotPresent, Never.
Examples
Basic container job:
Interactive container:
GPU container with resource limits:
Mount data directories:
ccon -p CPU run -v /data/input:/input -v /data/output:/output alpine:latest -- cp /input/file /output/
Multi-node container job:
Background execution:
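
The scenarios listed above without a command can be sketched using only the options documented in this section. The partition names (CPU, GPU), images, container commands, and option values are assumptions, and the first line is a dry-run fallback that prints each command when ccon is not installed.

```shell
# Dry-run fallback: print the command when ccon is not installed.
command -v ccon >/dev/null 2>&1 || ccon() { echo "ccon $*"; }

# Basic container job
ccon -p CPU run alpine:latest -- echo "Hello from container"

# Interactive container (stdin kept available, pseudo-TTY allocated)
ccon -p CPU run -i -t alpine:latest -- /bin/sh

# GPU container with resource limits
ccon -p GPU --gres=gpu:1 -c 4 --mem=16G run pytorch/pytorch:latest -- python train.py

# Multi-node container job (2 nodes, 1 task per node)
ccon -p CPU -N 2 --ntasks-per-node=1 run alpine:latest -- hostname

# Background execution (outputs the container ID)
ccon -p CPU run -d alpine:latest -- sleep 300
```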
ps Command
List container steps.
- -a, --all: Show all containers (default shows only running).
- -q, --quiet: Only display container IDs.
Examples
# List running containers
ccon ps
# List all containers (including finished)
ccon ps -a
# Only output container IDs
ccon ps -q
pods Command
List container Pods (container jobs).
- -a, --all: Show all Pods (default shows only running).
- -q, --quiet: Only display Pod IDs.
Examples
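
A minimal sketch of the two documented flags, following the same pattern as ps. The first line only makes the snippet runnable on a machine without ccon by printing the commands instead.

```shell
# Dry-run fallback: print the command when ccon is not installed.
command -v ccon >/dev/null 2>&1 || ccon() { echo "ccon $*"; }

# List running Pods
ccon pods
# List all Pods, not only running ones
ccon pods -a
# Only output Pod IDs
ccon pods -q
```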
stop Command
Stop a running container.
- -t, --timeout=<seconds>: Seconds to wait for the container to stop before it is forcibly terminated. Default: 10.
Examples
# Stop container (wait 10 seconds)
ccon stop 123.1
# Stop container immediately
ccon stop -t 0 123.1
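
Since ps -q outputs only container IDs, it can feed stop. The helper below is a hypothetical sketch, not part of ccon, and it assumes ccon ps -q prints one JOBID.STEPID per line.

```shell
# Stop every running container, giving each up to $1 seconds (default 10).
stop_all_containers() {
    timeout_s=${1:-10}
    ccon ps -q | while read -r id; do
        if [ -n "$id" ]; then
            # </dev/null keeps `ccon stop` from consuming the ID list on stdin.
            ccon stop -t "$timeout_s" "$id" </dev/null
        fi
    done
}

# Usage: stop_all_containers 30
```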
wait Command
Wait for all container steps in the current job to complete. Usually used in cbatch scripts.
- -t, --interval=<seconds>: Polling interval in seconds. Minimum: 10.
Examples
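
A minimal sketch, meant to run inside a job script after containers have been started with ccon run -d. The interval value is an assumption, and the first line is a dry-run fallback for machines without ccon.

```shell
# Dry-run fallback: print the command when ccon is not installed.
command -v ccon >/dev/null 2>&1 || ccon() { echo "ccon $*"; }

# Wait for all container steps using the default polling interval
ccon wait
# Poll every 30 seconds (the minimum interval is 10)
ccon wait -t 30
```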
logs Command
View container logs.
- -f, --follow: Follow log output continuously.
- --tail=<lines>: Number of lines to show from end of logs.
- -t, --timestamps: Show timestamps.
- --since=<time>: Show logs since specified time. Format: 2025-01-15T10:30:00Z or relative like 42m.
- --until=<time>: Show logs before specified time. Same format as above.
- -n, --target-node=<node>: Get logs from specified node.
Examples
# View container logs
ccon logs 123.1
# Follow log output
ccon logs -f 123.1
# View last 100 lines
ccon logs --tail 100 123.1
attach Command
Connect to running container's stdin, stdout, and stderr.
- --stdin: Connect stdin. Default: true.
- --stdout: Connect stdout. Default: true.
- --stderr: Connect stderr. Default: false.
- --tty: Allocate pseudo-TTY. Default: true.
- --transport=<protocol>: Transport protocol. Supported: spdy, ws. Default: spdy.
- -n, --target-node=<node>: Connect to container on specified node.
Examples
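
A minimal sketch, assuming job 123 has a running container step 1. The node name cn01 is hypothetical, and the first line is a dry-run fallback for machines without ccon.

```shell
# Dry-run fallback: print the command when ccon is not installed.
command -v ccon >/dev/null 2>&1 || ccon() { echo "ccon $*"; }

# Attach to container step 1 of job 123 with the default streams
ccon attach 123.1
# Attach to the container on a specific node
ccon attach -n cn01 123.1
```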
exec Command
Execute command inside running container.
- -i, --interactive: Keep stdin open.
- -t, --tty: Allocate pseudo-TTY.
- --transport=<protocol>: Transport protocol. Supported: spdy, ws. Default: spdy.
- -n, --target-node=<node>: Execute command on specified node.
Examples
# Execute command in container
ccon exec 123.1 ls -la
# Interactive shell
ccon exec -it 123.1 /bin/bash
inspect Command
Show container step details.
Examples
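
A minimal sketch, assuming the container step is addressed by its JOBID.STEPID as in the other commands. The ID 123.1 is illustrative, and the first line is a dry-run fallback for machines without ccon.

```shell
# Dry-run fallback: print the command when ccon is not installed.
command -v ccon >/dev/null 2>&1 || ccon() { echo "ccon $*"; }

# Show details of container step 1 of job 123
ccon inspect 123.1
```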
inspectp Command
Show Pod details.
Examples
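
A minimal sketch, assuming the Pod is addressed by its bare JOBID. The ID 123 is illustrative, and the first line is a dry-run fallback for machines without ccon.

```shell
# Dry-run fallback: print the command when ccon is not installed.
command -v ccon >/dev/null 2>&1 || ccon() { echo "ccon $*"; }

# Show details of the Pod belonging to job 123
ccon inspectp 123
```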
login Command
Login to container image registry.
- -u, --username=<user>: Username.
- -p, --password=<pass>: Password.
- --password-stdin: Read password from stdin.
Examples
# Interactive login
ccon login registry.example.com
# Login with username and password
ccon login -u myuser -p mypass registry.example.com
# Read password from stdin
echo "$TOKEN" | ccon login -u myuser --password-stdin registry.example.com
logout Command
Logout from container image registry.
Examples
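
A minimal sketch, assuming logout takes the registry address as its argument in the same way login does. The first line is a dry-run fallback for machines without ccon.

```shell
# Dry-run fallback: print the command when ccon is not installed.
command -v ccon >/dev/null 2>&1 || ccon() { echo "ccon $*"; }

# Log out from the registry used in the login examples
ccon logout registry.example.com
```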
Container ID Format
ccon uses JOBID.STEPID format to identify containers:
- JOBID: Job number assigned by scheduling system
- STEPID: Step number, increments from 0
Examples:
- 123.0: First container step of job 123
- 123.1: Second container step of job 123
Pods (container jobs) are identified by the JOBID alone, e.g., 123.
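
Scripts that handle these IDs often need the two parts separately. The helper below is a hypothetical sketch, not part of ccon, that splits an ID with POSIX parameter expansion and treats an ID without a dot as a Pod.

```shell
# Split JOBID.STEPID into its parts; a bare JOBID (a Pod) has no step.
split_container_id() {
    id=$1
    case $id in
        *.*) job_id=${id%%.*}; step_id=${id#*.} ;;
        *)   job_id=$id; step_id= ;;
    esac
    echo "job=$job_id step=$step_id"
}

split_container_id 123.1   # prints: job=123 step=1
split_container_id 123     # prints: job=123 step=
```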
Using with cbatch
Create Pod Job
Use cbatch --pod to create a container-capable job, then use ccon run in the script to start containers:
#!/bin/bash
#CBATCH --pod
#CBATCH -N 2
#CBATCH -c 4
#CBATCH --mem 8G
#CBATCH -p GPU
# Start container within job
ccon run -d pytorch/pytorch:latest -- python train.py
# Wait for containers to complete
ccon wait
Submit job:
cbatch Pod Options
| Option | Description |
|---|---|
| --pod | Enable container mode, create Pod job |
| --pod-name | Pod name (defaults to job name) |
| --pod-port | Pod port mapping, format: HOST:CONTAINER or PORT |
| --pod-user | Run Pod as specified UID[:GID] |
| --pod-userns | Enable Pod user namespace (default: true) |
| --pod-host-network | Use host network namespace |
See Also
- Container Support Overview - Introduction to container features
- Core Concepts - Explanation of Pod, container steps, and other concepts
- Quick Start - Quick experience with container features
- Examples - Typical usage scenarios
- Troubleshooting - Common issues and solutions
- cbatch - Batch job submission
- cqueue - View job queue