Container Feature Deployment¶
This guide explains how to enable and configure the container feature in CraneSched clusters, allowing users to run containerized jobs through the CRI (Container Runtime Interface).
Environment Preparation¶
Container Runtime¶
CraneSched interacts with container runtimes via the CRI interface, supporting CRI-compatible runtimes like containerd and CRI-O.
Install Container Runtime¶
We recommend referring to the containerd official installation guide for the latest steps.
containerd can also be installed from some Linux distributions' package managers, but the packaged versions may be outdated. The following commands are for reference only:
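For example (package names and availability vary by distribution; treat these as a sketch, not authoritative steps):

```bash
# Debian / Ubuntu
sudo apt-get update
sudo apt-get install -y containerd

# Fedora / RHEL derivatives (the package may be named containerd or
# containerd.io, depending on the configured repositories)
sudo dnf install -y containerd
```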
We recommend referring to the CRI-O official installation guide for the latest steps.
CRI-O can also be installed from some Linux distributions' package managers, but the packaged versions may be outdated. The following commands are for reference only:
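For example (CRI-O packaging differs significantly across distributions; recent versions are usually obtained from the upstream repositories described in the official guide):

```bash
# Fedora
sudo dnf install -y cri-o

# openSUSE
sudo zypper install cri-o
```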
Enable CRI Support¶
Container runtimes expose their interface through UNIX sockets. Most runtimes enable the CRI service by default:
| Runtime | Default Socket Path |
|---|---|
| containerd | /run/containerd/containerd.sock |
| CRI-O | /run/crio/crio.sock |
Use the crictl tool to check the CRI interface status:
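For example, assuming containerd's default socket path:

```bash
crictl --runtime-endpoint unix:///run/containerd/containerd.sock \
       --image-endpoint unix:///run/containerd/containerd.sock \
       version
```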
Expected output (containerd example):
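Your version numbers will differ; an illustrative `crictl version` output looks like:

```
Version:  0.1.0
RuntimeName:  containerd
RuntimeVersion:  v1.7.x
RuntimeApiVersion:  v1
```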
If errors occur, enable CRI support according to the runtime configuration documentation.
Enable Remote Connections¶
CraneSched allows users to connect to their running container jobs from any node in the cluster. Therefore, configure the container runtime to allow remote connections from other nodes in the cluster.
Warning
Enabling remote connections may introduce security risks. Use firewalls and TLS certificates to restrict access.
Generate the default containerd configuration (if none exists):
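```bash
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
```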
Find the following section in /etc/containerd/config.toml:
[plugins.'io.containerd.grpc.v1.cri']
  disable_tcp_service = false        # Enable the TCP service
  stream_server_address = '0.0.0.0'  # Change to an address reachable within the cluster
  enable_tls_streaming = false       # Enable TLS if needed

  # Configure TLS certificate paths as needed
  [plugins.'io.containerd.grpc.v1.cri'.x509_key_pair_streaming]
    tls_cert_file = ''
    tls_key_file = ''
After configuration, restart the containerd service.
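On systemd-based systems, this is typically:

```bash
sudo systemctl restart containerd
```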
Refer to the containerd configuration guide for more details.
For CRI-O, this section is TBD.
Enable GPU Support¶
Note
Enable this only when GPU/NPU resources are configured on cluster nodes. This feature depends on the container runtime and vendor plugins.
Refer to the NVIDIA Container Toolkit installation guide to install the NVIDIA Container Toolkit.
After installation, follow the "Configuring containerd" or "Configuring CRI-O" sections in that document to enable GPU support.
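For containerd, the configuration step in the NVIDIA documentation typically boils down to the following (verify against the current guide):

```bash
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd
```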
Refer to the MindCluster - Containerized Support Feature Guide to install the Ascend NPU container runtime plugin.
After installation, follow the Using in Containerd Client tutorial to enable NPU support.
Refer to the AMD Container Toolkit installation guide to install the AMD Container Toolkit.
Note: CraneSched container support for AMD GPUs is still under testing and evaluation.
Container Network Plugins¶
CraneSched uses the Container Network Interface (CNI) to provide container network isolation and communication. Install a suitable CNI plugin based on the container runtime you choose.
CraneSched provides the Crane Meta CNI plugin to flexibly integrate existing CNI plugins (such as Calico). Currently, the CNI plugins that have passed compatibility testing include Calico.
Crane Meta CNI¶
In the CraneSched-FrontEnd root directory, run make tool to build the Meta CNI plugin. The build output is located at build/tool/meta-cni.
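For example:

```bash
cd CraneSched-FrontEnd
make tool
# Verify the build output
ls -l build/tool/meta-cni
```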
The Meta CNI plugin must be configured before use.
Warning
Crane Meta CNI does not perform network configuration itself. It serves as a bridge between CraneSched and the actual CNI plugin. Different clusters and different CNI plugins require different Meta CNI configurations.
The example configuration file is located at tool/meta-cni/config/00-meta.example.conf. Edit it to match your environment.
After editing, place the file in /etc/cni/net.d/ (the exact directory is determined by the container runtime's CNI configuration; make sure the paths match). Multiple configuration files may exist in this directory; the file whose name sorts first lexicographically takes precedence.
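A minimal sketch, assuming the runtime reads CNI configuration from /etc/cni/net.d/ and loads CNI plugin binaries from /opt/cni/bin (both paths depend on your runtime configuration):

```bash
# Install the Meta CNI binary where the runtime looks for CNI plugins
cp build/tool/meta-cni /opt/cni/bin/

# Install the configuration; edit it afterwards to match your environment
cp tool/meta-cni/config/00-meta.example.conf /etc/cni/net.d/00-meta.conf
```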
Example: Calico¶
Warning
This section is still being improved. More details will be added later.
Calico is a popular CNI plugin that supports network policy and high-performance networking. CraneSched has completed compatibility testing with Calico.
The following is a Meta CNI + Calico + Port Mapping + Bandwidth configuration. Write it to /etc/cni/net.d/00-crane-calico.conf. If there is no higher-priority configuration file, it will take effect on the next container startup.
Example configuration
{
  "cniVersion": "1.0.0",
  "name": "crane-meta",
  "type": "meta-cni",
  "logLevel": "debug",
  "timeoutSeconds": 10,
  "resultMode": "chained",
  "runtimeOverride": {
    "args": [
      "-K8S_POD_NAMESPACE",
      "-K8S_POD_NAME",
      "-K8S_POD_INFRA_CONTAINER_ID",
      "-K8S_POD_UID"
    ],
    "envs": []
  },
  "delegates": [
    {
      "name": "calico",
      "conf": {
        "type": "calico",
        "log_level": "info",
        "datastore_type": "etcdv3",
        "etcd_endpoints": "http://192.168.24.2:2379",
        "etcd_key_file": "",
        "etcd_cert_file": "",
        "etcd_ca_cert_file": "",
        "ipam": {
          "type": "calico-ipam"
        },
        "policy": {
          "type": "none"
        },
        "container_settings": {
          "allow_ip_forwarding": true
        },
        "capabilities": {
          "portMappings": true
        }
      }
    },
    {
      "name": "portmap",
      "conf": {
        "type": "portmap",
        "snat": true,
        "capabilities": {
          "portMappings": true
        }
      }
    },
    {
      "name": "bandwidth",
      "conf": {
        "type": "bandwidth",
        "capabilities": {
          "bandwidth": true
        }
      }
    }
  ]
}
Optional Dependencies¶
BindFs¶
Note
This is required only when running containers with user namespaces (--userns, "Fake Root") and the underlying filesystem of the mount directory does not support ID Mapped Mounts.
Warning
Since BindFs is based on FUSE, it may introduce performance overhead. Enable it only when necessary.
BindFs is a FUSE-based filesystem tool that maps real user IDs to container user IDs on filesystems that do not support ID Mapped Mounts. Install it with the following commands:
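Package availability varies by distribution; for example:

```bash
# Debian / Ubuntu
sudo apt-get install -y bindfs

# Fedora (on RHEL derivatives, bindfs is provided by EPEL)
sudo dnf install -y bindfs
```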
After installation, enable BindFs in the Container Configuration section below.
Deployment Steps¶
Modify Configuration File¶
Edit /etc/crane/config.yaml and add the following container configuration:
Note
For the complete container configuration options, see Container Configuration.
Container:
  Enabled: true
  TempDir: supervisor/containers/
  RuntimeEndpoint: /run/containerd/containerd.sock
  ImageEndpoint: /run/containerd/containerd.sock
After editing, save and distribute the configuration file to all nodes.
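How you distribute the file depends on your cluster tooling; a plain scp loop is one illustrative option (the node names here are hypothetical):

```bash
for node in cn01 cn02 cn03; do
  scp /etc/crane/config.yaml root@"${node}":/etc/crane/config.yaml
done
```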
Restart CraneSched Services¶
Restart CraneSched services on all nodes to apply the new configuration:
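Assuming the standard systemd units (cranectld on the control node, craned on compute nodes; adjust if your unit names differ):

```bash
# Control node
sudo systemctl restart cranectld

# Compute nodes
sudo systemctl restart craned
```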
Verify Container Feature¶
Run a test container on any node that can submit jobs:
# Submit a simple container job
ccon run -p CPU alpine:latest -- echo "Hello from container"
# Check job status
ccon ps -a
Container Configuration¶
Configure container-related options in /etc/crane/config.yaml. Below is a complete example and field description:
Container:
  # Enable container feature
  Enabled: true

  # Container temporary data directory (relative to CraneBaseDir)
  TempDir: supervisor/containers/

  # CRI runtime service socket
  RuntimeEndpoint: /run/containerd/containerd.sock

  # CRI image service socket (usually same as RuntimeEndpoint)
  ImageEndpoint: /run/containerd/containerd.sock

  # BindFs configuration (optional, for user namespace mapping)
  BindFs:
    Enabled: false
    BindfsBinary: /usr/bin/bindfs
    FusermountBinary: /usr/bin/fusermount3
    MountBaseDir: /mnt/crane
Core Configuration¶
| Field | Type | Default | Description |
|---|---|---|---|
| `Enabled` | bool | `false` | Whether to enable the container feature. Set to `true` to enable |
| `TempDir` | string | `supervisor/containers/` | Temporary data directory during container runtime, relative to CraneBaseDir. Stores container metadata, logs, etc. |
| `RuntimeEndpoint` | string | - | Required. Unix socket path for the CRI runtime service, used for container lifecycle management (create, start, stop, etc.) |
| `ImageEndpoint` | string | Same as `RuntimeEndpoint` | Unix socket path for the CRI image service, used for image pulling and management. Usually the same as `RuntimeEndpoint` |
BindFs Configuration¶
BindFs implements user ID mapping mounts from host directories to containers, resolving permission issues under user namespaces.
| Field | Type | Default | Description |
|---|---|---|---|
| `BindFs.Enabled` | bool | `false` | Whether to enable BindFs |
| `BindFs.BindfsBinary` | string | `bindfs` | Path to the `bindfs` executable |
| `BindFs.FusermountBinary` | string | `fusermount3` | Path to the `fusermount` executable (used to unmount FUSE filesystems) |
| `BindFs.MountBaseDir` | string | `/mnt/crane` | Base directory for BindFs mount points. Can be an absolute path or a path relative to CraneBaseDir |
Image Management¶
CraneSched does not directly manage image storage. The container runtime is responsible for pulling and storing images.
Container images can be obtained from:
- Public registries: Docker Hub, GHCR, Quay.io, etc.
- Private registries: enterprise internal registries
- Local images: pre-imported via `ctr` or `crictl` (see the example below)
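For example, to pre-pull an image through the CRI image service, or to import a saved image tarball into the containerd namespace used by the CRI plugin (the tarball name is illustrative):

```bash
# Pull via the CRI image service
crictl pull alpine:latest

# Import a saved image into containerd's k8s.io namespace
ctr -n k8s.io images import alpine.tar
```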
We recommend deploying a registry mirror (pull-through cache) or a private registry in the cluster to improve pull speed.
Troubleshooting¶
Refer to the Container Troubleshooting Guide for common issues and solutions.
Related Documentation¶
- Container Feature Overview: Understand the overall positioning and advantages of the container feature
- Core Concepts: Understand container jobs, Pods, resource models, and other concepts
- Quick Start: Quickly try container job submission
- Cluster Configuration: Complete configuration file documentation