Instance Types

Instance types define the CPU and memory resource limits for compute instances.

Name     CPU        Memory
nano     0.5 cores  256 MB
micro    1 core     512 MB
small    1 core     1 GB
medium   2 cores    2 GB
large    4 cores    4 GB
xlarge   8 cores    8 GB

List available types via CLI:

nimbus compute instance-types

The instance type is set with --type when creating a compute instance:

nimbus compute run --name web --swarm SWARM_ID --image nginx --type medium

In manifests, use the type field:

compute:
  web:
    swarm: prod
    image: nginx
    type: medium

GPU support

GPUs are requested independently from instance types using the --gpu flag. Any instance type can be combined with GPU requests:

nimbus compute run --name ml-train --swarm SWARM_ID \
  --image pytorch/pytorch --type large --gpu 2

In manifests, use the gpu field:

compute:
  ml-train:
    swarm: prod
    image: pytorch/pytorch
    type: large
    gpu: 2

GPU requests are separate from CPU/memory tiers because GPU workloads have widely varying resource profiles. A lightweight inference model might need small + 1 GPU, while distributed training might need xlarge + 4 GPUs.
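As an illustration, a manifest for that lightweight inference case might pair a small instance with a single GPU (the service name `infer` is hypothetical; the fields are the same ones shown above):

```yaml
compute:
  infer:
    swarm: prod
    image: pytorch/pytorch
    type: small
    gpu: 1
```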

What is a GPU slot?

When you request --gpu 1, you are requesting one GPU slot on a node. What a slot represents depends on the node's overcommit factor:

Overcommit   Physical GPUs  Slots  --gpu 1 means
1 (default)  1              1      Exclusive access to the entire physical GPU. No other container can use it.
2            1              2      Time-shared access: 2 containers share 1 physical GPU. Each sees full VRAM but shares compute time.
4            1              4      4 containers share 1 physical GPU.
8            1              8      8 containers share 1 physical GPU (aggressive sharing).

Example: A node with 2x RTX 3090 and overcommit=4 has 2 × 4 = 8 GPU slots. You can run 8 containers each requesting --gpu 1, with four containers time-sharing each physical GPU.
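The slot arithmetic can be sketched in a few lines of Python (a hypothetical helper for illustration, not part of the nimbus CLI):

```python
def gpu_slots(physical_gpus: int, overcommit: int = 1) -> int:
    """Total schedulable GPU slots on a node.

    Each physical GPU is divided into `overcommit` time-shared slots;
    overcommit=1 (the default) means each slot is an exclusive GPU.
    """
    if physical_gpus < 0 or overcommit < 1:
        raise ValueError("physical_gpus must be >= 0 and overcommit >= 1")
    return physical_gpus * overcommit

# A node with 2x RTX 3090 and overcommit=4:
print(gpu_slots(2, 4))  # 8 slots -> up to 8 containers requesting --gpu 1
```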

The overcommit factor is set per node:

nimbus node gpu-overcommit NODE_ID 4

Important: Time-slicing provides NO memory isolation. All containers sharing a GPU see the full VRAM but share it. If two containers each try to use 100% of the VRAM, one will OOM. Set the overcommit factor based on your workloads' known VRAM usage. For production training workloads, keep overcommit at 1 (default). For inference with small models, 2-4 is typical.
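Because time-slicing gives no VRAM isolation, one way to pick an overcommit factor is to work backwards from your containers' known VRAM usage. A hypothetical sketch (the 24 GB figure matches an RTX 3090; substitute your own numbers):

```python
import math

def max_safe_overcommit(gpu_vram_gb: float, per_container_vram_gb: float) -> int:
    """Largest overcommit factor whose combined VRAM fits in one GPU.

    Time-sliced containers share VRAM with no isolation, so the sum of
    per-container usage must stay within physical VRAM to avoid OOM.
    """
    if per_container_vram_gb <= 0:
        raise ValueError("per-container VRAM must be positive")
    return max(1, math.floor(gpu_vram_gb / per_container_vram_gb))

# 24 GB GPU, inference containers using ~6 GB each:
print(max_safe_overcommit(24, 6))  # 4
```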

Failed GPUs

DockNimbus tracks individual GPU health. If a GPU is marked as failed, it is excluded from the allocatable count and won't receive workloads. Use nimbus node describe NODE_ID to see per-device health status.

Only NVIDIA GPUs are supported. See the GPU acceleration guide for details on provisioning and availability.