Modeling and API Simulation
Topograph models are YAML files used to simulate discovered topology without querying a real cloud API, NetQ instance, InfiniBand fabric, or Kubernetes cluster. They are primarily used by tests and local development, but they are also useful when validating a scheduler integration against known topology shapes.
A model describes the same canonical topology that real providers eventually produce:
- A switch tree, used for Slurm topology/tree output and Kubernetes leaf/spine/core labels
- Node membership in accelerated domains, used for block topology and accelerator labels
- Optional per-node attributes used by provider simulations
Model loading lives in pkg/models. Model fixtures live under tests/models/.
Where Models Are Used
Models are consumed in two different simulation flows.
Test Provider
The test provider simulates the Topograph API lifecycle itself. It can return successful topology output, delayed completion, malformed-request failures, provider failures, or a request that remains pending.
Use it when testing clients that call:
- POST /v1/generate
- GET /v1/topology?uid=<request-id>
For the complete API status-code simulation behavior, see Test Mode and Test Provider.
Provider Simulations
Several providers also have simulation variants, such as:
- aws-sim
- gcp-sim
- oci-sim
- nebius-sim
- nscale-sim
- lambdai-sim
- dsx-sim
These providers load a model file and then simulate that provider’s API responses. This is useful when you want to exercise the normal provider translation logic without real provider credentials or infrastructure.
Simulation providers share these common parameters:
| Parameter | Required | Description |
|---|---|---|
| modelFileName | Yes | Model file to load. A basename such as medium.yaml is loaded from tests/models/; absolute and relative paths are also supported. |
| api_error | No | Provider-specific test hook used by unit tests to simulate API failures. |
| trimTiers | No | Number of topology tiers to trim where supported by the simulated provider. |
Example request:
{
  "provider": {
    "name": "aws-sim",
    "params": {
      "modelFileName": "medium.yaml"
    }
  },
  "engine": {
    "name": "slurm",
    "params": {
      "plugin": "topology/block"
    }
  }
}
Model File Shape
A model has three top-level sections:
switches:
  ...
nodes:
  ...
capacity_blocks:
  ...
All three sections are maps. nodes and capacity_blocks are flexible: you can specify node membership in either section, and Topograph completes the missing side during model loading.
Switches
The switches map describes the network hierarchy. Each key is the switch ID. Each value may contain:
| Field | Description |
|---|---|
| metadata | Key-value metadata inherited by descendant nodes. Common keys are region, availability_zone, and group. |
| switches | Child switch IDs. |
| nodes | Compute node names attached to this switch. Compact node ranges are supported. |
Example:
switches:
  core:
    metadata:
      region: us-west
    switches: [spine]
  spine:
    metadata:
      availability_zone: zone1
    switches: [leaf1, leaf2]
  leaf1:
    metadata:
      group: cb1
    nodes: ["n[1-2]"]
  leaf2:
    metadata:
      group: cb2
    nodes: [n3]
Switch rules:
- A switch can have at most one parent switch.
- A node can be attached to at most one switch.
- If a switch references a node, that node must either exist in nodes or be generated from capacity_blocks.
- Switch nodes entries are expanded through the same compact range syntax used elsewhere.
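The switch rules above can be checked mechanically. The following is a minimal Python sketch, not the loader in pkg/models: the function name check_switch_rules and its known_nodes argument are hypothetical, and it assumes the model has already been parsed into dictionaries with compact ranges expanded.

```python
def check_switch_rules(switches: dict, known_nodes: set) -> list:
    """Return a list of rule violations; an empty list means the tree is consistent.

    `known_nodes` is assumed to hold every node name defined in `nodes`
    or generated from `capacity_blocks`.
    """
    errors = []
    parent_of = {}    # child switch ID -> first parent seen
    attached_to = {}  # node name -> first switch seen
    for switch_id, spec in switches.items():
        spec = spec or {}
        for child in spec.get("switches", []):
            # Rule: a switch can have at most one parent switch.
            if child in parent_of and parent_of[child] != switch_id:
                errors.append(
                    f"switch {child!r} has two parents: "
                    f"{parent_of[child]!r} and {switch_id!r}"
                )
            parent_of.setdefault(child, switch_id)
        for node in spec.get("nodes", []):
            # Rule: a node can be attached to at most one switch.
            if node in attached_to and attached_to[node] != switch_id:
                errors.append(
                    f"node {node!r} is attached to both "
                    f"{attached_to[node]!r} and {switch_id!r}"
                )
            attached_to.setdefault(node, switch_id)
            # Rule: every referenced node must be defined somewhere.
            if node not in known_nodes:
                errors.append(
                    f"switch {switch_id!r} references undefined node {node!r}"
                )
    return errors
```

Collecting violations in a list rather than raising on the first one is a design choice for this sketch: it lets a fixture author see every problem in one pass.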
Nodes
The nodes map describes compute nodes directly. Each key is the node name. The value may contain:
| Field | Description |
|---|---|
| id | Optional. If set, it must match the map key. Usually omitted. |
| type | Optional instance type metadata used by instance-oriented exports. |
| capacity_block | Optional accelerated domain ID. If set and capacity_blocks is omitted, Topograph creates the corresponding capacity block entry. |
| attributes.nvlink | Optional accelerated-domain / NVLink identifier. Used by block topology simulation paths. |
Example:
nodes:
  n1:
    capacity_block: cb1
    attributes:
      nvlink: nvl1
  n2:
    attributes:
      nvlink: nvl1
Node rules:
- capacity_block is optional.
- Nodes without capacity_block are still valid compute nodes.
- If capacity_block is set and capacity_blocks is omitted, Topograph creates the capacity block and adds the node to it.
- If a node is listed under capacity_blocks.<id>.nodes, Topograph fills in the node's missing capacity_block.
- If both sides specify different capacity block IDs for the same node, model loading fails.
Capacity Blocks
The capacity_blocks map describes accelerated domains. Each key is the capacity block ID. The value may contain:
| Field | Description |
|---|---|
| nodes | Optional list of node names in this capacity block. Compact ranges are supported. |
| attributes.nvlink | Optional NVLink / accelerator domain identifier applied to nodes generated from this capacity block, and to listed top-level nodes when provided. |
Example:
capacity_blocks:
  cb1:
    nodes: ["n[1-2]"]
    attributes:
      nvlink: nvl1
  cb2: {}
Capacity block rules:
- The entire capacity_blocks section may be omitted.
- Individual capacity block entries may omit nodes.
- Capacity block entries with no corresponding nodes are allowed and preserved.
- If top-level nodes is omitted, capacity_blocks.<id>.nodes creates node entries automatically.
- If top-level nodes is present, capacity_blocks.<id>.nodes must reference nodes in the top-level nodes map.
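The node and capacity block rules describe a two-way completion step. The Python sketch below illustrates one way that step could work under stated assumptions; it is not the code in pkg/models. The function name reconcile is hypothetical, node lists are assumed to be already expanded, and two behaviors are guesses: an nvlink value is copied only where the other side has none, and a node's capacity_block creates a missing block even when capacity_blocks is present.

```python
import copy
from typing import Optional


def reconcile(nodes: Optional[dict], capacity_blocks: Optional[dict]) -> tuple:
    """Complete both maps from each other and return (nodes, capacity_blocks)."""
    nodes_given = nodes is not None
    # Work on copies; a missing or empty entry becomes an empty dict.
    nodes = {n: copy.deepcopy(s) or {} for n, s in (nodes or {}).items()}
    blocks = {b: copy.deepcopy(s) or {} for b, s in (capacity_blocks or {}).items()}

    # Capacity block side -> node side.
    for block_id, block in blocks.items():
        for name in block.get("nodes", []):
            if name not in nodes:
                if nodes_given:
                    raise ValueError(f"{block_id} lists unknown node {name}")
                nodes[name] = {}  # top-level nodes omitted: create the entry
            current = nodes[name].setdefault("capacity_block", block_id)
            if current != block_id:
                raise ValueError(f"{name} is in both {current} and {block_id}")
            nvlink = block.get("attributes", {}).get("nvlink")
            if nvlink is not None:
                # Assumption: never overwrite an nvlink set on the node itself.
                nodes[name].setdefault("attributes", {}).setdefault("nvlink", nvlink)

    # Node side -> capacity block side.
    for name, node in nodes.items():
        block_id = node.get("capacity_block")
        if block_id is None:
            continue  # nodes without a capacity block remain valid
        block = blocks.setdefault(block_id, {})
        members = block.setdefault("nodes", [])
        if name not in members:
            members.append(name)
        nvlink = node.get("attributes", {}).get("nvlink")
        if nvlink is not None:
            block.setdefault("attributes", {}).setdefault("nvlink", nvlink)
    return nodes, blocks
```

Calling reconcile(None, {...}) models a file that omits nodes, and reconcile({...}, None) models a file that omits capacity_blocks.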
Compact Ranges
Model node lists support compact ranges:
nodes: ["n[1-4]", "gpu[001-004]", node9]
These expand to:
n1, n2, n3, n4, gpu001, gpu002, gpu003, gpu004, node9
Ranges are accepted in:
- switches.<switch>.nodes
- capacity_blocks.<id>.nodes
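To make the expansion concrete, here is a minimal Python sketch of compact range expansion. It is an illustration only, not Topograph's parser: the helper names expand_entry and expand_nodes are hypothetical, and the sketch assumes each entry holds at most one bracketed start-end range, which is the only form shown in this document.

```python
import re

# One bracketed numeric range such as "n[1-4]" or "gpu[001-004]".
_RANGE = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>[^\[\]]*)$"
)


def expand_entry(entry: str) -> list:
    """Expand a single entry; plain names are returned unchanged."""
    match = _RANGE.match(entry)
    if match is None:
        return [entry]
    start, end = match["start"], match["end"]
    if int(start) > int(end):
        raise ValueError(f"descending range in {entry!r}")
    # Keep zero padding when the lower bound is written with leading zeros.
    width = len(start) if start.startswith("0") else 0
    return [
        f"{match['prefix']}{str(number).zfill(width)}{match['suffix']}"
        for number in range(int(start), int(end) + 1)
    ]


def expand_nodes(entries: list) -> list:
    """Expand every entry in order and flatten the result."""
    return [name for entry in entries for name in expand_entry(entry)]


print(expand_nodes(["n[1-4]", "gpu[001-004]", "node9"]))
```

With the list from the example above, the sketch yields the same nine names in the same order.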
Derived Data
After YAML parsing, Topograph completes the model before simulation uses it:
- Switch node ranges are expanded.
- Capacity block node ranges are expanded.
- Node names are copied from their map keys.
- Switch names are copied from their map keys.
- Missing nodes can be created from capacity_blocks.<id>.nodes.
- Missing capacity block entries can be created from node capacity_block values.
- Node NetLayers is derived from the switch path from leaf to root.
- Node Metadata is built by merging switch metadata along the same path.
- Instances is derived from node names and grouped by metadata.region; nodes without a region use none.
These derived fields are not written in YAML.
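The path-based derivations can be sketched in a few lines of Python. This is an illustration under assumptions rather than the loader's code: the function names derive_node and instances_by_region are hypothetical, the switch map is assumed to be a valid tree with expanded node lists, and the merge order (keys set closer to the node win over keys set nearer the root) is a guess, since the document does not state the precedence.

```python
def derive_node(switches: dict, node: str) -> tuple:
    """Return (net_layers, metadata) for one node.

    net_layers lists switch IDs from the node's leaf switch up to the root.
    """
    parent_of = {
        child: switch_id
        for switch_id, spec in switches.items()
        for child in spec.get("switches", [])
    }
    leaf = next(
        (sid for sid, spec in switches.items() if node in spec.get("nodes", [])),
        None,
    )
    if leaf is None:
        return [], {}  # node is not attached to any switch
    path = [leaf]
    while path[-1] in parent_of:  # walk upward until the root is reached
        path.append(parent_of[path[-1]])
    metadata = {}
    # Assumption: apply root first so that values nearer the node win.
    for switch_id in reversed(path):
        metadata.update(switches[switch_id].get("metadata", {}))
    return path, metadata


def instances_by_region(switches: dict, node_names: list) -> dict:
    """Group node names by metadata.region, using "none" when it is missing."""
    grouped = {}
    for name in node_names:
        _, metadata = derive_node(switches, name)
        grouped.setdefault(metadata.get("region", "none"), []).append(name)
    return grouped
```

For the three-tier example in the Switches section, derive_node for n1 would give the layers leaf1, spine, core and a metadata map containing region, availability_zone, and group.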
Complete Examples
Nodes From Capacity Blocks
This compact model omits the nodes section. Nodes are created from capacity block membership.
switches:
  core:
    switches: [leaf]
  leaf:
    nodes: ["n[1-2]", n3]
capacity_blocks:
  cb1:
    nodes: ["n[1-2]"]
    attributes:
      nvlink: nvl1
  cb2:
    nodes: [n3]
    attributes:
      nvlink: nvl2
After loading:
- n1 and n2 belong to cb1 and have attributes.nvlink: nvl1
- n3 belongs to cb2 and has attributes.nvlink: nvl2
- All three nodes have network layers [leaf, core]
Capacity Blocks From Nodes
This model omits capacity_blocks. Topograph creates cb1 from n1.capacity_block.
nodes:
  n1:
    capacity_block: cb1
    attributes:
      nvlink: nvl1
  n2:
    attributes:
      nvlink: nvl2
After loading:
- cb1.nodes contains n1
- cb1.attributes.nvlink is populated from n1.attributes.nvlink
- n2 remains a valid node without capacity block membership
Orphan Capacity Block
This is valid. It declares a capacity block that currently has no nodes.
nodes:
  n1:
    capacity_block: cb1
capacity_blocks:
  cb1: {}
  cb2: {}
After loading:
- cb1.nodes contains n1
- cb2 remains present with no nodes
Simulating the API
To simulate the Topograph API lifecycle, configure the test provider:
http:
  port: 49021
  ssl: false
provider: test
engine: slurm
requestAggregationDelay: 2s
Then submit a request that names a model:
{
  "provider": {
    "name": "test",
    "params": {
      "generateResponseCode": 202,
      "topologyResponseCode": 200,
      "modelFileName": "small-tree.yaml"
    }
  },
  "engine": {
    "name": "slurm"
  }
}
Expected flow:
- POST /v1/generate returns 202 Accepted and a request ID.
- GET /v1/topology?uid=<request-id> returns 202 Accepted while the request is queued or processing.
- When processing completes, /v1/topology returns 200 OK with the selected engine output.
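A client exercising this flow might look like the Python sketch below. It uses only the standard library and rests on several assumptions: the server is reachable on localhost at the port from the config above, the body of the POST response is the bare request ID, and the names http_call and generate_and_poll are hypothetical. Check the actual response format before relying on it.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:49021"  # assumption: local server, port from the config


def http_call(method, url, body=None):
    """Send one request and return (status_code, response_text)."""
    data = None if body is None else json.dumps(body).encode()
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; report those as ordinary status codes.
        return err.code, err.read().decode()


def generate_and_poll(payload, call=http_call, interval=1.0, max_polls=30):
    """Submit a topology request, then poll until it completes or fails."""
    status, body = call("POST", f"{BASE_URL}/v1/generate", payload)
    if status != 202:
        raise RuntimeError(f"generate failed: {status} {body}")
    request_id = body.strip()  # assumption: the body is the request ID
    url = f"{BASE_URL}/v1/topology?uid={urllib.parse.quote(request_id)}"
    for _ in range(max_polls):
        status, body = call("GET", url)
        if status == 200:
            return body  # engine output
        if status != 202:
            raise RuntimeError(f"topology failed: {status} {body}")
        time.sleep(interval)  # still queued or processing
    raise TimeoutError(f"request {request_id} still pending")
```

Passing the transport in through the call parameter keeps the polling logic testable without a running server, which is useful when the client itself is the thing under test.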
To simulate API failures, set generateResponseCode, topologyResponseCode, and errorMessage in provider.params. For example:
{
  "provider": {
    "name": "test",
    "params": {
      "generateResponseCode": 202,
      "topologyResponseCode": 500,
      "errorMessage": "simulated provider failure"
    }
  },
  "engine": {
    "name": "slurm"
  }
}
Choosing the Right Simulation Path
Use the test provider when you want to validate API-client behavior:
- Request IDs
- Polling
- Pending responses
- Error status codes
- Retry behavior
Use a *-sim provider when you want to validate provider-specific topology translation:
- AWS, GCP, OCI, Nebius, Nscale, Lambda AI, or DSX topology paths
- Pagination behavior in simulated provider APIs
- Engine output generated from provider-shaped data
- Tree and block topology output from the same model
Validation Checklist
Before using a new model in a regression test:
- Confirm every switch child has only one parent.
- Confirm every node referenced by a switch is defined in nodes or generated from capacity_blocks.
- Confirm no node appears under two switches.
- Confirm capacity block membership does not conflict with node capacity_block.
- Run the relevant provider simulation test or API flow with the target engine.