Proxmox Operator
Work in progress!
The Proxmox Operator manages Proxmox Virtual Environment (PVE) resources through Kubernetes Custom Resource Definitions. It provides declarative infrastructure management by reconciling Kubernetes resources with Proxmox clusters, nodes, and virtual machines.
Architecture
Controller Structure
The operator implements the Kubernetes controller pattern with the following components:
- Machine Controller: Reconciles
vitistack.io/v1alpha1/Machineresources - ProxmoxCluster Controller: Manages Proxmox cluster connections
- ProxmoxNode Controller: Handles individual node management
- Initialize Service: Validates Proxmox connectivity and authentication
Repository Structure
├── cmd/ # Main entry point
├── internal/
│ ├── controller/v1alpha1/ # Custom resource controllers
│ └── services/initializeservice/ # Proxmox initialization logic
├── config/
│ ├── crd/ # Custom Resource Definitions
│ ├── rbac/ # Role-based access control
│ ├── manager/ # Operator deployment
│ ├── prometheus/ # Monitoring configuration
│ ├── network-policy/ # Network security policies
│ └── samples/ # Example resources
├── charts/proxmox-operator/ # Helm deployment charts
└── test/ # Test suites
API Reference
Machine Resource
The primary resource managed by the Proxmox Operator:
apiVersion: vitistack.io/v1alpha1
kind: Machine
metadata:
name: string # Machine identifier
namespace: string # Kubernetes namespace
labels:
cluster.vitistack.io/cluster-name: string # Associated cluster
spec:
# Core Configuration
name: string # VM name in Proxmox
vmid: int # Proxmox VM ID (100-999999999)
node: string # Target Proxmox node
template: string # Base template/image
# Resource Allocation
cpu:
cores: int # CPU cores (1-128)
sockets: int # CPU sockets (1-4)
threadsPerCore: int # Threads per core (1-2)
memory: int # RAM in bytes
# Storage Configuration
disks:
- name: string # Disk identifier
size: string # Size (e.g., "20G", "1T")
storage: string # Storage pool name
type: string # Disk type: ide, sata, scsi, virtio
cache: string # Cache mode: none, writethrough, writeback
format: string # Format: raw, qcow2, vmdk
# Network Configuration
networks:
- name: string # Network interface name
bridge: string # Proxmox bridge (vmbr0, vmbr1, etc.)
model: string # NIC model: e1000, virtio, rtl8139
macAddress: string # MAC address (optional)
vlan: int # VLAN tag (1-4094)
firewall: bool # Enable Proxmox firewall
# Proxmox-Specific Settings
osType: string # OS type: linux, windows, solaris, other
bootOrder: []string # Boot device order: disk, network, cdrom
agent: bool # Enable QEMU guest agent
balloon: bool # Enable memory ballooning
protection: bool # Enable deletion protection
status:
phase: string # Current phase: Pending, Creating, Running, Stopped, Error
conditions: []Condition # Status conditions
vmid: int # Assigned VM ID
node: string # Actual Proxmox node
ipAddresses: []string # Assigned IP addresses
lastUpdated: string # Last reconciliation timestamp
ProxmoxCluster Resource
Represents a Proxmox cluster configuration:
apiVersion: vitistack.io/v1alpha1
kind: ProxmoxCluster
metadata:
name: string
spec:
# Connection Configuration
endpoint: string # Proxmox API URL (https://proxmox.example.com:8006)
insecureSkipTLSVerify: bool # Skip TLS certificate verification
# Authentication
credentials:
username: string # Proxmox username (user@pam, user@pve)
password: # Password reference
secretRef:
name: string # Secret name
key: string # Secret key
tokenID: string # API token ID (alternative to password)
tokenSecret: # API token secret
secretRef:
name: string
key: string
# Cluster Settings
nodes: []string # Available Proxmox nodes
storages: []string # Available storage pools
networks: []string # Available network bridges
status:
ready: bool # Cluster connectivity status
version: string # Proxmox VE version
nodes: []NodeStatus # Per-node status information
lastHealthCheck: string # Last health check timestamp
ProxmoxNode Resource
Represents individual Proxmox nodes:
apiVersion: vitistack.io/v1alpha1
kind: ProxmoxNode
metadata:
name: string
spec:
cluster: string # Reference to ProxmoxCluster
nodeName: string # Proxmox node name
maxVMs: int # Maximum VMs per node
status:
ready: bool # Node availability
resources:
cpu:
total: int # Total CPU cores
used: int # Used CPU cores
memory:
total: int # Total memory in bytes
used: int # Used memory in bytes
storage:
total: int # Total storage in bytes
used: int # Used storage in bytes
vmCount: int # Current VM count
Configuration Reference
Environment Variables
| Variable | Type | Default | Description |
|---|---|---|---|
PROXMOX_DEFAULT_CLUSTER |
string | - | Default cluster for machines without explicit cluster |
RECONCILE_INTERVAL |
duration | 30s | Reconciliation interval |
MAX_CONCURRENT_RECONCILES |
int | 5 | Maximum concurrent reconciliations |
METRICS_BIND_ADDRESS |
string | :8080 |
Metrics server bind address |
HEALTH_PROBE_BIND_ADDRESS |
string | :8081 |
Health probe bind address |
Proxmox API Configuration
Authentication Methods
Username/Password Authentication:
credentials:
username: "operator@pve"
password:
secretRef:
name: "proxmox-credentials"
key: "password"
API Token Authentication (Recommended):
credentials:
username: "operator@pve"
tokenID: "operator-token"
tokenSecret:
secretRef:
name: "proxmox-credentials"
key: "token-secret"
TLS Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
insecureSkipTLSVerify |
bool | false | Skip certificate validation |
caCertificate |
string | - | Custom CA certificate |
clientCertificate |
string | - | Client certificate for mutual TLS |
clientKey |
string | - | Client private key |
Operational Reference
Reconciliation Logic
The Proxmox Operator implements the following reconciliation workflow:
- Resource Validation: Validates Machine specification against Proxmox constraints
- Cluster Connection: Establishes connection to target Proxmox cluster
- Node Selection: Selects appropriate Proxmox node based on resources and constraints
- VM ID Assignment: Allocates unique VM ID if not specified
- VM Creation: Creates virtual machine with specified configuration
- Resource Provisioning: Configures CPU, memory, storage, and network
- State Monitoring: Monitors VM state and reports status
- Lifecycle Management: Handles start, stop, restart, and deletion operations
Proxmox API Integration
VM Management Operations
VM Creation:
POST /api2/json/nodes/{node}/qemu
Content-Type: application/json
{
"vmid": 100,
"name": "test-vm",
"cores": 2,
"memory": 2048,
"net0": "virtio,bridge=vmbr0",
"scsi0": "local-lvm:20"
}
VM Configuration Update:
PUT /api2/json/nodes/{node}/qemu/{vmid}/config
Content-Type: application/json
{
"cores": 4,
"memory": 4096
}
VM State Control:
POST /api2/json/nodes/{node}/qemu/{vmid}/status/{action}
# Actions: start, stop, shutdown, reset, suspend, resume
VM Status Query:
GET /api2/json/nodes/{node}/qemu/{vmid}/status/current
Resource Queries
Cluster Status:
GET /api2/json/cluster/status
Node Resources:
GET /api2/json/nodes/{node}/status
Storage Information:
GET /api2/json/nodes/{node}/storage
Network Configuration:
GET /api2/json/nodes/{node}/network
VM ID Management
| Range | Purpose | Auto-Assignment |
|---|---|---|
| 100-999 | System templates | No |
| 1000-9999 | Manual assignment | No |
| 10000-99999 | Operator managed | Yes |
| 100000+ | Reserved | No |
Auto-assignment algorithm:
- Query existing VMs to identify used IDs
- Find lowest available ID in operator range (10000+)
- Reserve ID during VM creation
- Handle conflicts with retry logic
Processing Specifications
Resource Allocation
CPU Configuration
| Specification | Proxmox Mapping | Validation |
|---|---|---|
cpu.cores |
cores parameter |
1-128 cores per VM |
cpu.sockets |
sockets parameter |
1-4 sockets per VM |
cpu.threadsPerCore |
Calculated as cores/(sockets*threads) |
1-2 threads per core |
Memory Management
- Specification: Bytes (e.g., 2147483648 for 2GB)
- Proxmox Format: Megabytes (e.g., 2048)
- Conversion:
proxmox_mb = bytes / 1048576 - Constraints: Minimum 64MB, maximum node memory limit
Storage Configuration
Disk Naming Convention:
{type}{index}: {storage}:{size}[,format={format}][,cache={cache}]
Example: scsi0: local-lvm:20,format=raw,cache=writeback
Supported Storage Types:
| Type | Interface | Use Case |
|---|---|---|
ide |
IDE | Legacy systems, CD-ROM |
sata |
SATA | Standard storage |
scsi |
SCSI | High-performance storage |
virtio |
VirtIO | Paravirtualized storage (recommended) |
Network Configuration
Bridge Mapping
| Specification | Proxmox Format | Example |
|---|---|---|
bridge: vmbr0 |
bridge=vmbr0 |
virtio,bridge=vmbr0 |
model: virtio |
virtio model prefix |
virtio,bridge=vmbr0 |
macAddress: aa:bb:cc:dd:ee:ff |
,macaddr=aa:bb:cc:dd:ee:ff |
virtio,bridge=vmbr0,macaddr=aa:bb:cc:dd:ee:ff |
VLAN Configuration
networks:
- name: eth0
bridge: vmbr0
vlan: 100 # Results in: virtio,bridge=vmbr0,tag=100
Error Handling Reference
Error Classification
| Error Type | HTTP Status | Retry Behavior | Resolution |
|---|---|---|---|
| Authentication | 401 | No | Update credentials |
| Authorization | 403 | No | Check user permissions |
| Resource Not Found | 404 | No | Verify cluster/node exists |
| Conflict | 409 | Yes | Retry with backoff |
| Server Error | 500 | Yes | Check Proxmox status |
| Network Error | - | Yes | Verify connectivity |
Reconciliation Backoff
| Attempt | Delay | Maximum Delay |
|---|---|---|
| 1 | 1 second | - |
| 2 | 2 seconds | - |
| 3 | 4 seconds | - |
| 4+ | 8 seconds | 5 minutes |
Common Error Patterns
| Error Message | Cause | Resolution |
|---|---|---|
VM {vmid} already exists |
VM ID conflict | Use different VM ID or delete existing VM |
insufficient resources on node |
Resource exhaustion | Select different node or increase resources |
storage '{storage}' not found |
Invalid storage pool | Verify storage pool exists and is accessible |
bridge '{bridge}' not found |
Invalid network bridge | Check network configuration on target node |
template '{template}' not found |
Missing VM template | Create or upload required template |
Monitoring Reference
Prometheus Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
proxmox_operator_machines_total |
Gauge | phase, node |
Total machines by phase |
proxmox_operator_reconciliation_duration_seconds |
Histogram | controller |
Reconciliation duration |
proxmox_operator_api_requests_total |
Counter | method, endpoint, status |
Proxmox API requests |
proxmox_operator_errors_total |
Counter | error_type, controller |
Error occurrences |
proxmox_cluster_nodes_available |
Gauge | cluster |
Available nodes per cluster |
proxmox_cluster_connection_status |
Gauge | cluster |
Cluster connectivity status |
Health Endpoints
| Endpoint | Purpose | Status Codes |
|---|---|---|
/healthz |
Liveness probe | 200 (healthy), 500 (unhealthy) |
/readyz |
Readiness probe | 200 (ready), 500 (not ready) |
/metrics |
Prometheus metrics | 200 (metrics available) |
Status Conditions
| Condition Type | Status | Reason | Description |
|---|---|---|---|
Ready |
True/False | Various | Overall machine readiness |
VMCreated |
True/False | CreationSucceeded/CreationFailed |
VM creation status |
ResourcesAllocated |
True/False | AllocationSucceeded/InsufficientResources |
Resource allocation |
NetworkConfigured |
True/False | NetworkReady/NetworkError |
Network configuration |
Security Reference
RBAC Requirements
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: proxmox-operator
rules:
- apiGroups: ["vitistack.io"]
resources: ["machines", "proxmoxclusters", "proxmoxnodes"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["secrets", "events"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch"]
Secret Management
Credential Storage:
apiVersion: v1
kind: Secret
metadata:
name: proxmox-credentials
type: Opaque
stringData:
username: "operator@pve"
password: "secure-password"
# OR for token authentication:
token-id: "operator-token"
token-secret: "token-secret-value"
Network Security
Proxmox API Access Requirements:
- Port: 8006 (HTTPS) or 8007 (SPICE proxy)
- Protocol: HTTPS (TLS 1.2+)
- Firewall: Allow operator pods to reach Proxmox nodes
Deployment Reference
Helm Chart Configuration
| Parameter | Default | Description |
|---|---|---|
image.repository |
ghcr.io/vitistack/proxmox-operator |
Container image |
image.tag |
Chart version | Image tag |
replicaCount |
1 | Operator replicas |
resources.limits.cpu |
200m |
CPU limit |
resources.limits.memory |
256Mi |
Memory limit |
nodeSelector |
{} |
Node selection constraints |
tolerations |
[] |
Pod tolerations |
affinity |
{} |
Pod affinity rules |
Resource Requirements
Minimum:
- CPU: 50m
- Memory: 128Mi
Recommended:
- CPU: 200m
- Memory: 256Mi
Scaling Guidelines:
- CPU scales with number of managed VMs
- Memory scales with cluster size and reconciliation frequency
- Network bandwidth depends on Proxmox API call frequency
Troubleshooting Reference
Debug Commands
Check Operator Status:
kubectl get pods -n proxmox-operator-system
kubectl logs -n proxmox-operator-system deployment/proxmox-operator-controller-manager
Verify CRD Resources:
kubectl get machines -A
kubectl get proxmoxclusters -A
kubectl describe machine <machine-name>
Test Proxmox Connectivity:
# Test API endpoint
curl -k https://proxmox.example.com:8006/api2/json/version
# Validate credentials
curl -k -d "username=operator@pve&password=password" \
https://proxmox.example.com:8006/api2/json/access/ticket
Log Analysis
Common Log Patterns:
# Successful VM creation
"Successfully created VM" vmid=10001 node=pve-node1
# Authentication failure
"Authentication failed" error="invalid credentials"
# Resource allocation error
"Insufficient resources" node=pve-node1 requested_memory=4096 available=2048
# Network configuration error
"Bridge not found" bridge=vmbr0 node=pve-node1
This reference documentation provides comprehensive technical details for system administrators and developers working with the Proxmox Operator, assuming familiarity with Kubernetes operators, Proxmox VE, and virtualization concepts.