Sockguard

Security Model

Sockguard's defense-in-depth model — transport admission, client admission, method/path filtering, request-body inspection, ownership isolation, visibility-controlled reads, and structured access plus audit logging.

Why Socket Proxying Matters

The Docker socket (/var/run/docker.sock) is equivalent to root access on the host. Any container with unrestricted socket access can:

  • Create a privileged container that mounts the host filesystem
  • Execute arbitrary commands via docker exec
  • Access host PID, network, and IPC namespaces
  • Pull and run malicious images
  • Manipulate Swarm clusters

A socket proxy sits between consumers and the raw socket, filtering requests to limit what each consumer can do.

Defense in Depth

Sockguard implements multiple layers of filtering:

Layer 0: Policy Integrity

Before any request is evaluated, Sockguard can verify that the loaded configuration was signed by a trusted key. When policy_bundle.enabled: true, the on-disk YAML is treated as untrusted until a cosign sigstore bundle at policy_bundle.signature_path confirms it. An unsigned, malformed, or wrong-key bundle aborts startup with an error wrapped under the "policy bundle:" prefix; no rules compile and no listener opens.

Two verification paths are supported:

  • Keyed: PEM-encoded ECDSA, RSA, or ed25519 public keys listed under policy_bundle.allowed_signing_keys. No network round-trip required.
  • Keyless (Fulcio + Rekor): policy_bundle.allowed_keyless entries constrain the Fulcio cert chain by exact OIDC issuer URL and subject SAN regex. When policy_bundle.require_rekor_inclusion: true, a Rekor transparency-log entry is additionally required.

Verification also runs on every hot reload. A reload whose bundle fails verification is rejected with result=reject_signature in sockguard_config_reload_total and never touches the running policy. The trust material (enabled, allowed_signing_keys, allowed_keyless, require_rekor_inclusion, verify_timeout) is reload-immutable so a SIGHUP cannot silently widen the set of accepted signers; only signature_path is reload-mutable so an operator can re-sign without a restart.
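The reload rule above amounts to a field-by-field comparison of the trust material. The following is a minimal Python sketch of that idea, not Sockguard's implementation: the function name reload_violations, the flat dict representation of the policy_bundle section, and the example file paths are all assumptions made for illustration.

```python
# Trust-material fields the text describes as reload-immutable.
RELOAD_IMMUTABLE = (
    "enabled",
    "allowed_signing_keys",
    "allowed_keyless",
    "require_rekor_inclusion",
    "verify_timeout",
)


def reload_violations(running: dict, candidate: dict) -> list[str]:
    """Return the immutable policy_bundle fields that a reload tries to change."""
    return [
        field
        for field in RELOAD_IMMUTABLE
        if running.get(field) != candidate.get(field)
    ]


# Hypothetical running configuration (paths are placeholders).
running = {
    "enabled": True,
    "signature_path": "/etc/sockguard/policy.sig",
    "require_rekor_inclusion": True,
}
# Re-signing only moves signature_path, which is reload-mutable.
resigned = dict(running, signature_path="/etc/sockguard/policy-v2.sig")
# Dropping the Rekor requirement would widen trust, so it is flagged.
widened = dict(running, require_rekor_inclusion=False)

print(reload_violations(running, resigned))  # []
print(reload_violations(running, widened))   # ['require_rekor_inclusion']
```

A non-empty result would correspond to a rejected reload that leaves the running policy untouched.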

The verified signer (keyed:<spki-fingerprint> or keyless:<issuer>:<san>) and the YAML's SHA-256 digest are stamped onto GET /admin/policy/version in the bundle_signer and bundle_digest fields, giving operators a tamper-evident audit trail of exactly who signed the running policy and over which bytes.
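An operator can use the bundle_digest field to confirm that a local copy of the policy is byte-for-byte the one that is running. The sketch below assumes the digest is reported as hex, optionally with a "sha256:" prefix; the exact encoding is not stated above, so treat that as an assumption, along with the helper names.

```python
import hashlib


def policy_digest(policy_bytes: bytes) -> str:
    """Hex-encoded SHA-256 over the exact bytes of the policy YAML."""
    return hashlib.sha256(policy_bytes).hexdigest()


def matches_running_policy(policy_bytes: bytes, version_doc: dict) -> bool:
    """Compare a local policy file against a parsed /admin/policy/version body."""
    # Assumption: bundle_digest is hex, optionally prefixed with "sha256:".
    reported = str(version_doc.get("bundle_digest", "")).lower()
    reported = reported.removeprefix("sha256:")
    return reported == policy_digest(policy_bytes)


# Stand-in for the JSON returned by GET /admin/policy/version.
local_policy = b"rules: []\n"
version_doc = {
    "bundle_signer": "keyed:<spki-fingerprint>",
    "bundle_digest": policy_digest(local_policy),
}

print(matches_running_policy(local_policy, version_doc))          # True
print(matches_running_policy(local_policy + b" ", version_doc))   # False
```

Because the digest covers raw bytes, even a whitespace-only change to the file produces a mismatch.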

Layer 1: Transport Admission

Non-loopback TCP listeners require mutual TLS 1.3 by default via listen.tls. Plaintext remote TCP is rejected unless you set both listen.insecure_allow_plain_tcp: true and listen.insecure_allow_unauthenticated_clients: true — two deliberate acknowledgments (one without the other is rejected) for legacy compatibility on a private network. Unix socket listeners bypass this layer because they are filesystem-bounded. listen.tls.client_ca_file defines the issuing trust root, and the optional listen.tls.common_names, dns_names, ip_addresses, uri_sans, and public_key_sha256_pins fields can narrow that trust to specific verified client certificates instead of implicitly accepting every client cert issued by the configured CA.
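The two-flag acknowledgment described above can be read as a small decision function. This is an illustrative Python sketch only: admit_tcp_listener and its parameters are hypothetical names, the host is assumed to be an IP literal, and interactions not described in the text (for example, TLS combined with a single insecure flag) are not modeled.

```python
import ipaddress


def admit_tcp_listener(
    host: str,
    tls_configured: bool,
    allow_plain_tcp: bool = False,
    allow_unauthenticated: bool = False,
) -> bool:
    """Decide whether a TCP listener configuration is acceptable."""
    if ipaddress.ip_address(host).is_loopback:
        return True  # the mTLS requirement applies to non-loopback listeners
    if tls_configured:
        return True  # mutual TLS via listen.tls
    # Plaintext remote TCP needs both acknowledgments; one alone is rejected.
    return allow_plain_tcp and allow_unauthenticated


print(admit_tcp_listener("127.0.0.1", tls_configured=False))                      # True
print(admit_tcp_listener("0.0.0.0", tls_configured=False, allow_plain_tcp=True))  # False
print(admit_tcp_listener("0.0.0.0", False, True, True))                           # True
```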

Layer 2: Client Admission

clients.allowed_cidrs gates incoming TCP callers by source CIDR before any rule evaluation runs. When clients.container_labels.enabled is true, Sockguard resolves the calling container by source IP and enforces per-client com.sockguard.allow.<method> label allowlists in addition to the global rule set.
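As a rough picture of the CIDR gate, the check reduces to testing the caller's source address against each configured network. The sketch below uses Python's standard ipaddress module; source_allowed is a hypothetical helper, and how Sockguard treats an empty allowed_cidrs list is not covered here.

```python
import ipaddress


def source_allowed(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True when source_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    for cidr in allowed_cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # Compare only like with like (IPv4 vs IPv6).
        if addr.version == net.version and addr in net:
            return True
    return False


allowed = ["172.18.0.0/16", "::1/128"]
print(source_allowed("172.18.0.5", allowed))  # True
print(source_allowed("10.1.2.3", allowed))    # False
```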

Named client profiles sit on top of that admission layer. Sockguard can now select a per-client ruleset and request-body policy by source IP, verified mTLS certificate selectors (common_names, dns_names, ip_addresses, uri_sans, spiffe_ids, public_key_sha256_pins), or unix peer credentials (uids, gids, pids), with a configurable default profile for unmatched callers. That turns one proxy from "one ruleset in front of Docker" into a shared control plane for multiple consumers without collapsing back to broad allowlists.

Layer 3: Method Filtering

Block entire HTTP methods. Most consumers only need GET (read-only mode).

Layer 4: Path Filtering

Allow or deny specific Docker API endpoint paths using glob patterns. Before matching, Sockguard strips Docker API version prefixes (/v1.45/), percent-decodes the path (including double-encoded separators and mixed-case escapes such as %2F, %2E, and %252F), and resolves . / .. segments via path.Clean. That means /v1.45/containers/%2e%2e/images/json canonicalizes to /images/json before the glob matcher sees it, so adversarial path shapes cannot slip past a literal allowlist or skip a request-body inspector.
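The canonicalization steps above (strip the version prefix, percent-decode repeatedly, resolve dot segments) can be approximated in a few lines. This is a Python sketch of the described behavior, not Sockguard's code: it uses posixpath.normpath as a stand-in for Go's path.Clean, and the bound of three decode rounds is an arbitrary assumption.

```python
import posixpath
import re
from urllib.parse import unquote

# Matches a leading Docker API version segment such as /v1.45
_VERSION_PREFIX = re.compile(r"^/v\d+\.\d+(?=/|$)")


def canonicalize(raw_path: str, max_decode_rounds: int = 3) -> str:
    """Return the path shape a rule matcher would see."""
    path = _VERSION_PREFIX.sub("", raw_path, count=1) or "/"
    # Decode until stable so double-encoded forms like %252F collapse too.
    for _ in range(max_decode_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    # normpath preserves a leading "//", so force a single leading slash first.
    path = "/" + path.lstrip("/")
    return posixpath.normpath(path)


print(canonicalize("/v1.45/containers/%2e%2e/images/json"))  # /images/json
print(canonicalize("/containers%252Fjson"))                  # /containers/json
```

Matching rules against the canonical form rather than the raw path is what keeps an encoded traversal from bypassing a literal allowlist entry.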

Layer 5: Request Body Inspection

POST /containers/create bodies are parsed on every request and denied when they contain dangerous configuration:

  • HostConfig.Privileged: true
  • HostConfig.NetworkMode: host
  • HostConfig.PidMode: host
  • HostConfig.IpcMode: host
  • HostConfig.UsernsMode: host
  • A non-empty HostConfig.Sysctls map (kernel parameter tuning), unless request_body.container_create.allow_sysctls is set
  • Any bind mount whose source is outside request_body.container_create.allowed_bind_mounts
  • Any HostConfig.Devices host path outside request_body.container_create.allowed_devices
  • HostConfig.DeviceRequests unless explicitly allowed
  • HostConfig.DeviceCgroupRules unless explicitly allowed

Five further HostConfig fields are denied unconditionally — no policy setting opts back in — because each one opens a namespace-escape or privilege-escalation path: VolumesFrom, UTSMode: host, a non-empty CgroupParent, GroupAdd, and ExtraHosts.
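To make the shape of these checks concrete, here is a simplified Python sketch that flags a subset of the fields listed above in a parsed create body. It is illustrative only: create_violations is a hypothetical helper, bind-mount and device allowlists are omitted, and treating VolumesFrom, GroupAdd, and ExtraHosts as denied whenever non-empty is an assumption.

```python
# Namespace fields that are rejected when set to "host".
HOST_MODE_FIELDS = ("NetworkMode", "PidMode", "IpcMode", "UsernsMode", "UTSMode")
# Fields the text lists as having no policy opt-in (assumed: denied when non-empty).
ALWAYS_DENIED_IF_SET = ("VolumesFrom", "CgroupParent", "GroupAdd", "ExtraHosts")


def create_violations(body: dict, allow_sysctls: bool = False) -> list[str]:
    """List the HostConfig fields in a create body that would trigger a deny."""
    host = body.get("HostConfig") or {}
    found = []
    if host.get("Privileged") is True:
        found.append("HostConfig.Privileged")
    for field in HOST_MODE_FIELDS:
        if host.get(field) == "host":
            found.append(f"HostConfig.{field}")
    if host.get("Sysctls") and not allow_sysctls:
        found.append("HostConfig.Sysctls")
    for field in ALWAYS_DENIED_IF_SET:
        if host.get(field):
            found.append(f"HostConfig.{field}")
    return found


body = {
    "Image": "alpine",
    "HostConfig": {"Privileged": True, "PidMode": "host", "ExtraHosts": ["db:10.0.0.2"]},
}
print(create_violations(body))
# ['HostConfig.Privileged', 'HostConfig.PidMode', 'HostConfig.ExtraHosts']
```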

POST /containers/*/exec and POST /exec/*/start can also be inspected. When request_body.exec.allowed_commands is configured, Sockguard denies argv vectors that match no allowlist entry, denies privileged exec unless explicitly allowed, denies root-user exec unless explicitly allowed, and re-inspects POST /exec/*/start against Docker's stored exec metadata before the command runs.

Each allowlist entry is an argv template whose tokens are sockguard globs (* matches a run of non-slash characters, ** matches any sequence). A command matches when its token count equals an entry's and every token matches the glob at that position, so an exec whose argv carries a variable component (a run ID, timestamp, or generated path) can be allowlisted without enumerating every literal form. Keep glob tokens as tight as the use case allows; a token of ** matches anything.
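The argv-template matching described above can be sketched as a per-token glob comparison. The Python below is an approximation for illustration, not Sockguard's matcher: token_regex and argv_allowed are hypothetical names, and it assumes * may match an empty run.

```python
import re


def token_regex(glob: str) -> re.Pattern:
    """Compile one glob token: ** matches anything, * stops at '/'."""
    pieces = []
    for part in re.split(r"(\*\*|\*)", glob):
        if part == "**":
            pieces.append(".*")
        elif part == "*":
            pieces.append("[^/]*")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces), re.DOTALL)


def argv_allowed(argv: list[str], allowed_commands: list[list[str]]) -> bool:
    """True when argv matches some template token-for-token."""
    for template in allowed_commands:
        if len(template) != len(argv):
            continue  # token counts must be equal
        if all(token_regex(g).fullmatch(a) for g, a in zip(template, argv)):
            return True
    return False


allow = [["backup", "--run", "run-*"], ["cat", "/var/log/**"]]
print(argv_allowed(["backup", "--run", "run-20240101"], allow))     # True
print(argv_allowed(["backup", "--run", "run-1/../../etc"], allow))  # False
print(argv_allowed(["sh", "-c", "id"], allow))                      # False
```

Note that the second template's ** would also accept /var/log/../../etc/shadow, which is the practical reason to prefer single-star tokens wherever they suffice.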

The exec-start re-inspection is necessarily best effort: Docker exposes metadata inspection and exec start as separate API calls, so Sockguard cannot make the check atomic with the eventual start operation. Treat this as a narrow TOCTOU window inherent to Docker's API shape, and prefer tight exec allowlists plus conservative per-client profile assignment for clients that do not need interactive command execution.

POST /images/create is inspected by default. Sockguard blocks fromSrc imports unless explicitly allowed and constrains pulls to Docker Hub official images unless the operator opts into allow_all_registries or an explicit registry allowlist.

POST /build is inspected by default. Sockguard blocks remote build contexts, networkmode=host, and Dockerfiles containing RUN instructions unless those behaviors are explicitly allowed.

POST /services/create and POST /services/*/update are inspected by default. Sockguard blocks host-network services, bind mounts outside request_body.service.allowed_bind_mounts, and service images outside the configured official/allowlisted registry set.

POST /volumes/create, POST /secrets/create, and POST /configs/create are inspected by default. Sockguard blocks non-local volume drivers and driver options unless explicitly allowed, and blocks custom or template drivers on secrets/configs unless explicitly allowed.

POST /swarm/init, POST /swarm/join, and POST /swarm/update are inspected by default. Sockguard blocks ForceNewCluster, external CA configuration, non-allowlisted join targets, token rotations, manager unlock-key rotations, manager autolock, and signing-CA updates unless explicitly allowed.

POST /plugins/pull, POST /plugins/*/upgrade, POST /plugins/*/set, and POST /plugins/create are inspected by default. Sockguard constrains remote registries, privilege grants, plugin-set assignments, local plugin tar config.json, host mounts, device exposure, and capability requests unless explicitly allowed. POST /plugins/create is treated as multipart/form-data as well as raw tar: Sockguard spools the upload to a temporary file, parses the multipart envelope, extracts config.json from the embedded tar, and applies the same plugin policy it applies to POST /plugins/pull. Uploads without a parseable config.json, or whose config.json fails policy, are denied before the body reaches Docker.

POST /networks/create, POST /networks/*/connect, and POST /networks/*/disconnect are inspected by default. Sockguard blocks custom network drivers, swarm/ingress/attachable/config-only networks, custom IPAM drivers/config/options, driver options, endpoint static IP/MAC/alias/driver options, and forced disconnects unless explicitly allowed.

POST /containers/*/update and PUT /containers/*/archive are inspected by default. Sockguard blocks restart-policy/resource-control changes, privileged/device/capability-like update fields, unsafe archive target paths, tar traversal, setuid/setgid entries, device nodes, and escaping symlinks/hardlinks unless explicitly allowed.

POST /images/load is inspected by default. Image archives are denied unless image-load policy allows matching registries or untagged images; Docker manifest.json repo tags are checked against the same official/registry allowlist model used for pulls.

POST /swarm/unlock and POST /nodes/*/update are inspected by default. Swarm unlock is denied unless explicitly allowed, and node updates block role, availability, name, and arbitrary label mutations unless the corresponding node policy permits them. The default owner-label key remains allowed for controlled node claims.

Bounded JSON/tar inspectors read request bodies under per-endpoint byte caps and return 413 Payload Too Large when those caps are exceeded, instead of streaming unbounded bodies into memory. A malformed or hostile client cannot tie up the filter or the Docker daemon with oversized payloads — the bounded reader short-circuits before the JSON decode or tar parse even begins. The filter also applies a 30-second read deadline to the request body before an inspector runs, so a client that opens a request and then dribbles the body slowly cannot pin the inspector indefinitely. On the upstream side, the reverse-proxy and side-channel transports set a 30-second response-header timeout, so a Docker daemon that accepts a connection but never replies cannot pin a goroutine; streaming endpoints send headers promptly and are unaffected.
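The byte-cap behavior can be illustrated with a reader that asks for one byte more than the cap and refuses anything that fills it. This Python sketch shows the general technique under stated assumptions: PayloadTooLarge and read_bounded are hypothetical names, the 16-byte cap is a toy value, and the stream is assumed to be buffered so that a single read returns up to the requested size.

```python
import io
import json


class PayloadTooLarge(Exception):
    """Signals that the caller should answer 413 Payload Too Large."""

    status = 413


def read_bounded(stream, cap: int) -> bytes:
    """Read at most cap bytes, raising if the body is larger."""
    # Reading cap + 1 bytes detects an oversized body without ever
    # holding more than cap + 1 bytes in memory.
    data = stream.read(cap + 1)
    if len(data) > cap:
        raise PayloadTooLarge(f"body exceeds {cap} bytes")
    return data


small = io.BytesIO(b'{"Image": "a"}')
print(json.loads(read_bounded(small, cap=16)))  # {'Image': 'a'}

try:
    read_bounded(io.BytesIO(b"x" * 17), cap=16)
except PayloadTooLarge as exc:
    print(exc.status, exc)  # 413 body exceeds 16 bytes
```

The size check happens before any JSON decode, which mirrors the short-circuit ordering described above.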

These inspectors intentionally decode only the Docker request fields Sockguard actually enforces. They are not full Docker-schema validators, so full payload validation still belongs to Docker once Sockguard has checked the policy-relevant subset.

The remaining blind-write guardrail covers body-bearing writes Sockguard still cannot constrain safely, chiefly arbitrary exec without an allowlist, POST /swarm/join without configured allowed_join_remote_addrs, and plugin setting writes without allowed assignment prefixes. Validation refuses to start with those rules allowed unless you explicitly set insecure_allow_body_blind_writes: true, to keep the enforcement boundary honest.

Sockguard now applies the same honesty rule to raw archive/export and log/attach streaming reads. Validation refuses to start broad read rules that would expose GET /containers/*/archive, GET /containers/*/export, GET /containers/*/logs, GET /containers/*/attach/ws, POST /containers/*/attach, GET /services/*/logs, GET /tasks/*/logs, GET /images/get, or GET /images/*/get unless you explicitly set insecure_allow_read_exfiltration: true. Because this validation layer only sees method + path, /containers/*/logs is treated conservatively whether or not the caller also sets follow=1. That keeps backup/export and raw-stream use cases possible without letting a casual GET /containers/** or Tecnativa-style section gate silently include filesystem or log stream exfiltration.
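One way to picture this startup check is to test each allow rule against a concrete stand-in for every guarded endpoint. The Python below is a heuristic sketch, not Sockguard's validator: it assumes path rules use the same * and ** glob semantics described for exec tokens, uses "abc" as a placeholder resource ID, and all function names are hypothetical.

```python
import re

# Concrete stand-ins for the guarded endpoints ("abc" is a placeholder ID).
GUARDED_READS = [
    ("GET", "/containers/abc/archive"),
    ("GET", "/containers/abc/export"),
    ("GET", "/containers/abc/logs"),
    ("GET", "/containers/abc/attach/ws"),
    ("POST", "/containers/abc/attach"),
    ("GET", "/services/abc/logs"),
    ("GET", "/tasks/abc/logs"),
    ("GET", "/images/get"),
    ("GET", "/images/abc/get"),
]


def glob_regex(glob: str) -> re.Pattern:
    """Assumed semantics: ** matches anything, * matches non-slash characters."""
    parts = re.split(r"(\*\*|\*)", glob)
    table = {"**": ".*", "*": "[^/]*"}
    return re.compile("".join(table.get(p, re.escape(p)) for p in parts))


def exposed_reads(allow_rules):
    """Return guarded (method, path) pairs that some allow rule would admit."""
    hits = []
    for method, path_glob in allow_rules:
        pattern = glob_regex(path_glob)
        for guarded in GUARDED_READS:
            if guarded[0] == method and pattern.fullmatch(guarded[1]) and guarded not in hits:
                hits.append(guarded)
    return hits


def validate(allow_rules, insecure_allow_read_exfiltration=False):
    hits = exposed_reads(allow_rules)
    if hits and not insecure_allow_read_exfiltration:
        raise ValueError(f"rules expose exfiltration-capable reads: {hits}")


print(exposed_reads([("GET", "/containers/json")]))     # []
print(len(exposed_reads([("GET", "/containers/**")])))  # 4
```

The second call shows how a broad GET /containers/** rule quietly picks up archive, export, logs, and attach/ws unless the operator acknowledges it.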

Layer 6: Owner Label Isolation

When ownership.owner is set, Sockguard stamps label-capable creates and build-produced images with an owner label, injects owner filters into list/prune/events responses, and inspects target resources on individual requests to deny cross-owner access. That now covers owned containers, images, networks, volumes, services, tasks, secrets, configs, nodes, and swarm state, with service writes stamping both the service and its task template so downstream tasks inherit the same owner identity, /nodes using Docker's node.label filter key, and unlabeled node/swarm resources only claimable through their update paths. This turns one shared Docker socket into N isolated identity views.
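The two halves of owner isolation, stamping creates and filtering lists, can be sketched against Docker's JSON shapes. In the Python below, the label key com.sockguard.owner is a hypothetical placeholder (the default key is not named above), and overwriting any caller-supplied owner label is an assumption of this sketch. Docker's filters query parameter is a JSON object mapping filter names to lists of values.

```python
import json

OWNER_LABEL_KEY = "com.sockguard.owner"  # hypothetical key, for illustration only


def stamp_owner(create_body: dict, owner: str) -> dict:
    """Return a copy of a create body with the owner label set."""
    stamped = dict(create_body)
    labels = dict(stamped.get("Labels") or {})
    labels[OWNER_LABEL_KEY] = owner  # assumption: caller-supplied value is overwritten
    stamped["Labels"] = labels
    return stamped


def inject_owner_filter(filters_json: str, owner: str) -> str:
    """Add an owner label selector to a Docker 'filters' query value."""
    filters = json.loads(filters_json) if filters_json else {}
    labels = list(filters.get("label", []))
    selector = f"{OWNER_LABEL_KEY}={owner}"
    if selector not in labels:
        labels.append(selector)
    filters["label"] = labels
    return json.dumps(filters, sort_keys=True)


print(stamp_owner({"Image": "alpine"}, "team-a"))
# {'Image': 'alpine', 'Labels': {'com.sockguard.owner': 'team-a'}}
print(inject_owner_filter('{"status": ["running"]}', "team-a"))
# {"label": ["com.sockguard.owner=team-a"], "status": ["running"]}
```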

Layer 7: Visibility-Controlled Reads

Sockguard's response filter applies to known protected Docker JSON response shapes on successful body-bearing 2xx responses across request methods, not only GET 200. If a protected successful response cannot be parsed or sanitized safely, Sockguard fails closed with a generic 502 instead of forwarding unsanitized data. Non-success responses, HEAD responses, no-body statuses, non-protected paths, and streaming endpoints (logs, attach, events) pass through unmodified — those are protected by request-side rules and the read-side exfiltration guardrail, not by response rewriting.

Together with request-side visibility and exfiltration guardrails, the read-side layer narrows what callers can see:

  • Inject label visibility selectors into GET /containers/json, GET /images/json, GET /networks, GET /volumes, and GET /events
  • Inject label visibility selectors into GET /services, GET /tasks, GET /secrets, GET /configs, and GET /nodes
  • Return 404 for hidden targets on inspect/log-style reads such as GET /containers/*/json, GET /images/*/json, GET /networks/*, GET /volumes/*, GET /exec/*/json, GET /services/*, GET /services/*/logs, GET /tasks/*, GET /tasks/*/logs, GET /secrets/*, GET /configs/*, GET /nodes/*, and GET /swarm
  • Fail startup unless raw archive/export and stream-style reads are explicitly acknowledged via insecure_allow_read_exfiltration: true
  • Redact Config.Env on GET /containers/*/json
  • Redact HostConfig.Binds host paths plus Mounts[*].Source on container list/inspect responses
  • Redact volume Mountpoint on GET /volumes and GET /volumes/*
  • Redact container and network address topology on container/network list and inspect responses
  • Redact service/task env, mount, secret/config-reference, and network metadata
  • Redact config payload data, plugin env/path metadata, node/swarm TLS material, swarm join/unlock material, and /info plus /system/df topology-sensitive fields

Single-resource inspect denials honor rollout mode: under a profile in warn or audit mode, a target that visibility policy would hide is forwarded upstream with a would_deny audit verdict instead of being hard-404'd, so visibility policy can be staged like every other deny gate. When response.name_patterns or response.image_patterns filter a list response, Sockguard buffers the upstream body under an 8 MiB cap and rejects a larger response with a 502 rather than buffering it unbounded.

These controls are on by default where they are pure redaction because runtime env vars routinely carry credentials and Docker read APIs expose raw host mount paths plus internal network layout.
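As an illustration of response-side redaction combined with fail-closed handling, the Python sketch below scrubs environment values and mount sources from a container inspect document. It is not Sockguard's filter: the "<redacted>" placeholder, the choice to keep variable names, the 502 body, and both function names are assumptions.

```python
import json

REDACTED = "<redacted>"  # placeholder chosen for this sketch


def redact_container_inspect(doc: dict) -> dict:
    """Return a copy with Config.Env values and Mounts[*].Source scrubbed."""
    out = json.loads(json.dumps(doc))  # deep copy of JSON-compatible data
    config = out.get("Config") or {}
    if isinstance(config.get("Env"), list):
        # Assumption: keep the variable name, drop the value.
        config["Env"] = [f"{e.split('=', 1)[0]}={REDACTED}" for e in config["Env"]]
    for mount in out.get("Mounts") or []:
        if "Source" in mount:
            mount["Source"] = REDACTED
    return out


def sanitize(status: int, body: bytes):
    """Fail closed: an unparseable protected response becomes a generic 502."""
    try:
        doc = json.loads(body)
        if not isinstance(doc, dict):
            raise ValueError("unexpected response shape")
        return status, json.dumps(redact_container_inspect(doc)).encode()
    except (ValueError, AttributeError, TypeError):
        return 502, b'{"message": "bad gateway"}'


upstream = json.dumps({
    "Config": {"Env": ["DB_PASSWORD=hunter2"]},
    "Mounts": [{"Source": "/srv/data", "Destination": "/data"}],
}).encode()
print(sanitize(200, upstream))
print(sanitize(200, b"not json")[0])  # 502
```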

Layer 8: Structured Access And Audit Logging

Every request is stamped with a proxy-generated canonical X-Request-Id and logged with method, raw path, normalized_path, decision, matched rule index, selected client profile when present, latency, request ID, trace context, and client metadata. If the caller supplied its own request ID, Sockguard preserves it separately as client_request_id in logs instead of trusting it as the canonical correlation key.

path is the client-controlled URL path exactly as received and is retained for forensic replay. Detection logic, SIEM grouping, and policy analysis should use normalized_path, which is the canonical path after Sockguard strips Docker API version prefixes, decodes escaped separators, and resolves dot segments before rule evaluation.

When log.audit.enabled is true, Sockguard also emits a dedicated JSON audit event with a stable schema: request ID, client request ID, trace ID, trace parent/span IDs, sampled flag, raw and normalized path, decision, machine-readable reason_code, human-readable reason, matched rule, selected profile, flattened actor and transport identity fields, ownership context, and final HTTP status. Upstream reverse-proxy errors overwrite the audit reason code with bounded values such as upstream_socket_unreachable or upstream_response_rejected_by_policy, so the terminal result remains explicit even after an allow decision has already been made.

The audit ownership object is emitted on every event. If ownership.owner is configured, that owner identifier is repeated in every audit record, not only resource ownership decisions, so it should be a non-secret tenant/workload label suitable for the audit sink.

Sockguard preserves valid W3C traceparent trace IDs and sampled flags, forwards a proxy-local span ID, and includes trace_id, trace_parent_id, trace_span_id, and trace_sampled in access, audit, and upstream reverse-proxy error logs. Invalid or absent trace context starts a fresh local trace without enabling any OTLP span exporter.
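The trace-context handling above follows the W3C traceparent layout: version, 32-hex trace ID, 16-hex parent ID, and 2-hex flags, with all-zero IDs treated as invalid. The Python sketch below accepts only the version 00 shape and is an illustration rather than Sockguard's parser; continue_trace is a hypothetical name, and starting fresh traces as unsampled is an assumption.

```python
import re
import secrets

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def continue_trace(header):
    """Build log fields from an incoming traceparent header (or None)."""
    span_id = secrets.token_hex(8)  # proxy-local span ID, 16 hex chars
    match = _TRACEPARENT.match(header.strip()) if header else None
    if match:
        trace_id, parent_id, flags = match.groups()
        if trace_id != "0" * 32 and parent_id != "0" * 16:
            return {
                "trace_id": trace_id,
                "trace_parent_id": parent_id,
                "trace_span_id": span_id,
                "trace_sampled": bool(int(flags, 16) & 0x01),
            }
    # Invalid or absent context: start a fresh local trace.
    return {
        "trace_id": secrets.token_hex(16),
        "trace_parent_id": None,
        "trace_span_id": span_id,
        "trace_sampled": False,  # assumption for this sketch
    }


def outgoing_traceparent(ctx: dict) -> str:
    """Header to forward upstream, carrying the proxy-local span ID."""
    flags = "01" if ctx["trace_sampled"] else "00"
    return f"00-{ctx['trace_id']}-{ctx['trace_span_id']}-{flags}"


ctx = continue_trace("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["trace_id"], ctx["trace_sampled"])
print(outgoing_traceparent(ctx))
```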

When health.watchdog.enabled is true, Sockguard actively probes the upstream Docker socket, logs reachable/unreachable state transitions, and lets /health reflect the latest watchdog state. When metrics.enabled is true, Sockguard serves Prometheus text metrics from /metrics by default, including a sockguard_build_info{version,commit,build_date,go_version} gauge, a sockguard_start_time_seconds gauge, and watchdog state and check counters if the watchdog is enabled. The scrape endpoint is local to Sockguard, is never forwarded to Docker, bypasses Docker API allow rules like /health, and remains behind listener security plus client ACLs.

Dangerous Docker API Endpoints

| Risk Level | Endpoints |
| --- | --- |
| Critical | POST /containers/create, POST /containers/{id}/exec, POST /exec/{id}/start, PUT /containers/{id}/archive |
| High | POST /images/create, POST /images/load, POST /build, POST /services/create, POST /services/{id}/update, POST /swarm/init, POST /swarm/join, POST /swarm/update, POST /swarm/unlock, POST /nodes/{id}/update, POST /plugins/pull, POST /plugins/{name}/upgrade, POST /plugins/{name}/set, POST /plugins/create |
| Medium | POST /containers/{id}/update, POST /volumes/create, POST /networks/create, POST /networks/{id}/connect, POST /networks/{id}/disconnect, POST /secrets/create, POST /configs/create, DELETE /containers/{id} |
| Low | GET /containers/json, GET /events, GET /version, GET /_ping |

Image Security

Sockguard's container image is built on Wolfi (Chainguard):

  • Minimal package set, which keeps the base image's CVE exposure low
  • Built-in SBOM output and build provenance when release visibility supports attestations
  • Cosign-signed for verification — see the image verification guide for the canonical cosign verify invocation
  • No shell, no package manager in production image

Runtime Hardening

Sockguard runs as UID 65532 (Chainguard nonroot) inside the container. On stock Docker hosts where /var/run/docker.sock is owned by the docker group you may need a group_add: [docker] override or a matching GID. For a Docker socket proxy, the real security frontier is what the daemon will accept through the proxy, not the UID the proxy process reports after it has already opened the upstream socket.

The runtime controls that matter are:

  • Correct policy rules and request-body inspection
  • read_only: true
  • cap_drop: [ALL]
  • security_opt: ["no-new-privileges:true"]
  • Docker's default seccomp profile or a stricter custom profile
  • AppArmor/SELinux confinement on the host
  • Rootless dockerd on the host when available

The getting-started examples use the container-level controls above by default so the drop-in path stays simple without hiding the real hardening story.

Known Limitations

These are architectural constraints inherent to Sockguard's position in the stack. They are documented here for honest operator awareness rather than as open bugs.

IP-based client identity is soft isolation. When clients.container_labels.enabled is true, or any clients.profiles[*].match rule keys on source_cidrs, Sockguard resolves the calling container by source IP through the Docker API. This is soft isolation — adequate against configuration drift and friendly-fire mistakes, but not a hard boundary against an attacker who can influence which container a given bridge IP points at:

  • A container restart can race the label lookup: if a new container acquires the same bridge IP before the lookup completes, the lookup may return the new container's labels rather than the previous container's.
  • An attacker who can create containers on the same user-defined bridge can, in principle, claim a privileged IP and inherit its policy until the next legitimate container takes it back.
  • Host-network containers (network_mode: host) all share the host IP, so IP-keyed allowlists cannot tell them apart.

For workloads where caller identity is part of the security boundary, listen on a unix socket and use clients.unix_peer_profiles with uids/gids. SO_PEERCRED is supplied by the kernel and cannot be spoofed from within the calling container — that is the hard-isolation path.
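For readers unfamiliar with SO_PEERCRED, the following Python sketch shows how a Linux process reads the kernel-supplied credentials of a unix-socket peer. It demonstrates the mechanism only: peer_credentials and peer_allowed are hypothetical helpers, the "uid or gid" matching rule is an assumption, and the code is Linux-specific.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid
_UCRED = struct.Struct("iII")


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the peer of a connected unix socket."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


def peer_allowed(conn: socket.socket, uids=(), gids=()) -> bool:
    """Assumed matching rule: admit when the peer's uid or gid is listed."""
    _pid, uid, gid = peer_credentials(conn)
    return uid in uids or gid in gids


# Demonstration with a local socket pair; both ends belong to this process.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(left)
print(pid == os.getpid(), uid == os.geteuid())
print(peer_allowed(left, uids=(os.geteuid(),)))
left.close()
right.close()
```

The values come from the kernel rather than from anything the peer sends, which is why this path is not subject to the IP-reuse races listed above.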

Exec TOCTOU (inspect/start split). Docker exposes exec metadata inspection and exec start as separate API calls. Sockguard re-checks POST /exec/*/start against Docker's stored exec metadata before the command runs, but the gap between the inspect and start calls is an unavoidable time-of-check/time-of-use window inherent to Docker's API shape. Keep exec allowlists narrow and client profile assignments conservative for clients that do not need interactive command execution.

Hijacked-stream redaction limits. Sockguard's response filter — including the response.redact_container_env, response.redact_mount_paths, response.redact_network_topology, and response.redact_sensitive_data toggles — operates only on structured JSON responses with known shapes. Raw streaming endpoints — GET /containers/*/logs, POST /containers/*/attach, GET /services/*/logs, GET /events, exec attach, and image-build progress output — switch the connection to a raw byte stream (or a non-JSON chunked stream) after the initial HTTP response, at which point Sockguard cannot inspect or redact the byte stream. A secret an application writes to its own stdout will reach a caller that has been allowed to attach.

These paths are gated at request time via per-profile rule allowlists and the insecure_allow_read_exfiltration guardrail (which keeps the streaming read endpoints denied by default), but there is no post-admission byte-level filtering of the stream content. Restrict these paths in your rules to only the profiles and callers that genuinely need them, and treat the redaction toggles as a guarantee for Docker's structured metadata only — not for arbitrary workload output.
