Version: Next (unreleased)

Quickstart

First run

Learn the incident loop, not just the command syntax.

The fastest way to understand kubediag is to run it against one broken workload, inspect the top-ranked finding, and follow the next commands it suggests. That loop is the product: symptom to evidence to action without manual archaeology.

Scopes: pod, deployment, namespace, cluster
Renderers: text, json, markdown
Stable rule IDs

Start narrow

Run against one pod first

A single-pod diagnosis makes the ranking model and evidence format obvious before you move into deployment or namespace scope.

Read the top finding

Use severity and confidence as the entrypoint

The first finding should tell you what kubediag believes is driving the incident and how certain it is.
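One way to picture that entrypoint is a sort over findings: most severe first, ties broken by confidence. The sketch below is illustrative only. The level names other than CRITICAL and high, the dictionary fields, the TRG-EXAMPLE-* rule IDs, and the ordering rule itself are assumptions, not kubediag's actual ranking model.

```python
# Hypothetical orderings: only "CRITICAL" and "high" appear in kubediag's sample output.
SEVERITY_ORDER = {"CRITICAL": 0, "WARNING": 1, "INFO": 2}
CONFIDENCE_ORDER = {"high": 0, "medium": 1, "low": 2}


def rank_findings(findings):
    """Return findings sorted most severe first, then most confident."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_ORDER[f["severity"]],
            CONFIDENCE_ORDER[f["confidence"]],
        ),
    )


# Placeholder findings; the TRG-EXAMPLE-* IDs are made up for illustration.
findings = [
    {"rule": "TRG-EXAMPLE-A", "severity": "WARNING", "confidence": "high"},
    {"rule": "TRG-POD-CRASHLOOPBACKOFF", "severity": "CRITICAL", "confidence": "high"},
    {"rule": "TRG-EXAMPLE-B", "severity": "CRITICAL", "confidence": "low"},
]

top = rank_findings(findings)[0]
print(top["rule"])  # TRG-POD-CRASHLOOPBACKOFF
```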

Confirm quickly

Paste the suggested next commands

kubediag shortens the gap between diagnosis and confirmation by embedding the most useful kubectl follow-ups directly in the output.

Pod diagnosis

Start with the smallest useful scope.

For a broken workload, the pod view is the quickest way to see how kubediag structures a diagnosis.

kubediag pod my-pod -n default

If the problem is rollout- or service-shaped rather than pod-shaped, move up to deployment or namespace scope after this first run.

triage:text

▶ Pod default/my-api-7f9b-xk2m2      Phase: Running     Ready: 0/1

ⓧ CRITICAL  [high confidence]  TRG-POD-CRASHLOOPBACKOFF
Container "api" is in CrashLoopBackOff (5 restarts in the last 3m)

Evidence:
  • pod.status.containerStatuses[0].lastState.terminated.reason = "Error"
  • pod.status.containerStatuses[0].lastState.terminated.exitCode = 1
  • Event (Warning, BackOff, 30s ago): "Back-off restarting failed container"

Next commands:
  $ kubectl logs -n default my-api-7f9b-xk2m2 -c api --previous
  $ kubectl describe pod -n default my-api-7f9b-xk2m2
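
If you would rather collect the follow-ups than copy them by hand, the text layout above is regular enough to parse. This is a minimal sketch that assumes the "Next commands:" header and the "$ " prefix stay exactly as shown in the sample; extract_next_commands is a hypothetical helper, not part of kubediag.

```python
def extract_next_commands(report_text):
    """Collect the '$ '-prefixed lines that follow a 'Next commands:' header."""
    commands = []
    in_section = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped == "Next commands:":
            in_section = True
        elif in_section and stripped.startswith("$ "):
            commands.append(stripped[2:])
        elif in_section and stripped:
            # Any other non-blank line ends the section.
            in_section = False
    return commands


# Excerpt of the sample output shown above.
sample = """Evidence:
  • Event (Warning, BackOff, 30s ago): "Back-off restarting failed container"

Next commands:
  $ kubectl logs -n default my-api-7f9b-xk2m2 -c api --previous
  $ kubectl describe pod -n default my-api-7f9b-xk2m2
"""

for command in extract_next_commands(sample):
    print(command)
```

For anything beyond a quick script, the JSON renderer is likely a sturdier source than scraped terminal text.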

Expand scope when the incident shape changes

Deployment scope

Use deployment diagnosis when the failure is about rollout progress.

This view combines deployment-level findings and the pod-level signals underneath them.

kubediag deployment web -n prod

Namespace and cluster scope

Use wider scopes for fleet-wide health signals.

Namespace and cluster modes surface warning events, service issues, and node pressure patterns that are easy to miss when starting from one pod.

kubediag namespace prod
kubediag cluster

Switch renderer based on audience

Machine-readable

Choose the renderer that matches the next consumer.

Use JSON for automation, markdown for reports, and terminal text for live incident response.

kubediag pod my-pod -o json
kubediag namespace prod -o markdown
kubediag report namespace prod > triage-report.md
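
For automation, a consumer of the JSON renderer might look like the sketch below. The JSON schema is not documented on this page, so the "findings", "severity", and "rule_id" field names are assumptions to replace with whatever -o json actually emits. Only the command line comes from the examples above.

```python
import json
import subprocess


def run_kubediag_json(args):
    """Run kubediag with -o json and return the parsed document."""
    # No check=True: how kubediag sets its exit code is not assumed here.
    result = subprocess.run(
        ["kubediag", *args, "-o", "json"],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def critical_rule_ids(report):
    """List rule IDs of CRITICAL findings.

    ASSUMPTION: the report has a top-level "findings" list whose items
    carry "severity" and "rule_id" keys. Adjust to the real schema.
    """
    return [
        finding["rule_id"]
        for finding in report.get("findings", [])
        if finding.get("severity") == "CRITICAL"
    ]


# Hand-written stand-in for real output, shaped per the assumption above.
sample = {
    "findings": [
        {"rule_id": "TRG-POD-CRASHLOOPBACKOFF", "severity": "CRITICAL"},
        {"rule_id": "TRG-EXAMPLE-A", "severity": "WARNING"},
    ]
}
print(critical_rule_ids(sample))  # ['TRG-POD-CRASHLOOPBACKOFF']
```

Against a live cluster, the equivalent call would be critical_rule_ids(run_kubediag_json(["pod", "my-pod"])).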

Inspect and explain rules

Rule introspection

Use stable rule IDs as a reference surface.

Rule IDs are stable, public identifiers, which makes them safe to reference in alerts, runbooks, and postmortems.

kubediag rules list
kubediag rules explain TRG-POD-CRASHLOOPBACKOFF
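
Because the IDs are plain strings, they are easy to pull out of alert text or postmortem notes and feed back into kubediag rules explain. The pattern below is inferred from the single example ID on this page (a TRG prefix followed by hyphen-separated uppercase segments); treat it as an assumption and check it against kubediag rules list.

```python
import re

# ASSUMED shape, inferred only from TRG-POD-CRASHLOOPBACKOFF.
RULE_ID = re.compile(r"\bTRG(?:-[A-Z0-9]+)+\b")


def rule_ids_in(text):
    """Return the distinct rule IDs mentioned in text, in order of first appearance."""
    seen = []
    for match in RULE_ID.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


note = (
    "Paged for TRG-POD-CRASHLOOPBACKOFF on default/my-api; "
    "TRG-POD-CRASHLOOPBACKOFF cleared after rollback."
)
print(rule_ids_in(note))  # ['TRG-POD-CRASHLOOPBACKOFF']
```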

Keep the feedback loop tight

1. Run triage on the narrowest scope that still contains the incident.
2. Read the highest-ranked finding and the evidence it cites.
3. Paste the suggested next commands to confirm or falsify the diagnosis.
4. Apply the fix and rerun the same command to see whether the finding clears.
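
The last step of that loop, rerunning the same command to see whether the finding clears, can be scripted. A minimal sketch, assuming the rule ID is printed in the text output whenever the finding is active, as in the sample above; finding_present is a hypothetical helper, not a kubediag feature.

```python
import subprocess


def finding_present(command, rule_id, run=subprocess.run):
    """Run a kubediag command and report whether rule_id appears in its text output."""
    # No check=True: how kubediag sets its exit code is not assumed here.
    result = run(command, capture_output=True, text=True)
    return rule_id in result.stdout
```

After applying a fix, finding_present(["kubediag", "pod", "my-pod", "-n", "default"], "TRG-POD-CRASHLOOPBACKOFF") returning False suggests the finding has cleared. The run parameter exists only so the helper can be exercised without a cluster.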

If you want a guided browser version of that flow, use the interactive sandbox.