Deployment reconciliation, observed status tracking, and restart policy controls #57

Open
zbal wants to merge 9 commits into development from restart-policy

@zbal zbal commented Feb 5, 2026

Why

Deployment status in the DB can drift from real container state (crashed/stopped/removed). This creates confusing UI states and makes recovery unclear. We need a reliable observed snapshot, a reconciliation loop, and a controlled restart policy to keep deployment state accurate and actionable.

Summary

  • Added observed-state tracking on deployments and a computed status.
  • Implemented reconciliation logic (service + CLI + ARQ cron + API endpoint).
  • Emitted observed-state events on change for future real‑time UI updates.
  • Added Docker restart policy configuration for deployment containers.
  • Updated strategy docs to reflect architecture and current scope.
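
As a rough illustration of how a computed status can combine the stored status with the observed snapshot, here is a minimal sketch. The field names `observed_status` and `observed_at` and the status values are assumptions for illustration; the actual `observed_*` fields and the precedence rules in `models.py` may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Deployment:
    # Status written to the DB by the deploy pipeline (values are assumed).
    status: str
    # Last container state seen by reconciliation; None until first observed.
    observed_status: Optional[str] = None
    observed_at: Optional[datetime] = None

    @property
    def computed_status(self) -> str:
        """Status to show in the UI: stored status, corrected by observation."""
        if self.observed_status is None:
            # Never reconciled, so the stored status is all we have.
            return self.status
        if self.status == "running" and self.observed_status != "running":
            # The DB claims the deployment is up, but the container is not.
            return self.observed_status
        return self.status
```

With this rule, a deployment stored as `running` whose container was last seen as `exited` reports `exited`, while a deployment already marked as failed keeps its stored status.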

Key changes

  • models.py: new observed_* fields and computed_status property.
  • reconcile.py: Docker inspection + observed updates + change-only events.
  • reconcile.py + jobs.py: scheduled ARQ cron task.
  • project.py: POST /{team_slug}/projects/{project_name}/deployments/{deployment_id}/reconcile.
  • reconcile.sh: one‑off reconcile CLI.
  • deployment.py: Docker restart policy (on-failure + retries).
  • .env.example, .env.dev.example: new settings (reconcile interval, restart policy).
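
The "observed updates + change-only events" behaviour can be sketched as below. This is not the code in `reconcile.py`: the `reconcile` signature, the dict-based deployment, the `missing` state, and the `inspect_state` and `emit` callables (standing in for a Docker inspect call and a Redis stream publish) are all hypothetical. Only the event name `deployment_observed_update` comes from this PR.

```python
from typing import Callable, Optional


def reconcile(
    deployment: dict,
    inspect_state: Callable[[str], Optional[str]],
    emit: Callable[[str, dict], None],
) -> bool:
    """Refresh one deployment's observed state; return True if it changed."""
    # inspect_state returns the container state, or None if the container
    # no longer exists (reported here as the assumed state "missing").
    state = inspect_state(deployment["container_id"]) or "missing"

    if state == deployment.get("observed_status"):
        # Nothing changed: skip the write and do not emit an event.
        return False

    deployment["observed_status"] = state
    emit(
        "deployment_observed_update",
        {"deployment_id": deployment["id"], "observed_status": state},
    )
    return True
```

Calling `reconcile` twice against an unchanged container emits a single event, which keeps the stream quiet while the cron task polls every interval.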

Config

  • RECONCILE_INTERVAL_SECONDS (default: 60)
  • DEPLOYMENT_RESTART_POLICY (default: on-failure)
  • DEPLOYMENT_RESTART_MAX_RETRIES (default: 5)
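
One way these settings could be turned into values the worker uses is sketched below. The helper names `load_restart_policy` and `load_reconcile_interval` are hypothetical, and reading straight from the environment is an assumption (the project may use a settings class). The `Name` / `MaximumRetryCount` dictionary is the restart policy shape accepted by the Docker Engine API and the Docker SDK for Python.

```python
import os
from typing import Mapping


def load_restart_policy(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Docker restart policy dict from the settings above."""
    name = env.get("DEPLOYMENT_RESTART_POLICY", "on-failure")
    policy: dict = {"Name": name}
    if name == "on-failure":
        # A retry limit only applies to the on-failure policy.
        policy["MaximumRetryCount"] = int(
            env.get("DEPLOYMENT_RESTART_MAX_RETRIES", "5")
        )
    return policy


def load_reconcile_interval(env: Mapping[str, str] = os.environ) -> int:
    """Seconds between scheduled reconcile runs."""
    return int(env.get("RECONCILE_INTERVAL_SECONDS", "60"))
```

The resulting dict could then be passed as the `restart_policy` argument when the deployment container is created.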

Testing

  • reconcile.sh --deployment <id> → verify observed_* updates (verified: good).
  • Tail Redis streams for deployment_observed_update and check computed_status.
  • Restart worker-jobs and confirm cron ticks per interval.

Follow‑ups (not included)

  • SSE handler + UI templates for computed status.
  • Manual action endpoints (restart/stop/cleanup).

@zbal zbal requested a review from hunvreus February 5, 2026 01:03