Containerise: Postgres + Redis/RQ + API + Ghidra worker

Brings up the documented target architecture as a docker-compose stack — a
modular monolith with the Ghidra step split into its own async worker.

- worker/: RQ queue (lazy redis import) + run_acquisition task (Job status
  queued→started→finished/failed, drives ams.acquire with sink=db)
- Job model + JobOut schema; Snapshot.data is JSONB on Postgres
- POST/GET /jobs: stream an upload to a shared volume, enqueue, poll status
- docker/api.Dockerfile (slim) + docker/worker.Dockerfile (JDK21 + Ghidra
  fetched at build, overridable via GHIDRA_URL) + docker-compose.yml
- ghidra.py: AMS_GHIDRA_SCRIPTS override for in-container script path
- pyproject: [worker] extra (rq/redis/psycopg), python-multipart in [api]
- tests: 4 new (task success/failure + endpoint enqueue/503) -> 22/22
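
As a rough illustration of the lazy-import idea in the worker/ bullet, the sketch below defers loading the queue backend until a job is actually enqueued, so the API process can start without the [worker] extra installed and answer 503 instead of crashing. The names `QueueUnavailable`, `load_queue_backend` and `enqueue_status`, and the 202 success code, are assumptions made for illustration; they are not taken from this commit.

```python
import importlib
from types import ModuleType


class QueueUnavailable(RuntimeError):
    """Hypothetical error: the queue backend cannot be imported."""


def load_queue_backend(module_name: str = "rq") -> ModuleType:
    # Import at call time, not at module import time, so the API process
    # still starts when the optional worker dependencies are missing.
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise QueueUnavailable(
            f"queue backend {module_name!r} is not installed"
        ) from exc


def enqueue_status(module_name: str = "rq") -> int:
    """Return the HTTP status a POST /jobs handler might answer with."""
    try:
        load_queue_backend(module_name)
    except QueueUnavailable:
        return 503  # service unavailable: there is no queue to hand the job to
    return 202  # assumed success code; the real endpoint may use another
```

Taking the module name as a parameter is only there to make the failure path easy to exercise in a test; a real handler would most likely hard-code its backend.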

Verified: API image builds, container serves /health + /ui + /jobs; compose
config validates. Worker image (downloads ~1 GB Ghidra) not built here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Patryk Gensch
2026-05-31 12:24:47 +02:00
parent 6797ad5ddb
commit f4aa7caaa9
15 changed files with 511 additions and 3 deletions


@@ -7,10 +7,14 @@ from __future__ import annotations
 from datetime import datetime, timezone
 from sqlalchemy import ForeignKey, JSON, String, UniqueConstraint
+from sqlalchemy.dialects.postgresql import JSONB
 from sqlalchemy.orm import Mapped, mapped_column, relationship
 from .db import Base
+# JSONB on Postgres (indexable, typed), plain JSON elsewhere (e.g. SQLite in dev/tests).
+_JSON = JSON().with_variant(JSONB, "postgresql")
 def _utcnow() -> datetime:
     return datetime.now(timezone.utc)
@@ -46,6 +50,33 @@ class Snapshot(Base):
     n_fields: Mapped[int] = mapped_column(default=0)
     created_at: Mapped[datetime] = mapped_column(default=_utcnow)
-    data: Mapped[dict] = mapped_column(JSON)
+    data: Mapped[dict] = mapped_column(_JSON)
     game: Mapped["Game | None"] = relationship(back_populates="snapshots")
+class Job(Base):
+    """An acquisition job: an uploaded archive/DLL handed to the Ghidra worker.
+    The API row is the durable source of truth (survives Redis); `rq_id` links to the
+    transient RQ job. The worker walks status queued → started → finished/failed and,
+    on success, points `snapshot_id` at the catalog entry it produced."""
+    __tablename__ = "jobs"
+    id: Mapped[int] = mapped_column(primary_key=True)
+    rq_id: Mapped[str | None] = mapped_column(String, default=None, index=True)
+    status: Mapped[str] = mapped_column(String, default="queued", index=True)
+    source_name: Mapped[str] = mapped_column(String)  # original upload filename
+    source_path: Mapped[str] = mapped_column(String)  # path on the shared volume
+    game_name: Mapped[str | None] = mapped_column(String, default=None)
+    snapshot_id: Mapped[int | None] = mapped_column(ForeignKey("snapshots.id"), default=None)
+    dll_name: Mapped[str | None] = mapped_column(String, default=None)
+    error: Mapped[str | None] = mapped_column(String, default=None)
+    created_at: Mapped[datetime] = mapped_column(default=_utcnow)
+    updated_at: Mapped[datetime] = mapped_column(default=_utcnow, onupdate=_utcnow)
+    snapshot: Mapped["Snapshot | None"] = relationship()
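
The status walk described in the `Job` docstring can be sketched without a database or a queue. This is a minimal illustration, assuming a hypothetical `JobRecord` dataclass in place of the ORM row and a hypothetical `acquire` callable that returns a snapshot id. The real `run_acquisition` task and the `ams.acquire` signature are not shown in this diff and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobRecord:
    """Hypothetical stand-in for the Job row (only the fields used here)."""

    source_path: str
    status: str = "queued"
    snapshot_id: int | None = None
    error: str | None = None
    history: list[str] = field(default_factory=lambda: ["queued"])

    def set_status(self, status: str) -> None:
        # Track every transition so the walk can be inspected afterwards.
        self.status = status
        self.history.append(status)


def run_job(job: JobRecord, acquire: Callable[[str], int]) -> JobRecord:
    """Walk queued -> started -> finished/failed around one acquisition."""
    job.set_status("started")
    try:
        # `acquire` is assumed to return the id of the snapshot it produced.
        job.snapshot_id = acquire(job.source_path)
    except Exception as exc:
        # Record the failure on the job instead of re-raising it.
        job.error = str(exc)
        job.set_status("failed")
    else:
        job.set_status("finished")
    return job
```

Writing the error onto the record, rather than leaving it only in the queue's own job metadata, fits the docstring's point that the API row is the durable source of truth: a client polling the job can still read the failure after the transient queue entry is gone.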