Similar versions: surface-overlap metric + endpoint + UI panel

Ranks catalogued engine versions by how much of their CMC_* surface they share, which (unlike a binary fuzzy hash) stays meaningful across compilers: the golden pair PIKLIB8/MSVC6 vs bloomoodll/MSVC8 scores 85%.

- similarity.py: jaccard, surface_similarity (per-axis + pooled overall), fuzzy_similarity (ssdeep via ppdeep, secondary signal)
- service.similar_snapshots + GET /snapshots/{id}/similar?min=N (SimilarHit)
- UI: "Podobne wersje" panel in the snapshot browser (overlap bar + ⇄ diff)
- tests: 6 new (jaccard, identical/disjoint, golden pair 0<x<100, fuzzy, endpoint + min filter) -> 28/28

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 12:33:50 +02:00
parent 30b2b1011e
commit 38be932abc
8 changed files with 275 additions and 1 deletions
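
For orientation, the surface-overlap metric named in the commit message could be sketched roughly as below. This is a minimal illustration, not the contents of similarity.py. The input shape (a dict mapping an axis name to a set of CMC_* names), the axis names in the usage note, the handling of two empty sets, and the reading of "pooled" as a Jaccard over all (axis, name) pairs are all assumptions. Only the function names and the per-axis keys `shared`, `only_a`, `only_b`, `score` come from the commit.

```python
def jaccard(a, b) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty surfaces are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def surface_similarity(surface_a: dict, surface_b: dict) -> dict:
    """Per-axis overlap plus a pooled overall score, both on a 0-100 scale.

    Each surface is assumed to look like {axis_name: set_of_names}.
    """
    axes = {}
    pooled_a, pooled_b = set(), set()
    for axis in sorted(surface_a.keys() | surface_b.keys()):
        a = set(surface_a.get(axis, ()))
        b = set(surface_b.get(axis, ()))
        axes[axis] = {
            "shared": len(a & b),
            "only_a": len(a - b),
            "only_b": len(b - a),
            "score": round(100 * jaccard(a, b)),
        }
        # Tag names with their axis so equal names on different axes stay distinct.
        pooled_a |= {(axis, name) for name in a}
        pooled_b |= {(axis, name) for name in b}
    return {"overall": round(100 * jaccard(pooled_a, pooled_b)), "axes": axes}
```

With two hypothetical surfaces such as `{"classes": {"CMC_A", "CMC_B"}}` and `{"classes": {"CMC_B", "CMC_C"}}`, the result carries one entry under `axes` and an `overall` value that could populate the `overall` and `axes` fields of `SimilarHit`. Pooling over tagged pairs weights each axis by its size; averaging the per-axis scores would be an alternative reading of "pooled overall".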
--- a/ams/api/schemas.py
+++ b/ams/api/schemas.py
@@ -43,6 +43,13 @@ class GameDetail(GameOut):
    snapshots: list[SnapshotOut] = []


+class SimilarHit(BaseModel):
+    snapshot: SnapshotOut
+    overall: int            # pooled surface-overlap score 0–100
+    fuzzy: int | None       # ssdeep similarity of the raw binary, when available
+    axes: dict              # per-axis {shared, only_a, only_b, score}
+
+
 class JobOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int