Earlier text-script engines (Piklib 6.1/7.1; text scripts first appear in 6.1) keep the
type factory on CMC_Scene::resolve, not CMC_ObjectsContainer::resolve, so the
extractor bailed with "resolve not found". find_factory() now tries both anchors.
6.1's factory is also tag-based: each branch is operator==(NAME) -> new(0x74) ->
store tag -> jmp, with the ctor in a separate tag switch (no inline ctor). extract_types
gains a pre-emit: when the next operator== arrives still armed, it records the pending
type by name (size known, ctor/cpp_class not). The 8.x inline-ctor factory clears `armed`
first, so it's untouched (golden pair unchanged).
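
A minimal sketch of the pre-emit rule described above, under stated assumptions: the factory ladder is modelled as a flat stream of ("cmp", name) / ("new", size) / ("ctor", name) events, which is a hypothetical simplification and not the extractor's real input. The type names and the final flush of a still-armed entry at end of stream are also assumptions for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, object]  # ("cmp", name) | ("new", size) | ("ctor", ctor_name)


def extract_types(events: Iterable[Event]) -> List[dict]:
    """Walk a simplified factory ladder and emit one record per type."""
    types: List[dict] = []
    pending: Optional[dict] = None
    armed = False  # True once a size is known but no ctor has been seen yet

    def emit(ctor: Optional[str]) -> None:
        types.append({"name": pending["name"], "size": pending["size"], "ctor": ctor})

    for kind, value in events:
        if kind == "cmp":
            if armed:
                # Pre-emit: the previous branch never reached an inline ctor
                # (tag-based 6.1 shape), so record it by name with ctor unknown.
                emit(None)
            pending = {"name": value, "size": None}
            armed = False
        elif kind == "new" and pending is not None:
            pending["size"] = value
            armed = True
        elif kind == "ctor" and armed:
            # Inline-ctor shape (8.x): emitting here clears `armed`,
            # so the pre-emit path never fires for this branch.
            emit(value)
            armed = False

    if armed:
        emit(None)  # assumption: flush the last tag-based branch
    return types
```

With a tag-based stream every record comes out with `ctor=None`; with an inline-ctor stream the ctor is filled in and the pre-emit branch is never taken.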
Per-version reality: 6.1 = 23 types / 0 methods (no prepareMthHashSet yet) / 103 events
/ 80 fields; 7.1 = 26 types / 322 methods / 102 events / 86 fields / 288 dispatch rows
(full); type names line up across 6.1->7.1->8.x so version diffs work.
- snapshots/PIKLib61 + PIKLIB71 added as golden fixtures (evolution chain)
- tests/test_versions.py: 6.1 partial surface, 7.1 full, 61->71 diff -> 38/38
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The .py extractor runs fine under PyGhidra in the GUI; only `analyzeHeadless`
doesn't init PyGhidra. Add an env-gated CPython path so modern Ghidra works headless:
- ghidra.run_extractor_pyghidra(): runs the same GhidraScript via pyghidra.run_script
(boots Ghidra in-process, imports+analyses, getScriptArgs()=[out_path]); run_extractor
dispatches to it when AMS_USE_PYGHIDRA is set. No script changes needed.
- worker image installs pyghidra + sets GHIDRA_INSTALL_DIR; compose exposes
AMS_USE_PYGHIDRA (default off). Jython path stays the default and untouched.
- README documents both variants (Jython <=11.3.x vs PyGhidra 11.4+/12.x).
- test: AMS_USE_PYGHIDRA routes to the PyGhidra back-end (clear error if pkg missing).
35/35 tests pass.
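
A rough sketch of the env-gated back-end choice, not the actual run_extractor code. Only the AMS_USE_PYGHIDRA variable and the "clear error if pkg missing" behaviour come from the notes above; the function name `pick_backend`, the injectable `has_module` probe, and treating "0"/"false"/"no"/"off" as unset are assumptions.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

_OFF_VALUES = ("", "0", "false", "no", "off")  # assumption: these count as "not set"


def pick_backend(
    env: Optional[Mapping[str, str]] = None,
    has_module: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return "pyghidra" when AMS_USE_PYGHIDRA is set, else the default "jython"."""
    env = os.environ if env is None else env
    if has_module is None:
        # Probe for the package without importing it.
        has_module = lambda name: importlib.util.find_spec(name) is not None

    flag = env.get("AMS_USE_PYGHIDRA", "").strip().lower()
    if flag in _OFF_VALUES:
        return "jython"  # default path stays untouched
    if not has_module("pyghidra"):
        raise RuntimeError(
            "AMS_USE_PYGHIDRA is set but the 'pyghidra' package is not installed"
        )
    return "pyghidra"
```

Passing `env` and `has_module` explicitly keeps the routing testable without a Ghidra install.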
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ghidra 11.4+/12.x dropped the bundled Jython, so the .py extractor fails headless
with "Ghidra was not started with PyGhidra. Python is not available" — analysis
succeeds but the post-script never runs, so no snapshot is produced. Default
GHIDRA_URL now points at 11.2.1 (Jython); README documents the constraint and the
PyGhidra path for staying on 12.x. Keeps the local Dockerfile fixes (pip upgrade,
non-editable install).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolves the method-id gaps surfaced by the dispatch axis, all real switch-shape
edge cases rather than numbering bugs:
- default holes: ids the runner doesn't implement route to the `JA default` block
(tail-call to base CMC_Runner::run); capture that target and drop those cases
(was emitting false Sound 5/6, Scene 10-15, Array 26-31)
- sign-extension: high-base switches (CMC_NetPeer id 257+) encode the base as
`LEA/ADD idx, 0xFFFFFEFF` (-257); _s32 sign-extends on both the scalar and the
text path (Ghidra prints big displacements unsigned, small ones signed)
- two-level (byte-indexed) switches: sparse runners (Image) use
`MOVZX r,byte[i+byteTable]` (MSVC8) / `MOV rl,byte[i+byteTable]` (MSVC6) then
`JMP [r*4+ptrTable]`; decode target = ptrTable[byteTable[i]], taking base/count
from the byte-table's index register (differs from the JMP index reg on MSVC6)
- _executable() guard + id clamp: never emit a non-code "case"
Result: Piklib 500 rows / BlooMoo 561, garbage 0, dispatch<->methods consistent.
The lone genuinely nameless method is CMC_Animo id 14 (a bool getter that prepareMthHashSet
doesn't register). That is a real engine property, so it is correctly absent from the methods axis.
FUN_ ctor names are not recoverable (no symbols/mangled strings/RTTI in the binary
for FILTER/MOVIE/VECTOR/PATH/FIFO/LIFO/STATICFILTER); cpp_class=None stays.
Snapshots regenerated; 34/34 tests pass.
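
The switch-shape rules above can be sketched roughly as follows. This is an illustration over plain Python lists, not the extractor's Ghidra code: `s32` mirrors the name `_s32` from the notes, while `decode_two_level`, its parameters, and the example addresses are hypothetical.

```python
from typing import Dict, Optional, Sequence


def s32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def decode_two_level(
    byte_table: Sequence[int],
    ptr_table: Sequence[int],
    base: int,
    default_target: Optional[int] = None,
) -> Dict[int, int]:
    """Map method id -> case target for a byte-indexed switch.

    target = ptr_table[byte_table[i]]; ids that land on the default block
    are dropped, since the runner does not implement them.
    """
    cases: Dict[int, int] = {}
    for i, slot in enumerate(byte_table):
        target = ptr_table[slot]
        if target == default_target:
            continue  # default hole, not a real case
        cases[base + i] = target
    return cases


# The displacement is printed unsigned for a high-base switch; the sign-extended
# value is the negated base (LEA/ADD idx, -257 means ids start at 257).
base = -s32(0xFFFFFEFF)
cases = decode_two_level(
    byte_table=[0, 2, 1],
    ptr_table=[0x401000, 0x401020, 0x4010F0],  # made-up addresses
    base=base,
    default_target=0x4010F0,
)
```

Here `base` is 257 and the middle id (258) is dropped because its byte-table slot points at the default block.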
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Turns the dispatch axis from a binary changed/unchanged into a "how much" measure
of code change — the original goal. ams.normalize compares two body fingerprints
(the ordered leaf-call anchors) with difflib after collapsing consecutive-duplicate
anchors (a load-twice codegen artefact), yielding a 0-100 similarity and the exact
leaves that appeared/vanished.
Every dispatch `changed` entry now carries body={similarity, added, removed}, and the
block carries a summary={shared, identical, changed, mean_similarity}.
Golden pair (cross-compiler): 470 shared bodies, 131 identical, mean 66% similar;
Animo SHOW/HIDE/PAUSE/RESUME come out 100% despite MSVC6 vs MSVC8, LOAD 50% with the
swapped leaves spelled out.
- normalize.py: canonical / body_similarity / body_delta
- diff: _dispatch_diff enriches changed with body + adds summary
- render: METHOD BODIES shows %, leaf delta, summary line
- UI: similarity % + leaf delta + axis summary
- tests: 5 new -> 34/34
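
A minimal sketch of the similarity measure described above, assuming a body fingerprint is just a list of anchor strings. The function names match the normalize.py entry in the list, but the bodies here are an illustration and not the committed implementation; rounding to an integer percentage is an assumption.

```python
import difflib
from itertools import groupby
from typing import Dict, List, Sequence


def canonical(anchors: Sequence[str]) -> List[str]:
    """Collapse runs of consecutive duplicate anchors into one."""
    return [key for key, _ in groupby(anchors)]


def body_similarity(old: Sequence[str], new: Sequence[str]) -> int:
    """0-100 similarity of two ordered leaf-call fingerprints."""
    matcher = difflib.SequenceMatcher(None, canonical(old), canonical(new), autojunk=False)
    return round(100 * matcher.ratio())


def body_delta(old: Sequence[str], new: Sequence[str]) -> Dict[str, List[str]]:
    """Leaves that appeared in `new` or vanished from `old`."""
    a, b = canonical(old), canonical(new)
    added: List[str] = []
    removed: List[str] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])
    return {"added": added, "removed": removed}
```

A load-twice body such as `["getAnimo", "getAnimo", "vtbl+0xa0"]` compares as identical to `["getAnimo", "vtbl+0xa0"]`, while swapping one of two leaves gives 50.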
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Recovers how a script method id maps to its implementation, the foundation for
body-level normalisation. Each CMC_*_Runner::run is a switch(id) (vtable slot 17);
every case is the method body — inline (MSVC6) or a tail-call to a separate
show()/load() (MSVC8). The extractor parses the jump table at the disassembly
level (Ghidra's decompiler jump-table recovery silently dropped the big runners),
fingerprints each case by its ordered CALL anchors (Class::method / vtbl+0xNN),
and expands thin wrappers one level so MSVC8 lines up with MSVC6.
Validated on the golden pair: Animo SHOW..RESUME (id 1-4) yield identical leaves
(getAnimo + vtbl+0xa0/0xa4/0x4c/0x50) across both compilers. Coverage 30/32
runners; Piklib 475 / BlooMoo 619 dispatch rows.
- extract_engine_surface.py: extract_method_dispatch (schema_version -> 4)
- snapshots regenerated with the method_dispatch axis
- ams: Snapshot.method_dispatch; diff axis keyed (owner,id) on [impl,calls] with
method-name join; render METHOD BODIES section; cli --only dispatch; owner filter
- UI: "Ciała metod" ("Method bodies") diff axis + browse tab
- tests: body-change unit + cross-compiler vtbl assertion -> 29/29
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ranks catalogued engine versions by how much of their CMC_* surface they share,
which (unlike a binary fuzzy hash) stays meaningful across compilers — the golden
pair PIKLIB8/MSVC6 vs bloomoodll/MSVC8 scores 85%.
- similarity.py: jaccard, surface_similarity (per-axis + pooled overall),
fuzzy_similarity (ssdeep via ppdeep, secondary signal)
- service.similar_snapshots + GET /snapshots/{id}/similar?min=N (SimilarHit)
- UI: "Podobne wersje" ("Similar versions") panel in the snapshot browser (overlap bar + ⇄ diff)
- tests: 6 new (jaccard, identical/disjoint, golden pair 0<x<100, fuzzy,
endpoint + min filter) -> 28/28
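
For illustration, the per-axis and pooled scores could look roughly like this. The input shape (a dict of axis name to a list of hashable items) and the choice to score two empty sets as 1.0 are assumptions; the real similarity.py may differ.

```python
from typing import Dict, Hashable, Iterable, Mapping


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """|A & B| / |A | B|; two empty sets count as identical (assumption)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def surface_similarity(
    old: Mapping[str, Iterable[Hashable]],
    new: Mapping[str, Iterable[Hashable]],
) -> Dict[str, object]:
    """Jaccard per axis plus one pooled score over (axis, item) pairs."""
    axes = sorted(set(old) | set(new))
    per_axis = {axis: jaccard(old.get(axis, ()), new.get(axis, ())) for axis in axes}
    # Tag every item with its axis so equal names on different axes stay distinct.
    pooled_old = {(axis, item) for axis in axes for item in old.get(axis, ())}
    pooled_new = {(axis, item) for axis in axes for item in new.get(axis, ())}
    return {"per_axis": per_axis, "overall": jaccard(pooled_old, pooled_new)}
```

Pooling weights each axis by its size, so a large methods axis dominates the overall score; a plain mean of the per-axis values would weight axes equally instead.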
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a "+ wgraj" ("+ upload") control to the sidebar that uploads an ISO/ZIP/DLL to the
acquisition endpoint and tracks the job to completion, then refreshes the
version list so the new snapshot appears without a reload.
- index.html: upload form + #jobs panel in the sidebar
- app.js: submitUpload() (FormData → POST /jobs), pollJobs() (2.5s while any
job is queued/started; finished → load(); failed → inline error)
- style.css: mini-btn / upload form / job rows + queued/started badges
Verified: node --check clean; uvicorn serves /ui assets 200 and GET /jobs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Brings up the documented target architecture as a docker-compose stack — a
modular monolith with the Ghidra step split into its own async worker.
- worker/: RQ queue (lazy redis import) + run_acquisition task (Job status
queued→started→finished/failed, drives ams.acquire with sink=db)
- Job model + JobOut schema; Snapshot.data is JSONB on Postgres
- POST/GET /jobs: stream an upload to a shared volume, enqueue, poll status
- docker/api.Dockerfile (slim) + docker/worker.Dockerfile (JDK21 + Ghidra
fetched at build, overridable via GHIDRA_URL) + docker-compose.yml
- ghidra.py: AMS_GHIDRA_SCRIPTS override for in-container script path
- pyproject: [worker] extra (rq/redis/psycopg), python-multipart in [api]
- tests: 4 new (task success/failure + endpoint enqueue/503) -> 22/22
Verified: API image builds, container serves /health + /ui + /jobs; compose
config validates. Worker image (downloads ~1 GB Ghidra) not built here.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Closes the chain from a game file to a catalog entry: unpack an ISO/ZIP,
content-identify the engine DLL (CMC_ObjectsContainer marker in RTTI, so a
renamed file is still found), hash it (sha256 + md5 + optional ssdeep via
ppdeep), run Ghidra headless with the extractor, enrich and import the snapshot.
- unpack.py: bsdtar (ISO9660 + ZIP) with a pure-Python zipfile fallback
- identify.py: content-based engine-DLL picker + hashing
- ghidra.py: analyzeHeadless launcher discovery + post-script run
- pipeline.py: orchestration with injectable extract_fn; sink db|http|none
- cli.py: python -m ams.acquire (incl. --identify-only dry run)
- tests: 7 new (forged PE markers + stubbed extractor) -> 18/18
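
A small sketch of the content-based picker and the hashing step, assuming the marker can be found as a raw byte substring anywhere in the file. The function names, the whole-file read, and the "first match in sorted order wins" rule are hypothetical; ssdeep is left out.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

MARKER = b"CMC_ObjectsContainer"  # class name as it appears inside the RTTI string


def hash_file(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Stream the file once and return sha256 + md5 hex digests."""
    sha256, md5 = hashlib.sha256(), hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return {"sha256": sha256.hexdigest(), "md5": md5.hexdigest()}


def pick_engine_dll(root: Path) -> Optional[Path]:
    """Return the first file under `root` whose bytes contain the marker.

    The file name and extension are ignored, so a renamed DLL is still found.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and MARKER in path.read_bytes():
            return path
    return None
```

Reading each candidate fully is fine for a sketch; a real scan over a large unpacked ISO would probably want a size cap or a chunked search.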
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Static HTML/CSS/JS served by FastAPI (mounted at /ui, / redirects there),
talking to the existing JSON API — no node/npm, no bundler.
- games/versions sidebar with A/B version selectors
- visual 4-axis diff (types/methods/events/fields, +/- struct_layout) with
+/-/~ rows, per-axis counts, class (owner) filter, moved-methods section
- single-snapshot browser (tabs + live filter)
- app.py mounts StaticFiles(html=True) last so API routes win; / -> /ui/
Smoke-tested live on uvicorn: /, /ui/ and assets serve 200; UI wiring drives
the same /games and /diff endpoints verified end-to-end. app.js passes
`node --check`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Modular-monolith backend over SQLAlchemy (SQLite by default, Postgres-ready
via DATABASE_URL). The full snapshot.json is stored verbatim; diffing reads it
back through the ams.diff engine, so the DB never mirrors the snapshot schema.
- ams.api.db/models/schemas/service : Game 1-N Snapshot, sha256-deduped upsert
- routes: POST/GET /games, POST/GET /snapshots (import, deduped), GET /diff
(?old&new[&owner]) running compute_diff on stored snapshots, /health
- ams.api.importer : bulk CLI loader (python -m ams.api.importer --game ...)
- run: uvicorn ams.api.app:create_app --factory
11 tests pass (6 diff + 5 API via TestClient over the golden pair). Smoke-tested
live on uvicorn: import -> /snapshots -> /diff returns the BlooMoo deltas.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Standalone CLI that diffs two engine-surface snapshots across all four axes,
the foundation the FastAPI/DB layer will sit on.
- ams.snapshot : typed access to a snapshot.json
- ams.diff : keyed set-diff per axis (added/removed/changed) + cross-owner
method-move detection; types keyed by (script_name,
via_module_iface) so the dual MULTIARRAY stays stable;
filter_by_owner for per-class focus
- ams.render : human-readable report (+/-/~), owner-grouped
- ams.cli : python -m ams OLD NEW [--owner C] [--only ...] [--json]
6 tests pass, incl. an integration test over the committed golden pair
(asserts BlooMoo adds GRBUFFER/INTERNET, MOUSE grows 104->128, Animo gains
GETFPS, Animo script fields unchanged).
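
The keyed set-diff can be illustrated with a short sketch. Rows are plain dicts here and `keyed_diff` / `type_key` are hypothetical names; only the (script_name, via_module_iface) key and the MOUSE 104->128 change come from the notes above, and the other sizes are made up.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def keyed_diff(
    old_rows: Sequence[dict],
    new_rows: Sequence[dict],
    key: Callable[[dict], Hashable],
) -> Dict[str, List]:
    """Index both sides by `key`, then compare the key sets and shared rows."""
    old = {key(row): row for row in old_rows}
    new = {key(row): row for row in new_rows}
    return {
        "added": [new[k] for k in sorted(new.keys() - old.keys())],
        "removed": [old[k] for k in sorted(old.keys() - new.keys())],
        "changed": [
            (old[k], new[k]) for k in sorted(old.keys() & new.keys()) if old[k] != new[k]
        ],
    }


def type_key(row: dict):
    # Two MULTIARRAY rows differ only in via_module_iface, so it is part of the key.
    return (row["script_name"], row["via_module_iface"])


old_types = [
    {"script_name": "MOUSE", "via_module_iface": False, "size": 104},
    {"script_name": "MULTIARRAY", "via_module_iface": False, "size": 64},
    {"script_name": "MULTIARRAY", "via_module_iface": True, "size": 64},
]
new_types = [
    {"script_name": "MOUSE", "via_module_iface": False, "size": 128},
    {"script_name": "MULTIARRAY", "via_module_iface": False, "size": 64},
    {"script_name": "MULTIARRAY", "via_module_iface": True, "size": 64},
    {"script_name": "GRBUFFER", "via_module_iface": False, "size": 32},
]
result = keyed_diff(old_types, new_types, type_key)
```

Keying on script_name alone would collapse the two MULTIARRAY rows into one dict entry, which is why the second key component matters.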
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ghidra headless post-script (pyghidra/Jython) that extracts the scripting
"surface" of Aidem Media engine DLLs into a versionable snapshot.json, for
diffing engine versions. All four axes validated on the golden pair
(PIKLIB8.dll / MSVC6 vs bloomoodll.dll / MSVC8):
- types : CMC_ObjectsContainer::resolve factory ladder
(script name -> C++ class, ctor, object size; + dispatch_addr,
via_module_iface for the dual MULTIARRAY branch)
- methods : CMC_*_Runner::prepareMthHashSet (name -> id) + inheritance chain
- events : CMC_*::getBehavioursList (ordered per-class list)
- fields : CMC_* ctor -> CMElement::getProperty<T>Value (name + type)
(+ bonus struct_layout: this+offset stores via decompiler P-code)
Extraction rests on semantic anchors (call targets, referenced string
literals, push/immediate operands), never decompiled-C text, so the same
script works across both compilers despite ILT stubs, undefined string
literals, unnamed FUN_ ctors and an MSVC6 inline-strcpy off-by-one.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>