Add ISO/ZIP acquisition pipeline (ams.acquire worker)
Closes the chain from a game file to a catalog entry: unpack an ISO/ZIP, content-identify the engine DLL (CMC_ObjectsContainer marker in RTTI, so a renamed file is still found), hash it (sha256 + md5 + optional ssdeep via ppdeep), run Ghidra headless with the extractor, enrich and import the snapshot. - unpack.py: bsdtar (ISO9660 + ZIP) with a pure-Python zipfile fallback - identify.py: content-based engine-DLL picker + hashing - ghidra.py: analyzeHeadless launcher discovery + post-script run - pipeline.py: orchestration with injectable extract_fn; sink db|http|none - cli.py: python -m ams.acquire (incl. --identify-only dry run) - tests: 7 new (forged PE markers + stubbed extractor) -> 18/18 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
22
README.md
22
README.md
@@ -48,6 +48,28 @@ analyzeHeadless <projDir> <projName> -process PIKLIB8.dll \
|
||||
-postScript extract_engine_surface.py "$(pwd)/snapshots/PIKLIB8.snapshot.json"
|
||||
```
|
||||
|
||||
## Akwizycja — ISO/ZIP → katalog
|
||||
|
||||
Worker, który domyka łańcuch od *pliku gry* do wpisu w katalogu: rozpakowuje archiwum,
|
||||
**sam znajduje DLL silnika** (po markerach w binarce — `CMC_ObjectsContainer` w RTTI —
|
||||
więc działa nawet po zmianie nazwy pliku), liczy hashe (sha256 + md5 + opcjonalnie ssdeep),
|
||||
odpala Ghidrę headless z ekstraktorem i ląduje snapshotem w bazie.
|
||||
|
||||
```bash
|
||||
pip install -e ".[api,acquire]" # acquire = ppdeep (fuzzy hash, opcjonalny)
|
||||
export GHIDRA_HEADLESS=/path/to/ghidra/support/analyzeHeadless # albo GHIDRA_HOME
|
||||
|
||||
python -m ams.acquire game.iso --game "Reksio i UFO" # ISO/ZIP/katalog/luźny DLL
|
||||
python -m ams.acquire dump_dir --game "Reksio i UFO" --sink http --post http://127.0.0.1:8000
|
||||
python -m ams.acquire PIKLIB8.dll --identify-only # tylko unpack+identify+hash, bez Ghidry
|
||||
```
|
||||
|
||||
`--sink db` (domyślnie) importuje wprost do bazy, `--sink http` POST-uje na `/snapshots`,
|
||||
`--sink none` zostawia sam snapshot. `--identify-only` to suchy bieg do walidacji bez Ghidry.
|
||||
Rozpakowywanie stoi na `bsdtar` (libarchive — czyta i ISO9660, i ZIP); ZIP ma fallback na
|
||||
czysty Python. Snapshot dostaje doklejony blok `binary.acquisition` (źródło, nazwa DLL) oraz
|
||||
`binary.fuzzy/md5/size`.
|
||||
|
||||
## Diff engine (CLI)
|
||||
|
||||
```bash
|
||||
|
||||
Reference in New Issue
Block a user