worker: optional PyGhidra back-end for Ghidra 11.4+/12.x (no Jython)
The .py extractor runs fine under PyGhidra in the GUI; only `analyzeHeadless` doesn't init PyGhidra. Add an env-gated CPython path so modern Ghidra works headless: - ghidra.run_extractor_pyghidra(): runs the same GhidraScript via pyghidra.run_script (boots Ghidra in-process, imports+analyses, getScriptArgs()=[out_path]); run_extractor dispatches to it when AMS_USE_PYGHIDRA is set. No script changes needed. - worker image installs pyghidra + sets GHIDRA_INSTALL_DIR; compose exposes AMS_USE_PYGHIDRA (default off). Jython path stays the default and untouched. - README documents both variants (Jython <=11.3.x vs PyGhidra 11.4+/12.x). - test: AMS_USE_PYGHIDRA routes to the PyGhidra back-end (clear error if pkg missing). 35/35 tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
22
README.md
22
README.md
@@ -65,20 +65,28 @@ Worker (`docker/worker.Dockerfile`, `eclipse-temurin:21-jdk`) pobiera Ghidrę i
|
|||||||
nadpisz URL realnym wydaniem z [releases NSA](https://github.com/NationalSecurityAgency/ghidra/releases)
|
nadpisz URL realnym wydaniem z [releases NSA](https://github.com/NationalSecurityAgency/ghidra/releases)
|
||||||
(nazwa pliku: `ghidra_<wer>_PUBLIC_<data>.zip`).
|
(nazwa pliku: `ghidra_<wer>_PUBLIC_<data>.zip`).
|
||||||
|
|
||||||
> **Musi to być Ghidra ≤ 11.3.x.** Ekstraktor to skrypt **Pythona (`.py`)**, który Ghidra w trybie
|
> **Domyślnie wymaga Ghidry ≤ 11.3.x.** Ekstraktor to skrypt **Pythona (`.py`)**, który Ghidra
|
||||||
> headless uruchamia przez wbudowanego **Jythona**. Ghidra **11.4+ / 12.x usunęły Jythona** — tam
|
> w headless uruchamia przez wbudowanego **Jythona**. Ghidra **11.4+ / 12.x usunęły Jythona** — tam
|
||||||
> `.py` headless wymaga **PyGhidry** (CPython), której ten obraz nie inicjalizuje, i dostaniesz
|
> `.py` headless przez `analyzeHeadless` nie ruszy (`Ghidra was not started with PyGhidra...`):
|
||||||
> `Ghidra was not started with PyGhidra. Python is not available` (analiza przejdzie, ale post-skrypt
|
> analiza przejdzie, ale post-skrypt nie wyemituje snapshotu. Domyślny `GHIDRA_URL` celuje w 11.2.1.
|
||||||
> nie wyemituje snapshotu). Domyślny `GHIDRA_URL` celuje w 11.2.1 (z Jythonem). Chcesz zostać na 12.x?
|
|
||||||
> Trzeba doinstalować `pyghidra` i odpalać headless przez PyGhidrę — sam skrypt jest CPython-kompatybilny,
|
|
||||||
> więc zadziała, gdy interpreter wstanie (patrz dokumentacja PyGhidra w danej wersji Ghidry).
|
|
||||||
|
|
||||||
|
**Wariant Jython (domyślny, ≤ 11.3.x):**
|
||||||
```bash
|
```bash
|
||||||
docker compose build worker \
|
docker compose build worker \
|
||||||
--build-arg GHIDRA_URL=https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2.1_build/ghidra_11.2.1_PUBLIC_20241105.zip
|
--build-arg GHIDRA_URL=https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2.1_build/ghidra_11.2.1_PUBLIC_20241105.zip
|
||||||
docker compose up
|
docker compose up
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Wariant PyGhidra (Ghidra 11.4+ / 12.x):** obraz workera ma już `pyghidra`; ten sam skrypt leci
|
||||||
|
przez CPython (`pyghidra.run_script`, bez zmian w kodzie). Zbuduj z nowszą Ghidrą i włącz przełącznik:
|
||||||
|
```bash
|
||||||
|
docker compose build worker \
|
||||||
|
--build-arg GHIDRA_URL=https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.4.2_build/ghidra_11.4.2_PUBLIC_20250826.zip
|
||||||
|
AMS_USE_PYGHIDRA=1 docker compose up
|
||||||
|
```
|
||||||
|
Pod spodem worker woła `ams.acquire.ghidra.run_extractor_pyghidra` (uruchamia Ghidrę in-process,
|
||||||
|
importuje + analizuje binarkę, odpala nasz GhidraScript z `getScriptArgs()=[out_path]`).
|
||||||
|
|
||||||
### 4. Ekstrakcja ręcznie w GUI Ghidry (alternatywa, bez Dockera)
|
### 4. Ekstrakcja ręcznie w GUI Ghidry (alternatywa, bez Dockera)
|
||||||
|
|
||||||
*Script Manager → Manage Script Directories* → wskaż `ghidra_scripts/`, otwórz program (DLL),
|
*Script Manager → Manage Script Directories* → wskaż `ghidra_scripts/`, otwórz program (DLL),
|
||||||
|
|||||||
@@ -1,13 +1,20 @@
|
|||||||
"""Drive Ghidra's `analyzeHeadless` to run the engine-surface extractor on a DLL.
|
"""Drive Ghidra to run the engine-surface extractor on a DLL.
|
||||||
|
|
||||||
This is the heavy worker step: it imports the binary into a throwaway Ghidra
|
This is the heavy worker step: it imports the binary into a throwaway Ghidra
|
||||||
project, auto-analyses it, then runs `ghidra_scripts/extract_engine_surface.py`
|
project, auto-analyses it, then runs `ghidra_scripts/extract_engine_surface.py`
|
||||||
as a post-script that writes the snapshot JSON to a path we pick.
|
to write the snapshot JSON to a path we pick.
|
||||||
|
|
||||||
Ghidra isn't a Python package, so it must be located on disk. Resolution order:
|
Two back-ends, picked by the `AMS_USE_PYGHIDRA` env var:
|
||||||
1. $GHIDRA_HEADLESS — full path to the analyzeHeadless launcher
|
|
||||||
2. $GHIDRA_HOME/support/analyzeHeadless
|
* default — `analyzeHeadless` runs the script as a post-script via Ghidra's bundled
|
||||||
3. `analyzeHeadless` on PATH
|
**Jython**. Works on Ghidra <= 11.3.x; on 11.4+/12.x Jython is gone and the script
|
||||||
|
silently doesn't run ("Ghidra was not started with PyGhidra").
|
||||||
|
* `AMS_USE_PYGHIDRA=1` — run the same script through **PyGhidra** (CPython) via
|
||||||
|
`pyghidra.run_script`, so modern Ghidra (11.4+/12.x) works. Needs `pip install pyghidra`
|
||||||
|
and Ghidra's dir in `$GHIDRA_INSTALL_DIR` (falls back to `$GHIDRA_HOME`).
|
||||||
|
|
||||||
|
analyzeHeadless resolution order: $GHIDRA_HEADLESS, $GHIDRA_HOME/support/analyzeHeadless,
|
||||||
|
then `analyzeHeadless` on PATH.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
@@ -58,6 +65,9 @@ def run_extractor(
|
|||||||
|
|
||||||
Raises GhidraNotFound if no launcher is configured, GhidraRunError on failure or if
|
Raises GhidraNotFound if no launcher is configured, GhidraRunError on failure or if
|
||||||
the script produced no output."""
|
the script produced no output."""
|
||||||
|
if os.environ.get("AMS_USE_PYGHIDRA"):
|
||||||
|
return run_extractor_pyghidra(dll_path, out_path, script_dir=script_dir)
|
||||||
|
|
||||||
headless = headless or find_headless()
|
headless = headless or find_headless()
|
||||||
if not headless:
|
if not headless:
|
||||||
raise GhidraNotFound(
|
raise GhidraNotFound(
|
||||||
@@ -91,3 +101,38 @@ def run_extractor(
|
|||||||
raise GhidraRunError(
|
raise GhidraRunError(
|
||||||
"extractor produced no snapshot at {0}\n--- headless tail ---\n{1}".format(out_path, tail))
|
"extractor produced no snapshot at {0}\n--- headless tail ---\n{1}".format(out_path, tail))
|
||||||
return out_path
|
return out_path
|
||||||
|
|
||||||
|
|
||||||
|
def run_extractor_pyghidra(dll_path: str, out_path: str, *, script_dir: str | None = None) -> str:
|
||||||
|
"""Run the extractor through PyGhidra (CPython) instead of analyzeHeadless/Jython.
|
||||||
|
|
||||||
|
`pyghidra.run_script` boots Ghidra in-process, imports + auto-analyses the binary, and
|
||||||
|
executes our GhidraScript with `getScriptArgs() == [out_path]` - the same script, just under
|
||||||
|
CPython, so it works on Ghidra 11.4+/12.x where Jython is gone."""
|
||||||
|
os.environ.setdefault("GHIDRA_INSTALL_DIR", os.environ.get("GHIDRA_HOME", ""))
|
||||||
|
try:
|
||||||
|
import pyghidra
|
||||||
|
except ImportError:
|
||||||
|
raise GhidraNotFound(
|
||||||
|
"AMS_USE_PYGHIDRA is set but the 'pyghidra' package isn't installed (pip install pyghidra)")
|
||||||
|
|
||||||
|
script_dir = script_dir or os.environ.get("AMS_GHIDRA_SCRIPTS") or str(_SCRIPT_DIR)
|
||||||
|
script_path = os.path.join(script_dir, _SCRIPT_NAME)
|
||||||
|
out_path = os.path.abspath(out_path)
|
||||||
|
os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
|
||||||
|
|
||||||
|
proj_dir = tempfile.mkdtemp(prefix="ams_pyghidra_")
|
||||||
|
try:
|
||||||
|
pyghidra.run_script(
|
||||||
|
dll_path, script_path,
|
||||||
|
project_location=proj_dir, project_name="ams_" + uuid.uuid4().hex[:8],
|
||||||
|
script_args=[out_path], analyze=True, verbose=False,
|
||||||
|
)
|
||||||
|
except Exception as e: # jpype/Ghidra errors aren't a tidy hierarchy
|
||||||
|
raise GhidraRunError("pyghidra.run_script failed: {0}".format(e))
|
||||||
|
finally:
|
||||||
|
shutil.rmtree(proj_dir, ignore_errors=True)
|
||||||
|
|
||||||
|
if not os.path.isfile(out_path):
|
||||||
|
raise GhidraRunError("extractor produced no snapshot at {0} (PyGhidra path)".format(out_path))
|
||||||
|
return out_path
|
||||||
|
|||||||
@@ -59,6 +59,9 @@ services:
|
|||||||
DATABASE_URL: postgresql+psycopg://ams:ams@db:5432/ams
|
DATABASE_URL: postgresql+psycopg://ams:ams@db:5432/ams
|
||||||
REDIS_URL: redis://redis:6379/0
|
REDIS_URL: redis://redis:6379/0
|
||||||
AMS_UPLOAD_DIR: /data/uploads
|
AMS_UPLOAD_DIR: /data/uploads
|
||||||
|
# Set to 1 (e.g. `AMS_USE_PYGHIDRA=1 docker compose up`) to run the extractor through
|
||||||
|
# PyGhidra instead of Jython - needed for Ghidra 11.4+/12.x (build worker with that GHIDRA_URL).
|
||||||
|
AMS_USE_PYGHIDRA: ${AMS_USE_PYGHIDRA:-}
|
||||||
volumes:
|
volumes:
|
||||||
- uploads:/data/uploads
|
- uploads:/data/uploads
|
||||||
depends_on:
|
depends_on:
|
||||||
|
|||||||
@@ -29,6 +29,7 @@ RUN wget -q "$GHIDRA_URL" -O /tmp/ghidra.zip \
|
|||||||
RUN pip3 install --no-cache-dir --upgrade pip setuptools wheel
|
RUN pip3 install --no-cache-dir --upgrade pip setuptools wheel
|
||||||
|
|
||||||
ENV GHIDRA_HOME=/opt/ghidra
|
ENV GHIDRA_HOME=/opt/ghidra
|
||||||
|
ENV GHIDRA_INSTALL_DIR=/opt/ghidra
|
||||||
ENV AMS_GHIDRA_SCRIPTS=/app/ghidra_scripts
|
ENV AMS_GHIDRA_SCRIPTS=/app/ghidra_scripts
|
||||||
ENV AMS_UPLOAD_DIR=/data/uploads
|
ENV AMS_UPLOAD_DIR=/data/uploads
|
||||||
|
|
||||||
@@ -38,7 +39,9 @@ COPY ams ./ams
|
|||||||
COPY ghidra_scripts ./ghidra_scripts
|
COPY ghidra_scripts ./ghidra_scripts
|
||||||
COPY snapshots ./snapshots
|
COPY snapshots ./snapshots
|
||||||
|
|
||||||
RUN pip3 install --no-cache-dir ".[api,acquire,worker]"
|
# pyghidra enables the CPython back-end (set AMS_USE_PYGHIDRA=1) for Ghidra 11.4+/12.x, which
|
||||||
|
# dropped Jython. Harmless when unused; the default Jython path doesn't import it.
|
||||||
|
RUN pip3 install --no-cache-dir ".[api,acquire,worker]" pyghidra
|
||||||
|
|
||||||
# Drain the 'acquire' queue. Shell form so $REDIS_URL expands at runtime.
|
# Drain the 'acquire' queue. Shell form so $REDIS_URL expands at runtime.
|
||||||
CMD rq worker --url "${REDIS_URL:-redis://redis:6379/0}" acquire
|
CMD rq worker --url "${REDIS_URL:-redis://redis:6379/0}" acquire
|
||||||
|
|||||||
@@ -136,6 +136,18 @@ def test_acquire_zip_no_sink(tmp_path, golden_snapshot):
|
|||||||
assert r.imported_id is None and r.sink == "none"
|
assert r.imported_id is None and r.sink == "none"
|
||||||
|
|
||||||
|
|
||||||
|
def test_pyghidra_dispatch_without_dep(tmp_path, monkeypatch):
|
||||||
|
"""AMS_USE_PYGHIDRA routes to the PyGhidra back-end; without the package it fails clearly."""
|
||||||
|
import importlib.util
|
||||||
|
|
||||||
|
from ams.acquire import ghidra
|
||||||
|
if importlib.util.find_spec("pyghidra") is not None:
|
||||||
|
pytest.skip("pyghidra is installed; this exercises the missing-dependency path")
|
||||||
|
monkeypatch.setenv("AMS_USE_PYGHIDRA", "1")
|
||||||
|
with pytest.raises(ghidra.GhidraNotFound, match="pyghidra"):
|
||||||
|
ghidra.run_extractor(str(tmp_path / "x.dll"), str(tmp_path / "out.json"))
|
||||||
|
|
||||||
|
|
||||||
def test_acquire_loose_dll_into_db(tmp_path, golden_snapshot):
|
def test_acquire_loose_dll_into_db(tmp_path, golden_snapshot):
|
||||||
from ams.api.db import configure
|
from ams.api.db import configure
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user