Developer reference#
Developer reference for how uz_dataviewer is structured, the design decisions behind it, and how to extend it.
End-user documentation is in Usage guide; packaging is in Build.
Core model#
The viewer is state-driven and command-routed:
A single source of truth,
AppState(state.py). Panels are (almost) pure render functions: each frame they readAppStateand draw it.Every discrete user action goes through the command registry (
commands.py). A command mutatesAppStateand echoes its canonical call to the console.
This model yields four properties:
Scriptability — anything clickable can be typed or replayed (
.uzscript).Live transcript — the console shows what happened, as commands.
Save/restore — sessions serialise to JSON or to a replayable command script.
Testability — app logic is exercised by dispatching command strings; no window required.
Warning
A feature whose only trigger is a direct widget mutation breaks this model. Add a command and have the widget call it.
Frame flow#
immapp.run() (hello_imgui) drives a classic immediate-mode loop.
Per frame:
pre_new_frame() # app.py: poll async loads, focus requests, file dialogs
└─ docking layout renders each DockableWindow's gui_function:
├─ NavigationPanel.render(state) # left: runs + signals + normalize menu
├─ PlotsPanel.render(state) # center: subplot grid
├─ FftPanel.render(state) # center tab: FFT window
├─ HistogramPanel.render(state) # center tab: Histogram window
├─ NodesPanel.render(state) # center tab: node canvas
└─ Console.render(state) # bottom: log + command input
Immediate mode has no retained widget state, so anything that must persist across frames lives in AppState/SubplotCell.
Transient “do this once next frame” requests use a pending-flag pattern (cell.fit_pending, cell.pending_x_lim, cfg.compute_requested): the renderer consumes and clears the flag.
Module map#
Module |
Responsibility |
Key types |
|---|---|---|
|
Docking layout, |
|
|
GUI-free façade for headless/library use (load, FFT, node transforms) — see Python library |
|
|
The single app state, grid/cells, analysis configs, async load orchestration |
|
|
Command registry, parser, dispatcher, the ~60 built-in commands |
|
|
Generates the docs command reference (CSV) from the live registry — see Generated command reference (command_doc.py) |
|
|
Bottom console: scrolling selectable log + command input/completion/history |
|
|
Loaded data + per-log time normalization |
|
|
CSV (Arrow) / Parquet loading, header cleaning, delimiter sniffing; lean large-log loading (float32-at-parse, CSV size guard, streaming Parquet, CSV→Parquet |
|
|
Range-aware decimation: min/max pyramid + one-shot (pure NumPy) |
|
|
GUI-free transforms (FFT) |
|
|
GUI-free node transforms (math, FIR filter), pure NumPy |
|
|
Dataflow graph + on-demand evaluation → derived runs; the transform registry |
|
|
Load external transform-node plugins ( |
|
|
JSON save/restore, |
|
|
Browser integration (file input, array load, downloads) |
|
|
Left tree, drag sources, file dialog, normalize menu |
|
|
Subplot grid: types, cursors, spy, axis linking, secondary axis, export |
|
|
Shared base for analysis windows (sources, follow-window, compute, stale, zoom, export) |
|
|
FFT window |
|
|
Histogram window |
|
|
Node canvas ( |
|
Data model#
A ``Run`` is one loaded file: a shared
timeaxis (float64) plus one ``Signal`` per channel (yasfloat32, contiguous, ready for ImPlot and the pyramid).``DataRegistry`` owns runs and hands out stable integer ids;
SignalRef = (run_id, name)identifies a signal everywhere.Time normalization is per-log:
Run.set_time_origin(target)keeps the originaltime_rawand derivestime = time_raw - time_raw[0] + target. It is reversible (target=Nonerestores raw) and re-derives only on change, never per frame.
Channel headers like CH8=8)ia are cleaned to ia by parse_channel_name, which also detects a trailing unit token (_rpm, _us, …) for axis labels.
Loading & large logs (loader.py)#
load_file dispatches by extension through parse_file → parse_csv / parse_parquet, each returning a ParsedRun (plain arrays, no registry touch) that is turned into a Run via ParsedRun.register (DataRegistry.add_run).
On native, loads run on a ThreadPoolExecutor: the worker only parses into a ParsedRun, and the main thread registers it in state.poll_pending_loads, so DataRegistry is only ever mutated from one thread (see Native vs web (webbridge.py)).
The loader keeps the load-time memory peak near the resident dataset size, so ~100M-point logs open without OOM:
float32 at parse, no intermediate copy.
_csv_column_typesfeeds ArrowConvertOptions(column_types=…)so channels parse straight tofloat32(time staysfloat64), avoiding the float64→float32 re-cast that doubled channel memory._named_signalsshares one naming/dedup pass across every load path.Streaming CSV. For files at/above
CSV_STREAM_MIN_BYTES,_parse_csv_streamingreads via Arrow’sopen_csvbatch reader into preallocated numpy arrays (grown geometrically since a CSV’s row count isn’t known up front),include_columnsprojecting to just the named columns. Peak ≈ resident + one batch (~1.5×) instead of the bulkread_csvpeak (~4× resident, measured). Small CSVs keep the fast multithreaded bulk path.Releasing Arrow’s pool. After every parse,
parse_filecalls_release_arrow_pool(pa.default_memory_pool().release_unused()): Arrow otherwise retains freed parse scratch rather than returning it to the OS, pinning RSS at the parse high-water mark (measured ~1.2 GB reclaimed on the 9M-row log, ~10 GB on the 75M).CSV size guard.
_guard_csv_sizeestimates the resident footprint from file size × column count and refuses a CSV aboveMAX_CSV_NUMERIC_BYTES. Now that large CSVs stream, this is a resident-RAM ceiling (the full record must fit in memory), not a bulk-parse guard; it routes the user toconvert(...)or fewer channels.Streaming Parquet. For files above
PARQUET_STREAM_MIN_ROWS,_parse_parquet_streamingreads the exact row count fromParquetFile.metadata.num_rows, preallocates the output arrays, and fills them withiter_batches(peak ≈ resident + one row group, ~1.5× measured) instead of materialising the whole table. Small Parquet keeps the simple bulk path.CSV→Parquet converter.
convert_csv_to_parquetstream-converts a CSV to Parquet with bounded memory (per-batchParquetWriter.write_batch), dropping JavaScope’s empty trailing column and preserving raw headers so the result round-trips through the same name parsing. Exposed as the scriptableconvert(src, [dst])command and theuz-dataviewer convertCLI.
The data stays in RAM, so FFT / histogram / node transforms operate on the full record. Lifting the in-RAM bound entirely (out-of-core, web’s hard ~4 GB case) is deferred — see Large logs in the web build (out-of-core).
The command layer#
CommandRegistry holds Command``s (a name, typed ``Param``s, a handler, help).
The grammar is a single function call, ``name(arg, arg, ...) (parse_call).
Param kinds drive both coercion and canonical formatting:
int,float,bool,str,plot(plot_1``↔index), ``run(run_2or a file label),fwindow(custom/full/plot_Nfor analysis windows).Three entry points:
execute(state, name, args)— used by UI handlers; coerces, runs, echoes the canonical call.dispatch(state, text)— used by the console/scripts; parses then executes; errors go to the console.echo(state, name, values)— logs a command without running it. Used for continuous gestures (zoom, cursor/spy drag) that ImPlot has already applied — the echo settles once on mouse-up rather than firing every frame.
Run resolution by label lets scripts say
add_signal(plot_1, Log.csv, ia)and survive run-id reassignment after a reload.
Panels never mutate AppState for a user action directly; they call state.commands.execute(...) (often via a small self._emit wrapper that logs errors).
This invariant keeps everything scriptable.
Generated command reference (command_doc.py)#
The command table in Usage guide is generated from the live registry, not hand-maintained.
The commands are closures registered inside register_builtins and the _register_analysis_common / _register_node_commands factories — built from name f-strings (fft_remove, hist_remove are one closure parameterised by prefix), so there is no module-level symbol for sphinx.ext.autodoc to document.
command_doc.build_csv instead instantiates a CommandRegistry and walks reg.all(), taking the name, cmd.help (description), cmd.params (rendered name:kind) and the handler’s source location (handler.__code__.co_filename/co_firstlineno → commands.py:577) — that source column is exactly what makes an otherwise-unfindable closure locatable.
Grouping lives here, not in the registry.
group_formaps each command to a section (GROUP_ORDER) by name prefix (fft_*/hist_*/node_*) or an explicit table for the rest, and raises for any unassigned command. (helplists commands flat; the curated grouping is documentation-only.)Committed, not built at docs time. The output
uz_dataviewer_commands.csvsits besideuz_dataviewer_usage.rstand is included via.. csv-table:: :file:, so the Sphinx build stays hermetic. Regenerate withpython -m uz_dataviewer.command_doc.Drift-guarded.
tests/test_command_doc.pyre-renders in memory and fails if the committed CSV is stale or a new command has no group — so editing a command (or even shifting its line) without regenerating breaks CI.
Plots panel internals (panels/plots.py)#
A runtime grid of SubplotCell``s.
Cell types: ``line / scatter / stairs (time series) and xy (one signal vs another).
FFT and Histogram are separate windows, not cell types.
Per time-series cell, each frame:
Compute the data extent and apply any pending axis limits (
fit_pending,pending_x_lim,pending_y_lim) withCond_.Always.Linked X: the hovered cell becomes the
_driverand publishes its X range tostate.shared_x; followers lock to it. Every time-series cell also publishes its own range tostate.plot_x_ranges[plot_n](used by analysis windows’ “follow plot_N”).Downsample each signal to ~``max_points`` over the visible window (
decimate_range, fed byvisible_slice) — this keeps multi-GB logs interactive. XY cells are the exception: they decimate by plain uniform stride (_xy_stride), not a min/max envelope, so a busy Lissajous figure can alias (acceptable for the phase-portrait case).Draw it (line/scatter/stairs), optionally with per-sample markers (
Spec.marker), routing the left/right (secondaryImAxis_.y2) axis per signal.Cursors (two
drag_line_x) and spy (drag_rect+ acanvas_onlyinset) draw on top. Their readouts and rectangles are recomputed only on move.Zoom echo: when a pan/zoom settles,
echo set_x_lim(plot_n, ...).
Note
Cursor performance. Cursor readouts originally used np.interp, which up-casts the entire float32 signal to float64 on every call (~150 ms/frame on a 5 M-point log).
It was replaced with an O(log n) searchsorted lookup (_value_at) that touches only the two bracketing samples.
Analysis windows (panels/analysis.py)#
FFT and Histogram share AnalysisPanel.
Subclasses set a command prefix (fft/hist) and a compute_command, then implement _compute, _draw, _options_key, and any _extra_controls.
The base provides:
A sources list fed by drag-drop (
*_sourcecommand) and the legend “Remove” (*_remove).A time-window selector (
follow_combo):Custom(0),Full(-1), orplot_N(≥1), resolved byAppState.resolve_x_window. Switching to a plot computes immediately; switching to Custom/Full does not (the window can be the whole log).On-demand compute only — never per frame. A compute happens on the Compute button, on a drag-in, or once when an input changes (
compute_requested). Following a plot and then panning it goes stale rather than recomputing.A stale indicator: a
_computed_keysnapshot (sources + window + options) is compared to the current key each frame; mismatch shows “⚠ stale - press Compute”.Every control routed through commands (
fft_follow,fft_remove_dc,hist_bins,*_xlimon zoom,*_export, …) and Export (OS dialog on desktop, browser download on web).
Node graph (nodes.py, transforms.py, panels/nodes.py)#
A small dataflow engine, built as a derived-signal factory: a transform node, when evaluated, materializes its result as a new run in the registry (Run.derived=True) via AppState.upsert_derived_run.
That derived signal then flows through the existing app unchanged — it shows in Navigation and is draggable into plots / FFT / Histogram.
Nodes produce data; nothing else in the app needs to know they exist.
Graph (
NodeGraph) ofNode``s: a **source** wraps a ``(run, signal)ref; a transform (fft/math/filter/shift) pulls arrays from its inputs and computes. Links are validated against cycles; ids are persisted so derived-run labels (= node name) are stable across save/restore.On-demand evaluation (
evaluate): topological order; each transform reads its inputs (a source from the registry, an upstream transform from its cache), computes viatransforms/analysis, bumps aversion, and upserts its derived run in place (so plot references survive a re-eval). Per-node errors are logged and skipped, not fatal. A node is stale (is_stale) when its key — params + each input’s version/identity — differs from the last evaluated key (the same idea as the analysis windows’_computed_key).GUI-free + scriptable:
nodes.py/transforms.pyimport no GUI and raise plainValueError``s, so the whole engine is driven and tested from command strings. The canvas (``panels/nodes.py,imgui_node_editor) is a thin layer that only issues ``node_*`` commands — drag-drop →node_source, link →node_link, a widget →node_set, a drag →node_pos, the button →node_eval. So the graph round-trips through the console,.uzscript, and the JSON session.Transforms are pure NumPy (windowed-sinc FIR filter, not SciPy) so native and web share one code path. FFT reuses
analysis.compute_fft; its derived run’s x-axis is frequency.Registry-driven. Every transform (builtin or plugin) is a
TransformSpecinREGISTRY; the engine looks up available kinds, default params, input arity and the compute function there, so a plugin is indistinguishable from a builtin. External plugins (plugins.py, the@transformdecorator) are loaded at startup from$UZ_DATAVIEWER_PLUGINS/~/.uz_dataviewer/nodes/— both optional; a missing dir, no files, or a broken plugin are all tolerated (the app runs with just the builtins). An unknown kind (plugin not installed) restores as a placeholder that keeps its params/links. See Node plugins.
Note
Intentional v1 limits: binary math needs equal-length inputs (no resampling); a source node pointing at another node’s derived output is a runtime convenience that may not fully survive restore (chaining is meant to go through node_link).
Downsampling (downsample.py)#
Pure NumPy — identical on native and web (the Rust tsdownsample dependency was removed in favour of the pyramid).
Every path produces a min/max envelope at the signals’ true sample positions (so it looks like the waveform, not a blocky bucket-centre grid) and a consistent ~``max_points`` output at every zoom.
Two layers:
One-shot envelope (``_envelope_oneshot``) — splits the visible slice into ~``max_points/2`` near-equal buckets, takes each bucket’s argmin/argmax, and interleaves them in order. Vectorised, O(visible points) — used for windows up to
PYRAMID_MIN_POINTS(1 M), a couple of ms at that size.Multi-resolution pyramid (``Pyramid``) — for larger windows. Built once per signal on first display (lazily, cached on
Signal.pyramid): geometric levels storing the per-bucket argmin/argmax sample indices (64, 512, 4096, …), each aggregated from the previous, so the build is O(n). A query picks the coarsest level that still has ≥ the budget of buckets, then groups those down to the budget keeping each group’s true extreme — O(output).
Note
Hitting the budget exactly. Both layers group their candidate buckets down to exactly max_points/2 via _reduce_to — it assigns the k candidates to max_points/2 near-equal groups (grp = arange(k)*target//k) and keeps each group’s true extreme with one lexsort.
This replaced an integer group factor (ceil(k/target)), which could only halve/third/… the count and so realised as few as ~50 % of the budget at certain zooms (e.g. a 5 M-sample window landed on ~1 220 of 2 000 points).
The output is now a stable ~``max_points`` at every zoom and on every path.
_reduce_to only ever runs on an already-bucketed array (≲ target × factor), so the query stays O(output); the pyramid build is untouched (still a fixed-stride O(n) pass).
Storing indices (not values) lets the envelope use true X/Y read from the run’s time/y arrays, so it is also normalization-agnostic.
Measured (single float32 signal, max_points=10 000, one core; reproduce with a short Pyramid.build / decimate_range timing loop):
Signal |
Build (once) |
Query/frame (full) |
Query/frame (zoomed) |
Pyramid memory |
|---|---|---|---|---|
10 M pts (40 MB) |
~15 ms |
~0.9 ms |
~0.15 ms |
1.4 MB (3.6 %) |
50 M pts (200 MB) |
~75 ms |
~0.7 ms |
~0.2 ms |
7.1 MB (3.6 %) |
Re-scanning the full 50 M extent every frame (the one-shot envelope, the work the pyramid avoids) costs ~23 ms/frame — the pyramid is ~30× cheaper per frame at the price of a one-time build and ~3.6 % memory. The output stays ~``max_points`` whether viewing the whole record or a slice. (Absolute times are hardware-dependent; the structural point is the O(output) vs O(visible) gap.)
The FFT/Histogram windows use the same decimation: a multi-million-point spectrum is range-decimated per frame (AnalysisPanel._plot_decimated) rather than drawn in full, and a histogram is binned once at compute (not re-binned every frame).
Two accepted trade-offs:
The pyramid is built lazily on first display, on the render thread, so the first frame after dropping in a large signal stalls for the build (~75 ms at 50 M). It is a one-off per signal and dwarfed by the file load; building it on the loader thread alongside parsing is noted as future work.
The renderer re-decimates every frame with no result cache. Because a query is O(output) this is cheap even on a static view, and it keeps the code stateless (true to immediate mode). Caching decimated arrays keyed on
(visible limits, max_points)would trim idle CPU at the cost of a cache to invalidate; simplicity was chosen.
Sessions (session.py)#
Two complementary formats:
JSON snapshot (
save_state/load_state) — the exact view (grid, assignments, plot types, cursors/spy, secondary axes, FFT/Histogram config, time origins, file list). Restore reloads files then resolves assignments by run label (not id).``.uzscript`` (
export_script/run_script) — a human-readable, editable list of commands that reproduces the session when replayed. Normalised loads round-trip asload(path, start).
CSV exports (export_data, export_fft, export_histogram) also live here.
Native vs web (webbridge.py)#
The same Python runs on the desktop and in the browser via Pyodide (CPython→WASM); a single flag, webbridge.IS_WEB, gates the handful of edges that differ.
The UI, command layer, plots, FFT/Histogram, nodes and downsampler are identical on both — the differences are file I/O, threading, the WASM memory ceiling, and plugin loading.
Design-relevant points: loads are async on native and synchronous on web (Pyodide has no worker threads); the renderer never branches on the target because the decimator is pure NumPy (no native dependency to wheel for the browser); and the web build cannot hold a multi-GB log, so a large CSV is stream-parsed into typed arrays and decimated on a memory budget at load rather than materialised whole.
The user-facing comparison is in Limitations of web version; the wasm32 ~4 GB ceiling and the deferred out-of-core fix are analysed in Large logs in the web build (out-of-core).
Web edges (webbridge.py / build/gen_web.py):
File I/O. Native uses
portable_file_dialogsplus a GLFW drop callback; web uses a hidden HTML<input type=file>to open,webbridge.download(a Blob + synthetic anchor click) to save, and a second hidden picker (uzSessionInput) for session uploads, routed by extension inwebbridge.load_uploaded_session(.json→load_state,.uzscript/.txt→run_script).Loading.
wireFileInput(build/gen_web.py) dispatches by type: a CSV ≤FULL_LOAD_LIMIT(200 MB) and any Parquet takeloadFull→ written to the Pyodide FS and parsed by Arrow, same code as native; a larger CSV takesloadStreamingCsv.Streaming CSV decimation (CSV only).
loadStreamingCsvreads theFileblob in chunks (the multi-GB text never sits in WASM memory) and pushes per-column typed arrays intowebbridge.load_columns. It estimates the numeric footprint (estRows × (8 + 4·nDataCols)) from a header sample and, if that exceedsMEM_BUDGET(~1.5 GB, the working budget below the ~4 GB heap ceiling), keeps everystride-th row so the result fits — emitting the “decimated 1:N … use the native app” console note. This safety net is CSV-only: a large Parquet has no equivalent and goes throughloadFullunguarded, so it loads at full resolution if it fits the heap or, if it doesn’t, the oversize allocation silently aborts the whole WASM runtime (“loading does nothing”). The size estimate exists precisely to avoid that abort for the CSV path.Node plugins are native-only.
plugins.load_plugins(called once at startup inapp.py) scans filesystem/env plugin dirs ($UZ_DATAVIEWER_PLUGINS,~/.uz_dataviewer/nodes/) that a browser tab cannot populate, so on web it finds nothing and the node graph runs the builtins innodes.REGISTRYonly. Theload_pluginscommand still exists but has no dirs to read. See Node plugins.Version pins.
PYODIDE_VERSIONandIMGUI_BUNDLE_WHEEL(build/gen_web.py) must be a compatible pair — theimgui_bundlewheel is built against one specific Pyodide release, so bump them together (see the imgui_bundle Pyodide docs). A true single-file, no-server build would require a C++/emscriptenSINGLE_FILErebuild and would drop the Python data stack (Web / WASM (Pyodide)).
Extending the app#
Add a command — register it in register_builtins (commands.py):
def my_action(state, a):
plot_n, value = a
state.cells[plot_n - 1].something = value
reg.add("my_action", [Param("plot", "plot"), Param("value", "float")], my_action, "help text")
Then have the widget call state.commands.execute(state, "my_action", [plot_n, value]) (via the panel’s _emit) on change.
Persist it in session.py if it should be saved.
Add a time-series option — add a field to SubplotCell, a command to toggle it, a checkbox in PlotsPanel._cell_header that emits the command, the rendering in _render_time_series, and the session round-trip.
Add an analysis window — subclass AnalysisPanel, set prefix/compute_command, implement _compute/_draw/_options_key/_extra_controls; register the per-type commands (the common ones come from _register_analysis_common); add a DockableWindow in app.py and a config dataclass + serialisation.
Add a plot type — extend PlotType, branch in PlotsPanel._render_cell, and update session.py if it carries extra per-cell state.
Add a node transform — register a TransformSpec in nodes.py (a compute function + default params + arity), or, for a user-supplied one, ship a plugin file with the @transform decorator (see Node plugins).
Builtins keep bespoke widgets in NodesPanel._params; plugins render from their ParamSpec list automatically.
Commands and save/restore are generic (node_set carries arbitrary params; the graph stores only the kind name), so nothing else needs touching.
Testing#
Tests are pure pytest, no real window.
Two patterns:
Logic via commands — dispatch command strings against an
AppStateand assert on state (e.g.test_commands.py,test_session.py).Headless rendering — create an ImGui/ImPlot context, set a backend-textures flag, and drive a few frames calling
panel.render(state)(e.g.test_plot_types.py). This exercises the real ImPlot calls (begin/end pairing, Spec markers, drag tools) without a GPU window. Destroy the contexts in afinallyso tests don’t leak across modules.
Run pytest from the project root.
Design decisions#
Decision |
Why |
|---|---|
Single |
One source of truth; trivial to serialise and test; no scattered widget state. |
Command layer is the only mutation path |
Scriptability, console transcript, save/restore-by-replay, and headless testing all derive from it. |
Gestures echo, discrete actions execute |
Zoom/drag are continuous; echoing once on settle keeps the console readable while staying replayable. |
Min/max pyramid (pure NumPy) for decimation |
Per-frame cost becomes O(output), not O(visible points) (~0.7 ms vs ~23 ms/frame at
50 M points), so multi-GB logs pan at full fps — built once at first view (~75 ms/50 M),
~3.6 % memory. Replaced the Rust |
XY cells decimate by uniform stride, not min/max |
An XY/phase plot has no monotone time axis to bucket against; stride keeps it simple but can alias a dense Lissajous figure (the one place range-aware downsampling does not apply). |
FFT/Histogram are windows, on-demand |
Per-frame transforms of a large window would tank the frame rate; explicit Compute keeps it snappy. |
Nodes are a derived-signal factory (not a new viewer) |
Transforms emit ordinary runs, so plots/FFT/Histogram consume them unchanged. The
engine is GUI-free and command-driven (scriptable and headless-testable); the
|
|
|
Session refs by run label |
Run ids are reassigned on reload; labels survive. |
Per-log time normalization keeps |
Reversible, cheap (recompute on change only), and signals are never touched. |
Web: stream-parse CSV → arrays |
Avoids holding multi-GB text in 32-bit WASM memory; loads full resolution. |
Pending-flag pattern ( |
The immediate-mode way to express “do X once next frame”. |
Repository layout#
uz_dataviewer/
├── run.py # run from a source checkout (also replays *.uzscript)
├── src/uz_dataviewer/ # modules — see the module map above
│ └── panels/ # navigation, plots, analysis base, fft, histogram, nodes
├── tests/ # pytest (logic via commands + headless rendering)
├── docs/ # USAGE / ARCHITECTURE / BUILD / NATIVE_VS_WEB / PLUGINS / LIBRARY / ROADMAP
└── build/ # native (PyInstaller) + web (Pyodide) build flow