core
Core orchestration for datakit.
Provides Dataset — the user-facing entry point that wraps a
discovered file inventory and materializes it via the source registry.
Failure policy:
- materialize(strict=True) (default) raises on the first error with (subject, session, task, source, path) context.
- materialize(strict=False) continues past errors and (with return_errors=True) returns a long-format error DataFrame.
- validate() runs every cell, never raises, and returns the same long-format DataFrame.
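The failure policy above can be illustrated with a small stand-alone sketch. It is not datakit's implementation: run_cells, CellLoadError, toy_loader, and the error-row keys are hypothetical names chosen for illustration, and plain Python containers stand in for the DataFrame.

```python
from typing import Any, Callable

# Hypothetical cell layout: (subject, session, task, source, path).
Cell = tuple[str, str, str, str, str]


class CellLoadError(RuntimeError):
    """Hypothetical error type that carries the failing cell's context."""


def run_cells(
    cells: list[Cell],
    loader: Callable[[str], Any],
    *,
    strict: bool = True,
) -> tuple[dict[Cell, Any], list[dict[str, str]]]:
    """Load every cell; raise early (strict) or collect long-format error rows."""
    values: dict[Cell, Any] = {}
    errors: list[dict[str, str]] = []
    for cell in cells:
        subject, session, task, source, path = cell
        try:
            values[cell] = loader(path)
        except Exception as exc:
            if strict:
                # Strict mode: stop at the first failure and name the cell.
                raise CellLoadError(f"failed to load {cell!r}: {exc}") from exc
            # Lenient mode: blank the cell and record one error row for it.
            values[cell] = None
            errors.append({
                "subject": subject, "session": session, "task": task,
                "source": source, "path": path,
                "error": f"{type(exc).__name__}: {exc}",
            })
    return values, errors


def toy_loader(path: str) -> str:
    """Stand-in loader that fails for any path containing 'bad'."""
    if "bad" in path:
        raise OSError(f"cannot read {path}")
    return f"loaded:{path}"


cells = [
    ("STREHAB07", "ses-01", "task-widefield", "treadmill", "a.csv"),
    ("STREHAB07", "ses-01", "task-widefield", "psychopy", "bad.csv"),
]
values, errors = run_cells(cells, toy_loader, strict=False)
```

With strict=False the second cell is blanked and contributes one error row; calling run_cells(cells, toy_loader) with the default strict=True raises on that same cell instead.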
- class mesofield.datakit.core.Dataset
Bases: object
Discovery + materialization for a BIDS-style experiment hierarchy.
- classmethod from_directory(root, *, sources=None, prefer_processed=True, include_task_level=True)
Discover one or more experiment roots and build a Dataset.
Pass a single path or a sequence of paths; multiple roots are concatenated row-wise.
- property meta: dict
Provenance metadata for the running datakit package.
The same dictionary is attached to the materialized DataFrame as df.attrs["datakit"] so it persists through pickle round-trips.
- include(*, subject=None, session=None, task=None, source=None)
Keep only rows/sources matching the given filters (AND-combined).
Each keyword accepts either a single string or a sequence of strings; None (the default) means “no constraint on this axis”. All provided filters are combined with logical AND, so adding a keyword narrows the result. Returns a new Dataset; the original is unchanged, so calls chain naturally.
Examples
>>> ds.include(subject="STREHAB07")  # one subject
>>> ds.include(subject=["STREHAB07", "STREHAB08"])  # multiple subjects
>>> ds.include(session="ses-05", task="task-widefield")
>>> ds.include(source=["dataqueue", "treadmill"])  # drop other sources
>>> ds.include(subject="STREHAB07").include(task="task-movies")  # chain
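The AND-combination described for include() can be modeled on a plain list of row dictionaries. This is a sketch of the semantics only, assuming a toy inventory; include_rows and _as_set are hypothetical helpers, not part of datakit, and the source axis is left out for brevity.

```python
from typing import Iterable, Optional, Union

Filter = Optional[Union[str, Iterable[str]]]


def _as_set(value: Filter) -> Optional[set[str]]:
    """Normalize a filter: None stays None, a string becomes a one-item set."""
    if value is None:
        return None
    if isinstance(value, str):
        return {value}
    return set(value)


def include_rows(rows, *, subject: Filter = None, session: Filter = None, task: Filter = None):
    """Return a new list with the rows that satisfy every provided filter."""
    wanted = {
        "subject": _as_set(subject),
        "session": _as_set(session),
        "task": _as_set(task),
    }
    return [
        row for row in rows
        # A None filter places no constraint on its axis.
        if all(allowed is None or row[axis] in allowed for axis, allowed in wanted.items())
    ]


inventory = [
    {"subject": "STREHAB07", "session": "ses-05", "task": "task-widefield"},
    {"subject": "STREHAB07", "session": "ses-05", "task": "task-movies"},
    {"subject": "STREHAB08", "session": "ses-05", "task": "task-widefield"},
]

one_subject = include_rows(inventory, subject="STREHAB07")
narrowed = include_rows(one_subject, task="task-movies")  # chaining narrows further
```

Because each call builds a new list, the original inventory keeps all three rows, which mirrors the "original is unchanged" guarantee that makes chaining safe.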
- exclude(*, subject=None, session=None, task=None, source=None)
Drop rows/sources matching the given filters.
Like include(), every keyword accepts a string or a sequence of strings, and combining keywords narrows what gets removed. Behavior depends on which axes are provided:
- source only: drop those source columns globally.
- row axes only (subject/session/task): drop matching rows.
- both: NaN out only the listed source columns within matching rows; rows and other sources are preserved.
Examples
>>> ds.exclude(subject="STREHAB07")  # drop a subject
>>> ds.exclude(source="psychopy")  # drop a source globally
>>> ds.exclude(session=["ses-01", "ses-02"])  # drop multiple sessions
>>> ds.exclude(subject="STREHAB07", source="pupil_dlc")
... # blank pupil_dlc only for STREHAB07; other rows/sources untouched
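The three exclude() modes are easier to compare side by side. The sketch below models them on a list of row dictionaries; exclude_rows is a hypothetical helper, None stands in for NaN, and returning the rows unchanged when no filter is given is an assumption rather than documented behavior.

```python
import copy

ROW_AXES = ("subject", "session", "task")


def _as_set(value):
    """Normalize a filter: None stays None, a string becomes a one-item set."""
    if value is None:
        return None
    return {value} if isinstance(value, str) else set(value)


def exclude_rows(rows, *, subject=None, session=None, task=None, source=None):
    """Toy exclude(): returns new row dicts and leaves the input untouched."""
    row_filters = {
        axis: allowed
        for axis, allowed in zip(ROW_AXES, map(_as_set, (subject, session, task)))
        if allowed is not None
    }
    sources = _as_set(source)
    rows = copy.deepcopy(rows)

    def matches(row):
        return all(row[axis] in allowed for axis, allowed in row_filters.items())

    if sources and not row_filters:
        # Source only: drop those source columns from every row.
        return [{k: v for k, v in row.items() if k not in sources} for row in rows]
    if row_filters and not sources:
        # Row axes only: drop the matching rows.
        return [row for row in rows if not matches(row)]
    if row_filters and sources:
        # Both: blank the listed sources inside matching rows only.
        for row in rows:
            if matches(row):
                for tag in sources:
                    if tag in row:
                        row[tag] = None  # None stands in for NaN here
        return rows
    return rows  # assumption: no filters means nothing is excluded


inventory = [
    {"subject": "STREHAB07", "session": "ses-01", "task": "task-widefield",
     "psychopy": "a.csv", "pupil_dlc": "a.h5"},
    {"subject": "STREHAB08", "session": "ses-01", "task": "task-widefield",
     "psychopy": "b.csv", "pupil_dlc": "b.h5"},
]
no_psychopy = exclude_rows(inventory, source="psychopy")
no_subject = exclude_rows(inventory, subject="STREHAB07")
blanked = exclude_rows(inventory, subject="STREHAB07", source="pupil_dlc")
```

In the combined case both rows survive and only the STREHAB07 pupil_dlc entry is blanked, which is the behavior the last documented example relies on.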
- head(n=3)
Return a new Dataset containing only the first n rows.
Convenience for quick tests; equivalent to slicing the inventory with .iloc[:n] while preserving sources and roots.
- select(subject, session, task=None)
Return a new Dataset containing exactly one inventory row.
Positional shorthand for include(subject=..., session=..., task=...) intended for the common “give me this one cell” use case. Unlike include(), all arguments must be single strings; for multi-value or partial filtering use include() directly:
>>> ds.select("STREHAB07", "ses-05", "task-widefield")  # one cell
>>> ds.include(subject="STREHAB07", session="ses-05")  # all tasks for that session
Raises KeyError if no row matches and ValueError if more than one row matches (e.g. when task is omitted on a task-level inventory and multiple tasks exist for the session).
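The exactly-one-row contract of select() can be sketched as follows. select_row is a hypothetical helper over a toy list of row dictionaries; it returns the matching row itself rather than a Dataset, and the error messages are illustrative.

```python
def select_row(rows, subject, session, task=None):
    """Return the single matching row, or raise KeyError / ValueError."""
    hits = [
        row for row in rows
        if row["subject"] == subject
        and row["session"] == session
        and (task is None or row["task"] == task)
    ]
    if not hits:
        # Nothing matched the requested cell.
        raise KeyError((subject, session, task))
    if len(hits) > 1:
        # Ambiguous: typically task was omitted and several tasks exist.
        raise ValueError(
            f"{len(hits)} rows match {(subject, session, task)}; pass task to disambiguate"
        )
    return hits[0]


inventory = [
    {"subject": "STREHAB07", "session": "ses-05", "task": "task-widefield"},
    {"subject": "STREHAB07", "session": "ses-05", "task": "task-movies"},
]
cell = select_row(inventory, "STREHAB07", "ses-05", "task-widefield")
```

Omitting task for this inventory would match both rows and raise ValueError, which is the situation the parenthetical above describes.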
- materialize(*, strict=True, return_errors=False, progress=False)
Build the materialized DataFrame.
With strict=True (default) the first error is raised with full (subject, session, task, source, path) context. With strict=False failed cells are blanked; pass return_errors=True to also receive the long-format error frame produced by validate().
- class mesofield.datakit.core.LoadContext
Bases: object
Context object passed to every DataSource.load call.
Carries identity (subject/session/task), the inventory row for that cell (so sources can locate sibling files via path_for()), and any upstream sources that were loaded for the same cell as declared on DataSource.requires.
For backward parity with previous releases, when "dataqueue" is present in dependencies, the convenience attributes dataqueue_frame, dataqueue_meta, master_timeline, and experiment_window are populated from it. New sources should prefer reading from dependencies directly.
- require_path(tag)
Like path_for() but raises FileNotFoundError when missing.
- get_dependency(tag)
Return a previously-loaded dependency stream, or None if unavailable.
- Parameters:
tag (str)
- Return type:
LoadedStream | None
- require_dependency(tag)
Like get_dependency() but raises if the dependency is missing.
- Parameters:
tag (str)
- Return type:
- __init__(subject, session, task, inventory_row, dependencies=<factory>, master_timeline=None, experiment_window=None, dataqueue_frame=None, dataqueue_meta=None)
- Parameters:
- Return type:
None
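The get/require pairing on LoadContext can be sketched with a reduced dataclass. ToyContext is a hypothetical stand-in that keeps only identity and dependencies; the KeyError raised by require_dependency is an assumption, since the reference above does not name the exception type.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for LoadContext: identity plus dependencies."""
    subject: str
    session: str
    task: Optional[str]
    dependencies: dict[str, Any] = field(default_factory=dict)

    def get_dependency(self, tag: str) -> Optional[Any]:
        """Return the loaded dependency for tag, or None if unavailable."""
        return self.dependencies.get(tag)

    def require_dependency(self, tag: str) -> Any:
        """Like get_dependency() but raise when the dependency is missing."""
        dep = self.get_dependency(tag)
        if dep is None:
            # Exception type is an assumption made for this sketch.
            raise KeyError(f"missing dependency {tag!r} for {self.subject}/{self.session}")
        return dep


ctx = ToyContext("STREHAB07", "ses-05", "task-widefield", {"dataqueue": {"rows": 3}})
timeline = ctx.require_dependency("dataqueue")
missing = ctx.get_dependency("treadmill")  # None: not loaded for this cell
```

A source that can work without an upstream stream would use the get form and branch on None, while one that cannot proceed would use the require form and let the error surface.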
- class mesofield.datakit.core.LoadedStream
Bases: object
Hydrated data stream with timestamps and metadata.
- mesofield.datakit.core.load(root, *, sources=None, prefer_processed=True, include_task_level=True, progress=True, strict=True, return_errors=False)
One-shot discovery + materialization.
Equivalent to Dataset.from_directory(root, ...).materialize(...). Use Dataset.from_directory() directly when you need to filter (.include/.exclude) before materializing.
- mesofield.datakit.core.load_dataset(path, *, hdf_key='dataset')
Load a previously materialized dataset back into a DataFrame.
The consumer-side inverse of Dataset.save(). Reads a .pkl/.pickle (pandas pickle) or .h5/.hdf5 (HDF5) artefact and returns the materialized pandas.DataFrame, including the df.attrs provenance metadata embedded at save time.
Raises
- FileNotFoundError
If path does not exist.
- ValueError
If the file extension is not a supported dataset format.
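One way such an extension-based reader could be structured is sketched below. read_materialized is a hypothetical function, not datakit's code; checking existence before the extension and matching suffixes case-insensitively are both assumptions. pandas is imported lazily so the error paths run without it, and reading HDF5 through pandas additionally needs the optional PyTables package.

```python
from pathlib import Path

PICKLE_SUFFIXES = {".pkl", ".pickle"}
HDF_SUFFIXES = {".h5", ".hdf5"}


def read_materialized(path, *, hdf_key="dataset"):
    """Toy reader: check the path, then dispatch on the file extension."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(path)
    suffix = path.suffix.lower()
    if suffix in PICKLE_SUFFIXES:
        import pandas as pd  # lazy import: the checks above need no pandas
        return pd.read_pickle(path)
    if suffix in HDF_SUFFIXES:
        import pandas as pd
        return pd.read_hdf(path, key=hdf_key)
    raise ValueError(f"unsupported dataset format: {suffix!r}")
```

A caller would typically wrap the call in try/except for FileNotFoundError and ValueError to tell a missing artefact apart from an unsupported one.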
- mesofield.datakit.core.load_path(tag, path)
Ad-hoc single-file load via the registered source for tag.
Builds a minimal LoadContext so sources without requires or sibling-path lookups can be exercised directly. Sources declaring dependencies will receive None for them in context.dependencies and must either degrade gracefully or raise.
- Parameters:
- Return type:
- mesofield.datakit.core.inspect_sources(inventory_or_dataset, sources=None)
Return a per-source coverage summary for an inventory.
The returned DataFrame is indexed by source tag with columns present, total, missing, and coverage (fraction of rows with a non-null path). Accepts either a Dataset or a raw inventory DataFrame.
When sources is omitted, every registered tag found in the inventory’s columns is reported.
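The four summary columns follow directly from counting non-null paths per source. The sketch below computes them for a toy inventory of row dictionaries; coverage_summary is a hypothetical helper that returns a plain dict instead of a DataFrame, and reporting 0.0 coverage for an empty inventory is an assumption.

```python
def coverage_summary(rows, sources):
    """Per-source counts over a list of row dicts; None marks a missing path."""
    total = len(rows)
    summary = {}
    for tag in sources:
        present = sum(1 for row in rows if row.get(tag) is not None)
        summary[tag] = {
            "present": present,
            "total": total,
            "missing": total - present,
            # Assumption: an empty inventory reports 0.0 rather than NaN.
            "coverage": present / total if total else 0.0,
        }
    return summary


inventory = [
    {"subject": "STREHAB07", "treadmill": "a.csv", "psychopy": "a.log"},
    {"subject": "STREHAB08", "treadmill": "b.csv", "psychopy": None},
    {"subject": "STREHAB09", "treadmill": "c.csv", "psychopy": None},
    {"subject": "STREHAB10", "treadmill": "d.csv", "psychopy": None},
]
summary = coverage_summary(inventory, ["treadmill", "psychopy"])
```

Here treadmill has a path in all four rows (coverage 1.0) while psychopy has one of four (coverage 0.25), so a low coverage value points at the source to investigate before materializing.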