API

Almost all of the measurement-side functionality is implemented in procedural_data.py, mainly in the Measurement class defined there.

Analysis-side functionality is implemented in dataview.py and dataexplorer.py.

Measurement

procedural_data

Class for storing measurement data.

pdata.procedural_data.run_measurement(get_snapshot, columns, name, data_base_dir='.', dir_name_generator=<function <lambda>>, autosnap=True, snap_diff_filter=<function preprocess_snapshot>, omit_readme=False, omit_input_history=False, omit_notebook_copy=True, compress=True, log_level='inherit')[source]

A simple context manager that runs begin() and end() automatically, and gets an updated snapshot of instrument parameters each time data is added, using get_snapshot().

See docstring of Measurement.__init__ for definition of most arguments.

The directory name for storing the data set is:

<data_base_dir>/<dir_name_generator(name)>
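The path composition above can be sketched as follows. This is only an illustration, not pdata's code: example_dir_name_generator and target_directory are hypothetical names, and the timestamp-prefix naming scheme is an assumption, since the default dir_name_generator appears in the signature only as an opaque lambda.

```python
import os
import time


def example_dir_name_generator(name):
    # Hypothetical naming scheme (an assumption): local timestamp + data set name.
    return time.strftime("%Y-%m-%d_%H-%M-%S") + "_" + name


def target_directory(data_base_dir, name,
                     dir_name_generator=example_dir_name_generator):
    # Mirrors the documented pattern <data_base_dir>/<dir_name_generator(name)>.
    return os.path.join(data_base_dir, dir_name_generator(name))


print(target_directory(".", "iv_sweep"))
print(target_directory("data", "iv_sweep", dir_name_generator=lambda n: n.upper()))
```

Passing a custom dir_name_generator is the natural place to enforce a lab-specific naming convention.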

By default, we filter out timestamps added by QCoDeS from snapshot diffs.
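One way such a filter could work is sketched below. It assumes that timestamps are stored under a key named "ts", as in QCoDeS parameter snapshots; strip_timestamps is a hypothetical helper and not the actual preprocess_snapshot function.

```python
def strip_timestamps(snapshot, timestamp_keys=("ts",)):
    """Return a copy of a nested snapshot with timestamp entries removed."""
    if isinstance(snapshot, dict):
        return {
            key: strip_timestamps(value, timestamp_keys)
            for key, value in snapshot.items()
            if key not in timestamp_keys
        }
    if isinstance(snapshot, list):
        return [strip_timestamps(item, timestamp_keys) for item in snapshot]
    return snapshot


# Two hypothetical snapshots that differ only in the timestamp.
before = {"gate": {"voltage": {"value": 0.1, "ts": "2024-01-01 12:00:00"}}}
after = {"gate": {"voltage": {"value": 0.1, "ts": "2024-01-01 12:00:05"}}}

print(before == after)                                      # raw snapshots differ
print(strip_timestamps(before) == strip_timestamps(after))  # filtered ones do not
```

Without a filter of this kind, every snapshot diff would contain entries even when no parameter value changed.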

If you’re using QCoDeS, your get_snapshot function would typically look like this:

import qcodes.station
with run_measurement(lambda s=qcodes.station.Station.default: s.snapshot(update=True), ...) as m:
  ...

pdata.procedural_data.abort_measurements()[source]

Abort all measurements running on this machine after their next call to add_points(), which is presumably a safe time to abort. Can be called from an independent process running on the same machine.

class pdata.procedural_data.Measurement(columns, target_dir=None, get_snapshot=None, autosnap=True, snap_diff_filter=None, omit_readme=False, omit_input_history=False, omit_notebook_copy=True, compress=True, log_level='inherit')[source]

Writes data to a data table with fixed columns defined during __init__().

The data directory contains the following:

  • tabular_data.dat – Data table with rows added using add_points, and columns defined as arguments of run_measurement.

  • snapshot.json – Instrument parameter snapshot when run_measurement started.

  • snapshot.row-<n>.diff<m>.json – jsondiff of parameter changes, recorded when there were <n> data rows in tabular_data.dat. <m> is a simple counter, in case multiple diffs are created for the same row.

  • log.txt – copy of messages from the logging module.

  • input-history – copy of input given to IPython/Jupyter in the current session, up to 500 most recent cells. (optional)

  • A copy of the Jupyter notebook (optional and disabled by default; works only in Jupyter Notebook, not in JupyterLab)

Optionally, the files may be compressed (.gz or .tar.gz).
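The snapshot diff naming scheme listed above can be parsed with a short regular expression. This is a sketch for illustration only: parse_diff_name and sorted_diffs are hypothetical helpers, and the optional ".gz" suffix on individual diff files is an assumption. In practice the dataview module does this parsing for you.

```python
import re

# Pattern for snapshot.row-<n>.diff<m>.json, with an optional .gz suffix (assumed).
DIFF_NAME = re.compile(r"^snapshot\.row-(\d+)\.diff(\d+)\.json(\.gz)?$")


def parse_diff_name(filename):
    """Return (row, counter) for a snapshot diff file name, or None otherwise."""
    match = DIFF_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sorted_diffs(filenames):
    """Keep only diff files and sort them by row count, then by counter."""
    parsed = [(parse_diff_name(f), f) for f in filenames]
    return [f for key, f in sorted(p for p in parsed if p[0] is not None)]


files = [
    "snapshot.json",
    "snapshot.row-10.diff0.json",
    "snapshot.row-2.diff1.json",
    "snapshot.row-2.diff0.json",
    "log.txt",
]
print(sorted_diffs(files))
```

Sorting numerically matters here, because a plain string sort would place row-10 before row-2.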

Although the format is human-readable in the simplest cases, it is meant to be parsed programmatically by the dataview module (analysis/dataview.py).

For more information, see https://pdata.readthedocs.io/en/latest/

add_point(data, snap=None)[source]

Same as add_points but takes scalar inputs for the column values.

add_points(data, snap=None)[source]

Add rows to the data table.

The values for the columns are given as a dictionary of the form:

{ 'column name 1': <array of values>,
  'column name 2': ... }.

snap=True/False can override the “autosnap” option specified in __init__.
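The shape of the data argument can be illustrated without pdata itself. In this sketch the column names are hypothetical, the equal-length check reflects an assumption about what a table with fixed columns requires, and scalars_to_columns is a hypothetical helper showing how the scalar form taken by add_point relates to the array form taken by add_points.

```python
def scalars_to_columns(row):
    """Wrap scalar column values into length-1 lists (add_point -> add_points form)."""
    return {column: [value] for column, value in row.items()}


columns = ["frequency", "S21"]  # hypothetical column names

# Several rows at once: one array of values per column.
block = {"frequency": [1.0e9, 1.1e9, 1.2e9], "S21": [0.5, 0.4, 0.3]}
assert set(block) == set(columns)
# Assumption: every column carries the same number of values.
assert len({len(values) for values in block.values()}) == 1

# A single row given as scalars, converted to the array form.
single = scalars_to_columns({"frequency": 1.3e9, "S21": 0.2})
print(single)
```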

begin()[source]

Creates the data directory, initial snapshot, etc. Must be called before add_points(). The run_measurement() context manager calls this automatically.

end()[source]

Ends the measurement. Adds final metadata and closes and compresses the data set files. The run_measurement() context manager calls this automatically.

write_snapshot(snap=None)[source]

Add a snapshot (delta) file to the data directory. This is called automatically whenever you call add_points(), unless you disabled autosnapping.

Analysis

dataexplorer

Module for quick data visualization helpers.

Note that pdata is not meant to be a fully-featured plotting utility.

pdata.analysis.dataexplorer.basic_plot(base_dir, data_dirs, x, y, xlog=False, ylog=False, zlog=False, slowcoordinate=None, preprocessor=<function <lambda>>, trace_processor=<function <lambda>>, plot_type='line plot', figure=None)[source]

Convenience function for quickly plotting y vs x in a given set of pdata data directories.

data_dirs should be an array of PDataSingle objects or paths, given as strings relative to base_dir. data_dirs can also be a single string or a single PDataSingle object. If base_dir is None, it is ignored.

x, y and slowcoordinate are column names, specified as strings.

If slowcoordinate is specified, the data is divided into sweeps wherever the value of slowcoordinate changes, and a legend entry is added for each slowcoordinate value. If no slowcoordinate is specified, the plot is divided into sweeps based on the direction of x, and no legend is added.

preprocessor is an optional function applied to the DataView object before plotting. It can be used to, e.g., add virtual columns.

trace_processor is an optional function applied to the x and y values just before each trace is plotted. It can be used, e.g., to plot only the magnitude of a complex y by specifying lambda x,y: (x,np.abs(y)).
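A trace processor is simply a function that maps (x, y) to a new (x, y) pair. The sketch below uses plain Python lists instead of numpy arrays so that it runs on its own; apply_to_traces is a hypothetical stand-in for what a plotting routine might do with the processor, not part of pdata.

```python
def magnitude_only(x, y):
    """Trace processor that keeps x and replaces complex y by its magnitude."""
    return x, [abs(value) for value in y]


def apply_to_traces(traces, trace_processor):
    """Apply trace_processor to each (x, y) pair, as a plotting routine might."""
    return [trace_processor(x, y) for x, y in traces]


traces = [([1.0, 2.0], [3 + 4j, 1j]), ([1.0, 2.0], [-2 + 0j, 0j])]
print(apply_to_traces(traces, magnitude_only))
```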

Supported values for plot_type:

  • “line plot” – Plot each trace as a line, with slow value in legend

  • “heatmap” – Plot a heat map with y in each trace as the color, and slow value as the vertical coordinate

  • None – Instead of a plot, return { “traces”: [ (xvals, yvals), … ], “slow values”: [ slow value, … ] }

An existing pyplot figure can be optionally specified. It is first cleared.

Returns the created/reused figure object.

pdata.analysis.dataexplorer.call_with_extra_kwargs(f, **kwargs)[source]

Returns results of f(**kwargs), after filtering out parameters from kwargs that f doesn’t accept.
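A helper with this behavior can be written with inspect.signature from the standard library. The version below is a sketch of the idea and not necessarily how pdata implements it; call_with_accepted_kwargs and plotter are hypothetical names.

```python
import inspect


def call_with_accepted_kwargs(f, **kwargs):
    """Call f with only those keyword arguments that its signature accepts."""
    parameters = inspect.signature(f).parameters
    # If f itself takes **kwargs, everything can be passed through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return f(**kwargs)
    accepted = {k: v for k, v in kwargs.items() if k in parameters}
    return f(**accepted)


def plotter(x, y, xlog=False):
    # Hypothetical plotter that accepts only a subset of the available options.
    return (x, y, xlog)


print(call_with_accepted_kwargs(plotter, x="V", y="I", zlog=True, figure=None))
```

This pattern lets a caller such as monitor_dir offer the same set of keyword arguments to user-supplied functions that accept only some of them.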

pdata.analysis.dataexplorer.data_selector(base_dir, name_filter='.', age_filter=None, max_entries=30, sort_order='chronological', return_widget=True)[source]

Create an interactive Jupyter selector widget listing at most max_entries data directories located in base_dir, with directory name satisfying the regular expression name_filter.

Data sets last modified more than age_filter seconds ago are filtered out.

sort_order==’chronological’ sorts in reverse chronological order (most recently modified first). The other option is ‘alphabetical’.

If return_widget==False, return a list instead.
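The filtering and sorting rules described above can be sketched on an in-memory list of (directory name, modification time) pairs. select_entries is a hypothetical function; using re.search for the name filter and truncating to max_entries after sorting are assumptions about details the description leaves open.

```python
import re
import time


def select_entries(entries, name_filter=".", age_filter=None, max_entries=30,
                   sort_order="chronological", now=None):
    """Filter and sort (dir_name, mtime) pairs; mtime is in seconds since epoch."""
    now = time.time() if now is None else now
    selected = [
        (name, mtime) for name, mtime in entries
        if re.search(name_filter, name)
        and (age_filter is None or now - mtime <= age_filter)
    ]
    if sort_order == "chronological":
        selected.sort(key=lambda e: e[1], reverse=True)  # newest first
    else:
        selected.sort(key=lambda e: e[0])  # alphabetical by name
    return [name for name, _ in selected[:max_entries]]


entries = [("2024_iv_a", 100.0), ("2024_rf_b", 200.0), ("2024_iv_c", 300.0)]
print(select_entries(entries, name_filter="iv", now=310.0))
print(select_entries(entries, age_filter=50, now=310.0))
```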

pdata.analysis.dataexplorer.get_data_mtime(base_dir, data_dir, fallback_value=0)[source]

Get last modification time of data set in <base_dir>/<data_dir>. If the directory appears invalid, return fallback_value.

pdata.analysis.dataexplorer.is_valid_pdata_dir(base_dir, data_dir)[source]

Check whether <base_dir>/<data_dir> is a pdata data set.

pdata.analysis.dataexplorer.monitor_dir(base_dir, x, y, name_filter='.', age_filter=None, xlog=False, ylog=False, slowcoordinate=None, preprocessor=None, trace_processor=<function <lambda>>, plot_type='line plot', selector=<function data_selector>, plotter=<function basic_plot>, ref_data_dirs=[], poll_period=3)[source]

Monitor base_dir for new data matching selector(base_dir, name_filter, age_filter), until interrupted by KeyboardInterrupt.

If new data is found, plot y vs x using plotter(base_dir=None, data_dirs=<array of PDataSingle>, x=x, y=y, …).

The default selector and plotter functions can be overridden. They should accept a subset of the keyword arguments of data_selector() and basic_plot(), respectively.

ref_data_dirs can be used to specify data sets that are always plotted. These should be given as full paths (not relative to base_dir), or as PDataSingle objects.

poll_period specifies how often base_dir is checked for changes. Specified in seconds.

pdata.analysis.dataexplorer.snapshot_explorer(d, max_depth=10, detect_qcodes_params=True)[source]

Graphical dropdown-menu-based helper for creating virtual dimension specifications for DataView d. Alternatively, a single snapshot can be provided as d.

max_depth controls the number of dropdown menus shown.

If detect_qcodes_params==True, a more complete suggestion is provided for selections that seem like QCoDeS parameters.

In the current implementation, if you call snapshot_explorer in multiple cells, only the most recently created GUI may work properly. This is due to the use of snapshot_explorer_globals effectively as a static variable (see code for details).

DataView

Class for post-processing measurement data.

class pdata.analysis.dataview.DataView(data, deep_copy=False, source_column_name='data_source')[source]

Class for post-processing measurement data. Main features are:

  • Concatenating multiple separate data objects

  • Creating “virtual” columns by parsing comments or snapshot files or by applying arbitrary functions to the data

  • Dividing the rows into “sweeps” based on various criteria.

See docs/examples/Procedural Data and DataView.ipynb for example use.

add_virtual_dimension(name, units='', fn=None, arr=None, comment_regex=None, from_set=None, dtype=<class 'float'>, preparser=None, cache_fn_values=True, return_result=False)[source]

Makes a computed vector accessible as self[name]. The computed vector depends on whether fn, arr or comment_regex is specified.

It is advisable that the computed vector is of the same length as the real data columns.

kwargs:

Arguments for specifying how to parse the value:

  • fn – the function applied to the DataView object, i.e. self[name] returns fn(self)

  • arr – specify the column directly as an array, i.e. self[name] returns arr

  • comment_regex – for each row, take the value from the last match in a comment, otherwise np.NaN. Should be a regex string.

  • from_set – for each row, take the value from the corresponding snapshot file. Specify as a tuple that indexes the settings dict (“instrument_name”, “parameter_name”, …).

Other options:

  • dtype – data type (default: float)

  • preparser – optional preparser function that massages the value before it is passed to dtype

  • cache_fn_values – evaluate fn(self) immediately for the entire (unmasked) array and cache the result

  • return_result – return the result directly as an (nd)array instead of adding it as a virtual dimension
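To illustrate how a from_set tuple, preparser and dtype could combine, here is a self-contained sketch. The settings layout, the key names and both helper functions are hypothetical; pdata's own lookup and caching logic may differ.

```python
from functools import reduce


def lookup(settings, path):
    """Index a nested settings dict with a tuple of keys."""
    return reduce(lambda node, key: node[key], path, settings)


def value_for_each_row(settings_by_row, path, dtype=float, preparser=None):
    """Evaluate one value per row from per-row settings dicts."""
    values = []
    for settings in settings_by_row:
        raw = lookup(settings, path)
        if preparser is not None:
            raw = preparser(raw)  # massage the raw value before conversion
        values.append(dtype(raw))
    return values


# Hypothetical settings in effect for two consecutive rows.
settings_by_row = [
    {"gate": {"voltage": {"value": "0.1 V"}}},
    {"gate": {"voltage": {"value": "0.2 V"}}},
]
path = ("gate", "voltage", "value")
print(value_for_each_row(settings_by_row, path, preparser=lambda s: s.split()[0]))
```

Here the preparser strips the unit suffix so that the remaining string can be passed to float.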

clear_mask()[source]

Unmask all data (i.e. make all data in the initially provided Data object visible again).

column(name, deep_copy=False)[source]

Get the non-masked entries of dimension ‘name’ as a 1D ndarray. name is the dimension name.

kwargs:

deep_copy – copy the returned data so that it is safe to modify it.

comments()[source]

Return the comments parsed from the data files.

Returns tuples where the first item is an index to the first datarow that the comment applies to.

continuous_ranges(masked_ranges=False)[source]

Returns a list of (start,stop) tuples that indicate continuous ranges of (un)masked data.
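The idea can be sketched on a plain list of booleans, where True means masked. ranges_from_mask is a hypothetical function, and treating stop as exclusive (as in Python slices) is an assumption, since the description does not say whether stop is inclusive.

```python
def ranges_from_mask(mask, masked_ranges=False):
    """Return (start, stop) pairs of consecutive rows whose mask equals masked_ranges."""
    ranges = []
    start = None
    for i, is_masked in enumerate(mask):
        if is_masked == masked_ranges:
            if start is None:
                start = i  # a new range begins here
        elif start is not None:
            ranges.append((start, i))  # the range ended just before row i
            start = None
    if start is not None:
        ranges.append((start, len(mask)))
    return ranges


mask = [False, False, True, True, False, True]
print(ranges_from_mask(mask))
print(ranges_from_mask(mask, masked_ranges=True))
```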

copy(copy_data=False)[source]

Make a copy of the view. The returned copy will always have an independent mask.

copy_data – whether the underlying data is also deep copied.

data_source()[source]

Returns a list of strings that tell which Data object each of the unmasked rows originated from.

dimensions()[source]

Returns a list of all dimensions, both real and virtual.

divide_into_sweeps(sweep_dimension, use_sweep_direction=None)[source]

Divide the rows into “sweeps” based on a monotonically increasing or decreasing value of column “sweep_dimension”, if use_sweep_direction==True.

If use_sweep_direction==False, sequences of points where “sweep_dimension” stays constant are considered sweeps. This is useful for splitting the data into sweeps based on a slowly varying parameter, e.g. a gate voltage set point that is changed between IV curve sweeps.

If use_sweep_direction is None, this function tries to figure out which one is more reasonable.

Returns a sequence of slices indicating the start and end of each sweep.

Note that the indices are relative to the currently _unmasked_ rows only.
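The two splitting modes can be sketched in plain Python as follows. Both functions are hypothetical illustrations of the idea and are not taken from pdata; in particular, how turning points and repeated values are assigned to sweeps is an assumption.

```python
def sweeps_by_direction(values):
    """Split indices into slices over which `values` changes monotonically."""
    slices = []
    start = 0
    direction = 0  # +1 increasing, -1 decreasing, 0 not yet known
    for i in range(1, len(values)):
        step = values[i] - values[i - 1]
        sign = (step > 0) - (step < 0)
        if sign != 0 and direction != 0 and sign != direction:
            slices.append(slice(start, i))  # direction reversed: close the sweep
            start = i
            direction = 0
        elif sign != 0:
            direction = sign
    if len(values) > 0:
        slices.append(slice(start, len(values)))
    return slices


def sweeps_by_constant_value(values):
    """Split indices into slices over which `values` stays constant."""
    slices = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            slices.append(slice(start, i))
            start = i
    return slices


sawtooth = [0, 1, 2, 3, 0, 1, 2, 3]      # like use_sweep_direction=True
gate = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5]    # like use_sweep_direction=False
print(sweeps_by_direction(sawtooth))
print(sweeps_by_constant_value(gate))
```

The first mode suits a fast axis that is swept repeatedly, while the second suits a slowly varying set point that is held fixed during each sweep.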

mask()[source]

Get a vector of booleans indicating which rows are masked.

mask_rows(row_mask, unmask_instead=False)[source]

Mask rows in the data. row_mask can be a slice or a boolean vector with length equal to the number of previously unmasked rows.

The old mask is determined from the mask of the first column.

Example:

d = DataView(...)
# ignore points where source current exceeds 1 uA.
d.mask_rows(np.abs(d['I_source']) > 1e-6)
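Because row_mask refers only to the previously unmasked rows, applying it means mapping its entries back onto positions in the full table. The following sketch shows one way to do that with plain lists, where True means masked; apply_row_mask is a hypothetical helper and not pdata's implementation.

```python
def apply_row_mask(full_mask, row_mask):
    """Mask additional rows, with row_mask indexed over the unmasked rows only."""
    unmasked_indices = [i for i, masked in enumerate(full_mask) if not masked]
    if len(row_mask) != len(unmasked_indices):
        raise ValueError("row_mask must have one entry per unmasked row")
    new_mask = list(full_mask)
    for index, mask_this_row in zip(unmasked_indices, row_mask):
        if mask_this_row:
            new_mask[index] = True
    return new_mask


full_mask = [False, True, False, False]  # row 1 is already masked
row_mask = [False, True, False]          # refers to rows 0, 2 and 3
print(apply_row_mask(full_mask, row_mask))
```

This is why successive mask_rows calls compose naturally: each condition is evaluated on what is currently visible.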

mask_sweeps(sweep_dimension, sl, unmask_instead=False)[source]

Mask entire sweeps (see divide_into_sweeps()).

sl can be a single integer or any slice object compatible with a 1D numpy.ndarray (list of sweeps).

unmask_instead – unmask the specified sweeps instead, mask everything else

pop_mask()[source]

Pop the topmost mask from the mask stack, set previous mask in the stack as current one and return the popped mask. Raises an exception if trying to pop an empty stack.

push_mask(mask, unmask_instead=False)[source]

Same as mask_rows(), but also pushes the mask to a ‘mask stack’. Handy for temporary masks e.g. inside loops. See also pop_mask().

remove_masked_rows_permanently()[source]

Removes the currently masked rows permanently.

This is typically unnecessary, but may be useful before adding (cached) virtual columns to huge data sets where most rows are masked (because the cached virtual columns are computed for masked rows as well.)

set_mask(mask)[source]

Set an arbitrary mask for the data. Should be a vector of booleans of the same length as the number of data points. Alternatively, a single True or False masks or unmasks all data, respectively.

See also mask_rows().

settings()[source]

Return the settings parsed from the settings files.

Returns tuples where the first item is an index to the first datarow that the settings apply to.

single_valued_parameter(param)[source]

If all values in the (virtual) dimension “param” are the same, return that value.

sweeps(sweep_dimension, use_sweep_direction=None)[source]

Generator that returns shallow copies of this DataView with unmasked rows corresponding to sweeps. For more details on the arguments and how the rows are divided into sweeps, see divide_into_sweeps().

to_xarray(values, coords, fill_value=nan, coarse_graining={}, include_single_valued_params=True)[source]

Create an N-dimensional xarray Dataset out of values, where N is equal to the number of coordinates and values are specified as a list of dataview dimension names, or (<data variable name>, f, <units>) tuples where f(self) returns a vector of length equal to the number of unmasked rows in this DataView. Alternatively, values can be a single dimension name. Coordinates are specified as a list of dimension names. Entries of the xarray corresponding to coordinate combinations that don’t exist in this data set are filled with fill_value.

This is well-suited for N-dimensional parameter/coordinate sweeps that were executed with nested for loops in which the looped coordinate values in each loop were selected mostly independently of the other coordinates. Otherwise there will be lots of fill_value’s.

Usually, you’ll want to use setpoints, rather than measured values, as coordinates. If a coordinate c is instead a measured value, you probably want to specify coarse graining with coarse_graining={c: <Delta>}, which causes coordinates differing by at most <Delta> to be interpreted as the same coordinate.

Note that if the same coordinate combination is repeated more than once in the data set, only the last measured value will appear in the output xarray. If you want to preserve information about repetitions, add another coordinate for the repetition number.
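The gridding behavior described so far (missing combinations filled with fill_value, repeated combinations keeping the last value) can be illustrated in two dimensions without xarray. grid_2d is a hypothetical helper written for this sketch; the real method returns an xarray object and supports N dimensions.

```python
import math


def grid_2d(rows, fill_value=math.nan):
    """Arrange (x, y, value) rows on a 2D grid indexed by sorted unique x and y."""
    xs = sorted({x for x, _, _ in rows})
    ys = sorted({y for _, y, _ in rows})
    grid = [[fill_value for _ in ys] for _ in xs]
    for x, y, value in rows:
        # A repeated (x, y) combination overwrites the earlier value.
        grid[xs.index(x)][ys.index(y)] = value
    return xs, ys, grid


rows = [(0, 10, 1.0), (0, 20, 2.0), (1, 10, 3.0), (0, 10, 4.0)]
xs, ys, grid = grid_2d(rows)
print(xs, ys)
print(grid)
```

In this example the combination (1, 20) was never measured and stays at the fill value, while (0, 10) was measured twice and only the later value survives.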

If include_single_valued_params is True, all single valued parameters will be included as attributes of the xarray.

Spaces, dashes and other special characters in coordinate names are replaced automatically by underscores, as these don’t work well with xarray syntax.

units(d)[source]

Returns the units for dimension d.

unmask_sweeps(sweep_dimension, sl)[source]

Mask all rows except the specified sweeps (see divide_into_sweeps()).

sl can be a single integer or any slice object compatible with a 1D numpy.ndarray (list of sweeps).

class pdata.analysis.dataview.PDataSingle(path, convert_timestamps=True, parse_comments=False)[source]

Class for reading in the contents of a single pdata data directory. Almost always passed on to DataView for actual analysis.

data()[source]

DEPRECATED: Return data as a structured numpy array, with column names col0, col1, etc.