Labels Matching

This is an experimental module from the Xtra Library: https://xtras.amira-avizo.com.

This module quantifies the quality of an instance segmentation, comparing a proposed segmentation (Prediction) with a Reference (Ground Truth). It directly follows the StarDist matching algorithm [1].

The module computes the confusion matrix [2] for each pair of objects between the ground truth and the prediction, and an associated score matrix according to the specified matching criterion (Intersection over Union (IoU), Intersection over ground Truth (IoT), or Intersection over Prediction (IoP); see below). The confusion matrix indicates the counts of overlapping pixels or voxels, with rows corresponding to the ground truth labels and columns corresponding to the prediction labels. The score matrix contains the values of the specified matching criterion.

An optimal matching is then computed, i.e. the set of matching pairs of ground truth and prediction labels that maximizes the overall matching score. The specified matching threshold is used to favor scores above the threshold when solving the matching, and to mark the corresponding pairs as accepted ("OK" in the result tables). Nevertheless, all solved matching pairs are reported in the result tables (best match column in the individual measures) regardless of their acceptance status.
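One common way to solve such an optimal matching is the Hungarian algorithm, available in SciPy. The sketch below is illustrative, not the module's code: `match_labels` is a hypothetical name, scores are assumed to already exclude the background row/column, and above-threshold pairs are boosted so the solver prefers acceptable matches:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_labels(scores, threshold=0.5):
    """Solve the optimal assignment between ground truth rows and
    prediction columns of a score matrix.

    Scores above the threshold are boosted so the solver prefers
    acceptable pairs; each solved pair is then flagged as accepted
    ("OK") only if its original score is above the threshold."""
    scores = np.asarray(scores, dtype=float)
    boosted = scores + (scores > threshold) * 1.0  # favor above-threshold pairs
    rows, cols = linear_sum_assignment(boosted, maximize=True)
    # Return (gt index, pred index, score, accepted) for every solved pair.
    return [(r, c, scores[r, c], scores[r, c] > threshold)
            for r, c in zip(rows, cols)]

scores = np.array([[0.0, 0.1],
                   [0.8, 0.0]])
pairs = match_labels(scores, threshold=0.5)
# Pair (1, 0) scores 0.8 and is accepted; pair (0, 1) scores 0.1 and is
# reported but not accepted.
```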

Individual label matching quality measures are calculated from pixel or voxel counts for matching label pairs: Jaccard index (IoU), precision (IoP), recall (IoT), and F1 score [2]. Note that the individual measures do not include a row for label 0 (background), while the confusion matrix and the score matrix do.

Global metrics are calculated from accepted matching label pairs counted as true positives (tp), unmatched prediction labels counted as false positives (fp), and unmatched ground truth labels counted as false negatives (fn). Global metrics include precision, recall, IoU, and F1 score based on the tp/fp/fn counts, as well as the mean true score, mean matched score, and panoptic quality [3].
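The global metrics can be sketched from the tp/fp/fn counts and the criterion scores of the accepted pairs. This follows the usual definitions (panoptic quality as in [3]: sum of matched scores divided by tp + fp/2 + fn/2); the function name and exact tie-breaking are assumptions, not the module's code:

```python
def global_metrics(matched_scores, n_gt, n_pred):
    """Global metrics from accepted matches and total label counts.

    matched_scores: criterion scores of the accepted (tp) pairs.
    n_gt, n_pred: numbers of ground truth / prediction labels."""
    tp = len(matched_scores)
    fp = n_pred - tp   # unmatched prediction labels
    fn = n_gt - tp     # unmatched ground truth labels
    sum_scores = sum(matched_scores)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        # Mean score over matched pairs vs. over all ground truth labels.
        "mean_matched_score": sum_scores / tp if tp else 0.0,
        "mean_true_score": sum_scores / n_gt if n_gt else 0.0,
        "panoptic_quality": (sum_scores / (tp + 0.5 * fp + 0.5 * fn)
                             if tp + fp + fn else 0.0),
        "tp": tp, "fp": fp, "fn": fn,
    }

g = global_metrics([0.8, 0.6], n_gt=3, n_pred=4)
# tp=2, fp=2, fn=1 -> precision 0.5, recall 2/3, panoptic quality 0.4
```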

The resulting global metrics, individual measures, confusion matrix, and matching score matrix can be output as tables in a spreadsheet data object.

Individual measures can also be reported in a label analysis data object, enabling features such as Label Seek. To learn about label analysis table features, see the tutorial “Getting Started with Advanced Image Processing and Quantitative Analysis”.

Individual measures can be output with rows indexed either by ground truth label or by prediction label. This allows examining the results from either perspective, for instance when using the Label Seek feature or the Colorize by Measure module with a label analysis table.
When individual measure rows are indexed by prediction label, the associated confusion and score matrices are transposed.

Connections

Input

Description

Image1 (Ground Truth): [required]

Binary or label image of ground truth instances. If a label image is provided (as opposed to a binary image), each label is considered an object/instance even if it is composed of several connected components.

Image2 (Prediction): [required]

Binary or label image of predicted instances. If a label image is provided (as opposed to a binary image), each label is considered an instance even if it is composed of several connected components.

Parameters (Ports)

Parameter

Description

Console:

Opens the Python Script Object console of the module as the active console window.

Mode:

Specifies whether the inputs are interpreted as a 3D volume or as a stack of 2D images for processing.

Matching Criterion:

For each pair of ground truth and predicted objects, the criterion used to define whether the pair is matching or not.

The available criteria are:

- Intersection over Union (IoU, or Jaccard index): mostly used as a balanced matching quality criterion.

- Intersection over ground Truth (IoT, or recall): can favor detection at the expense of false positives.

- Intersection over Prediction (IoP, or precision): can favor precision at the expense of false negatives.

Matching Threshold:

A pair of objects is considered matching if the criterion value is above the threshold. The threshold value ranges between 0 (minimal overlap) and 1 (maximal overlap).

Results output:

The module can generate the following outputs:

- table: the main quality metrics (see description above), in an output dataset named with the suffix ‘.matchingGT’.

- analysis: an analysis table with one row per object (of either the ground truth or the prediction), with columns indicating whether the object is matched (in the prediction or ground truth image, respectively), the ID of its best match, and the supporting metrics characterizing the overlap with the best matching instance.

- Confusion: a table indicating the number of overlapping pixels for all pairs of objects between the Prediction and the Ground Truth.

- Diff image: a label image indicating, for each pixel, whether it belongs to any object of the ground truth and/or the prediction, regardless of instances and of whether the instances are matched. Using the default colormap, the color code is as follows:

  - Green (value 4): pixel belongs to an object in both the ground truth and the prediction (true positive)

  - Red (value 3): pixel belongs to an object in the prediction only (false positive)

  - Blue (value 2): pixel belongs to an object in the ground truth only (false negative)
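The diff image logic above can be sketched with NumPy from the two foreground masks. This is an illustrative sketch, not the module's implementation; it assumes pixels that are background in both images keep the value 0:

```python
import numpy as np

def diff_image(gt, pred):
    """Per-pixel diff label image (instance identity is ignored):
    4 = foreground in both (true positive, green in the default colormap)
    3 = foreground only in the prediction (false positive, red)
    2 = foreground only in the ground truth (false negative, blue)
    0 = background in both (assumption)."""
    gt_fg = np.asarray(gt) > 0
    pred_fg = np.asarray(pred) > 0
    diff = np.zeros(gt_fg.shape, dtype=np.uint8)
    diff[gt_fg & pred_fg] = 4
    diff[~gt_fg & pred_fg] = 3
    diff[gt_fg & ~pred_fg] = 2
    return diff

gt = np.array([[1, 1, 0]])
pred = np.array([[0, 2, 2]])
# diff_image(gt, pred) -> [[2, 4, 3]]
```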

Analysis Results by:

Only relevant if ‘analysis’ is selected as an output. Specifies whether each row of the corresponding analysis table(s) refers to an object of the ground truth image or of the prediction image.

References

- [1] Schmidt, U., Weigert, M., Broaddus, C., & Myers, G. (2018). "Cell Detection with Star-convex Polygons." Medical Image Computing and Computer Assisted Intervention (MICCAI 2018), 265-273.

  - https://github.com/stardist/stardist

- [3] Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2019). "Panoptic Segmentation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), 9404-9413.