
Calculates weighted distances and similarity scores from a set of observations to each prototype. This function implements the core distance and similarity calculations used in prototype-based classification models.

Usage

compute(data, prototypes, w, g, r = 1L)

Arguments

data

A data frame of binary features (0s and 1s), as returned by make_binary_data. Each row represents an observation and each column represents a binary feature.

prototypes

A list of prototype vectors. Each must be the same length as the number of columns in data and contain only binary values (0 or 1).

w

A numeric vector of attention weights, one for each feature. Must be:

Length

Equal to the length of each prototype vector and to ncol(data)

Values

Non-negative and sum to 1

g

A numeric sensitivity parameter controlling the steepness of the similarity function. Must be non-negative (>= 0). Higher values make similarity fall off more quickly as distance increases; at g = 0 every similarity equals 1 regardless of distance.

r

Integer specifying the distance metric type. With binary data the choice has no effect on the result: each feature difference is either 0 or 1, and raising 0 or 1 to the power r leaves it unchanged.

1

Manhattan distance (L1 norm)

2

Euclidean distance (L2 norm)
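The claim that r does not matter for binary features can be checked with a short base R sketch. The vectors x, p, and w and the helper weighted_distance are made up for illustration and are not part of the package; the helper simply follows the distance formula given under Details.

```r
# Hypothetical observation, prototype, and attention weights (illustrative only)
x <- c(1, 0, 1, 1)
p <- c(1, 1, 0, 1)
w <- c(0.4, 0.3, 0.2, 0.1)  # non-negative, sums to 1

# Weighted distance following the Details formula: sum_k w_k * |x_k - p_k|^r
# (hypothetical helper, not a function exported by the package)
weighted_distance <- function(x, p, w, r) sum(w * abs(x - p)^r)

d1 <- weighted_distance(x, p, w, r = 1)
d2 <- weighted_distance(x, p, w, r = 2)

# |x_k - p_k| is always 0 or 1 here, so the exponent changes nothing
same_result <- isTRUE(all.equal(d1, d2))
```

Only the second and third features differ between x and p, so both distances reduce to the sum of those two weights.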

Value

A data frame with nrow(data) rows and two columns:

distance

Numeric vector of weighted distances from each observation to the prototype

similarity

Numeric vector of similarity scores, computed as exp(-g * distance)

A prototypeComputation object containing:

distance

Data frame of distances from each observation to each prototype

similarity

Data frame of similarity scores for each observation to each prototype

probabilities

Data frame of category membership probabilities

data

The original input data

The object also stores the input parameters as attributes.

Details

The function implements a prototype-based categorization model where:

1. **Distance Calculation**: For each observation \(x\) and prototype \(P_j\): $$d(x, P_j) = \sum_{k=1}^{K} w_k |x_k - P_{j,k}|^r$$

2. **Similarity Calculation**: $$s(x, P_j) = \exp(-g_j \cdot d(x, P_j))$$

3. **Probability Calculation**: $$P(C_j|x) = \frac{s(x, P_j)}{\sum_{i=1}^{n} s(x, P_i)}$$

Here \(K\) is the number of features and \(n\) is the number of prototypes.
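The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the toy data, prototype names, and parameter values are invented, and a single g is shared across all prototypes even though the similarity formula is written with \(g_j\).

```r
# Hypothetical inputs: 3 observations, 4 binary features, 2 prototypes
data <- data.frame(f1 = c(1, 0, 1), f2 = c(1, 0, 0),
                   f3 = c(0, 1, 1), f4 = c(0, 1, 0))
prototypes <- list(A = c(1, 1, 0, 0), B = c(0, 0, 1, 1))
w <- c(0.25, 0.25, 0.25, 0.25)  # attention weights: non-negative, sum to 1
g <- 2                          # one sensitivity for all prototypes (assumption)
r <- 1L                         # distance metric exponent

x <- as.matrix(data)

# Step 1: weighted distance from every observation to every prototype
distance <- sapply(prototypes, function(p) {
  diffs <- abs(sweep(x, 2, p, "-"))^r  # |x_k - P_jk|^r for each row
  as.vector(diffs %*% w)               # weighted sum over features
})

# Step 2: similarity decays exponentially with distance
similarity <- exp(-g * distance)

# Step 3: normalise similarities across prototypes to get probabilities
probabilities <- similarity / rowSums(similarity)
```

Each of distance, similarity, and probabilities is a matrix with one row per observation and one column per prototype. The third observation differs from both prototypes on exactly two equally weighted features, so it sits at the same distance from each and its two probabilities are equal.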