Calculate Distances and Similarities to Multiple Prototypes

Calculates distances and similarity scores from a set of observations to one or more prototypes using a weighted distance metric, and converts the similarities into category membership probabilities. This function implements the core distance and similarity calculations used in prototype-based classification models.

Usage

compute(data, prototypes, w, g, r = 1L)

Arguments

data

A data frame of binary features (0s and 1s), as returned by make_binary_data. Each row represents an observation and each column represents a binary feature.

prototypes

A list of prototype vectors. Each vector must have length ncol(data) and contain only binary values (0 or 1).

w

A numeric vector of attention weights, one for each feature. Requirements:

Length: Equal to ncol(data), which is also the length of each prototype vector
Values: Non-negative and summing to 1
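
As a minimal sketch, the constraints on w could be checked before calling the function roughly as follows. The helper name check_weights and the tolerance tol are hypothetical and not part of the package; the tolerance is only there because floating-point sums rarely equal 1 exactly.

```r
# Hypothetical helper (not part of the package): checks the constraints on w.
check_weights <- function(w, n_features, tol = 1e-8) {
  is.numeric(w) &&
    length(w) == n_features &&   # one weight per feature
    all(w >= 0) &&               # non-negative
    abs(sum(w) - 1) < tol        # sums to 1, up to floating-point error
}

w_ok  <- c(0.5, 0.3, 0.2)   # non-negative, sums to 1
w_bad <- c(0.5, 0.3, 0.3)   # sums to 1.1

check_weights(w_ok, n_features = 3)
check_weights(w_bad, n_features = 3)
```

The first call should report that w_ok is acceptable, the second that w_bad is not.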

g

A numeric sensitivity parameter controlling the steepness of the similarity function. Must be non-negative (>= 0). Higher values make similarity fall off more quickly as distance increases; with g = 0, every observation has similarity 1 to every prototype.

r

Integer specifying the distance metric type. With binary data the choice has no effect on the result: each absolute difference is either 0 or 1, and raising 0 or 1 to the power r leaves it unchanged.

1: Manhattan distance (L1 norm)
2: Euclidean distance (L2 norm)
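
A small base R check illustrates why r makes no difference for binary features, assuming the distance is the weighted sum of powered absolute differences given in Details (with no outer root). The vectors x and p are illustrative only.

```r
# Every combination of two binary values: (0,0), (0,1), (1,0), (1,1).
x <- c(0, 0, 1, 1)
p <- c(0, 1, 0, 1)
abs_diff <- abs(x - p)   # each element is 0 or 1

# Raising 0 or 1 to the power r leaves it unchanged, so r = 1 and r = 2 agree.
identical(abs_diff^1, abs_diff^2)
```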

Value

A prototypeComputation object containing:

distance: Data frame of weighted distances from each of the nrow(data) observations to each prototype
similarity: Data frame of similarity scores for each observation to each prototype, computed as exp(-g * distance)
probabilities: Data frame of category membership probabilities
data: The original input data

The object also stores the input parameters as attributes.

Details

The function implements a prototype-based categorization model where:

1. **Distance Calculation**: For each observation $x$ and prototype $P_j$: $$d(x, P_j) = \sum_{k=1}^{K} w_k |x_k - P_{j,k}|^r$$

2. **Similarity Calculation**: $$s(x, P_j) = \exp(-g \cdot d(x, P_j))$$

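3. **Probability Calculation**: $$P(C_j|x) = \frac{s(x, P_j)}{\sum_{i=1}^{n} s(x, P_i)}$$

Here $K$ is the number of features, $n$ is the number of prototypes, and $C_j$ is the category represented by prototype $P_j$.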
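
The three steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy data, the prototype names A and B, and all parameter values are hypothetical, and a single scalar g is assumed to be shared by all prototypes.

```r
# Hypothetical toy inputs; names and values are illustrative only.
data <- data.frame(f1 = c(1, 0, 1), f2 = c(1, 0, 0), f3 = c(0, 1, 0))
prototypes <- list(A = c(1, 1, 0), B = c(0, 0, 1))
w <- c(0.5, 0.3, 0.2)   # attention weights, sum to 1
g <- 2                  # sensitivity (assumed scalar)
r <- 1L                 # distance exponent

x <- as.matrix(data)

# Step 1: weighted distance from every observation to every prototype.
distance <- sapply(prototypes, function(p) {
  abs_diff <- abs(sweep(x, 2, p, "-"))   # |x_k - P_jk| for each row
  as.vector(abs_diff^r %*% w)            # weighted sum over features
})

# Step 2: similarity decays exponentially with distance.
similarity <- exp(-g * distance)

# Step 3: normalise similarities across prototypes for each observation.
probabilities <- similarity / rowSums(similarity)
```

In this toy example the third observation differs from prototype A only on the second feature, so its distance to A is 0.3, and it differs from prototype B on the first and third features, so its distance to B is 0.7. Each row of probabilities sums to 1.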