Calculates distances and similarity scores from a set of observations to a single prototype using weighted distance metrics. This function implements the core distance and similarity calculations used in prototype-based classification models.
Arguments
- data
A data frame of binary features (0s and 1s), as returned by make_binary_data. Each row represents an observation and each column represents a binary feature.
- prototypes
A list of prototype vectors. Each must have the same length as the number of columns in data and contain only binary values (0 or 1).
- w
A numeric vector of attention weights, one for each feature. Must be:
- Length
Equal to length(P) and ncol(data)
- Values
Non-negative and sum to 1
- g
A numeric sensitivity parameter controlling the steepness of the similarity function. Must be non-negative (>= 0). Higher values make the similarity function more sensitive to distances.
- r
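Integer specifying the distance metric type. With binary data the choice has no effect on the result, because each term |x_k - P_k|^r in the distance formula given under Details is either 0 or 1 for any r.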
- 1
Manhattan distance (L1 norm)
- 2
Euclidean distance (L2 norm)
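The note on r can be checked with a small sketch. It assumes the distance formula given under Details (a weighted sum with no root taken); `weighted_distance` is a hypothetical helper written for this illustration, not a function exported by the package.

```r
# Hypothetical helper (not part of the package): weighted distance
# d = sum_k w_k * |x_k - p_k|^r, with no root applied.
weighted_distance <- function(x, p, w, r) {
  sum(w * abs(x - p)^r)
}

x <- c(1, 0, 1, 1)          # one binary observation
p <- c(1, 1, 0, 1)          # one binary prototype
w <- c(0.4, 0.3, 0.2, 0.1)  # attention weights

# Constraints stated for w: one weight per feature, non-negative, summing to 1
stopifnot(length(w) == length(p), all(w >= 0), isTRUE(all.equal(sum(w), 1)))

d1 <- weighted_distance(x, p, w, r = 1)
d2 <- weighted_distance(x, p, w, r = 2)

# For binary inputs every |x_k - p_k| is 0 or 1, so raising it to r changes nothing
isTRUE(all.equal(d1, d2))
```

With non-binary features the two settings would generally differ, which is why r only matters outside the binary case.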
Value
A data frame with nrow(data) rows and two columns:
- distance
Numeric vector of weighted distances from each observation to the prototype
- similarity
Numeric vector of similarity scores, computed as exp(-g * distance)
A prototypeComputation object containing:
- distance
Data frame of distances from each observation to each prototype
- similarity
Data frame of similarity scores for each observation to each prototype
- probabilities
Data frame of category membership probabilities
- data
The original input data
The object also stores the input parameters as attributes.
Details
The function implements a prototype-based categorization model where:
1. **Distance Calculation**: For each observation \(x\) and prototype \(P_j\), summing over the \(K\) features: $$d(x, P_j) = \sum_{k=1}^{K} w_k |x_k - P_{j,k}|^r$$
2. **Similarity Calculation**: $$s(x, P_j) = \exp(-g \cdot d(x, P_j))$$
3. **Probability Calculation**: With \(n\) prototypes, the probability that \(x\) belongs to category \(C_j\) is: $$P(C_j|x) = \frac{s(x, P_j)}{\sum_{i=1}^{n} s(x, P_i)}$$
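The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `prototype_sketch` is a hypothetical name, g is treated as a single scalar as in the argument description, and the result is a plain list instead of a prototypeComputation object.

```r
# Hypothetical sketch of the three steps; names and return shape are assumptions.
prototype_sketch <- function(data, prototypes, w, g, r = 1) {
  X <- as.matrix(data)

  # Step 1: weighted distance from every observation (row) to every prototype
  distance <- vapply(
    prototypes,
    function(p) as.vector(abs(sweep(X, 2, p))^r %*% w),
    numeric(nrow(X))
  )
  distance <- matrix(distance, nrow = nrow(X))
  colnames(distance) <- names(prototypes)

  # Step 2: exponential decay of similarity with distance
  similarity <- exp(-g * distance)

  # Step 3: normalise similarities across prototypes so each row sums to 1
  probabilities <- similarity / rowSums(similarity)

  list(
    distance      = as.data.frame(distance),
    similarity    = as.data.frame(similarity),
    probabilities = as.data.frame(probabilities)
  )
}

# Toy inputs: three observations, three binary features, two prototypes
data <- data.frame(f1 = c(1, 0, 1), f2 = c(1, 0, 0), f3 = c(0, 1, 1))
prototypes <- list(A = c(1, 1, 0), B = c(0, 0, 1))
w <- c(0.5, 0.3, 0.2)

res <- prototype_sketch(data, prototypes, w, g = 2)
res$probabilities
```

In this toy data the third observation is equally distant from both prototypes (0.5 each), so its two membership probabilities come out equal.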