
Generates correlated binary data. The function first generates multivariate normal data with the specified correlations, then transforms it to binary data while preserving the correlation structure. This is known as a Gaussian copula approach.
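
The two steps described above can be sketched in base R. This is only an illustration of the general Gaussian copula idea, not the package's source: `sketch_binary()` is a hypothetical name, and the Cholesky-based sampler and the `qnorm()` cutoffs are assumptions about how such a transformation could be done.

```r
# Illustrative Gaussian copula sketch; sketch_binary() is a hypothetical
# helper, not part of the package.
sketch_binary <- function(marginals, rho, obs = 1000) {
  K <- length(marginals)
  stopifnot(is.matrix(rho), all(dim(rho) == K))

  # Step 1: latent multivariate normal draws with correlation matrix rho.
  # chol() returns upper-triangular R with t(R) %*% R == rho, so
  # independent N(0, 1) columns multiplied by R have covariance rho.
  R <- chol(rho)
  z <- matrix(rnorm(obs * K), nrow = obs, ncol = K) %*% R

  # Step 2: threshold each column so that P(x_j = 1) = marginals[j].
  cutoffs <- qnorm(marginals)
  x <- sweep(z, 2, cutoffs, FUN = "<=") * 1L

  x <- as.data.frame(x)
  names(x) <- paste0("x", seq_len(K))
  x
}

set.seed(1)
p <- c(0.2, 0.5, 0.7)
rho <- matrix(c(1.0, 0.5, 0.2,
                0.5, 1.0, 0.3,
                0.2, 0.3, 1.0), nrow = 3)
x <- sketch_binary(p, rho, obs = 5000)
round(colMeans(x), 2)  # empirical marginals, expected to be close to p
```

In this sketch, `rho` is the correlation of the latent normal variables. The Pearson correlations of the resulting 0/1 columns keep the same sign pattern but are generally smaller in magnitude.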

Usage

make_binary_data(marginals, rho, obs = 1000)

Arguments

marginals

A numeric vector of marginal probabilities for each variable.

rho

A symmetric correlation matrix with dimensions matching the length of marginals.

obs

Integer. Number of observations (rows) to generate.
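
The argument requirements above can be checked before calling the function. The helper below is a hypothetical sketch: `check_inputs()` is not part of the package, and the rules that marginals lie strictly between 0 and 1 and that `rho` is positive definite are assumptions rather than documented behavior.

```r
# Hypothetical pre-flight check for marginals and rho (not a package function).
check_inputs <- function(marginals, rho) {
  K <- length(marginals)
  stopifnot(
    is.numeric(marginals),
    all(marginals > 0 & marginals < 1),   # assumed: probabilities in (0, 1)
    is.matrix(rho),
    nrow(rho) == K, ncol(rho) == K,       # dimensions match length(marginals)
    isSymmetric(unname(rho)),
    all(abs(diag(rho) - 1) < 1e-8),       # unit diagonal
    # assumed: positive definite, i.e. all eigenvalues strictly positive
    all(eigen(rho, symmetric = TRUE, only.values = TRUE)$values > 0)
  )
  invisible(TRUE)
}

ok <- check_inputs(c(0.3, 0.6), diag(2))
bad <- try(check_inputs(c(0.3, 0.6), diag(3)), silent = TRUE)
```

Here `ok` is `TRUE`, while `bad` holds the error raised by the dimension mismatch.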

Value

A prototypeData object containing:

Binary data

A data frame with obs rows and length(marginals) columns.

params attribute

A list containing the original marginals and correlation matrix.

Examples

# Generate 8-dimensional correlated binary data
K <- 8
marginals <- rbeta(K, 2, 3)
rho <- rlkjcorr(1, K, eta = 1 / 4)
out <- make_binary_data(marginals, rho)
out
#> 
#> ── Data ──
#> 
#> 1000 obs. of  8 variables:
#>  $ x1: int  1 0 0 0 0 1 0 0 0 0 ...
#>  $ x2: int  1 0 1 0 0 0 1 0 0 1 ...
#>  $ x3: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ x4: int  0 0 0 1 1 0 0 0 0 1 ...
#>  $ x5: int  0 1 0 1 1 1 1 0 0 1 ...
#>  $ x6: int  1 1 1 0 0 0 1 1 1 1 ...
#>  $ x7: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ x8: int  0 1 1 0 0 0 0 0 1 0 ...
#> 
#> 
#> ── Parameters ──
#> 
#> ── Marginal Probabilities: 
#>   x1   x2   x3   x4   x5   x6   x7   x8 
#> 0.12 0.47 0.03 0.40 0.56 0.70 0.07 0.34 
#> 
#> 
#> ── Correlation Matrix: 
#>       x1    x2    x3    x4    x5    x6    x7    x8
#> x1  1.00 -0.11  0.20  0.07  0.33  0.32  0.07 -0.19
#> x2 -0.11  1.00  0.63 -0.25 -0.59  0.78  0.09  0.66
#> x3  0.20  0.63  1.00 -0.19 -0.09  0.57  0.54  0.42
#> x4  0.07 -0.25 -0.19  1.00  0.35 -0.56 -0.02 -0.54
#> x5  0.33 -0.59 -0.09  0.35  1.00 -0.45  0.12 -0.52
#> x6  0.32  0.78  0.57 -0.56 -0.45  1.00 -0.14  0.58
#> x7  0.07  0.09  0.54 -0.02  0.12 -0.14  1.00  0.22
#> x8 -0.19  0.66  0.42 -0.54 -0.52  0.58  0.22  1.00