Package 'mlf' reference manual

Title:	Machine Learning Foundations
Description:	Offers a gentle introduction to machine learning concepts for practitioners with a statistical pedigree: decomposition of model error (bias-variance trade-off), nonlinear correlations, information theory and functional permutation/bootstrap simulations. Székely GJ, Rizzo ML, Bakirov NK. (2007). <doi:10.1214/009053607000000505>. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. (2011). <doi:10.1126/science.1205438>.
Authors:	Kyle Peterson [aut, cre]
Maintainer:	Kyle Peterson <[email protected]>
License:	GPL-2
Version:	1.2.1
Built:	2025-02-10 03:12:29 UTC
Source:	https://github.com/cran/mlf

Bootstrap Confidence Intervals via Resampling

Description

Provides nonparametric confidence intervals via percentile-based resampling for a given `mlf` function.

Usage

boot(x, y, func, reps, conf.int)

Arguments

`x`, `y`	numeric vectors of data values
`func`	specify `mlf` function
`reps`	(optional) number of resamples. Defaults to 500.
`conf.int`	(optional) numeric value indicating level of confidence. Defaults to `0.90`.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mic(a, b)
mlf::boot(a, b, mic)
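
The percentile method named in the description can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `percentile_boot` is a hypothetical helper, it resamples (x, y) pairs with replacement, and `cor` stands in for the statistic of interest.

```r
# Hypothetical helper, not part of mlf: percentile bootstrap interval for a
# pairwise statistic. `stat` is any function(x, y) returning one number.
percentile_boot <- function(x, y, stat = cor, reps = 500, conf.int = 0.90) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  # Resample (x, y) pairs with replacement and recompute the statistic.
  est <- replicate(reps, {
    idx <- sample.int(n, n, replace = TRUE)
    stat(x[idx], y[idx])
  })
  alpha <- 1 - conf.int
  # The interval is the central conf.int share of the resampled estimates.
  quantile(est, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
a <- rnorm(25, 80, 35)
b <- a + rnorm(25, 0, 20)
ci <- percentile_boot(a, b)
ci
```

Resampling pairs rather than each vector separately keeps the dependence between the two samples intact, which is what a pairwise statistic needs.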

Bias-Variance Trade-Off

Description

Provides estimated error decomposition from model predictions (mse, bias, variance).

Usage

bvto(truth, estimate)

Arguments

`truth`	test data vector or baseline accuracy to test against.
`estimate`	predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::bvto(test, predicted)
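
For a single fixed target value, mean squared error splits exactly into squared bias plus variance. The sketch below checks that identity in base R. `decompose_error` is a hypothetical helper and assumes a scalar `truth`; how `mlf` treats a vector of test values is not stated here.

```r
# Hypothetical helper, not the mlf implementation: decompose the squared
# error of repeated estimates of one fixed target value.
decompose_error <- function(truth, estimate) {
  stopifnot(length(truth) == 1)
  mse      <- mean((estimate - truth)^2)
  bias_sq  <- (mean(estimate) - truth)^2
  # Population variance (divide by n) so the identity below is exact.
  variance <- mean((estimate - mean(estimate))^2)
  c(mse = mse, bias_sq = bias_sq, variance = variance)
}

set.seed(1)
predicted <- rnorm(25, 85, 10)
parts <- decompose_error(80, predicted)
parts
# mse equals bias_sq + variance up to floating-point error
all.equal(parts[["mse"]], parts[["bias_sq"]] + parts[["variance"]])
```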

Distance Correlation

Description

Provides pairwise correlation via distance covariance normalized by distance standard deviation. Allows for non-linear dependencies.

Usage

distcorr(x, y)

Arguments

`x`, `y`	numeric vectors of data values

References

Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Stat. 2007. 35(6):2769-2794.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::distcorr(a, b)
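
The sample distance correlation of Székely, Rizzo and Bakirov (2007) can be sketched in base R from double-centered distance matrices. `dist_corr` is a hypothetical helper written for illustration and may differ from the package's implementation.

```r
# Hypothetical helper, not the mlf implementation: sample distance
# correlation for two numeric vectors of equal length.
dist_corr <- function(x, y) {
  stopifnot(length(x) == length(y))
  # Double-center a pairwise distance matrix: subtract row and column
  # means, then add back the grand mean.
  center <- function(v) {
    d <- as.matrix(dist(v))
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- center(x)
  B <- center(y)
  dcov2  <- max(mean(A * B), 0)  # squared distance covariance
  dvarx2 <- mean(A * A)          # squared distance variance of x
  dvary2 <- mean(B * B)          # squared distance variance of y
  if (dvarx2 * dvary2 == 0) return(0)
  sqrt(dcov2 / sqrt(dvarx2 * dvary2))
}

x <- seq(-1, 1, length.out = 51)
r_lin  <- dist_corr(x, 2 * x + 1)  # exact linear relation: approximately 1
r_quad <- dist_corr(x, x^2)        # nonlinear relation: clearly above 0
cor(x, x^2)                        # Pearson correlation is near 0 here
```

The quadratic case shows the point of the measure: Pearson correlation misses the dependence while distance correlation does not.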

Entropy

Description

Estimates the uncertainty in a univariate probability distribution.

Usage

entropy(x, bins)

Arguments

`x`	numeric or discrete data vector
`bins`	number of bins to use when `x` is of numeric or integer class.

Examples

# Sample numeric vector
a <- rnorm(25, 80, 35)
mlf::entropy(a, bins = 2)

# Sample discrete vector
b <- as.factor(c(1,1,1,2))
mlf::entropy(b)
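
A plug-in estimate of Shannon entropy can be sketched in base R. `shannon_entropy` is a hypothetical helper. It assumes equal-width bins for numeric data and a base-2 logarithm (bits); the package's binning rule and log base are not stated here and may differ.

```r
# Hypothetical helper, not the mlf implementation: plug-in Shannon entropy.
# Assumptions: equal-width bins for numeric data, logarithm base 2 (bits).
shannon_entropy <- function(x, bins = 10) {
  if (is.numeric(x)) x <- cut(x, breaks = bins)
  p <- as.numeric(table(x)) / length(x)
  p <- p[p > 0]                # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

h_even   <- shannon_entropy(factor(c(1, 1, 2, 2)))  # two equally likely levels: 1 bit
h_uneven <- shannon_entropy(factor(c(1, 1, 1, 2)))  # less uncertain, so below 1 bit
c(h_even, h_uneven)
```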

Bias

Description

Estimates squared bias by decomposing model prediction error.

Usage

get_bias(truth, estimate)

Arguments

`truth`	test data vector or baseline accuracy to test against.
`estimate`	predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::get_bias(test, predicted)

Mean Squared Error

Description

Estimates mean squared error from model predictions.

Usage

get_mse(truth, estimate)

Arguments

`truth`	test data vector or baseline accuracy to test against.
`estimate`	predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::get_mse(test, predicted)

Variance

Description

Estimates variance by decomposing model prediction error.

Usage

get_var(estimate)

Arguments

`estimate`	predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::get_var(predicted)

Joint Entropy

Description

Estimates the uncertainty in the joint probability distribution of two variables.

Usage

jointentropy(x, y, bins)

Arguments

`x`, `y`	numeric or discrete data vectors
`bins`	specify number of bins

Examples

# Sample numeric vector
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::jointentropy(a, b, bins = 2)

# Sample discrete vector
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::jointentropy(a, b)
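
Joint entropy is commonly defined as the Shannon entropy of the joint distribution of the two variables. A minimal base R sketch follows. `joint_entropy` is a hypothetical helper, and equal-width bins and a base-2 logarithm are assumptions rather than documented package behavior.

```r
# Hypothetical helper, not the mlf implementation: plug-in joint entropy
# H(X, Y) in bits from the two-way contingency table.
joint_entropy <- function(x, y, bins = 10) {
  stopifnot(length(x) == length(y))
  if (is.numeric(x)) x <- cut(x, breaks = bins)
  if (is.numeric(y)) y <- cut(y, breaks = bins)
  p <- as.numeric(table(x, y)) / length(x)
  p <- p[p > 0]                # empty cells contribute nothing
  -sum(p * log2(p))
}

a <- factor(c(1, 1, 2, 2))
b <- factor(c(1, 1, 1, 2))
h_ab <- joint_entropy(a, b)  # cell probabilities 0.5, 0.25, 0.25: 1.5 bits
h_aa <- joint_entropy(a, a)  # H(X, X) equals H(X): 1 bit
c(h_ab, h_aa)
```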

Kullback-Leibler Divergence

Description

Provides estimated difference between individual entropy and cross-entropy of two probability distributions.

Usage

kld(x, y, bins)

Arguments

`x`, `y`	numeric or discrete data vectors
`bins`	specify number of bins

Examples

# Sample numeric vector
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::kld(a, b, bins = 2)

# Sample discrete vector
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::kld(a, b)
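
The relation "cross-entropy minus entropy" from the description can be sketched for discrete data in base R. `kl_divergence` is a hypothetical helper that assumes a base-2 logarithm and shared category levels. It is an illustration, not the package's code.

```r
# Hypothetical helper, not the mlf implementation: plug-in KL divergence
# D(P || Q) in bits between the empirical distributions of x and y.
kl_divergence <- function(x, y) {
  lv <- union(levels(factor(x)), levels(factor(y)))
  p <- as.numeric(table(factor(x, levels = lv))) / length(x)
  q <- as.numeric(table(factor(y, levels = lv))) / length(y)
  keep <- p > 0                        # terms with p = 0 contribute 0
  cross_entropy <- -sum(p[keep] * log2(q[keep]))
  entropy       <- -sum(p[keep] * log2(p[keep]))
  cross_entropy - entropy              # Inf when q = 0 somewhere p > 0
}

a <- factor(c(1, 1, 2, 2))
b <- factor(c(1, 1, 1, 2))
d_ab <- kl_divergence(a, b)
d_aa <- kl_divergence(a, a)  # identical distributions: 0
c(d_ab, d_aa)
```

The divergence is not symmetric, so `kl_divergence(a, b)` and `kl_divergence(b, a)` generally differ.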

Mutual Information

Description

Estimates the Kullback-Leibler divergence between the joint distribution and the product of the two marginal distributions. Roughly speaking, this is the amount of information one variable provides about another.

Usage

mi(x, y)

Arguments

`x`, `y`	numeric or discrete data vectors

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mi(a, b)
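
The definition in the description (divergence of the joint distribution from the product of the marginals) can be sketched in base R. `mutual_info` is a hypothetical helper. Its `bins` argument, the equal-width binning, and the base-2 logarithm are assumptions made for the illustration.

```r
# Hypothetical helper, not the mlf implementation: plug-in mutual
# information in bits, computed as D(joint || product of marginals).
mutual_info <- function(x, y, bins = 5) {
  stopifnot(length(x) == length(y))
  if (is.numeric(x)) x <- cut(x, breaks = bins)
  if (is.numeric(y)) y <- cut(y, breaks = bins)
  joint <- table(x, y) / length(x)
  indep <- outer(rowSums(joint), colSums(joint))  # product of marginals
  keep  <- joint > 0                              # empty cells contribute 0
  sum(joint[keep] * log2(joint[keep] / indep[keep]))
}

a <- factor(c(1, 1, 2, 2))
b <- factor(c(1, 2, 1, 2))
mi_same  <- mutual_info(a, a)  # a variable fully determines itself: 1 bit
mi_indep <- mutual_info(a, b)  # joint equals product of marginals: 0 bits
c(mi_same, mi_indep)
```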

Maximal Information Coefficient

Description

Information-theoretic approach for detecting non-linear pairwise dependencies. Employs heuristic discretization to achieve highest normalized mutual information.

Usage

mic(x, y)

Arguments

`x`, `y`	numeric or discrete data vectors

References

Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. Detecting novel associations in large data sets. Science. 2011. 334(6062):1518-1524.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mic(a, b)

Permutation Test

Description

Provides nonparametric statistical significance via sample randomization.

Usage

perm(x, y, func, reps)

Arguments

`x`, `y`	numeric vectors of data values
`func`	specify `mlf` function: (`distcorr` or `mic`).
`reps`	(optional) number of resamples. Defaults to 500.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mic(a, b)
mlf::perm(a, b, mic)
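
The idea of significance via sample randomization can be sketched in base R. `perm_test` is a hypothetical helper. It uses the absolute Pearson correlation as a stand-in statistic and a one-sided p-value with an add-one correction. These choices are assumptions and may not match what `perm` does internally.

```r
# Hypothetical helper, not the mlf implementation: permutation p-value for
# a pairwise dependence statistic (larger values = stronger dependence).
perm_test <- function(x, y, stat = function(x, y) abs(cor(x, y)), reps = 500) {
  stopifnot(length(x) == length(y))
  observed <- stat(x, y)
  # Shuffling y breaks any pairing with x, simulating the null hypothesis.
  null <- replicate(reps, stat(x, sample(y)))
  # Add 1 to numerator and denominator so the p-value is never exactly 0.
  (sum(null >= observed) + 1) / (reps + 1)
}

set.seed(1)
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
p_indep <- perm_test(a, b)                          # independent draws
p_dep   <- perm_test(a, 2 * a + rnorm(25, 0, 5))    # strongly dependent pair
c(p_indep, p_dep)
```

With `reps = 500` the smallest attainable p-value is 1/501, so more resamples are needed to resolve very small p-values.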
