Package 'mlf'

Title: Machine Learning Foundations
Description: Offers a gentle introduction to machine learning concepts for practitioners with a statistical background: decomposition of model error (bias-variance trade-off), nonlinear correlation measures, information theory, and permutation/bootstrap resampling. Székely GJ, Rizzo ML, Bakirov NK (2007) <doi:10.1214/009053607000000505>. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) <doi:10.1126/science.1205438>.
Authors: Kyle Peterson [aut, cre]
Maintainer: Kyle Peterson <[email protected]>
License: GPL-2
Version: 1.2.1
Built: 2024-11-12 03:39:27 UTC
Source: https://github.com/cran/mlf

Help Index


Bootstrap Confidence Intervals via Resampling

Description

Provides nonparametric confidence intervals via percentile-based resampling for a given mlf function.

Usage

boot(x, y, func, reps, conf.int)

Arguments

x, y

numeric vectors of data values

func

specify the mlf function whose statistic is resampled

reps

(optional) number of resamples. Defaults to 500

conf.int

(optional) numeric value indicating level of confidence. Defaults to 0.90.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mic(a, b)
mlf::boot(a, b, mlf::mic)
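For intuition, the percentile bootstrap can be sketched in a few lines of base R. This is an illustrative sketch, not mlf's internal implementation; the helper name boot_ci is ours, and base R's cor stands in for an mlf statistic.

```r
# Illustrative percentile bootstrap, not mlf's internals:
# resample (x, y) pairs, recompute the statistic, take quantiles.
boot_ci <- function(x, y, func, reps = 500, conf.int = 0.90) {
  stats <- replicate(reps, {
    idx <- sample(seq_along(x), replace = TRUE)  # resample pairs with replacement
    func(x[idx], y[idx])
  })
  alpha <- 1 - conf.int
  quantile(stats, probs = c(alpha / 2, 1 - alpha / 2))  # percentile interval
}

set.seed(1)
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
boot_ci(a, b, cor)  # 90% percentile interval for Pearson correlation
```

Resampling x and y jointly (by a shared index) preserves the pairing, which is what a confidence interval for a dependence statistic requires.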

Bias-Variance Trade-Off

Description

Provides estimated error decomposition from model predictions (mse, bias, variance).

Usage

bvto(truth, estimate)

Arguments

truth

test data vector or baseline accuracy to test against.

estimate

predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::bvto(test, predicted)
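The decomposition that bvto reports can be checked by hand. Assuming a scalar baseline truth, mean squared error splits exactly into squared bias plus (population) variance; the helper names below are ours, not mlf's.

```r
# Plug-in decomposition against a scalar baseline: MSE = bias^2 + variance.
mse_hat   <- function(truth, est) mean((est - truth)^2)
bias2_hat <- function(truth, est) (mean(est) - truth)^2
var_hat   <- function(est) mean((est - mean(est))^2)  # population variance (divides by n)

set.seed(1)
predicted <- rnorm(25, 80, 50)
lhs <- mse_hat(80, predicted)
rhs <- bias2_hat(80, predicted) + var_hat(predicted)
all.equal(lhs, rhs)  # TRUE: the identity is exact for a scalar truth
```

The cross term vanishes because the deviations around the mean prediction sum to zero, which is why the identity holds exactly rather than approximately.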

Distance Correlation

Description

Provides pairwise correlation via distance covariance normalized by the product of distance standard deviations. Captures non-linear as well as linear dependencies.

Usage

distcorr(x, y)

Arguments

x, y

numeric vectors of data values

References

Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Stat. 2007. 35(6):2769-2794.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::distcorr(a, b)
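The sample distance correlation of Székely et al. can be computed directly from doubly centered pairwise-distance matrices. A compact sketch (the helper dcor_hat is ours, not mlf's internal code):

```r
# Sample distance correlation from doubly centered distance matrices.
dcor_hat <- function(x, y) {
  ctr <- function(m) {  # double centering: subtract row/col means, add grand mean
    m - matrix(rowMeans(m), nrow(m), ncol(m)) -
      matrix(colMeans(m), nrow(m), ncol(m), byrow = TRUE) + mean(m)
  }
  A <- ctr(as.matrix(dist(x)))
  B <- ctr(as.matrix(dist(y)))
  dcov2 <- mean(A * B)  # squared sample distance covariance
  sqrt(dcov2 / sqrt(mean(A * A) * mean(B * B)))
}

x <- seq_len(20)
dcor_hat(x, x^2)  # close to 1: picks up the monotone nonlinear relation
```

Unlike Pearson correlation, the statistic is zero (in the population) only under independence, which is what makes it useful for non-linear dependencies.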

Entropy

Description

Estimates uncertainty in a univariate probability distribution.

Usage

entropy(x, bins)

Arguments

x

numeric or discrete data vector

bins

number of bins to use when x is of numeric or integer class.

Examples

# Sample numeric vector
a <- rnorm(25, 80, 35)
mlf::entropy(a, bins = 2)

# Sample discrete vector
b <- as.factor(c(1,1,1,2))
mlf::entropy(b)
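The plug-in estimator behind this kind of entropy estimate is short. A sketch (our helper, using the natural log; base 2 is an equally common convention, so mlf's scale may differ by a constant factor):

```r
# Plug-in Shannon entropy: bin if numeric, tabulate, then -sum(p * log(p)).
entropy_hat <- function(x, bins = NULL) {
  if (is.numeric(x)) x <- cut(x, breaks = bins)  # discretize numeric input
  p <- as.vector(table(x)) / length(x)
  p <- p[p > 0]                                  # 0 * log(0) is taken as 0
  -sum(p * log(p))
}

entropy_hat(as.factor(c(1, 1, 2, 2)))  # log(2): a fair two-way split
```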

Bias

Description

Estimates squared bias by decomposing model prediction error.

Usage

get_bias(truth, estimate)

Arguments

truth

test data vector or baseline accuracy to test against.

estimate

predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::get_bias(test, predicted)
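As a point of reference, a plug-in squared-bias estimate fits in one line (our helper; we assume truth enters through its mean, as in the usual bias-variance decomposition):

```r
# Plug-in squared bias: squared gap between mean prediction and mean truth.
bias2_hat <- function(truth, estimate) (mean(estimate) - mean(truth))^2

bias2_hat(c(80, 80), c(78, 86))  # (82 - 80)^2 = 4
```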

Mean Squared Error

Description

Estimates mean squared error from model predictions.

Usage

get_mse(truth, estimate)

Arguments

truth

test data vector or baseline accuracy to test against.

estimate

predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::get_mse(test, predicted)
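The estimator itself is the familiar average of squared residuals; a one-line sketch (our helper, not mlf's code):

```r
# Mean squared error: average squared difference between truth and estimate.
mse_hat <- function(truth, estimate) mean((truth - estimate)^2)

mse_hat(c(0, 0), c(1, 3))  # (1 + 9) / 2 = 5
```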

Variance

Description

Estimates variance by decomposing model prediction error.

Usage

get_var(estimate)

Arguments

estimate

predicted vector

Examples

# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

mlf::get_var(predicted)
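Note that decomposition-style variance is typically the population (divide-by-n) version, which differs from stats::var's unbiased divide-by-(n - 1) estimator. A sketch of the distinction (our helper; we assume the plug-in convention):

```r
# Plug-in (population) variance: divides by n, unlike stats::var's n - 1.
var_hat <- function(estimate) mean((estimate - mean(estimate))^2)

var_hat(c(1, 3))     # 1
stats::var(c(1, 3))  # 2: the unbiased n - 1 version
```

Only the divide-by-n version makes the identity MSE = bias^2 + variance exact.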

Joint Entropy

Description

Estimates the combined uncertainty of two probability distributions, i.e. the entropy of their joint distribution.

Usage

jointentropy(x, y, bins)

Arguments

x, y

numeric or discrete data vectors

bins

specify number of bins

Examples

# Sample numeric vectors
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::jointentropy(a, b, bins = 2)

# Sample discrete vectors
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::jointentropy(a, b)
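Joint entropy is just the plug-in entropy applied to the paired (x, y) outcomes; a sketch (our helper, natural log, not mlf's internals):

```r
# Joint entropy: plug-in entropy of the paired (x, y) outcomes.
joint_entropy_hat <- function(x, y, bins = NULL) {
  if (is.numeric(x)) x <- cut(x, breaks = bins)
  if (is.numeric(y)) y <- cut(y, breaks = bins)
  p <- as.vector(table(x, y)) / length(x)  # cell frequencies of the pairs
  p <- p[p > 0]
  -sum(p * log(p))
}

a <- as.factor(c(1, 1, 2, 2))
b <- as.factor(c(1, 1, 1, 2))
joint_entropy_hat(a, b)  # 1.5 * log(2): cells with p = .5, .25, .25
```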

Kullback-Leibler Divergence

Description

Provides the estimated difference between the cross-entropy of two probability distributions and the entropy of the first (i.e., relative entropy).

Usage

kld(x, y, bins)

Arguments

x, y

numeric or discrete data vectors

bins

specify number of bins

Examples

# Sample numeric vectors
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::kld(a, b, bins = 2)

# Sample discrete vectors
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::kld(a, b)
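For discrete input the divergence reduces to a weighted log-ratio over a shared support. A sketch for pre-discretized vectors (our helper; mlf's kld additionally bins numeric input, and zero-probability cells are simply dropped here, where a practical estimator would smooth them):

```r
# Discrete KL divergence over a shared support (illustrative sketch).
kld_hat <- function(x, y) {
  lv <- union(levels(factor(x)), levels(factor(y)))  # shared support
  p <- as.vector(table(factor(x, levels = lv))) / length(x)
  q <- as.vector(table(factor(y, levels = lv))) / length(y)
  keep <- p > 0 & q > 0  # dropped for brevity; real estimators smooth instead
  sum(p[keep] * log(p[keep] / q[keep]))
}

a <- as.factor(c(1, 1, 2, 2))
kld_hat(a, a)  # 0: identical distributions diverge by nothing
```

Note the asymmetry: kld_hat(a, b) and kld_hat(b, a) generally differ, which is why KL divergence is not a metric.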

Mutual Information

Description

Estimates the Kullback-Leibler divergence between the joint distribution and the product of the two respective marginal distributions. Roughly speaking, the amount of information one variable provides about another.

Usage

mi(x, y)

Arguments

x, y

numeric or discrete data vectors

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mi(a, b)
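An equivalent route uses the entropy identity I(X;Y) = H(X) + H(Y) - H(X,Y); a sketch for discrete input (our helpers, natural log, not mlf's internals):

```r
# Mutual information via the entropy identity I(X;Y) = H(X) + H(Y) - H(X,Y).
h <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
mi_hat <- function(x, y) {
  pxy <- table(x, y) / length(x)              # joint cell frequencies
  h(rowSums(pxy)) + h(colSums(pxy)) - h(pxy)  # marginals minus joint
}

a <- as.factor(c(1, 1, 2, 2))
mi_hat(a, a)  # log(2): a variable shares all of its information with itself
```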

Maximal Information Coefficient

Description

Information-theoretic approach for detecting non-linear pairwise dependencies. Employs heuristic discretization to achieve highest normalized mutual information.

Usage

mic(x, y)

Arguments

x, y

numeric or discrete data vectors

References

Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. Detecting novel associations in large data sets. Science. 2011. 334(6062):1518-1524.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mic(a, b)
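A toy version conveys the idea: search over grids and keep the best normalized mutual information. This sketch tries only equal-width grids, whereas the genuine MIC of Reshef et al. also optimizes grid boundaries and caps grid size relative to the sample size; the helper names are ours.

```r
# Toy MIC-style search: over a few equal-width grids, keep the largest
# mutual information normalized by log(min(grid dimensions)).
toy_mic <- function(x, y, max.bins = 5) {
  h <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  best <- 0
  for (bx in 2:max.bins) {
    for (by in 2:max.bins) {
      pxy <- table(cut(x, bx), cut(y, by)) / length(x)
      mi <- h(rowSums(pxy)) + h(colSums(pxy)) - h(pxy)
      best <- max(best, mi / log(min(bx, by)))  # MI is at most log(min dims)
    }
  }
  best
}

x <- as.numeric(1:20)
toy_mic(x, x)  # 1: a noiseless relation saturates the normalized score
```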

Permutation Test

Description

Provides nonparametric statistical significance via sample randomization.

Usage

perm(x, y, func, reps)

Arguments

x, y

numeric vectors of data values

func

specify mlf function (distcorr or mic).

reps

(optional) number of resamples. Defaults to 500.

Examples

# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)

mlf::mic(a, b)
mlf::perm(a, b, mlf::mic)
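The mechanics can be sketched in a few lines: shuffle one vector to break the pairing, recompute the statistic, and compare against the observed value. An illustrative sketch, not mlf's internals (the helper perm_test is ours; base R's cor stands in for an mlf statistic):

```r
# Illustrative permutation test: shuffle y to destroy any real dependence,
# recompute the statistic, and report the share of permuted values >= observed.
perm_test <- function(x, y, func, reps = 500) {
  observed <- func(x, y)
  null_stats <- replicate(reps, func(x, sample(y)))  # statistic under the null
  mean(null_stats >= observed)                       # one-sided p-value
}

set.seed(1)
x <- seq_len(30)
y <- x + rnorm(30)
perm_test(x, y, cor, reps = 200)  # small p: the association survives shuffling
```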