Title: | Machine Learning Foundations |
---|---|
Description: | Offers a gentle introduction to machine learning concepts for practitioners with a statistical pedigree: decomposition of model error (bias-variance trade-off), nonlinear correlations, information theory and functional permutation/bootstrap simulations. Székely GJ, Rizzo ML, Bakirov NK. (2007). <doi:10.1214/009053607000000505>. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. (2011). <doi:10.1126/science.1205438>. |
Authors: | Kyle Peterson [aut, cre] |
Maintainer: | Kyle Peterson <[email protected]> |
License: | GPL-2 |
Version: | 1.2.1 |
Built: | 2024-11-12 03:39:27 UTC |
Source: | https://github.com/cran/mlf |
Provides nonparametric confidence intervals via percentile-based resampling for a given mlf function.
boot(x, y, func, reps, conf.int)
x, y |
numeric vectors of data values |
func |
specify |
reps |
(optional) number of resamples. Defaults to 500. |
conf.int |
(optional) numeric value indicating level of confidence. Defaults to |
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mic(a, b)
mlf::boot(a, b, mic)
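To make the idea of percentile-based resampling concrete, here is a minimal base R sketch. It is not the mlf implementation: the function name `percentile_boot`, the joint resampling of (x, y) pairs, and the 0.95 default for `conf.int` are all assumptions made for illustration.

```r
# Percentile bootstrap for a two-sample statistic (illustrative, not mlf's code).
# `stat` is any function of two numeric vectors that returns a single number.
percentile_boot <- function(x, y, stat, reps = 500, conf.int = 0.95) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  # Resample (x, y) pairs with replacement and recompute the statistic each time.
  resampled <- replicate(reps, {
    idx <- sample.int(n, n, replace = TRUE)
    stat(x[idx], y[idx])
  })
  alpha <- 1 - conf.int
  list(
    estimate = stat(x, y),
    # The interval is read directly off the quantiles of the resampled statistics.
    conf.int = unname(quantile(resampled, c(alpha / 2, 1 - alpha / 2)))
  )
}

set.seed(1)
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
percentile_boot(a, b, cor)
```

Any statistic with the same two-vector signature could be passed in place of `cor`.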
Provides estimated error decomposition from model predictions (mse, bias, variance).
bvto(truth, estimate)
truth |
test data vector or baseline accuracy to test against. |
estimate |
predicted vector |
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::bvto(test, predicted)
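The decomposition can be checked numerically in the simplest setting, where the truth is a single known value. The sketch below assumes exactly that; `decompose_error` is a hypothetical helper, and mlf may treat vector-valued truth differently.

```r
# Decompose mean squared error around a single known target value.
# Assumption: `truth` is a scalar; this is not mlf's implementation.
decompose_error <- function(truth, estimate) {
  mse <- mean((estimate - truth)^2)
  bias_sq <- (mean(estimate) - truth)^2
  # Population variance (divide by n) so that the identity below is exact.
  variance <- mean((estimate - mean(estimate))^2)
  c(mse = mse, bias_sq = bias_sq, variance = variance)
}

set.seed(1)
predicted <- rnorm(25, 80, 50)
parts <- decompose_error(truth = 80, estimate = predicted)
parts

# MSE equals squared bias plus variance, up to floating-point error.
all.equal(unname(parts["mse"]), unname(parts["bias_sq"] + parts["variance"]))
```

Using the sample variance from `var()` (dividing by n - 1) would break the exact equality, which is why the population form is used here.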
Provides pairwise correlation via distance covariance normalized by distance standard deviation. Allows for non-linear dependencies.
distcorr(x, y)
x, y |
numeric vectors of data values |
Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Stat. 2007. 35(6):2769-2794.
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::distcorr(a, b)
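For readers who want to see the computation spelled out, the following base R sketch follows the sample definition in Székely, Rizzo and Bakirov (2007): double-center each pairwise distance matrix, then normalize the distance covariance by the distance standard deviations. The names `double_center` and `dist_corr` are hypothetical and the code is not taken from the mlf source.

```r
# Double-center a distance matrix: subtract row and column means,
# then add back the grand mean.
double_center <- function(d) {
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

# Sample distance correlation for two numeric vectors of equal length.
dist_corr <- function(x, y) {
  stopifnot(length(x) == length(y))
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  dcov2 <- max(mean(A * B), 0)  # squared distance covariance
  dvar2_x <- mean(A * A)        # squared distance variance of x
  dvar2_y <- mean(B * B)        # squared distance variance of y
  if (dvar2_x * dvar2_y == 0) return(0)  # a constant input carries no dependence
  sqrt(dcov2 / sqrt(dvar2_x * dvar2_y))
}

# A symmetric parabola: Pearson correlation is essentially zero,
# while distance correlation is clearly positive.
x <- seq(-1, 1, length.out = 21)
cor(x, x^2)
dist_corr(x, x^2)
```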
Estimates the uncertainty (entropy) of a univariate probability distribution.
entropy(x, bins)
x |
numeric or discrete data vector |
bins |
number of bins to use when the data are of class numeric or integer. |
# Sample numeric vector
a <- rnorm(25, 80, 35)
mlf::entropy(a, bins = 2)

# Sample discrete vector
b <- as.factor(c(1,1,1,2))
mlf::entropy(b)
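As a rough illustration of what an entropy estimate involves, the sketch below computes plug-in Shannon entropy from observed frequencies. The helper name `shannon_entropy`, the use of log base 2 (bits), and the equal-width binning via `cut()` are assumptions; the package may use a different base or binning rule.

```r
# Plug-in Shannon entropy in bits (assumption: log base 2, equal-width bins).
shannon_entropy <- function(x, bins = NULL) {
  # Numeric data are first cut into equal-width bins; factors are used as-is.
  if (is.numeric(x)) {
    stopifnot(!is.null(bins))
    x <- cut(x, breaks = bins)
  }
  p <- as.numeric(table(x)) / length(x)
  p <- p[p > 0]  # empty categories contribute nothing (0 * log 0 is taken as 0)
  -sum(p * log2(p))
}

# Frequencies 0.75 and 0.25 give about 0.811 bits.
shannon_entropy(as.factor(c(1, 1, 1, 2)))

# Two equally filled bins give exactly 1 bit.
shannon_entropy(c(1, 2, 3, 4), bins = 2)
```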
Estimates squared bias by decomposing model prediction error.
get_bias(truth, estimate)
truth |
test data vector or baseline accuracy to test against. |
estimate |
predicted vector |
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::get_bias(test, predicted)
Estimates mean squared error from model predictions.
get_mse(truth, estimate)
truth |
test data vector or baseline accuracy to test against. |
estimate |
predicted vector |
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::get_mse(test, predicted)
Estimates variance by decomposing model prediction error.
get_var(estimate)
estimate |
predicted vector |
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::get_var(predicted)
Estimates the joint entropy of two variables, i.e. the uncertainty in their joint probability distribution.
jointentropy(x, y, bins)
x, y |
numeric or discrete data vectors |
bins |
specify number of bins |
# Sample numeric vector
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::jointentropy(a, b, bins = 2)

# Sample discrete vector
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::jointentropy(a, b)
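A small sketch may help show how a joint entropy is obtained from a two-way frequency table. The helper `joint_entropy`, the base-2 logarithm, and the equal-width binning are illustrative assumptions rather than the package's documented behaviour.

```r
# Plug-in joint entropy in bits from the cross-tabulation of two vectors.
# Assumptions: log base 2; numeric inputs are cut into `bins` equal-width bins.
joint_entropy <- function(x, y, bins = NULL) {
  stopifnot(length(x) == length(y))
  discretize <- function(v) {
    if (is.numeric(v)) {
      stopifnot(!is.null(bins))
      cut(v, breaks = bins)
    } else {
      v
    }
  }
  # Cell proportions of the two-way table, flattened to a vector.
  p <- as.numeric(table(discretize(x), discretize(y))) / length(x)
  p <- p[p > 0]  # empty cells contribute nothing
  -sum(p * log2(p))
}

# Occupied cells have proportions 0.5, 0.25 and 0.25, giving 1.5 bits.
a <- as.factor(c(1, 1, 2, 2))
b <- as.factor(c(1, 1, 1, 2))
joint_entropy(a, b)
```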
Estimates the Kullback-Leibler divergence between two probability distributions, i.e. the difference between their cross-entropy and the individual entropy.
kld(x, y, bins)
x, y |
numeric or discrete data vectors |
bins |
specify number of bins |
# Sample numeric vector
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::kld(a, b, bins = 2)

# Sample discrete vector
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::kld(a, b)
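The relationship between divergence, cross-entropy and entropy can be verified on a small discrete example. The sketch below handles discrete inputs only and takes the first argument as the reference distribution; the helper `kl_divergence`, the argument order, and the base-2 logarithm are assumptions, not the package's documented behaviour.

```r
# Plug-in Kullback-Leibler divergence D(P || Q) in bits for discrete samples.
# Assumption: x defines P and y defines Q; no smoothing is applied.
kl_divergence <- function(x, y) {
  # Put both samples on a shared set of categories.
  lv <- union(levels(factor(x)), levels(factor(y)))
  p <- as.numeric(table(factor(x, levels = lv))) / length(x)
  q <- as.numeric(table(factor(y, levels = lv))) / length(y)
  keep <- p > 0
  # The result is Inf if q is zero somewhere p is positive.
  sum(p[keep] * log2(p[keep] / q[keep]))
}

a <- as.factor(c(1, 1, 2, 2))  # P = (0.50, 0.50)
b <- as.factor(c(1, 1, 1, 2))  # Q = (0.75, 0.25)
kl_divergence(a, b)            # about 0.2075 bits

# The same value as cross-entropy minus entropy.
p <- c(0.5, 0.5)
q <- c(0.75, 0.25)
cross_entropy <- -sum(p * log2(q))
entropy_p <- -sum(p * log2(p))
cross_entropy - entropy_p
```

The divergence is not symmetric: swapping the two arguments generally gives a different value.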
Estimates mutual information: the Kullback-Leibler divergence between the joint distribution and the product of the two marginal distributions. Roughly speaking, the amount of information one variable provides about another.
mi(x, y)
x, y |
numeric or discrete data vectors |
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mi(a, b)
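The definition as a divergence between the joint distribution and the product of the marginals translates almost directly into code. The sketch below assumes both inputs are already discrete and uses log base 2; `mutual_info` is a hypothetical name, and how mlf discretizes numeric input is not covered here.

```r
# Plug-in mutual information in bits for two discrete vectors.
# Assumption: inputs are already discrete (factors or small integer codes).
mutual_info <- function(x, y) {
  stopifnot(length(x) == length(y))
  joint <- table(x, y) / length(x)
  # Product of the marginals: what the joint would look like under independence.
  indep <- outer(rowSums(joint), colSums(joint))
  keep <- joint > 0
  sum(joint[keep] * log2(joint[keep] / indep[keep]))
}

a <- c(1, 1, 2, 2)
b <- c(1, 1, 1, 2)
mutual_info(a, b)  # about 0.311 bits

# A variable shares all of its entropy with itself: 1 bit here.
mutual_info(a, a)
```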
Estimates the maximal information coefficient (MIC), an information-theoretic approach for detecting non-linear pairwise dependencies. Employs a heuristic discretization to achieve the highest normalized mutual information.
mic(x, y)
x, y |
numeric or discrete data vectors |
Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. Detecting novel associations in large data sets. Science. 2011. 334(6062):1518-1524.
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mic(a, b)
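To give a feel for "highest normalized mutual information", here is a deliberately crude sketch that searches equal-width grids only. This is a simplification: Reshef et al. (2011) optimize the grid partitions themselves, and the mlf heuristic may differ again. The names `grid_mi` and `mic_equal_width` are hypothetical; the grid-size budget of n^0.6 follows the default suggested in the paper.

```r
# Mutual information (bits) of x and y on an nx-by-ny equal-width grid.
grid_mi <- function(x, y, nx, ny) {
  joint <- table(cut(x, breaks = nx), cut(y, breaks = ny)) / length(x)
  indep <- outer(rowSums(joint), colSums(joint))
  keep <- joint > 0
  sum(joint[keep] * log2(joint[keep] / indep[keep]))
}

# Crude MIC-style score: maximize normalized MI over equal-width grids
# whose number of cells stays within the budget n^alpha.
mic_equal_width <- function(x, y, alpha = 0.6) {
  stopifnot(length(x) == length(y))
  max_cells <- max(4, floor(length(x)^alpha))
  best <- 0
  for (nx in 2:floor(max_cells / 2)) {
    for (ny in 2:floor(max_cells / nx)) {
      # Dividing by log2(min(nx, ny)) scales each grid's score into [0, 1].
      score <- grid_mi(x, y, nx, ny) / log2(min(nx, ny))
      best <- max(best, score)
    }
  }
  best
}

set.seed(1)
x <- runif(200, -1, 1)
mic_equal_width(x, x^2)         # functional, non-linear relationship
mic_equal_width(x, runif(200))  # unrelated noise, for comparison
```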
Provides nonparametric statistical significance via sample randomization.
perm(x, y, func, reps)
x, y |
numeric vectors of data values |
func |
specify |
reps |
(optional) number of resamples. Defaults to 500. |
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mic(a, b)
mlf::perm(a, b, mic)
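Sample randomization can be sketched in a few lines of base R: shuffle one vector to break its pairing with the other, recompute the statistic, and see how often the shuffled value is at least as extreme as the observed one. The helper `perm_test`, the two-sided comparison on absolute values, and the add-one correction are illustrative choices, not a description of mlf's internals.

```r
# Permutation test for association between two paired vectors (illustrative).
# `stat` is any function of two numeric vectors that returns a single number.
perm_test <- function(x, y, stat, reps = 500) {
  stopifnot(length(x) == length(y))
  observed <- stat(x, y)
  # Shuffling y simulates the null hypothesis of no association with x.
  null_stats <- replicate(reps, stat(x, sample(y)))
  # The add-one correction keeps the p-value strictly above zero.
  p_value <- (sum(abs(null_stats) >= abs(observed)) + 1) / (reps + 1)
  list(observed = observed, p.value = p_value)
}

set.seed(1)
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
perm_test(a, b, cor)
```

With `reps` resamples the smallest attainable p-value is 1 / (reps + 1), so more resamples are needed to resolve very small p-values.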