Package 'ridge' reference manual

Title:	Ridge Regression with Automatic Selection of the Penalty Parameter
Description:	Linear and logistic ridge regression functions. Additionally includes special functions for genome-wide single-nucleotide polymorphism (SNP) data. More details can be found in <doi: 10.1002/gepi.21750> and <doi: 10.1186/1471-2105-12-372>.
Authors:	Steffen Moritz [aut, cre], Erika Cule [aut], Dan Frankowski [aut]
Maintainer:	Steffen Moritz
License:	GPL-2
Version:	3.3
Built:	2025-03-07 04:34:05 UTC
Source:	https://github.com/steffenmoritz/ridge

ridge-package description

Description

R package for fitting linear and logistic ridge regression models.

Details

This package contains functions for fitting linear and logistic ridge regression models. It also includes versions of these functions for genome-wide SNP data, which take file names as input so that data too big to read into R can still be analysed.

For a complete list of functions, use help(package="ridge").

Author(s)

Steffen Moritz, Erika Cule

Simulated genetic data with binary phenotypes

Description

Simulated genetic data at 15 SNPs, together with simulated binary phenotypes

Usage

data(GenBin)

Format

GenBin is a saved R matrix with 500 rows and 15 columns. The first column contains the phenotypes and columns 2-15 contain the genotypes. Each row represents an individual. The same data are stored in the flat text files GenBin_genotypes.txt and GenBin_phenotypes.txt (in the directory extdata (in the installed package) or inst/extdata (in the source)).

Source

Simulated using FREGENE

References

Fregene: Simulation of realistic sequence-level data in populations and ascertained samples Chadeau-Hyam, M. et al, 2008, BMC Bioinformatics 9:364

Examples

data(GenBin)

Simulated genetic data with continuous outcomes

Description

Simulated genetic data with continuous outcomes.

Usage

data(GenCont)

Format

GenCont is a saved R matrix with 500 rows and 13 columns. The first column contains the phenotypes and columns 2-13 contain the genotypes. Each row represents an individual. The same data are stored in the flat text files GenCont_genotypes.txt and GenCont_phenotypes.txt (in the directory extdata (in the installed package) or inst/extdata (in the source)).

Details

Genotypes were simulated using FREGENE.

References

Fregene: Simulation of realistic sequence-level data in populations and ascertained samples Chadeau-Hyam, M. et al, 2008, BMC Bioinformatics 9:364

Examples

data(GenCont)

The Ten-Factor data first described by Gorman and Toman (1966).

Description

A Ten-Factor data set first described by Gorman and Toman (1966) and used by Hoerl and Kennard (1970) (and others) to investigate regression problems.

Usage

data(Gorman)

Format

Numeric matrix.

Details

The first column is the response on the log scale, the remaining columns are the predictors.

Source

Selection of variables for fitting equations to data. Gorman, J. W. and Toman, R. J. (1966) Technometrics, 8:27.

References

Selection of variables for fitting equations to data. Gorman, J. W. and Toman, R. J. (1966) Technometrics, 8:27.

Ridge Regression: Biased estimators for nonorthogonal problems. Hoerl, A. E. and Kennard, R. W. (1970) Technometrics, 12:55.

Examples

data(Gorman)

Hald data

Description

The Hald data as used by Hoerl, Kennard and Baldwin (1975). These data are also in package wle.

Usage

data(Hald)

Format

Numeric matrix.

Details

The first column is the response and the remaining four columns are the predictors.

References

Ridge Regression: some simulations, Hoerl, A. E. et al, 1975, Comm Stat Theor Method 4:105

Examples

data(Hald)

Linear ridge regression.

Description

Fits a linear ridge regression model. Optionally, the ridge regression parameter is chosen automatically using the method proposed by Cule et al (2012).

Usage

linearRidge(formula, data, lambda = "automatic", nPCs = NULL,
scaling = c("corrForm", "scale", "none"), ...)

## S3 method for class 'ridgeLinear'
coef(object, all.coef = FALSE, ...)

## S3 method for class 'ridgeLinear'
plot(x, y = NULL, ...)

## S3 method for class 'ridgeLinear'
predict(object, newdata, na.action = na.pass, all.coef = FALSE, ...)

## S3 method for class 'ridgeLinear'
print(x, all.coef = FALSE, ...)

## S3 method for class 'ridgeLinear'
summary(object, all.coef = FALSE, ...)

## S3 method for class 'summary.ridgeLinear'
print(x, digits = max(3,
getOption("digits") - 3),
signif.stars = getOption("show.signif.stars"), ...) 

Arguments

`formula`	a formula expression as for regression models, of the form `response ~ predictors`. See the documentation of `formula` for other details.
`data`	an optional data frame in which to interpret the variables occurring in `formula`.
`lambda`	A ridge regression parameter. May be a vector. If `lambda` is `"automatic"` (the default), then the ridge parameter is chosen automatically using the method of Cule et al (2012).
`nPCs`	The number of principal components to use to choose the ridge regression parameter, following the method of Cule et al (2012). It is not possible to specify both `lambda` and `nPCs`.
`scaling`	The method to be used to scale the predictors. One of: `"corrForm"` (the default), which scales the predictors to correlation form, such that the correlation matrix has unit diagonal; `"scale"`, which standardizes the predictors to have mean zero and unit variance; or `"none"`, for no scaling.
`object`	A ridgeLinear object, typically generated by a call to `linearRidge`.
`newdata`	An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.
`na.action`	function determining what should be done with missing values in `newdata`. The default is to predict `NA`.
`all.coef`	Logical. Should results be returned for all ridge regression penalty parameters (`all.coef = TRUE`), or only for the ridge parameter chosen automatically using the method of Cule et al?
`x`	An object of class `ridgeLinear` (for the `print.ridgeLinear` and `plot.ridgeLinear` functions) or an object of class `summary.ridgeLinear` (for the `print.summary.ridgeLinear` function)
`y`	Dummy argument for compatibility with the default `plot` method. Ignored.
`digits`	minimum number of significant digits to be used for most numbers
`signif.stars`	logical; if `TRUE`, P-values are additionally encoded visually as `significance stars` in order to help scanning of long coefficient tables. It defaults to the `show.signif.stars` slot of `options`.
`...`	Additional arguments to be passed to or from other methods.
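
The difference between the `scaling` options can be illustrated in base R. The following is a minimal sketch of the standard correlation-form transformation and of ordinary standardization; it is assumed, not verified here, that these correspond to what `scaling = "corrForm"` and `scaling = "scale"` do internally. The data are made up.

```r
set.seed(2)
X <- matrix(rnorm(60, mean = 3, sd = 2), ncol = 3)

# Correlation form: centre each column, then divide by its root sum of squares,
# so that crossprod() of the result is the correlation matrix of X.
Xc <- sweep(X, 2, colMeans(X))
X_corr <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")

# Standardization: mean zero and unit variance per column.
X_std <- scale(X)

diag(crossprod(X_corr))   # unit diagonal (up to rounding)
apply(X_std, 2, var)      # unit variances (up to rounding)
```

In correlation form the cross-product matrix is the correlation matrix itself, which puts the penalty on a comparable footing across predictors measured in different units.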

Details

If an intercept is present in the model, its coefficient is not penalised. If you want to penalise an intercept, put in your own constant term and remove the intercept.
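
What an unpenalised intercept means can be sketched with the usual closed-form ridge estimator in base R. This is an illustration only, not the package's internal code: `ridge_fit` is a hypothetical helper, no predictor scaling is applied, and the data are simulated.

```r
# Hypothetical helper: closed-form ridge fit with an unpenalised intercept.
# Assumes no predictor scaling is applied.
ridge_fit <- function(X, y, lambda) {
  xm <- colMeans(X)                  # predictor means
  ym <- mean(y)                      # response mean
  Xc <- sweep(X, 2, xm)              # centred predictors
  yc <- y - ym                       # centred response
  p  <- ncol(X)
  # Slopes solve (Xc'Xc + lambda * I) beta = Xc'yc; the intercept is not in this system.
  beta <- solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc))
  intercept <- ym - sum(xm * beta)   # recovered from the means, never shrunk
  c(Intercept = intercept, setNames(drop(beta), colnames(X)))
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- drop(5 + X %*% c(1, -1, 0.5, 0) + rnorm(50))

fit0  <- ridge_fit(X, y, lambda = 0)    # lambda = 0 reproduces least squares
fit10 <- ridge_fit(X, y, lambda = 10)   # slopes shrink; the intercept is only adjusted via the means
```

Penalising the intercept instead would amount to adding a constant column to `X` and including it in the penalised system, which is what supplying your own constant term achieves.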

Value

An object of class "ridgeLinear", with components:

`automatic`	Logical. Was `lambda` chosen automatically?
`call`	The matched call.
`coef`	A named vector of fitted coefficients.
`df`	A vector of degrees of freedom of the model fit, degrees of freedom for variance, and residual degrees of freedom of the fitted model.
`Inter`	Was an intercept included?
`isScaled`	Were the predictors scaled before the model was fitted?
`lambda`	The ridge regression parameter(s).
`scales`	The scales used to standardize the predictors.
`terms`	The `terms` object used.
`x`	The scaled predictor matrix.
`xm`	A vector of means of the predictors.
`y`	The response.
`ym`	The mean of the response.

And optionally the components

`max.nPCs`	The maximum number of principal components for which a ridge regression parameter was computed.
`chosen.nPCs`	The number of principal components used to compute the ridge parameter.

Author(s)

Erika Cule

References

A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]

Examples

data(GenCont)
mod <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
summary(mod)

Fits linear ridge regression models for genome-wide SNP data.

Description

Fits linear ridge regression models for genome-wide SNP data. The SNP genotypes are not read into R; instead, file names are passed to the code directly, enabling the analysis of genome-wide SNP data sets.

Usage

linearRidgeGenotypes(genotypesfilename, phenotypesfilename, lambda = -1,
                     thinfilename = NULL, betafilename = NULL, approxfilename = NULL,
                     permfilename = NULL, intercept = TRUE, verbose = FALSE)

Arguments

`genotypesfilename`	character string: path to file containing SNP genotypes coded 0, 1, 2. See `Input file formats`.
`phenotypesfilename`	character string: path to file containing phenotypes. See `Input file formats`.
`lambda`	(optional) shrinkage parameter. If not provided, the default (`-1`) denotes automatic choice of the shrinkage parameter using the method of Cule & De Iorio (2012).
`thinfilename`	(optional) character string: path to file containing three columns: SNP name, chromosome and SNP position. See `Input file formats`. (See `Details`.)
`betafilename`	(optional) character string: path to file where the output will be written. See `Output file formats`.
`approxfilename`	(optional) character string: path to file where the approximate test p-values will be written. Approximate p-values are not computed unless this argument is given. Approximate p-values are computed using the method of Cule et al (2011). See `Output file formats`.
`permfilename`	(optional) character string: path to file where the permutation test p-values will be written. Permutation test p-values are not computed unless this argument is given. (See warning). See `Output file formats`.
`intercept`	Logical: Should the ridge regression model be fitted with an intercept? (Defaults to `TRUE`)
`verbose`	Logical: If `TRUE`, additional information is printed to the R output as the code runs. Defaults to `FALSE`.

Details

If thinfilename is supplied, and the shrinkage parameter lambda is being computed automatically based on the data, then this file is used to thin the SNP data by SNP position. If this file is not supplied, SNPs are thinned automatically based on the number of SNPs.

Value

The vector of fitted ridge regression coefficients. If betafilename is given, the fitted coefficients are written to this file as well as being returned. If approxfilename and/or permfilename are given, results of approximate test p-values and/or permutation test p-values are written to the files given in their arguments.

Input file formats

genotypesfilename:: A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated. Invariant SNPs in the data cause an error; please remove these from the file before calling the function.
phenotypesfilename:: A single column of phenotypes with the individuals in the same order as those in the file genotypesfilename.
thinfilename:: (optional) Three columns and the same number of rows as there are SNPs in the file genotypesfilename, one row per SNP. First column: SNP names (must match names in genotypesfilename); second column: chromosome; third column: SNP position in base pairs.
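
As a rough sketch of how files in these layouts might be produced from objects already in R, the base-R code below writes a small made-up data set. Several details are assumptions rather than facts from this manual: the files are taken to be whitespace-separated, the phenotype and thinning files are taken to have no header row, and the SNP names and positions are invented.

```r
set.seed(3)
n <- 6
# Genotypes coded 0, 1, 2 (minor allele count); SNP names are hypothetical.
geno <- matrix(sample(0:2, n * 3, replace = TRUE), nrow = n,
               dimnames = list(NULL, c("snp1", "snp2", "snp3")))
pheno <- rnorm(n)

genofile  <- tempfile(fileext = ".txt")
phenofile <- tempfile(fileext = ".txt")
thinfile  <- tempfile(fileext = ".txt")

# Header row of SNP names, then one row per individual.
write.table(geno, genofile, quote = FALSE, row.names = FALSE, col.names = TRUE)
# Single column of phenotypes, same individual order (assumed: no header).
write.table(pheno, phenofile, quote = FALSE, row.names = FALSE, col.names = FALSE)
# One row per SNP: name, chromosome, position in base pairs (made-up values).
thin <- data.frame(snp = colnames(geno), chr = 1, pos = c(1000, 5000, 9000))
write.table(thin, thinfile, quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(genofile)[1]   # header line with the SNP names
```

Comparing the result against the example files shipped in extdata is a sensible check before running a large analysis.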

Output file formats

All output files are optional. Whether or not betafilename is provided, fitted coefficients are returned to the R workspace. If betafilename is provided, the fitted coefficients are also written to the file specified.

betafilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If intercept = TRUE (the default) then the first row is the fitted intercept (with the name Intercept in the first column).
approxfilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is approximate p-values.
permfilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is permutation p-values.

Warning

When data are large, the permutation test p-values may take a very long time to compute. It is recommended not to request permutation test p-values (using the argument permfilename) when data are large.

Author(s)

Erika Cule

References

Significance testing in ridge regression for genetic data. Cule, E. et al (2011) BMC Bioinformatics, 12:372

A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]

Examples

## Not run: 
    genotypesfile <- system.file("extdata","GenCont_genotypes.txt",package = "ridge")
    phenotypesfile <- system.file("extdata","GenCont_phenotypes.txt",package = "ridge")
    beta_linearRidgeGenotypes <- linearRidgeGenotypes(genotypesfilename = genotypesfile,
phenotypesfilename = phenotypesfile)
    ## compare to output of linearRidge
    data(GenCont) ## Same data as in GenCont_genotypes.txt and GenCont_phenotypes.txt
    beta_linearRidge <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
    cbind(round(coef(beta_linearRidge), 6), beta_linearRidgeGenotypes)

## End(Not run)

Predict phenotypes from genome-wide SNP data based on a file of coefficients

Description

Predict phenotypes from genome-wide SNP data based on a file of coefficients. Genotypes and fitted coefficients are provided as filenames, allowing the computation of predicted phenotypes when SNP data are too large to be read into R.

Usage

linearRidgeGenotypesPredict(genotypesfilename, betafilename, phenotypesfilename = NULL,
                            verbose = FALSE)

Arguments

`genotypesfilename`	character string: path to file containing SNP genotypes coded 0, 1, 2. See `Input file formats`.
`betafilename`	character string: path to file containing fitted coefficients. See `Input file formats`.
`phenotypesfilename`	(optional) character string: path to file in which to write out the predicted phenotypes. See `Output file format`. Whether or not this argument is supplied, the predicted phenotypes are returned by the function.
`verbose`	Logical: If `TRUE`, additional information is printed to the R output as the code runs. Defaults to `FALSE`.

Value

A vector of fitted values, the same length as the number of individuals whose data are in genotypesfilename. If phenotypesfilename is supplied, the fitted values are also written there.

Input file formats

genotypesfilename:: A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated.
betafilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If the coefficients include an intercept then the first row of betafilename should contain it with the name Intercept in the first column. An Intercept thus labelled will be used appropriately in predicting the phenotypes. SNP names must match those in genotypesfilename. The format of betafilename is that of the output of linearRidgeGenotypes, meaning linearRidgeGenotypesPredict can be used to predict using coefficients fitted using linearRidgeGenotypes (see the example).
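
The role of the coefficient file in prediction can be sketched in base R for data small enough to hold in memory. This is an illustration of the arithmetic only, not the package's implementation; the file contents are made up, and the coefficient file is assumed to be whitespace-separated with no header row.

```r
# Small made-up genotype matrix and coefficient table (illustrative only).
geno <- matrix(c(0, 1, 2, 1,
                 2, 0, 1, 1), ncol = 2,
               dimnames = list(NULL, c("snp1", "snp2")))
betas <- data.frame(name = c("Intercept", "snp1", "snp2"),
                    coef = c(0.5, 0.2, -0.1))

genofile <- tempfile(fileext = ".txt")
betafile <- tempfile(fileext = ".dat")
write.table(geno, genofile, quote = FALSE, row.names = FALSE)
write.table(betas, betafile, quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read both files back and line the coefficients up with the SNP columns by name.
G <- as.matrix(read.table(genofile, header = TRUE))
B <- read.table(betafile, col.names = c("name", "coef"), stringsAsFactors = FALSE)
is_int <- B$name == "Intercept"
b0 <- if (any(is_int)) B$coef[is_int] else 0
slopes <- B$coef[!is_int][match(colnames(G), B$name[!is_int])]

pred <- b0 + drop(G %*% slopes)   # one predicted phenotype per individual

unlink(c(genofile, betafile))     # remove the temporary files
```

Matching by name rather than by position guards against a coefficient file whose rows are in a different order from the genotype columns.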

Output file format

Whether or not phenotypesfilename is provided, predicted phenotypes are returned to the R workspace. If phenotypesfilename is provided, the predicted phenotypes are also written to the file specified.

phenotypesfilename:: One column, containing predicted phenotypes, one individual per row.

Author(s)

Erika Cule

References

A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]

Examples

## Not run: 
genotypesfile <- system.file("extdata","GenCont_genotypes.txt",package = "ridge")
phenotypesfile <- system.file("extdata","GenCont_phenotypes.txt",package = "ridge")
betafile <- tempfile(pattern = "beta", fileext = ".dat")
beta_linearRidgeGenotypes <- linearRidgeGenotypes(genotypesfilename = genotypesfile,
                                                      phenotypesfilename = phenotypesfile,
                                                      betafilename = betafile)
pred_phen_geno <- linearRidgeGenotypesPredict(genotypesfilename = genotypesfile,
                                                    betafilename = betafile)
## compare to output of linearRidge
data(GenCont) ## Same data as in GenCont_genotypes.txt and GenCont_phenotypes.txt
beta_linearRidge <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
pred_phen <- predict(beta_linearRidge)
print(cbind(pred_phen_geno, pred_phen))
## Delete the temporary betafile
unlink(betafile)

## End(Not run)

Logistic ridge regression.

Description

Fits a logistic ridge regression model. Optionally, the ridge regression parameter is chosen automatically using the method proposed by Cule et al (2012).

Usage

logisticRidge(formula, data, lambda = "automatic", nPCs = NULL,
scaling = c("corrForm", "scale", "none"), ...)

## S3 method for class 'ridgeLogistic'
coef(object, all.coef = FALSE, ...)

## S3 method for class 'ridgeLogistic'
plot(x, y = NULL, ...)

## S3 method for class 'ridgeLogistic'
predict(object, newdata = NULL, type = c("link", "response"), 
    na.action = na.pass, all.coef = FALSE, ...)

## S3 method for class 'ridgeLogistic'
print(x, all.coef = FALSE, ...)

## S3 method for class 'ridgeLogistic'
summary(object, all.coef = FALSE, ...)

## S3 method for class 'summary.ridgeLogistic'
print(x, digits = max(3, getOption("digits") - 3),
signif.stars = getOption("show.signif.stars"), ...) 

Arguments

`formula`	a formula expression as for regression models, of the form `response ~ predictors`. See the documentation of `formula` for other details.
`data`	an optional data frame in which to interpret the variables occurring in `formula`.
`lambda`	A ridge regression parameter. If `lambda` is `"automatic"` (the default), then the ridge parameter is chosen automatically using the method of Cule et al (2012).
`nPCs`	The number of principal components to use to choose the ridge regression parameter, following the method of Cule et al (2012). It is not possible to specify both `lambda` and `nPCs`.
`scaling`	The method to be used to scale the predictors. One of: `"corrForm"` (the default), which scales the predictors to correlation form, such that the correlation matrix has unit diagonal; `"scale"`, which standardizes the predictors to have mean zero and unit variance; or `"none"`, for no scaling.
`object`	A ridgeLogistic object, typically generated by a call to `logisticRidge`.
`newdata`	An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.
`type`	the type of prediction required. The default predictions are of log-odds (probabilities on logit scale) and `type = "response"` gives the predicted probabilities.
`na.action`	function determining what should be done with missing values in `newdata`. The default is to predict `NA`.
`all.coef`	Logical. Should results be returned for all ridge regression penalty parameters (`all.coef = TRUE`), or only for the ridge parameter chosen automatically using the method of Cule et al?
`x`	An object of class `ridgeLogistic` (for the `print.ridgeLogistic` and `plot.ridgeLogistic` functions) or an object of class `summary.ridgeLogistic` (for the `print.summary.ridgeLogistic` function)
`y`	Dummy argument for compatibility with the default `plot` method. Ignored.
`digits`	minimum number of significant digits to be used for most numbers
`signif.stars`	logical; if `TRUE`, P-values are additionally encoded visually as `significance stars` in order to help scanning of long coefficient tables. It defaults to the `show.signif.stars` slot of `options`.
`...`	Additional arguments to be passed to or from other methods.
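
The two values of `type` are related by the logistic function. A small base-R illustration, independent of the package and using made-up values:

```r
eta <- c(-2, 0, 1.5)   # example log-odds, as on the "link" scale
p   <- plogis(eta)     # inverse logit, 1 / (1 + exp(-eta)): the "response" scale
qlogis(p)              # the logit maps the probabilities back to log-odds
```

A log-odds of zero corresponds to a probability of one half, so the sign of a prediction on the link scale indicates which outcome is the more likely.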

Details

If an intercept is present in the model, its coefficient is not penalised. If you want to penalise an intercept, put in your own constant term and remove the intercept.

Value

An object of class "ridgeLogistic", with components:

`automatic`	Was `lambda` chosen automatically?
`call`	The matched call.
`coef`	A named vector of fitted coefficients.
`df`	A vector of degrees of freedom of the model fit and degrees of freedom for variance.
`Inter`	Was an intercept included?
`isScaled`	Were the predictors scaled before the model was fitted?
`lambda`	The ridge regression parameter.
`scales`	The scales used to standardize the predictors.
`terms`	The `terms` object used.
`x`	The scaled predictor matrix.
`xm`	A vector of means of the predictors.
`y`	The response.

And optionally the component

`nPCs`	The number of principal components used to compute the ridge regression parameter.

Author(s)

Erika Cule

References

A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]

Examples

data(GenBin)
mod <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
summary(mod)

Fits logistic ridge regression models for genome-wide SNP data.

Description

Fits logistic ridge regression models for genome-wide SNP data. The SNP genotypes are not read into R but file names are passed to the code directly, enabling the analysis of genome-wide SNP data sets which are too big to be read into R.

Usage

logisticRidgeGenotypes(genotypesfilename, phenotypesfilename, lambda = -1,
                       thinfilename = NULL, betafilename = NULL, approxfilename = NULL,
                       permfilename = NULL, intercept = TRUE, verbose = FALSE)

Arguments

`genotypesfilename`	character string: path to file containing SNP genotypes coded 0, 1, 2. See `Input file formats`.
`phenotypesfilename`	character string: path to file containing phenotypes. See `Input file formats`.
`lambda`	(optional) shrinkage parameter. If not provided, the default (`-1`) denotes automatic choice of the shrinkage parameter using the method of Cule & De Iorio (2012).
`thinfilename`	(optional) character string: path to file containing three columns: SNP name, chromosome and SNP position. See `Input file formats`. (See `Details`.)
`betafilename`	(optional) character string: path to file where the output will be written. See `Output file formats`.
`approxfilename`	(optional) character string: path to file where the approximate test p-values will be written. Approximate p-values are not computed unless this argument is given. Approximate p-values are computed using the method of Cule et al (2011). See `Output file formats`.
`permfilename`	(optional) character string: path to file where the permutation test p-values will be written. Permutation test p-values are not computed unless this argument is given. (See warning). See `Output file formats`.
`intercept`	Logical: Should the ridge regression model be fitted with an intercept? Defaults to `TRUE`.
`verbose`	Logical: If `TRUE`, additional information is printed to the R output as the code runs. Defaults to `FALSE`.

Details

Value

Input file formats

genotypesfilename:: A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated. Invariant SNPs in the data cause an error; please remove these from the file before calling the function.
phenotypesfilename:: A single column of phenotypes with the individuals in the same order as those in the file genotypesfilename. Phenotypes must be coded as 0 or 1.
thinfilename:: (optional) Three columns and the same number of rows as there are SNPs in the file genotypesfilename, one row per SNP. First column: SNP names (must match names in genotypesfilename); second column: chromosome; third column: SNP position in base pairs.
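
Because missing values, invariant SNPs and miscoded phenotypes are not accommodated, it can help to check a data set before writing it to file. The following base-R sketch applies to data small enough to inspect in memory; `check_inputs` is a hypothetical helper, not a function of this package, and the data are made up.

```r
# Hypothetical pre-flight check for the constraints listed above.
check_inputs <- function(geno, pheno) {
  problems <- character(0)
  if (anyNA(geno) || anyNA(pheno)) problems <- c(problems, "missing values present")
  if (!all(geno %in% 0:2)) problems <- c(problems, "genotypes not coded 0, 1, 2")
  invariant <- apply(geno, 2, function(g) length(unique(g)) == 1)
  if (any(invariant))
    problems <- c(problems, paste("invariant SNPs:",
                                  paste(colnames(geno)[invariant], collapse = ", ")))
  if (!all(pheno %in% 0:1)) problems <- c(problems, "phenotypes not coded 0 or 1")
  if (nrow(geno) != length(pheno))
    problems <- c(problems, "different numbers of individuals")
  problems
}

geno <- cbind(snp1 = c(0, 1, 2, 1), snp2 = c(1, 1, 1, 1), snp3 = c(2, 0, 0, 1))
pheno <- c(0, 1, 1, 0)
check_inputs(geno, pheno)         # flags snp2 as invariant
check_inputs(geno[, -2], pheno)   # character(0): nothing to report
```

For genome-wide files that do not fit in memory, the same checks would have to be applied in chunks or with external tools.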

Output file formats

betafilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If intercept = TRUE (the default) then the first row is the fitted intercept (with the name Intercept in the first column).
approxfilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is approximate p-values.
permfilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is permutation p-values.

Warning

Author(s)

Erika Cule

References

Examples

## Not run: 
    genotypesfile <- system.file("extdata","GenBin_genotypes.txt",package = "ridge")
    phenotypesfile <- system.file("extdata","GenBin_phenotypes.txt",package = "ridge")
    beta_logisticRidgeGenotypes <-
logisticRidgeGenotypes(genotypesfilename = genotypesfile, phenotypesfilename = phenotypesfile)
    ## compare to output of logisticRidge
    data(GenBin) ## Same data as in GenBin_genotypes.txt and GenBin_phenotypes.txt
    beta_logisticRidge <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
    cbind(round(coef(beta_logisticRidge), 6), beta_logisticRidgeGenotypes)

## End(Not run)

Predict fitted probabilities from genome-wide SNP data based on a file of coefficients

Description

Predict fitted probabilities from genome-wide SNP data based on a file of coefficients. Genotypes and fitted coefficients are provided as filenames, allowing the computation of fitted probabilities when SNP data are too large to be read into R.

Usage

logisticRidgeGenotypesPredict(genotypesfilename, betafilename,
                              phenotypesfilename = NULL, verbose = FALSE)

Arguments

`genotypesfilename`	character string: path to file containing SNP genotypes coded 0, 1, 2. See `Input file formats`.
`betafilename`	character string: path to file containing fitted coefficients. See `Input file formats`.
`phenotypesfilename`	(optional) character string: path to file in which to write out the fitted probabilities. See `Output file format`. Whether or not this argument is supplied, the fitted probabilities are returned by the function.
`verbose`	Logical: If `TRUE`, additional information is printed to the R output as the code runs. Defaults to `FALSE`.

Value

A vector of fitted probabilities, the same length as the number of individuals whose data are in genotypesfilename. If phenotypesfilename is supplied, the fitted probabilities are also written there.

Input file formats

genotypesfilename:: A header row, plus one row for each individual, one SNP per column. The header row contains SNP names. SNPs are coded as 0, 1, 2 for minor allele count. Missing values are not accommodated.
betafilename:: Two columns: First column is SNP names in same order as in genotypesfilename, second column is fitted coefficients. If the coefficients include an intercept then the first row of betafilename should contain it with the name Intercept in the first column. An Intercept thus labelled will be used appropriately in computing the fitted probabilities. SNP names must match those in genotypesfilename. The format of betafilename is that of the output of logisticRidgeGenotypes, meaning logisticRidgeGenotypesPredict can be used to predict using coefficients fitted using logisticRidgeGenotypes (see the example).

Output file format

Whether or not phenotypesfilename is provided, fitted probabilities are returned to the R workspace. If phenotypesfilename is provided, the fitted probabilities are also written to the file specified.

phenotypesfilename:: One column, containing fitted probabilities, one individual per row.

Author(s)

Erika Cule

References

A semi-automatic method to guide the choice of ridge parameter in ridge regression. Cule, E. and De Iorio, M. (2012) arXiv:1205.0686v1 [stat.AP]

Examples

## Not run: 
genotypesfile <- system.file("extdata","GenBin_genotypes.txt",package = "ridge")
phenotypesfile <- system.file("extdata","GenBin_phenotypes.txt",package = "ridge")
betafile <- tempfile(pattern = "beta", fileext = ".dat")
beta_logisticRidgeGenotypes <- logisticRidgeGenotypes(genotypesfilename = genotypesfile,
                                                      phenotypesfilename = phenotypesfile,
                                                      betafilename = betafile)
pred_phen_geno <- logisticRidgeGenotypesPredict(genotypesfilename = genotypesfile,
                                                    betafilename = betafile)
## compare to output of logisticRidge
data(GenBin) ## Same data as in GenBin_genotypes.txt and GenBin_phenotypes.txt
beta_logisticRidge <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
pred_phen <- predict(beta_logisticRidge, type="response")
print(cbind(pred_phen_geno, pred_phen))
## Delete the temporary betafile
unlink(betafile)

## End(Not run)

Compute p-values for ridgeLinear and ridgeLogistic models

Description

Functions for computing, printing and plotting p-values for ridgeLinear and ridgeLogistic models. The p-values are computed using the significance test of Cule et al (2011).

Usage

pvals(x, ...)

## S3 method for class 'ridgeLinear'
pvals(x, ...)

## S3 method for class 'ridgeLogistic'
pvals(x, ...)

## S3 method for class 'pvalsRidgeLinear'
print(x, digits = max(3, getOption("digits") - 3),
signif.stars = getOption("show.signif.stars"), all.coef = FALSE, ...) 

## S3 method for class 'pvalsRidgeLogistic'
print(x, digits = max(3, getOption("digits") - 3),
signif.stars = getOption("show.signif.stars"), all.coef = FALSE, ...)

## S3 method for class 'pvalsRidgeLinear'
plot(x, y = NULL, ...)

## S3 method for class 'pvalsRidgeLogistic'
plot(x, y = NULL, ...)

Arguments

`x`	For the pvals methods, an object of class "ridgeLinear" or "ridgeLogistic", typically from a call to "linearRidge" or "logisticRidge". For the print and plot methods, an object of class "pvalsRidgeLinear" or "pvalsRidgeLogistic", typically from a call to "pvals".
`digits`	minimum number of significant digits to be used for most numbers
`signif.stars`	logical; if `TRUE`, P-values are additionally encoded visually as `significance stars` in order to help scanning of long coefficient tables. It defaults to the `show.signif.stars` slot of `options`.
`all.coef`	Logical. Should p-values be printed for all the ridge parameters considered, or only for the ridge parameter chosen using the method of Cule et al (2012)? Defaults to `FALSE`.
`y`	Dummy argument for compatibility with the default `plot` method. Ignored.
`...`	further arguments to be passed to or from other methods

Details

Standard errors, test statistics and p-values are computed using the coefficients and data on the scale that was used to fit the model. If the data were scaled before the model was fitted, then the p-values relate to the scaled data.
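
To make the quantities above concrete, the following sketch computes standard errors, test statistics and p-values for a linear ridge fit on scaled data using base R only. It is a Wald-type calculation in the spirit of Cule et al (2011), not the package's implementation: the simulated data, the penalty value k, the residual degrees of freedom and the standard normal reference distribution are all assumptions made for the example.

```r
## Simulated toy data (assumption: for illustration only)
set.seed(1)
n <- 100
p <- 3
X <- scale(matrix(rnorm(n * p), n, p))       # predictors on the scaled fitting scale
y <- as.numeric(X %*% c(1, 0, -0.5)) + rnorm(n)
y <- y - mean(y)                             # centre the response
k <- 2                                       # hypothetical ridge penalty

XtX <- crossprod(X)
A <- solve(XtX + k * diag(p))                # (X'X + kI)^{-1}
beta <- as.numeric(A %*% crossprod(X, y))    # ridge coefficients on the scaled data

## Residual variance with effective residual degrees of freedom (assumed form)
H <- X %*% A %*% t(X)                        # ridge hat matrix
resid_df <- n - sum(diag(2 * H - H %*% t(H)))
sigma2 <- sum((y - as.numeric(X %*% beta))^2) / resid_df

## Standard errors from the ridge covariance sigma^2 * A X'X A
se <- sqrt(sigma2 * diag(A %*% XtX %*% A))
tstat <- beta / se
pval <- 2 * pnorm(-abs(tstat))               # normal reference: an assumption

print(cbind(coef = beta, se = se, tstat = tstat, pval = pval))
```

Because everything here is computed from the scaled X, the resulting p-values refer to the scaled predictors, which is the point the paragraph above makes.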

Value

For the pvals methods, an object of class "pvalsRidgeLinear" or "pvalsRidgeLogistic" which is a list with elements

`coef`	The (scaled) regression coefficients
`se`	The standard errors of the regression coefficients
`tstat`	The test statistics of the regression coefficients
`pval`	The p-values of the regression coefficients
`isScaled`	Were the data scaled before the regression coefficients were fitted?

For the print methods, the argument x is returned invisibly.

Author(s)

Erika Cule

References

Significance testing in ridge regression for genetic data. Cule, E. et al (2011) BMC Bioinformatics, 12:372

Examples

data(GenBin)
mod <- logisticRidge(Phenotypes ~ ., data = as.data.frame(GenBin))
pvalsMod <- pvals(mod)
print(pvalsMod)
print(pvalsMod, all.coef = TRUE)
plot(pvalsMod)

ridge: Linear and logistic ridge regression functions.

Description

Additionally includes special functions for genome-wide single-nucleotide polymorphism (SNP) data.
