% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/linear_LDA.R
\name{do.lda}
\alias{do.lda}
\title{Linear Discriminant Analysis}
\usage{
do.lda(X, label, ndim = 2)
}
\arguments{
\item{X}{an \eqn{(n\times p)} matrix or data frame whose rows are observations
and columns represent independent variables.}

\item{label}{a length-\eqn{n} vector of data class labels.}

\item{ndim}{an integer-valued target dimension.}
}
\value{
a named list containing
\describe{
\item{Y}{an \eqn{(n\times ndim)} matrix whose rows are embedded observations.}
\item{trfinfo}{a list containing information for out-of-sample prediction.}
\item{projection}{a \eqn{(p\times ndim)} matrix whose columns form a basis for projection.}
}
}
\description{
Linear Discriminant Analysis (LDA) originally aims to find a set of features
that best separate groups of data. Since it requires \emph{label} information,
LDA is a supervised method, originally designed for classification.
However, because it works by finding \emph{suitable} projections, it can also
be used for dimension reduction. Both binary and multi-class cases are supported.
Note that the target dimension \code{ndim} should be \emph{less than or equal to} \code{K-1},
where \code{K} is the number of classes, i.e., \code{K=length(unique(label))}. The code
automatically bounds the user's choice of \code{ndim} to agree with this theoretical limit. See
the section on the limit of target dimension selection for more details.
}
\section{Limit of Target Dimension Selection}{

In unsupervised algorithms, the choice of \code{ndim} is arbitrary as long as
the target dimension is lower than the original data dimension, i.e., \code{ndim < p}.
In LDA, such an arbitrary choice is \emph{not allowed}. Suppose we have \code{K} classes; then the
between-group scatter matrix \eqn{S_B} in the LDA formulation has rank at most \code{K-1}. Therefore, the
discriminant subspace can be spanned by at most \code{K-1} orthogonal vectors.
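
The rank bound can be checked numerically. The following is a minimal sketch in base R,
not the code used inside \code{do.lda}: the sizes \code{K}, \code{p}, \code{n_per} and the
simulated data are illustrative assumptions, and \eqn{S_B} is built as the sum over classes of
the class size times the outer product of the centered class mean with itself.
\preformatted{
set.seed(1)
K <- 3; p <- 5; n_per <- 20                 # hypothetical sizes, for illustration only
label <- rep(seq_len(K), each = n_per)      # class labels 1, ..., K
X <- matrix(rnorm(K * n_per * p), ncol = p) + 3 * label   # shift each class

mu  <- colMeans(X)                          # overall mean
S_B <- matrix(0, p, p)                      # between-group scatter matrix
for (k in seq_len(K)) {
  idx <- which(label == k)
  d   <- colMeans(X[idx, , drop = FALSE]) - mu   # centered class mean
  S_B <- S_B + length(idx) * tcrossprod(d)       # n_k * d d^T
}

qr(S_B)$rank                                # numerical rank, at most K - 1
}
Because the class-size-weighted centered means sum to zero, only \code{K-1} of them can be
linearly independent, which is why the rank cannot exceed \code{K-1} even though \code{p = 5} here.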
}

\examples{
\donttest{
## generate 3 groups of data with clearly separated locations
dt1  = aux.gensamples(n=33)-100
dt2  = aux.gensamples(n=33)
dt3  = aux.gensamples(n=33)+100

## merge the data and create a label correspondingly
Y      = rbind(dt1,dt2,dt3)
label  = c(rep(1,33), rep(2,33), rep(3,33))

## project onto 2-dimensional space
output = do.lda(Y, label, ndim=2)

## visualize
opar <- par(no.readonly=TRUE)
plot(output$Y, col=label, pch=19, main="3 groups on 2d plane")
par(opar)
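
## inspect the returned components; the shapes in the comments are those
## stated in the Value section, not a guarantee about internals
dim(output$Y)           # documented as (n x ndim), with n = nrow(Y)
dim(output$projection)  # documented as (p x ndim), with p = ncol(Y)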
}

}
\references{
\insertRef{fisher_use_1936}{Rdimtools}

\insertRef{fukunaga_introduction_1990}{Rdimtools}
}
\author{
Kisung You
}
