Kernel local Fisher discriminant analysis of principal components (KLFDAPC)

This function performs kernel local Fisher discriminant analysis of principal components (KLFDAPC) on genomic data.

KLFDAPC(infile, y, n.pc, sample.id=NULL, snp.id=NULL,
        autosome.only=TRUE, remove.monosnp=TRUE, maf=NaN, missing.rate=NaN,
        algorithm=c("exact", "randomized"),
        eigen.cnt=ifelse(identical(algorithm, "randomized"), 16L, 32L),
        num.thread=1L, bayesian=FALSE, need.genmat=FALSE,
        genmat.only=FALSE, eigen.method=c("DSPEVX", "DSPEV"),
        aux.dim=eigen.cnt*2L, iter.num=10L, verbose=TRUE,
        kernel=kernlab::polydot(degree=1, scale=1, offset=1), r=3,
        tol=1e-30, prior=NULL, CV=FALSE, usekernel=TRUE,
        fL=0.5, metric=c('weighted', 'orthonormalized', 'plain'),
        knn=6, reg=0.001, ...)

Arguments

infile	The genomic data file in GDS format. The package uses GDS files for high-performance computation; the GDS format is designed for efficient random access to large data sets. A GDS file can be converted from other formats, such as PLINK binary files (BED) and VCF files, with the SNPRelate package.
y	The group labels
n.pc	The number of principal components (PCs) passed to KLFDA.
snp.id	The SNP IDs to include in the analysis; if NULL or missing, all SNPs are used.
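num.thread	The number of (CPU) cores used; if NA, the number of cores is detected automatically.
kernel	The kernel function used in KLFDA, given as a kernlab kernel object (default: kernlab::polydot(degree = 1, scale = 1, offset = 1)).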
r	The number of reduced features
tol	The tolerance to decide if a matrix is singular; it will reject variables and linear combinations of unit-variance variables whose variance is less than tol^2.
prior	The prior of the groups
CV	If true, returns results (classes and posterior probabilities) for leave-one-out cross-validation. Note that if the prior is estimated, the proportions in the whole dataset are used.
usekernel	Whether to use the kernel classifier; if TRUE, it is passed to the Naive Bayes classifier.
fL	If usekernel is TRUE, passed to the kernel function. See the kernel functions in kernlab.
metric	The type of metric in the embedding space (default: 'weighted'). One of 'weighted' (weighted eigenvectors), 'orthonormalized' (orthonormalized eigenvectors), or 'plain' (raw eigenvectors).
knn	The number of nearest neighbours
reg	The regularization parameter
...	Other parameters passed to klfda and snpgdsPCA.
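
Because infile must be a GDS file, a conversion step is often needed first. The following is a minimal sketch, assuming a PLINK binary fileset (prefix.bed, prefix.fam, prefix.bim) and an installed SNPRelate package. The helper name plink_to_gds and its arguments are hypothetical and not part of KLFDAPC; only SNPRelate::snpgdsBED2GDS is an existing function.

```r
# Hypothetical helper (not part of KLFDAPC): convert a PLINK binary
# fileset to a GDS file that can be passed as `infile`.
plink_to_gds <- function(prefix, out_gds) {
  paths <- paste0(prefix, c(".bed", ".fam", ".bim"))
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing PLINK file(s): ", paste(absent, collapse = ", "))
  }
  if (!requireNamespace("SNPRelate", quietly = TRUE)) {
    stop("The SNPRelate package is required for the conversion")
  }
  # snpgdsBED2GDS writes the GDS file to `out_gds`
  SNPRelate::snpgdsBED2GDS(bed.fn = paths[1], fam.fn = paths[2],
                           bim.fn = paths[3], out.gdsfn = out_gds)
  out_gds
}
```

The returned path could then be used as the infile argument, for example infile <- plink_to_gds("mydata", "mydata.gds"), where "mydata" is a placeholder prefix.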

Details

Kernel local Fisher discriminant analysis becomes more complex once the kernel trick is introduced into the function. The function also uses a kernel classifier to make inferences from the non-linear features.
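
To illustrate what applying a kernel to principal components means, here is a minimal base-R sketch on simulated data. It is not the package's internal code: the toy genotype matrix, the variable names, and the use of prcomp are assumptions for illustration. The kernel formula k(x, y) = exp(-sigma * ||x - y||^2) is the Gaussian form used by kernlab::rbfdot.

```r
# Simulated stand-in for a genotype matrix (20 samples x 50 SNPs, coded 0/1/2).
set.seed(1)
geno <- matrix(rbinom(20 * 50, size = 2, prob = 0.3), nrow = 20)

# Step 1: PCA, keeping the first n_pc components (the role played by n.pc).
n_pc <- 5
pcs <- prcomp(geno)$x[, seq_len(n_pc)]

# Step 2: Gaussian kernel matrix computed on the PC scores.
sigma <- 0.5
K <- exp(-sigma * as.matrix(dist(pcs))^2)

dim(K)  # one row and one column per sample
```

The discriminant analysis then operates on a kernel matrix of this kind rather than on the PC scores directly, which is why the cost grows with the number of samples.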

Value

The returned object has two parts: the results of PCA and the results of KLFDAPC, which include the predicted classes and the posterior probability of each class under different classifiers.

KLFDAPC

The results of KLFDAPC, including the predicted classes and the posterior probability of each class under different classifiers.

PCA

PCA results

References

Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, vol. 8, 1027-1061.

Sugiyama, M. (2006). Local Fisher discriminant analysis for supervised dimensionality reduction. In W. W. Cohen and A. Moore (Eds.), Proceedings of 23rd International Conference on Machine Learning (ICML2006), 905-912.

Original Matlab Implementation: http://www.ms.k.u-tokyo.ac.jp/software.html#LFDA

Tang, Y., & Li, W. (2019). lfda: Local Fisher Discriminant Analysis in R. Journal of Open Source Software, 4(39), 1572.

Moore, A. W. (2004). Naive Bayes Classifiers. In School of Computer Science. Carnegie Mellon University.

Pierre Enel (2020). Kernel Fisher Discriminant Analysis (https://www.github.com/p-enel/MatlabKFDA), GitHub. Retrieved March 30, 2020.

Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab - An S4 package for kernel methods in R. Journal of Statistical Software, 11(9), 1-20.

Zheng, X., Levine, D., Shen, J., Gogarten, S. M., Laurie, C., & Weir, B. S. (2012). A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics, 28(24), 3326-3328.

Author

qinxinghu@gmail.com

Examples

### Input file
f <- system.file('extdata', package='KLFDAPC')
infile <- file.path(f, "2019-nCoV_total.gds")
## Note: the labels below are random
y <- factor(c(rep(1, times=1736), rep(2, times=2000)))
### Using a Gaussian kernel
### This takes longer than PCA, depending on the number of samples and n.pc.
# The call is left commented out and the results are not shown here; users can run it on their own clusters.
# virus_klfdapc <- KLFDAPC(infile, y, kernel=kernlab::rbfdot(sigma=0.5), r=3,
#                          snp.id=NULL, maf=0.05, missing.rate=0.05, n.pc=10,
#                          tol=1e-30, num.thread=2, metric="plain", prior=NULL)
# showfile.gds(closeall=TRUE)
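
With real data the group labels come from sample metadata rather than being random, and y needs one label per sample in the same order as the samples in the GDS file. The following is a minimal base-R sketch of that alignment; the sample IDs, the metadata table, and its column names are hypothetical.

```r
# Hypothetical sample IDs, in the order they are stored in the GDS file.
sample_ids <- c("s3", "s1", "s4", "s2")

# Hypothetical metadata table with one group label per sample.
meta <- data.frame(id = c("s1", "s2", "s3", "s4"),
                   group = c("A", "A", "B", "B"),
                   stringsAsFactors = FALSE)

# Reorder the labels to follow the sample order; fail if any sample is unlabeled.
idx <- match(sample_ids, meta$id)
stopifnot(!anyNA(idx))
y <- factor(meta$group[idx])
```

The resulting factor y has the same length and order as sample_ids and can be supplied as the y argument.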