Data Reduction Prior to Inference: Is it Sensible to Use Principal Component Scores to Make Group Comparisons in a Student's t-test or ANOVA?
Institution: Department of Epidemiology and Biostatistics, University of Arizona, Tucson
Event type: Research
When: 11/11/2019, 16:00 to 17:00
Where: Computer lab (Aula de cómputo), IMATE Juriquilla
Abstract: Statistical methods for inference with high-dimensional data have developed substantially in recent years. Despite these developments, biomedical researchers and computational scientists often analyze multivariate data with a simple two-step process. First, they reduce the dimensionality with a standard technique such as principal component analysis (PCA). Second, they compare groups on the resulting principal component (PC) scores using a Student's t-test or an analysis of variance (ANOVA). In this talk I will try to untangle several issues associated with this approach, starting with the simplest but most vexing question: what hypothesis is actually being tested? To assess whether the approach is sensible, I will combine several tools: asymptotic analysis, analytical construction of worst-case scenarios, and simulations based on real data. The asymptotic analysis considers a non-sparse setting, but sparse problems will also be discussed. A short discussion of using PC scores for classification will also be provided.
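The following Python sketch illustrates the two-step procedure the abstract describes. It is not the speaker's code and assumes NumPy is available. The simulated data, group sizes, mean shift, and helper names (`pc1_scores`, `student_t`) are hypothetical illustrations, not material from the talk.

The sketch works in two steps:
- It computes PCA on the pooled, centered data via the singular value decomposition.
- It applies a pooled-variance Student's t-test to the first-PC scores.

```python
import numpy as np

def pc1_scores(X):
    """Project the rows of X onto the first principal component (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each variable using the pooled sample (group labels ignored)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical simulated data: two groups of n samples, p variables
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(2 * n, p))
X[n:, :5] += 1.0  # hypothetical mean shift in 5 of the 50 variables for group 2
groups = np.repeat([0, 1], n)

# Step 1: reduce to first-PC scores; Step 2: compare the groups with a t-test
scores = pc1_scores(X)
t = student_t(scores[groups == 0], scores[groups == 1])
print(f"t = {t:.3f} on {2 * n - 2} degrees of freedom")
```

The sign of a principal component is arbitrary, so only the magnitude of `t` carries information here. The hypothesis this statistic actually tests depends on the estimated direction `Vt[0]`, which is computed from the same data, including both groups. That dependence is the question the talk raises.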