The k-Sample Problem When k is Large and n Small
Abstract
The k-sample problem, i.e., testing whether two or more data sets come from the same population, is a classic one in statistics. Instead of having a small number of k groups of samples, this dissertation works on a large number of p groups of samples, where within each group, the sample size, n, is a fixed, small number. We call this as a "Large p, but Small n" setting. The primary goal of the research is to provide a test statistic based on kernel density estimation (KDE) that has an asymptotic normal distribution when p goes to infinity with n fixed.
In this dissertation, we propose a test statistic called Tp(S) and its standardized version, T(S). By using T(S), we conduct our test based on the critical values of the standard normal distribution. Theoretically, we show that our test is invariant to a location and scale transformation of the data. We also find conditions under which our test is consistent. Simulation studies show that our test has good power against a variety of alternatives. The real data analyses show that our test finds differences between gene distributions that are not due simply to location.