Darius Dziuda,
Associate Professor,
Central Connecticut State University
Biomarker discovery studies based on current high-throughput genomic and proteomic technologies analyze data sets with thousands of variables, p, and far fewer biological samples, n. To deal successfully with such p >> n data, bioinformaticians and biomedical researchers have to be familiar with the intricacies of multivariate analysis, and particularly with approaches capable of dealing efficiently with the curse of dimensionality. However, too many biomarker discovery studies still apply statistical approaches that are inappropriate for this kind of high-dimensional data. In other words, it is easy to perform such studies poorly and to report either anecdotal or misleading results. More sophisticated statistical and data mining methods should be used, combined with appropriate validation of their results and with methods that link biomarkers to existing or new biomedical knowledge. In this presentation, we will discuss current trends in multivariate biomarker discovery based on high-dimensional ‘omic’ data. First, we will look at common misconceptions in biomarker discovery and provide guidance on which methods to use, which to avoid, and why. Then we will focus on the methods and concepts that maximize the chances of discovering parsimonious multivariate biomarkers that are robust and biologically interpretable.