The MeDDL platform is a suite of tools currently implemented in MATLAB R2010a (The MathWorks Inc., Natick, MA). It provides registration of “peaks”, defined here as a single ion or measured mass/charge (m/z) at a given retention time; alignment of peaks in both mass and chromatographic time; and a suite of statistical tools selected for biomarker screening studies. In brief, the MeDDL tool reads in lists of netCDF (network Common Data Form) conversions of the raw GC/MS data files, registers peaks based on a user-defined filter (mass sensitivity and accuracy thresholds as well as chromatographic reproducibility, tailored to the performance of the analytical platform), and aligns the generated peak lists in both time and mass. Following registration and alignment, the data are analyzed using two of the primary analytical methods included in the MeDDL platform: principal component analysis (PCA; Richmond et al. 1987) and a novel fold change filter described below.
MeDDL was originally created for the analysis of liquid chromatography/mass spectrometry (LC/MS) data. The ionization technique employed for LC/MS is “soft”: it imparts little energy to analytes, resulting in fairly simple mass spectra that generally contain just the ionized analyte. Modifications to the original implementation of the fold change filter were required to aid in the differential analysis of the more complex mass spectra resulting from the “hard ionization” induced by electron impact (EI) fragmentation in the mass spectrometer’s ion source. Because each analyte yields many fragment ions, a reductionist approach is required to determine changes between sample groups efficiently. To address this issue, we created a supplementary time-binned fold change algorithm, labeled “hard ionization”. In the “hard ionization” method, the analyst specifies both a time window and a peak intensity threshold for comparison. The comparison then proceeds as follows: an averaged composite image of each user-defined comparative group is generated; the most intense peak from all comparative groups is evaluated across all aligned images using the criteria specified below; once the comparison is completed, the “time slice” defined by the peak apex ± ½ of the specified time window is removed from further analysis; and the next most intense set of peaks is compared.
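The time-binned procedure can be illustrated with a minimal Python sketch. It is not the MeDDL implementation (which is written in MATLAB): the function names, the representation of each group composite as a list of (retention time, intensity) pairs, the use of summed intensity within a slice as the group signal, and the toy peak values are all assumptions made for illustration.

```python
def time_binned_slices(composites, window, threshold):
    """Walk the pooled peaks from most to least intense, one time slice at a time.

    composites: dict mapping group name -> list of (retention_time_min, intensity)
                peaks from that group's averaged composite (assumed layout).
    window:     full width of the time slice in minutes (apex +/- window / 2).
    threshold:  minimum apex intensity for a slice to be evaluated.
    """
    half = window / 2.0
    remaining = {group: list(peaks) for group, peaks in composites.items()}
    slices = []
    while True:
        pooled = [peak for peaks in remaining.values() for peak in peaks]
        if not pooled:
            break
        apex_rt, apex_intensity = max(pooled, key=lambda peak: peak[1])
        if apex_intensity < threshold:
            break  # every remaining peak is below the intensity threshold
        lo, hi = apex_rt - half, apex_rt + half
        # Assumption: a group's signal in the slice is the sum of its peak intensities.
        totals = {
            group: sum(i for rt, i in peaks if lo <= rt <= hi)
            for group, peaks in remaining.items()
        }
        slices.append({"apex_rt": apex_rt, "totals": totals})
        # Remove this time slice from further analysis in every group.
        remaining = {
            group: [(rt, i) for rt, i in peaks if not (lo <= rt <= hi)]
            for group, peaks in remaining.items()
        }
    return slices


def increased_slices(slices, reference, treated, min_fold=2.0):
    """Keep slices where treated / reference is at least min_fold (increase only)."""
    kept = []
    for s in slices:
        ref, trt = s["totals"][reference], s["totals"][treated]
        if ref > 0:
            fold = trt / ref
        elif trt > 0:
            fold = float("inf")  # signal present only in the treated group
        else:
            continue  # neither group has signal in this slice
        if fold >= min_fold:
            kept.append((s["apex_rt"], fold))
    return kept


# Toy composites (illustrative values only, not data from the study).
composites = {
    "intact": [(5.00, 200000), (5.05, 50000), (7.50, 120000)],
    "denatured": [(5.02, 500000), (7.48, 130000), (9.00, 50000)],
}
slices = time_binned_slices(composites, window=0.2, threshold=100000)
hits = increased_slices(slices, "intact", "denatured", min_fold=2.0)
```

With these toy values, two time slices exceed the intensity threshold, and only the slice centered at 5.02 minutes shows a 2-fold increase in the denatured group. Removing each slice after it is compared keeps the many fragment ions of one compound from being reported as separate differences.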
Each defined group (e.g. the composite, averaged surface obtained from six samples each of the B6 intact urine and the associated B6 denatured urine) was assigned as in the following Boolean expression for a two-group example: (B6)&(Intact) → (B6)&(Denatured), with p < 0.05. We restricted the fold change analysis to only those time windows which demonstrated an increase upon protein denaturation. For this study, we sought all differential peaks which exhibited a 2-fold or greater absolute change between intact and denatured urine samples, with a specified time slice of 0.2 minutes and an intensity threshold of 100,000 absolute intensity (total ion count). In addition, we performed detailed statistical analysis including N-way ANOVA among the selected strains, as well as multiple pairwise comparison testing among the group means to determine whether the differences among group means were significant at the p = 0.05 level. A Bonferroni correction was applied to compensate for the increased chance of a false-positive pairwise difference when multiple comparisons are made.
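As a brief illustration of the Bonferroni correction, the following Python sketch tests each pairwise comparison at the overall significance level divided by the number of comparisons. The function name, the comparison labels, and the p-values are hypothetical and serve only to show the arithmetic; they are not results from this study.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Flag each pairwise comparison as significant after Bonferroni correction.

    p_values: dict mapping a comparison label -> unadjusted p-value.
    Each comparison is tested at alpha / m, where m is the number of comparisons.
    """
    m = len(p_values)
    if m == 0:
        return {}
    per_test_alpha = alpha / m
    return {label: p <= per_test_alpha for label, p in p_values.items()}


# Toy p-values for three hypothetical pairwise comparisons (not study results).
toy = {
    "group1 vs group2": 0.004,
    "group1 vs group3": 0.03,
    "group2 vs group3": 0.20,
}
flags = bonferroni_flags(toy, alpha=0.05)
```

With three comparisons, each is tested at 0.05 / 3 ≈ 0.0167, so the comparison with an unadjusted p-value of 0.03 is no longer counted as significant.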
Current LC/MS and GC/MS systems typically consist of specialized instrumentation with customized support software. This software is generally proprietary, being supplied by the instrument manufacturer and designed to facilitate user interaction with the analytical hardware. Most platform manufacturers also market add-on commercial software packages for the analysis of GC/MS and LC/MS experiments; these are generally designed to provide a very specific type of data analysis (e.g. proteomic or metabolomic) and cannot be readily modified or extended by the end user. For larger metabolomic and volatile organic compound (VOC) biomarker discovery studies, such as the GC/MS-based VOC profiling efforts initiated by our laboratory and collaborators, none of the software solutions reviewed offers the ability to compare multiple time point and exposure groups or to handle data sets with large numbers of samples. This bottleneck in data handling prompted the development and evolution of the Metabolite Differentiation and Discovery Lab (MeDDL) tool, which allows us to differentiate metabolite and VOC profiles in multiple differential biomarker discovery studies and to visualize collected data for a global view of an entire experiment while maintaining the ability to focus on individual compounds and spectra for subsequent identification.
The goal of this work is to design and implement a prototype software tool for the visualization and analysis of small molecule metabolite GC/MS and LC/MS data for biomarker discovery. The key features of the MeDDL software platform include support for the manipulation of large data sets, tools to provide a multifaceted view of the individual experimental results, and a software architecture amenable to modification and to the addition of new algorithms and software components. The MeDDL tool, through its emphasis on visualization, provides unique opportunities by combining the following: easy use of both GC/MS and LC/MS data; use of both manufacturer-specific data files and netCDF (network Common Data Form) files; preprocessing (peak registration and alignment in both time and mass); powerful visualization tools; and built-in data analysis functionality.
Given the requirements and currently available software limitations outlined above, a logically designed and successfully implemented comprehensive tool for time-series spectral registration, spectral and chromatographic alignment, visualization, and comparative analysis will enable the efficient and methodical analysis of multiple, large-scale biomarker discovery studies.
Given the unique requirements of large-scale, GC/MS-based biomarker studies and the limitations of currently available software, a logically designed and successfully implemented comprehensive tool for time-series spectral registration, spectral and chromatographic alignment, visualization, and comparative analysis enables the efficient and methodical analysis of multiple, large-scale biomarker discovery studies. The MeDDL platform has been markedly improved from the original version and greatly streamlines the analysis of multi-group comparisons through the addition of a more intuitive interface, the ability to dynamically alter group definitions and group comparative displays, and the creation of definable, group comparative graphics. Through a combination of the base MeDDL registration and alignment algorithms and the described additional functionality, MeDDL now offers analytical chemists the potential to visualize data in new ways, gain novel insight into experimental results, and expedite GC/MS-based biomarker discovery.