GC- and LC-MS metabolomics data processing, correction and quality assessment

Hayley Abbiss1,2, John Moncur2, Scott J. Campbell2, Robert D. Trengove1,3

1Separation Science and Metabolomics Laboratory, Murdoch University, WA, Australia; 2SpectralWorks Ltd, United Kingdom; 3Metabolomics Australia, Western Australian Node, WA, Australia

First Published: Scottish Metabolomics Network Symposium 2018


To reduce the impact of non-biological variance introduced into untargeted metabolomics datasets by, among other factors, gradual changes in instrument performance, it is common practice to analyse reference samples throughout an analytical sequence. Few single software platforms for data processing, signal correction and analysis/interpretation are vendor neutral and support both GC- and LC-MS data, and fewer still are implemented through a GUI. Here we present such a software capability and explore approaches for the validation of signal-corrected data.


Robust statistical analysis for the removal of outlier QC samples was implemented, as was correction to pooled quality control (QC) samples. Principal component analysis (PCA) was also implemented and performed for visual interpretation of the data. Three sample sets (four datasets; Table 1) of biological samples (fungal mycelia, plasma, grains) acquired on GC- and LC-qTOFMS instruments were processed using AnalyzerPro®, and QC correction and PCA were performed.
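The abstract does not state which outlier test or correction model was used, so the following is only a minimal sketch of one common approach, written in Python with the standard library. It assumes a median/MAD screen for outlier QC injections and piecewise-linear interpolation between the remaining pooled QC injections for a single feature. The function names, the 3.5 threshold and the interpolation model are all illustrative assumptions, not a description of AnalyzerPro.

```python
from statistics import median


def robust_qc_filter(qc_values, threshold=3.5):
    """Return the indices of QC values kept by a median/MAD outlier screen.

    Assumption: a modified z-score with a cut-off of 3.5; the source does
    not say which robust statistic was actually used.
    """
    med = median(qc_values)
    mad = median(abs(v - med) for v in qc_values)
    if mad == 0:
        # No spread to judge against, so keep every QC injection.
        return list(range(len(qc_values)))
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [i for i, v in enumerate(qc_values)
            if abs(0.6745 * (v - med) / mad) <= threshold]


def qc_correct(order, intensity, is_qc, threshold=3.5):
    """Correct one feature for drift using the pooled QC injections.

    order     : injection numbers, sorted ascending
    intensity : feature intensity for each injection
    is_qc     : True where the injection is a pooled QC
    """
    qc_pos = [o for o, q in zip(order, is_qc) if q]
    qc_val = [x for x, q in zip(intensity, is_qc) if q]
    keep = robust_qc_filter(qc_val, threshold)
    qc_pos = [qc_pos[i] for i in keep]
    qc_val = [qc_val[i] for i in keep]
    target = median(qc_val)  # level to which every injection is rescaled

    def drift_at(o):
        # Flat extrapolation outside the first and last retained QC.
        if o <= qc_pos[0]:
            return qc_val[0]
        if o >= qc_pos[-1]:
            return qc_val[-1]
        # Linear interpolation between the two bracketing QC injections.
        for k in range(len(qc_pos) - 1):
            p0, p1 = qc_pos[k], qc_pos[k + 1]
            if p0 <= o <= p1:
                v0, v1 = qc_val[k], qc_val[k + 1]
                return v0 + (v1 - v0) * (o - p0) / (p1 - p0)

    return [x * target / drift_at(o) for o, x in zip(order, intensity)]
```

Under this sketch a feature whose intensity drifts linearly across the sequence is flattened to the median QC level, while an outlying QC injection is excluded before the drift curve is built.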

Table 1. Number of samples, reference sample types and quality assessment measures for the sample matrices and platforms presented.

Each data set utilised different reference sample types as well as different quality assessment measures to explore options for the validation of metabolomics-based analyses. The sequence set up for the grains samples is shown in Figure 1.

Figure 1. Sequence set up for grains samples showing evenly spaced pooled QC samples (every fifth injection) and validation samples before and after study samples. For the grains analysis, validation samples were run at the beginning, middle and end of the sequence.
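As a rough illustration of the layout described in the caption, the sketch below builds an injection list with a pooled QC at every fifth position and a block of validation samples at the beginning, middle and end. The function name, the "QC" label and the rule for splitting the study samples in half are assumptions made for the example; the actual sequence is the one shown in Figure 1.

```python
def build_sequence(study_samples, validation_samples, qc_every=5):
    """Return an injection list with evenly spaced pooled QC samples.

    Validation samples are placed before, midway through and after the
    study samples; every qc_every-th injection is a pooled QC.
    """
    half = len(study_samples) // 2
    payload = (list(validation_samples) + list(study_samples[:half])
               + list(validation_samples) + list(study_samples[half:])
               + list(validation_samples))
    sequence = []
    for name in payload:
        # Insert a pooled QC whenever the next position is a multiple
        # of qc_every (positions are counted from 1).
        if (len(sequence) + 1) % qc_every == 0:
            sequence.append("QC")
        sequence.append(name)
    return sequence


if __name__ == "__main__":
    study = [f"S{i}" for i in range(1, 9)]
    for position, name in enumerate(build_sequence(study, ["V1", "V2"]), 1):
        print(position, name)
```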


The AnalyzerPro data processing workflow consisted of five key steps relevant to untargeted metabolomics-based data analysis. These were: Import Data, Create processing method, Process Sequence, Apply QC Correction and Visual results. Figure 2 shows improved PCA clustering of pooled QC samples after QC correction.

Figure 2. Fungal mycelia samples before (left) and after (right) correction to pooled QC samples.

Additionally, the linearity of an example feature from dilution and concentration of some pooled QC samples was improved after correction (Figure 3).

Figure 3. An example feature from the fungal tissue data showing improved linearity (right) after correction to quality control samples. The relative standard deviation (RSD) of the feature also improved, from 31.75% before correction to 4.83% after.
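For reference, the RSD quoted here is conventionally the sample standard deviation expressed as a percentage of the mean. A minimal sketch follows; it assumes the n-1 (sample) form of the standard deviation, which the source does not specify, and the function name is illustrative.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%) of replicate intensities.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(values) / mean(values)
```

Applying such a function to the same feature across replicate QC or validation injections, before and after correction, gives the kind of before/after comparison reported in the caption.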

The second sample set was run on GCqTOF and LCqTOF instruments. For both platforms, QC samples clustered in the centre of the PCA scores plot; however, grouping of sample sets was only observed for the LCqTOF data (Figure 4).

Figure 4. The PCA scores plot for the GCqTOFMS plasma data (A) shows little grouping of samples after QC correction, whereas the plot for the LCqTOFMS data (B) shows two distinct sample groups.

In both cases, there were further improvements in data quality such as feature RSD for validation samples and linearity of spiked standards (not shown).

An analysis of grains samples including two reference types (pooled pre- and post-extraction) and three validation sets of samples showed differences in PCA scores plots depending on the type of reference used (Figure 5).

Figure 5. Before and after QC correction of grain samples using QC samples pooled pre (A) and post (B) extraction.

Figure 6 shows improved linearity of a standard (13C6-sorbitol) spiked into validation samples for two out of three validation sets and both types of reference sample.

Figure 6. Example of the linearity of a spiked standard over a linear range in validation samples run throughout the grains sequence.
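One simple way to put a number on the linearity shown in such a plot is the coefficient of determination of an ordinary least-squares line through the spiked amounts and measured responses. The sketch below assumes that metric; the source does not state how linearity was assessed, and the function name is hypothetical.

```python
def linear_fit_r2(amount, response):
    """Least-squares line through (amount, response) and its R squared.

    Returns (slope, intercept, r_squared).
    """
    n = len(amount)
    mean_x = sum(amount) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in amount)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amount, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(amount, response))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Comparing the R squared value for the same standard before and after QC correction would then indicate whether the correction improved or degraded linearity for a given validation set.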

The reference sample that gave the best reproducibility for spiked standards after QC correction was the sample pooled prior to extraction (two of three; Table 2); however, reproducibility was best without QC correction.

Table 2. Relative standard deviation (%) of standards spiked into grain samples before and after QC correction.


A vendor-neutral GC- and LC-MS data processing, correction and analysis software is presented. In general, the reproducibility and linearity of quality assessment samples were improved by correcting data to pooled quality control samples. This work shows that including additional samples to validate not only the MS analysis but also the data processing pipeline is essential for the assessment of data quality.


The authors would like to acknowledge Oliver Mead, Susan Breen and Peter Solomon from the Research School of Biology, Australian National University, ACT, Australia; Tobias Strunk, King Edward Memorial Hospital, WA and Andrew Currie, School of Veterinary and Life Sciences, Murdoch University, WA; Mike Francki, Department of Primary Industries and Regional Development, WA, Australia; and the funding bodies: Bioplatforms Australia (NCRIS); Wesfarmers institute for infectious diseases (telethon kids’ institute); Western Australian Premier’s Fellowship Program in Food and Agriculture.

Content retrieved from: http://www.spectralworks.com/news-events/publications/gc-and-lc-ms-metabolomics-data-processing-correction-and-quality-assessment/.