Measure, Normalise, Analyse
Glycans are complex sugar molecules that are present on the majority of proteins in the human body. They have shown great potential as biomarkers of both biological and chronological age, as well as of various diseases, including cancer and autoimmune diseases. Biomarker discovery studies require large sample sizes, and measuring that many samples can introduce differences between individuals that stem from experimental variation rather than from the biological variation researchers are interested in. Glycomics is no exception: these differences must be reduced to make samples comparable, thereby avoiding unwanted bias and false positives.
Modern high-throughput glycomics data shows that there are large differences between subjects in total glycan abundance and that the glycans are highly correlated. An essential step in preprocessing raw glycan data is normalisation, a transformation of the glycomics measurements that makes them comparable between subjects. Current normalisation methods, however, produce data that is compositional in nature, which makes many standard multivariate statistical methods inappropriate or inapplicable.
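To make the compositional problem concrete, here is a minimal sketch of total area normalisation, in which each glycan peak is expressed as a percentage of its sample's total. The function name and the peak areas are invented for illustration and are not taken from the study.

```python
def total_area_normalise(sample):
    """Express each glycan peak area as a percentage of the sample total."""
    total = sum(sample)
    if total <= 0:
        raise ValueError("sample must have a positive total peak area")
    return [100.0 * area / total for area in sample]


# Hypothetical raw peak areas for three glycans (illustrative values only).
sample_a = [200.0, 300.0, 500.0]
# Identical to sample_a, except that the first glycan is twice as abundant.
sample_b = [400.0, 300.0, 500.0]

norm_a = total_area_normalise(sample_a)
norm_b = total_area_normalise(sample_b)

print(norm_a)
print(norm_b)
```

Only the first glycan differs between the two raw samples, yet after normalisation the second and third glycans also appear lower in the second sample. Because every sample is forced to sum to 100, a change in one glycan shifts all the others, which is the property that conflicts with many standard multivariate methods.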
The lack of consensus on the appropriate normalisation approach in the field of glycomics motivated Uh et al. to investigate how different normalisation methods affect subsequent statistical analysis, such as variable selection for age prediction using immunoglobulin G (IgG) glycans.
The study tests six normalisation methods, applies variable selection, and ultimately evaluates the robustness and efficiency of the normalisation methods through simulations. The researchers demonstrate that the widely used row-wise total area normalisation method performs poorly compared to the column-wise normalisation methods: glycans were falsely selected and the prediction error was large. The column-wise normalisation methods, such as MS (median scaling) and MQN (Multivariate Quantile normalisation), not only outperformed the row-wise methods but also have the advantage of preserving the correlation structure. The recommendation is to apply several normalisation methods and to report the association results that are detected by the majority of them.
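The majority recommendation can be sketched in a few lines. This is only an illustration under stated assumptions: the method labels, the glycan peak names and the selections below are hypothetical, and "majority" is read here as more than half of the methods applied.

```python
from collections import Counter


def majority_selected(selections):
    """Return glycans selected under more than half of the normalisation methods."""
    # Count, for each glycan, how many methods led to its selection.
    counts = Counter(g for chosen in selections.values() for g in set(chosen))
    threshold = len(selections) / 2
    return sorted(g for g, n in counts.items() if n > threshold)


# Hypothetical variable-selection results after six normalisation methods.
selections = {
    "method_1": {"GP4", "GP6", "GP14"},
    "method_2": {"GP6", "GP14"},
    "method_3": {"GP6", "GP14", "GP18"},
    "method_4": {"GP2", "GP6"},
    "method_5": {"GP6", "GP14", "GP18"},
    "method_6": {"GP9", "GP14"},
}

consensus = majority_selected(selections)
print(consensus)
```

In this made-up example only two glycans are selected under more than three of the six methods, so only those two would be reported. Glycans that appear under a single method are treated as possible artefacts of that normalisation choice.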
Studies like this highlight the importance of initial data handling and raise awareness of the bias and false positives that can result from an inappropriate choice of normalisation method. The procedure described is also a useful guide for studies aiming to identify robust, reproducible glycan biomarkers.
Don’t be afraid to reach out to us and ask questions, provide commentary or suggest topics.