DS4S Seminar | Data integration of in-house and publicly available proteome data

by Ms Hannah Voß (Clinical Chemistry and Laboratory Medicine, UKE Hamburg)




The full title of the presentation is "Data integration of in-house and publicly available proteome data across tissue types, quantification techniques and experimental setups overcomes cohort size limitations and enables valid statistical analysis for rare samples".

Abstract: Investigating the proteome is crucial for understanding molecular changes in diseases, as the proteome represents the pharmacologically addressable phenotype. Small cohorts limit the usability and validity of statistical methods, especially for rare diseases, while variable technical setups and high numbers of missing values make data integration from public sources challenging. Here, we show for the first time the successful integration of proteomic data across different tissue types (fresh frozen, formalin-fixed paraffin-embedded (FFPE)), quantification platforms (DDA, DIA, SILAC, TMT) and technical setups, while handling missing data without the need for error-prone imputation. The developed framework can remove technical batch effects through a Bayesian framework or a linear regression model and is adaptable to different data probability distributions, according to the user’s needs.

Using several datasets, we show that data integration across independent proteomic cohorts can help to identify subpopulations and to reveal molecular signatures and altered pathways in biomarker discovery studies.

Please subscribe to the mailing list to get the Zoom link and further information on the biweekly Data Science for Science (DS4S) networking series: .