When Simple Beats Complex in scRNA-seq Data Processing

A point that every judicious data analyst should always have in mind, and an overarching lesson for science

Figure composed from Dall-E 2 generations and hand-made drawings.

Writing in Nature Methods, researchers examine the critical task of preprocessing single-cell RNA-sequencing (scRNA-seq) data before analysis. They find that, in practice, the simplest method works best.

The standard input to scRNA-seq analysis is a count matrix of genes by cells, in which each entry records how many transcripts of a given gene were detected in a given cell. To make downstream analysis reliable, researchers usually apply transformations to this matrix that adjust the counts for variable sampling efficiency (different cells are sequenced to different depths) and equalize the variance across the dynamic range (in raw counts, highly expressed genes fluctuate far more than lowly expressed ones). But how well founded are these transformations? Do they introduce biases? How do they affect noise levels and downstream interpretation?
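As a rough illustration of the first adjustment, the sketch below computes per-cell size factors from total counts and divides each cell by its factor. It assumes a genes × cells NumPy array and uses a plain library-size estimator; the function names are hypothetical, and dedicated tools typically use more robust size-factor estimates.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Per-cell size factors for a genes x cells count matrix.

    Assumption: a plain library-size estimate (total counts per cell),
    rescaled so that the factors average to 1.
    """
    totals = counts.sum(axis=0).astype(float)
    return totals / totals.mean()


def normalize(counts: np.ndarray) -> np.ndarray:
    """Divide every column (cell) by its size factor."""
    return counts / size_factors(counts)


# Two cells with identical composition, but the second was sampled 3x deeper.
counts = np.array([[1, 3],
                   [2, 6],
                   [5, 15]])
print(size_factors(counts))  # [0.5 1.5]
print(normalize(counts))     # both columns become [2, 4, 10]
```

Size-factor normalization handles only the sampling-efficiency part; it does not change the fact that, in raw counts, variance grows with the mean, which is what the variance-stabilizing transformations address.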

In their study, the authors compared four families of transformations, based respectively on the delta method, model residuals, inferred latent expression states, and factor analysis. Their aim was to weigh the strengths and weaknesses of each approach and, from this, to determine which one gives the most favorable results. The answer was surprising: a very simple method, with no fancy maths or AI, works best.
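To give a flavor of two of these families, the sketch below implements a delta-method variance-stabilizing transformation and Pearson residuals under a gamma-Poisson (negative binomial) model with Var(y) = μ + αμ². The fixed overdispersion `ALPHA = 0.05` and the simple expected-count model (gene total × cell total / grand total) are our assumptions for illustration, not necessarily the exact choices evaluated in the paper.

```python
import numpy as np

# Assumed overdispersion in Var(y) = mu + ALPHA * mu**2; in practice it is
# estimated from the data.
ALPHA = 0.05


def acosh_transform(y, alpha: float = ALPHA) -> np.ndarray:
    """Delta-method variance-stabilizing transform for gamma-Poisson counts.

    Its derivative is 1 / sqrt(y + alpha * y**2), i.e. one over the standard
    deviation implied by the mean-variance relation.
    """
    return np.arccosh(2 * alpha * np.asarray(y, dtype=float) + 1) / np.sqrt(alpha)


def pearson_residuals(counts, alpha: float = ALPHA) -> np.ndarray:
    """Pearson residuals of a simple gamma-Poisson model (genes x cells).

    Expected counts are gene total * cell total / grand total. Assumes every
    gene and every cell has at least one count (otherwise mu would be 0).
    """
    counts = np.asarray(counts, dtype=float)
    mu = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()
    return (counts - mu) / np.sqrt(mu + alpha * mu**2)


counts = np.array([[0, 2, 4],
                   [1, 1, 10],
                   [3, 0, 6]])
print(acosh_transform(counts).round(2))
print(pearson_residuals(counts).round(2))
```

The delta-method transform stretches small counts and compresses large ones, whereas Pearson residuals measure how far each count deviates from what the model expects, in units of the model's standard deviation.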

The evaluation found that, although the approaches based on model residuals, inferred latent expression states and factor analysis have appealing theoretical properties that should in principle make them well suited to the task, extensive simulations and real-world datasets show that in practice a far simpler approach matches or surpasses them. This approach applies a logarithm with a pseudocount (literally as simple as adding 1 to every count, so as to avoid taking the logarithm of zero, and then taking the logarithm) followed by principal component analysis (PCA), a comparatively simple technique by today's standards. Strikingly, this straightforward combination performed as well as or better than the more sophisticated alternatives in the benchmarks the authors ran on simulated and real-world scRNA-seq data. And of course, it is blazing fast.
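The recipe fits in a few lines. The sketch below divides each cell by a size factor derived from its total counts, adds a pseudocount of 1, takes the natural logarithm, and runs PCA through NumPy's SVD. The function names, the library-size estimator and the choice of 10 components are our assumptions; the paper's exact settings may differ.

```python
import numpy as np


def shifted_log(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """log(y / s + pseudocount) for a genes x cells count matrix.

    s is a per-cell size factor: total counts rescaled to average 1
    (an assumed, simple estimator).
    """
    totals = counts.sum(axis=0).astype(float)
    s = totals / totals.mean()
    return np.log(counts / s + pseudocount)


def pca(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project cells onto the top principal components.

    x is genes x cells; the result is cells x n_components.
    """
    centered = x - x.mean(axis=1, keepdims=True)  # center each gene across cells
    # Rows of centered.T are cells; singular values come sorted, largest first.
    u, sing, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * sing[:n_components]


rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(200, 50))  # toy data: 200 genes, 50 cells
embedding = pca(shifted_log(counts), n_components=10)
print(embedding.shape)  # (50, 10)
```

Analysis toolkits usually also restrict PCA to highly variable genes; that step is omitted here for brevity.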

The work also emphasizes the importance of understanding the conceptual basis of the different transformation methods, since it directly determines how applicable each one is to diverse data types, a lesson that indeed applies to all sciences. And because the simple approach involves only elementary mathematical operations, the transformed values remain intuitive to interpret.

Another important point that emerges from the study is the need for empirical performance evaluation to complement theoretical analysis, as the authors did with simulated and real datasets. Again, the paper discusses this for scRNA-seq data, but the point likely holds for all of science.
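In the same spirit of checking theory against data, the toy simulation below (not the authors' benchmark) draws gamma-Poisson counts at means spanning three orders of magnitude and compares the per-gene variance of raw counts with that of log(y + 1). The settings (α = 0.1, 5,000 cells, four mean levels) and the helper name are arbitrary assumptions chosen for illustration.

```python
import numpy as np


def simulate_gamma_poisson(means, alpha, n_cells, rng):
    """Draw a genes x cells matrix with E[y] = mu and Var[y] = mu + alpha * mu**2.

    Each cell's latent expression is gamma-distributed (shape 1/alpha, mean mu)
    and the observed count is Poisson given that expression.
    """
    means = np.asarray(means, dtype=float)[:, None]
    shape = 1.0 / alpha
    latent = rng.gamma(shape, means / shape, size=(means.shape[0], n_cells))
    return rng.poisson(latent)


rng = np.random.default_rng(42)
means = np.array([0.5, 5.0, 50.0, 500.0])
counts = simulate_gamma_poisson(means, alpha=0.1, n_cells=5000, rng=rng)

raw_var = counts.var(axis=1)               # grows roughly like mu + 0.1 * mu**2
log_var = np.log1p(counts).var(axis=1)     # variance after the shifted log
for m, rv, lv in zip(means, raw_var, log_var):
    print(f"mean {m:6.1f}   raw variance {rv:10.1f}   log1p variance {lv:.3f}")
```

The raw variances span several orders of magnitude, while the log1p variances stay within a narrow band, which is the equalization the transformations aim for. A real benchmark would also assess downstream results such as clustering, which this toy check does not.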

When simplicity beats complexity

More broadly, the results of this study show that sometimes simplicity beats complexity. Simplicity brings not only practicality, lower cost and higher speed, but also easier interpretation and understanding.

At Nexco we are aware of this, and we consider the full spectrum of analytical methods, from the simplest approaches to the most modern AI-based techniques. Importantly, we build on decades of accumulated experience in statistical analysis supporting high-impact research, including over a decade of analyzing RNA-seq and scRNA-seq data and performing many other kinds of basic and advanced bioinformatic analyses. To learn more about us and how we incorporate the dark genome into scRNA-seq analysis, visit our website at https://www.nexco.ch/.

References

Comparison of transformations for single-cell RNA-seq data - Nature Methods

  • Saturday, Oct 28, 2023, 8:47 PM
  • single-cell-analysis, single-cell-sequencing, genomics, bioinformatics
Contact us

Our location

Nexco Analytics Bâtiment Alanine, Startlab Route de la Corniche 5A 1066 Epalinges, Switzerland

Give us a call

+41 76 509 73 73     

Leave us a message

contact@nexco.ch

Do not hesitate to contact us

We will get back to you shortly with the best solution for your needs.