Open and Reproducible Biomedical Science: Lessons from participating in the Schwannomatosis Open Research Collaborative

Forays into open science

As a researcher primarily using computational tools for my work, I do my best to ensure that my code is publicly available and reproducible so that others who want to run the same analysis can retrace the steps I have taken and tweak them if needed.

In neuroimaging (one of the key methods in my research area), initiatives like the International Neuroimaging Data-sharing Initiative (INDI) and the Human Connectome Project (HCP) have facilitated open access to data. Additionally, the Brain Imaging Data Structure (BIDS) is an emerging standard for organizing and describing neuroimaging data in a consistent and structured way. BIDS encourages researchers to format and label their data in a standardized manner, making it easier to share, combine and compare datasets across studies. Large-scale collaborations are increasingly common; for instance, the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) Consortium, which my lab is part of, has multiple working groups that focus on various psychiatric disorders. One key goal of the consortium is to overcome issues associated with underpowered studies due to small sample sizes and to standardize data processing protocols.

Nonetheless, my experience with open-source pipelines and software has always been on the user end: I’ve downloaded Docker images, forked repositories of code for my own purposes, and shared my work. Working in a collaborative setting where we practice version control as a team is a novel experience for me.

A community-driven approach to uncovering the genetic landscape of schwannomatosis

When the chance came up to apply bioinformatic tools to the study of a rare genetic disorder through the Schwannomatosis Open Research Collaborative (SORC), thanks to the Bioinformatics Research Network and Sage Bionetworks, I jumped at it.

SORC is part of a larger umbrella project called Synodos for Schwannomatosis. Funded by the Children’s Tumor Foundation, its main goal is to facilitate collaborative research that advances our understanding of the disease and ultimately leads to the development of more effective treatments.

Quick rundown of SORC and schwannomatosis:

Schwannomatosis is a rare genetic disorder leading to nerve sheath tumors, often caused by SMARCB1 and LZTR1 mutations. In many cases, however, the causes remain unknown, making treatment and prognosis difficult.
The primary goal of SORC is to conduct comprehensive genomic analyses, targeting noncoding variants, under-studied genes, and other genomic factors contributing to disease heterogeneity and etiology.
Sequence variants were identified from whole exome sequencing data (n=33; a technique that reads only the coding parts of a person's DNA) and whole genome sequencing data (n=6; reading a person's entire DNA) by applying several variant calling pipelines (GATK HaplotypeCaller, DeepVariant, and Strelka).
After applying an allele frequency filter, we run the VCF files containing the remaining genetic variants through a pipeline of splicing and missense variant annotation tools that we selected.
The variant annotation tools in the pipeline are open source, and they assess how pathogenic or impactful a variant might be. We then scale and average the scores to create a composite score. I primarily focused on identifying missense annotation tools and preparing the VCF files for proper integration into the pipeline.
Ultimately, we produce a list of variants that could change gene expression; this list may support the discovery of new genes or pathways that could be targeted for treatment.
In the future, proteomic and DNA methylation data could be integrated to create a more complete picture of the disease.
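
To make the filter-then-score idea in the rundown above more concrete, here is a minimal Python sketch. It is not the SORC pipeline itself: the variant records, the tool names (tool_a, tool_b, tool_c), the 1% allele frequency cutoff, and the choice of min-max scaling are all assumptions made for illustration, and it assumes that a higher score from every tool means a more damaging variant.

```python
from statistics import mean

# Hypothetical variant records. The IDs, allele frequencies (af), tool names
# and scores are made up for illustration; they are not SORC data.
variants = [
    {"id": "var1", "af": 0.0005, "scores": {"tool_a": 0.9, "tool_b": 30.0, "tool_c": 0.8}},
    {"id": "var2", "af": 0.2000, "scores": {"tool_a": 0.7, "tool_b": 25.0, "tool_c": 0.6}},
    {"id": "var3", "af": 0.0020, "scores": {"tool_a": 0.1, "tool_b": 10.0, "tool_c": 0.2}},
    {"id": "var4", "af": 0.0080, "scores": {"tool_a": 0.5, "tool_b": 20.0, "tool_c": 0.5}},
]

AF_THRESHOLD = 0.01  # assumed cutoff; the real filter may use a different value


def filter_rare(records, max_af):
    """Keep only variants whose allele frequency is at or below max_af."""
    return [v for v in records if v["af"] <= max_af]


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the tool cannot separate the variants.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def composite_scores(records):
    """Scale each tool's scores across variants, then average per variant.

    Assumes every record has a score from every tool.
    """
    tools = sorted(records[0]["scores"])
    scaled = {
        tool: min_max_scale([v["scores"][tool] for v in records])
        for tool in tools
    }
    return {
        v["id"]: mean(scaled[tool][i] for tool in tools)
        for i, v in enumerate(records)
    }


if __name__ == "__main__":
    rare = filter_rare(variants, AF_THRESHOLD)
    ranked = sorted(composite_scores(rare).items(), key=lambda kv: kv[1], reverse=True)
    for variant_id, score in ranked:
        print(f"{variant_id}\t{score:.2f}")
```

Scaling before averaging matters because annotation tools report scores on different ranges; without it, the tool with the largest numeric range would dominate the composite.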

A culture of collaboration between patient and research communities

SORC uses Synapse, a platform created by Sage Bionetworks for data hosting and management, which made accessing data and metadata far smoother. It’s not uncommon for researchers to wait weeks for data or cloud access when first starting a project, so I was very grateful to work in an environment with great data infrastructure right from the get-go. It makes doing science much more satisfying!
One of the most rewarding aspects of the project was learning how to use GitLab collaboratively and how to write effective documentation, so that when teammates (or even I) read the code, there is sufficient context and explanation for why the code was written in the first place. My code documentation can be found here:

swnts-nghuixin

Attending the 2023 neurofibromatosis (NF) conference was a highlight because it was the first academic conference I had attended in which patients, clinicians, drug developers and researchers were all well represented. More on my reflections on the conference here.

Ways to learn more about the Schwannomatosis Open Research Collaborative (SORC)

Read the README in the GitHub repo to learn how to contribute
Learn more about the umbrella project Synodos for Schwannomatosis
Read more about schwannomatosis and the data source of the project in Mansouri et al. 2020
Learn more about The Children’s Tumor Foundation and current NF research tools
The poster, co-authored with team members Hector Kroes and Adon Chawe, project mentor Alexandra Scott, and many others, can be found here

This article is also published on Sage Bionetworks’ PubPub: Ng, H.X. (2024). Lessons from participating in the Schwannomatosis Open Research Collaborative. Sage Bionetworks.