Reproducibility and replicability in science should be frictionless. I find that the lack of reproducibility and replicability is largely a problem with the UX of publishing in science. At the very least, design can have a huge impact on the culture and community practices surrounding scientific publishing.
But we can’t apply the exact same workflows used for content creation in other domains (think social media, e-commerce). New discoveries need some form of expert review (peer review in its current form) before they are disseminated, curated, and circulated in public. Still, we can learn a lot from the platforms we use in our daily lives about reducing friction and improving scientists’ ability to share their work.
I’d like to think that the publication should be a byproduct of doing science.
Template based on Arcadia Notebook Pub "From black box to glass box: Making UMAP embeddings interpretable with exact feature contributions"
This is an experiment in replicating methods published on an experimental platform: no more reformatting in separate editors, or copying results from one environment into another.
This code repository contains the material used to create and host the publication entitled "From black box to glass box: Making UMAP embeddings interpretable with exact feature contributions". The original publication is hosted at this URL.
repo: https://github.com/nghuixin/glass-box-umap-notebook-pub/tree/main
notebook pub rendered as github page: https://nghuixin.github.io/glass-box-umap-notebook-pub/#umap-of-gene-expression-data
Questions
This is an attempt to replicate and reproduce a notebook pub, a publishing format that makes transparent the process between doing analyses and their final publication as typically presented in a journal. The "product", instead of being a PDF in a journal, is an online, executable format; you can think of it as a Jupyter Notebook with a DOI, so it gets indexed and becomes searchable. The biggest advantage I envision is that users/scientists can quickly replicate the analysis environment and re-run the original analyses before modifying them to suit their needs. Two questions I wanted to answer:
How much time and effort does it take to reproduce and replicate a notebook pub?
<1 day, not bad considering I have never attempted to replicate a notebook pub before.
What sorts of challenges might I face when adapting the analyses to another dataset?
I managed to read the dataset with some slight modification to the code provided. Reading in data is often where I run into issues when adapting code from others. Considering I was still able to deploy this work-in-progress notebook pub and publish it as a webpage, I’d say the process is fairly frictionless, from getting started on new analyses to sharing them.