Renee Hui Xin Ng

Automated RSS-to-weekly research digest pipeline

This project was originally forked from this repo. The only changes I made were customizing the inputs so that it extracts papers on topics I am interested in, and adding a Quarto-rendered site for better usability. Otherwise, the results appear just fine in digest.md.

The instructions are pretty straightforward, as documented in the readme.

Repo: https://github.com/nghuixin/tocify

Quarto-rendered site published via GitHub Pages: https://nghuixin.github.io/tocify/

Customize inputs

Edit three files: feeds.txt to add RSS feeds (in the form Journal Name | URL), interests.md to describe your interests as keywords or a narrative seed, and prompt.txt to control how the papers are ranked and scored.
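
To make the feeds.txt format concrete, here is a minimal sketch of how Journal Name | URL lines could be parsed. The function name parse_feeds, the handling of blank lines, "#" comments, and bare URLs, and the example entries are all my assumptions for illustration, not necessarily what digest.py does.

```python
from typing import List, Tuple


def parse_feeds(text: str) -> List[Tuple[str, str]]:
    """Parse 'Journal Name | URL' lines into (name, url) pairs."""
    feeds = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        # Split on the first '|' only, so a '|' inside the URL survives.
        name, sep, url = line.partition("|")
        if not sep:
            # Assumption: a bare URL with no name uses the URL as its name.
            name, url = line, line
        feeds.append((name.strip(), url.strip()))
    return feeds


# Hypothetical entries, not real feeds from the repo.
example = """\
# journals I follow
Example Journal | https://example.org/rss.xml

https://example.com/feed
"""
feeds = parse_feeds(example)
```

Each resulting pair can then be handed to whatever fetches the RSS feed, with the journal name kept for labelling entries in the digest.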

Deploy Quarto site

In _quarto.yml, let Quarto use its default website output directory (_site/):

project:
  type: website
  output-dir: _site

Quarto’s GitHub Pages docs cover the overall approach for rendering and publishing, and this works nicely with the GitHub Pages artifact deploy model.

Add two GitHub Action workflows

GitHub Actions is the automation system that runs the digest.py script on a schedule. Two YAML workflow files tell GitHub to run digest.py every Monday and then run quarto render to rebuild and publish the site. There are two workflows because content generation and site deployment are handled separately.

  1. weekly-digest.yml focuses on content generation
  • Runs on a schedule or manually
  • Sets up Python → installs dependencies → runs digest.py
  • digest.py fetches RSS → calls OpenAI → writes digest.md
  • Commits and pushes digest.md to main
  • The site will also rebuild when the Week ToC Digest workflow is run manually
  2. publish-digest.yml focuses on site deployment
  • Triggered by the push or by workflow_run
  • Checks out the updated repo
  • Runs quarto render → converts digest.md into HTML (_site/)
  • Deploys _site to GitHub Pages
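
The content-generation half (fetch RSS, score, write digest.md) can be sketched roughly as follows. This is not the actual digest.py: the sample feed, the function names, and the output layout are hypothetical, and the keyword-count score function is a plain stand-in for the OpenAI ranking call so that the example runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload standing in for a fetched feed.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example Journal</title>
<item><title>Sleep and memory consolidation</title><link>https://example.org/a</link></item>
<item><title>Soil chemistry survey</title><link>https://example.org/b</link></item>
</channel></rss>"""


def parse_items(rss_xml):
    """Extract title/link pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        }
        for item in root.iter("item")
    ]


def score(title, keywords):
    """Stand-in for the OpenAI ranking call: count keyword hits in the title."""
    words = title.lower().split()
    return sum(1 for k in keywords if k.lower() in words)


def render_digest(items, keywords, min_score=1):
    """Rank items by score and render the ones at or above min_score as Markdown."""
    ranked = sorted(items, key=lambda it: score(it["title"], keywords), reverse=True)
    lines = ["# Weekly digest", ""]
    for it in ranked:
        s = score(it["title"], keywords)
        if s >= min_score:
            lines.append(f"- [{it['title']}]({it['link']}) (score: {s})")
    return "\n".join(lines) + "\n"


digest_md = render_digest(parse_items(SAMPLE_RSS), ["sleep", "memory"])
```

In the real pipeline the resulting string would be written to digest.md, which the first workflow then commits and the second workflow renders with quarto render.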

Unless we commit and push to GitHub, the updated digest.md file is lost, because GitHub Actions runs each job on a temporary machine. This commit-and-push step is defined in weekly-digest.yml. The other workflow does not modify the repo: the rendered site is sent directly to GitHub Pages as an uploaded artifact (actions/upload-pages-artifact). Separately, there is a _quarto.yml file in the main folder, which defines the site structure, navigation, output folder, and formatting.
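
The commit-and-push logic lives in the workflow itself, but the idea can be sketched in Python as below. The function name commit_and_push, the commit message, and the injectable run parameter are my own illustrative choices; only the git subcommands themselves are standard.

```python
import subprocess


def commit_and_push(path="digest.md", message="Update weekly digest", run=subprocess.run):
    """Commit and push `path` only if it changed; return True if a push happened."""
    # `git status --porcelain` prints nothing when the file is unchanged.
    status = run(
        ["git", "status", "--porcelain", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    if not status.stdout.strip():
        return False  # nothing new this week, so skip the empty commit
    run(["git", "add", "--", path], check=True)
    run(["git", "commit", "-m", message], check=True)
    run(["git", "push"], check=True)
    return True
```

Skipping the commit when nothing changed avoids a failed git commit on weeks with no new content. On a fresh runner, git also needs user.name and user.email configured before the commit will succeed.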