# Changing Changelog's logs automatically with GitHub Actions

I love automation. It's just magical when an application or a script _\*just does the thing\*_. Hundreds of times faster than you could do by hand and infinitely more accurate. Especially for mundane tasks. Magic.

And in the world of magical automation, GitHub Actions are a grade A [Elder Wand](https://harrypotter.fandom.com/wiki/Elder_Wand).

I used a GitHub action to automate applying some formatting to [Changelog](https://changelog.com)'s episode transcripts. Here's how.

## My backstory

This was my first meaningful code contribution to open source. It was an awesome experience for me. I'm a hobby coder, and podcasts like Changelog played a big role in my journey learning to code. This made it even more exciting for me when I came across the [issue for a GitHub action to auto-improve transcripts](https://github.com/thechangelog/transcripts/issues/844) on Changelog's transcripts repo while scouring for potential contributions during [Hacktoberfest](https://hacktoberfest.digitalocean.com). It was a special chance to contribute to something that I value. I had played with GitHub Actions a little before, so I felt I knew just enough to give it a go.

## The Goal

The goal was a way to automatically apply certain style formatting rules to Changelog's episode transcripts. There was an initial list of rules, which included fixing timestamps and changing javascript to JavaScript (because, as I learnt through this, [that's how you're supposed to write it](https://dev.to/gypsydave5/how-to-spell-javascript-107d)). Somebody will probably think of additional useful rules later, so the solution should also be simple to extend.

The transcripts for Changelog are stored in markdown in [this repo on GitHub](https://github.com/thechangelog/transcripts).

## The Solution

The main idea was to configure a GitHub Action that would run some code to apply the formatting changes (per the issue description).

### GHA? wha?
[GitHub Actions](https://docs.github.com/en/actions) are a way to run code

- in the cloud (i.e. on GitHub's servers),
- when code is pushed to a repo on GitHub (or on a schedule, or on various other triggers) and
- almost always for free (especially on public repos)

### Actions are hotdogs

Let me pause here to _well actually_ myself: They're [_actually_ called _Workflows_](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions). _Actions_ technically refer to single steps which build up to jobs which build up to workflows, but I'm just gonna refer to workflows as actions. Just like _technically_ a hotdog is just the sausage and once it's in a bun it's a _hotdog roll_, but Imma just call the whole thing a _hotdog_.

Actions are hotdogs.

### Back to the solution

Anyway, back to the solution. Fleshing it out a bit more, it has three parts:

1. a scaffold that finds all the transcript files and loops over them,
2. a formatter that takes a markdown file and applies the required formatting to it and
3. a GitHub Action that runs the formatter on the repo whenever code is pushed to the repo

(and 2½: some tests for the formatter. This isn't mission-critical code, so I didn't aim for 100% test coverage, but testing the formatter was useful. More on this later.)

Other high-level goals were to keep the solution lightweight, and to use technology that is ubiquitous, to make it easy to maintain and to enable as many people as possible to contribute. This led me to write the solution in [Node](https://nodejs.org), due to its ubiquity and power.

I also managed to write this without using any dependencies[*](https://github.com/thechangelog/transcripts/commit/9b1d2b953d961d4dec59dc465683ef67fe9a77c1#:~:text=contributors-cli%22%3A%20%22%5E4.10.0%22%2C-,%22jest%22%3A%20%22%5E27.2.4%22,-Write "except Jest for testing, but that doesn't count, right?").
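To make the formatter part a bit more concrete, here is a minimal sketch of what a rule-based formatter could look like in plain Node. This is not the code from the transcripts repo: the `rules` table, the `formatTranscript` name and the second rule are all made up for illustration. The idea is just that each rule is a pattern plus a replacement, so adding a rule later means adding one entry to a list.

```javascript
// Hypothetical rule table: each rule pairs a regex with its replacement.
// These patterns are illustrative, not the ones used in the transcripts repo.
const rules = [
  // \b limits the match to whole words, so "javascripty" is left alone.
  // The "i" flag also catches variants like "Javascript".
  { pattern: /\bjavascript\b/gi, replacement: "JavaScript" },
  { pattern: /\btypescript\b/gi, replacement: "TypeScript" },
];

// Pure function: text in, formatted text out. No file or network access,
// which keeps it easy to unit test.
function formatTranscript(text) {
  return rules.reduce(
    (result, { pattern, replacement }) => result.replace(pattern, replacement),
    text
  );
}

console.log(
  formatTranscript("I love javascript and Javascript, but not javascripty things.")
);
// → I love JavaScript and JavaScript, but not javascripty things.
```

One caveat with a sketch this simple: a bare word-boundary pattern would also rewrite the word inside URLs and code spans, so a real rule would probably need to skip those.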
In the spirit of staying lightweight, the script calls the command line for globbing lists of files and running git commands, rather than pulling in an external dependency.

I used regular expressions ("regex") for the core of the replacement logic. Regex is [really powerful](https://xkcd.com/208), but also [very dangerous](https://xkcd.com/1171/). Thankfully I was able to get it doing what I wanted it to do.

It turns out it was possible to get it working with about 50 lines of Node, one _workflow_ file to define our _action_, and about 50 lines of tests, mostly for the regexes (regecii? regecide?).

### The code

You can find the Node code in the `scripts` folder and the _action_ in the `.github/workflows` folder of the [transcripts repo](https://github.com/thechangelog/transcripts).

## HBD @ GHA (or: what was hard and what I learnt)

I got there, but it wasn't all smooth sailing. Here are some sharp edges I encountered and lessons I learnt along the way:

### Writing testable code

This wasn't a sharp edge per se, but a practice that I've been trying to adopt as much as I can, and it proved very useful here. I call it test-driven development lite: keeping "how would you test this" top of mind. I found this especially helped break my code up into small steps that make sense in isolation. These are easier to test and reason about, and they can then be easily stitched together to build the overall solution.

### Testing actions is tricky

This one was a sharp edge. As far as I know, there is no easy GitHub-provided way to test GitHub Actions (if I'm wrong please tell me, I would love to learn about this). The way I found around this was to first push the code to my fork of the repo and manually test it there. This is much clumsier than the tight feedback loop of automated unit testing, but at least it gave me a way of validating that it works as expected before pushing it to 'prod'.
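While the action itself is awkward to test, the "small steps that make sense in isolation" idea still pays off for the code it runs. Here's a rough sketch, assuming nothing about the real repo: `rewriteFile` is a hypothetical helper that keeps the file I/O thin and takes the transform as a plain function, so the transform can be unit tested on strings and the wrapper can be checked against a throwaway temp file.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: read a file, apply a pure transform, write it back.
// Returns true when the file content changed, false otherwise.
function rewriteFile(filePath, transform) {
  const before = fs.readFileSync(filePath, "utf8");
  const after = transform(before);
  if (after !== before) {
    fs.writeFileSync(filePath, after);
  }
  return after !== before;
}

// Quick check against a throwaway file in the OS temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "transcripts-"));
const file = path.join(dir, "episode.md");
fs.writeFileSync(file, "We talk about javascript.\n");

// The transform here is a stand-in; any string-to-string function works.
const changed = rewriteFile(file, (text) =>
  text.replace(/\bjavascript\b/g, "JavaScript")
);
console.log(changed, JSON.stringify(fs.readFileSync(file, "utf8")));
```

Because the transform is just a function from string to string, most of the testing can happen without touching the file system at all.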
### YAGNI

If you succeed in writing small, contained units of code, a new potential anti-pattern starts seducing you: building more than you need for your solution. Two practical ways I managed to resist this in this project were:

* Not catering for use cases that you don't have yet. E.g. I wrote a function to loop over text files, apply a transform and then write it back to the file. I *could* have generalised this for the case where it doesn't write out to the file again. But I didn't need it for this scenario, so I didn't extend it.
* Knowing when to stop generalising. I made a hard-coded function that returns all the `.md` files in the repo. I could have generalised this to accept any glob pattern. But this solution didn't need that, so I resisted the urge.

### GitHub just trusts the git CLI email address

This was an odd one for me. I had to deal with it when figuring out how to have the formatting changes be committed as [Logbot](https://github.com/changelogbot). First I tried using an auth token to have the action authenticate as Logbot, but this [caused an infinite loop of action triggers](https://github.community/t/workflow-infinite-loop/16547). I then realised I didn't have to do any of that, and could just use Logbot's username and email in the git CLI. It worked because of [how GitHub counts contributions](https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-github-profile/managing-contribution-graphs-on-your-profile/why-are-my-contributions-not-showing-up-on-my-profile).

It had never occurred to me that there is no validation on the username and email you enter into git on the command line. I also learnt [it's possible to view the email used to make commits to public repos](https://www.nymeria.io/blog/how-to-manually-find-email-addresses-for-github-users). This is one to keep in mind when deciding which email address to use in the git CLI.

### Regexes are hard

Finally, as I mentioned before, regexes are hard.
And thinking clearly about how to express intent is hard. For instance, one edge case was a word that should be replaced appearing in the middle of another word. Testing came in very handy here once more. Tests worked well for expressing both examples where the text was supposed to be replaced and places where it shouldn't be.

## Wrapping up

Once again, this was an awesome experience for me. I made my first real code contribution to an open source repo and I learnt a few things along the way. I also got to experience that _magic_ feeling when your automation code runs and _\*just does the thing\*_. I still go and check to see [it running on the repo](https://github.com/thechangelog/transcripts/commits?author=changelogbot) occasionally, and I enjoy seeing it run every time.

### Community

To top it all off, I also feel more part of the [Changelog community](https://changelog.slack.com). [Jerod](https://twitter.com/jerodsanto) from Changelog was super welcoming and willing to talk me through the process and provide guidance (and I enjoyed chatting to someone who's podcast-famous). Thanks Jerod!

### Say hi

If you'd like to have a chat to learn more about all this, [please get in touch](https://twitter.com/simeydeklerk). If you've got ideas for how we can do this better, or if I got stuff wrong, please also get in touch. I'd love to learn more too!