Weekly: Tutorials and Tech!

Hey everyone! It's weekly time again! Got a bunch going on!

Tutorial Video by Icarus Games

First off, Anto from Icarus Games made a fantastic tutorial video for LK 0.12. If you want a thorough overview of how the app has changed, give this a watch!

LegendKeeper 0.12

Last week we launched LK 0.12. Overall the update went really smoothly given how much we changed. There are still quite a few minor updates for 0.12.x.x coming; there are a few bugs to squash and features that need to be added back in. At a high level, the feedback on LK 0.12's direction has been overwhelmingly positive. Almost all of our worldbuilders think this is a productive evolution of the LK formula.

It's a huge relief! As I've mentioned in the past, with the addition of the "Boards" tab in late 2021, LK was starting to feel bloated. New users already struggled with understanding how the atlas worked, so we thought it was time to simplify. We think simpler is almost always better and more powerful, and it seems to be paying off in this case. We're so excited to see what everyone builds!

Cleaning up tech debt

This update took way longer than I thought! Like 3-4 months longer. Admittedly, I had some personal stuff take up about 2 months of time, but even so, this update went way over my estimate. The truth is, a lot of LK is well-tested and works fine, but much of it is still built on the same codebase I started in early 2018. I didn't know web development well back then, and LK has accumulated a lot of technical debt over time. That debt makes changes harder, especially in the UI. (The "Hydra" data layer was built more recently and is actually self-contained and tested in its own module, so all the collab/offline stuff is pretty easy to maintain.)

The upcoming public worlds and mobile upgrades are even harder to achieve with LK's ancient UI and server codebase. This is why, while working on 0.12.x.x patches, I've started putting together plans to mop up technical debt while building out the public worlds, sharing, and mobile support features. It's still early, but I believe I've found a way to modernize the app without too much effort. I'll have more technical posts on that as we get closer to those things being relevant.

But for this week, maintaining the database tables for pre-Hydra projects sucks! There are still about 8000 of these projects, and many of them have a lot of content! While these folks probably aren't coming back, I don't want to delete stuff or have to deal with sending data deprecation emails. That would take months... I thought about going in and migrating them all myself, but I estimated that would take around 120 hours. Instead, I bounced all of those un-migrated projects to flat files, uploaded them to a cloud storage bucket, and rebuilt the migrator to work with those. I then nuked all the old database tables and all the code dealing with them. Took about 2 hours total! It was very satisfying and cleaned up the server a lot!

Minor Panic

The day after the update, something (ultimately harmless) did happen that made my heart jump out of my chest. LegendKeeper's server has a work queue where all of the save jobs go. Whenever you sync changes on your client, that data is sent to the server and put into a job queue. Then a few seconds later, a worker comes alive, processes the changes, and saves them. (We do this because changes tend to come in batches, and saving after every one of your keystrokes makes our database sad.)
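For the curious, here's a rough sketch of the "queue now, save a few seconds later" idea. Everything in it is made up for illustration (the `SaveQueue` class, the `PendingSave` shape, the per-document batching, the explicit `now` timestamps); it's not LK's actual server code, just one way the pattern could look.

```typescript
// Hypothetical sketch; these names are assumptions, not LegendKeeper's real API.
type PendingSave = { docId: string; patches: string[]; runAt: number };

class SaveQueue {
  private jobs = new Map<string, PendingSave>();
  private delayMs: number;

  constructor(delayMs: number) {
    this.delayMs = delayMs;
  }

  // Queue a change. Changes to the same document share one pending job,
  // so a burst of keystrokes becomes a single save.
  enqueue(docId: string, patch: string, now: number): void {
    const existing = this.jobs.get(docId);
    if (existing) {
      existing.patches.push(patch);
    } else {
      this.jobs.set(docId, { docId, patches: [patch], runAt: now + this.delayMs });
    }
  }

  // Remove and return every job whose delay has elapsed.
  takeDue(now: number): PendingSave[] {
    const due: PendingSave[] = [];
    this.jobs.forEach((job, docId) => {
      if (job.runAt <= now) {
        due.push(job);
        this.jobs.delete(docId);
      }
    });
    return due;
  }
}
```

A worker would call `takeDue` on a timer and write each returned batch in one go, instead of hitting the database once per keystroke.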

During this processing, the worker checks if the document looks right before it saves to the database. If it doesn't, the changes are thrown out and the data is put into the "failed" queue. This generally only happens when there's a bug in the client code.
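As a minimal sketch of that check-then-save step: the `SaveJob` shape, the `processJobs` function, and the `looksRight` callback below are all hypothetical stand-ins, and the arrays stand in for the real database and the real "failed" queue.

```typescript
// Hypothetical sketch; not LegendKeeper's actual worker.
type SaveJob = { docId: string; projectId: string; body: string };
type WorkerResult = { saved: SaveJob[]; failed: SaveJob[] };

function processJobs(
  jobs: SaveJob[],
  looksRight: (job: SaveJob) => boolean, // whatever validation the worker runs
): WorkerResult {
  const result: WorkerResult = { saved: [], failed: [] };
  for (const job of jobs) {
    if (looksRight(job)) {
      result.saved.push(job); // stand-in for the database write
    } else {
      result.failed.push(job); // changes are thrown out, job kept for inspection
    }
  }
  return result;
}
```

Keeping the rejected jobs around, rather than silently dropping them, is what makes a suspicious pile-up visible later.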

Anyways, I woke up the day after the update to... an enormous fail queue. I started sweating bullets! This update touched very little of the sync code! How could this be happening? Even with all the failed jobs, I couldn't find evidence that users were actually impacted by any kind of problem. Very strange, so I kept an eye on it.

I monitored it for a while and the failed jobs just kept piling up, but then I noticed something... The documents that were failing had IDs that didn't exist in the project they claimed to be in... So these failed jobs were for documents that don't exist. Seeing as I had just added migration code in 0.12 that changes every document, this was... very sussy.

It turns out that whenever LK hit a 404 Page Not Found, it would try to migrate that non-existent document. LegendKeeper migrates data in a "just-in-time" way; that is, the moment you open a document on the client, it's migrated to a new format if necessary. Otherwise, I'd have to migrate all 10 million or so LK articles on my end and that would take a really long time.
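Just-in-time migration can be sketched roughly like this. The version numbers, the `StoredDoc` shape, and the single example migration step are all invented for illustration; the real formats aren't described in this post.

```typescript
// Hypothetical sketch of migrate-on-open; not LegendKeeper's real formats.
const CURRENT_VERSION = 2;

type StoredDoc = { id: string; version: number; content: string };

// Migration steps keyed by the version they upgrade from.
const migrations: Record<number, (doc: StoredDoc) => StoredDoc> = {
  1: (doc) => ({ ...doc, version: 2, content: doc.content.trim() }),
};

// Called when a document is opened: upgrade it only if it's out of date.
function openDocument(doc: StoredDoc): StoredDoc {
  let current = doc;
  while (current.version < CURRENT_VERSION) {
    const step = migrations[current.version];
    if (!step) {
      throw new Error(`no migration from version ${current.version}`);
    }
    current = step(current);
  }
  return current;
}
```

Documents nobody opens never pay the migration cost, which is the whole appeal over a bulk migration.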

That said, LK's offline/collaborative layer cannot tell the difference between "does not exist" and "not downloaded yet". So LK opens a document, waits for it to download, and then shows it. If it doesn't become valid within a certain timeframe, we show the 404. But this means when you navigate to a document called "beans", we open the doc, which then... runs the migration code whether "beans" is ready or not. If "beans" doesn't actually exist, we just migrated an empty document. Those changes get sent to the server, and then finally the worker is like "wtf is this beans". So it was an easy fix... The migrator just needs to wait on the document like the rest of the app does.
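Here's a simplified sketch of the bug and the fix side by side. It assumes a `loaded` flag on a hypothetical `LocalDoc` and a made-up migration that stamps a `format` field; the real fix waits on the download, while this version just skips migration until the flag is set.

```typescript
// Hypothetical sketch; the types and the migration itself are assumptions.
type LocalDoc = { id: string; loaded: boolean; fields: Record<string, string> };
type DocChange = { docId: string; key: string; value: string };

// Made-up migration: stamp documents that lack the new format marker.
// Returns the changes that would be synced to the server.
function migrate(doc: LocalDoc): DocChange[] {
  if (doc.fields["format"] === "v2") return [];
  doc.fields["format"] = "v2";
  return [{ docId: doc.id, key: "format", value: "v2" }];
}

// The bug: migrate on open, before knowing whether the doc even exists.
function openEagerly(doc: LocalDoc): DocChange[] {
  return migrate(doc);
}

// The fix: leave the document alone until it has actually downloaded.
function openWhenLoaded(doc: LocalDoc): DocChange[] {
  return doc.loaded ? migrate(doc) : [];
}
```

With the eager version, opening a non-existent "beans" still produces a change for the server to reject; with the guarded version, it produces nothing.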

That's all for now! I hope you have a great weekend!

Braden