Notes

This is a new page I'm trying out. I'll put shorter content here: thoughts and ideas that aren't full posts.

Tue Nov 21 2023

In 9542c6da I added video support to posts on this website!

My use case is small looping .mp4 files without controls (and without sound, so they can autoplay).

<video
  style={{ display: 'block', margin: '0 auto', maxWidth: '100%' }}
  key={props.src}
  controls={false}
  autoPlay
  playsInline
  loop
  muted
  width={videoMetadata[props.src].width}
  height={videoMetadata[props.src].height}>
  <source src={`/posts/${id}/${props.src}`} type="video/mp4" />
</video>

I get the video metadata from a library called mp4box during my build step.
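
Roughly, that lookup can work like this at build time. This is a sketch assuming mp4box.js's createFile/onReady API; the helper name and wiring are mine, not my exact build script.

import * as fs from 'fs'
import * as MP4Box from 'mp4box'

// Hypothetical helper: resolve a video's display width/height
function getVideoDimensions(filePath: string): Promise<{ width: number; height: number }> {
  return new Promise((resolve, reject) => {
    const mp4boxFile = MP4Box.createFile()
    mp4boxFile.onError = reject
    mp4boxFile.onReady = (info) => {
      // The first video track carries the display dimensions
      const { width, height } = info.videoTracks[0].video
      resolve({ width, height })
    }
    // mp4box.js expects an ArrayBuffer tagged with a fileStart offset
    const buffer = fs.readFileSync(filePath)
    const arrayBuffer = buffer.buffer.slice(
      buffer.byteOffset,
      buffer.byteOffset + buffer.byteLength
    ) as ArrayBuffer & { fileStart: number }
    arrayBuffer.fileStart = 0
    mp4boxFile.appendBuffer(arrayBuffer)
    mp4boxFile.flush()
  })
}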


Sat Aug 19 2023

I added a few features to my nodots programming language (it's just a little tree-walk interpreter, nothing fancy).

These PRs were fun to implement. They're a little hacky.

  • Add dicts and lists #2
  • Add write(), read(), read_all(), join() #3
  • Add REPL #4

Tue May 16 2023

I was mentioned in Val Town's Restricted Library Mode blog post after reporting a series of critical security vulnerabilities in their JavaScript runtime.

If you are familiar with the challenges of sandboxing user code, you might realize that we set ourselves a herculean task: keeping user secrets sandboxed while also allowing them to pass arbitrarily rich computational objects between each other. After playing and losing this cat-and-mouse game with the fantastic exploit-finder Andrew Healey one too many times, we decided to admit defeat and race to a more securable semantics ASAP. Specifically, we needed semantics that allowed for process isolation and serialization between all user code.

I'm really happy with how they responded to the issues I raised – and I continue to think it's a neat platform!


Sun Apr 23 2023

I found an optimization that speeds up the large benchmark build of Ter by 20–30 seconds!

[Ter is a] tiny wiki-style site builder with Zettelkasten flavor

My PR was just merged; it's a tiny change I found after profiling Deno using --inspect-wait and chrome://inspect.
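
For reference, the profiling flow was roughly this (the entry point name is illustrative):

deno run -A --inspect-wait main.ts
# then open chrome://inspect in Chrome, attach to the Deno process,
# and record a profile from the Performance tab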


Sun Mar 26 2023

I got a shoutout in the Val Town newsletter!

The award for best Val Town user this month clearly goes to Andrew Healey (@healeycodes) for his tireless work trying to outwit our and Deno's sandboxing. He's up to maybe 5 exploits so far. So hats off to Andrew for keeping us all secure 🙏

Over the past few weeks, I've been poking at some JavaScript sandboxes (including Val Town's Deno runtime before it was released). My experience confirmed something I already suspected: you need to sandbox "from the outside", or there will be obvious exploits.

Some examples of sandboxing from the inside:

  • vm2 (see some of the past breakouts)
  • Running user code in an isolated WebWorker but relying on fetch being hidden, e.g. to limit the amount or type of network access. You can do fetch = fetchWrapper, but people will still find a way to access the original (see the sketch below).
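
For example, here's a sketch of one classic escape in a page context (a Worker closes this particular hole, but similar tricks tend to surface):

// The "sandbox" replaces fetch with a wrapper that blocks requests
fetch = async () => {
  throw new Error('network access blocked')
}

// Hostile code grabs a fresh, unwrapped fetch from a new
// same-origin iframe – the wrapper never runs
const iframe = document.createElement('iframe')
document.body.appendChild(iframe)
const realFetch = iframe.contentWindow.fetch.bind(window)
await realFetch('https://example.com')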

Some examples of sandboxing from the outside:

  • V8 isolates
  • Running code in an AWS Lambda (and controlling access to the Lambda, and not caring if hostile code accesses the runtime environment you've created)
  • Or using the software that powers AWS Lambda (Firecracker).

Wed Mar 01 2023

Here's a performance-focused refactoring pattern I've applied a few times to great effect. In my head, I call it a "promise pipeline" but there's nothing super special about it – you could also just call it "writing fast concurrent code".

Let's say I've been tasked with speeding up "image jobs" while working at some company that mails framed photographs to customers every month. I'm working with code that already exists and is running in production, so I don't have time to rewrite it from scratch or ship new infrastructure. The goal is to just make the code faster.

Since I profiled before optimizing, I've narrowed the issue down to a slow function that glues together calls to external services. This function receives a list of image links and needs to:

  • Download them
  • Upscale them using AI
  • Make a third-party API call to mail each image to the customer

Here's what we're starting with:

async function handleImages(user: User, imageURLs: string[]) {
  // Download images
  const images: Blob[] = [];
  for (const imageURL of imageURLs) {
    images.push(await (await fetch(imageURL)).blob());
  }

  // Upscale them
  const upscaledImages: Blob[] = [];
  for (const image of images) {
    upscaledImages.push(await upscale(image));
  }

  // Mail them to the user
  for (const upscaledImage of upscaledImages) {
    await user.mail(upscaledImage);
  }
}

Maybe you've spotted the first problem. This function does one thing at a time! It downloads images one-by-one, then upscales them one-by-one, and finally mails them one-by-one. For 50 images, it takes ~8.5 seconds to complete.

Let's add some concurrency – with limits, so that we can respect our contracts with external services. For same-process concurrency limits in JavaScript, I like the semaphore pattern (e.g. deno-semaphore, await-semaphore).
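
To keep the examples below self-contained, here's a minimal counting-semaphore sketch (the libraries above provide hardened equivalents):

class Semaphore {
  private queue: (() => void)[] = []
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--
      return
    }
    // No permits left: wait in line for a release
    await new Promise<void>((resolve) => this.queue.push(resolve))
  }

  release(): void {
    const next = this.queue.shift()
    if (next) {
      next() // hand the permit straight to the next waiter
    } else {
      this.permits++
    }
  }
}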

async function handleImages2(user: User, imageURLs: string[]) {
  // Through trial and error, we found that other services
  // can handle up to this amount of load
  const downloadSemaphore = new Semaphore(5);
  const upscaleSemaphore = new Semaphore(5);
  const mailSemaphore = new Semaphore(5);

  // Download images (5 at a time)
  const images = await Promise.all(
    imageURLs.map(async (imageURL) => {
      await downloadSemaphore.acquire();
      const image = await (await fetch(imageURL)).blob();
      downloadSemaphore.release();
      return image;
    })
  );

  // Upscale them (5 at a time)
  const upscaledImages = await Promise.all(
    images.map(async (image) => {
      await upscaleSemaphore.acquire();
      const blob = await upscale(image);
      upscaleSemaphore.release();
      return blob;
    })
  );

  // Mail them to the user (5 at a time)
  await Promise.all(
    upscaledImages.map(async (upscaledImage) => {
      await mailSemaphore.acquire();
      await user.mail(upscaledImage);
      mailSemaphore.release();
    })
  );
}

We had to make the function longer. But it's faster. It takes ~1.2s – an improvement of 7x.

Let's say that this isn't enough. Speeding up image jobs is priority zero on the roadmap.

We can take our performance refactoring one step further if we think about how data flows through our function. As it's currently written, there are no guarantees around when images should be mailed to users – only that they should all be mailed by the time the function returns.

Here's the key fact: an image doesn't depend on the progress of another image.

Instead of designing our function with shared steps that block the progress of all images:

  • download images β†’ upscale images β†’ mail images

We can instead think about the flow of each individual image:

  • download image A β†’ upscale image A β†’ mail image A
  • download image B β†’ upscale image B β†’ mail image B
  • download image C β†’ upscale image C β†’ mail image C

If we run these flows concurrently, while still respecting contracts with external services, we will achieve optimal concurrency.

We end up needing less code too.

async function handleImages3(user: User, imageURLs: string[]) {
  const downloadSemaphore = new Semaphore(5);
  const upscaleSemaphore = new Semaphore(5);
  const mailSemaphore = new Semaphore(5);

  // Images don't depend on each other!
  // They can flow between steps independently
  const pipeline = imageURLs.map(async (imageURL) => {
    await downloadSemaphore.acquire();
    const image = await (await fetch(imageURL)).blob();
    downloadSemaphore.release();

    await upscaleSemaphore.acquire();
    const upscaledImage = await upscale(image);
    upscaleSemaphore.release();

    await mailSemaphore.acquire();
    await user.mail(upscaledImage);
    mailSemaphore.release();
  });
  await Promise.all(pipeline);
}

Instead of taking as long as the sum of the slowest download, the slowest upscale, and the slowest mail call, the function now takes as long as the image with the slowest combination of calls. For example, if image A has a slow download and image B has a slow upscale, version two pays for both delays in sequence, while version three lets image B start upscaling as soon as its own download finishes.

This final version of the function takes ~700ms – a speed-up of 1.7x.

If you're wondering how I arrived at these numbers, I wrote a test script to measure each version with 50 image URLs, and mocked external calls so they would take Math.random() * 100 milliseconds.
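
Something like this hypothetical harness (the mocked fetch/upscale/mail and the cast to the User type above are stand-ins):

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

// Every external call resolves after Math.random() * 100 ms
globalThis.fetch = (async (_url: string) => {
  await sleep(Math.random() * 100)
  return { blob: async () => new Blob() }
}) as unknown as typeof fetch
const upscale = async (image: Blob) => {
  await sleep(Math.random() * 100)
  return image
}
const user = {
  mail: async (_image: Blob) => sleep(Math.random() * 100),
} as User

const imageURLs = Array.from({ length: 50 }, (_, i) => `https://example.com/${i}.jpg`)
const start = performance.now()
await handleImages3(user, imageURLs)
console.log(`took ~${Math.round(performance.now() - start)}ms`)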

In the real world, calls to other services don't take an evenly distributed amount of time. Calls have spikes and high P99 latencies – so the impact of going from version two to three is actually much higher!

(In theory, there are rare inputs where version two and version three take the same amount of time, but in practice version three will always be faster.)


Wed Feb 15 2023

The amount of joy I get from making tiny changes to this website is unreasonably high.

I am reflecting on this after adding some border radius to all code blocks in b1dd2af8.

I also made it easy to start writing a note (like the one you're reading). All I have to do is run node createNote.js in the root directory of this repository.

// Creates a note in ./notes/ with the filename `${TIMESTAMP}.md`
const fs = require('fs')
const path = require('path')

const notesDir = './notes'
if (!fs.existsSync(notesDir)) {
  fs.mkdirSync(notesDir)
}

const filepath = path.join(notesDir, `${Date.now()}.md`)
fs.writeFileSync(filepath, '')

// This can be cmd+clicked from VS Code's terminal to open/edit it!
console.log(filepath)

In comparison, my flow for adding a new blog post is fraught with friction.

  • Write the blog in Notion
  • Get feedback from friends who add Notion comments
  • Export from Notion to markdown
  • Create an empty markdown file in ./posts/
  • Copy the frontmatter from an existing post (and edit it)
  • Paste in the exported post from Notion (minus the title)
  • Fix apostrophes and quote marks to be the ASCII variants (' and ")
  • Fix any weird markdown differences (sometimes I need to do stuff like add or delete empty lines)
  • Manually move the images from the exported dump to: public/posts/$postId/$imageName.png
  • Update the image links in the markdown file I just created

Most of the steps after "Export from Notion to markdown" could be automated by a script that consumes an exported Notion dump. Given that I've used this tedious flow for more than a year, it's probably worth writing a quick script to automate it!
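
If I ever write it, it might look like this sketch (file names and transforms are illustrative, following the list above):

import * as fs from 'fs'
import * as path from 'path'

// Usage (hypothetical): node importPost.js my-post-id ./notion-export
const [postId, exportDir] = process.argv.slice(2)

// Fix apostrophes/quote marks to the ASCII variants and write the post
const markdown = fs
  .readFileSync(path.join(exportDir, 'post.md'), 'utf8')
  .replace(/[‘’]/g, "'")
  .replace(/[“”]/g, '"')
fs.writeFileSync(path.join('./posts', `${postId}.md`), markdown)

// Move the images to public/posts/$postId/
const imageDir = path.join('./public/posts', postId)
fs.mkdirSync(imageDir, { recursive: true })
for (const file of fs.readdirSync(exportDir)) {
  if (file.endsWith('.png')) {
    fs.copyFileSync(path.join(exportDir, file), path.join(imageDir, file))
  }
}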


Tue Feb 14 2023

I am sunsetting tags.

All my tag pages (e.g. /tags/go) were getting very low traffic compared to all other pages. Plus, due to the type of content I write, I never really bought into the use case. If I were writing programming tutorials then a tag system might make more sense.

More benefits of removing tags: freeing up a tiny bit of UI space, deleting code, not having to decide what kind of tag a post belongs in.

To be a good netizen, I won't kill any external links. /tags/* will redirect to /articles.
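
Assuming a Next.js setup (which the JSX snippets above suggest), the redirect can be a few lines in next.config.js – a sketch, not necessarily how I implemented it:

module.exports = {
  async redirects() {
    return [
      {
        source: '/tags/:tag*',
        destination: '/articles',
        permanent: true,
      },
    ]
  },
}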

I'm also adding a "star system", where my favorite posts have an asterisk next to them. The visual design of this is identical to Linus Lee's list of posts because I've always been a fan of the design! Check it out on /articles.


Fri Feb 03 2023

Playing chess via voice commands.

I'm often pacing around the room while holding my new baby – putting him to sleep, soothing him, burping him. I've also been playing more chess lately.

I hacked together a script to play lichess games via voice commands (so I could put my laptop on top of my wardrobe while holding my almost-asleep baby). The results were okay – I got it to play some moves for me – but it wasn't very reliable (~40% chance I'd need to repeat myself). So I've put it aside for now.

Getting a voice recognizer to parse standard notation ("e2e4") seems harder than getting it to parse normal speech (like nouns, verbs, etc). I used Whisper via speech_recognition, and pyautogui to handle clicking.

As you can see below, I tried some manual parsing to help with the accuracy. It would work better if I used more distinct words for the squares, but I wanted to stick with standard notation.

import pyautogui
import speech_recognition

r = speech_recognition.Recognizer()

# lichess.org full screen, 14in MacBook Pro
top_left = [472, 217]
bot_right = [991, 738]

# calculate rows
x_space = (bot_right[0] - top_left[0]) / 7
x_rows = [top_left[0] + (x_space * i) for i in range(0, 8)]
y_space = (top_left[1] - bot_right[1]) / 7
y_rows = [top_left[1] - (y_space * i) for i in range(0, 8)]

# e.g "A1" -> (472.0, 217.0)
def position_to_xy(pos: str):
    letters = list("HGFEDCBA")
    numbers = list("12345678")
    letter, number = pos[0], pos[1]
    xy = (
        x_rows[letters.index(letter)],
        y_rows[numbers.index(number)],
    )
    return xy

while True:
    with speech_recognition.Microphone() as source:
        print("waiting..")
        voice = r.listen(source)
        command = r.recognize_whisper(voice)
        print(f"got command: {command}")

        # examples: "E2, E4.", "E2 E4"
        trimmed = command.upper().strip().replace(".", " ").replace(",", " ")
        positions = trimmed.split()
        if len(positions) != 2 or len(positions[0]) != 2 or len(positions[1]) != 2:
            print(f"warn: didn't get a valid position, got: {positions}")
            continue

        from_square = position_to_xy(positions[0])
        to_square = position_to_xy(positions[1])
        print(f"moving {positions[0]} -> {positions[1]}")
        pyautogui.moveTo(from_square[0], from_square[1])
        pyautogui.click()
        pyautogui.moveTo(to_square[0], to_square[1])
        pyautogui.click()

Fri Feb 03 2023

Hello, World!

The design of this small notes system is inspired by (read: stolen from) https://muan.co/notes.