The trouble with becoming a senior engineer is that there’s less and less that I can talk about in public. Nevertheless, here are some articles I’ve written.
I also throw in some advice for free where it makes sense.
Computing at scale
A behind-the-scenes look at an example machine-learning model that we could run on a global scale on Google Earth Engine. It worked because:
- Google provided programming primitives for parallelism,
- had the infrastructure to run our code in parallel, and
- random forests are well-suited to scale because the trees can be grown in parallel. (This doesn’t work for gradient-boosted trees, for example, where each tree is built on the errors of the previous one and so must be trained sequentially.)
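The parallelism argument above can be sketched in plain Python. This is a toy, not a real decision-tree implementation: each "tree" just predicts the majority label of its own bootstrap sample. The point is structural, and is that each tree depends only on the data and a seed, so they can all be grown concurrently.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train_tree(data, seed):
    # Each tree sees an independent bootstrap sample -- a toy
    # stand-in for real decision-tree training.
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    # "Tree" = the majority label of its bootstrap sample.
    return max(set(sample), key=sample.count)

def train_forest(data, n_trees=8):
    # Trees are mutually independent, so the pool can grow them
    # all in parallel. A boosted ensemble could not do this: tree
    # t needs the residuals left by tree t-1.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: train_tree(data, s),
                             range(n_trees)))

def predict(forest):
    # Majority vote over the trees.
    return max(set(forest), key=forest.count)

forest = train_forest([1, 1, 1, 1])
```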
Data processing with Spark
Processing data quickly and at scale is hard, and it falls into the realm of “streaming” data processing. I have captured some notes from developing streaming apps with Spark Structured Streaming.
Advice: avoid Spark Streaming, Structured or otherwise. Use simple apps that subscribe to the relevant event streams directly and scale with something like Kubernetes. In other words, move the complexity away from streaming, perhaps into a feature store that can be easily looked up with a regular app.
It is important to tune your Spark app, or at least have a look at the UI and run-time profile to see what the bottlenecks are. Sometimes it’s fun to look inside Spark and understand what makes it tick.
Advice: go at least 1 or 2 levels deep from whatever abstraction you work at.
Debugging
Debugging is the bane of the programmer. It is that hard place where reality meets expectation. As you get older, you learn enough of the art to minimize the time you spend debugging, but it never really goes away.
Often, I have had to debug because my mental model of the system differed from how the system actually behaved. I’ve seen disks running full due to ghostly files, and programs that wouldn’t die. Apparently this happened to a library I was using as well.
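The “ghostly files” case is worth seeing once. On a POSIX system, deleting a file that a process still holds open removes its directory entry but not its disk space, so the disk keeps filling while `ls` shows nothing. A minimal demonstration:

```python
import os
import tempfile

# Open a file, write to it, then delete it while the descriptor
# is still open.
f = tempfile.NamedTemporaryFile(delete=False)
f.write(b"x" * 1024)
f.flush()

os.unlink(f.name)                  # the name vanishes from the directory...
assert not os.path.exists(f.name)

# ...but the data is still there, readable through the open
# descriptor, and still occupying disk space.
f.seek(0)
data = f.read()

f.close()                          # only now is the space reclaimed
```

Tools like `lsof +L1` exist precisely to find these deleted-but-open files when a disk fills up for no visible reason.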
Some problems are particularly hard: ones that occur once in a blue moon, or that happen only on your computer and nowhere else. A simple program that exercises just the interesting bits can help a lot.
I’ve now worked in the industry long enough to encounter issues due to processor architecture or bugs in operating system code. Sometimes there’s no alternative to systematic experiments and reading the kernel sources.
Yet, my advice is this: when you see a problem, always start by assuming the fault lies in your code or in your understanding of the system. Then translate the debugging problem into one of information: what information is currently missing, and what should you add or obtain to narrow things down and make progress?
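Treating debugging as an information problem often turns it into a search: each experiment should halve the space of possible causes. A bisection over commits, inputs, or config flags is the canonical form; here is a generic sketch (names are mine, and `is_bad` is assumed to flip from False to True exactly once, as in `git bisect`):

```python
def first_bad(n, is_bad):
    # Binary-search for the first index in [0, n) where is_bad
    # becomes True. Each probe answers one yes/no question and
    # halves the remaining search space -- about log2(n) probes
    # instead of n.
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid          # the first bad index is at or before mid
        else:
            lo = mid + 1      # the first bad index is after mid
    return lo
```

The same discipline applies even without code: each log line you add or experiment you run should be chosen to rule out half of your remaining hypotheses.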
Sometimes, programs can teach humans a thing or two.
I think graph data structures are the scalable way to absorb knowledge.
If lazy evaluation works for Spark, it can work for you too.
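The Spark analogy can be made concrete in plain Python: a generator defers all work until a value is actually demanded, much as Spark builds a plan of transformations and only executes on an action.

```python
from itertools import islice

def squares():
    # A generator is lazily evaluated: nothing runs until a
    # value is pulled, so this can describe an infinite stream.
    n = 0
    while True:
        n += 1
        yield n * n

stream = squares()                      # no work has happened yet
first_three = list(islice(stream, 3))   # forces exactly three values
```

The benefit is the same one Spark gets: you describe the whole computation up front, and only pay for the part you actually consume.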