Recent history of education funding in Rhode Island

 “For decades, Rhode Island’s policymakers have operated under the myth that cutting taxes for businesses and high earners would spur economic growth. Yet the data is clear..."

SEEDS UNPLANTED - The Recent History of Education Funding in Rhode Island

A pair of formal diagram explanations. One for software and one for buildings.

The C4 set of diagrams is directly relevant to my work as a software engineer. Seeing the architecture profession's sets of diagrams is a useful reminder -- and an immediately obvious one for anyone who has lived in a house! -- that different diagrams are needed for different purposes (i.e., contexts).

The C4 Model – Misconceptions, Misuses & Mistakes • Simon Brown • GOTO 2024

What's in my set of architectural documents? Sharing everything: drawings, schedules, + specs.

Why I don't want color coded logs

Most developers laugh at me when I say I don't want color-coded logs. They rarely ask why. Color-coding a log's structure provides no actionable information. It actually obscures actionable information by forcing a distinction without a difference. What I am looking for are clues in the logs, and those clues are often small, easily overlooked tokens. I use color or inversion to distinguish them so that their occurrence immediately stands out from the background noise. The highlight script is a simple tool for this.
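The idea fits in a few lines of Ruby. This is a minimal sketch of such a highlighter, not the actual script; the pattern and usage shown are my own illustrations:

```ruby
# A minimal sketch of a log-highlighting filter (illustrative; not the actual
# script). It wraps each match of a pattern in ANSI inverse video so that
# small tokens stand out from the surrounding noise.

INVERT = "\e[7m" # inverse video on
RESET  = "\e[0m" # all attributes off

def highlight(line, pattern)
  line.gsub(pattern) { |match| "#{INVERT}#{match}#{RESET}" }
end

# As a command-line filter it would read stdin and write stdout, e.g.
#   kubectl logs -f mypod | highlight 'corrupt|private key'
# (the command and pod name here are hypothetical)
```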

Unnecessary frustration and toil

I spent a good part of yesterday tracking down a problem with the staging deployment of a feature I first started back in October of last year. (That it has taken this long to get it to staging has everything to do with how this organization manages work.) When you have such an extended period between implementation and deployment you rarely retain the feature's context, and even more rarely an environment within which to debug problems. It took me some time to regain that context and environment. (I should have left better notes for my future self.) Once I had that, it became obvious that the feature worked and that the problem lay in the deployment.

The deployment is a small Kubernetes cluster. Each service has 2 or 3 containers in several pods. I figured out the pod ids and container names (why are they all different!) and opened a terminal window for each and streamed the logs. I used a small script to highlight text that I was expecting to find. I then used the feature and discovered the problem was due to a corrupted private key stored in the deployment's database.

The organization uses Honeybadger.io to record exceptions and Elasticsearch to aggregate logs. These tools are intended to improve access to the details needed to debug issues. Each tool has its own user interface and mechanisms for accessing and searching. To use them you obviously need to understand these mechanisms and, more significantly, you need to know how the organization has configured them. That is, no two organizations use the same data model for how they record exceptions and log details.

The developer needs documentation about the configuration, and there was none. Well, that is not quite true. This organization has thousands of incomplete, unmaintained, and contradictory Confluence pages. The "information" available to the developer is actually worse than none at all, as they will waste time trying to piece together some semblance of a coherent (partial) picture. What I eventually concluded was that it could not be done and my best path forward was to look at the raw container logs.

I understand that at this organization I am a contractor and so just developer meat. But what I have seen is that this global, financial, highly profitable organization does not do any better for their developer employees. Perhaps all industries are like this. I have only experienced the software development industry and here they are mostly the same. It makes me sad and mad to see and experience such unnecessary frustration and toil.

Transactions and some concurrency problems

A group of us are reading Kleppmann's Designing Data-Intensive Applications. Chapter 7 is on transactions and especially the different approaches used to address concurrency problems, i.e., the Isolation in ACID. What becomes clear is that transaction isolation levels can only mitigate some problems; your application's data model design and use are mostly responsible for avoiding them. Here are the concurrency problems raised in the chapter:

Lost Updates. These occur when a process reads some records, modifies them, and writes them back. If another process modified those records after the read, the write-back silently overwrites, and so loses, that process's updates.

Read Skew. This is a read anomaly, again due to the delays between steps in a multi-step operation. Processes A and B are interacting with the same records. Process A reads records X and Y (two steps). Process B updates records X and Y (two steps). Due to the interleaving of A's and B's steps, process A ends up with the original X value but the updated Y value.

Write Skew. This occurs when process A reads some records to make a decision and then updates other records appropriate to the decision. While the decision is being made process B changes some records that would alter process A's decision. Process A is unaware of process B's changes and continues to make its updates which invalidates the data model.

Phantoms. This is a variation of Write Skew. Process A queries for records to make a decision. Process B inserts records that would have been included in process A's query results. Unaware of these inserts, process A makes its updates which invalidates the data model. The "phantoms" are the records not included in process A's query result.
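The Lost Update case is easy to demonstrate outside of a database. This Ruby sketch (my own illustration, not from the book) interleaves the read-modify-write steps of two "processes" by hand, then shows that holding a lock across the whole read-modify-write removes the window -- the same role an atomic UPDATE or SELECT ... FOR UPDATE plays in a database:

```ruby
# Demonstrate a lost update by hand-interleaving two read-modify-write
# "processes" on a shared counter.
counter = 0
a_read = counter      # A reads 0
b_read = counter      # B reads 0, before A writes
counter = a_read + 1  # A writes 1
counter = b_read + 1  # B writes 1 -- A's increment is lost
lost_result = counter # 1, not the expected 2

# Holding a lock across the entire read-modify-write removes the window.
counter = 0
lock = Mutex.new
threads = 2.times.map do
  Thread.new do
    lock.synchronize do
      value = counter  # read
      value += 1       # modify
      counter = value  # write
    end
  end
end
threads.each(&:join)
safe_result = counter # 2
```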

Costs of helpful data flexibility.

I'm currently having a discussion with a young developer who has only ever worked in Ruby and JavaScript. I noticed that the developer had chained "symbolize_keys" to the end of a method call. In Ruby this converts a hash's keys from strings to symbols, i.e. { "a" => 123 } becomes { :a => 123 }. Their reason for this was to give the called method flexibility in how it returned its result. They thought this provided flexibility and robustness. I countered that it did the exact opposite.

When a function can be given parameters and return results in multiple formats, robustness is only had when the function handles all formats equally. To do that the function needs to be tested with all formats. This can be done, but in practice, as I've seen across many organizations, it is not. And not only does the function need to be tested with the multiple formats; its callers and callees need to be tested too. It's a combinatorial explosion of testing.

The other detriment of this flexibility is that since no function is sure of the format, every function converts the data to its preferred format even when the data is already in that format. This conversion adds to the function's code size and has a runtime cost (CPU and memory) on every invocation. The cost of a single use might be small, but our applications work in a world with thousands of concurrent sessions, each with deep call chains, and expect microsecond responses. Those single uses add up.

My recommendation to the developer was to require one format as part of its contract and add validation that runs at least during testing. (I'd like to just tell them to use a typed language where this wouldn't even be an issue!)
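A sketch of what that contract check could look like (the helper and method names here are hypothetical, not from our codebase). The function declares symbol keys as its contract and fails fast, instead of "helpfully" converting:

```ruby
# Hypothetical boundary check: enforce one key format rather than accepting any.
def assert_symbol_keys!(hash)
  bad = hash.keys.reject { |key| key.is_a?(Symbol) }
  raise ArgumentError, "expected symbol keys, got #{bad.inspect}" unless bad.empty?
  hash
end

# A function that states its contract instead of converting on every call.
def fetch_timeout(options)
  assert_symbol_keys!(options)
  options.fetch(:timeout, 30)
end

fetch_timeout(timeout: 10)        # => 10
# fetch_timeout("timeout" => 10)  # => ArgumentError, caught during testing
```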

I mentioned that the developer's experience is in Ruby and JavaScript. I have found that it is common for such developers not to expect data to be in a specific format or type. I assume some of this comes from never being trained to always validate and convert data coming from the outside before using it inside (e.g., directly passing around an INPUT element's value or a database's column value). With that discipline, once data is inside you can be assured of its correctness. Instead, data is passed around without any function knowing a priori that it is correct.

I am unsure if I will convince this developer to not use "symbolize_keys". I am rowing against the tide.

Update: Not only did I not convince the developer, but the system's architect rejected it also.

RI DOT

A long time ago I organized a study group to read the whole RI state budget. We were lucky to get Tom Sgouros to guide us through this massive document. At the time there was no online version so we got printed copies. I remember struggling to carry the weight of multiple copies of its multiple volumes as I walked to my car. One of the things we learned was that DOT has almost no debt service. How can a $981M department that is responsible for roads, bridges, etc, with lots of bond-funded projects, have only $330K of debt service? It achieves this by hiding the debt within the Department of Administration. Most of the DOA's $211M debt service is actually DOT's. DOT costs Rhode Islanders well over a billion dollars a year. I honestly don't know if this cost is outrageous, or if it is money well spent. But it is useful to know the scale of the effort to build and maintain the road infrastructure.

FY 2025 Budget

SSL terminating tunnel using ghostunnel

From time to time a simple SSL terminating tunnel is needed. It enables the browser to use an HTTPS connection to an HTTP server. It is common to use a proxy server for this, but I was curious if there was something simpler. I was able to create an SSL tunnel using ghostunnel.

https://github.com/ghostunnel/ghostunnel

To build it for MacOS 14.7 I needed to update the go.mod to use toolchain go1.22.7 (instead of toolchain go1.22.4).

Create the cert and key

openssl req \
  -x509 \
  -newkey rsa:4096 \
  -keyout key.pem \
  -out cert.pem \
  -sha256 \
  -days 3650 \
  -nodes \
  -subj "/C=US/ST=RI/L=Providence/O=MojoTech/OU=Labs/CN=clientsite.com"

Add the client's domain name to /etc/hosts

127.0.0.1 clientsite.com

Run the tunnel

sudo ghostunnel server \
  --listen clientsite.com:443 \
  --target localhost:3000 \
  --cert cert.pem \
  --key key.pem \
  --disable-authentication

Run Python's file directory serving http server

python3 -m http.server 3000

And finally, open https://clientsite.com in the browser or with curl

curl -k https://clientsite.com

I think since this is Go and executables are statically linked, you could share the ghostunnel executable and PEMs with other developers.

"His train goes to a different station" is the best description of eccentricity I have heard in a long time.

Bye little Linode VM

The website https://andrewgilmatin.com/ is no more. I wasn't using the little Linode VM for much of anything anymore. If I were to keep it running I really needed to move it off of the discontinued CentOS 7. I would have to transition content, old code, and figure out security. Much has changed since I last needed to do that. I was not up for that marathon again.

Sensitive side of pure evil

I am reading The Lord of the Rings for the first time. Yes, reading LotR is a rite of passage for geeks, but I'm really only a geek by circumstance rather than by anything deeper. (I have watched Peter Jackson's movies several times, if that helps.) I am enjoying the books, having started with The Hobbit. But several times I have wondered how a young reader today, one not raised in bucolic Devon, responds to Tolkien's beautifully rendered landscapes. Those landscapes are integral to the book and, for me, a sustaining attraction.

I did try watching the first season of the Rings of Power, but quickly gave up. Others have well explained its many, many failures. It is now in its second season and, apparently, has very strange things to say about the sensitive side of pure evil.

Rings of Power’s orc baby: Amazon’s Lord of the Rings prequel doesn’t get it right. | Vox

Ad hoc systems for managing work

I love seeing people's systems for managing their work. Even those of fictional people. This short from The Bear on managing the restaurant's guests and their orders is great. 

To Do as a game

This might actually work!

A templating system using the file system for inheritance

Way back in the early days of the web, around 2004, I wrote a templating system that used the file system for inheritance. I think Fred Toth originally conceived of the technique. 

In the directory /A/B/C you place the template M with content

Hello [%include N%]

You then have the templating system expand /A/B/C/M. It would execute the directive [%include N%] to include the template N by looking up the directory tree, in order, /A/B/C/N, /A/B/N, and /A/N, and using the first N it found. You would place common templates (eg headers) and default content (eg company name) in the upper directories and "override" them in the lower directories. It worked really well for the mostly static sites my department was creating.
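The upward lookup itself is tiny. Here is a sketch in Ruby -- my reconstruction of the technique, not Trampoline's actual code:

```ruby
require "pathname"

# Resolve an included template by walking up the directory tree:
# resolve_include("N", "/A/B/C") tries /A/B/C/N, /A/B/N, /A/N, ... and
# returns the first existing file. (A reconstruction, not Trampoline's code.)
def resolve_include(name, dir)
  Pathname.new(dir).ascend do |candidate_dir|
    candidate = candidate_dir.join(name)
    return candidate.to_s if candidate.file?
  end
  nil # not found anywhere up the tree
end
```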

I have not seen something like this elsewhere. You can, however, achieve the same effect by manipulating your templating system's template search path per output document.

The system came to be called Trampoline and it has a Perl implementation and a partial Java implementation. The implementations are in the Clownbike project at SourceForge. None of the templates Clownbike used made it to SourceForge, unfortunately; those became the proprietary web sites our customers were paying for. Galley, an internal project, seems to have some.

I have no idea if any of this code still works. I am sure to be embarrassed by the code's quality! Some quiet, rainy day this winter perhaps I will try running it.

Red Indian Pipes

On a walk this weekend I saw a red Indian Pipe. Neither I nor my wife had ever seen one before. Apparently, they are not common, but also not rare.

Setting a Mac's default email client

I mostly love using Macs, but sometimes the conveniences provided are anything but. I needed to change my default mail client to Microsoft Outlook. You set the default mail client within Apple's Mail app's Settings. However, you can't access Settings unless you first configure an email account! Since I didn't want Mail to touch an actual email account, I ran these mail services locally using Docker:

docker run virtuasa/docker-mail-devel

This enabled me to configure Mail to use "debug@example.com" and the local POP server. And now I can access Mail's Settings to set the default mail client to Microsoft Outlook. I really do feel for all those users without their own System Admin.

MSCHF's “Not Wheels” and Osprey's Gaslands

I recently saw MSCHF's “Not Wheels” and it reminded me of Osprey Publishing's Gaslands, a tabletop post-apocalypse vehicle combat game. Even if you don't play the game, getting some matchbox cars, salvaging greeblies from the inside of defunct printers, superglue, and some paint is a lot of fun for the whole family (well, some members of the family).

Examples of hardiness

I walked at Tippacansett this past weekend and I was drawn to these examples of hardiness. Perhaps I was thinking, consciously or not, about still being an individual-contributor software developer at 60.

This morning I referred to my eggs as "hard coded."

Using data from external files

I am working with some code that processes CSV files. Each row corresponds to an existing record and the record is updated in response to the column values. This is not an uncommon task. The existing code implements it in an also not uncommon way, by intermixing row parsing and record updating. For example, assume we are updating Foos

class Foo
  attr_accessor :id, :location, :valuation
  # ...
end

A typical intermixing is

row = [...]
raise "unusable foo id" if row[0].blank?
foo = Foo.find_by(id: row[0].to_i)
raise "foo not found with id #{row[0]}" unless foo
raise "unusable town location" if row[1].blank?
location = Location.find_by(town: row[1])
raise "location not found with town #{row[1]}" unless location
foo.location = location
raise "unusable valuation" unless row[2].to_i >= 10_000
foo.valuation = row[2].to_i

While this initially seems like a reasonable approach, it quickly breaks down as the number of columns increases, the column formats become non-trivial, and there are column (or row) interdependencies. But the more significant problem is that the parsing and the updating can't be tested individually. This makes the tests harder to write, understand, and maintain.

It is always better to first parse the raw data, validate it, and then use it. Eg

class Record
  attr :id, :town, :valuation
  attr :foo, :location

  def initialize(values)
    raise "unusable foo id" unless /^\s*(\d+)\s*$/ =~ values[0]
    @id = $1.to_i
    raise "unusable town location" unless /^\s*(\S.*?)\s*$/ =~ values[1]
    @town = $1
    raise "unusable valuation" unless /^\s*(\d+)\s*$/ =~ values[2]
    @valuation = $1.to_i
  end

  def validate
    @foo = Foo.find_by(id:)
    raise "foo not found with id #{id}" unless @foo
    @location = Location.find_by(town:)
    raise "location not found with town #{town}" unless @location
    raise "valuation does not match the minimum" unless valuation >= 10_000
  end
end

# read the raw data
rows = [[...], ...]

# parse and validate the data
records = rows.map do |row|
  record = Record.new(row)
  record.validate
  record
end

# use the data
records.each do |record|
  record.foo.location = record.location
  record.foo.valuation = record.valuation
  record.foo.save
end

This parse, validate, and use approach is appropriate for all cases where you are bringing data from the outside into your application, no matter the outside source.

P.S. These small helper classes are your friends. Prefer them over your language's hash primitive as they provide greater control. Most languages have efficient syntax for creating and using them.