Codex — Machine Learning & AI Writing

“We write so the machines
make more sense and the humans feel less lost.”

— C.X.


§ I — Philosophy

Clarity is not
simplification.
It is respect.

The AI field has a documentation problem. Not a shortage — an excess of words that perform expertise without transferring it. Papers written to impress reviewers. Blog posts optimized for search engines. Tutorials that assume you already understand what they're teaching.

Codex is a corrective. Every piece starts with a question someone actually asked at 2 a.m., debugger open, hope fading.


§ II — Mission

“Making AI legible to the people building with it — not despite the math, but through it.”

We write for the junior engineer who suspects the loss curve means something but can't name it. For the senior architect who wants a second opinion framed in honest language. For the PM who needs to explain to the board why the model isn't wrong, it's just uncertain.

47 Long-form pieces

12k Thursday readers

0 Sponsored posts


§ III — From the Notebook

[Illustration: hand-drawn loss curve on graph paper, training vs. validation divergence]

Fundamentals · 9 min

Why Your Loss Curve Lies to You

The training loss bottomed out at 0.003. The model was useless. Here is what the curve was actually saying.


[Illustration: handwritten attention-mechanism equations on notebook paper]

Deep Dives · 18 min

Attention Is All You Need — But Not Why You Think

Dismantling the transformer paper one equation at a time, with the parts the blog posts skip.


[Illustration: scatter plot of embedding clusters on graph paper with colored annotations]

Applied ML · 12 min

The Embeddings That Actually Work in Production

Three months, four embedding models, one honest retrospective on what shipped and what got shelved.


§ IV — Keep Going

The notebook
continues
below.

Scroll down to find your reading path, read two full pieces inline, and subscribe if you want one clear explanation every Thursday.


Choose your entry point

Three Reading Tracks.

Pick the path that matches where you are. Each track is a curated sequence, not a random list.

01 · 3 pieces

Fundamentals

For the engineer who wants to understand, not just implement.

01 Why Your Loss Curve Lies to You · 9 min

02 Gradient Descent Is Not Optimization · 11 min

03 What Regularization Is Actually Doing · 8 min


02 · 3 pieces

Deep Dives

Architecture breakdowns for engineers who read the paper and still had questions.

01 Attention Is All You Need, But Not Why You Think · 18 min

02 The RLHF Papers Nobody Cites Correctly · 22 min

03 Diffusion Models From First Principles · 25 min


03 · 3 pieces

Applied ML

Production lessons from models that shipped — and ones that didn't.

01 The Embeddings That Actually Work in Production · 12 min

02 Serving 10M Predictions a Day on $200/Month · 14 min

03 When to Retrain vs. Fine-Tune · 10 min


Proof before commitment

Read before you subscribe.

Fundamentals · 9 min read · Feb 6, 2026

Why Your Loss Curve Lies to You

A forensic examination of what training metrics actually encode — and the three failure modes nobody puts in their tutorial.

It was 2:17 a.m. when Priya posted in the team Slack. The model had been training for six hours. Loss: 0.0031. She typed: "I think we're done?" The next morning, the model predicted "cat" for every image in the validation set. The loss curve had lied to her: politely, consistently, and with complete statistical confidence.

The problem isn't the metric. Cross-entropy loss is a perfectly reasonable objective. The problem is the implicit assumption baked into how we read it: that a lower training loss means a better model on data it has never seen. It doesn't, and understanding why requires a brief detour through what the loss function is actually optimizing.

“The training loss bottomed out at 0.003. The model was useless. Here is what the curve was actually saying.”

Cross-entropy measures the average negative log-likelihood of the true labels under your model's predicted distribution. As it approaches zero, your model is putting nearly all of its probability on the correct label for every example in the training set. This is not generalization. This is memorization wearing a lab coat.
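The arithmetic behind a number like 0.0031 is worth seeing once. Here is a minimal sketch in plain Python; the helper names are illustrative, not from any library. Because the average loss is a mean of -log p terms, exp(-loss) recovers the geometric-mean probability the model placed on the correct training labels.

```python
import math


def cross_entropy(probs_of_true_label):
    """Average negative log-likelihood of the correct labels.

    Each entry is the probability the model assigned to that example's
    true class (e.g. the softmax output at the label index).
    """
    return sum(-math.log(p) for p in probs_of_true_label) / len(probs_of_true_label)


def implied_confidence(avg_loss):
    """Geometric-mean probability on the true label implied by an average loss."""
    return math.exp(-avg_loss)


# A confident-but-not-certain model on three training examples.
print(round(cross_entropy([0.9, 0.8, 0.99]), 4))  # 0.1129

# A loss of 0.0031 means about 99.7% probability on the correct label,
# averaged geometrically over the training set.
print(round(implied_confidence(0.0031), 4))  # 0.9969
```

Neither number touches held-out data: the loss is computed only over the examples the model was trained on, so it can reach this level whether or not the model has learned anything transferable.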

The validation loss tells a different story — but only if you know how to read the divergence. A gap of 0.1 between training and validation at epoch 20 means something entirely different from the same gap at epoch 200 on the same architecture. Context is everything, and the curve alone has none.
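One way to give the curve some of that missing context is to ask two extra questions at the end of training: where did validation loss bottom out, and which way is it heading now? A minimal sketch, assuming you log one training and one validation loss per epoch. The function names, the five-epoch window, and the loss values in the example are all made up for illustration.

```python
def least_squares_slope(values):
    """Slope of a straight-line fit of values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def read_the_gap(train_losses, val_losses, window=5):
    """Summarize the train/validation gap with the context the raw gap lacks."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return {
        "gap": val_losses[-1] - train_losses[-1],
        # Negative: validation is still improving. Positive: it has turned upward.
        "val_trend": least_squares_slope(val_losses[-window:]),
        "best_epoch": best_epoch,
        "epochs_since_best": len(val_losses) - 1 - best_epoch,
    }


# Two toy runs that both finish with a gap of about 0.1.
still_learning = read_the_gap([1.0, 0.8, 0.6, 0.5, 0.4], [1.1, 0.9, 0.7, 0.6, 0.5])
memorizing = read_the_gap([0.3, 0.2, 0.1, 0.05, 0.02], [0.10, 0.09, 0.10, 0.11, 0.12])
print(still_learning)
print(memorizing)
```

Same gap, different story: in the first toy run validation loss is still falling alongside training loss, while in the second it bottomed out three epochs ago and is climbing.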

Applied ML · 12 min read · Jan 30, 2026

The Embeddings That Actually Work in Production

Three months, four embedding models, one honest retrospective on what shipped and what got quietly shelved.

We started with OpenAI's text-embedding-ada-002. Of course we did. It was February, we had a deadline, and the benchmark numbers looked fine. Three months later, we were migrating away from it at 3 a.m. on a Saturday. This is the story of what we learned — not the cleaned-up version.

The first thing nobody tells you about embedding models in production is that "similarity" is not a single thing. Cosine similarity between two product descriptions behaves completely differently from cosine similarity between two support tickets, even if both are sentences about the same domain. The geometry of the embedding space is shaped by the training data, and if your data distribution doesn't match the pretraining corpus, the distances mean something you didn't intend.
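A fixed cutoff like "cosine above 0.8 means related" quietly assumes one geometry for every kind of text. Here is a minimal sketch of one alternative, assuming you can embed a sample from each domain: compute the pairwise cosine similarities within each domain and pick that domain's threshold from its own distribution. The toy vectors stand in for real embeddings, and the 95th-percentile rule and function names are illustrative assumptions, not the approach described in this retrospective.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pairwise_similarities(vectors):
    """Cosine similarity for every unordered pair of distinct vectors."""
    return [
        cosine_similarity(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def per_domain_thresholds(vectors_by_domain, q=95):
    """For each domain, the score that only about (100 - q)% of arbitrary pairs exceed."""
    return {
        domain: percentile(pairwise_similarities(vectors), q)
        for domain, vectors in vectors_by_domain.items()
    }


# Toy vectors standing in for real embeddings (not model output).
rng = random.Random(0)
DIM = 32
shared = [rng.gauss(0, 1) for _ in range(DIM)]
toy_domains = {
    # Templated text: every vector sits near one shared direction.
    "product_descriptions": [[s + rng.gauss(0, 0.1) for s in shared] for _ in range(20)],
    # Varied text: vectors point in unrelated directions.
    "support_tickets": [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(20)],
}
print(per_domain_thresholds(toy_domains))
```

In this toy setup, almost any two product descriptions score near 1.0 while unrelated tickets sit far lower, so a single global cutoff would call every product pair "related." Calibrating per domain measures each score against its own baseline.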

“The model that won the benchmark lost in production. The model that lost the benchmark won in production. We spent two weeks figuring out why.”

We ran a structured evaluation across four models: ada-002, e5-large-v2, bge-m3, and a fine-tuned version of nomic-embed-text. The evaluation had three components: offline retrieval quality on our labeled dataset, latency under realistic load, and stability of cluster structure across a two-week rolling window.
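Two of those three components are easy to pin down in code. A minimal sketch, assuming a labeled set that maps each query to its relevant document IDs: recall@k for offline retrieval quality, and, as a stand-in for "stability of cluster structure" (the retrospective does not say which metric the team used), the Jaccard overlap of each item's top-k neighbors between two snapshots of the rolling window. All function names and IDs are hypothetical; latency under load depends on the serving stack and is left out.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of a query's relevant documents that appear in its top-k results."""
    if not relevant_ids:
        raise ValueError("query has no labeled relevant documents")
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(results, labels, k=10):
    """Average recall@k over every labeled query (query_id -> list of doc ids)."""
    return sum(recall_at_k(results.get(q, []), labels[q], k) for q in labels) / len(labels)


def neighbor_overlap(before, after):
    """Mean Jaccard overlap of each item's top-k neighbor set across two snapshots.

    A proxy for cluster stability: 1.0 means every item kept exactly the
    same neighbors; values near 0 mean the neighborhood structure reshuffled.
    """
    scores = []
    for item, old_neighbors in before.items():
        old, new = set(old_neighbors), set(after.get(item, []))
        union = old | new
        scores.append(len(old & new) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Made-up IDs, purely to show the shapes of the inputs.
labels = {"q1": ["d1", "d4"], "q2": ["d7"]}
results = {"q1": ["d4", "d2", "d9", "d1"], "q2": ["d3", "d5", "d8"]}
print(mean_recall_at_k(results, labels, k=3))  # 0.25: q1 finds 1 of 2, q2 finds 0 of 1

week_1 = {"a": ["b", "c"], "b": ["a", "c"]}
week_2 = {"a": ["b", "d"], "b": ["a", "c"]}
print(round(neighbor_overlap(week_1, week_2), 3))  # 0.667
```

Running the same two functions against each candidate model's outputs gives numbers that can be compared side by side, which is the point of fixing the labeled set and the snapshot schedule before evaluating.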



One clear explanation.
No noise.
