Codex — Machine Learning & AI Writing

“We write so the machines make more sense and the humans feel less lost.”

— C.X.

§ I — Philosophy

Clarity is not simplification. It is respect.

The AI field has a documentation problem. Not a shortage — an excess of words that perform expertise without transferring it. Papers written to impress reviewers. Blog posts optimized for search engines. Tutorials that assume you already understand what they're teaching.

Codex is a corrective. Every piece starts with a question someone actually asked at 2 a.m., debugger open, hope fading.

§ II — Mission
“Making AI legible to the people building with it — not despite the math, but through it.”

We write for the junior engineer who suspects the loss curve means something but can't name it. For the senior architect who wants a second opinion framed in honest language. For the PM who needs to explain to the board why the model isn't wrong, it's just uncertain.

47 long-form pieces
12k Thursday readers
0 sponsored posts
§ III — From the Notebook
[Image: graph paper with a hand-drawn loss curve diagram showing training vs. validation divergence]
Fundamentals · 9 min

Why Your Loss Curve Lies to You

The training loss bottomed out at 0.003. The model was useless. Here is what the curve was actually saying.

[Image: handwritten mathematical notation on notebook paper showing attention mechanism equations]
Deep Dives · 18 min

Attention Is All You Need — But Not Why You Think

Dismantling the transformer paper one equation at a time, with the parts the blog posts skip.

[Image: scatter plot of embedding clusters on graph paper with colored annotations]
Applied ML · 12 min

The Embeddings That Actually Work in Production

Three months, four embedding models, one honest retrospective on what shipped and what got shelved.

§ IV — Keep Going

The notebook continues below.

Scroll down to find your reading path, read two full pieces inline, and subscribe if you want one clear explanation every Thursday.

Choose your entry point

Three Reading Tracks.

Pick the path that matches where you are. Each track is a curated sequence, not a random list.

01 · 3 pieces

Fundamentals

For the engineer who wants to understand, not just implement.

01 Why Your Loss Curve Lies to You · 9 min
02 Gradient Descent Is Not Optimization · 11 min
03 What Regularization Is Actually Doing · 8 min
02 · 3 pieces

Deep Dives

Architecture breakdowns for engineers who read the paper and still had questions.

01 Attention Is All You Need — But Not Why You Think · 18 min
02 The RLHF Papers Nobody Cites Correctly · 22 min
03 Diffusion Models From First Principles · 25 min
03 · 3 pieces

Applied ML

Production lessons from models that shipped — and ones that didn't.

01 The Embeddings That Actually Work in Production · 12 min
02 Serving 10M Predictions a Day on $200/Month · 14 min
03 When to Retrain vs. Fine-Tune · 10 min
Proof before commitment

Read before you subscribe.

Fundamentals · 9 min read · Feb 6, 2026

Why Your Loss Curve Lies to You

A forensic examination of what training metrics actually encode — and the three failure modes nobody puts in their tutorial.

It was 2:17 a.m. when Priya messaged the team Slack. The model had been training for six hours. Loss: 0.0031. She typed: "I think we're done?" The next morning, the model predicted "cat" for every image in the validation set. Loss curves had lied to her: politely, consistently, and with complete statistical confidence.

The problem isn't the metric. Cross-entropy loss is a perfectly reasonable objective. The problem is the implicit assumption baked into how we read it: that lower is better in a way that generalizes. It doesn't, and understanding why requires a brief detour through what the loss function is actually optimizing.

“The training loss bottomed out at 0.003. The model was useless. Here is what the curve was actually saying.”

Cross-entropy measures the average negative log-likelihood of the true labels under your model's predicted distribution. As it approaches zero, your model is assigning probability close to 1 to every correct label in the training set. This is not generalization. This is memorization wearing a lab coat.
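To make that definition concrete, here is a minimal sketch in plain Python. The function name `cross_entropy` and the toy probabilities are illustrative assumptions, not taken from Priya's training run. It only shows that the loss shrinks toward zero as the probability on each correct training label approaches 1.

```python
import math

def cross_entropy(probs, labels):
    """Average negative log-likelihood of the true labels.

    probs:  list of per-example probability lists (each sums to 1).
    labels: list of integer class indices, one per example.
    """
    eps = 1e-12  # guard against log(0) for a zero-probability label
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))
    return total / len(labels)

# Toy data, made up for illustration: two classes, three training examples.
labels = [0, 1, 0]
confident = [[0.999, 0.001], [0.001, 0.999], [0.999, 0.001]]
hedged = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4]]

loss_confident = cross_entropy(confident, labels)  # close to zero
loss_hedged = cross_entropy(hedged, labels)        # much larger

print(f"confident: {loss_confident:.4f}  hedged: {loss_hedged:.4f}")
```

Both toy models classify every training example correctly. The loss only rewards the first for being more certain about labels it has already seen, which is exactly the quantity that says nothing about unseen data.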

The validation loss tells a different story — but only if you know how to read the divergence. A gap of 0.1 between training and validation at epoch 20 means something entirely different from the same gap at epoch 200 on the same architecture. Context is everything, and the curve alone has none.
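One way to give the gap some of that missing context is to report it alongside where validation loss bottomed out. The sketch below is a hedged illustration: the helper names and the per-epoch numbers are invented for the example, and "epochs past the best validation loss" is just one possible piece of context, not a rule from this piece.

```python
def generalization_gap(train_losses, val_losses):
    """Per-epoch gap: validation loss minus training loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]

def best_val_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

# Hypothetical per-epoch losses, made up for illustration.
train = [0.90, 0.50, 0.30, 0.15, 0.05, 0.01]
val = [0.95, 0.60, 0.45, 0.40, 0.48, 0.60]

gaps = generalization_gap(train, val)
best = best_val_epoch(val)
epochs_past_best = len(val) - 1 - best

for epoch, (gap, v) in enumerate(zip(gaps, val)):
    marker = " <- best validation loss" if epoch == best else ""
    print(f"epoch {epoch}: gap={gap:.2f} val={v:.2f}{marker}")
```

In this toy run the training loss keeps falling after epoch 3 while validation loss climbs, so the widening gap after that point reflects divergence rather than progress. The same gap value earlier in training, with validation loss still falling, would read very differently.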

Applied ML · 12 min read · Jan 30, 2026

The Embeddings That Actually Work in Production

Three months, four embedding models, one honest retrospective on what shipped and what got quietly shelved.

We started with OpenAI's text-embedding-ada-002. Of course we did. It was February, we had a deadline, and the benchmark numbers looked fine. Three months later, we were migrating away from it at 3 a.m. on a Saturday. This is the story of what we learned — not the cleaned-up version.

The first thing nobody tells you about embedding models in production is that "similarity" is not a single thing. Cosine similarity between two product descriptions behaves completely differently from cosine similarity between two support tickets, even if both are sentences about the same domain. The geometry of the embedding space is shaped by the training data, and if your data distribution doesn't match the pretraining corpus, the distances mean something you didn't intend.
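A small sketch of why a raw cosine score is hard to interpret on its own. The vectors are toy values, not output from any of the embedding models discussed here, and the "shared component" is an assumed stand-in for whatever structure a pretraining corpus imposes on the space.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Two toy vectors that differ only in their last two coordinates.
centered_a = [0.0, 1.0, 0.0]
centered_b = [0.0, 0.0, 1.0]

# The same two vectors with a large component shared by every point.
shifted_a = [10.0, 1.0, 0.0]
shifted_b = [10.0, 0.0, 1.0]

sim_centered = cosine_similarity(centered_a, centered_b)  # orthogonal
sim_shifted = cosine_similarity(shifted_a, shifted_b)     # nearly parallel

print(f"centered: {sim_centered:.3f}  shifted: {sim_shifted:.3f}")
```

The difference between the two items is identical in both pairs, yet one scores as unrelated and the other as near-duplicates. A threshold tuned on one kind of text can therefore mean something quite different on another.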

“The model that won the benchmark lost in production. The model that lost the benchmark won in production. We spent two weeks figuring out why.”

We ran a structured evaluation across four models: ada-002, e5-large-v2, bge-m3, and a fine-tuned version of nomic-embed-text. The evaluation had three components: offline retrieval quality on our labeled dataset, latency under realistic load, and stability of cluster structure across a two-week rolling window.
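The excerpt does not say which metric was used for offline retrieval quality. As an assumption, the sketch below uses recall@k, one common choice for a labeled retrieval dataset. The function names, document IDs, and queries are all hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("need at least one relevant document per query")
    hits = set(ranked_ids[:k]) & relevant
    return len(hits) / len(relevant)

def mean_recall_at_k(rankings, labels, k):
    """Average recall@k over queries.

    rankings: dict of query -> list of doc IDs, best match first.
    labels:   dict of query -> collection of relevant doc IDs.
    """
    scores = [recall_at_k(rankings[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores)

# Hypothetical labeled queries and one model's rankings for them.
labels = {
    "refund policy": ["d1", "d2"],
    "reset password": ["d5"],
}
rankings = {
    "refund policy": ["d3", "d1", "d7", "d2"],
    "reset password": ["d5", "d4", "d9", "d8"],
}

score_at_3 = mean_recall_at_k(rankings, labels, k=3)
score_at_4 = mean_recall_at_k(rankings, labels, k=4)
print(f"recall@3={score_at_3:.2f}  recall@4={score_at_4:.2f}")
```

Running the same labeled queries through each candidate model gives one comparable number per model. It covers only the first of the three evaluation components; latency and cluster stability need their own measurements.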

Every Thursday

One clear explanation.
No noise.

Every Thursday, one piece. Long enough to matter, short enough to finish. 12,000 engineers and PMs read it. No sponsors, no affiliate links, no "quick wins."

No spam. Unsubscribe in one click. Read by engineers at Anthropic, Hugging Face, and Cohere.

12k subscribers
47 long-form pieces
4.1★ avg. reader rating