Starting a 30-day caffeine experiment — the setup post, before any data

This post has no findings in it. I want to say that in the first sentence, because the honest version of an n=1 experiment starts before the data does, and most habit-and-health writing skips straight to the conclusion it wanted anyway.

Here is what this actually is. Starting this week, I am logging every dose of caffeine I take, a focus rating at the end of each deep-work block, and a mood rating twice a day. I am doing it for at least 30 days. In about a month, there will be a follow-up post with whatever the data shows — including, very possibly, "nothing clear." This post is the setup: what I'm measuring, how, why, and what I already know is wrong with the method before I've collected a single useful day.

A note on the site first, the same note Ravi wrote on the finance side and Jay wrote on the engineering side, because it would be strange not to. spanpost used to publish general health-and-wellness content of the kind that gets generated quickly and optimized for search. That's done. What this site is now is narrower: documented n=1 experiments on ordinary habits — caffeine, focus, sleep timing — written by someone with no medical credentials, logging his own behavior. My name is Daniel. The disclaimer at the top of every post is the literal truth and not a formality: this is one person experimenting on himself, and it is not medical advice.

What I'm logging

Three things, each as a timestamped entry sent to a small local database. Nothing fancy — a tap on my phone, a number, an optional note.

What	Value	When
Caffeine	dose in mg	every coffee, tea, or energy drink
Focus	1–10 rating	end of each deep-work block
Mood	1–10 rating	once mid-morning, once evening
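
For the curious, here is roughly what one entry could look like in code. This is a minimal sketch in Python with SQLite, not the actual setup: the table name, the column names, and the `open_log` and `log_entry` helpers are all made up for illustration. It only shows the shape described above: a timestamp, a kind, a number, and an optional note.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: table and column names are illustrative assumptions.
# value holds mg for caffeine entries and a 1 to 10 rating for focus and mood.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id        INTEGER PRIMARY KEY,
    logged_at TEXT NOT NULL,
    kind      TEXT NOT NULL CHECK (kind IN ('caffeine', 'focus', 'mood')),
    value     REAL NOT NULL,
    note      TEXT
)
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_entry(conn, kind, value, note=None, when=None):
    """Store one timestamped entry; reject values outside the stated ranges."""
    if kind in ("focus", "mood") and not 1 <= value <= 10:
        raise ValueError("ratings must be between 1 and 10")
    if kind == "caffeine" and value <= 0:
        raise ValueError("caffeine dose must be a positive number of mg")
    when = when or datetime.now()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO entries (logged_at, kind, value, note) VALUES (?, ?, ?, ?)",
            (when.isoformat(timespec="seconds"), kind, value, note),
        )
```

Logging a coffee would then be something like `log_entry(conn, "caffeine", 200)`, and a focus rating `log_entry(conn, "focus", 7, note="writing")`.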

For caffeine I'm using standard estimates rather than pretending I can measure my own cup: an espresso shot at roughly 80 mg, an americano around 200 mg, drip coffee near 130 mg, green tea about 30 mg. These are approximations and I'll treat them as approximations. The point isn't milligram precision; it's whether a day with 400 mg before noon looks different from a day with 150 mg spread out.
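
As a sketch of how those estimates could turn into a per-day number, the snippet below keeps the four figures from this post in a lookup table and splits a day's intake into "before noon" and "total". The drink keys and the `caffeine_before_noon` helper are hypothetical names, and the mg values are the same rough approximations as above.

```python
from datetime import time

# Per-drink estimates quoted in this post, in mg. Approximations, not measurements.
ESTIMATED_MG = {
    "espresso": 80,
    "americano": 200,
    "drip": 130,
    "green_tea": 30,
}


def caffeine_before_noon(drinks):
    """drinks: iterable of (time_of_day, drink_name) pairs for a single day.

    Returns (mg taken before 12:00, mg taken over the whole day).
    """
    drinks = list(drinks)
    noon = time(12, 0)
    before = sum(ESTIMATED_MG[name] for t, name in drinks if t < noon)
    total = sum(ESTIMATED_MG[name] for _, name in drinks)
    return before, total
```

A day with two americanos in the morning and a green tea at 3 pm would come out as 400 mg before noon and 430 mg overall.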

The focus and mood numbers are 1–10 self-ratings, which are exactly as soft as they sound (more on that below). The target is roughly 7 to 12 entries a day. Over 30 days that's somewhere between 210 and 360 data points, which is enough to see a pattern if a pattern exists and enough to see clearly that one doesn't if it doesn't.

The hypothesis, held loosely

I have one, and I'm writing it down now specifically so I can't quietly revise it later to match the result.

My loose prior: caffeine timing affects my focus more than caffeine total does. I suspect that 200 mg at 9:30 in the morning produces a better afternoon than the same 200 mg sipped from 9:30 to 2:00, and that both beat 300 mg taken after noon. I also suspect the effect on mood is smaller and noisier than the effect on focus.

That's it. That's the whole hypothesis. Notice it does not say "caffeine improves focus" — that's not interesting and it's not really in doubt. It says something narrower and falsifiable about timing versus dose, and after 30 days the data will either lean that way or it won't.
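
One way the timing-versus-dose question could be put to the numbers, sketched under assumptions: summarize each day by its total caffeine and by its dose-weighted average time of intake, then see which of the two tracks the afternoon focus rating more closely. The function names and the choice of a plain Pearson correlation are illustrative, not a commitment to how the follow-up will analyze anything, and with 30 days from one person a correlation is a lean, not a proof.

```python
from math import sqrt


def day_features(doses):
    """doses: list of (hour_as_float, mg) for one day, e.g. (9.5, 200) for 9:30.

    Returns (total mg, dose-weighted mean hour of intake).
    """
    total = sum(mg for _, mg in doses)
    if total == 0:
        return 0, None  # no caffeine that day, so no meaningful timing
    mean_hour = sum(hour * mg for hour, mg in doses) / total
    return total, mean_hour


def pearson(xs, ys):
    """Plain Pearson correlation; returns None if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)
```

If the hypothesis holds, the correlation between mean intake hour and afternoon focus should be noticeably stronger (and negative) than the correlation between total mg and afternoon focus.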

The logging has to be nearly frictionless or it won't survive week two

I want to be specific about the mechanics, because the single most common way a self-tracking experiment dies is not loss of interest — it's friction. If logging a coffee takes 40 seconds and four taps, I will skip it when I'm busy, and the days I'm busy are exactly the days worth capturing.

So the setup is deliberately minimal. Each entry is one tap from my phone's home screen, a number, and an optional short note — it resolves to a single request against a small database running on my own machine. Caffeine entry: tap, type 200, done. Focus entry: tap, type 7, done. The whole thing is built to take under five seconds, because a five-second log gets done 300 times and a 40-second log gets done 30 times and then abandoned.

There's a measurable claim hiding in that paragraph, and it's one the data itself will test: I expect my logging completeness to be highest in week one and to sag somewhere in week two or three. If the follow-up post shows a day with two entries instead of ten, that's not missing data to apologize for — it's a real signal about how sustainable the method is. A protocol that only works when I'm not busy isn't a protocol. It's a hobby.
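
The completeness claim is easy to check mechanically. The sketch below counts entries per day, averages them by week, and lists the days that fell short. The helper names are hypothetical, and the default cutoff of 7 entries is an assumption borrowed from the low end of the daily target above.

```python
from collections import Counter
from datetime import date, timedelta


def entries_per_day(entry_dates, start, days=30):
    """entry_dates: one date per logged entry. Returns a count for each day."""
    counts = Counter(entry_dates)
    return [counts.get(start + timedelta(days=i), 0) for i in range(days)]


def weekly_means(per_day):
    """Average entries per day for each 7-day chunk (the last may be shorter)."""
    weeks = [per_day[i:i + 7] for i in range(0, len(per_day), 7)]
    return [sum(week) / len(week) for week in weeks]


def thin_days(per_day, minimum=7):
    """1-based day numbers with fewer than `minimum` entries."""
    return [i + 1 for i, n in enumerate(per_day) if n < minimum]
```

A sag in `weekly_means` from the first value to the second or third is the week-two drop predicted above, and `thin_days` is the honest count of days too sparse to lean on.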

What's wrong with this method — stated up front

This is the section most habit posts leave out, so it gets written before the data instead of after.

n=1 is weak evidence. One person is not a sample. Anything I find is a fact about Daniel in a particular month, not a fact about people. A 2018 review of n-of-1 trial methodology made the case that single-subject designs can be rigorous for individual decisions — but the same literature is clear that they don't generalize, and that the practical limitation is exactly the one I'll be living with: the subject and the experimenter are the same person.

Self-reported ratings are contaminated. When I rate my focus a 7, I know how much coffee I had. That knowledge leaks into the rating. If I expect caffeine to help, I'll tend to score the post-coffee block higher regardless of what actually happened. There is no clean fix for this in a solo setup. The honest move is to name the bias, not to claim I've eliminated it.

Confounders are everywhere. Sleep, what I ate, whether the work itself was interesting, whether it was raining. I'm logging caffeine, focus, and mood — not those. So when a pattern shows up, "caffeine timing" will only ever be one candidate explanation among several, and I'll say so.

A month is short. Thirty days catches day-to-day variation. It does not catch tolerance changes, seasonal shifts, or anything slow. The follow-up post will be a snapshot, not a verdict.

I'm listing all of this not to talk myself out of the experiment but because a result is only worth reading if you can see the machinery that produced it. If the follow-up post says "the data leaned toward the timing hypothesis," you should be able to hold that next to these four weaknesses and weight it accordingly — somewhere well below "a study found" and well above "a guy on the internet has a feeling."

Why do it at all, then

Fair question, given that list. Three reasons.

The first is that an individual decision genuinely is the n=1 case. I'm not trying to publish a finding about humans. I'm trying to answer "should I move my coffee earlier," and for that question, data about me is the relevant data. The headline weakness, that the result doesn't generalize, isn't a weakness for the actual decision.

The second is that logging changes behavior, and I want to see by how much. There's a well-documented measurement-reactivity effect: people who track a behavior tend to shift it just by tracking. If my caffeine intake drops in week one purely because I'm counting it, that itself is worth knowing, and it's the kind of thing the data will show plainly.

The third is the reason this is a post and not a private spreadsheet. The whole point of this version of the site is that the method is visible. You can see the hypothesis before the result. In a month you'll be able to check whether the conclusion I draw actually matches the numbers, or whether I quietly bent one to fit the other. That check is only possible because this post exists first.

What happens next

I start logging this week. The follow-up (call it the 30-day post) will include the actual data: a chart of caffeine by day, focus ratings against caffeine timing, and mood plotted underneath, which I expect to be the flatter line. It will report what the numbers leaned toward, the size of that lean, and an honest count of how many days had enough entries to mean anything.

If the answer is "the timing hypothesis didn't hold up," that post will say so in its first sentence, the same way this one led with "no findings." A logging experiment that can only confirm what the logger expected isn't an experiment. It's a diary with a chart attached.

Thirty days from now we'll know which one this was.

— Daniel