Building a Dataset to Track My Own Gym Performance

I go to the gym every day. I also spend an embarrassing amount of time looking at charts. At some point the obvious thing to do was combine these two habits.

The question I kept coming back to: how much does any of this actually matter? Not in the abstract — I know protein matters, I know sleep matters — but specifically, for me, how much does skipping creatine for a week affect how much I can lift? Does eating too little the day before show up in my reps? Does it being 35°C outside change anything? I wanted numbers, not conventional wisdom. As the saying goes: what gets measured gets improved.

So I started building a dataset.

The problem with being consistent

The ideal experiment would be perfectly controlled: same diet every day, same supplement timing, same sleep, same everything — then vary one thing at a time. That’s not how I live.

I forget to take creatine for a week. I eat way too much one day and almost nothing the next. I occasionally skip pre-workout. My mood varies. Life varies. If I were trying to publish research, this would be a problem.

But for a personal dataset, the inconsistency is actually useful. If I were perfectly consistent, every row in the dataset would look nearly identical and I’d have nothing to regress on. The variation in my real habits creates the signal. Skipping creatine for seven days then coming back gives me a natural before-and-after. Eating at a deficit two days in a row shows up against baseline performance. The messiness is the data.

What I’m tracking

Gym performance

The core of the dataset is simple: exercise, weight, sets, reps, and a subjective effort rating (1–10) for each working set. I log every session. Over time this gives me a strength curve per movement — I can see not just whether I’m progressing, but whether a given session was above or below my recent baseline.
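A minimal sketch of how a "recent baseline" per movement could be computed. The Epley one-rep-max estimate, the five-session window, and the function names are assumptions for illustration; the post does not say which metric or window is actually used.

```python
from collections import defaultdict, deque
from statistics import mean


def epley_1rm(weight_kg, reps):
    """Epley estimate of a one-rep max (an assumed metric, not the post's)."""
    return weight_kg * (1 + reps / 30)


def session_vs_baseline(top_sets, window=5):
    """Compare each top set with the mean of the previous `window` scores.

    top_sets: iterable of (date, exercise, weight_kg, reps), sorted by date.
    Returns a list of (date, exercise, score, baseline, delta); baseline and
    delta are None until the exercise has at least one earlier entry.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    results = []
    for day, exercise, weight_kg, reps in top_sets:
        score = epley_1rm(weight_kg, reps)
        past = history[exercise]
        baseline = mean(past) if past else None
        delta = None if baseline is None else score - baseline
        results.append((day, exercise, score, baseline, delta))
        past.append(score)  # only after comparing, so a set never scores itself
    return results


# Made-up rows for illustration only.
rows = [
    ("2024-01-01", "squat", 100.0, 5),
    ("2024-01-03", "squat", 100.0, 6),
    ("2024-01-05", "squat", 102.5, 3),
]
result = session_vs_baseline(rows)
```

A positive delta means the session landed above the recent baseline for that lift, a negative one below it.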

I track the main compound lifts and a few accessories I care about. I’m not logging every warm-up set and cable pull-down. The goal is signal, not completeness.

Diet

I use FatSecret for dietary logging. It’s not perfect — estimating portions is imprecise, and I’m not weighing every meal — but it gives me reasonable ballpark numbers for:

Total calories
Protein, carbs, fat in grams
Rough meal timing (did I eat close to the session or hours before?)

I don’t need precision here. I need enough resolution to distinguish “ate well” from “barely ate” from “ate a lot of the wrong things.” Even rough numbers are enough for that.

Supplements

This is where the inconsistency I mentioned becomes genuinely interesting. My supplement stack is basic: creatine monohydrate, glutamine, pre-workout on some sessions. I log whether I took each one, and if so, roughly when.

The honest truth is I don’t fully understand the mechanisms behind creatine and glutamine at a biochemical level. I know the high-level claims — creatine replenishes phosphocreatine faster, glutamine may help with recovery — but I don’t know the specific timelines, saturation curves, or how they interact with each other. That’s fine. I don’t need to model the mechanism; I just need to see whether the presence or absence of each supplement correlates with performance differences in the data.

When I’ve skipped creatine for a week or two, I have a window into what my numbers look like without it. When I reload, I have the before and after. The dataset can surface the correlation even if I can’t explain the mechanism.
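The simplest version of that comparison is a difference of means between rows with and without a supplement. This is a sketch under assumptions: `top_set_kg` is a hypothetical per-session metric, the rows are invented, and a raw difference like this says nothing about confounders.

```python
from statistics import mean


def mean_by_flag(rows, flag, metric):
    """Split rows on a boolean column and return (mean_with, mean_without)."""
    with_flag = [r[metric] for r in rows if r[flag]]
    without_flag = [r[metric] for r in rows if not r[flag]]
    if not with_flag or not without_flag:
        raise ValueError(f"need rows both with and without {flag!r}")
    return mean(with_flag), mean(without_flag)


# Made-up rows for illustration only; not the author's data.
rows = [
    {"creatine": True, "top_set_kg": 100.0},
    {"creatine": True, "top_set_kg": 102.5},
    {"creatine": False, "top_set_kg": 97.5},
    {"creatine": False, "top_set_kg": 100.0},
]
on, off = mean_by_flag(rows, "creatine", "top_set_kg")
difference = on - off
```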

External variables

A few things I add to each session row that I wouldn’t have thought to track initially:

Temperature. I train indoors, but the gym has almost no air conditioning — so in summer the place is genuinely hot and humid, and in winter it gets uncomfortably cold and damp instead. The indoor temperature tracks the outside season closely enough that logging the day’s ambient temperature is a reasonable proxy. I also log the season itself as a categorical variable, which turned out to be interesting in its own right: it lets me ask not just “was it hot today” but “did I perform better or worse across entire cold-weather months versus hot-weather months” — a coarser but sometimes cleaner signal than a single day’s temperature reading.
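If the season is not typed in by hand, it can also be derived from the session date. A small sketch, assuming meteorological seasons by calendar month; the post does not say which hemisphere applies, so both mappings are shown and the northern default is only a placeholder.

```python
from datetime import date

# Assumed month-to-season mapping (meteorological seasons, northern hemisphere).
NORTHERN = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# The southern hemisphere is the same calendar with the seasons swapped.
_OPPOSITE = {"winter": "summer", "summer": "winter",
             "spring": "autumn", "autumn": "spring"}
SOUTHERN = {month: _OPPOSITE[season] for month, season in NORTHERN.items()}


def season_for(day, mapping=NORTHERN):
    """Return the categorical season label for a datetime.date."""
    return mapping[day.month]


label = season_for(date(2024, 7, 15))
```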

Mood. A simple 1–5 scale logged before the session. Subjective, yes, but surprisingly predictive. There’s a real correlation in my data between pre-session mood and how well the session goes — which sounds obvious in retrospect but is easy to dismiss before you’ve seen it.

Sleep. Hours of sleep the previous night. Not tracked perfectly — I don’t wear a tracker — but a rough estimate is better than nothing.

Rest days. Whether the previous day was a rest day or a training day. Accumulated fatigue is a real variable.

The dataset structure

Each row is a single working set:

date | exercise | weight_kg | reps | sets | effort_rating
calories | protein_g | carbs_g | fat_g | pre_session_meal_hours
creatine | glutamine | preworkout
sleep_hours | mood | temp_celsius | rest_day_prior

I keep it in a flat CSV. Nothing fancy. One row per set means the dataset grows quickly — a typical session generates 15–25 rows — but it makes per-exercise analysis straightforward without any joins.
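A sketch of reading such a file with Python's standard csv module and grouping it per exercise. It assumes a header row using the column names above and a "1"/"0" encoding for the supplement flags, neither of which the post specifies; only a subset of columns is parsed to keep the example short.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SetRow:
    date: str
    exercise: str
    weight_kg: float
    reps: int
    effort_rating: int
    creatine: bool


def load_sets(handle):
    """Read one SetRow per CSV line, converting text to typed values."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append(SetRow(
            date=raw["date"],
            exercise=raw["exercise"],
            weight_kg=float(raw["weight_kg"]),
            reps=int(raw["reps"]),
            effort_rating=int(raw["effort_rating"]),
            # Assumed encoding: "1" means taken, anything else means not taken.
            creatine=raw["creatine"].strip() == "1",
        ))
    return rows


def group_by_exercise(rows):
    """Per-exercise view of the flat file; no joins needed."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.exercise].append(row)
    return dict(grouped)


# Made-up sample in the assumed layout; a real file would carry every column.
SAMPLE = """date,exercise,weight_kg,reps,effort_rating,creatine
2024-01-01,squat,100,5,8,1
2024-01-01,bench,70,8,7,1
2024-01-03,squat,102.5,3,9,0
"""

sets = load_sets(io.StringIO(SAMPLE))
by_exercise = group_by_exercise(sets)
```

With a real file, `load_sets(open("gym_log.csv", newline=""))` would take the place of the in-memory sample; the filename here is hypothetical.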

What I’ve found so far

A few patterns that have emerged:

The creatine deload and reload are both visible. When I’ve taken a week off creatine and come back, there’s a dip in max output during the off period that recovers over about a week after resuming. The effect is real in my data, though the magnitude is modest.

Calories the day before matter more than calories on the day. I expected same-day nutrition to be the dominant signal. It’s not. What I ate 24 hours prior has a stronger correlation with performance than what I ate that morning.
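One way to check a lag like this is to correlate performance with same-day calories and with the previous day's calories, then compare the two coefficients. The sketch below assumes calories and a performance score have already been aggregated to one value per day; the names are hypothetical and the sample numbers are constructed so that the lag is visible, not taken from the real log.

```python
from datetime import date, timedelta
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def calorie_correlations(calories_by_day, score_by_day):
    """Return (same_day_r, previous_day_r) for calories vs. performance."""
    same_x, same_y, prev_x, prev_y = [], [], [], []
    for day, score in score_by_day.items():
        if day in calories_by_day:
            same_x.append(calories_by_day[day])
            same_y.append(score)
        prior = day - timedelta(days=1)
        if prior in calories_by_day:
            prev_x.append(calories_by_day[prior])
            prev_y.append(score)
    return pearson(same_x, same_y), pearson(prev_x, prev_y)


# Constructed example: each score follows the previous day's calories exactly.
start = date(2024, 1, 1)
calories = [2000, 2500, 1800, 2600, 2200, 2400]
calories_by_day = {start + timedelta(days=i): c for i, c in enumerate(calories)}
score_by_day = {start + timedelta(days=i): calories[i - 1] / 20
                for i in range(1, len(calories))}
same_r, prev_r = calorie_correlations(calories_by_day, score_by_day)
```

On real data the two coefficients would both be noisy, so the gap between them matters more than either value alone.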

Mood is the strongest single-session predictor. More than sleep, more than diet, more than supplements. A 2/5 mood session almost always underperforms a 4/5 mood session, controlling for everything else. I don’t know whether mood causes the bad session or whether I’m detecting pre-session fatigue through the mood rating, but the correlation is strong either way.

Temperature has a threshold effect. Performance is stable across a wide range but drops noticeably once the gym gets hot and humid — above roughly 30°C outside, I see a consistent dip. The cold months are a different story: there’s a modest performance uptick in winter compared to the middle of summer, though the cold and humidity bring their own issues (joints, warmup time). The most interesting finding is at the seasonal level: looking at monthly averages, my best numbers cluster in the cooler transition months — autumn and spring — when the gym is neither a sauna nor a cold box. It’s the kind of pattern that’s invisible when you’re just logging individual sessions but obvious once you group by season.
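A threshold like this tends to show up when sessions are bucketed into temperature bands rather than plotted day by day. A minimal sketch, assuming one (temperature, score) pair per session and 5-degree bands; the band width, the score, and the sample values are all illustrative.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_temp_band(sessions, width=5):
    """Average score per temperature band.

    sessions: iterable of (temp_celsius, score).
    Returns {band_start: mean_score}, ordered from coldest to hottest band,
    where a band covers band_start <= temp < band_start + width.
    """
    bands = defaultdict(list)
    for temp, score in sessions:
        band_start = int(temp // width) * width
        bands[band_start].append(score)
    return {start: mean(scores) for start, scores in sorted(bands.items())}


# Made-up sessions for illustration only.
sessions = [
    (18.0, 100.0),
    (22.0, 101.0),
    (24.0, 99.0),
    (31.0, 95.0),
    (33.0, 93.0),
]
by_band = mean_score_by_temp_band(sessions)
```

Swapping the temperature band for a season label gives the coarser seasonal view with the same grouping logic.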

What I’m not doing

I want to be clear about the limits here. This isn’t a controlled study. There’s no blinding, no control group, no randomization. Confounders are everywhere. I can identify correlations but I can’t establish causation.

I’m also not getting bloodwork done. I don’t know my actual creatine saturation levels or my inflammatory markers after hard sessions. A sports physiologist could tell me things this dataset can’t.

What I can do is build a personal baseline and notice deviations. If my numbers drop and I can’t explain it from the data, that’s a signal worth investigating. If I see a consistent pattern around a specific variable, I can test it deliberately — take creatine off for two weeks intentionally, log carefully, then reload and compare. The dataset makes that kind of self-experiment tractable.

Getting started

If you want to do something similar, the minimum viable version is:

Log every set: exercise, weight, reps, effort out of 10
Log calories and protein from a dietary app (FatSecret, Cronometer, MyFitnessPal — anything)
Note whether you took your key supplements that day
Log sleep hours and a mood rating before each session

That’s about nine fields in total, most of which take under a minute to enter. Give it three months and you’ll have enough data to start seeing patterns that would have been invisible otherwise.
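The minimum version above can be logged with a few lines of Python that append to a CSV file. This is a sketch, not a prescribed tool: the field names follow the list above, while `log_set` and the file path are hypothetical.

```python
import csv
import os

# Assumed column set for the minimum viable log.
FIELDS = ["date", "exercise", "weight_kg", "reps", "effort_rating",
          "calories", "protein_g", "creatine", "sleep_hours", "mood"]


def log_set(path, **values):
    """Append one set to the CSV at `path`, writing the header if needed.

    Fields that are not supplied are left blank, so per-day values such as
    calories can be filled in later.
    """
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(values)
```

A call such as `log_set("gym_log.csv", date="2024-01-01", exercise="squat", weight_kg=100, reps=5, effort_rating=8)` adds one row; rejecting unknown field names catches typos before they end up in the file.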

The gym is already a controlled environment compared to the rest of your life. You show up, you do measurable things, you track a number. Adding a few more columns to that log costs almost nothing and gives you a dataset nobody else has: your own physiology, in your own context, over time.

There’s something quietly absurd about being your own guinea pig. You’re the researcher, the subject, the IRB, and the one who has to live with the results. No ethics board approved my creatine deload. No lab validated my mood scale. My sample size is one, and that one person is deeply biased about his own performance.

But that’s also exactly what makes it useful. No study on “men aged 20–40” will ever tell you what your body does when you skip glutamine for two weeks and eat at a 400-calorie deficit the day before leg day. Only you can run that experiment, and only you have the motivation to actually care about the answer. The n=1 problem is also the n=1 advantage — you’re not averaging over a population, you’re building a precise model of one very specific system, your own body.

You also develop a kind of body literacy that’s hard to get any other way. After a few months of logging, you start noticing things before you even open a spreadsheet. You walk into the gym and think “I slept badly, I barely ate yesterday, it’s humid — this is going to be a rough one” — and you’re right, and you know why you’re right. That’s not mysticism, it’s pattern recognition built on data you collected yourself.

Worst case: you spent five minutes a day logging things and learned nothing. Best case: you become genuinely difficult to surprise by your own body. Either way, the gym sessions still happened. The data was free.