I wanted to get comfortable with the real questions that come up in experimentation:
I also wanted a project that could stand on its own as a portfolio piece: something you can open, run, and understand without needing a long explanation.
The plan
I planned the work as two connected projects, each with a distinct job:
This is the “toolbox.” It includes functions that take data from two groups (A and B) and return results in a consistent format.
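To make "a consistent format" concrete, here is a minimal sketch of what one toolbox function could look like. The names `ABResult` and `two_proportion_ztest` are hypothetical and not taken from the actual repo, and the choice of a two-proportion z-test for a conversion metric is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class ABResult:
    """Hypothetical result container; field names are illustrative only."""
    metric: str
    estimate_a: float
    estimate_b: float
    diff: float
    p_value: float
    ci_low: float
    ci_high: float


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for the difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the test (both groups share one rate under H0).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled if se_pooled > 0 else 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return ABResult(
        metric="conversion_rate",
        estimate_a=p_a,
        estimate_b=p_b,
        diff=diff,
        p_value=p_value,
        ci_low=diff - z_crit * se,
        ci_high=diff + z_crit * se,
    )


result = two_proportion_ztest(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(result)
```

Returning the same dataclass from every test function means downstream code (reports, simulations) can read `p_value` or `ci_low` without caring which metric type produced them.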
My goal here was simplicity:
This is the proof. It runs many simulated experiments and checks how the toolkit behaves.
This was the key idea:
It’s easy to write A/B testing code that seems correct, but it’s much harder to prove it behaves correctly across many situations. So instead of trusting formulas blindly, I used simulation to validate things like:
This second repo is what turned the project from “I implemented some statistics” into “I understand experimentation like a platform problem.”
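The simulation idea can be sketched in a few lines. This is a simplified stand-in, not the code from the validation repo: the function names, the sample sizes, and the choice of a two-proportion z-test are all assumptions. It runs many simulated experiments and measures how often the test rejects, first when both groups are identical (an A/A test, where the rejection rate should sit near `alpha`) and then when a real difference exists (where the rejection rate is the test's power).

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_rejection_rate(rate_a, rate_b, n_per_group, n_sims, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which the test rejects H0."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    rejections = 0
    for _ in range(n_sims):
        # Draw the number of conversions in each group as a sum of Bernoulli trials.
        conv_a = sum(rng.random() < rate_a for _ in range(n_per_group))
        conv_b = sum(rng.random() < rate_b for _ in range(n_per_group))
        if z_test_p_value(conv_a, n_per_group, conv_b, n_per_group) < alpha:
            rejections += 1
    return rejections / n_sims


# A/A setup: identical rates, so every rejection is a false positive.
false_positive_rate = simulate_rejection_rate(0.10, 0.10, n_per_group=500, n_sims=1000)
# A/B setup: a true lift from 10% to 15%, so the rejection rate estimates power.
power = simulate_rejection_rate(0.10, 0.15, n_per_group=500, n_sims=1000)

print(f"false positive rate: {false_positive_rate:.3f}")
print(f"power: {power:.3f}")
```

If the A/A rejection rate drifts well away from `alpha`, that points to a bug or a violated assumption in the test, which is exactly the kind of problem that reading the formula alone would not reveal.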
The Execution
I treated this like a mini product build rather than a notebook exercise.
Before writing much functionality, I set up:
It felt annoying at times, but it paid off. Once the workflow was in place, it became easy to make small improvements without breaking everything.
I implemented the toolkit gradually, focusing on what real experimentation teams use day-to-day. I started with simpler metric types and built toward harder ones:
Each time I added functionality, I added:
This was a big mindset shift for me. In real experimentation platforms, the job is not only to compute a p-value. You also need checks like:
So I added features that make the toolkit feel like a tiny experimentation platform rather than a stats demo.
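One check commonly used on experimentation platforms is a sample ratio mismatch (SRM) test, which asks whether the observed group sizes match the intended traffic split. Whether the toolkit includes this particular check is an assumption; the sketch below, with the hypothetical name `srm_check` and an assumed threshold of 0.001, only illustrates the kind of guardrail that sits around the p-value.

```python
import math


def srm_check(n_a, n_b, expected_share_a=0.5, threshold=0.001):
    """Chi-square goodness-of-fit test of observed group sizes vs. the planned split."""
    total = n_a + n_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = (n_a - expected_a) ** 2 / expected_a + (n_b - expected_b) ** 2 / expected_b
    # With two groups there is 1 degree of freedom, and the chi-square
    # survival function then reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"chi2": chi2, "p_value": p_value, "srm_detected": p_value < threshold}


print(srm_check(5000, 5000))  # balanced split
print(srm_check(5000, 5500))  # noticeably unbalanced split
```

A strict threshold is a deliberate choice in this sketch: a flagged SRM usually means the assignment or logging is broken, so the check should fire rarely and be taken seriously when it does.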
Once the toolkit existed, I built a separate repo that:
This gave me a clear way to validate assumptions and build confidence in the toolkit.
What I learned (the real takeaways)
The hardest part is everything around it:
Simulation gave me a way to sanity-check my own thinking.
It also made the project feel grounded: not just math, but “does this behave the way a platform needs it to behave?”
CI, formatting, and tests are not just for show. They let you move faster without constantly breaking things.
And honestly, having clean repos made the whole project more enjoyable to work on.
Closing thoughts
This project was about building a system you can trust, both technically and culturally. If you’re learning experimentation and want a project that forces you to understand what’s going on under the hood, I highly recommend building something like this. The learning curve is real, but the payoff is even bigger.
If you’d like to explore the code:
Hi, I am Arnav! If you liked this article, do consider leaving a comment below as it motivates me to publish more helpful content like this!