About

Samir Patil

I'm an ML engineer based in Pune, India. I've spent most of the last decade working on production systems (distributed services at Druva, ranking models at Google Maps Ads, and now a multi-agent platform at RunWhen), but the work that excites me most has always been the same: getting machines to do hard things reliably.

Right now that means online reinforcement learning for tool-using agents. It's the part of post-training where systems get to act in the world (call APIs, read files, run code, navigate environments) and you have to teach them to choose well across long horizons. The problems are part ML, part systems, part taste, and nobody has them figured out yet.

I write here for two reasons. The first is selfish: writing forces me to understand things. The second is reciprocal: I've learned more from other people's open notes — Tunstall on TRL, Lambert on Interconnects, the verl maintainers' design docs, countless DeepSeek and Anthropic papers — than from any structured course. If I can return even a fraction of that to whoever comes next, the time is worth it.

If you're working on agent RL, post-training, or frontier-model engineering — especially if something I've written helped or annoyed you — I want to hear about it.


Currently

Founding Engineer @ RunWhen — building AgentFarm, a production agentic AI orchestration platform on Google's Agent Development Kit.

Previously

Recognized work

Education

B.Tech, Computer Science — Vishwakarma Institute of Technology, Pune (2013–17). CPI 9.3 / 10. Vice Chair, ACM Student Chapter; Founder, Coder's Club.

Find me