Ben Cerio: From Physicist to Data Scientist

Friday, December 2, 2016

When Ben Cerio, PhD ’15, came to Duke Physics as a graduate student, he imagined he would eventually become a physics professor. Instead, he’s a data scientist, a job the Harvard Business Review in 2012 called “the sexiest job of the 21st century.”

Cerio works at a startup in San Francisco called DoubleDutch, which produces an app that attendees can use during conferences and other events to make a schedule, network with other attendees, comment on sessions, and participate in surveys.

Although Cerio’s current work might seem a world away from the focus of his PhD thesis—the Higgs boson and the ATLAS experiment at CERN—he says the projects share some similarities.

“The transition was very natural for me,” he says. “At Duke, I was writing software that accepted as input electrical signals from an enormous detector, ATLAS. At my current company, I’m writing software that takes signals from a distributed detector, which is the users’ mobile devices.”

In both cases, calculations using the signals yield useful measurements—albeit of a very different sort.

Making the transition even smoother was the fact that at Duke, Cerio used cutting-edge machine learning to increase the statistical sensitivity of the input signals, and machine learning is widely used to analyze data in industry.

The idea of becoming a data scientist began to appeal to Cerio when he was still in graduate school. He had doubts about whether he would be happy as a life-long academician, and furthermore, he saw other recent grads struggling to make the leap from postdoc to professor because so few positions open up in high energy physics each year.

So after he earned his PhD, Cerio attended an eight-week data science boot camp called The Data Incubator. (His Duke classmate Huaixiu Zheng, PhD ’13, attended a boot camp called Insight Data Science; he now works as a machine-learning data scientist at Uber Technologies.) Competition for entry into these programs is fierce, but fellows who are accepted pay no tuition (the cost is covered by companies that hire graduates), and virtually all fellows land jobs in the field.

At the boot camp, Cerio and his classmates—all of whom had advanced degrees in math, engineering, or science—learned to apply the analytical skills they had honed in the service of very specific technical problems to more general problems. The students also learned the “soft skills” of interviewing, networking, and industry culture.

“I think one of the biggest values in these boot camps is the network,” Cerio says. “My cohort was full of these incredible people from top-tier schools with really fancy degrees and now they are in my network.”

Soon after finishing boot camp, Cerio accepted a position at DoubleDutch. “The product I’m working on is a very new concept, and because of that there’s not a prescribed way to do things,” he says, “so I’m constantly having to think creatively about how to bring in new data and leverage state-of-the-art techniques.” The goal is to use the data to improve the conference-going experience for individual users as well as to generate marketing information for those running the conference.

The problem-solving skills and tenacity Cerio learned as a graduate student serve him well in his new job. “There’s a sort of grit or persistence that you develop over the course of your PhD that creates this confidence in your ability to solve any problem that’s given to you,” he says. “People are giving you these problems that at first glance seem extremely hard but I have this confidence that I can massage the data to do something useful.”

One aspect of his job that’s quite different from physics graduate school is talking to his co-workers about what he’s doing. “I’m the only data scientist at my company,” he says. “Most people are non-technical so you have to find a way to bring data alive to them. That’s extremely challenging to me and something I’m still figuring out.”

In fact, he recommends that current graduate students looking to go into data science take advantage of public outreach activities so they can practice communicating with non-scientists. He also suggests taking programming classes, particularly in industry-favored languages like Python, Java, and C++.

While Cerio is enjoying his current position, he views it as a stepping stone. “As a graduate student, I worked on a very big scientific problem—the Higgs boson,” he says. “Now I’m working on a smaller problem with less potential to directly benefit people. Eventually, I want to take the skills I’ve learned and apply those to bigger problems. Now that health records are standardized and electronic, there’s this massive amount of data. I want to use state-of-the-art machine-learning algorithms to, for example, build models to tailor treatments for cancer patients. It’s a similar problem in that it’s basically building a model from a large, high-dimensional dataset, but in this case the societal payoff is much greater.”

Mary-Russell Roberson is a freelance science writer who lives in Durham.