Our interviewee today is Dan, a third-year Ph.D. student in Computer Science at UC Berkeley. Dan is primarily interested in Machine Learning Safety. His notable works include the GELU activation function, which is used in BERT and GPT. His work in robustness and uncertainty includes proposing a baseline for detecting anomalies with deep neural networks, creating robustness benchmarks for natural adversarial examples and image corruptions, and more.
Dan has interned at DeepMind, where he conducted research on robustness and uncertainty under Balaji Lakshminarayanan and developed AugMix. He also co-organized the Workshop on Robustness and Uncertainty Estimation in Deep Learning at ICML 2019 and 2020, and has served as a reviewer for many tier-1 conferences, including NIPS 2017, ICML 2018, ECCV 2018, ICLR 2019, CVPR 2019, and ICML 2019.
An interview with Dan Hendrycks, Ph.D. student at UC Berkeley
Sayak: Hi Dan! Thank you for doing this interview. It’s a pleasure to have you here today.
Dan: Thank you for the invitation.
Sayak: Maybe you could start by introducing yourself — what are you currently working on, what are your research interests, and so on?
Dan: Right now I am researching machine ethics, or how to embed ethics and human values into machine learning systems. (1) Can we create video understanding models that know that while cat videos are normally cute, a video of someone cruelly throwing a cat across a field usually evokes anger? (2) Can we get NLP models to understand the human values encoded in the law, and can they apply their legal understanding to new scenarios? (3) How can we control RL agents in diverse text-based video games to avoid needlessly killing characters and also apply minimum necessary force? Can we command them with text toward better behavior without hand-crafting a reward for each environment? These projects will test a machine learning model’s grasp of shared human values and its ability to act accordingly.
Sayak: So much diversity along with practical relevance, Dan! I am interested to know how you became interested in Machine Learning Safety. Could you shed some light on that?
Dan: It is a common cognitive bias for people to wholly ignore low-probability events until they’re unavoidable (consider COVID). Since an ounce of prevention is worth a pound of cure, I have mainly focused on identifying new risks and turning them into research problems. This way the research community can start working on the problem early.
I started on this path in my first year of college when Bastian Stern, then a Ph.D. student of Philosophy at Oxford, encouraged me to work on machine learning safety for these reasons. When I started ML research, people were hardly talking about risks such as AI accidents, but now trustworthy machine learning, robustness, and uncertainty are common research topics. Now that the risk of AI accidents has attention, I am trying to reduce the risk of machine learning systems being out of touch with humanity and its diverse values. This risk will only be magnified as ML systems increasingly interact with humans.
Sayak: It’s great to learn what motivated you to start on such under-discussed topics and to find effective research directions within them. What kinds of challenges did you face when you were starting out? How did you overcome them?
Dan: When I started ML research, the largest issue was not having enough GPUs to train ImageNet models. It is hard to convince researchers without ImageNet experiments. Without enough GPUs, I consequently worked on less standard problems where ImageNet experiments were not demanded by reviewers. Finally, in mid-2019, three years after starting research, I acquired enough resources to train my first ImageNet model. Now I’m fortunate enough to have many GPUs and TPUs. Research definitely requires perseverance and not giving up on yourself.
Sayak: I could not agree more about the perseverance part, Dan. Also, it’s really amazing to get to know about your diligence toward conducting research.
On that note, what is your general philosophy for conducting research? How do you generally scope a research problem and set up the experiments?
Dan: I generally like to spend most of my time thinking about research directions before pursuing a project, and then I like to get in a tight empirical feedback loop. I’ll give a concrete example. With the AugMix data augmentation paper, I spent over a month just thinking through questions such as the following.
“Are invertible models just a fad, or should I try to use them to improve robustness? Why are adversarial corruptions far more destructive than random corruptions? A diverse data distribution helped models detect examples from unseen distributions in the Outlier Exposure paper; could something similar be done for robustness? What are some intuitions to explain mixup data augmentation’s performance?”
I spent the first half of the internship just thinking about research directions. Once it seemed that diverse data augmentation was the most promising direction to improve robustness, I created a testbed to try out new ideas hourly. Since “good ideas are a dime a dozen,” it is important to put a large number of ideas to the test and see if they survive the collision with reality.
Sayak: Thank you so much for sharing this, Dan. Many researchers find it extremely difficult to properly model problem statements. The approach you shared would definitely be helpful in this regard.
Your work spans both NLP and Computer Vision. I am very interested to know your take on novel idea generation. How do you pursue that process?
Dan: When I first started research I would try to generate ideas by falling into a state called hypnagogia, a state between wakefulness and sleep. That was useful for creativity, but I have not done that in a long time since I just fall asleep too frequently!
Today, after checking arXiv nightly, I go outside, pace back-and-forth, and try to think through intuitions and arguments. I also try to think about the future of the field, so that I can “skate to where the hockey puck will be,” so to speak.
Sayak: I would refer to this as a fearless mind. This question is from a friend, Ayush Thakur.
Recently, he did a comparative study of different data augmentation techniques in computer vision, including CutMix, Mixup, CutOut, and AugMix. As the study shows, AugMix is a clear winner in terms of producing a model robust to data shifts. However, training with AugMix generally takes longer. Do you have future plans to improve that?
Dan: It’s possible to disable AugMix’s consistency loss and just use it as a data augmentation technique. When you do that, it should be around as fast as the other methods.
We also recently proposed DeepAugment where augmentations are synthesized before training so that the required training time is the same as in normal training.
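To make the first point concrete, here is a minimal, hypothetical sketch of AugMix used purely as a data augmentation routine, with the consistency loss disabled as Dan describes. The function name is illustrative, and the toy NumPy operations stand in for the geometric and color transforms used in the actual paper; this is a sketch of the mixing scheme, not the official implementation.

```python
import numpy as np

def augmix(image, operations, width=3, depth=3, alpha=1.0, rng=None):
    """Sketch of AugMix as a standalone augmentation (no consistency loss).

    image: float array with values in [0, 1].
    operations: list of callables mapping an image to an augmented image.
    width: number of augmentation chains to mix.
    depth: maximum number of operations per chain.
    alpha: Dirichlet/Beta concentration parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Convex weights over the augmentation chains.
    chain_weights = rng.dirichlet([alpha] * width)
    mixed = np.zeros_like(image)
    for i in range(width):
        augmented = image.copy()
        # Apply a randomly sampled chain of 1..depth operations.
        for _ in range(rng.integers(1, depth + 1)):
            op = operations[rng.integers(len(operations))]
            augmented = op(augmented)
        mixed += chain_weights[i] * augmented
    # Blend the mixture with the original image.
    m = rng.beta(alpha, alpha)
    return m * image + (1 - m) * mixed

# Toy stand-in operations (real AugMix uses geometric and color transforms).
ops = [np.fliplr, np.flipud, lambda x: np.roll(x, 2, axis=0), lambda x: 1.0 - x]
out = augmix(np.random.default_rng(0).random((8, 8, 3)), ops)
```

Because each output pixel is a convex combination of in-range pixel values, the result stays in [0, 1], and the per-image cost is just a handful of cheap transforms rather than an extra loss term.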
Sayak: Thanks for sharing that, Dan. DeepAugment gets added to my “to-be-played-around-with” list. Among all your works, Natural Adversarial Examples is by far my favorite. Would you like to talk a little bit about the work? What sparked the motivation behind it?
Dan: We searched for images like “tiger in mud” to find tougher images that should break a classifier. Sometimes the model would surprisingly classify those images correctly. Sometimes the model would incorrectly classify “tiger” images that were clean and normal. This was strange because classifiers were assumed to be reliable on clean, normal images. Consequently, we decided to zoom in on the model’s error distribution, and we collected thousands of examples that “superhuman” classifiers should be able to classify easily but, as we show, often cannot.
Sayak: I see.
Your career has been very successful so far, not only in terms of your research but also in your service as a reviewer for many prestigious conferences. It’s very inspiring to me. Any advice for beginners?
Dan: First, know that academe is governed by the Matthew effect. That means people are given in proportion to how much they have previously been given. For example, a student who lucked into a research internship is more likely to get future research opportunities. Played out over time, the Matthew effect means that opportunities, resources, and influence each compound. The Matthew effect leads to highly skewed outcomes, following something like a Pareto distribution. Therefore it is extremely important to put in the extra effort early on for your research trajectory to have great outcomes.
Second, academic research environments often make many people unhappy. The lack of structure seems to breed neuroticism. Additionally, there is no immediate penalty for slacking, so graduate students often follow the shorter-term incentives and procrastinate at the expense of their long-term well-being and satisfaction. To combat this, I suggest clinging to routine and healthy habits so that one’s day is not determined by one’s mood.
Third, stay in the zone of proximal development. That means taking on projects that have a reasonable but not absolute chance of succeeding. Then get feedback, and adjust the difficulty of your next project accordingly. While attempting ground-breaking research should be encouraged, I think most researchers should first obtain evidence that it is within their capacity to produce a well-received workshop or conference paper. Figuratively, do not immediately try to deadlift 500 pounds without having first lifted 50 pounds. In many areas of machine learning, ideas are getting harder to find and there is less low-hanging fruit, so first get a sense of the limits of your reach and improve from there.
Sayak: Thank you so much, Dan, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.
Dan: Thank you for the fun questions.