An interview with Thomas Wolf, Chief Science Officer at Hugging Face
We have Thomas Wolf for today’s interview. Thomas is currently leading the science team at Hugging Face, a startup working on Natural Language Generation and Natural Language Understanding. He is interested in Natural Language Processing, Deep Learning, and Computational Linguistics.
Thomas is a passionate teacher, and he taught during his Ph.D. at MIT. He blogs about the things he finds interesting, and all of his blog posts can be found here. Thomas is also an active speaker, and details about his talks can be found here. You can learn more about Thomas here.
I would like to wholeheartedly thank Thomas for taking the time to do this interview. I hope this interview contributes to the betterment of the data science and machine learning communities in general :)
Sayak: Hi Thomas! Thank you for doing this interview. It’s a pleasure to have you here today.
Thomas: Hi Sayak, it’s a pleasure to talk to you.
Sayak: Maybe you could start by introducing yourself — what is your current job and what are your responsibilities over there?
Thomas: I’m the Chief Science Officer of Hugging Face, an open-source/open-science startup based in Brooklyn and Paris working on Natural Language Processing (NLP). We focus on catalyzing and democratizing the recent research breakthroughs in NLP. My daily work is quite diverse. It ranges from teaching (for instance, I was teaching at the NLPL Winter School on NLP last week), to research community duties (I’m an area chair at ACL 2020, and I’m organizing several conferences and workshops this year: EurNLP 2020 in Paris and SustaiNLP at EMNLP 2020), to research (mostly on future, more efficient language models, but also on better understanding our models), and of course to working on open-source libraries and tools (transformers, tokenizers, and our future tools).
Sayak: It must be a lot of fun juggling such diverse responsibilities! Could you tell us how you became interested in machine learning?
Thomas: My career path is rather unconventional. I have a Ph.D. in quantum statistical physics, during which I worked on superconducting materials. After that I needed a break from research and ended up switching from physics to law: I got a law degree and worked as a patent attorney for 5 years with a portfolio of startups and big companies. During my last years as an attorney, most of the startups I was advising were working on Deep Learning, and this led me to discover the field of Machine Learning and AI. I realized that most of the math behind ML was rebranded physics :-) This led me to dive deeper into the theory of ML/AI and start my offline education (I call it an offline education because I mostly like to learn by reading books rather than online blogs or code).
Here you can find some of the books and courses Thomas recommends for learning Machine Learning.
Sayak: Pretty interesting! When you were starting out, what kind of challenges did you face? How did you overcome them?
Thomas: I’m interested in many things in AI/ML/NLP, and one of my main challenges has always been to prioritize and select which project I want to spend significant energy on at a given time. I also try to avoid multitasking by not having too many projects in parallel. I usually solve the prioritization issue by focusing on transversal projects that move forward, or enable, several projects at the same time.
Sayak: I definitely agree with you on the prioritization issue. I struggle with it myself, so thank you for sharing your approach here. I am definitely going to try it. What were some of the capstone projects you did during your formative years?
Thomas: A fun capstone project I did while I was a lawyer, roughly 4 years ago, was a magic sandbox I built for my son. It combined hardware and software and went viral, reaching the front page of Reddit and several news outlets (https://imgur.com/gallery/Q86wR). I learned a lot while doing this project, in particular how it feels when a project goes viral and how to share a project with the community (https://github.com/thomwolf/Magic-Sand).
A magic sandbox I made for my 3 y.o. son's birthday. Detailed BOM and source code included.
Sayak: Woah! This is just fantastic! Looks like I got an idea for a project for this summer. Hugging Face’s tools are so nice for doing cutting-edge NLP. From the API design to the performance they deliver, everything is top-notch! You lead the team that fuels everything behind this: the science team! Would you share some thoughts on the challenges you face and your general perspective on the role?
Thomas: One of the challenges we face is anticipating future developments in the field. A lot of research work in NLP/ML is about inventing new models and going beyond existing work, so it can be tricky to design an API and user experience for things that have not been invented yet. We solve that by staying very close to research, doing the research ourselves with top-tier groups and researchers.
Sayak: Thanks for sharing that philosophy, Thomas. Fields like machine learning are evolving rapidly. How do you manage to keep track of the latest relevant developments?
Thomas: I’m a science geek and definitely in love with the field, so there is not much I like more than having a big pile of papers to read and some time to dive into them. It doesn’t feel like work. I usually select the papers I read based on recommendations from friends or posts by the people I follow on Twitter, which has a pretty amazing ML community.
Sayak: That’s a very honest answer. From my other interviews, and of course from your input, it is now evident that Twitter is a wonderful medium for staying up to date. As a practitioner, one thing I often struggle with is learning a new concept. Would you share how you approach that process?
Thomas: Reading several papers or books with different approaches to the same concept is usually a good way.
Sayak: Can we expect a book from you anytime soon on the intersection of deep learning and NLP?
Thomas: Hahaha I’ve looked a bit into that because many people (and editors) have been asking but, wouah, writing a book takes so much time!
Sayak: That’s true! Any advice for beginners?
Thomas: If you really love the field you’ll find your way into it, don’t worry.
Sayak: Thank you so much, Thomas, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.
Thomas: Thanks a lot Sayak, it was a pleasure!