An interview with Abhishek Thakur, Data Scientist, and Kaggle 3x Grandmaster
Our interviewee today is Abhishek Thakur, currently serving as Chief Data Scientist at boost.ai. In the past, Abhishek has worked as a Data Scientist at a number of companies, and he also advises a Bangalore-based startup named Stylumia.
Abhishek is the world’s first Kaggle Triple Grandmaster. His highest Kaggle World Rank is 3. Abhishek’s research interests lie in areas such as automated machine learning and hyperparameter optimization. He is also the organizer of the Berlin Machine Learning Meetup. You can learn more about Abhishek here.
I would like to wholeheartedly thank Abhishek for taking the time to do this interview. I hope this interview contributes to the betterment of the data science and machine learning communities in general :)
Sayak: Hi Abhishek! Thank you for doing this interview. It’s a pleasure to have you here today.
Abhishek: Thank you Sayak. The pleasure is all mine.
Sayak: Maybe you could start by introducing yourself — what is your current job and what are your responsibilities over there?
Abhishek: I work as Chief Data Scientist at boost.ai, a company based in Norway. We build Conversational AI. My job is to implement state-of-the-art Natural Language Processing / Understanding components and the deep learning models we use to answer end-users, and to make sure they work reliably at all times.
Sayak: Interesting! Would you like to share how you became interested in data science and machine learning?
Abhishek: I was always interested in computer science, so after finishing my Bachelor’s in Electronics Engineering from NIT Surat, India, I moved to Germany to do an MSc in Computer Science at the University of Bonn. I was working as a student at Fraunhofer during my master’s, and my job was to implement OCR algorithms on microcontrollers. My friends were working with NLP and machine learning. Talking to them made me interested in the field, and I started with machine learning.
Sayak: This is nice to know. Great things often start with the right conversations. When you were starting out, what kind of challenges did you face? How did you overcome them?
Abhishek: Well, first of all, I had taken no machine learning subjects. I tried taking one at the university but failed miserably. The courses were all theoretical and barely focused on any applied concepts. So I started learning on my own. I found Kaggle and started with an ongoing competition. Needless to say, I failed. Then I checked the solutions of the winners and came across terms that were new to me, like random forest, neural networks, etc. So I started Googling and looking up these terms. I found a lot of papers, read them, even implemented some of them, and then I read more. It went on like this for 10 months. I would start with a problem and learn how to solve it by trying to solve it.
Sayak: Woah! That was pretty diligent, and I certainly think this go-getter philosophy works well, especially when you are into implementing things. What were some of the capstone projects you did during your formative years?
Abhishek: We did not have any capstone projects; we had a Master’s Thesis. My thesis was on computer vision, related to saliency in images. It was barely connected to machine learning. If I remember correctly, the only machine learning algorithm I used was K-Nearest Neighbors.
Sayak: I see, thank you for sharing that. Your Kaggle work has been super amazing so far. Would you like to share any pointers on how you approach problems in Kaggle?
Abhishek: My approach to any problem is to start by taking a basic look at the data and trying the simplest models. It’s all about building a personal benchmark and then improving on it in a stepwise manner. For example, if the problem is about text classification, I start with TF-IDF and not BERT. When I think I have achieved a score that cannot be improved by traditional methods, I dive into deeper and more advanced models.
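The "simple baseline first" idea above can be sketched as follows. This is a minimal illustrative example (not Abhishek's actual code), assuming scikit-learn is available; the toy texts and labels are made up purely for demonstration.

```python
# Baseline-first text classification: TF-IDF features plus a linear
# model, before reaching for anything like BERT.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset: 1 = positive review, 0 = negative review.
texts = [
    "great movie, loved it",
    "terrible plot, boring",
    "fantastic acting and story",
    "awful, waste of time",
]
labels = [1, 0, 1, 0]

# One pipeline = one personal benchmark: easy to score now,
# easy to improve on later in a stepwise manner.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
baseline.fit(texts, labels)

# The training score here is only a placeholder; in practice you would
# measure a proper cross-validated metric and try to beat it.
print(baseline.score(texts, labels))
```

In a real competition setting, the score this pipeline produces on a held-out set becomes the benchmark that every subsequent, more complex model has to beat.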
Sayak: The fields of data science and machine learning are rapidly evolving. How do you manage to keep track of the latest relevant happenings?
Abhishek: Most of the new stuff I learn about comes from social media. There are many brilliant people one should follow, and they often share what they are working on, for example on Twitter, Reddit, and even LinkedIn. I keep myself updated on the latest happenings through these channels, and on the latest papers through arXiv and Papers with Code.
Sayak: I think among all the resources you mentioned, Twitter is certainly my favorite. I personally use it a lot to come across the latest stuff. As a practitioner, one thing I often find myself struggling with is learning a new concept. Would you like to share how you approach that process?
Abhishek: It’s always difficult to learn a new concept. For me, the problem becomes a bit easier when we learn the applications of the concept and how it is used. Let’s say I am learning how to tackle time-series problems for the very first time. I would start with my own traditional approach and try to build a baseline based on that. Then I would keep improving on the baseline. When I have exhausted the things I can try, I start looking for different approaches to solve the problem. In this way, I keep learning about new concepts and their applications at the same time. I’ve been following this approach for a long time now. Learning by doing :)
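For the time-series example above, the first "traditional approach" baseline can be as simple as a naive last-value forecast. This is a hypothetical sketch of that idea, not anything from the interview; the series and evaluation values below are invented for illustration.

```python
# Naive time-series baseline: repeat the last observed value for every
# future step, then score it. Anything fancier must beat this number.

def naive_forecast(series, horizon):
    """Forecast `horizon` steps ahead by repeating the last observation."""
    return [series[-1]] * horizon

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up monthly values: the first five are "history", the last three
# are the "future" we pretend to forecast.
history = [112, 118, 132, 129, 121]
future = [135, 148, 148]

preds = naive_forecast(history, horizon=3)
print(mean_absolute_error(future, preds))  # the baseline score to beat
```

Once this number exists, each new idea (seasonal adjustments, gradient boosting on lag features, and so on) is kept only if it improves on it, which is exactly the stepwise process described above.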
Sayak: Learning by doing and getting motivated for that doing — really cool! Can we expect a book from your end anytime soon?
Abhishek: I have co-authored a book but I don’t talk about it much. As a matter of fact, I am planning on something, so look out for an announcement quite soon :)
Sayak: Awesome! I will be looking forward to it definitely! Any advice for the beginners?
Abhishek: There is so much going on in this field that it sometimes becomes overwhelming, especially for beginners. Many beginners feel overwhelmed when starting with machine learning and don’t know where to begin. There are some very good courses from Andrew Ng, and he explains the most difficult things in the simplest possible manner. My advice would be to gain theoretical knowledge first and start with applications as soon as possible. Pick a problem on Kaggle (not the 101s, but the real competitions) and try it with what you have learned. Look at how others are solving the same problem and implement their approaches on your own. Do not copy-paste. When you fail, repeat, and you will succeed. It’s also very important to have a good portfolio of projects these days, and these competitions will help with that too. Recently, I also started a YouTube channel that focuses on applied machine learning and is beginner-friendly. In the end, it’s all about perseverance.
Sayak: I absolutely agree, especially with the point about starting projects on Kaggle. Thank you so much, Abhishek, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.
Abhishek: Thank you Sayak! It has been an absolute pleasure.
I hope you enjoyed reading this interview. Watch this space for the next one; I hope to see you soon. This is where you can find all the interviews done so far.
If you want to know more about me, check out my website.