An interview with Ankur Patel, Vice President of Data Science at 7Park Data

Sayak Paul
7 min read · Dec 8, 2019

Our interviewee today is Ankur Patel. Ankur is currently with 7Park Data, serving as the VP of Data Science. Previously, he was a Data Scientist at ThetaRay. Ankur is deeply interested in using unsupervised learning algorithms to find hidden patterns in large-scale unlabeled data. This led him to write a book on the subject: Hands-On Unsupervised Learning Using Python. It has been published and is now available via Amazon and O’Reilly.

To learn more about Ankur, check out here.

I would like to wholeheartedly thank Ankur for taking the time to do this interview. I hope this interview contributes to the betterment of the data science and machine learning communities in general :)

Sayak: Hi Ankur! Thank you for doing this interview. It’s a pleasure to have you here today.

Ankur: Thanks for having me, Sayak. The pleasure is mine.

Sayak: Maybe you could start by introducing yourself — what is your current job and what are your responsibilities over there?

Ankur: I manage the data science team at 7Park Data. We take alternative data such as credit card, email receipt, clickstream, app intel, point of sale, and location data and productize them for clients such as hedge funds. These hedge funds get a more timely read of real-time economic activity from our alternative data products than from conventional data, which is normally reported by companies and government agencies with at least a one-month lag.

Sayak: Pretty interesting. I am fascinated by the way 7Park Data does data science. How did you become interested in pursuing data science and machine learning?

Ankur: I used to be a sovereign debt trader for Bridgewater Associates, and I saw first hand just how powerful data could be in making really smart investment decisions. Once I left Bridgewater, I started my own hedge fund, largely applying data science and machine learning to data to run a 100% systematic hedge fund strategy. Since then, I’ve gotten deeper and deeper into the space, learning and applying unsupervised learning and, more recently, natural language processing. I released my first book on unsupervised learning with O’Reilly earlier this year, and I’m working on my second book now. This second book will focus on applying natural language processing in the enterprise.

Sayak: That’s wonderful to know. I will be looking forward to reading the book when it’s released. When you were starting in the field what kind of challenges did you face? How did you overcome them?

Ankur: Back in 2012, a lot of people did not know what data science or machine learning was, and they were skeptical of mining data for patterns. A lot has changed since then. Almost every single enterprise is now interested in data — how to get data, how to make decisions from data, how to automate work using machine learning, etc. It’s quite remarkable how much of a 180 has occurred in the past seven years. The big challenge for companies now is how to successfully launch, deliver, and maintain machine learning models in production. That’s what a lot of companies still struggle with, but the interest in data science and machine learning is at an all-time high.

Sayak: I absolutely agree that companies these days struggle to productionize machine learning models. What were some of the capstone projects you did during your formative years?

Ankur: The original data science models we built at my hedge fund involved taking in lots of conventional data from government agencies and companies and evaluating them for alpha. By identifying which datasets had alpha, we were able to package good alpha signals to generate buy or sell decisions. In recent years, the scope of my work has expanded considerably into areas such as anomaly detection; named entity recognition, disambiguation, and linking; text extraction; and reading comprehension.

Sayak: That is quite a wide spread of tasks. That must be exciting! Data science and machine learning are rapidly evolving fields. How do you manage to keep track of the latest relevant happenings?

Ankur: It’s very hard work. Many papers get released on a daily basis. But the number of truly impactful breakthroughs in the field is very small each year. For example, the release of Google’s BERT last fall was a watershed moment for natural language processing. Since then, at least six companies have launched their own versions of a Transformer-based language model, but these advances are more incremental in nature.

I try to focus on the advances that are monumentally impactful versus just marginally important, and being able to tell which advances are critical versus not comes from experience working in the field. I also scour Crunchbase often to find how new startups are tackling different use cases with the new technology that is coming to the market.

Sayak: This is a very comprehensive insight. I am sure I am going to try following this. Tell us about your book — what motivated you to write it in the first place? How did you approach structuring the book and things like that?

Ankur: In late 2017, I had just started working for ThetaRay, an Israeli startup that specializes in unsupervised learning. If you have any use cases in anti-money laundering or fraud, ThetaRay is your go-to shop. I realized just how little literature existed at the time on applying unsupervised learning to real-world problems. Back in 2017, unsupervised learning was considered an esoteric, theoretical field, but it has some really powerful applications in business. I had to share some of these, and that’s why I started writing the book. Unsupervised learning is the bedrock for applications such as anomaly detection, group segmentation, recommender systems, and all the generative models we’ve seen to date.

I organized the book in a way that newcomers to unsupervised learning could quickly get up to speed. Each chapter introduces the theory and is followed by the application of that theory to a real-world problem. The first half of the book involves unsupervised learning applications built from Scikit-Learn, and the latter half explores applications built from neural networks. It takes the reader on a journey one small step at a time.

Sayak: I must admit I am really enjoying reading your book. I have read the first six chapters and I have specifically liked the way you have showcased the code snippets. As a practitioner, one thing that I often find myself struggling with is learning a new concept. Would you like to share how you approach that process?

Ankur: I usually start with a search for applications of the new concept. Unless I understand how that new concept is useful in solving a real-world problem, I don’t have the motivation to spend time learning the new concept. Once I internalize the value add of learning the new concept and its usefulness, I start with the code and examples provided online to understand how to apply the concept. Only then do I dig into the theory and the math.

To me, the theory and the math are important, but they are not nearly as important as knowing what use cases the new concept is relevant for and how the code works. Many of us do not know precisely how a computer works but that does not stop us from doing incredibly important work using a computer. I find too many people new to machine learning get stuck in the theory stage and never advance beyond it. This stymies their progress in the field.

Sayak: What beautiful analogies! This is truly motivating! Any advice for the beginners?

Ankur: Yes, I highly recommend leveraging the videos available on YouTube, O’Reilly Safari, and the MOOC providers. But the best way to learn is to write code and build models. Don’t spend too much time on theory. Go build. Compete on Kaggle. And use resources that will get you up and running and building fast.

For mainstream machine learning, I recommend Aurélien Géron’s Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. For unsupervised learning, I recommend my book, Hands-On Unsupervised Learning Using Python. I also find David Foster’s Generative Deep Learning to be a very compelling read.

Sayak: I am glad I have these resources. I have read a number of chapters from Aurelien’s book and they are really comprehensive. I am yet to start David’s book, though. Thank you so much, Ankur, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.

Ankur: Of course, happy to help. For anyone that would like more, please feel free to reach me at ankur@unsupervisedlearningbook.com.

I hope you enjoyed reading this interview. Watch this space for the next one, and I hope to see you soon. This is where you can find all the interviews done so far.

If you want to know more about me, check out my website.
