An interview with Girish Palshikar, Principal Scientist at TCS Research and Innovation
Our interviewee today is Girish Palshikar. Girish is a Principal Scientist at TCS Research and Innovation. His research interests lie in areas like Machine Learning, Data Mining, Natural Language Processing, and Text Mining as well as Theoretical Computer Science. For his tremendous research contributions, he was awarded TCS Distinguished Scientist in 2012.
Girish also serves as a Visiting Faculty for the University of Pune and College of Engineering Pune. All the details about the courses he teaches can be found here. Girish is an active speaker too and many of his presentations can be found here. If you are into research, you might not want to miss out on his How to Read a Research Paper deck. If you want to know more about Girish, be sure to check out his Google Site.
I would like to wholeheartedly thank Girish for taking the time to do this interview. I hope this interview serves a purpose towards the betterment of data science and machine learning communities in general :)
Sayak: Hi Girish! Thank you for doing this interview. It’s a pleasure to have you here today.
Girish: Thank you for inviting me!
Sayak: Maybe you could start by introducing yourself — what is your current job and what are your responsibilities over there?
Girish: I am an alumnus of both IIT Bombay and IIT Madras. I have been working at TCS Research, Pune, India, since 1992, where I am now a principal scientist and lead the Machine Learning R&D Group. In 2012, I was honoured with the title of TCS Distinguished Scientist. TCS Research is the R&D organization within Tata Consultancy Services Limited, a premier and large software company in India with ~4,20,000 employees. I have ~130 publications in international journals and conferences. I am also a visiting lecturer at the Computer Science Department of Savitribai Phule University of Pune and Government College of Engineering, Pune (COEP). My areas of research include machine learning, data mining, text mining, natural language processing and their applications to various domains, including fraud detection, human resource (HR) management, education, law, and medicine.
Sayak: 27 long years! I am sure they have been fun, rewarding and challenging at the same time. I am curious to know how you became interested in data science and machine learning.
Girish: I have been interested in Artificial Intelligence (AI) since my college days. I was initially working in mathematical logic, which is a part of classical, old-school AI: knowledge representation, inference, etc. At the same time, I was also interested in the nature and complexity of human languages and in philosophy, particularly understanding epistemological questions: What is knowledge? How do we acquire knowledge? How do we validate and revise knowledge? How do we know what we have is indeed knowledge? Are there any limits to what can be known? So, it was a natural move to natural language processing (NLP), which builds systems that “understand” human languages, and to machine learning (ML), which builds algorithms that create knowledge! While there are many ways for a machine to create knowledge, creating knowledge from past data (inductive learning) is the most well-explored and most useful method. I was drawn to such data-driven ML techniques because of my strong interest in real-life applications of AI.
Sayak: The questions posed by you are indeed very interesting and ever-evolving. Plus, the way you made your way through the world of NLP and Machine Learning makes so much sense. When you were starting in machine learning, what kind of challenges did you face? How did you overcome them?
Girish: There were many technological limitations: slow and small hardware, low-bandwidth communications, and inadequate programming language support for AI / NLP / ML problems. The availability of enterprise data was also a challenge. Luckily, ML and NLP were new, raw fields and a large number of practical problems were simply begging for AI solutions. All I did was find and solve many of them. It was amazing to see how ML / NLP algorithms helped solve many real problems, in ways that were inconceivable earlier. Research in both NLP and ML was evolving really rapidly, and I was lucky to witness and participate in its explosive growth.
Sayak: Interesting! You have been a part of so many research projects throughout your career. Would you like to share more about some of the ones that you consider highlights?
Girish: I have worked on many research projects and have published ~130 research papers. Given the limited space here, I will talk about only a few projects (“I” here always refers to “I and my team”). My research is driven by real-life applications where AI can help.
I feel strongly about frauds, scams, corruption and money laundering because they steal money, goods, and services from people. I studied share market trading frauds, such as circular trading and price manipulation, and realized that there were no techniques to detect them. Hence, I designed several unsupervised ML algorithms for this purpose. I continue to work on many such problems.
I was fascinated by HR as a discipline that must deal with unpredictable human behaviour in a wide variety of organizational contexts. I realized that many HR tasks have no AI support, which prompted a research agenda to build ML / NLP algorithms for them: mine employee survey responses (text and data) to derive insights about the issues employees face, mine performance appraisal text and data to derive insights that improve the quality and effectiveness of appraisals, recommend industrial training courses to help employees reskill themselves, and predict attrition, among others. I have published ~40 papers in HR analytics.
These days I am working on extracting “knowledge” from domain-specific documents, such as history textbooks, court judgments, and biomedical textbooks and research papers. Just to give an idea, I have built NLP algorithms that automatically extract a novel visual “storyline” representation from, say, a chapter on Napoleon. I use it for several purposes, including helping students understand, generating exam questions and checking answers.
Sayak: Those are some amazing research projects, really. All are so useful and impactful in their own right. Thank you for sharing them, Girish. The fields of data science and machine learning are rapidly evolving. How do you manage to keep track of the latest relevant happenings?
Girish: To do research, one must constantly read books and papers, as they are being published. I also teach at COEP and Pune University, which forces one to organize the material in interesting ways. I give a lot of seminars and workshops as part of TCS’ Academic Interface Program. I also regularly give presentations in our internal “journal club”, each discussing one “interesting” research paper, explaining the details and its strengths. Attending conferences, expert sessions, tutorials and even courses on the Internet (e.g., NPTEL) helps me a lot.
Sayak: This is quite comprehensive. Reading books and papers is definitely something I can connect to for this aspect. Being a practitioner, one thing that I often find myself struggling with is learning a new concept. Would you like to share how you approach that process?
Girish: A wide background in various branches of mathematics is needed for understanding ML / NLP concepts. A good understanding of probability, statistics, stochastic processes, linear algebra, information and coding theory, optimization, algorithms, and complexity theory is needed. One must constantly study. I often read papers with an “applied” viewpoint: how can I use this? Reading with such a question in mind, drawing diagrams of your own, and having handy examples around on which to try the technique you are reading about also help me a lot.
Sayak: Very helpful insight, Girish. Thank you. Many students find it very hard to properly do research in the field they are interested in. Would you like to share your thoughts on this?
Girish: First get at least a minimum dose of the required background. Also, have an application in mind to fix your context and your ideas. First write a 3–4-page problem definition, giving background, motivation, challenges, benefits of solving the problem, problem formalization etc., without worrying about how to solve the problem. Read and present some basic papers in the area. Add a 2-page literature survey summarizing (in your own words) maybe the 10–12 most relevant papers, and critique their techniques for any limitations in solving your problem. Gather the relevant resources (e.g., datasets, public-domain implementations). Then design your own solution on paper, developing the necessary theory, and convince yourself why it will do better than the relevant techniques. Design experiments and evaluation criteria carefully. Document all this carefully, and present it to colleagues. Only then go for implementation.
Sayak: This is very useful. It is definitely very methodical. Any plans on authoring a book?
Girish: Hmm. Let me see.
Sayak: I will look forward to reading your book when it’s published! Any advice for the beginners?
Girish: Know your strengths and your weaknesses. Play to your strengths. Learn + practice to reduce your weaknesses. Motivate yourself with a larger and long-term context (or a question) that interests you a lot. AI for poverty eradication? AI for reducing hunger? AI for a just society? AI for reducing wars or social conflicts? AI for an absolutely new kind of music? Pick your battle! Such social problems will not have purely technological solutions, of course, but AI can certainly try to help. That way, many interesting discoveries will be made.
Sayak: Some very interesting and alarming ideas there. Thank you so much, Girish, for doing this interview and for sharing your valuable insights. I hope they will be immensely helpful for the community.
Girish: Thank you!
Girish is a veteran in the field. Having worked in this area for more than 25 years, he discussed many interesting ideas, helpful insights, and some critical challenges. Girish’s zeal to work on highly challenging and impactful areas like fraud detection, anti-money laundering and so on is definitely very inspiring and thought-provoking.
I hope you enjoyed reading this interview. Watch this space for the next one and I hope to see you soon. This is where you can find all the interviews done so far.
If you want to know more about me, check out my website.