Jay Gala

AI Resident @ AI4Bharat (IIT Madras). Previously: UCSD / TCS / Stratzy


Hey, thanks for stopping by! 👋

I am an AI Resident at AI4Bharat (IIT Madras) under the supervision of Prof. Mitesh Khapra, Dr. Anoop Kunchukuttan and Dr. Raj Dabre. I work on building open-source datasets and models for Indian languages. I am broadly interested in multimodal and multilingual learning, specifically data-efficient learning, training dynamics, reasoning and generalization.

I am also collaborating with Dr. Zeerak Talat on hate speech detection using federated learning. Before that, I was a research intern at the University of California San Diego under the supervision of Prof. Pengtao Xie, where I worked on neural architecture search and generative models.

I completed my Bachelor’s degree in Computer Engineering at the University of Mumbai, India. Previously, I was a machine learning intern at Tata Consultancy Services, where I worked on understanding customer behavior using natural language processing. Before that, I collaborated with Prof. Pratik Kanani on an industry project focused on anomaly detection in heart rate (pulse) using IoT and machine learning.

I also served as a mentor at DJ Unicode, a student organization that aims to inspire sophomores and juniors to contribute to open-source projects. Additionally, I led a team that developed a platform for conducting C programming examinations for over 500 students at my college (demo).

I co-founded the research division of Unicode (a.k.a. Unicode Research) with Swapneel Mehta from the NYU CSMaP group. We were fortunate to be joined by Dr. Akash Srivastava from MIT-IBM AI Lab, who gave foundational lectures on deep generative models and probabilistic machine learning. I also worked as a teaching assistant for the Unicode Machine Learning Summer Course 2021, supported by Google Research India. Additionally, I was a founding research engineer at SimPPL, where I collaborated with The Sunday Times and Ippen Digital to develop tools (parrot.report) that help policymakers and journalists audit online disinformation on social media.


News and Timeline

2024
2023
  • December - Will be attending EMNLP 2023 in Singapore 🇸🇬.
  • November - IndicTrans2 has been accepted at TMLR. Check out the Camera Ready Version.
  • November - Presenting a tutorial on Developing SOTA MNMT Systems for Related Languages at AACL-IJCNLP 2023.
  • May - Excited to share the release of IndicTrans2, the first open-source model to support all 22 Scheduled Indian languages. Check out the Preprint and Code.
  • January - A Federated Approach for Hate Speech Detection has been accepted to EACL 2023. Check out the Preprint and Code.
2022
2021