Hi, I'm

Shrainik Jain

Senior Software Engineer at Snowflake

PhD in CS from University of Washington · SQL compiler & query optimizer · Database systems researcher

About Me

I'm a Computer Science PhD graduate from the Paul G. Allen School at the University of Washington, where I worked with advisors Bill Howe and Ed Lazowska on generalized SQL workload analytics.

My research lies at the intersection of database systems and machine learning — exploring how NLP techniques can be applied to SQL to enable smarter workload management, query optimization, and data discovery.

Currently, I am a Senior Software Engineer at Snowflake, where I work on the SQL compiler and query optimizer, helping make cloud-scale analytics faster and more intelligent.

Before my PhD, I was an engineer at Microsoft India Development Center, where I helped build Azure Site Recovery and MOHORO (the precursor to Azure RemoteApp). I completed my undergraduate degree in Computer Science at BITS Pilani.

❄️

Senior Software Engineer

Snowflake — SQL compiler & query optimizer

🎓

PhD, Computer Science

University of Washington (Paul G. Allen School)

🔬

Research

SQL workload analytics, query representation learning

🏛️

B.E., Computer Science

BITS Pilani

Experience

Senior Software Engineer
2019 – Present

Working on the SQL compiler and query optimizer. Delivered Snowtrail — a production query testing system that replays and compares query behavior across compiler versions. Improving query plan quality and optimizer correctness at cloud scale.

SQL · Query Optimization · Compiler · Cloud Databases

PhD Researcher
2014 – 2019

Developed Querc, a system for database-agnostic workload management using learned query representations. Investigated NLP techniques (word2vec, paragraph vectors) applied to SQL. Maintained and analyzed the SQLShare DB-as-a-Service platform, publishing findings at SIGMOD 2016, CIDR 2019, and other top venues.

Research · NLP · SQL Analytics · Python

Research Intern
2015

Introduced support for dynamic types and JavaScript development to Project Orleans, Microsoft's distributed actor framework — enabling a new class of JavaScript-based distributed applications on .NET.

JavaScript · Distributed Systems · Orleans · .NET

Software Dev Engineer in Test (SDET)
2012 – 2014

Built Hyper-V-based Disaster Recovery as a Service (now Azure Site Recovery). Was part of the team that created MOHORO, a scalable pay-by-usage Desktop-as-a-Service on Azure that served as the precursor to Azure RemoteApp.

Azure · Hyper-V · Cloud · C#

Researched autonomous pattern formation algorithms for asynchronous swarm robots without agreement on chirality. The resulting paper was published in the journal Theoretical Computer Science.

Distributed Algorithms · Robotics · Theory

Research

Querc: Database-Agnostic Workload Management

A system enabling generalized workload management tasks using learned query representations — applied to diverse SQL workloads across different database backends. Published at CIDR 2019.

Query2Vec: Learning Query Representations

Systematically evaluated NLP embedding techniques — word2vec, paragraph vectors, and graph methods — applied to SQL to enable downstream workload management tasks like query recommendation and anomaly detection.

SQLShare: DB-as-a-Service

A multi-year SQL-as-a-Service platform used by scientists to share and query datasets. Analyzed real-world query logs to uncover patterns in data cleaning, schema usage, and SQL behavior in the wild. SIGMOD 2016 Reproducibility Award.

Polystore Data Management

Contributed to RACO — a query compiler and middleware that optimizes queries across heterogeneous backends (relational, NoSQL, parallel systems), choosing the best execution plan across multiple storage systems.


Publications & Patents

Database-Agnostic Workload Management
Shrainik Jain, Jiaqi Yan, Thierry Cruanes, Bill Howe
9th Conference on Innovative Data Systems Research (CIDR 2019)

Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics
Shrainik Jain, Bill Howe, Jiaqi Yan, Thierry Cruanes
arXiv e-prints, January 2018, arXiv:1801.05613 [cs.DB]

Snowtrail: Testing with Production Queries on a Cloud Database
Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, Allison Lee
7th International Workshop on Testing Database Systems (DBTest 2018) · US Patent Application No.: 62/646,817

The Myria Big Data Management and Analytics System and Cloud Service
Jingjing Wang, Tobin Baker, Magdalena Balazinska, Dan Halperin, Brandon Haynes, Bill Howe, Dylan Hutchison, Shrainik Jain, Ryan Maas, Parmita Mehta, Dominik Moritz, Brandon Myers, Jennifer Ortiz, Dan Suciu, Andrew Whitaker, Shengliang Xu
8th Conference on Innovative Data Systems Research (CIDR 2017)

Data Cleaning in the Wild: Reusable Curation Idioms from a Multi-Year SQL Workload
Shrainik Jain, Bill Howe
11th International Workshop on Quality in Databases (QDB 2016)

SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment
Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, Ed Lazowska
ACM SIGMOD International Conference on Management of Data, 2016

High Variety Cloud Databases
Shrainik Jain, Dominik Moritz, Bill Howe
IEEE Cloud Data Management Workshop (co-located with ICDE 2016)

Pattern Formation for Asynchronous Robots without Agreement in Chirality
Sruti Gan Chaudhuri, Swapnil Ghike, Shrainik Jain, Krishnendu Mukhopadhyaya
Theoretical Computer Science

Skills

Languages

SQL · Python · Java · C++ · C# · JavaScript · Scala

Database Systems

Query Optimization · SQL Compiler · Snowflake · PostgreSQL · Myria · SQLShare

Machine Learning

Word2Vec · Representation Learning · NLP · Scikit-learn · NumPy · Pandas

Cloud & Infrastructure

Azure · Distributed Systems · Orleans · Hyper-V · Cloud Storage

Research

Workload Analytics · Federated Databases · Data Science · Systems Research · Academic Writing

Tools

Git · Linux · Jupyter · LaTeX · Vim

Contact

I'm always happy to chat about database systems, query optimization, machine learning for databases, or interesting engineering problems. Feel free to reach out via email or connect on social media.

You can also check out my research and code on GitHub.