Alvin Cheung
I am an associate professor in the Computer Science Division
at UC Berkeley EECS.
I am a member of the
Data Systems and Foundations group,
Programming Systems group,
Sky Lab,
SLICE Lab,
and a faculty affiliate in the Berkeley Institute for Data Science.
I also serve as an advisor to the Data Science Discovery Program,
and technical advisor to several companies.
My research interests include data management, programming languages,
and building software systems. My group aims to help end users,
from data scientists to coding experts, to easily extract insights from large amounts of data.
We also develop techniques and tools that make it easy to build
large-scale, efficient, and manageable data processing pipelines, where such pipelines range
from traditional computer science applications (web, cloud) to emerging application domains in data science (physical sciences, healthcare, social sciences)
and beyond.
Some research themes:
Verified lifting
is a new technique for inferring properties of programs. We have applied this
to database applications (QBS),
stencil computations (STNG),
programmable switches (Domino),
parallel data processing frameworks (Casper),
GPUs (Dexter and Rake),
and CRDTs (Katara).
Code generated using our techniques is now deployed at Adobe and Google.
Designing new data processing and programming language techniques to build and optimize data management and machine learning systems
for emerging applications:
Cosette (reasoning about data transformations from grading homework assignments to
building query optimizers),
Hyperloop (improving the performance of database-backed applications),
Spatialyze (data management system for large-scale geospatial video datasets), and
OSD (adaptive speculative decoding).
Improving the data programming experience of end users (such as data scientists) with new user interfaces and code generators across different domains, using techniques
from machine learning, program synthesis, and human-computer interaction:
PlotCoder,
Falx,
AST-T5, and
SAFIM.
If you are a Berkeley student interested in doing research
in these areas and have done well in CS186, CS164, or CS182, please send me an email mentioning which class(es) you have
taken, and include your resume and unofficial transcript.
I regularly teach the undergraduate data management class at Berkeley.
Check out the recorded videos if you are interested.
We thank the NSF, DOE, ONR, ARO, Adobe, Google, Intel, Meta, and VMware for their generous support of our group's research, and
for transferring our technology to real-world products. We are also grateful for the early career awards and other recognitions from the
data management and programming systems research communities.
I was earlier on the CSE faculty at the University of Washington and
an affiliate in the UW eScience Institute, and before that
a graduate student in the MIT
database group and the
computer-aided programming group,
working with Professors
Sam Madden and
Armando Solar-Lezama.
I worked on tools that make use of programming language techniques
to improve application performance.
What's new?
Previous News
[6/23] Lily will be presenting Coco at VLDB this year. Coco is a new tool that extracts
functional dependencies from webapp code and uses them to optimize queries!
[5/23] We will be presenting an overview paper on Metalift at ECOOP later this year, along with a talk on
integrating Metalift with Gemmini at the upcoming OSCAR workshop at PLDI.
[10/22] Shadaj's new tool Katara has been released! We use verified lifting to synthesize new CRDTs for cloud applications.
[6/22] Congrats to Xiangyu, Chenglong, and Ras for winning the
distinguished paper award at PLDI 22 for Sickle!
[1/22] Thank you Army Research Office for supporting our work on
synthesizing data visualization programs from multi-modal inputs through
an Early Career Program grant!
[10/21] I recently spoke with Justin Gottschlich from Intel Labs on
the exciting future of machine programming.
[7/21] Two upcoming invited talks on using synthesis to build compilers and
new directions in cloud programming!
[6/21] Looking forward to giving one of the keynotes at the BiDEDE workshop at SIGMOD this year!
[5/21] Thank you ONR for the Young Investigator Award!
[3/21] Congrats to Chenglong for winning a CHI best paper award for Falx.
[2/21] Thank you Intel for the Outstanding Researcher Award!
[1/21] Looking forward to presenting a new programming paradigm for cloud computing at CIDR this year.
[8/20] I recently spoke with the BBC about the ongoing revolution in
no-code technologies.
[6/20] Multiple papers to be presented this and next month! Check out Guna's SIGMOD presentation on Strife, and Junwen's ICSE presentation on managing data constraints in database applications.
[4/20] Thrilled to be the recipient of the IEEE TCDE Rising Star Award this year!
[2/20] Thank you VMware for the early career award!
[1/20] Wouldn't it be great to create viz easily from examples? Check out Chenglong's talk at POPL this year!
[10/19] Both papers to CIDR 20 got accepted. Congrats to Brandon, Cong, and co-authors!
[8/19] Two paper acceptances: generating data layouts for database applications (VLDB),
and using program idioms to generate code from natural language (EMNLP). Congrats Cong and Srini!
[7/19] Incredibly thrilled to be one of the PECASE awardees this year!
Extremely grateful to all the awesome students and colleagues I have gotten to learn from over the past few years.
[7/19] Thank you Adrian for a very nice summary of
Panorama on the morning paper.
[6/19] Check out Visual Road, our new benchmark for video data management systems!
[4/19] Our paper on a new IDE for optimizing database-backed web apps has
won a distinguished paper award at ICSE 19!
This will be the third paper we have published on our Hyperloop project.
[2/19] Extremely honored to be selected as one of the Sloan fellows this year!
[7/18] We have released PowerStation, a
tool for fixing performance bugs in Rails applications. Our project has been featured on
Hacker News!
[6/18] We have implemented a new theory for reasoning about relational query languages
using Cosette, and will be presenting the results at
VLDB this year.
[5/18] LightDB will be presented at VLDB this year!
[4/18] Thank you Intel Software and Services Group for their generous support of our MetaLift project!
[3/18] Casper has been accepted by SIGMOD18.
[3/18] Received a Google research award for LightDB. Thank you Google!
[1/18] The first Pacific Northwest Database Meeting had over 100 participants and was a great gathering of academic and industrial research groups in the region!
[12/17] Our paper on studying real-world Rails applications has been accepted by ICSE18!
[10/17] Presented MetaLift at the
StrangeLoop conference.
Check out our work if you build compilers for DSLs!
[8/17] Our work on understanding performance inefficiencies in
real-world database applications has been accepted to the full paper track at CIKM.
[8/17] Presented our work on using
machine learning to generate SQL at
ACL.
[7/17] We have released Cosette, and it was featured
on the front pages of reddit and hackernews!
[6/17] We hosted the first UWDB affiliates workshop on database usability.
Thank you to all the speakers for their excellent talks.
[6/17] Two papers presented at PLDI:
Cosette, our solver for SQL,
and Scythe, a tool for synthesizing SQL queries from input-output examples.
[5/17] Five demos presented at SIGMOD.
Thank you to the demo committee for giving us the best demo and two honorable mentions for our work!
[3/17] We have written up our work on
proving SQL query equivalences on the
Homotopy Type Theory Blog.
[9/15] Thank you DoE for
covering our work
on verified lifting!
[9/1] Casper has been released. If
you want to automatically parallelize your sequential Java code into Spark, try it out!
[8/15] Our survey / tutorial on computer-aided authoring of database queries has been
published in Foundations and Trends in Programming Languages.
Teaching
Students who I humbly learn from
Graduated students
Papers
Flo: a Semantic Foundation for Progressive Stream Processing
Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein, Mae Milano
POPL 2025
Verified Code Transpilation with LLMs
Sahil Bhatia, Jie Qiu, Niranjan Hasabnis, Sanjit A. Seshia, Alvin Cheung
NeurIPS 2024
Inferring Visualization Intent from Conversation
Haotian Li, Nithin Chalapathi, Huamin Qu, Alvin Cheung and Aditya G. Parameswaran
CIKM 2024
Tenspiler: A Verified Lifting-Based Compiler for Tensor Operations
Jie Qiu, Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit Seshia, Alvin Cheung
ECOOP 2024
QED: A Powerful Query Equivalence Decider for SQL
Shuxian Wang, Sicheng Pan, Alvin Cheung
VLDB 2024
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung
ICML 2024
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
Linyuan Gong, Mostafa Elhoushi, Alvin Cheung
ICML 2024
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
ICML 2024
ADELT: Transpilation Between Deep Learning Frameworks
Linyuan Gong, Jiayi Wang, Alvin Cheung
IJCAI 2024
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics
Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen
NAACL 2024
Syntactic Code Search with Sequence-to-Tree Matching
Gabriel Matute, Wode Ni, Titus Barik, Alvin Cheung, Sarah E. Chasins
PLDI 2024
Spatialyze: A Geospatial Video Analytics System with Spatial-Aware Optimizations
Chanwut Kittivorawong, Yongming Ge, Yousef Helal, Alvin Cheung
VLDB 2024
Towards Auto-Generated Data Systems
Alvin Cheung, Maaz Bin Safeer Ahmad, Brandon Haynes, Chanwut Kittivorawong, Shadaj Laddad, Xiaoxuan Liu, Chenglong Wang, Cong Yan
VLDB 2023
Building Code Transpilers for Domain-Specific Languages Using Program Synthesis
Sahil Bhatia, Sumer Kohli, Sanjit Seshia, Alvin Cheung
ECOOP 2023
Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers
Linyuan Gong, Chenyan Xiong, Xiaodong Liu, Payal Bajaj, Yiqing Xie, Alvin Cheung, Jianfeng Gao, Xia Song
ACL 2023
Optimizing Stateful Dataflow with Local Rewrites
Shadaj Laddad, Conor Power, Tyler Hou, Alvin Cheung, Joseph M. Hellerstein
EGRAPHS 2023
Leveraging Application Data Constraints to Optimize Database-Backed Web Applications
Lily Liu, Shuxian Wang, Mengzhu Sun, Sicheng Pan, Ge Li, Siddharth Jha, Cong Yan, Junwen Yang, Shan Lu, Alvin Cheung
VLDB 2023
Keep CALM and CRDT On
Conor Power, Mae Milano, Shadaj Laddad, Alvin Cheung, Joseph Hellerstein, Natacha Crooks
VLDB 2023
Synthesizing CRDTs from Sequential Data Types with Verified Lifting
Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein
OOPSLA 2022
GACT: Activation Compressed Training for Generic Network Architectures
Lily Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung
ICML 2022
Synthesizing Analytical SQL Queries from Computation Demonstration
Xiangyu Zhou, Ras Bodik, Alvin Cheung, Chenglong Wang
PLDI 2022, distinguished paper award
Vector Instruction Selection for Digital Signal Processors Using Program Synthesis
Maaz Bin Safeer Ahmad, Andrew Adams, Shoaib Kamil, Alvin Cheung, Alexander J. Root
ASPLOS 2022
Change in Software Ecosystems: Social Challenges of Automating Upgrades
Gabriel Matute, Alvin Cheung, Sarah E. Chasins
PLATEAU 2021
Demonstration of Apperception: A Database Management System for Geospatial Video Data
Vanessa Lin, Frank Ge, Brandon Haynes, Maureen Daum, Alvin Cheung, Magdalena Balazinska
VLDB 2021
PlotCoder: Hierarchical Decoding for Synthesizing Visualization Code in Programmatic Context
Xinyun Chen, Linyuan Gong, Alvin Cheung, Dawn Song
ACL 2021
VSS: A Storage System for Video Analytics
Brandon Haynes, Maureen Daum, Dong He, Amrita Mazumdar, Magdalena Balazinska, Alvin Cheung, Luis Ceze
SIGMOD 2021
Falx: Synthesis-powered Visualization Authoring
Chenglong Wang, Yu Feng, Rastislav Bodik, Isil Dillig, Alvin Cheung, Amy J. Ko
CHI 2021, best paper award
New Directions in Cloud Programming
Alvin Cheung, Natacha Crooks, Joseph Hellerstein, Mae Milano
CIDR 2021
Handling Highly Contended OLTP Workloads using Fast Dynamic Partitioning
Guna Prasaad, Alvin Cheung, Dan Suciu
SIGMOD 2020
Demonstration of Chestnut: An In-memory Data Layout Designer for Database Applications
Mingwei Samuel, Cong Yan, Alvin Cheung
SIGMOD 2020
Testing Query Execution Engines with Mutations
Xinyue Chen, Chenglong Wang, Alvin Cheung
DBTest 2020
Managing Data Constraints in Database-Backed Web Applications
Junwen Yang, Utsav Sethi, Cong Yan, Shan Lu, Alvin Cheung
ICSE 2020
Visualization by Example
Chenglong Wang, Yu Feng, Rastislav Bodik, Alvin Cheung, Isil Dillig
POPL 2020
LightDB++: A DBMS for the Visual World
Brandon Haynes, Maureen Daum, Amrita Mazumdar, Magdalena Balazinska, Alvin Cheung, Luis Ceze
CIDR 2020
View-Driven Optimization of Database-Backed Web Applications
Cong Yan, Alvin Cheung, Junwen Yang, Shan Lu
CIDR 2020
Automatically Translating Image Processing Libraries to Halide
Maaz Bin Safeer Ahmad, Jonathan Ragan-Kelley, Alvin Cheung, Shoaib Kamil
SIGGRAPH ASIA 2019
Perceptual Compression for Video Storage and Processing Systems
Amrita Mazumdar, Brandon Haynes, Magda Balazinska, Luis Ceze, Alvin Cheung, Mark Oskin
SoCC 2019, best poster award
Learning Programmatic Idioms for Scalable Semantic Parsing
Srini Iyer, Alvin Cheung, Luke Zettlemoyer
EMNLP 2019
Generating Application-Specific Data Layouts for In-Memory Databases
Cong Yan, Alvin Cheung
VLDB 2019
Visual Road: A Video Data Management Benchmark
Brandon Haynes, Amrita Mazumdar, Magdalena Balazinska, Luis Ceze, Alvin Cheung
SIGMOD 2019
View-Centric Performance Optimization for Database-Backed Web Applications
Junwen Yang, Cong Yan, Chengcheng Wan, Shan Lu, Alvin Cheung
ICSE 2019, distinguished paper award
2018
Iterative Search for Reconfigurable Accelerator Blocks with a Compiler in the Loop
Max Willsey, Vincent Lee, Alvin Cheung, Ras Bodik, Luis Ceze
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018.
Speeding up Symbolic Reasoning for Relational Queries
Chenglong Wang, Alvin Cheung, Ras Bodik
SPLASH 2018
Mapping Language to Code in Programmatic Context
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer
EMNLP 2018
PowerStation: Automatically detecting and fixing inefficiencies of database-backed web applications in IDE
Junwen Yang, Cong Yan, Pranav Subramaniam, Shan Lu, Alvin Cheung
FSE 2018 (demo)
Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries
Shumo Chu, Brendan Murphy, Jared Roesch, Alvin Cheung, Dan Suciu
VLDB 2018
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
Maaz Bin Safeer Ahmad, Alvin Cheung
SIGMOD 2018
LightDB: A DBMS for Virtual Reality Video
Brandon Haynes, Amrita Mazumdar, Armin Alaghi, Magdalena Balazinska, Luis Ceze, Alvin Cheung
VLDB 2018
How Not to Structure Your Database-backed Web Applications: a Study of Performance Bugs in the Wild
Junwen Yang, Cong Yan, Pranav Subramaniam, Shan Lu, Alvin Cheung
ICSE 2018
2017
Understanding Performance Inefficiencies in Real-world Database-backed Applications
Cong Yan, Junwen Yang, Alvin Cheung, Shan Lu
CIKM 2017
Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads
Parmita Mehta, Sven Dorkenwald, Dongfang Zhao, Tomer Kaftan, Alvin Cheung, Magdalena Balazinska, Ariel Rokem, Andrew Connolly, Jacob Vanderplas, Yusra AlSayyad
VLDB 2017
Learning a Neural Semantic Parser from User Feedback
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy and Luke Zettlemoyer
ACL 2017
Demonstration of the Cosette Automated SQL Prover
Shumo Chu, Daniel Li, Chenglong Wang, Alvin Cheung, Dan Suciu
SIGMOD 2017 (demo), best demo award
VisualCloud Demonstration: A DBMS for Virtual Reality
Brandon Haynes, Artem Minyalov, Magdalena Balazinska, Luis Ceze, Alvin Cheung
SIGMOD 2017 (demo), selected as one of the "best of" demos,
and honorable mention for the best demo award
Interactive Query Synthesis from Input-Output Examples
Chenglong Wang, Alvin Cheung, Ras Bodik
SIGMOD 2017 (demo), selected as one of the "best of" demos
Optimizing Java Applications Using Apache Spark
Maaz Bin Safeer Ahmad, Alvin Cheung
SIGMOD 2017 (demo), selected as one of the "best of" demos,
and honorable mention for the best demo award
HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics
Shumo Chu, Konstantin Weitz, Alvin Cheung, Dan Suciu
PLDI 2017
Synthesizing Highly Expressive SQL Queries from Input-Output Examples
Chenglong Wang, Alvin Cheung, Ras Bodik
PLDI 2017
Cosette: An Automated SQL Solver
Shumo Chu, Chenglong Wang, Konstantin Weitz, Alvin Cheung
CIDR 2017
2016
PipeGen: Data Pipe Generator for Hybrid Analytics
Brandon Haynes, Alvin Cheung, Magdalena Balazinska
SoCC 2016
Leveraging Parallel Data Processing Frameworks with Verified Lifting
Maaz Bin Safeer Ahmad, Alvin Cheung
SYNT 2016, best student paper award
Summarizing Source Code using a Neural Attention Model
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung and Luke Zettlemoyer
ACL 2016
Packet Transactions: High-level Programming for Line-Rate Switches
Anirudh Sivaraman, Mihai Budiu, Alvin Cheung, Changhoon Kim, Steve Licking, George Varghese, Hari Balakrishnan, Mohammad Alizadeh, Nick McKeown
SIGCOMM 2016
Computer-Assisted Query Formulation
Alvin Cheung, Armando Solar-Lezama
Foundations and Trends in Programming Languages, volume 3 issue 1
Verified Lifting of Stencil Computations
Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, Armando Solar-Lezama
PLDI 2016
Leveraging Lock Contention to Improve OLTP Application Performance
Cong Yan, Alvin Cheung
VLDB 2016
2015
2014 and earlier
Sloth: Being Lazy is a Virtue (When Issuing Database Queries)
Alvin Cheung, Samuel Madden, Armando Solar-Lezama
SIGMOD 2014, selected as one of the best of SIGMOD 2014
Using Program Analysis to Optimize Database Applications
Alvin Cheung, Owen Arden, Samuel Madden, Armando Solar-Lezama, Andrew C. Myers
IEEE Computer Society Data Engineering Bulletin, March 2014
Mobile Applications Need Targeted Micro-Updates
Alvin Cheung, Lenin Ravindranath, Eugene Wu, Samuel Madden, Hari Balakrishnan
APSys 2013
Speeding up Database Applications with Pyxis
(demo)
Alvin Cheung, Owen Arden, Samuel Madden, Andrew C. Myers
SIGMOD 2013
Optimizing Database-Backed Applications with Query Synthesis
Alvin Cheung, Samuel Madden, Armando Solar-Lezama
PLDI 2013
StatusQuo: Making Familiar Abstractions Perform Using Program Analysis
Alvin Cheung, Owen Arden, Samuel Madden, Armando Solar-Lezama, Andrew C. Myers
CIDR 2013, best paper award
Using Program Synthesis for Social Recommendations
Alvin Cheung, Armando Solar-Lezama, Samuel Madden
CIKM 2012
Undefined Behavior: What Happened to My Code?
Xi Wang, Haogang Chen, Alvin Cheung, Zhihao Jia, Nickolai Zeldovich, M. Frans Kaashoek
APSys 2012
Automatic Partitioning of Database Applications
Alvin Cheung, Owen Arden, Samuel Madden, Andrew C. Myers
VLDB 2012
Automatically Generating Interesting Events with LifeJoin
(demo)
Alvin Cheung, Arvind Thiagarajan, Samuel Madden
SenSys 2011
Partial Replay of Long-Running Applications
Alvin Cheung, Armando Solar-Lezama, Samuel Madden
FSE 2011
Performance Profiling with EndoScope, an Acquisitional Software Monitoring Framework
Alvin Cheung, Samuel Madden
VLDB 2008
Theseos: A Query Engine for Traceability across Sovereign, Distributed RFID Databases
(demo)
Alvin Cheung, Karin Kailing, Stefan Schönauer
ICDE 2007
Ïnfïnïty: a generic platform for application development and information sharing on mobile devices
Alvin Cheung, Tyrone Grandison, Christopher M. Johnson, Stefan Schönauer
MobiDE 2007
Towards Traceability across Sovereign, Distributed RFID Databases
Rakesh Agrawal, Alvin Cheung, Karin Kailing, Stefan Schönauer
IDEAS 2006
A New Method for Design of Robust Digital Circuits
Dinesh Patil, Sunghee Yun, Seung-Jean Kim, Alvin Cheung, Stephen Boyd, Mark Horowitz
ISQED 2005, best paper award
Dissertation
Recent Awards
Service
Earlier service
2020:
PLDI
Program Committee
SIGMOD
Program Committee
VLDB
Demo Program Committee
2019:
DBPL
Program Committee Co-Chair
SoCC
Program Committee
VLDB
Demo Program Committee
2018:
MAPL
Program Committee Chair
SIGMOD
Program Committee, Student Research Competition co-chair
ICDE
Program Committee (poster)
MMVE
Program Committee
2017:
SIGMOD
Program Committee, Student Research Competition co-chair
VLDB
Proceedings co-chair
PLDI
Program Committee
CIKM
Senior Program Committee
DBPL
Program Committee
UIST
External Reviewer