I am currently a research scientist in the Database Systems Group at IBM Research Almaden (manager: Berthold Reinwald). My current research focuses on two challenging problems: (1) designing and implementing distributed database systems for fast transaction processing and real-time analytics in the cloud; and (2) exploring new mechanisms to optimize training and inference performance for modern machine learning systems.
I received my Ph.D. degree in 2017 from the National University of Singapore, where I was affiliated with the Database Group (advisor: Kian-Lee Tan). Previously, I was a visiting Ph.D. student at the Database Group, Carnegie Mellon University (host advisor: Andrew Pavlo), a research intern at the System Group, Microsoft Research Asia, and a research intern at the Cloud Infrastructure Group, EMC Labs China. I earned my bachelor's degree from South China University of Technology in 2012.
I am particularly interested in exploiting modern hardware (e.g., highly parallel CPUs/GPUs, hierarchical storage) and low-level optimizations (e.g., code generation) to build ridiculously fast database and machine learning systems. I am enthusiastic about integrating research into real-world systems. I am a developer of a commercial DBMS product, IBM Db2 Event Store, and a key developer of two DBMS prototypes, namely Peloton and Cavalia. I was also a contributor to Apache Flink.
I am glad to review machine learning/database/systems-related research papers!
How Good Are Log-Structured Merge Trees, Really? A Study from the RDBMS Perspective.
Yingjun Wu, Ronald Barber, Vijayshankar Raman, Richard Sidle, Yuanyuan Tian.
Preprint 2018 (request for copy).
Fast Failure Recovery for Main-Memory DBMSs on Multicores. [paper]
Yingjun Wu, Wentian Guo, Chee-Yong Chan, and Kian-Lee Tan.
An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. [paper]
Yingjun Wu, Joy Arulraj, Jiexi Lin, Ran Xian, and Andrew Pavlo.
Self-Driving Database Management Systems. [paper]
Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang.
Transaction Healing: Scaling Optimistic Concurrency Control on Multicores. [paper]
Yingjun Wu, Chee-Yong Chan, and Kian-Lee Tan.
ChronoStream: Elastic Stateful Stream Computation in the Cloud. [paper]
Yingjun Wu and Kian-Lee Tan.
SocialTransfer: Transferring Social Knowledge for Cold-Start Crowdsourcing. [paper]
Zhou Zhao, James Cheng, Furu Wei, Ming Zhou, Wilfred Ng, and Yingjun Wu.
Grand Challenge: SPRINT Stream Processing Engine as a Solution. [paper]
Yingjun Wu, David Maier, and Kian-Lee Tan.
DEBS 2013. (Best Paper Award)
Understanding the Effects of Hypervisor I/O Scheduling for Virtual Machine Performance Interference. [paper]
Ziye Yang, Haifeng Fang, Yingjun Wu, Chunqi Li, Bin Zhao, and H. Howie Huang.
Optimization of OLTP Database Systems Through Program Analysis.
Carnegie Mellon University, Pittsburgh, PA, USA, May 2017. [link]
Brown University, Providence, RI, USA, May 2017.
Building Faster Main-Memory Database Management Systems on Multicores.
National University of Singapore, Singapore, October 2016. [link]
This is the Best Paper Ever on In-Memory Multi-Version Concurrency Control.
Carnegie Mellon University, Pittsburgh, PA, USA, September 2016. [link]
Scalable In-Memory Transaction Processing with HTM.
Carnegie Mellon University, Pittsburgh, PA, USA, June 2016.
Transaction Healing: Scaling Optimistic Concurrency Control on Multicores.
Carnegie Mellon University, Pittsburgh, PA, USA, March 2016. [link]
ChronoStream: Elastic Stateful Stream Computation in the Cloud.
National University of Singapore, Singapore, May 2015.
Program Committee: SIGMOD 2018 (Demo), ICDE 2018 (Demo), EuroSys 2018 (Shadow), VLDB 2019.
Journal Reviewer: JCST, TKDE, TPDS, VLDBJ, TODS, KAIS.
CS3103: Computer Networks and Protocols.
National University of Singapore, 2014-2015.
Last update: August 2018