Prof. Shen

joined North Carolina State University in August 2014 as a Chancellor’s Faculty Excellence Program cluster hire in Data-Driven Science. He is a recipient of the DOE Early Career Award, NSF CAREER Award, Google Faculty Research Award, and IBM CAS Faculty Fellow Award. He is an ACM Distinguished Member, an ACM Distinguished Speaker, and a senior member of IEEE. He was honored with the University Faculty Scholars Award "as an emerging academic leader who turns research into solutions to society’s most pressing issues".

His Research

lies in the broad fields of Programming Systems and Machine Learning, with an emphasis on enabling extreme-scale data-intensive computing and intelligent computing through innovations in compilers, runtime systems, and Machine Learning algorithms. His current research focuses on Heterogeneous Massively Parallel Computing, High Performance Machine Learning, and High-Level Large-Scale Program Optimizations. He leads the PICTure research group. He is part of the NCSU Systems Laboratory.

Featured Articles (Full Publication List)

HARP: Holistic Analysis for Refactoring Python-Based Analytics Programs (ICSE'20)

Modern machine learning programs are often written in Python, with the main computations specified through calls to highly optimized libraries (e.g., TensorFlow, PyTorch). Existing performance optimizers focus on the static computation graphs specified through library APIs but largely overlook the influence of the hosting Python code. This work proposes a new approach named HARP to address the problem. Through a set of novel techniques, HARP enables holistic analysis that spans computation graphs and their hosting Python code. Refactoring guided by HARP gives 1.3--3X speedups (2.07X on average) on a set of TensorFlow and PyTorch programs. Read more...
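To illustrate the class of refactoring HARP targets (a minimal sketch only, not HARP itself: it uses NumPy as a stand-in for the graph libraries, and both function names are made up), a host-side Python loop of tiny library calls can often be folded into a single whole-tensor call:

```python
import numpy as np

# Before: the hosting Python loop issues one small library call per row,
# so most of the work happens outside any optimized computation graph.
def row_by_row_sum(matrix):
    totals = []
    for row in matrix:               # host-side Python loop
        totals.append(np.sum(row))   # one tiny library call per iteration
    return np.array(totals)

# After: the same computation folded into one whole-tensor library call --
# the kind of opportunity a holistic, cross-boundary analysis can expose.
def vectorized_sum(matrix):
    return np.sum(matrix, axis=1)

m = np.arange(12.0).reshape(4, 3)
assert np.array_equal(row_by_row_sum(m), vectorized_sum(m))
```

The point is that neither the library's graph optimizer nor a plain Python analyzer alone sees this rewrite; it requires reasoning across both sides of the boundary.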

MERR: Improving Security of Persistent Memory Objects via Efficient Memory Exposure Reduction and Randomization (ASPLOS'20)

This work proposes a new defensive technique for memory, especially useful for long-living objects on Non-Volatile Memory (NVM), also called Persistent Memory Objects (PMOs). The method takes a distinctive perspective: it reduces memory exposure time by sharply cutting the overhead of attaching PMOs to, and detaching them from, the memory address space. It achieves this through a novel idea, embedding page table subtrees inside PMOs. The new technique reduces memory exposure time by 60% with a 5% time overhead (or 70% with a 10.9% overhead). It allows much more frequent address randomization (shortening the period from seconds to less than 41.4us), offering significant potential for enhancing memory security. See our ASPLOS'2020 paper for details.

Wootz: A Compiler-Based Framework for Fast CNN Pruning via Composability (PLDI'19)

Convolutional Neural Network (CNN) pruning is an important method for adapting a large CNN model trained on general datasets to a more specialized task or to a device with stricter space or power constraints. Finding the best pruned network, however, is time-consuming. This work tackles the problem with a compiler-based framework named Wootz, which for the first time enables composability-based CNN pruning. Wootz shortens the state-of-the-art pruning process by up to 117.9X while producing significantly better pruning results. See our PLDI'2019 paper for more...
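The reuse idea behind composability can be sketched as follows (an illustrative Python sketch, not Wootz itself: `prune_filters`, `pruned_block`, and the magnitude-based criterion are assumptions, and the real framework reuses the *training* of pruned layer blocks, which this toy example only mimics by caching the pruning computation):

```python
import numpy as np

def prune_filters(weights, keep_ratio):
    """Magnitude-based filter pruning: keep the top-k filters by L1 norm.
    `weights` has shape (filters, channels, height, width)."""
    norms = np.abs(weights).sum(axis=(1, 2, 3))
    k = max(1, int(len(norms) * keep_ratio))
    keep = np.sort(np.argsort(norms)[-k:])   # indices of filters to keep
    return weights[keep]

# Composability: each (layer, ratio) block is produced once and cached, so
# the many candidate configurations explored during the search share work
# instead of redoing it per configuration.
_block_cache = {}

def pruned_block(layer_id, weights, keep_ratio):
    key = (layer_id, keep_ratio)
    if key not in _block_cache:
        _block_cache[key] = prune_filters(weights, keep_ratio)
    return _block_cache[key]

w = np.random.rand(8, 3, 3, 3)               # a toy conv layer: 8 filters
a = pruned_block("conv1", w, 0.5)
b = pruned_block("conv1", w, 0.5)            # second config: cache hit
assert a.shape == (4, 3, 3, 3) and a is b
```

Because pruning searches evaluate many configurations that differ in only a few layers, sharing the common blocks is where the large end-to-end savings come from.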

Document Analytics directly on Compressed Data (ICDE'20, VLDB'18, ICS'18)

We propose the first known solution for direct document analytics on compressed data, which saves 90.8% of storage space and 77.5% of memory usage while halving the analytics time. It employs a hierarchical compression algorithm to convert analytics problems into graph traversal problems. The work also presents a set of guidelines and assistant software modules that help developers effectively apply compression-based direct processing. See our VLDB'18, ICS'18, and ICDE'20 papers for more.
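The analytics-as-graph-traversal idea can be illustrated with a toy hierarchical (Sequitur-style) grammar (a sketch only; the grammar and `count_word` are made up for illustration): the compressed rules form a DAG, and a word count becomes a memoized traversal of that DAG, with no decompression of the text:

```python
# A toy grammar: each rule maps to a sequence of terminals (strings) or
# rule ids (ints). The document is the expansion of rule 0, i.e.,
# "the cat sat dog the cat sat". Note rule 1 is stored once but used twice.
grammar = {
    0: [1, "dog", 1],
    1: ["the", "cat", "sat"],
}

def count_word(grammar, word, rule=0, memo=None):
    """Count occurrences of `word` by traversing the rule DAG once;
    per-rule counts are memoized, so shared rules are visited once."""
    if memo is None:
        memo = {}
    if rule not in memo:
        total = 0
        for sym in grammar[rule]:
            if isinstance(sym, int):                       # a sub-rule
                total += count_word(grammar, word, sym, memo)
            else:                                          # a terminal
                total += (sym == word)
        memo[rule] = total
    return memo[rule]

assert count_word(grammar, "cat") == 2   # counted without decompression
```

Because repeated content is represented by a single shared rule, work on that rule is done once no matter how often it recurs in the document, which is the source of the time savings.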

Inter-Disciplinary Research Challenges in Computer Systems for the 2020s

It is now an exciting time in computer systems research. New technologies such as machine learning and the Internet of Things (IoT) are rapidly enabling new capabilities that were only dreamed of a few years ago. At the same time, technology discontinuities such as the end of Moore’s Law and the inescapable Energy Wall combine with new challenges in security and privacy, and the rise of Artificial Intelligence (AI). Against this backdrop, an NSF-sponsored community visioning workshop convened about 150 researchers of multiple computer systems areas during ASPLOS'2018. The goal was to outline a few high-priority areas where inter-disciplinary research is likely to have a high payoff in the next 10 years. This report summarizes the workshop’s findings. (ACM DL link here.)

Current Research Areas

High-Performance Machine Learning & Data Analytics

Meets the relentless demands of Machine Learning and Data Analytics for efficiency, responsiveness, quality, and scalability in all kinds of settings through innovations in algorithms, infrastructures, and implementations. More ...

Heterogeneous Massively Parallel Computing

Bridges the gap between productivity needs of programmers and the extreme power and complexity of modern heterogeneous massively parallel computing devices (e.g., GPU) through innovations in programming systems. More ...

Foundations of Programming Systems & Languages

Tackles fundamental challenges that prevent modern software from tapping into the full potential of computing hardware by advancing compilers, runtime, and programming language implementations in general. More ...

Research Group: PICTure

Ph.D. graduates and their placements upon graduation

  • Yue Zhao (2018, Facebook)
  • Yufei Ding (2017, Assist. Prof @ UC Santa Barbara)
  • Guoyang Chen (2016, Qualcomm)
  • Zhijia Zhao (2015, Assist. Prof @ UC Riverside)
  • Mingzhou Zhou (2015, IBM)
  • Bo Wu (2014, Assist. Prof @ Colorado School of Mines)
  • Zheng (Eddy) Zhang (2012, Assist. Prof @ Rutgers University)
  • Kai (Kelvin) Tian (2012, Microsoft)
  • Yunlian Jiang (2011, Google)


Recent Professional Activities