I’m Xiangzhe. I’m a Ph.D. student at Purdue University advised by Prof. Xiangyu Zhang. My research interest lies at the intersection of program analysis and machine learning models for code. I explore quality assurance for code models using software engineering practices. For example, I measure and mitigate biases in a code language model with an approach inspired by metamorphic testing, and I develop a proactive code model security technique inspired by scenario-based testing. Besides model quality assurance, I believe formulating program semantics is key to better code models. My exploration ranges from lower-level semantics, such as formal semantics (CompCertELF) and probabilistic execution semantics (PEM), to higher-level semantics that reflect developers’ abstractions (GenNm, ProRec).
Email: xzx@purdue.edu
SecAlign: Fortifying Code LLMs with Proactive Security Alignment, Xiangzhe Xu, Zian Su, Jinyao Guo, Kaiyuan Zhang, Zhenting Wang, Xiangyu Zhang. PDF
Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis, Xiangzhe Xu, Shiwei Feng, Yapeng Ye, Guangyu Shen, Zian Su, Siyuan Cheng, Guanhong Tao, Qingkai Shi, Zhuo Zhang, and Xiangyu Zhang. The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’23). PDF
Symbol Preference Aware Generative Models for Recovering Variable Names from Stripped Binary, Xiangzhe Xu, Zhuo Zhang, Zian Su, Ziyang Huang, Shiwei Feng, Yapeng Ye, Nan Jiang, Danning Xie, Siyuan Cheng, Lin Tan, Xiangyu Zhang. PDF
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases, Zian Su, Xiangzhe Xu, Ziyang Huang, Kaiyuan Zhang, Xiangyu Zhang. NeurIPS’2024. PDF
CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking, Zian Su, Xiangzhe Xu, Ziyang Huang, Zhuo Zhang, Yapeng Ye, Jianjun Huang, Xiangyu Zhang. The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE’24). PDF
PEM: Representing Binary Program Semantics for Similarity Analysis via A Probabilistic Execution Model, Xiangzhe Xu*, Zhou Xuan*, Shiwei Feng, Siyuan Cheng, Yapeng Ye, Qingkai Shi, Guanhong Tao, Le Yu, Zhuo Zhang, Xiangyu Zhang. The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE’23). PDF, Full-length version
CompCertELF: Verified Separate Compilation of C Programs into ELF Object Files, Yuting Wang, Xiangzhe Xu, Pierre Wilke, Zhong Shao. The 2020 ACM International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’20). PDF
LLMDFA: Analyzing Dataflow in Code with Large Language Models, Chengpeng Wang, Wuqi Zhang, Zian Su, Xiangzhe Xu, Xiaoheng Xie, Xiangyu Zhang. NeurIPS’2024. PDF
Sanitizing Large Language Models in Bug Detection with Data-Flow, Chengpeng Wang, Wuqi Zhang, Zian Su, Xiangzhe Xu, Xiangyu Zhang. EMNLP’2024. PDF
ROCAS: Root Cause Analysis of Autonomous Driving Accidents via Cyber-Physical Co-mutation, Shiwei Feng, Yapeng Ye, Qingkai Shi, Zhiyuan Cheng, Xiangzhe Xu, Siyuan Cheng, Hongjun Choi, Xiangyu Zhang. IEEE/ACM International Conference on Automated Software Engineering (ASE 2024). 🎖 ACM SIGSOFT Distinguished Paper Award PDF
ParDiff: Practical Static Differential Analysis of Network Protocol Parsers, Mingwei Zheng, Qingkai Shi, Xuwei Liu, Xiangzhe Xu, Le Yu, Congyu Liu, Guannan Wei, Xiangyu Zhang. The ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA 2024). 🎖 ACM SIGPLAN Distinguished Paper Award PDF
Extracting Protocol Format as State Machine via Controlled Static Loop Analysis, Qingkai Shi, Xiangzhe Xu, Xiangyu Zhang. The USENIX Security Symposium (USENIX’23). PDF
Automatic Generation and Validation of Instruction Encoders and Decoders, Xiangzhe Xu, Jinhua Wu, Yuting Wang*, Zhenguo Yin and Pengfei Li. The 33rd International Conference on Computer-Aided Verification (CAV’21). PDF
CPC: Automatically Classifying and Propagating Natural Language Comments via Program Analysis, Juan Zhai, Xiangzhe Xu, Yu Shi, Guanhong Tao, Minxue Pan, Shiqing Ma, Lei Xu, Weifeng Zhang, Lin Tan, Xiangyu Zhang. Proceedings of the 42nd International Conference on Software Engineering (ICSE’20). PDF
Review: ACM Transactions on Software Engineering and Methodology (TOSEM)
Artifact Evaluation: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’24), IEEE/ACM International Symposium on Code Generation and Optimization (CGO’24-25), International Symposium on Software Testing and Analysis (ISSTA’24), ACM Conference on Computer and Communications Security (CCS’23)
Track Scheduling Co-Chair: The 42nd International Conference on Software Engineering (ICSE’20)