Xuechen Li

PhD Candidate, Stanford Computer Science
Stanford Artificial Intelligence Laboratory (SAIL)
Stanford Machine Learning Group
Stanford NLP Group

lxuechen [at] cs [dot] stanford [dot] edu

CV, Google Scholar, GitHub, Twitter


I'm a PhD candidate at Stanford CS. I go by Chen, the second segment of my first name.

My research revolves around machine learning, deep learning, and NLP. My current focus is large language model training, data collection/curation, and human evaluation. Here are some specific topics of interest:

Data and Evaluation: Recent improvements in AI systems have been partially driven by training with high-quality human data. I am interested in (alternative) scalable data collection paradigms that leverage novel incentive structures. As models become more capable, their evaluation becomes increasingly challenging. I am also interested in developing new evaluations that measure not only capability but also usability.

Learning From Human Feedback: Human feedback has become a primary driver of recent successes in AI, such as ChatGPT. However, collecting and training on such data can be costly and cumbersome. Some questions I have recently been interested in: How can we efficiently elicit high-quality feedback? How can we augment feedback data when it is available only in limited quantities? How should we aggregate this feedback signal without marginalizing minority voices and views?

Red Teaming and Auditing: Despite the rapid progress in capability research, machine learning models still exhibit systematic flaws. I am interested in developing automated tools to assist humans in identifying and rectifying these flaws.

Memorization and Privacy: Large models can memorize training data. This not only poses privacy risks but also raises emergent sociotechnical questions (e.g., on copyright and intellectual property). I am interested in understanding this memorization phenomenon and in developing tools to mitigate its undesirable consequences. Here is an outdated statement I wrote on privacy and security in machine learning. Some of my research has seen growing adoption in industry.

Selected and Recent Research (for the full list, see Google Scholar)