Welcome to my personal page!
I am an Instructor in the Department of Biomedical Informatics, Harvard University. Before that, I was a Research Associate and a postdoctoral fellow in the Department of Biostatistics, Harvard University, and earlier a postdoctoral fellow in the Department of Biostatistics, University of Washington. I received my PhD in Probability and Mathematical Statistics from the Chinese Academy of Sciences.
My research interests include statistical methods for surrogate validation, causal inference and missing data analysis, complex survival data analysis, supervised learning, semi-supervised learning, and federated transfer learning. I also devote considerable effort to applying these novel statistical methods to real-world data, especially electronic health records (EHR) data.
SOFTWARE
OptimalSurrogate
This package provides functions to identify an optimal transformation of
a potential surrogate marker such that the proportion of the treatment
effect on a primary outcome can be inferred based on this identified
optimal transformation. The potential surrogate may be continuous or
discrete. These estimates are based on model-free definitions of the
proportion of treatment effect explained and thus do not require
correct model specification.
OSsurvival
The goal of Optimal Surrogate Survival (OSsurvival) is to
nonparametrically estimate the proportion of the treatment effect on the
primary outcome explained (PTE) by an optimal transformation of a
surrogate marker measured at an earlier time. The primary outcome,
measured at a later time, may be subject to censoring.
CMFsurrogate
The goal of the Calibrated Model Fusion (CMFsurrogate) approach is to
estimate the proportion of the treatment effect on the primary outcome
explained (PTE) by optimally combining multiple markers. This approach is
unique in that it identifies an optimal combination of the multiple
surrogates without strictly relying on parametric assumptions, while
borrowing modeling strategies to avoid fully nonparametric estimation,
which is subject to the curse of dimensionality.
PanelCurrentStatus
This package contains R functions to compute the conditional censoring
logistic (CCL) estimator and model metrics to evaluate risk predictions
using panel current status data. The CCL estimator takes advantage of
the ability to transform panel current status data into a binary outcome
analysis, building on existing logistic regression estimators by
incorporating monitoring time information into the working model.
SurvMaximin
For multi-center heterogeneous Real-World Data (RWD) with time-to-event
outcomes and high-dimensional features, we propose the SurvMaximin
algorithm to estimate Cox model feature coefficients for a target
population by borrowing summary information from a set of health care
centers without sharing patient-level information. An interactive
online Shiny app has been implemented to perform the proposed algorithm
with user input data.
TEACHING
- TA
- 2021, BST 244: Analysis of Failure Time Data
Department of Biostatistics, Harvard University
Certificate of Distinction in Teaching
- Teacher
- 2018-2019, Probability and Statistics
- 2018, Sampling Survey
- 2018, Probability and Statistics
- 2017-2018, Probability
Undergraduate course, Zhejiang University