Xuan Wang

xwang(@)hsph.harvard.edu

2022-09-10

Welcome to my personal page!

I am an Instructor in the Department of Biomedical Informatics, Harvard University. Before that, I was a Research Associate and a postdoctoral fellow in the Department of Biostatistics, Harvard University, and earlier a postdoctoral fellow in the Department of Biostatistics, University of Washington. I received my PhD in Probability and Mathematical Statistics from the Chinese Academy of Sciences.

My research interests include statistical methods for surrogate marker validation, causal inference and missing data analysis, complex survival data analysis, supervised and semi-supervised learning, and federated transfer learning. I also devote considerable effort to applying these novel statistical methods to real-world data, especially electronic health record (EHR) data.

SOFTWARE

OptimalSurrogate
This package provides functions to identify an optimal transformation of a potential surrogate marker, such that the proportion of the treatment effect on a primary outcome explained by the surrogate can be inferred from this transformation. The potential surrogate may be continuous or discrete. The estimates are based on model-free definitions of the proportion of treatment effect explained and thus do not require correct model specification.

OSsurvival
The goal of Optimal Surrogate Survival (OSsurvival) is to nonparametrically estimate the proportion of the treatment effect on the primary outcome explained (PTE) by an optimal transformation of a surrogate marker measured at an earlier time. The primary outcome, measured at a later time, may be subject to censoring.

CMFsurrogate
The goal of the Calibrated Model Fusion (CMFsurrogate) approach is to estimate the proportion of the treatment effect on the primary outcome explained (PTE) by an optimal combination of multiple markers. This approach is unique in that it identifies an optimal combination of multiple surrogates without strictly relying on parametric assumptions, while borrowing modeling strategies to avoid fully nonparametric estimation, which is subject to the curse of dimensionality.

PanelCurrentStatus
This package contains R functions to compute the conditional censoring logistic (CCL) estimator and model metrics for evaluating risk predictions using panel current status data. The CCL estimator exploits the fact that panel current status data can be transformed into a binary outcome analysis, and it builds on existing logistic regression estimators by incorporating monitoring time information into the working model.

SurvMaximin
For multi-center, heterogeneous real-world data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. An interactive online Shiny app has been implemented to run the proposed algorithm on user-supplied data.

TEACHING

  • TA
    • 2021, BST 244: Analysis of Failure Time Data
      Department of Biostatistics, Harvard University
      Certificate of Distinction in Teaching
  • Teacher
    • 2018-2019, Probability and Statistics
    • 2018, Sampling Survey
    • 2018, Probability and Statistics
    • 2017-2018, Probability
      Undergraduate course, Zhejiang University