• E-mail : [email]
• Phone : 05 57 57 13 93
• Location : Bordeaux, France
Last update 2021-09-05 23:21:43.465

Boris Hejblum
Research Faculty in Biostatistics

Background and current position

I am currently a Research Faculty member (Chargé de Recherche) at the Inserm Bordeaux Population Health Research Center (U1219), and my research activities are hosted by the SISTM team, a joint team of Inria and Inserm.

Broadly speaking, my main research focus is the longitudinal analysis of high-dimensional biomedical data. My latest projects include gene set analysis of longitudinal gene expression data, statistical processing of (longitudinal) flow cytometry data through Bayesian nonparametric modeling and Optimal Transport, probabilistic analysis of high-dimensional data from Electronic Health Records, and the development of methods for integrative multi-omics analysis.

Until 2021, I was a tenured associate professor (Maître de Conférences) at the Bordeaux School of Public Health of the University of Bordeaux. Before that, I was a postdoctoral fellow with Tianxi Cai at the Harvard T.H. Chan School of Public Health in Boston. I earned my Ph.D. in Public Health (Biostatistics) at ISPED in Bordeaux, France, under the supervision of François Caron and Rodolphe Thiébaut. I graduated from ENSAI (the National School for Statistics and Information Analysis, part of the very French Grandes Écoles system) in 2011, where I majored in Biostatistics. To learn more about my background, please see my CV or my LinkedIn profile.

Scientific summary

Artificial Intelligence for health: I have developed various artificial intelligence approaches to address bottlenecks in biomedical data analysis. In particular, I work on machine learning approaches to automate the processing of flow and mass cytometry measurements, and on automated medical diagnosis from both structured data and free-text clinical notes in English, French and Chinese, using language-agnostic algorithms.
Statistical genomics & high-dimensional data: I have a strong interest in models for high-dimensional data. I am familiar with the multiple testing problem and with strategies to address it. I have worked on sparse Partial Least Squares methods, as well as with other dimension reduction and variable selection approaches such as random forests and the LASSO. I have analyzed gene expression data in a clinical trial context and am familiar with the specificities of this kind of data, such as preprocessing.
Electronic Health Records: I am currently developing models to perform probabilistic record linkage, matching electronic health records without using identifier variables, and to predict disease phenotypes from electronic health record data, with applications to infection and rheumatoid arthritis.
Bayesian nonparametric models: I am interested in statistical learning methods such as nonparametric Bayesian mixtures of skew distributions for clustering large cell populations.
Evidence synthesis and causal analysis: I have studied the stochastic modeling of life-course health data. The approach explored potential causal factors of myocardial infarction by relating the drift of a degradation process to metadata from the literature.
