Lanjing Zhang, MD
Department of Chemical Biology
Ernest Mario School of Pharmacy
Rutgers University
Office Room #: 107, 164 Frelinghuysen Rd.
Piscataway, NJ 08854
E-mail: Lanjing.Zhang #at# rutgers.edu, URL: https://thezhanglab.github.io/
List of Contributors
§ Wei Vivian Li, PhD, co-Principal Investigator, School of Public Health, Rutgers University (now at UC Riverside)
§ Jinchuan Xing, PhD, co-Principal Investigator, School of Arts and Sciences, Rutgers University
§ Nan Gao, PhD, co-Principal Investigator, School of Arts and Sciences, Rutgers University Newark
§ Fei Deng, Postdoctoral Research Associate, School of Pharmacy, Rutgers University
§ Catherine Feng, summer undergraduate student (REU), Harvard College, Cambridge, MA; previously a high school student at Montgomery High School, NJ
Project Summary
In recent years, massive and complex datasets, such as data from facial
recognition systems, autonomous cars, medical imaging, and single-cell biology,
have been growing dramatically. Machine learning, a branch of artificial
intelligence, has been used to combine and understand these datasets. Although
current mainstream machine learning algorithms perform well, they are primarily
mathematics-based and abstracted from the sources of the data. As a result,
these algorithms neither consider nor incorporate the rich domain knowledge
behind how the datasets were produced. This project therefore aims to examine
whether and how domain knowledge influences the outcomes of machine learning
algorithms that combine and analyze massive and complex datasets. If
successful, the project will develop and substantially validate a
domain-knowledge-driven computing framework that enables scientists and
engineers in various fields to apply their domain knowledge to better combine
and analyze such datasets. The project will also generate insights for
understanding and improving the machine learning algorithms themselves. Its
findings are therefore expected to promote the progress of science and to
directly advance biomedical fields and human health.
Technically, this project aims to address a knowledge gap in the
mathematics-driven integration and analysis of high-dimensional datasets. This
gap has limited the full and robust integration of large, high-dimensional
datasets. Moreover, rigorous examination of tuned machine learning algorithms
requires external validation, yet most studies of high-dimensional biomedical
datasets have not used external validation, largely because of missing data.
This project will therefore improve the integration and analysis of
high-dimensional datasets using domain-knowledge-based data normalization,
missing data imputation, and dimensionality reduction. As a proof of principle,
the project also aims to develop and validate an adaptive multimetric pipeline
that integrates various types of multiomic data using novel feature-selection
and dimensionality reduction algorithms. The resulting pipeline and package
will enable researchers to better understand and classify high-dimensional
datasets in biomedical and other fields. The project is expected to result in a
paradigm shift, because domain-knowledge-driven data normalization, data
imputation, and dimensionality reduction are radically different from the
mainstream mathematics-driven approaches. Finally, the project aims to give
undergraduate and high school students interested in computer science
experience in machine learning and data science.
Publications and Products
Note: All full-text papers can be searched and
downloaded in PDF, if legally available, at the PI's ResearchGate page.
Journal articles
· Ryu E, Xia HH, Guo GL, Zhang L. "Multivariable-adjusted trends in mortality due to alcoholic liver disease among adults in the United States, from 1999-2017." American Journal of Translational Research, 2022, 14(2): 1092-1099. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8902556/
· Cui M, Cheng C, Zhang L. "High-throughput proteomics: a methodological mini-review." Laboratory Investigation, 2022, 102(11): 1170-1181. https://doi.org/10.1038/s41374-022-00830-7
· Zhang L. "The Challenges and Opportunities of Translational Pathology." Journal of Clinical and Translational Pathology, 2022, 2(2): 63-66. https://doi.org/10.14218%2Fjctp.2022.00001
· Shrestha D, Bag A, Wu R, Zhang Y, Tang X, Qi Q, Xing J, Cheng Y. Genomics and epigenetics guided identification of tissue-specific genomic safe harbors. Genome Biology 2022, 23: 199. https://doi.org/10.1186/s13059-022-02770-3
· Cheng A, Hu G, Li WV. Benchmarking cell-type clustering methods for spatially resolved transcriptomics data. Brief Bioinform 2023, 24(1): bbac475. https://doi.org/10.1093/bib/bbac475
· Balasubramanian I, Bandyopadhyay S, Flores J, Smak JB, Lin X, Liu H, Sun S, Golovchenko NB, Liu Y, Wang D, Patel R, Joseph II, Suntornsaratoon P, Vargas J, Green PHR, Bhagat G, Lagana SM, Ying W, Zhang Y, Wang Z, Li WV, Singh S, Zhou Z, Kollias G, Farr LA, Moonah SN, Yu S, Wei Z, Ferraris R, Bonder EM, Zhang L, Kiela PR, Edelblum KL, Liu TL, Gao N. Infection and inflammation stimulate expansion of a CD74+ Paneth cell subset to regulate disease progression. EMBO J. 2023 Nov 2;42(21):e113975. doi: 10.15252/embj.2023113975. PMID: 37718683.
· Hu K, Zhang L. Challenges and Opportunities Associated with Lifting the Zero COVID-19 Policy in China. Explor Res Hypothesis Med. 2024 Jan-Mar;9(1):71-75. doi: 10.14218/erhm.2023.00002. Epub 2023 Mar 8. PMID: 38572142; PMCID: PMC10989839.
· Deng F, Zhao L, Yu N, Lin Y, Zhang L. Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer. Lab Invest. 2024 Mar;104(3):100320. doi: 10.1016/j.labinv.2023.100320. Epub 2023 Dec 28. PMID: 38158124.
· Liang Y, Guo GL, Zhang L. Current and Emerging Molecular Markers of Liver Diseases: A Pathogenic Perspective. Gene Expression 2022; 21(1): 9-19. doi: 10.14218/GEJLR.2022.00010. PMCID: PMC11192043.
· Suntornsaratoon P, Antonio JM, Flores J, Upadhyay R, Veltri J, Bandyopadhyay S, Dadala R, Kim M, Liu Y, Balasubramanian I, Turner JR, Su X, Li WV, Gao N, Ferraris RP. (2024) Lactobacillus rhamnosus GG Stimulates Dietary Tryptophan-Dependent Production of Barrier-Protecting Methylnicotinamide. Cell Mol Gastroenterol Hepatol. 18(2):101346. doi: 10.1016/j.jcmgh.2024.04.003. Online ahead of print. PMID: 38641207.
· Suntornsaratoon P, Ferraris RP, Ambat J, Antonio JM, Flores J, Jones A, Su X, Gao N, Li WV. (2024) Metabolomic and Transcriptomic Correlative Analyses in Germ-Free Mice Link Lacticaseibacillus rhamnosus GG-Associated Metabolites to Host Intestinal Fatty Acid Metabolism and β-Oxidation. Lab Invest. 104(4):100330. doi: 10.1016/j.labinv.2024.100330. Epub 2024 Jan 18. PMID: 38242234.
· Cui M, Deng F, Disis ML, Cheng C, Zhang L. Advances in the Clinical Application of High-throughput Proteomics. Explor Res Hypothesis Med (in press).
Project Impact
§ Education: Some of the project results have been used in data science
education for high school students. We hosted a series of lectures at
Montgomery High School, New Jersey, and Catherine Feng founded a data science
club at that high school (URL: https://montydsc.wordpress.com/ ). We also
involved several high school and undergraduate students in the project, and
they became very interested in machine learning and its applications. Most of
the software and code developed in this project have been made publicly
available (see below). All new progress will be added to the other research
collections upon completion.
§ Collaborations: For this project, we have established collaborations with
several schools of Rutgers University and with Montgomery High School, New
Jersey. Through these collaborations, we expect to explore many real-world
applications and produce broader research impact.
Current and Future Activities
The following are some of the highlights of our
ongoing work.
1. Develop highly sensitive and specific machine learning algorithms to
classify non-cancer causes of death in cancer patients.
2. Study effective and scalable methods for improving machine learning
fairness.
Potential Related Project(s)
Project Website URL: https://thezhanglab.github.io/EAGER.html
Online Software: The software can be downloaded at https://github.com/FeiDeng-RUTGERS/URFE.