B. Malone, C. Tosch, B. Grellier, K. Onoue, T. Sztyler, K. Ritter, Y. Yamashita, E. Quemeneur, K. Bendjama: "Performance of neoantigen prediction for the design of TG4050, a patient specific neoantigen cancer vaccine", American Association for Cancer Research Annual Meeting AACR, April 2020
The development of therapeutic cancer vaccines to immunize against tumor antigens constitutes a promising modality. Mutation-associated antigens are considered major targets given their specificity to tumor cells. These mutations are specific to each patient and require a tailor-made vaccine targeting the mutations identified in each tumor. Many mutations are identified in the tumoral genome in most patients, but only a small fraction (around 1%) are suitable as vaccine targets. Herein, we report data documenting the prediction performance of the algorithm used for the design of TG4050, a clinical-stage, patient-specific, viral-based neoantigen vaccine.
We have trained a set of independent machine learning algorithms to score each candidate neoantigen for several steps of the MHC antigen presentation pathway, including MHC binding, intracellular processing, similarity to self, and likelihood to elicit a T-cell response in peptide-stimulated ELISPOT. Further, we have developed a novel graph neural network to combine all these scores to predict the likelihood that a neoantigen will elicit a T-cell response while also incorporating patient-specific factors, such as expression level and conservation of the mutation across different clones. To validate the system, we collected samples from 6 patients diagnosed with NSCLC, sequenced healthy and tumor tissue, identified mutations and ranked them using our algorithm; then, to evaluate immunogenicity, we focused our analysis on CD8+ T cells and measured the frequency of IFN-γ+ cells against predicted peptides in autologous PBMC. Immunogenicity of peptides was assayed in 5 pools, then deconvoluted against individual peptides.
Between 3339 and 4782 somatic variants were detected in tumor tissue samples. After applying technical filtering, removing synonymous mutations, and filtering on transcript expression, we detected a median of 281 (192-471) expressed tumor mutations, resulting in a median of 2767 candidate class I epitopes (1769-4573). The model achieved high accuracy, allowing us to identify peptides with pre-existing ex vivo immunogenic responses in 5 out of 6 patients. Immunogenicity of peptide pools was correlated with ranking by the algorithm. Among the 6 top-ranking individual epitopes in each patient, a median of 5 (2-6) peptides were immunogenic, resulting in a true positive rate (TP) of 77%. It should be noted that when no response was detected, it cannot be excluded that a response could be primed by a vaccine. In a similar setting, the netMHC 4.0 algorithm yielded a TP of 30% and identified only 39% of the positive calls of our algorithm.
We demonstrate that the prediction algorithm is accurate in identifying immunogenic cancer mutations even among a large set of candidates. Ongoing TG4050 clinical studies (NCT03839524 and NCT04183166) will allow further validation of the antitumor activity of the elicited immune response.
Paper available at: Cancer Res 2020;80(16 Suppl):Abstract nr 4566
Brandon Malone, Boris Simovski, Clément Moliné, Jun Cheng, Marius Gheorghe, Hugues Fontenelle, Ioannis Vardaxis, Simen Tennøe, Jenny-Ann Malmberg, Richard Stratford, Trevor Clancy: "Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2: toward universal blueprints for vaccine designs", Scientific Reports 2020
The global population is at present suffering from a pandemic of Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The goals of this study were to use artificial intelligence (AI) to predict blueprints for designing universal vaccines against SARS-CoV-2, that contain a sufficiently broad repertoire of T-cell epitopes capable of providing coverage and protection across the global population. To help achieve these aims, we profiled the entire SARS-CoV-2 proteome across the most frequent 100 HLA-A, HLA-B and HLA-DR alleles in the human population, using host-infected cell surface antigen presentation and immunogenicity predictors from the NEC Immune Profiler suite of tools, and generated comprehensive epitope maps. We then used these epitope maps as input for a Monte Carlo simulation designed to identify statistically significant “epitope hotspot” regions in the virus that are most likely to be immunogenic across a broad spectrum of HLA types. We then removed epitope hotspots that shared significant homology with proteins in the human proteome to reduce the chance of inducing off-target autoimmune responses. We also analyzed the antigen presentation and immunogenic landscape of all the nonsynonymous mutations across 3400 different sequences of the virus, to identify a trend whereby SARS-CoV-2 mutations are predicted to have reduced potential to be presented by host-infected cells, and consequently detected by the host immune system. A sequence conservation analysis then removed epitope hotspots that occurred in less-conserved regions of the viral proteome.
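As an illustration of how a Monte Carlo test for epitope hotspots could work, the sketch below scores sliding windows along a protein and compares each window against a permutation null. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the per-position scores, the window width, the shuffling null model, and the function names `window_sums` and `hotspot_p_values` are all hypothetical.

```python
import random


def window_sums(scores, width):
    """Return the sum of scores in every contiguous window of `width` positions."""
    total = sum(scores[:width])
    sums = [total]
    for i in range(width, len(scores)):
        # Slide the window one position: add the new score, drop the oldest.
        total += scores[i] - scores[i - width]
        sums.append(total)
    return sums


def hotspot_p_values(scores, width, n_permutations=1000, seed=0):
    """Empirical p-value per window under a position-shuffling null (assumed).

    Each permutation shuffles the per-position scores and records the largest
    window sum. A window's p-value is the share of permutations whose maximum
    reaches the observed sum, which accounts for testing many windows at once.
    """
    rng = random.Random(seed)
    observed = window_sums(scores, width)
    exceed = [0] * len(observed)
    shuffled = list(scores)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_max = max(window_sums(shuffled, width))
        for i, obs in enumerate(observed):
            if null_max >= obs:
                exceed[i] += 1
    # Add-one correction so that no p-value is exactly zero.
    return [(count + 1) / (n_permutations + 1) for count in exceed]
```

Windows with small p-values would be the candidate hotspots. A realistic null model would likely need to preserve more structure than a plain shuffle (for example, per-allele score distributions), which this sketch does not attempt.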
Finally, we used a database of the HLA genotypes of approximately 22 000 individuals to develop a “digital twin” type simulation to model how effectively different combinations of hotspots would work in a diverse human population, and used the approach to identify an optimal constellation of epitope hotspots that could provide maximum coverage in the global population. By combining the antigen presentation to the infected-host cell surface and immunogenicity predictions of the NEC Immune Profiler with a robust Monte Carlo and digital twin simulation, we have managed to profile the entire SARS-CoV-2 proteome and identify a subset of epitope hotspots that could be harnessed in a vaccine formulation to provide a broad coverage across the global population.
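The population-coverage idea can be sketched as a set-cover problem over HLA genotypes. The abstract does not say how the optimal constellation was found, so the greedy selection below is a substitute chosen for illustration only. The coverage rule (an individual counts as covered if any of their alleles is predicted to present at least one chosen hotspot), the toy inputs, and the names `covered_fraction` and `greedy_hotspots` are assumptions.

```python
def covered_fraction(population, chosen, presented_by):
    """Fraction of individuals with at least one allele presenting a chosen hotspot.

    population: non-empty list of sets of HLA allele names (one set per individual).
    presented_by: dict mapping hotspot id -> set of alleles predicted to present it.
    """
    alleles = set()
    for hotspot in chosen:
        alleles |= presented_by[hotspot]
    covered = sum(1 for genotype in population if genotype & alleles)
    return covered / len(population)


def greedy_hotspots(population, presented_by, max_hotspots):
    """Greedily add the hotspot that raises coverage most, until no gain remains."""
    chosen = []
    best = 0.0
    for _ in range(max_hotspots):
        # Sort candidates so that ties are broken deterministically by id.
        candidates = [h for h in sorted(presented_by) if h not in chosen]
        if not candidates:
            break
        scored = [
            (covered_fraction(population, chosen + [h], presented_by), h)
            for h in candidates
        ]
        top_coverage, top_hotspot = max(scored, key=lambda pair: pair[0])
        if top_coverage <= best:
            break  # No remaining hotspot improves coverage.
        chosen.append(top_hotspot)
        best = top_coverage
    return chosen, best
```

Greedy selection is a standard approximation for coverage problems and does not guarantee the optimal set. A stricter rule, such as requiring several presented hotspots per individual, would only change `covered_fraction`.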
"Learning Representations of Missing Data using Graph Neural Networks for Predicting Patient Outcomes," AAAI Workshop 2021
Extracting actionable insight from Electronic Health Records (EHRs) poses several challenges for traditional machine learning approaches. Patients are often missing data relative to each other; the data comes in a variety of modalities, such as multivariate time series, free text, and categorical demographic information; important relationships among patients can be difficult to detect; and many others. We propose a novel approach to address these first three challenges using a representation learning scheme based on graph neural networks. Our proposed approach is competitive with or outperforms the state of the art for predicting in-hospital mortality (binary classification), the length of hospital visits (regression) and the discharge destination (multiclass classification).
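To give a sense of how a patient graph can carry information to records with missing values, the sketch below performs one round of mean neighbor aggregation that skips unobserved entries. This is not the model from the paper, which is not specified in the abstract: the graph, the use of `None` for missing values, and the function name `aggregate_with_missing` are assumptions, and a real graph neural network would add learned weights and nonlinearities.

```python
def aggregate_with_missing(features, neighbors):
    """One round of mean aggregation over a patient graph, ignoring missing values.

    features: dict mapping patient id -> list of floats, with None for missing.
    neighbors: dict mapping patient id -> list of neighboring patient ids.
    Returns a new dict; an entry stays None only if neither the patient nor
    any neighbor has an observed value for that feature.
    """
    updated = {}
    for patient, values in features.items():
        new_values = []
        for j, own in enumerate(values):
            # Collect observed values for feature j from the neighbors.
            observed = [
                features[n][j]
                for n in neighbors.get(patient, [])
                if features[n][j] is not None
            ]
            # Include the patient's own value when it was observed.
            if own is not None:
                observed.append(own)
            new_values.append(sum(observed) / len(observed) if observed else None)
        updated[patient] = new_values
    return updated
```

Stacking several such rounds lets values reach patients further away in the graph, which is the intuition behind using message passing when patients are missing data relative to each other.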
Timo Sztyler, Brandon Malone: “Learning Embeddings from a Biomedical Knowledge Graph for Predicting Novel Relations”, GCB 2019
Timo Sztyler, Carolin Lawrence, Brandon Malone: “Building a Biomedical Knowledge Graph and Predicting Novel Relations”, AKBC 2019
Alberto García Durán, Mathias Niepert, Brandon Malone: “MULTI-modal Knowledge Graph Completion to Predict Polypharmacy Side Effects”, DILS 2018