Makoto Takamoto, Timothy Praditia, Raphael Leiteritz, Dan MacKinlay, Dirk Pflüger, Francesco Alesiani, Mathias Niepert: “PDEBENCH: An Extensive Benchmark for Scientific Machine Learning,” NeurIPS, 2022 (accepted)
Machine learning-based modeling of physical systems has experienced increased interest in recent years. Despite some impressive progress, there is still a lack of benchmarks for Scientific ML that are easy to use but still challenging and representative of a wide range of problems. We introduce PDEBENCH, a benchmark suite of time-dependent simulation tasks based on Partial Differential Equations (PDEs). PDEBENCH comprises both code and data to benchmark the performance of novel machine learning models against both classical numerical simulations and machine learning baselines. Our proposed set of benchmark problems contribute the following unique features: (1) A much wider range of PDEs compared to existing benchmarks, ranging from relatively common examples to more realistic and difficult problems; (2) much larger ready-to-use datasets compared to prior work, comprising multiple simulation runs across a larger number of initial and boundary conditions and PDE parameters; (3) more extensible source codes with user-friendly APIs for data generation and baseline results with popular machine learning models (FNO, U-Net, PINN, Gradient-Based Inverse Method). PDEBENCH allows researchers to extend the benchmark freely for their own purposes using a standardized API and to compare the performance of new models to existing baseline methods. We also propose new evaluation metrics with the aim to provide a more holistic understanding of learning methods in the context of Scientific ML. With those metrics we identify tasks which are challenging for recent ML methods and propose these tasks as future challenges for the community. The code is available at https://github.com/pdebench/PDEBench.
To be presented at: Conference on Neural Information Processing Systems (NeurIPS), 2022 (accepted)
Preprint paper available at: https://arxiv.org/abs/2210.07182
D. Friede, M. Niepert: “Efficient Learning of Discrete-Continuous Computation Graphs”, Conference on Neural Information Processing Systems (NeurIPS) 2021
Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph’s execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
Conference: Conference on Neural Information Processing Systems (NeurIPS) 2021
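As a rough illustration of the Gumbel noise scale mentioned in the abstract above, the following minimal sketch draws a relaxed categorical sample with a Gumbel-softmax trick whose noise has an adjustable scale. It is not the authors' implementation: the function names, the `noise_scale` and `temperature` parameters, and the way the scale is varied in the usage note are assumptions made for illustration only.

```python
import math
import random


def sample_gumbel(rng, scale=1.0):
    """Draw one Gumbel(0, scale) sample by inverse transform sampling."""
    # Clamp away from 0 and 1 so that both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -scale * math.log(-math.log(u))


def gumbel_softmax(logits, temperature=1.0, noise_scale=1.0, rng=None):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature).

    noise_scale is the scale parameter of the Gumbel perturbation
    (a hypothetical parameter name chosen for this sketch).
    """
    rng = rng or random.Random()
    perturbed = [
        (logit + sample_gumbel(rng, noise_scale)) / temperature
        for logit in logits
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(perturbed)
    exps = [math.exp(p - top) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]
```

A training loop could, for example, call `gumbel_softmax(logits, noise_scale=s)` with `s` increased over the epochs; the concrete schedule is not stated in the abstract and would have to be taken from the paper.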
M. Niepert, Pasquale Minervini, Luca Franceschi: “Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions”, Conference on Neural Information Processing Systems (NeurIPS) 2021
Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable as it only requires the ability to compute the most probable states and does not rely on smooth relaxations. The framework encompasses several approaches such as perturbation-based implicit differentiation and recent methods to differentiate through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers. Experiments on several datasets suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations.
Conference: Conference on Neural Information Processing Systems (NeurIPS) 2021
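The perturb-and-MAP idea referred to in the abstract above can be sketched for the simplest case, a categorical distribution, where the most probable state is a one-hot argmax. This sketch uses standard Gumbel noise rather than the noise distributions proposed in the paper, and all function names are hypothetical. Averaging the MAP states of many perturbed parameter vectors gives an estimate of the marginals, which for a categorical distribution are the softmax probabilities.

```python
import math
import random


def gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def map_state(theta):
    """Most probable state of a categorical distribution as a one-hot list."""
    best = max(range(len(theta)), key=lambda i: theta[i])
    return [1.0 if i == best else 0.0 for i in range(len(theta))]


def perturb_and_map_marginals(theta, num_samples=1000, rng=None):
    """Estimate marginals by averaging MAP states of perturbed parameters."""
    rng = rng or random.Random()
    totals = [0.0] * len(theta)
    for _ in range(num_samples):
        # Perturb each parameter independently, then solve the MAP problem.
        perturbed = [t + gumbel(rng) for t in theta]
        state = map_state(perturbed)
        totals = [a + b for a, b in zip(totals, state)]
    return [t / num_samples for t in totals]
```

Only the `map_state` solver is specific to the categorical case; for a structured distribution it would be replaced by a combinatorial solver that returns the most probable state, which is the only capability the abstract says I-MLE requires.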
Francesco Alesiani, Shujian Yu, Xi Yu: “Gated Information Bottleneck for Generalization in Sequential Environments”, IEEE International Conference on Data Mining (ICDM) 2021 (accepted)
Deep neural networks suffer from poor generalization to unseen environments when the underlying data distribution is different from that in the training set. By learning minimum sufficient representations from training data, the information bottleneck (IB) approach has demonstrated its effectiveness in improving generalization in different AI applications. In this work, we propose a new neural network-based IB approach, termed gated information bottleneck (GIB), that dynamically drops spurious features and progressively selects the most relevant ones across different environments by a trainable soft mask (on raw features). GIB enjoys a simple and tractable objective, without any variational approximation or distributional assumption. We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution (OOD) detection. Meanwhile, we also establish the connection between IB theory and invariant causal representation learning, and observe that GIB demonstrates appealing performance when different environments are observed sequentially, a more practical scenario where invariant risk minimization (IRM) fails.
Shujian Yu, Francesco Alesiani, Xi Yu, Robert Jenssen, Jose C. Príncipe: “Measuring Dependence with Matrix-based Entropy Functional,” AAAI 2021
Measuring the dependence of data plays a central role in statistics and machine learning. In this work, we summarize and generalize the main idea of existing information-theoretic dependence measures into a higher-level perspective via Shearer’s inequality. Based on our generalization, we then propose two measures, namely the matrix-based normalized total correlation Tα* and the matrix-based normalized dual total correlation Dα*, to quantify the dependence of multiple variables in arbitrary dimensional space, without explicit estimation of the underlying data distributions. We show that our measures are differentiable and statistically more powerful than prevalent ones. We also show the impact of our measures in four different machine learning problems, namely gene regulatory network inference, robust machine learning under covariate shift and non-Gaussian noises, subspace outlier detection, and the understanding of the learning dynamics of convolutional neural networks (CNNs), to demonstrate their utility, advantages, and implications for those problems. Code of our dependence measure is available at: https://bit.ly/AAAI-dependence.
Full author details: Shujian Yu, NEC Laboratories Europe; Francesco Alesiani, NEC Laboratories Europe; Xi Yu, University of Florida; Robert Jenssen, UiT - The Arctic University of Norway; Jose C. Príncipe, University of Florida
Presented at: 35th Conference on Artificial Intelligence (AAAI-21)
Carolin Lawrence, Timo Sztyler and Mathias Niepert: “Explaining Neural Matrix Factorization with Gradient Rollback”, AAAI 2021
Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model's behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models.
We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback's influence approximation and the true influence on a model's behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets.
Presented at: 35th Conference on Artificial Intelligence (AAAI-21)
A. García-Duran, R. Gonzalez, D. Onoro-Rubio, M. Niepert, H. Li: "TransRev: Modeling Reviews as Translations from Users to Items", 42nd European Conference on Information Retrieval (ECIR 2020), April 2020
C. Lawrence, B. Kotnis, M. Niepert: “Attending to Future Tokens for Bidirectional Sequence Generation”, EMNLP 2019
Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated, only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks where the proposed bidirectional model outperforms competitive baselines by a large margin.
Presented at: Conference on Empirical Methods in Natural Language Processing 2019 and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
K. Akimoto, T. Hiraoka, K. Sadamasa and M. Niepert: “Cross-Sentence N-ary Relation Extraction using Lower-Arity Universal Schemas”, EMNLP 2019
Luca Franceschi, Xiao He, Mathias Niepert, Massimiliano Pontil, “Graph structure learning for GCNs”, ICLR, July 2019
C. Wang, M. Niepert: “State-Regularized Recurrent Neural Networks”, ICML 2019 (Thirty-sixth International Conference on Machine Learning), May 2019
L. Franceschi, X. He, M. Niepert, M. Pontil: “Learning Discrete Structures for Graph Neural Networks”, ICML 2019 (Thirty-sixth International Conference on Machine Learning), 2019
C. Wang, M. Niepert, H. Li, "RecSys-DAN: Discriminative Adversarial Networks for Cross-Domain Recommender Systems" in IEEE Transactions on Neural Networks and Learning Systems. March 2019
A. G. Duran, D. Rubio, M. Niepert, Y. Liu, H. Li, D. Rosenblum, "MMKG: Multi-Modal Knowledge Graphs" in ESWC 2019, the 16th Extended Semantic Web Conference. March 2019
D. Rubio, A. G. Duran, M. Niepert, R. Gonzalez, R. Lopez-Sastre, "Answering Visual-Relational Queries in Web-Extracted Knowledge Graphs" in AKBC 2019, Automated Knowledge Base Construction Conference. March 2019
B. Kotnis, A. G. Duran, "Learning Numerical Attributes in Knowledge Bases" in AKBC 2019, Automated Knowledge Base Construction Conference. March 2019