NEC orchestrating a brighter world
NEC Laboratories Europe

Intelligent Software Systems

Federico Errica, Mathias Niepert: “Tractable Probabilistic Graph Representation Learning with Graph-Induced Sum-Product Networks”, the 12th International Conference on Learning Representations (ICLR) 2024

Paper Details


We introduce Graph-Induced Sum-Product Networks (GSPNs), a new probabilistic framework for graph representation learning that can tractably answer probabilistic queries. Inspired by the computational trees induced by vertices in the context of message-passing neural networks, we build hierarchies of sum-product networks (SPNs) where the parameters of a parent SPN are learnable transformations of the a-posterior mixing probabilities of its children’s sum units. Due to weight sharing and the tree-shaped computation graphs of GSPNs, we obtain the efficiency and efficacy of deep graph networks with the additional advantages of a purely probabilistic model. We show the model’s competitiveness on scarce supervision scenarios, handling missing data, and graph classification in comparison to popular neural models. We complement the experiments with qualitative analyses on hyper-parameters and the model’s ability to answer probabilistic queries.

Accepted at: The 12th International Conference on Learning Representations (ICLR) 2024

In collaboration with: University of Stuttgart

Full paper download: Tractable_probabilistic_graph_representation_learning_with_graph-induced_sum-product_networks_pre-print.pdf

Tailin Wu, Takashi Maruyama, Long Wei, Tao Zhang, Yilun Du, Gianluca Iaccarino, Jure Leskovec: “Compositional Generative Inverse Design”, the 12th International Conference on Learning Representations (ICLR) 2024

Paper Details


Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem that arises across fields such as mechanical engineering to aerospace engineering. Inverse design is typically formulated as an optimization problem, with recent works leveraging optimization across learned dynamics models. However, as models are optimized they tend to fall into adversarial modes, preventing effective sampling. We illustrate that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples and significantly improve design performance. We further illustrate how such a design system is compositional, enabling us to combine multiple different diffusion models representing subcomponents of our desired system to design systems with every specified component. In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that our method allows us to design initial states and boundary shapes that are more complex than those in the training data. Our method outperforms state-of-the-art neural inverse design method for the N-body dataset and discovers formation flying to minimize drag in the multi-airfoil design task.

Accepted at: The 12th International Conference on Learning Representations (ICLR) 2024

In collaboration with: Westlake University, Massachusetts Institute of Technology, Stanford University

Paper link:

Federico Errica: “On Class Distributions Induced by Nearest Neighbor Graphs for Node Classification of Tabular Data”, the 37th Conference on Neural Information Processing Systems (NeurIPS) 2023

Paper Details


Researchers have used nearest neighbor graphs to transform classical machine learning problems on tabular data into node classification tasks to solve with graph representation learning methods. Such artificial structures often reflect the homophily assumption, believed to be a key factor in the performances of deep graph networks. In light of recent results demystifying these beliefs, we introduce a theoretical framework to understand the benefits of Nearest Neighbor (NN) graphs when a graph structure is missing. We formally analyze the Cross-Class Neighborhood Similarity (CCNS), used to empirically evaluate the usefulness of structures, in the context of nearest neighbor graphs. Moreover, we study the class separability induced by deep graph networks on a k-NN graph. Motivated by the theory, our quantitative experiments demonstrate that, under full supervision, employing a k-NN graph offers no benefits compared to a structure-agnostic baseline. Qualitative analyses suggest that our framework is good at estimating the CCNS and hint at k-NN graphs never being useful for such classification tasks under full supervision, thus advocating for the study of alternative graph construction techniques in combination with deep graph networks.

Presented at: The 37th Conference on Neural Information Processing Systems (NeurIPS) 2023

Full paper download: On_Class_Distributions_Induced_by_Nearest_Neighbor_Graphs_for_Node_Classification_of_Tabular_Data.pdf

Cristóbal Corvalán, Francesco Alesiani, Markus Zopf: Continuous-Discrete Message Passing for Graph Logic Reasoning”, Knowledge and Logical Reasoning in the Era of Data-driven Learning Workshop at ICML 2023

Paper Details


The message-passing principle is used in the most popular neural networks for graph-structured data. However, current message-passing approaches use black-box neural models that transform features over continuous domain, thus limiting the reasoning capability of GNNs. Traditional neural networks fail to model reasoning over discrete variables. In this work, we explore a novel type of message passing based on a differentiable satisfiability solver. Our model learns logical rules that encode which and how messages are passed from one node to another node. The rules are learned in a relaxed continuous space, which renders the training process end-to-end differentiable and thus enables standard gradient-based training. Our experiments show that MAXSAT-GNN learns arithmetic operations and that is on par with state-of-the-art GNNs, when tested on graph structured data.

Presented at: Knowledge and Logical Reasoning in the Era of Data-driven Learning Workshop at ICML 2023

Full paper download: Continuous-Discrete_Message_Passing_for_Graph_Logic_Reasoning.pdf

Makoto Takamoto, Francesco Alesiani, Mathias Niepert: “Learning Neural PDE Solvers with Parameter-Guided Channel Attention”, International Conference on Machine Learning (ICML) 2023

Paper Details


Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDE). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count. 

Accepted at: International Conference on Machine Learning (ICML) 2023

In collaboration with: University of Stuttgart

Full paper download: Learning_Neural_PDE_Solvers_with_Parameter-Guided_Channel_Attention_arxiv_pre-print.pdf

Mauro Allegretta, Giuseppe Siracusano, Roberto González Sánchez, Pelayo Vallina Rodriguez, Marco Gramaglia: “Using CTI Data to Understand Real World Cyberattacks”, WONS 2023

Paper Details

The forensic analysis of Cyber Threat Intelligence (CTI) data is of capital importance for businesses and enterprises to understand what has possibly gone wrong in a cybersecurity
system. Moreover, the fast evolution of the techniques used by cybercriminals requires collaboration among multiple partnersto provide efficient security mechanisms. STIX has emerged as the industrial standard to share CTI data in a structured format, allowing entities from over the world to exchange information to broaden the knowledge base in the area. In this work, we shed light on the type of information contained in these datasets shared among partners. We analyze a large real-world STIX dataset and identify trends for the reporting of CTI data. Then, we deep dive into two kinds of attack patterns found in the dataset: Command
& Control and Malicious Software Download. We found the data is not only useful for forensic analysis but can also be used to improve the protection against new attacks.

Presented at: Wireless on Demand Network Systems and Service (WONS)
In collaboration with: IMDEA Networks, Universidad Carlos III of Madrid (UC3M)

Full paper download: Using CTI Data to Understand Real World Cyberattacks, WONS 2023 (pdf)

Davide Sanvito, Giuseppe Siracusano, Roberto Gonzalez, Roberto Bifulco: “MUSTARD: Adaptive Behavioral Analysis for Ransomware Detection”, ACM Conference on Computer and Communications Security (CCS), 2022

Paper Details


Behavioural analysis based on filesystem operations is one of the most promising approaches for the detection of ransomware. Nonetheless, tracking all the operations on all the files for all the processes can introduce a significant overhead on the monitored system. In this paper, we present MUSTARD, a solution to dynamically adapt the degree of monitoring for each process based on their behaviour to achieve a reduction of monitoring resources for the benign processes

Presented as a poster at: ACM Conference on Computer and Communications Security (CCS)


Full paper download: Adaptive_Behavioral_Analysis_for_Ransomware_Detection.pdf

Paper Details

We present an approach to improve the scalability of online machine learning-based network traffic analysis. We first make the case to replace widely-used supervised machine learning models for network traffic analysis with binary neural networks. We then introduce Neural Networks on the NIC (N3IC), a system that compiles binary neural network models into implementations that can be directly integrated in the data plane of SmartNICs. N3IC supports different hardware targets, and it generates data plane descriptions using both micro-C and P4 languages.

We implement and evaluate our solution using two use cases related to traffic identification and to anomaly detection. In both cases, N3IC provides up to a 100x lower classification latency, and 1.5-7x higher throughput than state-of-the-art software-based machine learning classification systems. This is achieved by running the entire traffic analysis pipeline within the data plane of the SmartNIC, thereby completely freeing the system’s CPU from any related tasks, while for-warding traffic at line rate (40Gbps) on the target NICs. Encouraged by these results we finally present the design and FPGA-based prototype of a hardware primitive that adds bi-nary neural network support to a NIC data plane. Our new primitive requires less than 1-2% of the logic and memory resources of a VirteX7 FPGA. We show through experimental evaluation that extending the NIC data plane enables more challenging use cases that require online traffic analysis to be performed in a few microseconds.

Conference:19th USENIX Symposium on Networked Systems Design and Implementation

Research partners: NEC Laboratories Europe, University of Cambridge, Queen Mary University of London, Imperial College London and Microsoft Research

Full paper download: Re-architecting_Traffic_Analysis_with_Neural_Network_Interface_Cards.pdf

Felipe Huici, Hugo Lefeuvre, Pierre Olivier, Costin Lupu, Sebastian Rauch, Stefan Lucian Teodorescu, Alexander Jung, Vlad-Andrei Bădoiu: “FlexOS: Towards Flexible OS Isolation”, ACM ASPLOS 2022. Distinguished Artifact Award

Paper Details


At design time, modern operating systems are locked in a specific safety and isolation strategy that mixes one or more hardware/software protection mechanisms (e.g. user/kernel separation); revisiting these choices after deployment requires a major refactoring effort. This rigid approach shows its limits given the wide variety of modern applications' safety/performance requirements, when new hardware isolation mechanisms are rolled out, or when existing ones break.

We present FlexOS, a novel OS allowing users to easily specialize the safety and isolation strategy of an OS at compilation/deployment time instead of design time. This modular LibOS is composed of fine-grained components that can be isolated via a range of hardware protection mechanisms with various data sharing strategies and additional software hardening. The OS ships with an exploration technique helping the user navigate the vast safety/performance design space it unlocks. We implement a prototype of the system and demonstrate, for several applications (Redis/Nginx/SQLite), FlexOS' vast configuration space as well as the efficiency of the exploration technique: we evaluate 80 FlexOS configurations for Redis and show how that space can be probabilistically subset to the 5 safest ones under a given performance budget. We also show that, under equivalent configurations, FlexOS performs similarly or better than existing solutions which use fixed safety configurations.

Conference: ACM ASPLOS 2022

ACM Digital Library paper abstract: FlexOS: Towards Flexible OS Isolation (F.Huici et al.)


Simon Kuenzer, Sharan Santhanam, Vlad-Andrei Bădoiu, Hugo Lefeuvre, Alexander Jung, Gaulthier Gain, Cyril Soldani, Costin Lupu, Stefan Teodorescu, Costi Răducanu , Cristian Banu, Laurent Mathy, Răzvan Deaconescu, Costin Raiciu, Felipe Huici: “Unikraft: Fast, Specialized Unikernels the Easy Way”, EuroSys 2021

EuroSys Best Paper Award, 2021
Artifact Evaluation badges: Artifacts Available, Artifacts Functional, Results Reproduced

Paper Details

Unikernels are famous for providing excellent performance in terms of boot times, throughput and memory consumption, to name a few metrics. However, they are in famous for making it hard and extremely time consuming to extract such performance, and for needing significant engineering effort in order to port applications to them. We introduce Unikraft, a novel micro-library OS that (1) fully modularizes OS primitives so that it is easy to customize the unikernel and include only relevant components and (2) exposes a set of composable, performance-oriented APIs in order to make it easy for developers to obtain high performance. Our evaluation using off-the-shelf applications such as nginx, SQLite, and Redis shows that running them on Unikraft results in a 1.7x-2.7x performance improvement compared to Linux guests. In addition, Unikraft images for these apps are around 1MB, require less than 10MB of RAM to run, and boot in around 1ms on top of the VMM time (total boot time 3ms-40ms). Unikraft is a Linux Foundation open source project and can be found

Presented at: EuroSys 2021

M.Brunella, G.Belocchi, M.Bonola, S.Pontarelli, G.Siracusano, G.Bianchi, A.Cammarano, A.Palumbo, L.Petrucci, R.Bifulco: "hXDP: Efficient Software Packet Processing on FPGA NIC"  USENIX OSDI, 2020

Jay Lepreau Best Paper Award, USENIX OSDI’20

Paper Details

FPGA accelerators on the NIC enable the offloading of expensive packet processing tasks from the CPU. However, FPGAs have limited resources that may need to be shared among diverse applications, and programming them is difficult.

We present a solution to run Linux's eXpress Data Path programs written in eBPF on FPGAs, using only a fraction of the available hardware resources while matching the performance of high-end CPUs. The iterative execution model of eBPF is not a good fit for FPGA accelerators. Nonetheless, we show that many of the instructions of an eBPF program can be compressed, parallelized or completely removed, when targeting a purpose-built FPGA executor, thereby significantly improving performance. We leverage that to design hXDP, which includes (i) an optimizing-compiler that parallelizes and translates eBPF bytecode to an extended eBPF Instruction-set Architecture defined by us; a (ii) soft-processor to execute such instructions on FPGA; and (iii) an FPGA-based infrastructure to provide XDP's maps and helper functions as defined within the Linux kernel.

We implement hXDP on an FPGA NIC and evaluate it running real-world unmodified eBPF programs. Our implementation is clocked at 156.25MHz, uses about 15% of the FPGA resources, and can run dynamically loaded programs. Despite these modest requirements, it achieves the packet processing throughput of a high-end CPU core and provides a 10x lower packet forwarding latency.

Full author details: Marco Spaziani Brunella and Giacomo Belocchi, Axbryd/University of Rome Tor Vergata; Marco Bonola, Axbryd/CNIT; Salvatore Pontarelli, Axbryd; Giuseppe Siracusano, NEC Laboratories Europe; Giuseppe Bianchi, University of Rome Tor Vergata; Aniello Cammarano, Alessandro Palumbo, and Luca Petrucci, CNIT/University of Rome Tor Vergata; Roberto Bifulco, NEC Laboratories Europe

Conference:       USENIX OSDI, 2020
Published on:     USENIX OSDI, 2020

Davide Sanvito, Andrea Marchini, Ilario Filippini, Antonio Capone; “CEDRO: an in-switch elephant flows rescheduling scheme for data-centers”, IEEE Conference on Network Softwarization (NetSoft) 2020

Paper Details

Data-center topologies interconnect an ever largernumber of servers using a high number of alternative pathsto provide high bandwidth and a high degree of resiliency. Thestate-of-the-art routing strategy is based on Equal-cost multipath(ECMP) which employs static hashing mechanism over packetheaderfields to spread the traffic over multiple paths. Routing thetraffic without considering the size of theflows and the utilizationof the paths might cause congestion due to the collision of multiplelargeflows on a same downstream path. We present CEDRO,an in-switch mechanism to detect and reschedule colliding largeflows. By exploiting the latest advances in SDN programmablenetwork devices, we offload to the network the detection ofboth the elephantflows and the path congestion conditions andthe rescheduling mechanism. CEDRO is able to promptly copewith path congestion and failures directly from the dataplane,regardless of the availability of the external controller. Weimplemented CEDRO in an emulated SDN network and testedit against realistic traffic scenarios. Numerical evaluation showsCEDRO is able to improve the average and 95-th percentile ofthe Flow Completion Time compared to ECMP.

Nicolas Weber, Felipe Huici “SOL: Effortless Device Support for AI Frameworks without Source Code Changes”, 3rd High Performance Machine Learning Workshop, 2020

Paper Details

Abstract—Modern high performance computing clusters heav-ily rely on accelerators to overcome the limited compute powerof CPUs. These supercomputers run various applications fromdifferent domains such as simulations, numerical applications orartificial intelligence(AI). As a result, vendors need to be able toefficiently run a wide variety of workloads on their hardware.In the AI domain this is in particular exacerbated by theexistance of a number of popular frameworks (e.g, PyTorch,TensorFlow, etc.) that have no common code base, and can varyin functionality. The code of these frameworks evolves quickly,making it expensive to keep up with all changes and potentiallyforcing developers to go through constant rounds of upstreaming.In this paper we explore how to provide hardware support inAI frameworks without changing the framework’s source code inorder to minimize maintenance overhead. We introduce SOL, anAI acceleration middleware that provides a hardware abstractionlayer that allows us to transparently support heterogenous hard-ware. As a proof of concept, we implemented SOL for PyTorchwith three backends: CPUs, GPUs and vector processors.Index Terms—artificial intelligence, middleware, high perfor-mance computing

P. Luigi Ventre, P. Lungaroni, G. Siracusano, C. Pisa, F. Schmidt, F. Lombardo, S. Salsano, "On the Fly Orchestration of Unikernels: Tuning and Performance Evaluation of Virtual Infrastructure Managers", IEEE Transactions on Cloud Computing Accepted Date November 2018

Paper Details

Network operators are facing significant challenges meeting the demand for more bandwidth, agile infrastructures, innovative services, while keeping costs low. Network Functions Virtualization (NFV) and Cloud Computing are emerging as key trends of 5G network architectures, providing flexibility, fast instantiation times, support of Commercial Off The Shelf hardware and significant cost savings. NFV leverages Cloud Computing principles to move the data-plane network functions from expensive, closed and proprietary hardware to the so-called Virtual Network Functions (VNFs). In this paper we deal with the management of virtual computing resources (Unikernels) for the execution of VNFs. This functionality is performed by the Virtual Infrastructure Manager (VIM) in the NFV MANagement and Orchestration (MANO) reference architecture. We discuss the instantiation process of virtual resources and propose a generic reference model, starting from the analysis of three open source VIMs, namely OpenStack, Nomad and OpenVIM. We improve the aforementioned VIMs introducing the support for special-purpose Unikernels and aiming at reducing the duration of the instantiation process. We evaluate some performance aspects of the VIMs, considering both stock and tuned versions. The VIM extensions and performance evaluation tools are available under a liberal open source licence.

Published in: IEEE Transactions on Cloud Computing
PDF download: On the Fly Orchestration of Unikernels

Full paper download: NLE_Research_Paper_On_the_Fly_Orchestration_of_Unikernels_2018.pdf

S. Pontarelli, R. Bifulco, M. Bonola, G. Siracusano, M. Honda, F. Huici, "FlowBlaze: Stateful Packet Processing in Hardware" in USENIX Symposium on Networked Systems Design and Implementation (NSDI),  March 2019

Paper Details

While programmable NICs allow for better scalability to handle growing network workloads, providing an expressive yet simple abstraction to program stateful network functions in hardware remains a research challenge. We address the problem with FlowBlaze, an open abstraction for building stateful packet processing functions in hardware. The abstraction is based on Extended Finite State Machines and introduces the explicit definition of flow state, allowing FlowBlaze to leverage flow-level parallelism. FlowBlaze is expressive, supporting a wide range of complex network functions, and easy to use, hiding low-level hardware implementation issues from the programmer. Our implementation of FlowBlaze on a NetFPGA SmartNIC achieves very low latency (in the order of a few microseconds), consumes relatively little power, can hold per-flow state for hundreds of thousands of flows, and yields speeds of 40 Gb/s, allowing for even higher speeds on newer FPGA models. Both hardware and software implementations of FlowBlaze are publicly available.

Conference: USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2019
PDF download: FlowBlaze: Stateful Packet Processing in Hardware

G. Siracusano, R. Gonzalez, R. Bifulco: "On the application of NLP to discover relationships between malicious network entities”, poster at CCS 2019.

Carmelo Cascone, Roberto Bifulco, Salvatore Pontarelli, Antonio Capone: “Relaxing state-access constraints in stateful programmable data planes”, ACM SIGCOMM Computer Communication Review, 2018

Paper Details

Supporting programmable stateful packet forwarding functions in hardware requires a tight balance between functionality and performance. Current state-of-the-art solutions are based on a very conservative model that assumes worst-case workloads. This finally limits the programmability of the system, even if actual deployment conditions may be very different from the worst-case scenario. We use trace-based simulations to highlight the benefits of accounting for specific workload characteristics. Furthermore, we show that relatively simple additions to a switching chip design can take advantage of such characteristics. In particular, we argue that introducing stalls in the switching chip pipeline enables stateful functions to be executed in a larger but bounded time without harming the overall forwarding performance. Our results show that, in some cases, the stateful processing of a packet could use 30x the time budget provided by state of the art solutions.

Top of this page