# Bilateral Project ULCO / YNU 2025

> PHC SAKURA / JSPS. Deadline: 20 Aug (JP) / 3 Sept (FR)

https://www.campusfrance.org/fr/sakura

https://chercheurs.campusfrance.org/candidature/

## **Identification**

- ~~Title 1: **Improving the Explainability of Feature Selection and Related Optimization Problems** (AI?)~~
- ~~Title 2: **Improving the Explainability of Feature Selection and Related Optimization Problems in AI**~~
- Title 3: **Improving XAI through the Landscape Analysis of Feature Selection and other Pseudo-Boolean Optimization Problems**
- **Domain:** Computer Science
- **Keywords:** optimization, artificial intelligence, feature selection, explainability

---

## Context and history

### Project objectives

Explainable Artificial Intelligence (**XAI**) aims to increase the transparency and interpretability of decisions made by AI systems. A key component of XAI is **feature selection**, which is critical for providing insights into machine learning (**ML**) models. Feature selection consists in selecting a subset of relevant features (the input variables, or predictors) for constructing prediction models. One reason for this is to simplify these models, thus making them easier to interpret for the user and highlighting the key features that are essential for reaching accurate predictions. Simplified models also typically result in fewer computations and faster training times, making them a key element of frugal AI.

Unlike other reduction techniques such as feature construction and extraction, feature selection is known to preserve the original data semantics, thereby facilitating human understanding. Feature selection is an integral part of the typical XAI pipeline, with applications ranging from pattern and image recognition to disease diagnosis and medical imaging in healthcare.

Advanced technologies for feature selection typically formulate it as an **optimization problem**.
However, its combinatorial, expensive, and black-box nature raises a number of computational challenges that need to be addressed by the optimization method. These are usually tackled using search-based optimization heuristics from **computational intelligence**, such as (greedy) **local search** and **evolutionary computation** algorithms. Unfortunately, these methods often result in **suboptimal solutions**. As such, the inherent **bias** of feature selection approaches, even with the most recent progress, is not fully understood by researchers. This is precisely the purpose of the project objectives: to systematically study feature selection technologies within the context of optimization and XAI.

### Issues and international state of the art

Existing feature selection technologies encompass filter, embedded, and wrapper approaches [GE03]. Filter methods assess the relevance of features using statistical tests, while embedded methods integrate feature selection directly into the ML model. By contrast, **wrapper feature selection** can be expressed as a **combinatorial (pseudo-Boolean) optimization problem**. This problem is then typically solved through the iterative search of a subset of features. For a dataset with n features, the feature selection problem aims to identify the minimal subset of p ≤ n features that maximizes the accuracy of the ML model. The quality of a feature subset is evaluated by training the considered ML model using the selected features as predictors.

Established wrapper methods for feature selection include forward and backward sequential feature selection (SFS) [FP+94], both of which are simple greedy **local search** algorithms. **Forward-SFS**, also known as *greedy forward selection*, starts with an empty set of features and iteratively adds them one by one, each time maximizing an accuracy score.
This process continues until the target number of features is reached, or further improvements are no longer possible, indicating that any remaining features negatively impact prediction accuracy. On the other hand, **Backward-SFS**, also referred to as *greedy backward elimination*, operates in the opposite direction. It starts with all the features and removes the least contributing one in each iteration. SFS is so prevalent that it is available in widely recognized ML libraries such as [**scikit-learn**](https://scikit-learn.org/).

However, because of its greedy and local nature, the subset provided by SFS is generally *not* optimal for the dataset and ML model under consideration. SFS often falls into suboptimal subsets, known as **local optima**, which can hinder search progress. To better cope with this, **evolutionary algorithms** have been extensively studied. These advanced computational intelligence techniques help escape local optima by maintaining multiple solutions simultaneously and randomizing the exploration of new solutions. As a result, the literature suggests that evolutionary wrapper feature selection can significantly improve performance over greedy local search [DDK22, XZ+16]. However, while these methods increase the chance of finding high-quality local optima, there is no guarantee of finding the optimum. Moreover, their effectiveness is primarily based on experimental observations, thus indicating a **lack of fundamental understanding**. Consequently, exhaustive search is the only known unbiased method. However, it becomes computationally intractable for problems with over 20 features. In order to deal with practical datasets, all wrapper feature selection methods carry an **inherent bias**. Therefore, we argue that this bias must be thoroughly understood and explained to users before attempting to **improve the robustness of feature selection**.
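
To make the contrast between greedy search and exhaustive enumeration concrete, here is a minimal sketch in plain Python. The score table `TOY_SCORES` is entirely hypothetical: it stands in for the accuracy that would be obtained by training an ML model on each subset of three features, with values chosen so that features 1 and 2 are only useful together. The function names are illustrative and are not taken from scikit-learn or from the project.

```python
from itertools import combinations

# Hypothetical accuracy for every subset of 3 features (illustrative values only).
TOY_SCORES = {
    frozenset(): 0.50,
    frozenset({0}): 0.70,
    frozenset({1}): 0.60,
    frozenset({2}): 0.60,
    frozenset({0, 1}): 0.68,
    frozenset({0, 2}): 0.68,
    frozenset({1, 2}): 0.90,
    frozenset({0, 1, 2}): 0.85,
}


def toy_score(subset):
    """Stand-in for 'train the model on this subset and measure accuracy'."""
    return TOY_SCORES[frozenset(subset)]


def forward_sfs(n_features, score):
    """Greedy forward selection: add the best single feature until no gain."""
    selected = frozenset()
    best = score(selected)
    while len(selected) < n_features:
        candidates = [selected | {j} for j in range(n_features) if j not in selected]
        challenger = max(candidates, key=score)
        if score(challenger) <= best:
            break  # no strictly improving neighbor: a local optimum
        selected, best = challenger, score(challenger)
    return selected, best


def exhaustive_search(n_features, score):
    """Evaluate all 2^n subsets and return the best one."""
    subsets = (
        frozenset(c)
        for k in range(n_features + 1)
        for c in combinations(range(n_features), k)
    )
    best = max(subsets, key=score)
    return best, score(best)
```

Under these assumed scores, `forward_sfs(3, toy_score)` stops at subset `{0}` with a score of 0.70, a local optimum, whereas `exhaustive_search(3, toy_score)` returns `{1, 2}` with 0.90. The enumeration visits all 2^n subsets, which is why it is only practical for a small number of features.
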

Despite this bias, and although the wrapper approach is time-consuming due to its iterative nature, it typically yields a better subset of features compared to the filter and embedded approaches [DDK22, XZ+16]. This is why cost-effective ML algorithms such as k-nearest neighbors (kNN) are commonly used in wrapper feature selection [DDK22]. Existing studies on the fundamental analysis of feature selection are sparse and, likewise, focus on kNN only [MME18, MM+19]. However, our initial experiments reveal that the optimal subset of features can vary significantly between different ML algorithms [LTV24]. This means that using kNN for selection might result in a different solution than with the ML model chosen by the user. On top of that, it remains unclear whether the inherent optimization challenges brought on by different ML models are the same. By the same line of reasoning, the choice of the ML accuracy score also leads to a distinct feature selection problem, with its own characteristics and solutions. In fact, this choice can bring additional challenges since feature selection often involves **multiple objectives** [JN+24], which may include various accuracy scores, while minimizing the number of selected features. As such, we argue that each combination of (1) dataset, (2) ML model, and (3) ML accuracy score(s) creates a unique feature selection problem. We propose to **characterize the resulting solutions and optimization difficulty** of feature selection problems, depending on the dataset, ML method, and ML accuracy score(s) being considered.

> [DDK22] T. Dökeroglu, A. Deniz, H.E. Kiziloz: A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing 494, 269–296 (2022)
>
> [FP+94] F.J. Ferri, P. Pudil, M. Hatef, J. Kittler: Comparative study of techniques for large-scale feature selection. Machine Intelligence and Pattern Recognition 16, 403–413 (1994)
>
> [GE03] I. Guyon, A. Elisseeff: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)
>
> [JN+24] R. Jiao, B.H. Nguyen, B. Xue, M. Zhang: A survey on evolutionary multiobjective feature selection in classification: Approaches, applications, and challenges. IEEE Transactions on Evolutionary Computation (2024, in press)
>
> [LTV24] A. Liefooghe, R. Tanabe, S. Verel: Contrasting the landscapes of feature selection under different machine learning models. International Conference on Parallel Problem Solving from Nature (PPSN 2024), Lecture Notes in Computer Science, vol 15148, Hagenberg, Austria (2024, in press)
>
> [MME18] W. Mostert, K.M. Malan, A.P. Engelbrecht: Filter versus wrapper feature selection based on problem landscape features. Genetic and Evolutionary Computation Conference Companion (GECCO 2018), 1489–1496, Kyoto, Japan (2018)
>
> [MM+19] W. Mostert, K.M. Malan, G. Ochoa, A.P. Engelbrecht: Insights into the feature selection problem using local optima networks. European Conference on Evolutionary Computation in Combinatorial Optimisation (EvoCOP 2019), Lecture Notes in Computer Science, vol 11452, 147–162, Leipzig, Germany (2019)
>
> [XZ+16] B. Xue, M. Zhang, W.N. Browne, X. Yao: A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation 20(4), 606–626 (2016)

### Ongoing projects or existing activities related to the main objective of the project

This project is in line with the recent European COST Action "Randomised Optimisation Algorithms Research Network" ([**ROAR-NET**](https://roar-net.eu/), 2023-2027). The aim is to enhance the competitiveness of randomized optimization algorithms, such as local search and evolutionary algorithms, and promote their widespread adoption. The focus is not only on their performance but also on all aspects of their practical application, including the humans, processes, and technologies involved.
One work package focuses on selecting and configuring randomized optimization algorithms, either for specific problems or to achieve high performance across a wide range of problems. A dedicated task force specifically addresses **explainability**, striving to improve the interpretability of algorithm performance, evaluate their ability to handle particular problem properties, and identify their strengths and weaknesses. Team members actively participate in this COST action, which focuses on optimization algorithms rather than on their application in the context of feature selection.

The project also aligns with the pan-European Confederation of Laboratories for Artificial Intelligence Research in Europe ([**CLAIRE**](https://claire-ai.org/)), aiming to strengthen European AI research and innovation. In collaboration with Japanese researchers, it adheres to CLAIRE's broad view of AI, going beyond traditional ML. It encompasses automated reasoning, learning, search and optimization, with a focus on combining automated learning and reasoning. This is a crucial and promising research area with significant potential impact on industrial applications. The project underscores CLAIRE's core aspect of human-centered AI by developing a decision support system that blends with human expertise for understanding and designing search algorithms and enhancing their practical application for explainability.

It is worth mentioning that evolutionary computation for feature selection was recently discussed in a [**tutorial**](https://gecco-2024.sigevo.org/Tutorials#id_Evolutionary%20Computation%20for%20Feature%20Selection%20and%20Feature%20Construction) at [GECCO 2024](https://gecco-2024.sigevo.org/). It was also the main topic of a [**special session**](https://hoaibach.github.io/CEC24SS.html) at the [WCCI-CEC 2024](https://wcci2024.org/) conference. Both venues are major conferences on evolutionary computation.
However, their primary focus is on the design and experimental evaluation of algorithms, rather than their fundamental understanding or the variations in optimization difficulty induced by the dataset, ML model, and score function under consideration.

### Reminder of the context of the cooperation and the existing relations between the 2 teams

The team members first met informally at international conferences and workshops on computational intelligence. It was during these events that we recognized the commonality and complementarity of our research and the potential benefits of collaboration. For these reasons, we are confident in our ability to cooperate and make productive progress on the interplay between XAI and optimization. Furthermore, this project will provide an opportunity to establish a long-term collaboration, including student exchanges between France and Japan.

In August 2023, Arnaud Liefooghe from ULCO (coordinator for France) visited Ryoji Tanabe’s laboratory at YNU (coordinator for Japan) for a week. This visit was supported by the French Embassy in Japan as part of the **Japan Exploration Program**. During the one-week stay, we had in-depth discussions not only on short-term joint research projects but also on long-term joint research platforms. This visit laid the groundwork for ongoing research collaborations between the two teams. We also held a seminar to share our latest research results and had a chance to interact with the Japanese participants.

In April 2024, members from both teams participated in the renowned international **workshop on Benchmarking in Multi-Criteria Optimization** ([**BeMCO**](https://www.lorentzcenter.nl/bemco-benchmarking-in-multi-criteria-optimisation.html)) at the Lorentz Center, Leiden University, the Netherlands.
This week-long workshop not only provided us with an opportunity to actively engage in research topics related to our project, but also to connect and interact with European and Japanese researchers, whom we identified as potential partners for future expansion of our project.

As a significant initial achievement, our first joint research effort has recently been accepted at [**PPSN 2024**](http://ppsn2024.fh-ooe.at/), a premier international conference in the field of evolutionary computation. This **co-authored article** addresses the core topic of this project, focusing on the landscape analysis of feature selection problems. This research outcome serves as the basis for work package **`WP1`**, which is discussed in more detail below.

In June 2024, the French members of our collaborative research team made a return visit to the counterpart Japanese laboratory. This visit was strategically planned to align with the investigators' trip to Japan for the international conference [**WCCI 2024**](http://wcci2024.org/), held in Yokohama. The timing allowed for both conference participation and a productive exchange of ideas and findings. During this visit, we engaged in detailed discussions and collaborative sessions, ultimately finalizing the comprehensive content of this project proposal.

From the above exchanges and the course of negotiations, it is clear that the Japanese and French members have invested significant time and effort into planning this research project and are well prepared for its successful implementation. As shown in our preliminary joint publication, the basic study has already taken an important first step toward our research objectives. We are confident in our ability to accomplish the joint research productively and constructively.
The collaborative nature of this project, combining the shared vision and complementary expertise of both Japanese and French members, is expected to make significant contributions to the field through our joint research efforts.

### Complementarity of the teams

This project will benefit from the expertise of members from both countries who have long been involved in the evolutionary computation community at both fundamental and applied levels. The two partners are international experts in designing and benchmarking advanced optimization algorithms with complementary strengths. The team in Japan has a strong background in designing innovative algorithms, while the team in France has a strong background in the fundamental analysis of optimization problems and algorithms.

The principal investigator from Japan, Dr. Ryoji Tanabe, is an expert in multi-objective optimization and algorithm selection in the **continuous** optimization domain. A senior Japanese participant, Dr. Shinichi Shirakawa, specializes in machine learning and its synergy with optimization and evolutionary computation. They have published excellent research in top journals and international conferences in the field.

The French participants are experts in the same topics within the **combinatorial** optimization domain. The team gathers one of the largest groups of researchers in this field and is recognized worldwide for its contributions and significant activity in developing advanced methodologies for landscape analysis. Their international collaborations have created synergies between researchers and institutions, enriching their research with complementary expertise.

Therefore, this cooperation is a natural follow-up to the complementary and regular scientific exchanges between the French and Japanese partners in recent years. The participants' areas of expertise complement one another, making their collaboration a significant asset for this research project.
We believe this collaborative research will enhance scientific visibility and accessibility for the scientific community in optimization, both at the fundamental (advanced analysis) and applied (problem-solving) levels.

### Significant scientific productions of the teams related to the project (5 maximum)

- Arnaud Liefooghe, Ryoji Tanabe, Sébastien Verel. **Contrasting the landscapes of feature selection under different machine learning models**. International Conference on Parallel Problem Solving from Nature ([**PPSN 2024**](http://ppsn2024.fh-ooe.at/)), Lecture Notes in Computer Science (LNCS), Hagenberg, Austria, 2024 (in press)
- Ryoji Tanabe. [**Benchmarking feature-based algorithm selection systems for black-box numerical optimization**](https://doi.org/10.1109/TEVC.2022.3169770). IEEE Transactions on Evolutionary Computation, vol 26, n 6, pp 1321–1335, 2022
- Arnaud Liefooghe, Fabio Daolio, Sébastien Verel, Bilel Derbel, Hernán Aguirre, Kiyoshi Tanaka. [**Landscape-aware performance prediction for evolutionary multi-objective optimization**](https://doi.org/10.1109/TEVC.2019.2940828). IEEE Transactions on Evolutionary Computation, vol 24, n 6, pp 1063–1077, 2020
- Ryoji Tanabe, Hisao Ishibuchi. [**An analysis of quality indicators using approximated optimal distributions in a 3-D objective space**](https://doi.org/10.1109/TEVC.2020.2966014). IEEE Transactions on Evolutionary Computation, vol 24, n 5, pp 853–867, 2020
- Matthieu Basseur, Bilel Derbel, Adrien Goëffon, Arnaud Liefooghe. [**Experiments on greedy and local search heuristics for d–dimensional hypervolume subset selection**](http://dx.doi.org/10.1145/2908812.2908949). Genetic and Evolutionary Computation Conference ([**GECCO 2016**](http://gecco-2016.sigevo.org/)), pp 541–548, Denver, USA, 2016

---

## Description

### Project description

This project addresses wrapper **feature selection** as a **combinatorial optimization problem**.
It aims to identify the most relevant subset of features from a potentially large and diverse set. Our goal is to significantly **deepen our understanding** of the **challenges** inherent in feature selection problems, as well as the **strengths** and **biases** induced by feature selection algorithms. Despite recent progress in the field, it remains challenging to reliably assess outcomes and recommend appropriate algorithms for specific feature selection tasks. These are precisely the questions we tackle in this project.

Given its wide range of applications, touching nearly every ML pipeline, feature selection is a critical component of AI systems. This issue relates to automated machine learning ([**AutoML**](https://www.automl.org/automl/)), which aims to automate various stages of ML, including feature engineering and selection. This process is crucial for: (i) enhancing model performance by reducing noise and irrelevant information, (ii) minimizing environmental impact through reduced computational complexity and storage needs, and (iii) improving model **explainability**, thus providing more **meaningful insights** to users.

The complexity of feature selection stems from the **heterogeneous nature** of datasets and use cases, as well as the interplay between various elements:

- **Dataset characteristics**: Vary in terms of the number of observations, number of features, feature types (e.g., categorical, numerical), and the dependencies among them.
- **Output domain**: Can be classification (binary or multi-class) or regression.
- **ML models**: Different models may perform best with different subsets of features.
- **Evaluation metrics**: The choice of the score function can significantly influence the optimal feature subset.

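
The interplay between these elements can be illustrated by viewing each combination of dataset, ML model, and score as one function over bit strings, where bit j indicates whether feature j is selected. The sketch below is a hedged illustration in plain Python: the toy dataset, the leave-one-out 1-nearest-neighbor evaluation, the helper names, and the convention of scoring the empty subset as 0.0 are all assumptions made for this example, not part of the project's method.

```python
from itertools import product

# Hypothetical toy dataset: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]


def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    correct = 0
    for i, xi in enumerate(rows):
        others = (k for k in range(len(rows)) if k != i)
        nearest = min(
            others,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(xi, rows[k])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def make_objective(rows, labels, evaluate):
    """Build f: {0,1}^n -> score for a fixed dataset, model, and score."""
    def f(bits):
        cols = [j for j, b in enumerate(bits) if b]
        if not cols:
            return 0.0  # assumed convention for the empty feature subset
        projected = [[row[j] for j in cols] for row in rows]
        return evaluate(projected, labels)
    return f


f = make_objective(X, y, loo_1nn_accuracy)

# Full enumeration of the 2^n bit strings, feasible only for small n.
landscape = {bits: f(bits) for bits in product((0, 1), repeat=len(X[0]))}
```

Here `landscape[(1, 0)]` is 1.0 while `landscape[(1, 1)]` is 0.0: adding the noisy feature degrades this particular model. Passing a different `evaluate` routine (another model or score) would in general define a different function over the same bit strings.
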

By formulating feature selection as a pseudo-Boolean optimization problem, we can leverage powerful optimization techniques from local search and evolutionary computation while accounting for the specific characteristics of the underlying ML task.

Despite active research on feature selection in recent years and the wide range of existing algorithms, there is a lack of **systematic understanding** of the feature selection problem. This knowledge gap hinders our ability to explain the strengths and weaknesses of each method. When a particular algorithm fails to optimize a specific problem, current research often falls short in explaining why. Without comprehending the causes of failure, trust in algorithms erodes, and improving their performance becomes increasingly challenging. This, in turn, impedes overall progress in the field.

Our aim is to systematize feature selection techniques within the context of optimization and XAI by elucidating the fundamental nature of feature selection and related pseudo-Boolean optimization problems. Our methodology will leverage **landscape analysis** techniques to characterize the relationships between the properties of a feature selection problem and its associated dataset, ML model, and score function. This approach will help us infer landscape properties and identify potential avenues for developing **improved algorithms**. Furthermore, the insights gained from landscape analysis will be used to inform the design of optimization algorithms, leading to more effective optimization strategies.

Only two previous studies have investigated the landscape of feature selection problems, focusing on limited datasets, classification tasks only, one ML model, and a single score function [MME18, MM+19]. In contrast, our study aims to provide a comprehensive understanding of optimization problems and algorithms.
We will examine how diverse datasets, output domains, ML models, and score functions impact problem difficulty and properties, targeting a more **holistic view** of feature selection.

As such, this project will consist of a fundamental component and a more applied component. The **fundamental part** will involve defining landscapes to facilitate the analysis of structured spaces for optimization. The **applied component** will focus on designing problem-solving strategies informed by landscape analysis, primarily targeting feature selection. Both components of the project will be addressed concurrently and cohesively, allowing them to inform and enhance each other.

### Methodology

The project is organized into four work packages. Their contents and scientific methods are described in the next section. Here, we first explain our methodology in terms of project management, referred to as work package `WP0`.

**`WP0` Organization and dissemination methodology**

We aim to disseminate our scientific advances and developed methodologies to the scientific community by submitting our results to high-impact, top-ranked international **journals and conferences** in the fields of evolutionary optimization and artificial intelligence. We also plan to make our findings available in **open access** to ensure the **replicability** and **reproducibility** of our research outcomes. Indeed, given the experimental nature of the project, being able to repeat experiments and reach similar conclusions is crucial for achieving consensus on empirical claims. Similarly, being able to assume that experimental findings hold under similar conditions is key for decision-making and predicting outcomes in real-world applications.

**Project management** will involve scientific meetings and seminars held every semester. Each visit will focus on key technical exchanges for research execution and sharing progress in our joint research.
This aligns with the mission schedule, where French participants travel to Japan annually and vice versa. Two French and two Japanese researchers will travel each year. We also plan to request external funding to support additional missions. Additionally, participants will be encouraged to share their activities through smaller or online meetings. As discussed below, we will also work on setting up a **Memorandum of Understanding** (MoU) for academic cooperation and exchange between ULCO and YNU.

We will set up an open-access project **website** and a web **repository** with a versioning system. This will increase the project's visibility to the international community by providing descriptions of partners, scientific goals, and research outputs. All members will have access with editing rights and will be encouraged to share their progress, ongoing issues, and scheduled items. This will help record and monitor milestones across the project timeline. The website will also host any developed software, benchmarks, and output data.

In the later stages, we plan to organize a **workshop** or a **special session** on optimization methods for feature selection at a highly recognized international conference, such as GECCO or PPSN. This event will allow us to share our scientific results with a wider audience and expand our collaborative scientific network.

### Work programme and timetable

**`WP1` Contrasting feature selection under different ML models**

This work package aims to clarify **how the choice of an ML model influences the difficulty of feature selection**. Our preliminary study analyzed feature selection across various classification datasets and ML models using landscape analysis, relating it to the performance of feature selection algorithms. We considered these state-of-the-art ML classification algorithms: k-nearest neighbors, support vector classification, logistic regression, decision tree, random forests, and naive Bayes.
Our findings indicate that the difficulty is inherent to the landscape and **varies significantly across ML models**.

We plan to extend this study along several research lines. First, we will consider more **datasets**, ML **models**, and feature selection **algorithms** for both classification and **regression** tasks. Second, our findings suggest that how landscape difficulty changes with the number of **classes** and **observations** in the dataset needs further exploration. Third, we believe our methodology could help formalize the success of identifying the global optimum for established feature selection algorithms, like SFS. Lastly, we aim to improve the **explainability** of feature selection by analyzing features most often chosen by wrapper methods.

Beyond these, we will analyze feature selection across ML models from an **algebraic** perspective using a **Walsh representation**. Walsh functions form an orthogonal basis that can represent any pseudo-Boolean function. A Walsh model for each landscape (combination of dataset, ML model, and accuracy score) can be constructed through regression from solution examples. This enables exploration of algebraic properties in terms of polynomial expansion, sparsity, and Walsh **spectral analysis**. On top of that, using a Walsh representation for carefully selected feature selection problems may provide a relevant optimization **benchmark** without requiring extensive ML model training.

**`WP2` Characterizing feature selection under multiple scores**

This work package focuses on the impact of the **score** used in feature selection, which measures the quality of ML model predictions. For example, classification accuracy is defined as the number of correct predictions divided by the total number of observations. Other classification scores include balanced accuracy, F1 score, precision, and recall.
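
For reference, the classification scores mentioned above are commonly defined as follows in the binary case. The symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are notation introduced here for illustration and do not come from the proposal; multi-class variants typically average these quantities over classes.

```latex
\begin{align*}
\text{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{balanced accuracy} &= \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right), \\
\text{precision} &= \frac{TP}{TP + FP}, &
\text{recall} &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.
\end{align*}
```
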
For regression, typical measures are the coefficient of determination (R²), mean squared error, and mean absolute error. Similar to the ML model being used, we aim to clarify **how the choice of this score alters the difficulty of feature selection**.

In addition, previous studies revealed that feature selection problems often exhibit high **neutrality**, where different feature subsets yield the same prediction accuracy, forming **plateaus**. This significantly hinders the progress of feature selection algorithms, given that there is no criterion to distinguish among candidate solutions. However, it is worth noting that two solutions might have, say, equal classification accuracy but different F1 scores. Therefore, this work package will explore the benefits of using **tie-breaking rules** based on **multiple scores** to escape from plateaus.

In fact, some methods for handling equivalent solutions have been proposed in the context of multi-objective feature selection. Indeed, maximizing prediction accuracy and minimizing the number of selected features are natural objectives in feature selection. Prediction accuracy can be further defined using different scores that may all be optimized simultaneously. This makes feature selection an inherent **multi-objective problem**, requiring careful consideration and balancing of these competing goals. Users then seek not one solution, but multiple optimal trade-offs between these objectives, allowing them to choose the solution that best matches their preferences. Therefore, it becomes essential to delve into the **landscape of multi-objective feature selection problems** in order to gain insights into the complex interactions between datasets, models, and **scores**. Grasping these compromises and interactions may lead to more effective feature selection methods, ultimately improving model performance and reducing computational costs. The partners have extensive experience in analyzing **multi-objective landscapes**.
We are confident this exploration will enhance our knowledge and contribute to advancing techniques in various applications where feature selection is critical. Furthermore, beyond problem-solving, multi-objective feature selection may lead to new measures of **feature importance**, enhancing solution explainability and complementing existing measures from the ML literature.

**`WP3` Scaling to large-size feature selection and performance prediction**

An important limitation of existing landscape analyses for feature selection is that they require exhaustive enumeration of all feature subsets, which grows exponentially with the number of features. While this step is crucial for gaining unbiased knowledge of problem difficulties, it restricts studies to low-dimensional problems, i.e., datasets with typically fewer than 20 features.

This work package aims to **address the challenges raised by problems with a larger number of features**, thus moving beyond complete enumeration. We plan to address this using advanced sampling techniques from landscape analysis, including **random and adaptive walks**, along with collecting the trace and **trajectory** from actual feature selection algorithms like local search. Since score measures are specific functions known as sub-modular set functions, we will also explore solution space sparsity by considering a small number of selected features for landscape sampling. This will help clarify the difficulty caused by the problem dimension in terms of the number of features.

From a practical perspective, landscape sampling is necessary to **predict the performance** of feature selection algorithms and to automate the tedious task of choosing the most suitable algorithm for solving a new problem, a problem known as the **algorithm selection problem**. ML techniques can use landscape properties to predict the most efficient algorithm.
The challenge is to design and analyze landscape properties with a focus on their computational efficiency to make them useful for automated algorithm selection. To our knowledge, there are no examples of automated algorithm selection for feature selection, as its landscape itself has been scarcely studied. This work package first aims to clarify the usefulness of algorithm selection for feature selection, specifically examining whether there is any **complementarity among algorithms** in solving diverse instances. We will then evaluate how well the performance of each feature selection algorithm can be predicted and analyze the **impact of landscape properties** on performance prediction and selection. This should help identify **algorithm strengths and biases** and pinpoint feature selection problems where efficient algorithms are lacking and **new methods** are needed.

**`WP4` Generalizing to pseudo-Boolean optimization**

This work package aims to generalize the project’s methodology to other optimization tasks. Indeed, feature selection belongs to the class of pseudo-Boolean optimization problems. By clarifying the relationship between feature selection and pseudo-Boolean optimization, we could generalize our findings and establish effective empirical rules.

On the one hand, this work package will clarify the landscape of quadratic unconstrained binary optimization (**QUBO**) problems and explore solving them using combinatorial and numerical optimization methods. This is closely connected to the **PhD thesis** of Rita Arfoul (ULCO), set to begin in October 2024. QUBO problems have many applications and serve as input for **quantum optimization**. One focus is to develop and analyze new algorithms combining combinatorial and numerical optimization. Depending on whether a QUBO problem is formulated as a combinatorial or a continuous problem, it can be easier or harder to solve.
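For readers less familiar with QUBO, the following sketch shows the combinatorial view of the problem: minimizing x^T Q x over binary vectors x, here with exhaustive search and a one-bit-flip descent on a tiny instance. The matrix `Q_EXAMPLE` is an arbitrary illustrative instance, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Matrix = List[List[int]]

# Arbitrary 3-variable instance, for illustration only.
Q_EXAMPLE: Matrix = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]


def qubo_value(Q: Matrix, x: Sequence[int]) -> int:
    """Objective x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_min(Q: Matrix) -> Tuple[int, ...]:
    """Exact minimizer by enumeration; only feasible for very small n."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_value(Q, x))


def one_flip_descent(Q: Matrix, x: Sequence[int]) -> Tuple[int, ...]:
    """Accept improving single-bit flips until no flip improves."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x.copy()
            y[i] = 1 - y[i]
            if qubo_value(Q, y) < qubo_value(Q, x):
                x, improved = y, True
    return tuple(x)


if __name__ == "__main__":
    print("global minimum:", brute_force_min(Q_EXAMPLE))
    print("descent from (0, 0, 0):", one_flip_descent(Q_EXAMPLE, (0, 0, 0)))
    print("descent from (0, 1, 0):", one_flip_descent(Q_EXAMPLE, (0, 1, 0)))
```

Even on this small instance, the descent can stop at a local optimum that is not the global one depending on the starting point, which is the kind of difficulty landscape analysis aims to characterize.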
Similar to previous work packages, landscape analysis will help characterize the difficulty induced by different formulations. Conversely, we will also investigate the potential of (approximately) **formulating feature selection problems as QUBO** models and compare the pros and cons of both approaches.

On the other hand, **indicator-based subset selection** is a core issue in multi-objective optimization, involving the selection of the best subset from a large set of solutions to a multi-objective optimization problem. This process is useful not only as a post-processing phase but also at each iteration of multi-objective optimizers. Typically, such a set is maintained and iteratively improved by generating new solutions and discarding less interesting ones. This is the topic of the **Master thesis** of Keisuke Korogi (YNU), who is expected to continue into a PhD. Both French and Japanese team members have experience with indicator-based subset selection. This is a particularly challenging problem in combinatorial optimization, sensitive to the choice of indicator, much like feature selection is sensitive to the chosen ML model and score. This work package further aims to address the **landscape analysis of indicator-based subset selection**. Given the low sparsity of both this problem and feature selection, where the number of selected items is typically large, we expect similarities between the two. We will contrast the problem's properties and difficulties under different multi-objective indicators. We also plan to improve the **explainability** of the subsets obtained with different indicators.

**Timetable**

- **`WP0`** (Month 1 to 24): Kick-off (T0), website & repository (T0+3), MoU (T0+12), event co-organization (T0+20)
- **`WP1`** (Month 1 to 6): Tech. report (T0+6)
- **`WP2`** (Month 6 to 18): Tech. report (T0+18)
- **`WP3`** (Month 12 to 24): Tech. report (T0+24)
- **`WP4`** (Month 1 to 24): Tech. reports (T0+12 & T0+24)

*Technical reports are expected to be expanded into journal or conference papers.*

### Resources available for the implementation of the project

In addition to our internal research funding, this project will enhance our ability to maintain technical exchanges through research visits, which are essential for the success of this research initiative. These exchanges will help us refine our research lines and share our collaborative progress. Each participating institution will provide desks, facilities, and accommodation assistance for visiting foreign members, ensuring a comfortable and productive stay.

We also envision this project with a strong emphasis on **open data**, which is fundamental for promoting the dissemination and availability of publications, research data, and digital artifacts generated by the project in open access, thus enhancing their **replicability** and **reproducibility**. With the help of our institutions' technical teams, we plan to provide access to the code and data generated during the project. This will facilitate broader academic and practical applications, fostering an open and continuous research effort.

### Research infrastructures associated with the project

This project does not require any particular materials or facilities apart from **computational resources**. Yokohama National University in Japan will provide extensive computing resources to project members. This includes access to around ten high-performance computers, each equipped with many-core CPUs capable of efficiently handling complex computational tasks. The University of Littoral Opal Coast in France offers researchers access to an advanced computing platform designed for scientific research projects that demand intensive computational power. This platform, named [CalcUlco](https://www-calculco.univ-littoral.fr/), optimizes the use of computing resources managed by the university's scientific computing division.
The cluster currently features 1,624 cores distributed across 26 compute nodes, 12.66 TB of RAM, and about 800 TB of storage space. The total cumulative peak computing power is 180 TFlops. These infrastructures highlight our universities' commitment to supporting cutting-edge scientific research and fostering innovation through advanced computational resources. These resources will allow us to handle computation-intensive experiments and accommodate the vast amounts of data generated by this research. These infrastructures comply with sustainable development and energy transition policies, aiming to minimize environmental impact with innovative and efficient cooling solutions. They will significantly enhance the project's ability to conduct in-depth research and analysis.

### Participation and role of young researchers

The financial support for this project will promote researcher mobility between France and Japan and enhance collaboration among all team members towards new scientific challenges. Participation in project **seminars** and **workshops** will provide the involved **students** and **young researchers** with the opportunity to develop high-level expertise in optimization and evolutionary computation, as well as machine learning and feature selection. One PhD student from France (starting in October 2024) and one Master student from Japan (expected to advance to the doctoral course during the project) will actively participate in the different work packages. The exchange program will allow them to work under the **co-supervision** of top-tier researchers. During their stays abroad, students will have the opportunity to broaden their scientific knowledge, experience new research organizations, and benefit from diverse cultures, aligning with the goals of this research program. **Joint PhD** opportunities between ULCO and YNU are currently being explored.
This project aims to foster long-term student exchanges and offer researchers the chance to enhance their knowledge and establish a strong presence in the international scene of evolutionary computation and artificial intelligence. In particular, setting up a **Memorandum of Understanding** (MoU) for academic cooperation and exchange between the two universities is an integral part of our collaborative plan.

### Consideration of gender issues in the implementation of the project (participants and subject)

We acknowledge that significant progress towards gender balance still needs to be made in computer science and AI. Addressing gender issues in the context of this research project is crucial for fostering inclusivity and diversity and for creating an environment where everyone can thrive, regardless of gender. It is essential to actively engage female students and researchers, ensuring they have equal access to research opportunities and resources. As such, half of the French participants in this project (a PhD student and a young associate professor) are women.

Our long-term goal is to engage female students, facilitating research access for women and improving gender parity. This initiative should start at an early age, and French participants are actively involved in raising awareness among high school students, both male and female, to encourage them to pursue careers in research and academia and contribute to advancements in technology and innovation. We also plan to involve women as collaborators by seeking advice from renowned female researchers in the field, in particular in landscape analysis and feature selection. We plan to invite them as keynote speakers at co-organized workshops, which are scheduled for the second year of the project. We expect this will contribute to enhancing gender parity and highlighting role models for aspiring women scientists.
In terms of research subjects, it is essential to consider gender-specific impacts and ensure that the research outcomes are beneficial and accessible to all genders. Additionally, we must address any potential disparities that may arise during the research process and actively work to eliminate them. This will contribute to making our scientific advancements more equitable and impactful.

---

## Perspectives

### Expected results

Given our common research interests and complementary scientific expertise, we anticipate fruitful exchanges, discussions, and outcomes between the two partners. We have identified potential collaborations that align with the priorities and perspectives of both institutions. We are confident that our joint research program on emerging topics will lead to **high-impact outcomes**, including publications in top-tier venues and open access to code and data, ensuring the reproducibility of our research.

The insights on **feature selection** we aim to reveal in this research will help design more efficient optimization algorithms and may fundamentally change how optimization is performed for this core issue in XAI. This is indeed a key problem in ML, as it directly impacts the performance, accuracy, and environmental impact of prediction models. By selecting the most relevant features, we can significantly reduce computational costs and improve model interpretability. The broader implications of this research are vast, potentially leading to breakthroughs in a **wide range of applications** that will be further discussed below.

Beyond the expected research progress and impact foreseen by the consortium, this project will help **sustain** and **strengthen** the structure and institutionalization of our **bilateral international collaboration**. Our goal is to increase the visibility of these topics in the long run, fostering a deeper impact within the international community.
Our current joint international experiences amplify our commitment to **expanding cooperation between France and Japan**. Finally, we expect the collaboration initiated during this project to be beneficial for both parties, strengthening our research networks and **international visibility**. We anticipate that the long-term outcomes of this project will significantly impact the international scientific community. Additionally, we envision this partnership as an opportunity to foster **long-term relationships** and to extend the cooperation between our two universities. As such, this initiative will allow us to further explore various opportunities for academic exchanges between France and Japan.

### European and/or International Perspectives

Thanks to this project, we wish to promote and strengthen French/Japanese scientific cooperation by contributing to the mobility of researchers, thereby facilitating their ability to work across borders and exchange knowledge and expertise. We plan to leverage cooperation from this project and our involvement in international networks, such as the European COST Action "Randomized Optimization Algorithms Research Network" ([**ROAR-NET**](https://roar-net.eu/)), to establish a **collaborative project** at the French (or European) and Japanese levels. We aim to foster collaborative projects in the field of information and communication sciences and technologies, encouraging **joint research initiatives** that bring together researchers from both countries to innovate and advance these research fields.

In addition, we aim to build a bridge between the research communities working on evolutionary computation in both countries. To this end, we plan to explore the connection between the French Association on Artificial Evolution ([**EA**](https://sites.google.com/view/artificial-evolution/)) and the Japanese Society for Evolutionary Computation ([**JPNSEC**](http://www.jpnsec.org/)), where some project members are actively involved.
Both organizations have been instrumental in advancing the field of evolutionary computation in their respective countries. By fostering this connection, we hope to establish a joint **French/Japanese evolutionary computation community** where young and senior researchers, as well as research students, could meet during jointly organized events and share ideas, resources, and progress in the field.

### Industrial perspectives

It is widely acknowledged that AI research has made remarkable progress in recent years. This can be attributed to two factors: the evolution of computational science and the high demand from society for technology to handle complex tasks beyond human capabilities. However, core **AI technologies** often suffer from poor explanations of their recommendations. This project precisely aims to enhance the **explainability** of AI systems by improving the performance and understanding of **feature selection**.

Motivated by real-world applications identified during our academic and industrial collaborations, this research aims to tackle important issues in ML and XAI. Starting with the fundamental and methodological research proposed in this project, we plan to address complex optimization and ML tasks and develop AI algorithms for a **wide range of applications**. We have already discussed these opportunities and will incorporate them into our long-term cooperation. The results from this research are expected to be applied not only in information engineering, influencing areas such as data mining, pattern recognition, and bioinformatics, but also in various engineering fields, including electrical engineering, mechanical engineering, and aeronautics. For instance, **marine science** is a common denominator between ULCO and YNU due to their geographical location: both universities have a dedicated marine science department. Within this field, many applications involve various **classification tasks** in ML.
Under these tasks, feature selection is a must to ensure accurate and meaningful results.

More generally, **optimization** is at the core of many industrial applications, development, and innovation tasks. The French team has numerous **industrial contacts** specializing in solving optimization and ML problems, which often appear in the modeling of real-world applications. A significant opportunity for technology transfer is to consolidate the **software** resources for analyzing and solving the feature selection problem. This would provide a flexible and operational **toolbox** for a wide range of ML and XAI users.

---

## Teams

### Laboratory

| | France | Japan |
| --- | --- | --- |
| Name | Laboratoire d'Informatique Signal et Image de la Côte d'Opale | Faculty of Environment and Information Sciences |
| Acronym | LISIC | YNU |
| Address | Maison de la Recherche Blaise Pascal, 50 rue Ferdinand Buisson | Environment and Information Sciences, Building 1 |
| Postcode | 62228 | 240-8501 |
| City | Calais | Yokohama |
| Country | France | Japan |
| Phone | +33321463653 | +81453394425 |
| Website | http://lisic-prod.univ-littoral.fr/ | https://www.eis.ynu.ac.jp/english/ |
| Email | mailto:gaelle.compiegne@univ-littoral.fr | mailto:kankyojoho@ynu.ac.jp |
| Email DRI | mailto:international@univ-littoral.fr | xxx (no need) |
| Name of the doctoral school | École doctorale en Sciences, Technologie, Santé (ED-STS) | xxx (no need) |
| Number of the doctoral school | 585 | xxx (no need) |

### Laboratory Director

| | France | Japan |
| --- | --- | --- |
| Civility | Mister | Mister |
| Name | Verel | Mori |
| First name | Sébastien | Tatsunori |
| Nationality | France | Japan |
| Email | mailto:verel@univ-littoral.fr | mailto:tmori@ynu.ac.jp |

### **Supervising institution**

| | France | Japan |
| --- | --- | --- |
| Type | UNI | xxx (no need) |
| Name | Université du Littoral Côte d'Opale | Yokohama National University |
| Address | 1 place de l'Yser 59375 Dunkerque FRA | 79-8, Tokiwadai, Hodogaya-ku, Yokohama, Kanagawa JPN (240-8501) |
| Website | https://www.univ-littoral.fr/ | https://www.ynu.ac.jp/english/ |

### Members

| Civility | Name | First name | Nationality | Email | Status | HDR | CV | Birth | Role | Participation | Trip | Thesis | Start date | Topic | Defense |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M | Liefooghe | Arnaud | FR | mailto:arnaud.liefooghe@univ-littoral.fr | Professor | Y | todo | 13/01/1982 | Coord FR | 40% | Y | Y | 01/10/2007 | Multiobjective optimization | 2009 |
| M | Verel | Sébastien | FR | mailto:verel@univ-littoral.fr | Professor | Y | todo | | Expert optim | 20% | Y | Y | | Landscape analysis | |
| W | Tari (TBC) | Sara | FR | | Associate Professor | N | todo | | Expert LA | 20% | Y | Y | | Landscape analysis | |
| W | Arfoul | Rita | FR | | PhD student | N | todo | | Student | 20% | Y | Y | | Pseudo-Boolean optimization | |
| M | Tanabe | Ryoji | JP | mailto:tanabe-ryoji-sn@ynu.ac.jp | Associate Professor | N | todo | | Coord JP | | Y | Y | | | |
| M | Shirakawa | Shinichi | JP | | Professor | | | | | | Y | Y | | | |
| M | Korogi | Keisuke | JP | | Master student | N | | | | | Y | Y | | | |

---

## Budgets

### FR — Budget requested 2025 (**Mobilities requested**)

- Arnaud Liefooghe, 6 days (3080 EUR)
- Sébastien Verel, 6 days (3080 EUR)

### FR — Budget requested 2026 (**Mobilities requested**)

- Sara Tari, 6 days (3080 EUR)
- PhD student, 6 days (3080 EUR)

### JP — Budget requested 2025 (**Mobilities requested**)

- Ryoji Tanabe, 6 days
- Keisuke Korogi, 6 days

### JP — Budget requested 2026 (**Mobilities requested**)

- Ryoji Tanabe, 6 days
- Shinichi Shirakawa, 6 days

### **Other funding requested**

- Internal funding (ULCO / LISIC) [BQI] ⇒ 2000 EUR