We gratefully acknowledge support from
the Simons Foundation and member institutions.

Machine Learning

New submissions

[ total of 198 entries: 1-198 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Thu, 25 Apr 24

[1]  arXiv:2404.15379 [pdf, other]
Title: Clustering of timed sequences -- Application to the analysis of care pathways
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Improving the future of healthcare starts by better understanding the current actual practices in hospitals. This motivates the objective of discovering typical care pathways from patient data. Revealing homogeneous groups of care pathways can be achieved through clustering. The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms.
In this article, we adapt two methods developed for time series to time sequences: the drop-DTW metric and the DBA approach for the construction of averaged time sequences. These methods are then applied in clustering algorithms to propose original and sound clustering algorithms for timed sequences.
This approach is experimented with and evaluated on synthetic and real use cases.

[2]  arXiv:2404.15380 [pdf, other]
Title: ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Generating trajectory data is among promising solutions to addressing privacy concerns, collection costs, and proprietary restrictions usually associated with human mobility analyses. However, existing trajectory generation methods are still in their infancy due to the inherent diversity and unpredictability of human activities, grappling with issues such as fidelity, flexibility, and generalizability. To overcome these obstacles, we propose ControlTraj, a Controllable Trajectory generation framework with the topology-constrained diffusion model. Distinct from prior approaches, ControlTraj utilizes a diffusion model to generate high-fidelity trajectories while integrating the structural constraints of road network topology to guide the geographical outcomes. Specifically, we develop a novel road segment autoencoder to extract fine-grained road segment embedding. The encoded features, along with trip attributes, are subsequently merged into the proposed geographic denoising UNet architecture, named GeoUNet, to synthesize geographic trajectories from white noise. Through experimentation across three real-world data settings, ControlTraj demonstrates its ability to produce human-directed, high-fidelity trajectory generation with adaptability to unexplored geographical contexts.

[3]  arXiv:2404.15381 [pdf, other]
Title: Advances and Open Challenges in Federated Learning with Foundation Models
Comments: Survey of Federated Foundation Models (FedFM)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The integration of Foundation Models (FMs) with Federated Learning (FL) presents a transformative paradigm in Artificial Intelligence (AI), offering enhanced capabilities while addressing concerns of privacy, data decentralization, and computational efficiency. This paper provides a comprehensive survey of the emerging field of Federated Foundation Models (FedFM), elucidating their synergistic relationship and exploring novel methodologies, challenges, and future directions that the FL research field needs to focus on in order to thrive in the age of foundation models. A systematic multi-tiered taxonomy is proposed, categorizing existing FedFM approaches for model training, aggregation, trustworthiness, and incentivization. Key challenges, including how to enable FL to deal with high complexity of computational demands, privacy considerations, contribution evaluation, and communication efficiency, are thoroughly discussed. Moreover, the paper explores the intricate challenges of communication, scalability and security inherent in training/fine-tuning FMs via FL, highlighting the potential of quantum computing to revolutionize the training, inference, optimization and data encryption processes. This survey underscores the importance of further research to propel innovation in FedFM, emphasizing the need for developing trustworthy solutions. It serves as a foundational guide for researchers and practitioners interested in contributing to this interdisciplinary and rapidly advancing field.

[4]  arXiv:2404.15382 [pdf, other]
Title: Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection
Comments: accepted by ICMLCN24
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

In recent years, there has been a growing interest in using Machine Learning (ML), especially Deep Learning (DL) to solve Network Intrusion Detection (NID) problems. However, the feature distribution shift problem remains a difficulty, because the change in features' distributions over time negatively impacts the model's performance. As one promising solution, model pretraining has emerged as a novel training paradigm, which brings robustness against feature distribution shift and has proven to be successful in Computer Vision (CV) and Natural Language Processing (NLP). To verify whether this paradigm is beneficial for NID problem, we propose SwapCon, a ML model in the context of NID, which compresses shift-invariant feature information during the pretraining stage and refines during the finetuning stage. We exemplify the evidence of feature distribution shift using the Kyoto2006+ dataset. We demonstrate how pretraining a model with the proper size can increase robustness against feature distribution shifts by over 8%. Moreover, we show how an adequate numerical embedding strategy also enhances the performance of pretrained models. Further experiments show that the proposed SwapCon model also outperforms eXtreme Gradient Boosting (XGBoost) and K-Nearest Neighbor (KNN) based models by a large margin.

[5]  arXiv:2404.15384 [pdf, other]
Title: FL-TAC: Enhanced Fine-Tuning in Federated Learning via Low-Rank, Task-Specific Adapter Clustering
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Although large-scale pre-trained models hold great potential for adapting to downstream tasks through fine-tuning, the performance of such fine-tuned models is often limited by the difficulty of collecting sufficient high-quality, task-specific data. Federated Learning (FL) offers a promising solution by enabling fine-tuning across large-scale clients with a variety of task data, but it is bottlenecked by significant communication overhead due to the pre-trained models' extensive size. This paper addresses the high communication cost for fine-tuning large pre-trained models within FL frameworks through low-rank fine-tuning. Specifically, we train a low-rank adapter for each individual task on the client side, followed by server-side clustering for similar group of adapters to achieve task-specific aggregation. Extensive experiments on various language and vision tasks, such as GLUE and CIFAR-10/100, reveal the evolution of task-specific adapters throughout the FL training process and verify the effectiveness of the proposed low-rank task-specific adapter clustering (TAC) method.

[6]  arXiv:2404.15386 [pdf, ps, other]
Title: Large-Scale Multipurpose Benchmark Datasets For Assessing Data-Driven Deep Learning Approaches For Water Distribution Networks
Comments: Presented at WDSA CCWI, Ferrara, Italy, July 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Currently, the number of common benchmark datasets that researchers can use straight away for assessing data-driven deep learning approaches is very limited. Most studies provide data as configuration files. It is still up to each practitioner to follow a particular data generation method and run computationally intensive simulations to obtain usable data for model training and evaluation. In this work, we provide a collection of datasets that includes several small and medium size publicly available Water Distribution Networks (WDNs), including Anytown, Modena, Balerma, C-Town, D-Town, L-Town, Ky1, Ky6, Ky8, and Ky13. In total 1,394,400 hours of WDNs data operating under normal conditions is made available to the community.

[7]  arXiv:2404.15388 [pdf, other]
Title: ML-based identification of the interface regions for coupling local and nonlocal models
Comments: 23 pages, 14 figures, research paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Local-nonlocal coupling approaches combine the computational efficiency of local models and the accuracy of nonlocal models. However, the coupling process is challenging, requiring expertise to identify the interface between local and nonlocal regions. This study introduces a machine learning-based approach to automatically detect the regions in which the local and nonlocal models should be used in a coupling approach. This identification process uses the loading functions and provides as output the selected model at the grid points. Training is based on datasets of loading functions for which reference coupling configurations are computed using accurate coupled solutions, where accuracy is measured in terms of the relative error between the solution to the coupling approach and the solution to the nonlocal model. We study two approaches that differ from one another in terms of the data structure. The first approach, referred to as the full-domain input data approach, inputs the full load vector and outputs a full label vector. In this case, the classification process is carried out globally. The second approach consists of a window-based approach, where loads are preprocessed and partitioned into windows and the problem is formulated as a node-wise classification approach in which the central point of each window is treated individually. The classification problems are solved via deep learning algorithms based on convolutional neural networks. The performance of these approaches is studied on one-dimensional numerical examples using F1-scores and accuracy metrics. In particular, it is shown that the windowing approach provides promising results, achieving an accuracy of 0.96 and an F1-score of 0.97. These results underscore the potential of the approach to automate coupling processes, leading to more accurate and computationally efficient solutions for material science applications.

[8]  arXiv:2404.15390 [pdf, other]
Title: Uncertainty in latent representations of variational autoencoders optimized for visual tasks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep learning methods are increasingly becoming instrumental as modeling tools in computational neuroscience, employing optimality principles to build bridges between neural responses and perception or behavior. Developing models that adequately represent uncertainty is however challenging for deep learning methods, which often suffer from calibration problems. This constitutes a difficulty in particular when modeling cortical circuits in terms of Bayesian inference, beyond single point estimates such as the posterior mean or the maximum a posteriori. In this work we systematically studied uncertainty representations in latent representations of variational auto-encoders (VAEs), both in a perceptual task from natural images and in two other canonical tasks of computer vision, finding a poor alignment between uncertainty and informativeness or ambiguities in the images. We next showed how a novel approach which we call explaining-away variational auto-encoders (EA-VAEs), fixes these issues, producing meaningful reports of uncertainty in a variety of scenarios, including interpolation, image corruption, and even out-of-distribution detection. We show EA-VAEs may prove useful both as models of perception in computational neuroscience and as inference tools in computer vision.

[9]  arXiv:2404.15392 [pdf, ps, other]
Title: Naïve Bayes and Random Forest for Crop Yield Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This study analyzes crop yield prediction in India from 1997 to 2020, focusing on various crops and key environmental factors. It aims to predict agricultural yields by utilizing advanced machine learning techniques like Linear Regression, Decision Tree, KNN, Na\"ive Bayes, K-Mean Clustering, and Random Forest. The models, particularly Na\"ive Bayes and Random Forest, demonstrate high effectiveness, as shown through data visualizations. The research concludes that integrating these analytical methods significantly enhances the accuracy and reliability of crop yield predictions, offering vital contributions to agricultural data science.

[10]  arXiv:2404.15409 [pdf, ps, other]
Title: Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares
Comments: 42 pages, 3 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix. All prior private algorithms for this task require either $d^{3/2}$ examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.

[11]  arXiv:2404.15417 [pdf, other]
Title: The Power of Resets in Online Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simulators through online reinforcement learning with {local simulator access} (or, local planning), an RL protocol where the agent is allowed to reset to previously observed states and follow their dynamics during training. We use local simulator access to unlock new statistical guarantees that were previously out of reach:
- We show that MDPs with low coverability Xie et al. 2023 -- a general structural condition that subsumes Block MDPs and Low-Rank MDPs -- can be learned in a sample-efficient fashion with only $Q^{\star}$-realizability (realizability of the optimal state-value function); existing online RL algorithms require significantly stronger representation conditions.
- As a consequence, we show that the notorious Exogenous Block MDP problem Efroni et al. 2022 is tractable under local simulator access.
The results above are achieved through a computationally inefficient algorithm. We complement them with a more computationally efficient algorithm, RVFS (Recursive Value Function Search), which achieves provable sample complexity guarantees under a strengthened statistical assumption known as pushforward coverability. RVFS can be viewed as a principled, provable counterpart to a successful empirical paradigm that combines recursive search (e.g., MCTS) with value function approximation.

[12]  arXiv:2404.15418 [pdf, other]
Title: Machine Learning Techniques with Fairness for Prediction of Completion of Drug and Alcohol Rehabilitation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The aim of this study is to look at predicting whether a person will complete a drug and alcohol rehabilitation program and the number of times a person attends. The study is based on demographic data obtained from Substance Abuse and Mental Health Services Administration (SAMHSA) from both admissions and discharge data from drug and alcohol rehabilitation centers in Oklahoma. Demographic data is highly categorical which led to binary encoding being used and various fairness measures being utilized to mitigate bias of nine demographic variables. Kernel methods such as linear, polynomial, sigmoid, and radial basis functions were compared using support vector machines at various parameter ranges to find the optimal values. These were then compared to methods such as decision trees, random forests, and neural networks. Synthetic Minority Oversampling Technique Nominal (SMOTEN) for categorical data was used to balance the data with imputation for missing data. The nine bias variables were then intersectionalized to mitigate bias and the dual and triple interactions were integrated to use the probabilities to look at worst case ratio fairness mitigation. Disparate Impact, Statistical Parity difference, Conditional Statistical Parity Ratio, Demographic Parity, Demographic Parity Ratio, Equalized Odds, Equalized Odds Ratio, Equal Opportunity, and Equalized Opportunity Ratio were all explored at both the binary and multiclass scenarios.

[13]  arXiv:2404.15471 [pdf, other]
Title: Training all-mechanical neural networks for task learning through in situ backpropagation
Comments: 25 pages, 5 figures
Subjects: Machine Learning (cs.LG); Applied Physics (physics.app-ph)

Recent advances unveiled physical neural networks as promising machine learning platforms, offering faster and more energy-efficient information processing. Compared with extensively-studied optical neural networks, the development of mechanical neural networks (MNNs) remains nascent and faces significant challenges, including heavy computational demands and learning with approximate gradients. Here, we introduce the mechanical analogue of in situ backpropagation to enable highly efficient training of MNNs. We demonstrate that the exact gradient can be obtained locally in MNNs, enabling learning through their immediate vicinity. With the gradient information, we showcase the successful training of MNNs for behavior learning and machine learning tasks, achieving high accuracy in regression and classification. Furthermore, we present the retrainability of MNNs involving task-switching and damage, demonstrating the resilience. Our findings, which integrate the theory for training MNNs and experimental and numerical validations, pave the way for mechanical machine learning hardware and autonomous self-learning material systems.

[14]  arXiv:2404.15503 [pdf, other]
Title: FedGreen: Carbon-aware Federated Learning with Model Size Adaptation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated learning (FL) provides a promising collaborative framework to build a model from distributed clients, and this work investigates the carbon emission of the FL process. Cloud and edge servers hosting FL clients may exhibit diverse carbon footprints influenced by their geographical locations with varying power sources, offering opportunities to reduce carbon emissions by training local models with adaptive computations and communications. In this paper, we propose FedGreen, a carbon-aware FL approach to efficiently train models by adopting adaptive model sizes shared with clients based on their carbon profiles and locations using ordered dropout as a model compression technique. We theoretically analyze the trade-offs between the produced carbon emissions and the convergence accuracy, considering the carbon intensity discrepancy across countries to choose the parameters optimally. Empirical studies show that FedGreen can substantially reduce the carbon footprints of FL compared to the state-of-the-art while maintaining competitive model accuracy.

[15]  arXiv:2404.15518 [pdf, other]
Title: An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling. We reformulate the typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution. Theoretically, our analysis draws connections between the solutions of linear TD learning and ordinary least squares (OLS). We also show that under specific conditions, particularly when noises are correlated, the TD's solution proves to be a more effective estimator than OLS. Furthermore, we establish the convergence of our generalized TD algorithms under linear function approximation. Empirical studies verify our theoretical results, examine the vital design of our TD algorithm and show practical utility across various datasets, encompassing tasks such as regression and image classification with deep learning.

[16]  arXiv:2404.15585 [pdf, other]
Title: Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)

The application of deep learning techniques to medical problems has garnered widespread research interest in recent years, such as applying convolutional neural networks to medical image classification tasks. However, data in the medical field is often highly private, preventing different hospitals from sharing data to train an accurate model. Federated learning, as a privacy-preserving machine learning architecture, has shown promising performance in balancing data privacy and model utility by keeping private data on the client's side and using a central server to coordinate a set of clients for model training through aggregating their uploaded model parameters. Yet, this architecture heavily relies on a trusted third-party server, which is challenging to achieve in real life. Swarm learning, as a specialized decentralized federated learning architecture that does not require a central server, utilizes blockchain technology to enable direct parameter exchanges between clients. However, the mining of blocks requires significant computational resources, limiting its scalability. To address this issue, this paper integrates the brain storm optimization algorithm into the swarm learning framework, named BSO-SL. This approach clusters similar clients into different groups based on their model distributions. Additionally, leveraging the architecture of BSO, clients are given the probability to engage in collaborative learning both within their cluster and with clients outside their cluster, preventing the model from converging to local optima. The proposed method has been validated on a real-world diabetic retinopathy image classification dataset, and the experimental results demonstrate the effectiveness of the proposed approach.

[17]  arXiv:2404.15593 [pdf, other]
Title: A Survey of Deep Long-Tail Classification Advancements
Authors: Charika de Alvis (The University of Sydney, Australia), Suranga Seneviratne (The University of Sydney, Australia)
Subjects: Machine Learning (cs.LG)

Many data distributions in the real world are hardly uniform. Instead, skewed and long-tailed distributions of various kinds are commonly observed. This poses an interesting problem for machine learning, where most algorithms assume or work well with uniformly distributed data. The problem is further exacerbated by current state-of-the-art deep learning models requiring large volumes of training data. As such, learning from imbalanced data remains a challenging research problem and a problem that must be solved as we move towards more real-world applications of deep learning. In the context of class imbalance, state-of-the-art (SOTA) accuracies on standard benchmark datasets for classification typically fall less than 75%, even for less challenging datasets such as CIFAR100. Nonetheless, there has been progress in this niche area of deep learning. To this end, in this survey, we provide a taxonomy of various methods proposed for addressing the problem of long-tail classification, focusing on works that happened in the last few years under a single mathematical framework. We also discuss standard performance metrics, convergence studies, feature distribution and classifier analysis. We also provide a quantitative comparison of the performance of different SOTA methods and conclude the survey by discussing the remaining challenges and future research direction.

[18]  arXiv:2404.15595 [pdf, other]
Title: Variational Deep Survival Machines: Survival Regression with Censored Outcomes
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)

Survival regression aims to predict the time when an event of interest will take place, typically a death or a failure. A fully parametric method [18] is proposed to estimate the survival function as a mixture of individual parametric distributions in the presence of censoring. In this paper, We present a novel method to predict the survival time by better clustering the survival data and combine primitive distributions. We propose two variants of variational auto-encoder (VAE), discrete and continuous, to generate the latent variables for clustering input covariates. The model is trained end to end by jointly optimizing the VAE loss and regression loss. Thorough experiments on dataset SUPPORT and FLCHAIN show that our method can effectively improve the clustering result and reach competitive scores with previous methods. We demonstrate the superior result of our model prediction in the long-term. Our code is available at https://github.com/qinzzz/auton-survival-785.

[19]  arXiv:2404.15598 [pdf, other]
Title: Federated Learning with Only Positive Labels by Exploring Label Correlations
Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue can be addressed by adding a specially designed regularizer on the server-side. Although effective sometimes, the label correlations are simply ignored and thus sub-optimal performance may be obtained. Besides, it is expensive and unsafe to exchange user's private embeddings between server and clients frequently, especially when training model in the contrastive way. To remedy these drawbacks, we propose a novel and generic method termed Federated Averaging by exploring Label Correlations (FedALC). Specifically, FedALC estimates the label correlations in the class embedding learning for different label pairs and utilizes it to improve the model training. To further improve the safety and also reduce the communication overhead, we propose a variant to learn fixed class embedding for each client, so that the server and clients only need to exchange class embeddings once. Extensive experiments on multiple popular datasets demonstrate that our FedALC can significantly outperform existing counterparts.

[20]  arXiv:2404.15617 [pdf, other]
Title: DPO: Differential reinforcement learning with application to optimal configuration search
Comments: 24 pages, 1 figure, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST)

Reinforcement learning (RL) with continuous state and action spaces remains one of the most challenging problems within the field. Most current learning methods focus on integral identities such as value functions to derive an optimal strategy for the learning agent. In this paper, we instead study the dual form of the original RL formulation to propose the first differential RL framework that can handle settings with limited training samples and short-length episodes. Our approach introduces Differential Policy Optimization (DPO), a pointwise and stage-wise iteration method that optimizes policies encoded by local-movement operators. We prove a pointwise convergence estimate for DPO and provide a regret bound comparable with current theoretical works. Such pointwise estimate ensures that the learned policy matches the optimal path uniformly across different steps. We then apply DPO to a class of practical RL problems which search for optimal configurations with Lagrangian rewards. DPO is easy to implement, scalable, and shows competitive results on benchmarking experiments against several popular RL methods.

[21]  arXiv:2404.15622 [pdf, other]
Title: FR-NAS: Forward-and-Reverse Graph Predictor for Efficient Neural Architecture Search
Comments: IJCNN'24
Subjects: Machine Learning (cs.LG)

Neural Architecture Search (NAS) has emerged as a key tool in identifying optimal configurations of deep neural networks tailored to specific tasks. However, training and assessing numerous architectures introduces considerable computational overhead. One method to mitigating this is through performance predictors, which offer a means to estimate the potential of an architecture without exhaustive training. Given that neural architectures fundamentally resemble Directed Acyclic Graphs (DAGs), Graph Neural Networks (GNNs) become an apparent choice for such predictive tasks. Nevertheless, the scarcity of training data can impact the precision of GNN-based predictors. To address this, we introduce a novel GNN predictor for NAS. This predictor renders neural architectures into vector representations by combining both the conventional and inverse graph views. Additionally, we incorporate a customized training loss within the GNN predictor to ensure efficient utilization of both types of representations. We subsequently assessed our method through experiments on benchmark datasets including NAS-Bench-101, NAS-Bench-201, and the DARTS search space, with a training dataset ranging from 50 to 400 samples. Benchmarked against leading GNN predictors, the experimental results showcase a significant improvement in prediction accuracy, with a 3%--16% increase in Kendall-tau correlation. Source codes are available at https://github.com/EMI-Group/fr-nas.

[22]  arXiv:2404.15625 [pdf, other]
Title: Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models
Comments: 11 pages,10 figures
Subjects: Machine Learning (cs.LG)

The open-world test dataset is often mixed with out-of-distribution (OOD) samples, where the deployed models will struggle to make accurate predictions. Traditional detection methods need to trade off OOD detection and in-distribution (ID) classification performance since they share the same representation learning model. In this work, we propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs. Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection. Although it is conceptually simple, extending this vanilla framework to practical detection applications is still limited by two significant challenges. First, the popular similarity metrics based on Euclidian distance fail to consider the complex graph structure. Second, the generative model involving iterative denoising steps is time-consuming especially when it runs on the enormous pool of drugs. To address these challenges, our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed as PGR-MOOD and hinges on three innovations: i) An effective metric to comprehensively quantify the matching degree of input and reconstructed molecules; ii) A creative graph generator to construct prototypical graphs that are in line with ID but away from OOD; iii) An efficient and scalable OOD detector to compare the similarity between test samples and pre-constructed prototypical graphs and omit the generative process on every new molecule. Extensive experiments on ten benchmark datasets and six baselines are conducted to demonstrate our superiority.

[23]  arXiv:2404.15656 [pdf, other]
Title: MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Emerging vulnerabilities in machine learning (ML) models due to adversarial attacks raise concerns about their reliability. Specifically, evasion attacks manipulate models by introducing precise perturbations to input data, causing erroneous predictions. To address this, we propose a methodology combining SHapley Additive exPlanations (SHAP) for feature importance analysis with an innovative Optimal Epsilon technique for conducting evasion attacks. Our approach begins with SHAP-based analysis to understand model vulnerabilities, crucial for devising targeted evasion strategies. The Optimal Epsilon technique, employing a Binary Search algorithm, efficiently determines the minimum epsilon needed for successful evasion. Evaluation across diverse machine learning architectures demonstrates the technique's precision in generating adversarial samples, underscoring its efficacy in manipulating model outcomes. This study emphasizes the critical importance of continuous assessment and monitoring to identify and mitigate potential security risks in machine learning systems.

[24]  arXiv:2404.15657 [pdf, other]
Title: FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

While deep neural networks (DNNs) based personalized federated learning (PFL) is demanding for addressing data heterogeneity and shows promising performance, existing methods for federated learning (FL) suffer from efficient systematic uncertainty quantification. The Bayesian DNNs-based PFL is usually questioned of either over-simplified model structures or high computational and memory costs. In this paper, we introduce FedSI, a novel Bayesian DNNs-based subnetwork inference PFL framework. FedSI is simple and scalable by leveraging Bayesian methods to incorporate systematic uncertainties effectively. It implements a client-specific subnetwork inference mechanism, selects network parameters with large variance to be inferred through posterior distributions, and fixes the rest as deterministic ones. FedSI achieves fast and scalable inference while preserving the systematic uncertainties to the fullest extent. Extensive experiments on three different benchmark datasets demonstrate that FedSI outperforms existing Bayesian and non-Bayesian FL baselines in heterogeneous FL scenarios.

[25]  arXiv:2404.15673 [pdf, other]
Title: Augmented CARDS: A machine learning approach to identifying triggers of climate change misinformation on Twitter
Subjects: Machine Learning (cs.LG)

Misinformation about climate change poses a significant threat to societal well-being, prompting the urgent need for effective mitigation strategies. However, the rapid proliferation of online misinformation on social media platforms outpaces the ability of fact-checkers to debunk false claims. Automated detection of climate change misinformation offers a promising solution. In this study, we address this gap by developing a two-step hierarchical model, the Augmented CARDS model, specifically designed for detecting contrarian climate claims on Twitter. Furthermore, we apply the Augmented CARDS model to five million climate-themed tweets over a six-month period in 2022. We find that over half of contrarian climate claims on Twitter involve attacks on climate actors or conspiracy theories. Spikes in climate contrarianism coincide with one of four stimuli: political events, natural events, contrarian influencers, or convinced influencers. Implications for automated responses to climate misinformation are discussed.

[26]  arXiv:2404.15691 [pdf, other]
Title: Long-term Off-Policy Evaluation and Learning
Comments: TheWebConference 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term outcome is to run an online experiment or A/B test for the potential algorithms, but it takes months or even longer to observe the long-term outcomes of interest, making the algorithm selection process unacceptably slow. This work thus studies the problem of feasibly yet accurately estimating the long-term outcome of an algorithm using only historical and short-term experiment data. Existing approaches to this problem either need a restrictive assumption about the short-term outcomes called surrogacy or cannot effectively use short-term outcomes, which is inefficient. Therefore, we propose a new framework called Long-term Off-Policy Evaluation (LOPE), which is based on reward function decomposition. LOPE works under a more relaxed assumption than surrogacy and effectively leverages short-term rewards to substantially reduce the variance. Synthetic experiments show that LOPE outperforms existing approaches particularly when surrogacy is severely violated and the long-term reward is noisy. In addition, real-world experiments on large-scale A/B test data collected on a music streaming platform show that LOPE can estimate the long-term outcome of actual algorithms more accurately than existing feasible methods.

[27]  arXiv:2404.15692 [pdf, other]
Title: Deep Learning for Accelerated and Robust MRI Reconstruction: a Review
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Deep learning (DL) has recently emerged as a pivotal technology for enhancing magnetic resonance imaging (MRI), a critical tool in diagnostic radiology. This review paper provides a comprehensive overview of recent advances in DL for MRI reconstruction. It focuses on DL approaches and architectures designed to improve image quality, accelerate scans, and address data-related challenges. These include end-to-end neural networks, pre-trained networks, generative models, and self-supervised methods. The paper also discusses the role of DL in optimizing acquisition protocols, enhancing robustness against distribution shifts, and tackling subtle bias. Drawing on the extensive literature and practical insights, it outlines current successes, limitations, and future directions for leveraging DL in MRI reconstruction, while emphasizing the potential of DL to significantly impact clinical imaging practices.

[28]  arXiv:2404.15704 [pdf, other]
Title: Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning
Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limits improvements. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We make three detailed explanations of why this works and experimental results demonstrate that our method more efficiently improves performance compared to traditional MMF. Furthermore, attribution analysis validates the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.

[29]  arXiv:2404.15729 [pdf, other]
Title: Gradformer: Graph Transformer with Exponential Decay
Comments: 9 pages, 7 figures. Accepted by IJCAI 2024
Subjects: Machine Learning (cs.LG)

Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for the graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness is still suboptimal analytically. Therefore, this paper presents Gradformer, a method innovatively integrating GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask matrix diminish exponentially, correlating with the decreasing node proximities within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. Such an design diversifies the attention heads, enabling a more effective assimilation of diverse structural information within the graph. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms the Graph Neural Network and GT baseline models in various graph classification and regression tasks. Additionally, Gradformer has proven to be an effective method for training deep GT models, maintaining or even enhancing accuracy compared to shallow models as the network deepens, in contrast to the significant accuracy drop observed in other GT models.Codes are available at \url{https://github.com/LiuChuang0059/Gradformer}.

[30]  arXiv:2404.15731 [pdf, other]
Title: MD-NOMAD: Mixture density nonlinear manifold decoder for emulating stochastic differential equations and uncertainty propagation
Subjects: Machine Learning (cs.LG)

We propose a neural operator framework, termed mixture density nonlinear manifold decoder (MD-NOMAD), for stochastic simulators. Our approach leverages an amalgamation of the pointwise operator learning neural architecture nonlinear manifold decoder (NOMAD) with mixture density-based methods to estimate conditional probability distributions for stochastic output functions. MD-NOMAD harnesses the ability of probabilistic mixture models to estimate complex probability and the high-dimensional scalability of pointwise neural operator NOMAD. We conduct empirical assessments on a wide array of stochastic ordinary and partial differential equations and present the corresponding results, which highlight the performance of the proposed framework.

[31]  arXiv:2404.15744 [pdf, other]
Title: A General Black-box Adversarial Attack on Graph-based Fake News Detectors
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown for attackers in a black-box scenario, it is unrealistic to conduct the classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box adversarial attack framework, i.e., General Attack via Fake Social Interaction (GAFSI), against detectors based on different graph structures. Specifically, as sharing is an important social interaction for GNN-based fake news detectors to construct the graph, we simulate sharing behaviors to fool the detectors. Firstly, we propose a fraudster selection module to select engaged users leveraging local and global information. In addition, a post injection module guides the selected users to create shared relations by sending posts. The sharing records will be added to the social context, leading to a general attack against different detectors. Experimental results on empirical datasets demonstrate the effectiveness of GAFSI.

[32]  arXiv:2404.15757 [pdf, other]
Title: Exploring Machine Learning Algorithms for Infection Detection Using GC-IMS Data: A Preliminary Study
Comments: EEITE 2024, IEEE Conference
Subjects: Machine Learning (cs.LG)

The developing field of enhanced diagnostic techniques in the diagnosis of infectious diseases, constitutes a crucial domain in modern healthcare. By utilizing Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data and incorporating machine learning algorithms into one platform, our research aims to tackle the ongoing issue of precise infection identification. Inspired by these difficulties, our goals consist of creating a strong data analytics process, enhancing machine learning (ML) models, and performing thorough validation for clinical applications. Our research contributes to the emerging field of advanced diagnostic technologies by integrating Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) data and machine learning algorithms within a unified Laboratory Information Management System (LIMS) platform. Preliminary trials demonstrate encouraging levels of accuracy when employing various ML algorithms to differentiate between infected and non-infected samples. Continuing endeavors are currently concentrated on enhancing the effectiveness of the model, investigating techniques to clarify its functioning, and incorporating many types of data to further support the early detection of diseases.

[33]  arXiv:2404.15760 [pdf, other]
Title: Debiasing Machine Unlearning with Counterfactual Examples
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which leads to the contamination of the remaining dataset, thereby degrading model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels. Typically, we introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. Besides, we guide the forgetting procedure by leveraging counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on evaluation metrics.

[34]  arXiv:2404.15766 [pdf, other]
Title: Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Bayesian flow networks (BFNs) iteratively refine the parameters, instead of the samples in diffusion models (DMs), of distributions at various noise levels through Bayesian inference. Owing to its differentiable nature, BFNs are promising in modeling both continuous and discrete data, while simultaneously maintaining fast sampling capabilities. This paper aims to understand and enhance BFNs by connecting them with DMs through stochastic differential equations (SDEs). We identify the linear SDEs corresponding to the noise-addition processes in BFNs, demonstrate that BFN's regression losses are aligned with denoise score matching, and validate the sampler in BFN as a first-order solver for the respective reverse-time SDE. Based on these findings and existing recipes of fast sampling in DMs, we propose specialized solvers for BFNs that markedly surpass the original BFN sampler in terms of sample quality with a limited number of function evaluations (e.g., 10) on both image and text datasets. Notably, our best sampler achieves an increase in speed of 5~20 times for free. Our code is available at https://github.com/ML-GSAI/BFN-Solver.

[35]  arXiv:2404.15772 [pdf, other]
Title: Bi-Mamba4TS: Bidirectional Mamba for Time Series Forecasting
Subjects: Machine Learning (cs.LG)

Long-term time series forecasting (LTSF) provides longer insights into future trends and patterns. In recent years, deep learning models especially Transformers have achieved advanced performance in LTSF tasks. However, the quadratic complexity of Transformers rises the challenge of balancing computaional efficiency and predicting performance. Recently, a new state space model (SSM) named Mamba is proposed. With the selective capability on input data and the hardware-aware parallel computing algorithm, Mamba can well capture long-term dependencies while maintaining linear computational complexity. Mamba has shown great ability for long sequence modeling and is a potential competitor to Transformer-based models in LTSF. In this paper, we propose Bi-Mamba4TS, a bidirectional Mamba for time series forecasting. To address the sparsity of time series semantics, we adopt the patching technique to enrich the local information while capturing the evolutionary patterns of time series in a finer granularity. To select more appropriate modeling method based on the characteristics of the dataset, our model unifies the channel-independent and channel-mixing tokenization strategies and uses a series-relation-aware decider to control the strategy choosing process. Extensive experiments on seven real-world datasets show that our model achieves more accurate predictions compared with state-of-the-art methods.

[36]  arXiv:2404.15778 [pdf, other]
Title: BASS: Batched Attention-optimized Speculative Sampling
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses and how to perform speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. This paper describes a system of batched speculative decoding that sets a new state of the art in multi-sequence generation latency and that demonstrates superior GPU utilization as well as quality of generations within a time budget. For example, for a 7.8B-size model on a single A100 GPU and with a batch size of 8, each sequence is generated at an average speed of 5.8ms per token, the overall throughput being 1.1K tokens per second. These results represent state-of-the-art latency and a 2.15X speed-up over optimized regular decoding. Within a time budget that regular decoding does not finish, our system is able to generate sequences with HumanEval Pass@First of 43% and Pass@All of 61%, far exceeding what's feasible with single-sequence speculative decoding. Our peak GPU utilization during decoding reaches as high as 15.8%, more than 3X the highest of that of regular decoding and around 10X of single-sequence speculative decoding.

[37]  arXiv:2404.15784 [pdf, other]
Title: An Empirical Study of Aegis
Comments: 9 pages, 6 figures, 3 tables
Subjects: Machine Learning (cs.LG)

Bit flipping attacks are one class of attacks on neural networks with numerous defense mechanisms invented to mitigate its potency. Due to the importance of ensuring the robustness of these defense mechanisms, we perform an empirical study on the Aegis framework. We evaluate the baseline mechanisms of Aegis on low-entropy data (MNIST), and we evaluate a pre-trained model with the mechanisms fine-tuned on MNIST. We also compare the use of data augmentation to the robustness training of Aegis, and how Aegis performs under other adversarial attacks, such as the generation of adversarial examples. We find that both the dynamic-exit strategy and robustness training of Aegis has some drawbacks. In particular, we see drops in accuracy when testing on perturbed data, and on adversarial examples, as compared to baselines. Moreover, we found that the dynamic exit-strategy loses its uniformity when tested on simpler datasets. The code for this project is available on GitHub.

[38]  arXiv:2404.15804 [pdf, other]
Title: GeckOpt: LLM System Efficiency via Intent-Based Tool Selection
Comments: GLSVLSI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In this preliminary study, we investigate a GPT-driven intent-based reasoning approach to streamline tool selection for large language models (LLMs) aimed at system efficiency. By identifying the intent behind user prompts at runtime, we narrow down the API toolset required for task execution, reducing token consumption by up to 24.6\%. Early results on a real-world, massively parallel Copilot platform with over 100 GPT-4-Turbo nodes show cost reductions and potential towards improving LLM-based system efficiency.

[39]  arXiv:2404.15806 [pdf, other]
Title: Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders
Comments: 9 pages, 3 Figures. Accepted by IJCAI 2024
Subjects: Machine Learning (cs.LG)

Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the potential of leveraging the graph's structural composition as a fundamental and unique prior in the masked pre-training process. To this end, we introduce a novel structure-guided masking strategy (i.e., StructMAE), designed to refine the existing GMAE models. StructMAE involves two steps: 1) Structure-based Scoring: Each node is evaluated and assigned a score reflecting its structural significance. Two distinct types of scoring manners are proposed: predefined and learnable scoring. 2) Structure-guided Masking: With the obtained assessment scores, we develop an easy-to-hard masking strategy that gradually increases the structural awareness of the self-supervised reconstruction task. Specifically, the strategy begins with random masking and progresses to masking structure-informative nodes based on the assessment scores. This design gradually and effectively guides the model in learning graph structural information. Furthermore, extensive experiments consistently demonstrate that our StructMAE method outperforms existing state-of-the-art GMAE models in both unsupervised and transfer learning tasks. Codes are available at https://github.com/LiuChuang0059/StructMAE.

[40]  arXiv:2404.15814 [pdf, other]
Title: Fast Ensembling with Diffusion Schrödinger Bridge
Journal-ref: ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep Ensemble (DE) approach is a straightforward technique used to enhance the performance of deep neural networks by training them from different initial points, converging towards various local optima. However, a limitation of this methodology lies in its high computational overhead for inference, arising from the necessity to store numerous learned parameters and execute individual forward passes for each parameter during the inference stage. We propose a novel approach called Diffusion Bridge Network (DBN) to address this challenge. Based on the theory of the Schr\"odinger bridge, this method directly learns to simulate an Stochastic Differential Equation (SDE) that connects the output distribution of a single ensemble member to the output distribution of the ensembled model, allowing us to obtain ensemble prediction without having to invoke forward pass through all the ensemble models. By substituting the heavy ensembles with this lightweight neural network constructing DBN, we achieved inference with reduced computational cost while maintaining accuracy and uncertainty scores on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet. Our implementation is available at https://github.com/kim-hyunsu/dbn.

[41]  arXiv:2404.15821 [pdf, other]
Title: SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data
Subjects: Machine Learning (cs.LG); Performance (cs.PF)

With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This~makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.

[42]  arXiv:2404.15833 [pdf, ps, other]
Title: OpTC -- A Toolchain for Deployment of Neural Networks on AURIX TC3xx Microcontrollers
Subjects: Machine Learning (cs.LG)

The AURIX 2xx and 3xx families of TriCore microcontrollers are widely used in the automotive industry and, recently, also in applications that involve machine learning tasks. Yet, these applications are mainly engineered manually, and only little tool support exists for bringing neural networks to TriCore microcontrollers. Thus, we propose OpTC, an end-to-end toolchain for automatic compression, conversion, code generation, and deployment of neural networks on TC3xx microcontrollers. OpTC supports various types of neural networks and provides compression using layer-wise pruning based on sensitivity analysis for a given neural network. The flexibility in supporting different types of neural networks, such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), and recurrent neural networks (RNN), is shown in case studies for a TC387 microcontroller. Automotive applications for predicting the temperature in electric motors and detecting anomalies are thereby used to demonstrate the effectiveness and the wide range of applications supported by OpTC.

[43]  arXiv:2404.15836 [pdf, other]
Title: Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification
Authors: Paweł Zyblewski
Comments: 16 pages, 8 figures
Subjects: Machine Learning (cs.LG)

Rapid technological advances are inherently linked to the increased amount of data, a substantial portion of which can be interpreted as data stream, capable of exhibiting the phenomenon of concept drift and having a high imbalance ratio. Consequently, developing new approaches to classifying difficult data streams is a rapidly growing research area. At the same time, the proliferation of deep learning and transfer learning, as well as the success of convolutional neural networks in computer vision tasks, have contributed to the emergence of a new research trend, namely Multi-Dimensional Encoding (MDE), focusing on transforming tabular data into a homogeneous form of a discrete digital signal. This paper proposes Streaming Super Tabular Machine Learning (SSTML), thereby exploring for the first time the potential of MDE in the difficult data stream classification task. SSTML encodes consecutive data chunks into an image representation using the STML algorithm and then performs a single ResNet-18 training epoch. Experiments conducted on synthetic and real data streams have demonstrated the ability of SSTML to achieve classification quality statistically significantly superior to state-of-the-art algorithms while maintaining comparable processing time.

[44]  arXiv:2404.15899 [pdf, other]
Title: ST-MambaSync: The Confluence of Mamba Structure and Spatio-Temporal Transformers for Precipitous Traffic Prediction
Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:2404.13257
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Balancing accuracy with computational efficiency is paramount in machine learning, particularly when dealing with high-dimensional data, such as spatial-temporal datasets. This study introduces ST-MambaSync, an innovative framework that integrates a streamlined attention layer with a simplified state-space layer. The model achieves competitive accuracy in spatial-temporal prediction tasks. We delve into the relationship between attention mechanisms and the Mamba component, revealing that Mamba functions akin to attention within a residual network structure. This comparative analysis underpins the efficiency of state-space models, elucidating their capability to deliver superior performance at reduced computational costs.

[45]  arXiv:2404.15919 [pdf, other]
Title: An Element-Wise Weights Aggregation Method for Federated Learning
Comments: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Federated learning (FL) is a powerful Machine Learning (ML) paradigm that enables distributed clients to collaboratively learn a shared global model while keeping the data on the original device, thereby preserving privacy. A central challenge in FL is the effective aggregation of local model weights from disparate and potentially unbalanced participating clients. Existing methods often treat each client indiscriminately, applying a single proportion to the entire local model. However, it is empirically advantageous for each weight to be assigned a specific proportion. This paper introduces an innovative Element-Wise Weights Aggregation Method for Federated Learning (EWWA-FL) aimed at optimizing learning performance and accelerating convergence speed. Unlike traditional FL approaches, EWWA-FL aggregates local weights to the global model at the level of individual elements, thereby allowing each participating client to make element-wise contributions to the learning process. By taking into account the unique dataset characteristics of each client, EWWA-FL enhances the robustness of the global model to different datasets while also achieving rapid convergence. The method is flexible enough to employ various weighting strategies. Through comprehensive experiments, we demonstrate the advanced capabilities of EWWA-FL, showing significant improvements in both accuracy and convergence speed across a range of backbones and benchmarks.

[46]  arXiv:2404.15943 [pdf, other]
Title: Decentralized Personalized Federated Learning based on a Conditional Sparse-to-Sparser Scheme
Comments: 15 pages, 9 figures, 3 pages theory
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Decentralized Federated Learning (DFL) has become popular due to its robustness and avoidance of centralized coordination. In this paradigm, clients actively engage in training by exchanging models with their networked neighbors. However, DFL introduces increased costs in terms of training and communication. Existing methods focus on minimizing communication often overlooking training efficiency and data heterogeneity. To address this gap, we propose a novel \textit{sparse-to-sparser} training scheme: DA-DPFL. DA-DPFL initializes with a subset of model parameters, which progressively reduces during training via \textit{dynamic aggregation} and leads to substantial energy savings while retaining adequate information during critical learning periods.
Our experiments showcase that DA-DPFL substantially outperforms DFL baselines in test accuracy, while achieving up to $5$ times reduction in energy costs. We provide a theoretical analysis of DA-DPFL's convergence by solidifying its applicability in decentralized and personalized learning. The code is available at:https://github.com/EricLoong/da-dpfl

[47]  arXiv:2404.15993 [pdf, other]
Title: Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Comments: 29 pages, 14 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large language models (LLMs) are highly capable of many tasks but they can sometimes generate unreliable or inaccurate outputs. To tackle this issue, this paper studies the problem of uncertainty estimation and calibration for LLMs. We begin by formulating the uncertainty estimation problem for LLMs and then propose a supervised approach that takes advantage of the labeled datasets and estimates the uncertainty of the LLMs' responses. Based on the formulation, we illustrate the difference between the uncertainty estimation for LLMs and that for standard ML models and explain why the hidden activations of the LLMs contain uncertainty information. Our designed approach effectively demonstrates the benefits of utilizing hidden activations for enhanced uncertainty estimation across various tasks and shows robust transferability in out-of-distribution settings. Moreover, we distinguish the uncertainty estimation task from the uncertainty calibration task and show that a better uncertainty estimation mode leads to a better calibration performance. In practice, our method is easy to implement and is adaptable to different levels of model transparency including black box, grey box, and white box, each demonstrating strong performance based on the accessibility of the LLM's internal mechanisms.

[48]  arXiv:2404.15999 [pdf, other]
Title: BeSound: Bluetooth-Based Position Estimation Enhancing with Cross-Modality Distillation
Comments: Accepted in IEEE 6th International Conference on Activity and Behavior Computing
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Smart factories leverage advanced technologies to optimize manufacturing processes and enhance efficiency. Implementing worker tracking systems, primarily through camera-based methods, ensures accurate monitoring. However, concerns about worker privacy and technology protection make it necessary to explore alternative approaches. We propose a non-visual, scalable solution using Bluetooth Low Energy (BLE) and ultrasound coordinates. BLE position estimation offers a very low-power and cost-effective solution, as the technology is available on smartphones and is scalable due to the large number of smartphone users, facilitating worker localization and safety protocol transmission. Ultrasound signals provide faster response times and higher accuracy but require custom hardware, increasing costs. To combine the benefits of both modalities, we employ knowledge distillation (KD) from ultrasound signals to BLE RSSI data. Once the student model is trained, the model only takes as inputs the BLE-RSSI data for inference, retaining the advantages of ubiquity and low cost of BLE RSSI. We tested our approach using data from an experiment with twelve participants in a smart factory test bed environment. We obtained an increase of 11.79% in the F1-score compared to the baseline (target model without KD and trained with BLE-RSSI data only).

[49]  arXiv:2404.16005 [pdf, other]
Title: Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition
Authors: Hymalai Bello
Comments: Accepted in IEEE 22nd International Conference on Pervasive Computing and Communications (PerCom 2024)
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Combining different sensing modalities with multiple positions helps form a unified perception and understanding of complex situations such as human behavior. Hence, human activity recognition (HAR) benefits from combining redundant and complementary information (Unimodal/Multimodal). Even so, it is not an easy task. It requires a multidisciplinary approach, including expertise in sensor technologies, signal processing, data fusion algorithms, and domain-specific knowledge. This Ph.D. work employs sensing modalities such as inertial, pressure (audio and atmospheric pressure), and textile capacitive sensing for HAR. The scenarios explored are gesture and hand position tracking, facial and head pattern recognition, and body posture and gesture recognition. The selected wearable devices and sensing modalities are fully integrated with machine learning-based algorithms, some of which are implemented in the embedded device, on the edge, and tested in real-time.

[50]  arXiv:2404.16014 [pdf, other]
Title: Improving Dictionary Learning with Gated Sparse Autoencoders
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. In SAEs, the L1 penalty used to encourage sparsity introduces many undesirable biases, such as shrinkage -- systematic underestimation of feature activations. The key insight of Gated SAEs is to separate the functionality of (a) determining which directions to use and (b) estimating the magnitudes of those directions: this enables us to apply the L1 penalty only to the former, limiting the scope of undesirable side effects. Through training SAEs on LMs of up to 7B parameters we find that, in typical hyper-parameter ranges, Gated SAEs solve shrinkage, are similarly interpretable, and require half as many firing features to achieve comparable reconstruction fidelity.

[51]  arXiv:2404.16032 [pdf, other]
Title: Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts
Subjects: Machine Learning (cs.LG)

Retrieval-augmented generation (RAG) mitigates many problems of fully parametric language models, such as temporal degradation, hallucinations, and lack of grounding. In RAG, the model's knowledge can be updated from documents provided in context. This leads to cases of conflict between the model's parametric knowledge and the contextual information, where the model may not always update its knowledge. Previous work studied knowledge conflicts by creating synthetic documents that contradict the model's correct parametric answers. We present a framework for studying knowledge conflicts in a realistic setup. We update incorrect parametric knowledge using real conflicting documents. This reflects how knowledge conflicts arise in practice. In this realistic scenario, we find that knowledge updates fail less often than previously reported. In cases where the models still fail to update their answers, we find a parametric bias: the incorrect parametric answer appearing in context makes the knowledge update likelier to fail. These results suggest that the factual parametric knowledge of LLMs can negatively influence their reading abilities and behaviors. Our code is available at https://github.com/kortukov/realistic_knowledge_conflicts/.

Cross-lists for Thu, 25 Apr 24

[52]  arXiv:2404.13101 (cross-list from eess.IV) [pdf, ps, other]
Title: DensePANet: An improved generative adversarial network for photoacoustic tomography image reconstruction from sparse data
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image reconstruction is an essential step of every medical imaging method, including Photoacoustic Tomography (PAT), which is a promising modality of imaging, that unites the benefits of both ultrasound and optical imaging methods. Reconstruction of PAT images using conventional methods results in rough artifacts, especially when applied directly to sparse PAT data. In recent years, generative adversarial networks (GANs) have shown a powerful performance in image generation as well as translation, rendering them a smart choice to be applied to reconstruction tasks. In this study, we proposed an end-to-end method called DensePANet to solve the problem of PAT image reconstruction from sparse data. The proposed model employs a novel modification of UNet in its generator, called FD-UNet++, which considerably improves the reconstruction performance. We evaluated the method on various in-vivo and simulated datasets. Quantitative and qualitative results show the better performance of our model over other prevalent deep learning techniques.

[53]  arXiv:2404.15289 (cross-list from eess.SP) [pdf, other]
Title: EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies. However, susceptibility to various physiological and environmental artifacts introduces noise in recorded EEG data, impeding accurate analysis of underlying brain activity. Denoising techniques are crucial to mitigate this challenge. Recent advancements in deep learningbased approaches exhibit substantial potential for enhancing the signal-to-noise ratio of EEG data compared to traditional methods. In the realm of large-scale language models (LLMs), the Retentive Network (Retnet) infrastructure, prevalent for some models, demonstrates robust feature extraction and global modeling capabilities. Recognizing the temporal similarities between EEG signals and natural language, we introduce the Retnet from natural language processing to EEG denoising. This integration presents a novel approach to EEG denoising, opening avenues for a profound understanding of brain activities and accurate diagnosis of neurological diseases. Nonetheless, direct application of Retnet to EEG denoising is unfeasible due to the one-dimensional nature of EEG signals, while natural language processing deals with two-dimensional data. To facilitate Retnet application to EEG denoising, we propose the signal embedding method, transforming one-dimensional EEG signals into two dimensions for use as network inputs. Experimental results validate the substantial improvement in denoising effectiveness achieved by the proposed method.

[54]  arXiv:2404.15294 (cross-list from eess.SP) [pdf, ps, other]
Title: Multimodal Physical Fitness Monitoring (PFM) Framework Based on TimeMAE-PFM in Wearable Scenarios
Comments: 5 pages, 6 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Physical function monitoring (PFM) plays a crucial role in healthcare especially for the elderly. Traditional assessment methods such as the Short Physical Performance Battery (SPPB) have failed to capture the full dynamic characteristics of physical function. Wearable sensors such as smart wristbands offer a promising solution to this issue. However, challenges exist, such as the computational complexity of machine learning methods and inadequate information capture. This paper proposes a multi-modal PFM framework based on an improved TimeMAE, which compresses time-series data into a low-dimensional latent space and integrates a self-enhanced attention module. This framework achieves effective monitoring of physical health, providing a solution for real-time and personalized assessment. The method is validated using the NHATS dataset, and the results demonstrate an accuracy of 70.6% and an AUC of 82.20%, surpassing other state-of-the-art time-series classification models.

[55]  arXiv:2404.15296 (cross-list from math.NA) [pdf, other]
Title: Maximum Discrepancy Generative Regularization and Non-Negative Matrix Factorization for Single Channel Source Separation
Comments: arXiv admin note: substantial text overlap with arXiv:2305.01758
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

The idea of adversarial learning of regularization functionals has recently been introduced in the wider context of inverse problems. The intuition behind this method is the realization that it is not only necessary to learn the basic features that make up a class of signals one wants to represent, but also, or even more so, which features to avoid in the representation. In this paper, we will apply this approach to the training of generative models, leading to what we call Maximum Discrepancy Generative Regularization. In particular, we apply this to problem of source separation by means of Non-negative Matrix Factorization (NMF) and present a new method for the adversarial training of NMF bases. We show in numerical experiments, both for image and audio separation, that this leads to a clear improvement of the reconstructed signals, in particular in the case where little or no strong supervision data is available.

[56]  arXiv:2404.15297 (cross-list from eess.SP) [pdf, ps, other]
Title: Multi-stream Transmission for Directional Modulation Network via distributed Multi-UAV-aided Multi-IRS
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

Active intelligent reflecting surface (IRS) is a revolutionary technique for the future 6G networks. The conventional far-field single-IRS-aided directional modulation(DM) networks have only one (no direct path) or two (existing direct path) degrees of freedom (DoFs). This means that there are only one or two streams transmitted simultaneously from base station to user and will seriously limit its rate gain achieved by IRS. How to create multiple DoFs more than two for DM? In this paper, single large-scale IRS is divided to multiple small IRSs and a novel multi-IRS-aided multi-stream DM network is proposed to achieve a point-to-point multi-stream transmission by creating $K$ ($\geq3$) DoFs, where multiple small IRSs are placed distributively via multiple unmanned aerial vehicles (UAVs). The null-space projection, zero-forcing (ZF) and phase alignment are adopted to design the transmit beamforming vector, receive beamforming vector and phase shift matrix (PSM), respectively, called NSP-ZF-PA. Here, $K$ PSMs and their corresponding beamforming vectors are independently optimized. The weighted minimum mean-square error (WMMSE) algorithm is involved in alternating iteration for the optimization variables by introducing the power constraint on IRS, named WMMSE-PC, where the majorization-minimization (MM) algorithm is used to solve the total PSM. To achieve a lower computational complexity, a maximum trace method, called Max-TR-SVD, is proposed by optimize the PSM of all IRSs. Numerical simulation results has shown that the proposed NSP-ZF-PA performs much better than Max-TR-SVD in terms of rate. In particular, the rate of NSP-ZF-PA with sixteen small IRSs is about five times that of NSP-ZF-PA with combining all small IRSs as a single large IRS. Thus, a dramatic rate enhancement may be achieved by multiple distributed IRSs.

[57]  arXiv:2404.15305 (cross-list from eess.SP) [pdf, other]
Title: ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine-tuned in heterogeneous domains. To address the issue, we propose ADAPT^2, a few-shot domain adaptation framework for personalizing self-supervised models. ADAPT2 proposes self-supervised meta-learning for initial model pre-training, followed by a user-side model adaptation by replaying the self-supervision with user-specific data. This allows models to adjust their pre-trained representations to the user with only a few samples. Evaluation with four benchmarks demonstrates that ADAPT^2 outperforms existing baselines by an average F1-score of 8.8%p. Our on-device computational overhead analysis on a commodity off-the-shelf (COTS) smartphone shows that ADAPT2 completes adaptation within an unobtrusive latency (in three minutes) with only a 9.54% memory consumption, demonstrating the computational efficiency of the proposed method.

[58]  arXiv:2404.15308 (cross-list from eess.SP) [pdf, ps, other]
Title: Label-Efficient Sleep Staging Using Transformers Pre-trained with Position Prediction
Comments: 4 pages, 1 figure. This was work was presented at the IEEE International Conference on AI for Medicine, Health, and Care 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Sleep staging is a clinically important task for diagnosing various sleep disorders, but remains challenging to deploy at scale because it because it is both labor-intensive and time-consuming. Supervised deep learning-based approaches can automate sleep staging but at the expense of large labeled datasets, which can be unfeasible to procure for various settings, e.g., uncommon sleep disorders. While self-supervised learning (SSL) can mitigate this need, recent studies on SSL for sleep staging have shown performance gains saturate after training with labeled data from only tens of subjects, hence are unable to match peak performance attained with larger datasets. We hypothesize that the rapid saturation stems from applying a sub-optimal pretraining scheme that pretrains only a portion of the architecture, i.e., the feature encoder, but not the temporal encoder; therefore, we propose adopting an architecture that seamlessly couples the feature and temporal encoding and a suitable pretraining scheme that pretrains the entire model. On a sample sleep staging dataset, we find that the proposed scheme offers performance gains that do not saturate with amount of labeled training data (e.g., 3-5\% improvement in balanced sleep staging accuracy across low- to high-labeled data settings), reducing the amount of labeled training data needed for high performance (e.g., by 800 subjects). Based on our findings, we recommend adopting this SSL paradigm for subsequent work on SSL for sleep staging.

[59]  arXiv:2404.15309 (cross-list from eess.SP) [pdf, other]
Title: Sparse Bayesian Correntropy Learning for Robust Muscle Activity Reconstruction from Noisy Brain Recordings
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Sparse Bayesian learning has promoted many effective frameworks for brain activity decoding, especially for the reconstruction of muscle activity. However, existing sparse Bayesian learning mainly employs Gaussian distribution as error assumption in the reconstruction task, which is not necessarily the truth in the real-world application. On the other hand, brain recording is known to be highly noisy and contains many non-Gaussian noises, which could lead to significant performance degradation for sparse Bayesian learning method. The goal of this paper is to propose a new robust implementation for sparse Bayesian learning, so that robustness and sparseness can be realized simultaneously. Motivated by the great robustness of maximum correntropy criterion (MCC), we proposed an integration of MCC into the sparse Bayesian learning regime. To be specific, we derived the explicit error assumption inherent in the MCC and then leveraged it for the likelihood function. Meanwhile, we used the automatic relevance determination (ARD) technique for the sparse prior distribution. To fully evaluate the proposed method, a synthetic dataset and a real-world muscle activity reconstruction task with two different brain modalities were employed. Experimental results showed that our proposed sparse Bayesian correntropy learning framework improves significantly the robustness in a noisy regression task. The proposed method can realize higher correlation coefficient and lower root mean squared error in the real-world muscle activity reconstruction tasks. Sparse Bayesian correntropy learning provides a powerful tool for neural decoding which can promote the development of brain-computer interfaces.

[60]  arXiv:2404.15310 (cross-list from cs.HC) [pdf, other]
Title: Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT
Comments: Accepted as a full paper by the 25th International Conference on Artificial Intelligence in Education (AIED 2024)
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Classroom observation protocols standardize the assessment of teaching effectiveness and facilitate comprehension of classroom interactions. Whereas these protocols offer teachers specific feedback on their teaching practices, the manual coding by human raters is resource-intensive and often unreliable. This has sparked interest in developing AI-driven, cost-effective methods for automating such holistic coding. Our work explores a multimodal approach to automatically estimating encouragement and warmth in classrooms, a key component of the Global Teaching Insights (GTI) study's observation protocol. To this end, we employed facial and speech emotion recognition with sentiment analysis to extract interpretable features from video, audio, and transcript data. The prediction task involved both classification and regression methods. Additionally, in light of recent large language models' remarkable text annotation capabilities, we evaluated ChatGPT's zero-shot performance on this scoring task based on transcripts. We demonstrated our approach on the GTI dataset, comprising 367 16-minute video segments from 92 authentic lesson recordings. The inferences of GPT-4 and the best-trained model yielded correlations of r = .341 and r = .441 with human ratings, respectively. Combining estimates from both models through averaging, an ensemble approach achieved a correlation of r = .513, comparable to human inter-rater reliability. Our model explanation analysis indicated that text sentiment features were the primary contributors to the trained model's decisions. Moreover, GPT-4 could deliver logical and concrete reasoning as potential teacher guidelines. Our findings provide insights into using advanced, multimodal techniques for automated classroom observation, aiming to foster teacher training through frequent and valuable feedback.

[61]  arXiv:2404.15311 (cross-list from eess.SP) [pdf, other]
Title: Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression
Comments: Accepted HCI International 2024
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The task of Electroencephalogram (EEG) analysis is paramount to the development of Brain-Computer Interfaces (BCIs). However, to reach the goal of developing robust, useful BCIs depends heavily on the speed and the accuracy at which BCIs can understand neural dynamics. In response to that goal, this paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet) to enhance the precision of EEG regression. The core of this approach lies in harnessing the sequential data processing strengths of ViTs along with the superior feature extraction capabilities of TCNet, to significantly improve EEG analysis accuracy. In addition, we analyze the importance of how to construct optimal patches for the attention mechanism to analyze, balancing both speed and accuracy tradeoffs. Our results showcase a substantial improvement in regression accuracy, as evidenced by the reduction of Root Mean Square Error (RMSE) from 55.4 to 51.8 on EEGEyeNet's Absolute Position Task, outperforming existing state-of-the-art models. Without sacrificing performance, we increase the speed of this model by an order of magnitude (up to 4.32x faster). This breakthrough not only sets a new benchmark in EEG regression analysis but also opens new avenues for future research in the integration of transformer architectures with specialized feature extraction methods for diverse EEG datasets.

[62]  arXiv:2404.15314 (cross-list from eess.SP) [pdf, other]
Title: Detection of direct path component absence in NLOS UWB channel
Comments: The dataset used in the study is available at Zenodo: Marcin Kolakowski. (2021). UWB Channel Impulse Responses Registered in a Furnished Apartment (Version 1.0) [Data set]. Zenodo. \url{this http URL} Originally presented at 2018 22nd International Microwave and Radar Conference (MIKON), Poznan, Poland, 2018
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

In this paper a novel NLOS (Non-Line-of-Sight) identification technique is proposed. In comparison to other methods described in the literature, it discerns a situation when the delayed direct path component is available from when it's totally blocked and introduced biases are much higher and harder to mitigate. In the method, NLOS identification is performed using Support Vector Machine (SVM) algorithm based on various signal features. The paper includes description of the method and the results of performed experiment.

[63]  arXiv:2404.15317 (cross-list from cs.SE) [pdf, other]
Title: Concept-Guided LLM Agents for Human-AI Safety Codesign
Comments: 5 pages
Journal-ref: Proceedings of the AAAI-make Spring Symposium, 2024
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Generative AI is increasingly important in software engineering, including safety engineering, where its use ensures that software does not cause harm to people. This also leads to high quality requirements for generative AI. Therefore, the simplistic use of Large Language Models (LLMs) alone will not meet these quality demands. It is crucial to develop more advanced and sophisticated approaches that can effectively address the complexities and safety concerns of software systems. Ultimately, humans must understand and take responsibility for the suggestions provided by generative AI to ensure system safety. To this end, we present an efficient, hybrid strategy to leverage LLMs for safety analysis and Human-AI codesign. In particular, we develop a customized LLM agent that uses elements of prompt engineering, heuristic reasoning, and retrieval-augmented generation to solve tasks associated with predefined safety concepts, in interaction with a system model graph. The reasoning is guided by a cascade of micro-decisions that help preserve structured information. We further suggest a graph verbalization which acts as an intermediate representation of the system model to facilitate LLM-graph interactions. Selected pairs of prompts and responses relevant for safety analytics illustrate our method for the use case of a simplified automated driving system.

[64]  arXiv:2404.15319 (cross-list from eess.SP) [pdf, other]
Title: The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark
Comments: 43 pages, 13 figures, 5 tables
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Objective. This study conduct an extensive Brain-computer interfaces (BCI) reproducibility analysis on open electroencephalography datasets, aiming to assess existing solutions and establish open and reproducible benchmarks for effective comparison within the field. The need for such benchmark lies in the rapid industrial progress that has given rise to undisclosed proprietary solutions. Furthermore, the scientific literature is dense, often featuring challenging-to-reproduce evaluations, making comparisons between existing approaches arduous.
Approach. Within an open framework, 30 machine learning pipelines (separated into raw signal: 11, Riemannian: 13, deep learning: 6) are meticulously re-implemented and evaluated across 36 publicly available datasets, including motor imagery (14), P300 (15), and SSVEP (7). The analysis incorporates statistical meta-analysis techniques for results assessment, encompassing execution time and environmental impact considerations.
Main results. The study yields principled and robust results applicable to various BCI paradigms, emphasizing motor imagery, P300, and SSVEP. Notably, Riemannian approaches utilizing spatial covariance matrices exhibit superior performance, underscoring the necessity for significant data volumes to achieve competitive outcomes with deep learning techniques. The comprehensive results are openly accessible, paving the way for future research to further enhance reproducibility in the BCI domain.
Significance. The significance of this study lies in its contribution to establishing a rigorous and transparent benchmark for BCI research, offering insights into optimal methodologies and highlighting the importance of reproducibility in driving advancements within the field.

[65]  arXiv:2404.15323 (cross-list from eess.SP) [pdf, other]
Title: Transportation mode recognition based on low-rate acceleration and location signals with an attention-based multiple-instance learning network
Comments: 13 pages, 5 figures, 9 tables, accepted in IEEE Transactions on Intelligent Transportation Systems
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Transportation mode recognition (TMR) is a critical component of human activity recognition (HAR) that focuses on understanding and identifying how people move within transportation systems. It is commonly based on leveraging inertial, location, or both types of signals, captured by modern smartphone devices. Each type has benefits (such as increased effectiveness) and drawbacks (such as increased battery consumption) depending on the transportation mode (TM). Combining the two types is challenging as they exhibit significant differences such as very different sampling rates. This paper focuses on the TMR task and proposes an approach for combining the two types of signals in an effective and robust classifier. Our network includes two sub-networks for processing acceleration and location signals separately, using different window sizes for each signal. The two sub-networks are designed to also embed the two types of signals into the same space so that we can then apply an attention-based multiple-instance learning classifier to recognize TM. We use very low sampling rates for both signal types to reduce battery consumption. We evaluate the proposed methodology on a publicly available dataset and compare against other well known algorithms.

[66]  arXiv:2404.15328 (cross-list from eess.SP) [pdf, other]
Title: Time topological analysis of EEG using signature theory
Comments: 14 pages, 5 figures Under review for Journ\'ee des Statistiques 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Anomaly detection in multivariate signals is a task of paramount importance in many disciplines (epidemiology, finance, cognitive sciences and neurosciences, oncology, etc.). In this perspective, Topological Data Analysis (TDA) offers a battery of "shape" invariants that can be exploited for the implementation of an effective detection scheme. Our contribution consists of extending the constructions presented in \cite{chretienleveraging} on the construction of simplicial complexes from the Signatures of signals and their predictive capacities, rather than the use of a generic distance as in \cite{petri2014homological}. Signature theory is a new theme in Machine Learning arXiv:1603.03788 stemming from recent work on the notions of Rough Paths developed by Terry Lyons and his team \cite{lyons2002system} based on the formalism introduced by Chen \cite{chen1957integration}. We explore in particular the detection of changes in topology, based on tracking the evolution of homological persistence and the Betti numbers associated with the complex introduced in \cite{chretienleveraging}. We apply our tools for the analysis of brain signals such as EEG to detect precursor phenomena to epileptic seizures.

[67]  arXiv:2404.15330 (cross-list from eess.SP) [pdf, other]
Title: Anchor Pair Selection in TDOA Positioning Systems by Door Transition Error Minimization
Comments: Originally presented at 2022 24th International Microwave and Radar Conference (MIKON), Gdansk, Poland, 2022
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

This paper presents an adaptive anchor pairs selection algorithm for UWB (ultra-wideband) TDOA-based (Time Difference of Arrival) indoor positioning systems. The method assumes dividing the system operation area into zones. The most favorable anchor pairs are selected by minimizing the positioning errors in doorways leading to these zones where possible users' locations are limited to small, narrow areas. The sets are determined separately for going in and out of the zone to take users' body shadowing into account. The determined anchor pairs are then used to calculate TDOA values and localize the user moving around the apartment with an Extended Kalman Filter based algorithm. The method was tested experimentally in a furnished apartment. The results have shown that the adaptive selection of the anchor pairs leads to an increase in the user's localization accuracy. The median trajectory error was about 0.32 m.

[68]  arXiv:2404.15332 (cross-list from eess.SP) [pdf, other]
Title: Clinical translation of machine learning algorithms for seizure detection in scalp electroencephalography: a systematic review
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Machine learning algorithms for seizure detection have shown great diagnostic potential, with recent reported accuracies reaching 100%. However, few published algorithms have fully addressed the requirements for successful clinical translation. For example, the properties of training data may critically limit the generalisability of algorithms, algorithms may be sensitive to variability across EEG acquisition hardware, and run-time processing costs may render them unfeasible for real-time clinical use cases. Here, we systematically review machine learning seizure detection algorithms with a focus on clinical translatability, assessed by criteria including generalisability, run-time costs, explainability, and clinically-relevant performance metrics. For non-specialists, we provide domain-specific knowledge necessary to contextualise model development and evaluation. Our critical evaluation of machine learning algorithms with respect to their potential real-world effectiveness can help accelerate clinical translation and identify gaps in the current seizure detection literature.

[69]  arXiv:2404.15333 (cross-list from eess.SP) [pdf, other]
Title: EB-GAME: A Game-Changer in ECG Heartbeat Anomaly Detection
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Cardiologists use electrocardiograms (ECG) for the detection of arrhythmias. However, continuous monitoring of ECG signals to detect cardiac abnormal-ities requires significant time and human resources. As a result, several deep learning studies have been conducted in advance for the automatic detection of arrhythmia. These models show relatively high performance in supervised learning, but are not applicable in cases with few training examples. This is because abnormal ECG data is scarce compared to normal data in most real-world clinical settings. Therefore, in this study, GAN-based anomaly detec-tion, i.e., unsupervised learning, was employed to address the issue of data imbalance. This paper focuses on detecting abnormal signals in electrocardi-ograms (ECGs) using only labels from normal signals as training data. In-spired by self-supervised vision transformers, which learn by dividing images into patches, and masked auto-encoders, known for their effectiveness in patch reconstruction and solving information redundancy, we introduce the ECG Heartbeat Anomaly Detection model, EB-GAME. EB-GAME was trained and validated on the MIT-BIH Arrhythmia Dataset, where it achieved state-of-the-art performance on this benchmark.

[70]  arXiv:2404.15335 (cross-list from eess.SP) [pdf, ps, other]
Title: Integrative Deep Learning Framework for Parkinson's Disease Early Detection using Gait Cycle Data Measured by Wearable Sensors: A CNN-GRU-GNN Approach
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Efficient early diagnosis is paramount in addressing the complexities of Parkinson's disease because timely intervention can substantially mitigate symptom progression and improve patient outcomes. In this paper, we present a pioneering deep learning architecture tailored for the binary classification of subjects, utilizing gait cycle datasets to facilitate early detection of Parkinson's disease. Our model harnesses the power of 1D-Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), and Graph Neural Network (GNN) layers, synergistically capturing temporal dynamics and spatial relationships within the data. In this work, 16 wearable sensors located at the end of subjects' shoes for measuring the vertical Ground Reaction Force (vGRF) are considered as the vertices of a graph, their adjacencies are modelled as edges of this graph, and finally, the measured data of each sensor is considered as the feature vector of its corresponding vertex. Therefore, The GNN layers can extract the relations among these sensors by learning proper representations. Regarding the dynamic nature of these measurements, GRU and CNN are used to analyze them spatially and temporally and map them to an embedding space. Remarkably, our proposed model achieves exceptional performance metrics, boasting accuracy, precision, recall, and F1 score values of 99.51%, 99.57%, 99.71%, and 99.64%, respectively.

[71]  arXiv:2404.15337 (cross-list from eess.SP) [pdf, other]
Title: RSSI Estimation for Constrained Indoor Wireless Networks using ANN
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

In the expanding field of the Internet of Things (IoT), wireless channel estimation is a significant challenge. This is specifically true for low-power IoT (LP-IoT) communication, where efficiency and accuracy are extremely important. This research establishes two distinct LP-IoT wireless channel estimation models using Artificial Neural Networks (ANN): a Feature-based ANN model and a Sequence-based ANN model. Both models have been constructed to enhance LP-IoT communication by lowering the estimation error in the LP-IoT wireless channel. The Feature-based model aims to capture complex patterns of measured Received Signal Strength Indicator (RSSI) data using environmental characteristics. The Sequence-based approach utilises predetermined categorisation techniques to estimate the RSSI sequence of specifically selected environment characteristics. The findings demonstrate that our suggested approaches attain remarkable precision in channel estimation, with an improvement in MSE of $88.29\%$ of the Feature-based model and $97.46\%$ of the Sequence-based model over existing research. Additionally, the comparative analysis of these techniques with traditional and other Deep Learning (DL)-based techniques also highlights the superior performance of our developed models and their potential in real-world IoT applications.

[72]  arXiv:2404.15341 (cross-list from eess.SP) [pdf, other]
Title: Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional BD method is solely designed for feature extraction with its own optimizer and objective function. When BD is combined with downstream deep learning classifiers, the different learning objectives will be in conflict. To address this problem, this paper introduces classifier-guided BD (ClassBD) for joint learning of BD-based feature extraction and deep learning-based fault classification. Firstly, we present a time and frequency neural BD that employs neural networks to implement conventional BD, thereby facilitating the seamless integration of BD and the deep learning classifier for co-optimization of model parameters. Subsequently, we develop a unified framework to use a deep learning classifier to guide the learning of BD filters. In addition, we devise a physics-informed loss function composed of kurtosis, $l_2/l_4$ norm, and a cross-entropy loss to jointly optimize the BD filters and deep learning classifier. Consequently, the fault labels provide useful information to direct BD to extract features that distinguish classes amidst strong noise. To the best of our knowledge, this is the first of its kind that BD is successfully applied to bearing fault diagnosis. Experimental results from three datasets demonstrate that ClassBD outperforms other state-of-the-art methods under noisy conditions.

[73]  arXiv:2404.15342 (cross-list from eess.SP) [pdf, other]
Title: WaveSleepNet: An Interpretable Network for Expert-like Sleep Staging
Authors: Yan Pei, Wei Luo
Comments: 17 pages, 6 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Although deep learning algorithms have proven their efficiency in automatic sleep staging, the widespread skepticism about their "black-box" nature has limited its clinical acceptance. In this study, we propose WaveSleepNet, an interpretable neural network for sleep staging that reasons in a similar way to sleep experts. In this network, we utilize the latent space representations generated during training to identify characteristic wave prototypes corresponding to different sleep stages. The feature representation of an input signal is segmented into patches within the latent space, each of which is compared against the learned wave prototypes. The proximity between these patches and the wave prototypes is quantified through scores, indicating the prototypes' presence and relative proportion within the signal. The scores are served as the decision-making criteria for final sleep staging. During training, an ensemble of loss functions is employed for the prototypes' diversity and robustness. Furthermore, the learned wave prototypes are visualized by analysing occlusion sensitivity. The efficacy of WaveSleepNet is validated across three public datasets, achieving sleep staging performance that are on par with the state-of-the-art models when several WaveSleepNets are combine into a larger network. A detailed case study examined the decision-making process of the WaveSleepNet which aligns closely with American Academy of Sleep Medicine (AASM) manual guidelines. Another case study systematically explained the misidentified reason behind each sleep stage. WaveSleepNet's transparent process provides specialists with direct access to the physiological significance of its criteria, allowing for future adaptation or enrichment by sleep experts.

[74]  arXiv:2404.15343 (cross-list from eess.SP) [pdf, other]
Title: Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis
Comments: Accepted at the IEEE Wireless Communications and Networking Conference (WCNC) 2024
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

The recent advancement in deep learning (DL) for automatic modulation classification (AMC) of wireless signals has encouraged numerous possible applications on resource-constrained edge devices. However, developing optimized DL models suitable for edge applications of wireless communications is yet to be studied in depth. In this work, we perform a thorough investigation of optimized convolutional neural networks (CNNs) developed for AMC using the three most commonly used model optimization techniques: a) pruning, b) quantization, and c) knowledge distillation. Furthermore, we have proposed optimized models with the combinations of these techniques to fuse the complementary optimization benefits. The performances of all the proposed methods are evaluated in terms of sparsity, storage compression for network parameters, and the effect on classification accuracy with a reduction in parameters. The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly less complexity while maintaining or even improving classification performance compared to the benchmark CNNs.

[75]  arXiv:2404.15344 (cross-list from eess.SP) [pdf, other]
Title: Adversarial Robustness of Distilled and Pruned Deep Learning-based Wireless Classifiers
Comments: Accepted at the IEEE Wireless Communications and Networking Conference (WCNC) 2024
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

Data-driven deep learning (DL) techniques developed for automatic modulation classification (AMC) of wireless signals are vulnerable to adversarial attacks. This poses a severe security threat to the DL-based wireless systems, specifically for edge applications of AMC. In this work, we address the joint problem of developing optimized DL models that are also robust against adversarial attacks. This enables efficient and reliable deployment of DL-based AMC on edge devices. We first propose two optimized models using knowledge distillation and network pruning, followed by a computationally efficient adversarial training process to improve the robustness. Experimental results on five white-box attacks show that the proposed optimized and adversarially trained models can achieve better robustness than the standard (unoptimized) model. The two optimized models also achieve higher accuracy on clean (unattacked) samples, which is essential for the reliability of DL-based solutions at edge applications.

[76]  arXiv:2404.15346 (cross-list from eess.SP) [pdf, other]
Title: A Novel Micro-Doppler Coherence Loss for Deep Learning Radar Applications
Comments: Presented at 2021 18th European Radar Conference (EuRAD)
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning techniques are subject to increasing adoption for a wide range of micro-Doppler applications, where predictions need to be made based on time-frequency signal representations. Most, if not all, of the reported applications focus on translating an existing deep learning framework to this new domain with no adjustment made to the objective function. This practice results in a missed opportunity to encourage the model to prioritize features that are particularly relevant for micro-Doppler applications. Thus the paper introduces a micro-Doppler coherence loss, minimized when the normalized power of micro-Doppler oscillatory components between input and output is matched. The experiments conducted on real data show that the application of the introduced loss results in models more resilient to noise.

[77]  arXiv:2404.15347 (cross-list from eess.SP) [pdf, ps, other]
Title: Advanced Neural Network Architecture for Enhanced Multi-Lead ECG Arrhythmia Detection through Optimized Feature Extraction
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cardiovascular diseases are a pervasive global health concern, contributing significantly to morbidity and mortality rates worldwide. Among these conditions, arrhythmia, characterized by irregular heart rhythms, presents formidable diagnostic challenges. This study introduces an innovative approach utilizing deep learning techniques, specifically Convolutional Neural Networks (CNNs), to address the complexities of arrhythmia classification. Leveraging multi-lead Electrocardiogram (ECG) data, our CNN model, comprising six layers with a residual block, demonstrates promising outcomes in identifying five distinct heartbeat types: Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), Atrial Premature Contraction (APC), Premature Ventricular Contraction (PVC), and Normal Beat. Through rigorous experimentation, we highlight the transformative potential of our methodology in enhancing diagnostic accuracy for cardiovascular arrhythmias. Arrhythmia diagnosis remains a critical challenge in cardiovascular care, often relying on manual interpretation of ECG signals, which can be time-consuming and prone to subjectivity. To address these limitations, we propose a novel approach that leverages deep learning algorithms to automate arrhythmia classification. By employing advanced CNN architectures and multi-lead ECG data, our methodology offers a robust solution for precise and efficient arrhythmia detection. Through comprehensive evaluation, we demonstrate the effectiveness of our approach in facilitating more accurate clinical decision-making, thereby improving patient outcomes in managing cardiovascular arrhythmias.

[78]  arXiv:2404.15349 (cross-list from eess.SP) [pdf, other]
Title: A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Comments: Multimodal Survey for Wearable Sensor-based Human Action Recognition
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Multimedia (cs.MM)

The combination of increased life expectancy and falling birth rates is resulting in an aging population. Wearable Sensor-based Human Activity Recognition (WSHAR) emerges as a promising assistive technology to support the daily lives of older individuals, unlocking vast potential for human-centric applications. However, recent surveys in WSHAR have been limited, focusing either solely on deep learning approaches or on a single sensor modality. In real life, our human interact with the world in a multi-sensory way, where diverse information sources are intricately processed and interpreted to accomplish a complex and unified sensing system. To give machines similar intelligence, multimodal machine learning, which merges data from various sources, has become a popular research area with recent advancements. In this study, we present a comprehensive survey from a novel perspective on how to leverage multimodal learning to WSHAR domain for newcomers and researchers. We begin by presenting the recent sensor modalities as well as deep learning approaches in HAR. Subsequently, we explore the techniques used in present multimodal systems for WSHAR. This includes inter-multimodal systems which utilize sensor modalities from both visual and non-visual systems and intra-multimodal systems that simply take modalities from non-visual systems. After that, we focus on current multimodal learning approaches that have applied to solve some of the challenges existing in WSHAR. Specifically, we make extra efforts by connecting the existing multimodal literature from other domains, such as computer vision and natural language processing, with current WSHAR area. Finally, we identify the corresponding challenges and potential research direction in current WSHAR area for further improvement.

[79]  arXiv:2404.15350 (cross-list from eess.SP) [pdf, other]
Title: Evaluating Fast Adaptability of Neural Networks for Brain-Computer Interface
Comments: Accepted in IJCNN 2024
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Electroencephalography (EEG) classification is a versatile and portable technique for building non-invasive Brain-computer Interfaces (BCI). However, the classifiers that decode cognitive states from EEG brain data perform poorly when tested on newer domains, such as tasks or individuals absent during model training. Researchers have recently used complex strategies like Model-agnostic meta-learning (MAML) for domain adaptation. Nevertheless, there is a need for an evaluation strategy to evaluate the fast adaptability of the models, as this characteristic is essential for real-life BCI applications for quick calibration. We used motor movement and imaginary signals as input to Convolutional Neural Networks (CNN) based classifier for the experiments. Datasets with EEG signals typically have fewer examples and higher time resolution. Even though batch-normalization is preferred for Convolutional Neural Networks (CNN), we empirically show that layer-normalization can improve the adaptability of CNN-based EEG classifiers with not more than ten fine-tuning steps. In summary, the present work (i) proposes a simple strategy to evaluate fast adaptability, and (ii) empirically demonstrate fast adaptability across individuals as well as across tasks with simple transfer learning as compared to MAML approach.

[80]  arXiv:2404.15351 (cross-list from eess.SP) [pdf, other]
Title: Integrating Physiological Data with Large Language Models for Empathic Human-AI Interaction
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

This paper explores enhancing empathy in Large Language Models (LLMs) by integrating them with physiological data. We propose a physiological computing approach that includes developing deep learning models that use physiological data for recognizing psychological states and integrating the predicted states with LLMs for empathic interaction. We showcase the application of this approach in an Empathic LLM (EmLLM) chatbot for stress monitoring and control. We also discuss the results of a pilot study that evaluates this EmLLM chatbot based on its ability to accurately predict user stress, provide human-like responses, and assess the therapeutic alliance with the user.

[81]  arXiv:2404.15352 (cross-list from eess.SP) [pdf, other]
Title: TransfoRhythm: A Transformer Architecture Conductive to Blood Pressure Estimation via Solo PPG Signal Capturing
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Recent statistics indicate that approximately 1.3 billion individuals worldwide suffer from hypertension, a leading cause of premature death globally. Blood pressure (BP) serves as a critical health indicator for accurate and timely diagnosis and/or treatment of hypertension. Driven by recent advancements in Artificial Intelligence (AI) and Deep Neural Networks (DNNs), there has been a surge of interest in developing data-driven and cuff-less BP estimation solutions. In this context, current literature predominantly focuses on coupling Electrocardiography (ECG) and Photoplethysmography (PPG) sensors, though this approach is constrained by reliance on multiple sensor types. An alternative, utilizing standalone PPG signals, presents challenges due to the absence of auxiliary sensors (ECG), requiring the use of morphological features while addressing motion artifacts and high-frequency noise. To address these issues, the paper introduces the TransfoRhythm framework, a Transformer-based DNN architecture built upon the recently released physiological database, MIMIC-IV. Leveraging Multi-Head Attention (MHA) mechanism, TransfoRhythm identifies dependencies and similarities across data segments, forming a robust framework for cuff-less BP estimation solely using PPG signals. To our knowledge, this paper represents the first study to apply the MIMIC IV dataset for cuff-less BP estimation, and TransfoRhythm is the first MHA-based model trained via MIMIC IV for BP prediction. Performance evaluation through comprehensive experiments demonstrates TransfoRhythm's superiority over its state-of-the-art counterparts. Specifically, TransfoRhythm achieves highly accurate results with Root Mean Square Error (RMSE) of [1.84, 1.42] and Mean Absolute Error (MAE) of [1.50, 1.17] for systolic and diastolic blood pressures, respectively.

[82]  arXiv:2404.15353 (cross-list from eess.SP) [pdf, other]
Title: SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals
Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings. Conventional approaches typically discard corrupted segments or attempt to reconstruct original signals, allowing for the use of standard machine learning techniques. However, this reduces dataset size and introduces biases, compromising prediction accuracy and the effectiveness of continuous monitoring. We propose a novel deep learning model, Signal Quality Weighted Fusion of Attentional Convolution and Recurrent Neural Network (SQUWA), designed to learn how to retain accurate predictions from partially corrupted PPG. Specifically, SQUWA innovatively integrates an attention mechanism that directly considers signal quality during the learning process, dynamically adjusting the weights of time series segments based on their quality. This approach enhances the influence of higher-quality segments while reducing that of lower-quality ones, effectively utilizing partially corrupted segments. This approach represents a departure from the conventional methods that exclude such segments, enabling the utilization of a broader range of data, which has great implications for less disruption when monitoring of AF risks and more accurate estimation of AF burdens. Our extensive experiments show that SQUWA outperform existing PPG-based models, achieving the highest AUCPR of 0.89 with label noise mitigation. This also exceeds the 0.86 AUCPR of models trained with using both electrocardiogram (ECG) and PPG data.

[83]  arXiv:2404.15354 (cross-list from eess.SP) [pdf, other]
Title: Elevating Spectral GNNs through Enhanced Band-pass Filter Approximation
Comments: Preprint
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Spectral Graph Neural Networks (GNNs) have attracted great attention due to their capacity to capture patterns in the frequency domains with essential graph filters. Polynomial-based ones (namely poly-GNNs), which approximately construct graph filters with conventional or rational polynomials, are routinely adopted in practice for their substantial performances on graph learning tasks. However, previous poly-GNNs aim at achieving overall lower approximation error on different types of filters, e.g., low-pass and high-pass, but ignore a key question: \textit{which type of filter warrants greater attention for poly-GNNs?} In this paper, we first show that poly-GNN with a better approximation for band-pass graph filters performs better on graph learning tasks. This insight further sheds light on critical issues of existing poly-GNNs, i.e., those poly-GNNs achieve trivial performance in approximating band-pass graph filters, hindering the great potential of poly-GNNs. To tackle the issues, we propose a novel poly-GNN named TrigoNet. TrigoNet constructs different graph filters with novel trigonometric polynomial, and achieves leading performance in approximating band-pass graph filters against other polynomials. By applying Taylor expansion and deserting nonlinearity, TrigoNet achieves noticeable efficiency among baselines. Extensive experiments show the advantages of TrigoNet in both accuracy performances and efficiency.

[84]  arXiv:2404.15360 (cross-list from eess.SP) [pdf, other]
Title: Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning
Comments: 11 pages, 9 figures, submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Systems and Control (eess.SY)

Current electromyography (EMG) pattern recognition (PR) models have been shown to generalize poorly in unconstrained environments, setting back their adoption in applications such as hand gesture control. This problem is often due to limited training data, exacerbated by the use of supervised classification frameworks that are known to be suboptimal in such settings. In this work, we propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations. We use a Siamese Deep Convolutional Neural Network (SDCNN) and contrastive triplet loss to learn an EMG feature embedding space that captures the distribution of the different classes. A nearest-centroid approach is subsequently employed for inference, relying on how closely a test sample aligns with the established data distributions. We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions, i.e. false positives, especially when operating beyond the training data domain. We show our approach's efficacy by testing the trained SDCNN's predictions and confidence estimations on unseen data, both in and out of the training domain. The evaluation metrics include the accuracy-rejection curve and the Kullback-Leibler divergence between the confidence distributions of accurate and inaccurate predictions. Outperforming comparable models on both metrics, our results demonstrate that the proposed meta-learning approach improves the classifier's precision in active decisions (after rejection), thus leading to better generalization and applicability.

[85]  arXiv:2404.15364 (cross-list from eess.SP) [pdf, other]
Title: MP-DPD: Low-Complexity Mixed-Precision Neural Networks for Energy-Efficient Digital Predistortion of Wideband Power Amplifiers
Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Digital Pre-Distortion (DPD) enhances signal quality in wideband RF power amplifiers (PAs). As signal bandwidths expand in modern radio systems, DPD's energy consumption increasingly impacts overall system efficiency. Deep Neural Networks (DNNs) offer promising advancements in DPD, yet their high complexity hinders their practical deployment. This paper introduces open-source mixed-precision (MP) neural networks that employ quantized low-precision fixed-point parameters for energy-efficient DPD. This approach reduces computational complexity and memory footprint, thereby lowering power consumption without compromising linearization efficacy. Applied to a 160MHz-BW 1024-QAM OFDM signal from a digital RF PA, MP-DPD gives no performance loss against 32-bit floating-point precision DPDs, while achieving -43.75 (L)/-45.27 (R) dBc in Adjacent Channel Power Ratio (ACPR) and -38.72 dB in Error Vector Magnitude (EVM). A 16-bit fixed-point-precision MP-DPD enables a 2.8X reduction in estimated inference power. The PyTorch learning and testing code is publicly available at \url{https://github.com/lab-emi/OpenDPD}.

[86]  arXiv:2404.15366 (cross-list from eess.SP) [pdf, other]
Title: A Weight-aware-based Multi-source Unsupervised Domain Adaptation Method for Human Motion Intention Recognition
Comments: 13 pages, 5 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Accurate recognition of human motion intention (HMI) is beneficial for exoskeleton robots to improve the wearing comfort level and achieve natural human-robot interaction. A classifier trained on labeled source subjects (domains) performs poorly on unlabeled target subject since the difference in individual motor characteristics. The unsupervised domain adaptation (UDA) method has become an effective way to this problem. However, the labeled data are collected from multiple source subjects that might be different not only from the target subject but also from each other. The current UDA methods for HMI recognition ignore the difference between each source subject, which reduces the classification accuracy. Therefore, this paper considers the differences between source subjects and develops a novel theory and algorithm for UDA to recognize HMI, where the margin disparity discrepancy (MDD) is extended to multi-source UDA theory and a novel weight-aware-based multi-source UDA algorithm (WMDD) is proposed. The source domain weight, which can be adjusted adaptively by the MDD between each source subject and target subject, is incorporated into UDA to measure the differences between source subjects. The developed multi-source UDA theory is theoretical and the generalization error on target subject is guaranteed. The theory can be transformed into an optimization problem for UDA, successfully bridging the gap between theory and algorithm. Moreover, a lightweight network is employed to guarantee the real-time of classification and the adversarial learning between feature generator and ensemble classifiers is utilized to further improve the generalization ability. The extensive experiments verify theoretical analysis and show that WMDD outperforms previous UDA methods on HMI recognition tasks.

[87]  arXiv:2404.15367 (cross-list from eess.SP) [pdf, other]
Title: Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Arrhythmias, detectable via electrocardiograms (ECGs), pose significant health risks, emphasizing the need for robust automated identification techniques. Although traditional deep learning methods have shown potential, recent advances in graph-based strategies are aimed at enhancing arrhythmia detection performance. However, effectively representing ECG signals as graphs remains a challenge. This study explores graph representations of ECG signals using Visibility Graph (VG) and Vector Visibility Graph (VVG), coupled with Graph Convolutional Networks (GCNs) for arrhythmia classification. Through experiments on the MIT-BIH dataset, we investigated various GCN architectures and preprocessing parameters. The results reveal that GCNs, when integrated with VG and VVG for signal graph mapping, can classify arrhythmias without the need for preprocessing or noise removal from ECG signals. While both VG and VVG methods show promise, VG is notably more efficient. The proposed approach was competitive compared to baseline methods, although classifying the S class remains challenging, especially under the inter-patient paradigm. Computational complexity, particularly with the VVG method, required data balancing and sophisticated implementation strategies. The source code is publicly available for further research and development at https://github.com/raffoliveira/VG_for_arrhythmia_classification_with_GCN.

[88]  arXiv:2404.15368 (cross-list from eess.SP) [pdf, other]
Title: Unmasking the Role of Remote Sensors in Comfort, Energy and Demand Response
Comments: 13 Figures, 8 Tables, 25 Pages. Submitted to Data-Centric Engineering Journal and it is under review
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Systems and Control (eess.SY); Applications (stat.AP)

In single-zone multi-room houses (SZMRHs), temperature controls rely on a single probe near the thermostat, resulting in temperature discrepancies that cause thermal discomfort and energy waste. Augmenting smart thermostats (STs) with per-room sensors has gained acceptance by major ST manufacturers. This paper leverages additional sensory information to empirically characterize the services provided by buildings, including thermal comfort, energy efficiency, and demand response (DR). Utilizing room-level time-series data from 1,000 houses, metadata from 110,000 houses across the United States, and data from two real-world testbeds, we examine the limitations of SZMRHs and explore the potential of remote sensors. We discovered that comfortable DR durations (CDRDs) for rooms are typically 70% longer or 40% shorter than for the room with the thermostat. When averaging, rooms at the control temperature's bounds are typically deviated around -3{\deg}F to 2.5{\deg}F from the average. Moreover, in 95\% of houses, we identified rooms experiencing notably higher solar gains compared to the rest of the rooms, while 85% and 70% of houses demonstrated lower heat input and poor insulation, respectively. Lastly, it became evident that the consumption of cooling energy escalates with the increase in the number of sensors, whereas heating usage experiences fluctuations ranging from -19% to +25% This study serves as a benchmark for assessing the thermal comfort and DR services in the existing housing stock, while also highlighting the energy efficiency impacts of sensing technologies. Our approach sets the stage for more granular, precise control strategies of SZMRHs.

[89]  arXiv:2404.15370 (cross-list from eess.SP) [pdf, other]
Title: Self-Supervised Learning for User Localization
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

Machine learning techniques have shown remarkable accuracy in localization tasks, but their dependency on vast amounts of labeled data, particularly Channel State Information (CSI) and corresponding coordinates, remains a bottleneck. Self-supervised learning techniques alleviate the need for labeled data, a potential that remains largely untapped and underexplored in existing research. Addressing this gap, we propose a pioneering approach that leverages self-supervised pretraining on unlabeled data to boost the performance of supervised learning for user localization based on CSI. We introduce two pretraining Auto Encoder (AE) models employing Multi Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) to glean representations from unlabeled data via self-supervised learning. Following this, we utilize the encoder portion of the AE models to extract relevant features from labeled data, and finetune an MLP-based Position Estimation Model to accurately deduce user locations. Our experimentation on the CTW-2020 dataset, which features a substantial volume of unlabeled data but limited labeled samples, demonstrates the viability of our approach. Notably, the dataset covers a vast area spanning over 646x943x41 meters, and our approach demonstrates promising results even for such expansive localization tasks.

[90]  arXiv:2404.15373 (cross-list from eess.SP) [pdf, other]
Title: Robust EEG-based Emotion Recognition Using an Inception and Two-sided Perturbation Model
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Automated emotion recognition using electroencephalogram (EEG) signals has gained substantial attention. Although deep learning approaches exhibit strong performance, they often suffer from vulnerabilities to various perturbations, like environmental noise and adversarial attacks. In this paper, we propose an Inception feature generator and two-sided perturbation (INC-TSP) approach to enhance emotion recognition in brain-computer interfaces. INC-TSP integrates the Inception module for EEG data analysis and employs two-sided perturbation (TSP) as a defensive mechanism against input perturbations. TSP introduces worst-case perturbations to the model's weights and inputs, reinforcing the model's elasticity against adversarial attacks. The proposed approach addresses the challenge of maintaining accurate emotion recognition in the presence of input uncertainties. We validate INC-TSP in a subject-independent three-class emotion recognition scenario, demonstrating robust performance.

[91]  arXiv:2404.15374 (cross-list from eess.SP) [pdf, other]
Title: Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning
Comments: This paper has been accepted for the publication in IEEE Journal on Selected Areas in Communications. arXiv admin note: text overlap with arXiv:2402.09580
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes the minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt feature space size, optimizing over the expected information gain and the classification capability quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance for low SNR, as unnecessary measurements are discarded in our minimum description features.

[92]  arXiv:2404.15377 (cross-list from quant-ph) [pdf, other]
Title: Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

In this study, we apply 1D quantum convolution to address the task of time series forecasting. By encoding multiple points into the quantum circuit to predict subsequent data, each point becomes a feature, transforming the problem into a multidimensional one. Building on theoretical foundations from prior research, which demonstrated that Variational Quantum Circuits (VQCs) can be expressed as multidimensional Fourier series, we explore the capabilities of different architectures and ansatz. This analysis considers the concepts of circuit expressibility and the presence of barren plateaus. Analyzing the problem within the framework of the Fourier series enabled the design of an architecture that incorporates data reuploading, resulting in enhanced performance. Rather than a strict requirement for the number of free parameters to exceed the degrees of freedom of the Fourier series, our findings suggest that even a limited number of parameters can produce Fourier functions of higher degrees. This highlights the remarkable expressive power of quantum circuits. This observation is also significant in reducing training times. The ansatz with greater expressibility and number of non-zero Fourier coefficients consistently delivers favorable results across different scenarios, with performance metrics improving as the number of qubits increases.

[93]  arXiv:2404.15378 (cross-list from cs.CV) [pdf, other]
Title: Hierarchical Hybrid Sliced Wasserstein: A Scalable Metric for Heterogeneous Joint Distributions
Authors: Khai Nguyen, Nhat Ho
Comments: 24 pages, 11 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Sliced Wasserstein (SW) and Generalized Sliced Wasserstein (GSW) have been widely used in applications due to their computational and statistical scalability. However, the SW and the GSW are only defined between distributions supported on a homogeneous domain. This limitation prevents their usage in applications with heterogeneous joint distributions with marginal distributions supported on multiple different domains. Using SW and GSW directly on the joint domains cannot make a meaningful comparison since their homogeneous slicing operator i.e., Radon Transform (RT) and Generalized Radon Transform (GRT) are not expressive enough to capture the structure of the joint supports set. To address the issue, we propose two new slicing operators i.e., Partial Generalized Radon Transform (PGRT) and Hierarchical Hybrid Radon Transform (HHRT). In greater detail, PGRT is the generalization of Partial Radon Transform (PRT), which transforms a subset of function arguments non-linearly while HHRT is the composition of PRT and multiple domain-specific PGRT on marginal domain arguments. By using HHRT, we extend the SW into Hierarchical Hybrid Sliced Wasserstein (H2SW) distance which is designed specifically for comparing heterogeneous joint distributions. We then discuss the topological, statistical, and computational properties of H2SW. Finally, we demonstrate the favorable performance of H2SW in 3D mesh deformation, deep 3D mesh autoencoders, and datasets comparison.

[94]  arXiv:2404.15387 (cross-list from q-bio.QM) [pdf, other]
Title: Machine Learning Applied to the Detection of Mycotoxin in Food: A Review
Comments: 39 pages, 8 figures, review paper
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)

Mycotoxins, toxic secondary metabolites produced by certain fungi, pose significant threats to global food safety and public health. These compounds can contaminate a variety of crops, leading to economic losses and health risks to both humans and animals. Traditional lab analysis methods for mycotoxin detection can be time-consuming and may not always be suitable for large-scale screenings. However, in recent years, machine learning (ML) methods have gained popularity for use in the detection of mycotoxins and in the food safety industry in general, due to their accurate and timely predictions. We provide a systematic review on some of the recent ML applications for detecting/predicting the presence of mycotoxin on a variety of food ingredients, highlighting their advantages, challenges, and potential for future advancements. We address the need for reproducibility and transparency in ML research through open access to data and code. An observation from our findings is the frequent lack of detailed reporting on hyperparameters in many studies as well as a lack of open source code, which raises concerns about the reproducibility and optimisation of the ML models used. The findings reveal that while the majority of studies predominantly utilised neural networks for mycotoxin detection, there was a notable diversity in the types of neural network architectures employed, with convolutional neural networks being the most popular.

[95]  arXiv:2404.15410 (cross-list from cs.RO) [pdf, ps, other]
Title: Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments
Comments: 12 pages, 3 figures, 3 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This work investigates the potential of Reinforcement Learning (RL) to tackle robot motion planning challenges in the dynamic RoboCup Small Size League (SSL). Using a heuristic control approach, we evaluate RL's effectiveness in obstacle-free and single-obstacle path-planning environments. Ablation studies reveal significant performance improvements. Our method achieved a 60% time gain in obstacle-free environments compared to baseline algorithms. Additionally, our findings demonstrated dynamic obstacle avoidance capabilities, adeptly navigating around moving blocks. These findings highlight the potential of RL to enhance robot motion planning in the challenging and unpredictable SSL environment.

[96]  arXiv:2404.15419 (cross-list from physics.ao-ph) [pdf, ps, other]
Title: Using Deep Learning to Identify Initial Error Sensitivity of ENSO Forecasts
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)

We introduce a hybrid method that integrates deep learning with model-analog forecasting, a straightforward yet effective approach that generates forecasts from similar initial climate states in a repository of model simulations. This hybrid framework employs a convolutional neural network to estimate state-dependent weights to identify analog states. The advantage of our method lies in its physical interpretability, offering insights into initial-error-sensitive regions through estimated weights and the ability to trace the physically-based temporal evolution of the system through analog forecasting. We evaluate our approach using the Community Earth System Model Version 2 Large Ensemble to forecast the El Ni\~no-Southern Oscillation (ENSO) on a seasonal-to-annual time scale. Results show a 10% improvement in forecasting sea surface temperature anomalies over the equatorial Pacific at 9-12 months leads compared to the traditional model-analog technique. Furthermore, our hybrid model demonstrates improvements in boreal winter and spring initialization when evaluated against a reanalysis dataset. Our deep learning-based approach reveals state-dependent sensitivity linked to various seasonally varying physical processes, including the Pacific Meridional Modes, equatorial recharge oscillator, and stochastic wind forcing. Notably, disparities emerge in the sensitivity associated with El Ni\~no and La Ni\~na events. We find that sea surface temperature over the tropical Pacific plays a more crucial role in El Ni\~no forecasting, while zonal wind stress over the same region exhibits greater significance in La Ni\~na prediction. This approach has broad implications for forecasting diverse climate phenomena, including regional temperature and precipitation, which are challenging for the traditional model-analog forecasting method.

[97]  arXiv:2404.15458 (cross-list from physics.optics) [pdf, other]
Title: Can Large Language Models Learn the Physics of Metamaterials? An Empirical Study with ChatGPT
Subjects: Optics (physics.optics); Machine Learning (cs.LG)

Large language models (LLMs) such as ChatGPT, Gemini, LlaMa, and Claude are trained on massive quantities of text parsed from the internet and have shown a remarkable ability to respond to complex prompts in a manner often indistinguishable from humans. We present a LLM fine-tuned on up to 40,000 data that can predict electromagnetic spectra over a range of frequencies given a text prompt that only specifies the metasurface geometry. Results are compared to conventional machine learning approaches including feed-forward neural networks, random forest, linear regression, and K-nearest neighbor (KNN). Remarkably, the fine-tuned LLM (FT-LLM) achieves a lower error across all dataset sizes explored compared to all machine learning approaches including a deep neural network. We also demonstrate the LLM's ability to solve inverse problems by providing the geometry necessary to achieve a desired spectrum. LLMs possess some advantages over humans that may give them benefits for research, including the ability to process enormous amounts of data, find hidden patterns in data, and operate in higher-dimensional spaces. We propose that fine-tuning LLMs on large datasets specific to a field allows them to grasp the nuances of that domain, making them valuable tools for research and analysis.

[98]  arXiv:2404.15466 (cross-list from stat.ML) [pdf, other]
Title: Private Optimal Inventory Policy Learning for Feature-based Newsvendor with Unknown Demand
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The data-driven newsvendor problem with features has recently emerged as a significant area of research, driven by the proliferation of data across various sectors such as retail, supply chains, e-commerce, and healthcare. Given the sensitive nature of customer or organizational data often used in feature-based analysis, it is crucial to ensure individual privacy to uphold trust and confidence. Despite its importance, privacy preservation in the context of inventory planning remains unexplored. A key challenge is the nonsmoothness of the newsvendor loss function, which sets it apart from existing work on privacy-preserving algorithms in other settings. This paper introduces a novel approach to estimate a privacy-preserving optimal inventory policy within the f-differential privacy framework, an extension of the classical $(\epsilon, \delta)$-differential privacy with several appealing properties. We develop a clipped noisy gradient descent algorithm based on convolution smoothing for optimal inventory estimation to simultaneously address three main challenges: (1) unknown demand distribution and nonsmooth loss function; (2) provable privacy guarantees for individual-level data; and (3) desirable statistical precision. We derive finite-sample high-probability bounds for optimal policy parameter estimation and regret analysis. By leveraging the structure of the newsvendor problem, we attain a faster excess population risk bound compared to that obtained from an indiscriminate application of existing results for general nonsmooth convex loss. Our bound aligns with that for strongly convex and smooth loss function. Our numerical experiments demonstrate that the proposed new method can achieve desirable privacy protection with a marginal increase in cost.

[99]  arXiv:2404.15497 (cross-list from cond-mat.soft) [pdf, other]
Title: Deep-learning Optical Flow Outperforms PIV in Obtaining Velocity Fields from Active Nematics
Subjects: Soft Condensed Matter (cond-mat.soft); Machine Learning (cs.LG)

Deep learning-based optical flow (DLOF) extracts features in adjacent video frames with deep convolutional neural networks. It uses those features to estimate the inter-frame motions of objects at the pixel level. In this article, we evaluate the ability of optical flow to quantify the spontaneous flows of MT-based active nematics under different labeling conditions. We compare DLOF against the commonly used technique, particle imaging velocimetry (PIV). We obtain flow velocity ground truths either by performing semi-automated particle tracking on samples with sparsely labeled filaments, or from passive tracer beads. We find that DLOF produces significantly more accurate velocity fields than PIV for densely labeled samples. We show that the breakdown of PIV arises because the algorithm cannot reliably distinguish contrast variations at high densities, particularly in directions parallel to the nematic director. DLOF overcomes this limitation. For sparsely labeled samples, DLOF and PIV produce results with similar accuracy, but DLOF gives higher-resolution fields. Our work establishes DLOF as a versatile tool for measuring fluid flows in a broad class of active, soft, and biophysical systems.

[100]  arXiv:2404.15500 (cross-list from cs.AI) [pdf, other]
Title: GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots
Comments: Earthvision 2024, CVPR Workshop
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Geospatial Copilots unlock unprecedented potential for performing Earth Observation (EO) applications through natural language instructions. However, existing agents rely on overly simplified single tasks and template-based prompts, creating a disconnect with real-world scenarios. In this work, we present GeoLLM-Engine, an environment for tool-augmented agents with intricate tasks routinely executed by analysts on remote sensing platforms. We enrich our environment with geospatial API tools, dynamic maps/UIs, and external multimodal knowledge bases to properly gauge an agent's proficiency in interpreting realistic high-level natural language commands and its functional correctness in task completions. By alleviating overheads typically associated with human-in-the-loop benchmark curation, we harness our massively parallel engine across 100 GPT-4-Turbo nodes, scaling to over half a million diverse multi-tool tasks and across 1.1 million satellite images. By moving beyond traditional single-task image-caption paradigms, we investigate state-of-the-art agents and prompting techniques against long-horizon prompts.

[101]  arXiv:2404.15510 (cross-list from cs.AR) [pdf, other]
Title: NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator
Comments: Visit this https URL for WebGUI based simulations
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-euclidean data across various domains, ranging from social network analysis to bioinformatics. Despite their effectiveness, their adoption has not been pervasive because of scalability challenges associated with large-scale graph datasets, particularly when leveraging message passing.
To tackle these challenges, we introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm. NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication. This separation allows for independent exploitation of their unique data dependencies, facilitating efficient resource allocation. We introduce a rolling eviction strategy to mitigate data idling in on-chip memory as well as address the prevalent issue of memory bloat in sparse graph computations. Furthermore, the compute resource load balancing is achieved through a dynamic reseeding hash-based mapping, ensuring uniform utilization of computing resources agnostic of sparsity patterns. Finally, we present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis.
Overall, NeuraChip presents a significant improvement, yielding an average speedup of 22.1x over Intel's MKL, 17.1x over NVIDIA's cuSPARSE, 16.7x over AMD's hipSPARSE, and 1.5x over prior state-of-the-art SpGEMM accelerator and 1.3x over GNN accelerator. The source code for our open-sourced simulator and performance visualizer is publicly accessible on GitHub https://neurachip.us

[102]  arXiv:2404.15538 (cross-list from cs.GR) [pdf, other]
Title: DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft
Comments: 16 pages, 9 figures, accepted to Foundation of Digital Games 2024
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high amount of flexibility and expressivity. But unlike PCG, such approaches cannot guarantee functionality, which is crucial for certain applications like game design. In this paper, we present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft. Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions. We find that DreamCraft produces more aligned in-game artifacts than a baseline that post-processes the output of an unconstrained NeRF. Thanks to the quantized representation of the environment, functional constraints can be integrated using specialized loss terms. We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types. DreamCraft inherits a high degree of expressivity and controllability from the NeRF, while still being able to incorporate functional constraints through domain-specific objectives.

[103]  arXiv:2404.15552 (cross-list from cs.CV) [pdf, other]
Title: Cross-Temporal Spectrogram Autoencoder (CTSAE): Unsupervised Dimensionality Reduction for Clustering Gravitational Wave Glitches
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); General Relativity and Quantum Cosmology (gr-qc)

The advancement of The Laser Interferometer Gravitational-Wave Observatory (LIGO) has significantly enhanced the feasibility and reliability of gravitational wave detection. However, LIGO's high sensitivity makes it susceptible to transient noises known as glitches, which necessitate effective differentiation from real gravitational wave signals. Traditional approaches predominantly employ fully supervised or semi-supervised algorithms for the task of glitch classification and clustering. In the future task of identifying and classifying glitches across main and auxiliary channels, it is impractical to build a dataset with manually labeled ground-truth. In addition, the patterns of glitches can vary with time, generating new glitches without manual labels. In response to this challenge, we introduce the Cross-Temporal Spectrogram Autoencoder (CTSAE), a pioneering unsupervised method for the dimensionality reduction and clustering of gravitational wave glitches. CTSAE integrates a novel four-branch autoencoder with a hybrid of Convolutional Neural Networks (CNN) and Vision Transformers (ViT). To further extract features across multi-branches, we introduce a novel multi-branch fusion method using the CLS (Class) token. Our model, trained and evaluated on the GravitySpy O3 dataset on the main channel, demonstrates superior performance in clustering tasks when compared to state-of-the-art semi-supervised learning methods. To the best of our knowledge, CTSAE represents the first unsupervised approach tailored specifically for clustering LIGO data, marking a significant step forward in the field of gravitational wave research. The code of this paper is available at https://github.com/Zod-L/CTSAE

[104]  arXiv:2404.15564 (cross-list from cs.CV) [pdf, other]
Title: Guided AbsoluteGrad: Magnitude of Gradients Matters to Explanation's Localization and Saliency
Authors: Jun Huang, Yan Liu
Comments: CAI2024 Camera-ready Submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

This paper proposes a new gradient-based XAI method called Guided AbsoluteGrad for saliency map explanations. We utilize both positive and negative gradient magnitudes and employ gradient variance to distinguish the important areas for noise deduction. We also introduce a novel evaluation metric named ReCover And Predict (RCAP), which considers the Localization and Visual Noise Level objectives of the explanations. We propose two propositions for these two objectives and prove the necessity of evaluating them. We evaluate Guided AbsoluteGrad with seven gradient-based XAI methods using the RCAP metric and other SOTA metrics in three case studies: (1) ImageNet dataset with ResNet50 model; (2) International Skin Imaging Collaboration (ISIC) dataset with EfficientNet model; (3) the Places365 dataset with DenseNet161 model. Our method surpasses other gradient-based approaches, showcasing the quality of enhanced saliency map explanations through gradient magnitude.

[105]  arXiv:2404.15592 (cross-list from cs.CV) [pdf, other]
Title: ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction. ImplicitAVE, sourced from the MAVE dataset, is carefully curated and expanded to include implicit AVE and multimodality, resulting in a refined dataset of 68k training and 1.6k testing data across five domains. We also explore the application of multimodal large language models (MLLMs) to implicit AVE, establishing a comprehensive benchmark for MLLMs on the ImplicitAVE dataset. Six recent MLLMs with eleven variants are evaluated across diverse settings, revealing that implicit value extraction remains a challenging task for MLLMs. The contributions of this work include the development and release of ImplicitAVE, and the exploration and benchmarking of various MLLMs for implicit AVE, providing valuable insights and potential future research directions. Dataset and code are available at https://github.com/HenryPengZou/ImplicitAVE

[106]  arXiv:2404.15597 (cross-list from cs.NE) [pdf, other]
Title: GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with similar performance as recurrent neural networks (RNNs) but with about 50% power consumption.

[107]  arXiv:2404.15615 (cross-list from cs.HC) [pdf, other]
Title: MDDD: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Emotion decoding using Electroencephalography (EEG)-based affective brain-computer interfaces represents a significant area within the field of affective computing. In the present study, we propose a novel non-deep transfer learning method, termed as Manifold-based Domain adaptation with Dynamic Distribution (MDDD). The proposed MDDD includes four main modules: manifold feature transformation, dynamic distribution alignment, classifier learning, and ensemble learning. The data undergoes a transformation onto an optimal Grassmann manifold space, enabling dynamic alignment of the source and target domains. This process prioritizes both marginal and conditional distributions according to their significance, ensuring enhanced adaptation efficiency across various types of data. In the classifier learning, the principle of structural risk minimization is integrated to develop robust classification models. This is complemented by dynamic distribution alignment, which refines the classifier iteratively. Additionally, the ensemble learning module aggregates the classifiers obtained at different stages of the optimization process, which leverages the diversity of the classifiers to enhance the overall prediction accuracy. The experimental results indicate that MDDD outperforms traditional non-deep learning methods, achieving an average improvement of 3.54%, and is comparable to deep learning methods. This suggests that MDDD could be a promising method for enhancing the utility and applicability of aBCIs in real-world scenarios.

[108]  arXiv:2404.15618 (cross-list from stat.ML) [pdf, other]
Title: Neural Operator induced Gaussian Process framework for probabilistic solution of parametric partial differential equations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The study of neural operators has paved the way for the development of efficient approaches for solving partial differential equations (PDEs) compared with traditional methods. However, most of the existing neural operators lack the capability to provide uncertainty measures for their predictions, a crucial aspect, especially in data-driven scenarios with limited available data. In this work, we propose a novel Neural Operator-induced Gaussian Process (NOGaP), which exploits the probabilistic characteristics of Gaussian Processes (GPs) while leveraging the learning prowess of operator learning. The proposed framework leads to improved prediction accuracy and offers a quantifiable measure of uncertainty. The proposed framework is extensively evaluated through experiments on various PDE examples, including Burger's equation, Darcy flow, non-homogeneous Poisson, and wave-advection equations. Furthermore, a comparative study with state-of-the-art operator learning algorithms is presented to highlight the advantages of NOGaP. The results demonstrate superior accuracy and expected uncertainty characteristics, suggesting the promising potential of the proposed framework.

[109]  arXiv:2404.15621 (cross-list from cs.ET) [pdf, ps, other]
Title: Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance
Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Artificial neural networks have advanced due to scaling dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software performance on inference. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. For the investigated problem, the average multi-task classification accuracy improves from 61 % to 72 % (< 1 % of software baseline) using the proposed approach.

[110]  arXiv:2404.15635 (cross-list from cs.CV) [pdf, other]
Title: A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Addressing pedestrian safety at intersections is one of the paramount concerns in the field of transportation research, driven by the urgency of reducing traffic-related injuries and fatalities. With advances in computer vision technologies and predictive models, the pursuit of developing real-time proactive protection systems is increasingly recognized as vital to improving pedestrian safety at intersections. The core of these protection systems lies in the prediction-based evaluation of pedestrian's potential risks, which plays a significant role in preventing the occurrence of accidents. The major challenges in the current prediction-based potential risk evaluation research can be summarized into three aspects: the inadequate progress in creating a real-time framework for the evaluation of pedestrian's potential risks, the absence of accurate and explainable safety indicators that can represent the potential risk, and the lack of tailor-made evaluation criteria specifically for each category of pedestrians. To address these research challenges, in this study, a framework with computer vision technologies and predictive models is developed to evaluate the potential risk of pedestrians in real time. Integral to this framework is a novel surrogate safety measure, the Predicted Post-Encroachment Time (P-PET), derived from deep learning models capable to predict the arrival time of pedestrians and vehicles at intersections. To further improve the effectiveness and reliability of pedestrian risk evaluation, we classify pedestrians into distinct categories and apply specific evaluation criteria for each group. The results demonstrate the framework's ability to effectively identify potential risks through the use of P-PET, indicating its feasibility for real-time applications and its improved performance in risk evaluation across different categories of pedestrians.

[111]  arXiv:2404.15648 (cross-list from cs.RO) [pdf, other]
Title: Affordance Blending Networks
Comments: Early preprint
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Affordances, a concept rooted in ecological psychology and pioneered by James J. Gibson, have emerged as a fundamental framework for understanding the dynamic relationship between individuals and their environments. Expanding beyond traditional perceptual and cognitive paradigms, affordances represent the inherent effect and action possibilities that objects offer to the agents within a given context. As a theoretical lens, affordances bridge the gap between effect and action, providing a nuanced understanding of the connections between agents' actions on entities and the effect of these actions. In this study, we propose a model that unifies object, action and effect into a single latent representation in a common latent space that is shared between all affordances that we call the affordance space. Using this affordance space, our system is able to generate effect trajectories when action and object are given and is able to generate action trajectories when effect trajectories and objects are given. In the experiments, we showed that our model does not learn the behavior of each object but it learns the affordance relations shared by the objects that we call equivalences. In addition to simulated experiments, we showed that our model can be used for direct imitation in real world cases. We also propose affordances as a base for Cross Embodiment transfer to link the actions of different robots. Finally, we introduce selective loss as a solution that allows valid outputs to be generated for indeterministic model inputs.

[112]  arXiv:2404.15653 (cross-list from cs.CV) [pdf, other]
Title: CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed method reframes pre-training on image-text data as a classification task. Consequently, it eliminates the need for pairwise similarity computations in contrastive loss, achieving a remarkable $2.7\times$ acceleration in training speed compared to contrastive learning on web-scale data. Through extensive experiments spanning diverse vision tasks, including detection and segmentation, we demonstrate that the proposed method maintains high representation quality. Our source code along with pre-trained model weights and training recipes is available at \url{https://github.com/apple/corenet}.

[113]  arXiv:2404.15681 (cross-list from cs.CR) [pdf, other]
Title: Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Generative pre-trained transformers (GPT's) are a type of large language machine learning model that are unusually adept at producing novel, and coherent, natural language. In this study the ability of GPT models to generate novel and correct versions, and notably very insecure versions, of implementations of the cryptographic hash function SHA-1 is examined. The GPT models Llama-2-70b-chat-h, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The GPT models are prompted to re-write each function using a modified version of the localGPT framework and langchain to provide word embedding context of the full source code and header files to the model, resulting in over 130,000 function re-write GPT output text blocks, approximately 40,000 of which were able to be parsed as C code and subsequently compiled. The generated code is analyzed for being compilable, correctness of the algorithm, memory leaks, compiler optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants have a high implementation security risk of being correct for some test vectors, but incorrect for other test vectors. Additionally, many function implementations were not correct to the reference algorithm of SHA-1, but produced hashes that have some of the basic characteristics of hash functions. Many of the function re-writes contained serious flaws such as memory leaks, integer overflows, out of bounds accesses, use of uninitialised values, and compiler optimization instability. Compiler optimization settings and SHA-256 hash checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax - using this clustering over 100,000 novel and correct versions of the SHA-1 codebase were generated where each component C function of the reference implementation is different from the original code.

[114]  arXiv:2404.15690 (cross-list from cs.CL) [pdf, other]
Title: Neural Proto-Language Reconstruction
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNN and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model has a better performance on the WikiHan dataset, and the data augmentation step stabilizes the training.

[115]  arXiv:2404.15709 (cross-list from cs.CV) [pdf, other]
Title: ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

In this work, we aim to learn a unified vision-based policy for a multi-fingered robot hand to manipulate different objects in diverse poses. Though prior work has demonstrated that human videos can benefit policy learning, performance improvement has been limited by physically implausible trajectories extracted from videos. Moreover, reliance on privileged object information such as ground-truth object states further limits the applicability in realistic scenarios. To address these limitations, we propose a new framework ViViDex to improve vision-based policy learning from human videos. It first uses reinforcement learning with trajectory guided rewards to train state-based policies for each video, obtaining both visually natural and physically plausible trajectories from the video. We then rollout successful episodes from state-based policies and train a unified visual policy without using any privileged information. A coordinate transformation method is proposed to significantly boost the performance. We evaluate our method on three dexterous manipulation tasks and demonstrate a large improvement over state-of-the-art algorithms.

[116]  arXiv:2404.15742 (cross-list from math.NA) [pdf, other]
Title: Generalizing the SINDy approach with nested neural networks
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

Symbolic Regression (SR) is a widely studied field of research that aims to infer symbolic expressions from data. A popular approach for SR is the Sparse Identification of Nonlinear Dynamical Systems (\sindy) framework, which uses sparse regression to identify governing equations from data. This study introduces an enhanced method, Nested SINDy, that aims to increase the expressivity of the SINDy approach thanks to a nested structure. Indeed, traditional symbolic regression and system identification methods often fail with complex systems that cannot be easily described analytically. Nested SINDy builds on the SINDy framework by introducing additional layers before and after the core SINDy layer. This allows the method to identify symbolic representations for a wider range of systems, including those with compositions and products of functions. We demonstrate the ability of the Nested SINDy approach to accurately find symbolic expressions for simple systems, such as basic trigonometric functions, and sparse (false but accurate) analytical representations for more complex systems. Our results highlight Nested SINDy's potential as a tool for symbolic regression, surpassing the traditional SINDy approach in terms of expressivity. However, we also note the challenges in the optimization process for Nested SINDy and suggest future research directions, including the designing of a more robust methodology for the optimization process. This study proves that Nested SINDy can effectively discover symbolic representations of dynamical systems from data, offering new opportunities for understanding complex systems through data-driven methods.

[117]  arXiv:2404.15746 (cross-list from stat.ML) [pdf, other]
Title: Collaborative Heterogeneous Causal Inference Beyond Meta-analysis
Comments: submitted to ICML
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method could easily fail when a certain site couldn't cover the entire population. Moreover, it still relies on the concept of traditional meta-analysis after adjusting for the distribution shift.
In this work, we propose a collaborative inverse propensity score weighting estimator for causal inference with heterogeneous data. Instead of adjusting the distribution shift separately, we use weighted propensity score models to collaboratively adjust for the distribution shift. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases. To account for the vulnerable density estimation, we further discuss the double machine method and show the possibility of using nonparametric density estimation with d<8 and a flexible machine learning method to guarantee asymptotic normality. We propose a federated learning algorithm to collaboratively train the outcome model while preserving privacy. Using synthetic and real datasets, we demonstrate the advantages of our method.

[118]  arXiv:2404.15770 (cross-list from cs.CV) [pdf, other]
Title: ChEX: Interactive Localization and Region Description in Chest X-rays
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX's interactive capabilities.

[119]  arXiv:2404.15786 (cross-list from eess.IV) [pdf, other]
Title: Rethinking Model Prototyping through the MedMNIST+ Dataset Collection
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The integration of deep learning based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures, for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code will be released soon.

[120]  arXiv:2404.15794 (cross-list from cs.NE) [pdf, other]
Title: Large Language Models as In-context AI Generators for Quality-Diversity
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Quality-Diversity (QD) approaches are a promising direction to develop open-ended processes as they can discover archives of high-quality solutions across diverse niches. While already successful in many applications, QD approaches usually rely on combining only one or two solutions to generate new candidate solutions. As observed in open-ended processes such as technological evolution, wisely combining large diversity of these solutions could lead to more innovative solutions and potentially boost the productivity of QD search. In this work, we propose to exploit the pattern-matching capabilities of generative models to enable such efficient solution combinations. We introduce In-context QD, a framework of techniques that aim to elicit the in-context capabilities of pre-trained Large Language Models (LLMs) to generate interesting solutions using the QD archive as context. Applied to a series of common QD domains, In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization. Additionally, this result holds across multiple values of parameter sizes and archive population sizes, as well as across domains with distinct characteristics from BBO functions to policy search. Finally, we perform an extensive ablation that highlights the key prompt design considerations that encourage the generation of promising solutions for QD.

[121]  arXiv:2404.15805 (cross-list from q-bio.BM) [pdf, other]
Title: Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)

Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting amino acid representations with notable biochemical accuracy. Yet, it lacks in delivering functional protein insights, signaling an opportunity for enhancing representation quality.Our study addresses this gap by incorporating protein family classification into ESM2's training.This approach, augmented with Community Propagation-Based Clustering Algorithm, improves global protein representations, while a contextual prediction task fine-tunes local amino acid accuracy. Significantly, our model achieved state-of-the-art results in several downstream experiments, demonstrating the power of combining global and local methodologies to substantially boost protein representation quality.

[122]  arXiv:2404.15817 (cross-list from cs.CV) [pdf, other]
Title: Vision Transformer-based Adversarial Domain Adaptation
Authors: Yahan Li, Yuan Wu
Comments: 6 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. The most recent UDA methods always resort to adversarial training to yield state-of-the-art results and a dominant number of existing UDA methods employ convolutional neural networks (CNNs) as feature extractors to learn domain invariant features. Vision transformer (ViT) has attracted tremendous attention since its emergence and has been widely used in various computer vision tasks, such as image classification, object detection, and semantic segmentation, yet its potential in adversarial domain adaptation has never been investigated. In this paper, we fill this gap by employing the ViT as the feature extractor in adversarial domain adaptation. Moreover, we empirically demonstrate that ViT can be a plug-and-play component in adversarial domain adaptation, which means directly replacing the CNN-based feature extractor in existing UDA methods with the ViT-based feature extractor can easily obtain performance improvement. The code is available at https://github.com/LluckyYH/VT-ADA.

[123]  arXiv:2404.15822 (cross-list from cs.AI) [pdf, other]
Title: Recursive Backwards Q-Learning in Deterministic Environments
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reinforcement learning is a popular method of finding optimal solutions to complex problems. Algorithms like Q-learning excel at learning to solve stochastic problems without a model of their environment. However, they take longer to solve deterministic problems than is necessary. Q-learning can be improved to better solve deterministic problems by introducing such a model-based approach. This paper introduces the recursive backwards Q-learning (RBQL) agent, which explores and builds a model of the environment. After reaching a terminal state, it recursively propagates its value backwards through this model. This lets each state be evaluated to its optimal value without a lengthy learning process. In the example of finding the shortest path through a maze, this agent greatly outperforms a regular Q-learning agent.

[124]  arXiv:2404.15829 (cross-list from physics.acc-ph) [pdf, ps, other]
Title: Accelerating Cavity Fault Prediction Using Deep Learning at Jefferson Laboratory
Comments: 18 pages, 9 figures
Subjects: Accelerator Physics (physics.acc-ph); Machine Learning (cs.LG)

Accelerating cavities are an integral part of the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Laboratory. When any of the over 400 cavities in CEBAF experiences a fault, it disrupts beam delivery to experimental user halls. In this study, we propose the use of a deep learning model to predict slowly developing cavity faults. By utilizing pre-fault signals, we train a LSTM-CNN binary classifier to distinguish between radio-frequency (RF) signals during normal operation and RF signals indicative of impending faults. We optimize the model by adjusting the fault confidence threshold and implementing a multiple consecutive window criterion to identify fault events, ensuring a low false positive rate. Results obtained from analysis of a real dataset collected from the accelerating cavities simulating a deployed scenario demonstrate the model's ability to identify normal signals with 99.99% accuracy and correctly predict 80% of slowly developing faults. Notably, these achievements were achieved in the context of a highly imbalanced dataset, and fault predictions were made several hundred milliseconds before the onset of the fault. Anticipating faults enables preemptive measures to improve operational efficiency by preventing or mitigating their occurrence.

[125]  arXiv:2404.15834 (cross-list from cs.DC) [pdf, ps, other]
Title: FEDSTR: Money-In AI-Out | A Decentralized Marketplace for Federated Learning and LLM Training on the NOSTR Protocol
Comments: 19 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

The NOSTR is a communication protocol for the social web, based on the w3c websockets standard. Although it is still in its infancy, it is well known as a social media protocol, thousands of trusted users and multiple user interfaces, offering a unique experience and enormous capabilities. To name a few, the NOSTR applications include but are not limited to direct messaging, file sharing, audio/video streaming, collaborative writing, blogging and data processing through distributed AI directories. In this work, we propose an approach that builds upon the existing protocol structure with end goal a decentralized marketplace for federated learning and LLM training. In this proposed design there are two parties: on one side there are customers who provide a dataset that they want to use for training an AI model. On the other side, there are service providers, who receive (parts of) the dataset, train the AI model, and for a payment as an exchange, they return the optimized AI model. The decentralized and censorship resistant features of the NOSTR enable the possibility of designing a fair and open marketplace for training AI models and LLMs.

[126]  arXiv:2404.15848 (cross-list from cs.CL) [pdf, other]
Title: Detecting Conceptual Abstraction in LLMs
Comments: Paper accepted at the LREC-COLING 2024 Conference (Paper ID: 1968) this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, which cannot solely be related to the distributional similarity of noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.

[127]  arXiv:2404.15854 (cross-list from cs.CR) [pdf, other]
Title: CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Comments: Submitted to IEEE TDSC
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).

[128]  arXiv:2404.15879 (cross-list from cs.CV) [pdf, other]
Title: Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection
Comments: Accepted for publication at the 2024 35th IEEE Intelligent Vehicles Symposium (IV 2024), June 2-5, 2024, in Jeju Island, Korea
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

LiDAR-based 3D object detection has become an essential part of automated driving due to its ability to localize and classify objects precisely in 3D. However, object detectors face a critical challenge when dealing with unknown foreground objects, particularly those that were not present in their original training data. These out-of-distribution (OOD) objects can lead to misclassifications, posing a significant risk to the safety and reliability of automated vehicles. Currently, LiDAR-based OOD object detection has not been well studied. We address this problem by generating synthetic training data for OOD objects by perturbing known object categories. Our idea is that these synthetic OOD objects produce different responses in the feature map of an object detector compared to in-distribution (ID) objects. We then extract features using a pre-trained and fixed object detector and train a simple multilayer perceptron (MLP) to classify each detection as either ID or OOD. In addition, we propose a new evaluation protocol that allows the use of existing datasets without modifying the point cloud, ensuring a more authentic evaluation of real-world scenarios. The effectiveness of our method is validated through experiments on the newly proposed nuScenes OOD benchmark. The source code is available at https://github.com/uulm-mrm/mmood3d.

[129]  arXiv:2404.15880 (cross-list from eess.SP) [pdf, other]
Title: Machine Learning for Pre/Post Flight UAV Rotor Defect Detection Using Vibration Analysis
Comments: Submitted to IEEE GlobeCom 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Unmanned Aerial Vehicles (UAVs) will be critical infrastructural components of future smart cities. In order to operate efficiently, UAV reliability must be ensured by constant monitoring for faults and failures. To this end, the work presented in this paper leverages signal processing and Machine Learning (ML) methods to analyze the data of a comprehensive vibrational analysis to determine the presence of rotor blade defects during pre and post-flight operation. With the help of dimensionality reduction techniques, the Random Forest algorithm exhibited the best performance and detected defective rotor blades perfectly. Additionally, a comprehensive analysis of the impact of various feature subsets is presented to gain insight into the factors affecting the model's classification decision process.

[130]  arXiv:2404.15903 (cross-list from cs.CV) [pdf, other]
Title: Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors
Comments: 19 pages, accepted at ICDAR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Etruscan mirrors constitute a significant category within Etruscan art and, therefore, undergo systematic examinations to obtain insights into ancient times. A crucial aspect of their analysis involves the labor-intensive task of manually tracing engravings from the backside. Additionally, this task is inherently challenging due to the damage these mirrors have sustained, introducing subjectivity into the process. We address these challenges by automating the process through photometric-stereo scanning in conjunction with deep segmentation networks which, however, requires effective usage of the limited data at hand. We accomplish this by incorporating predictions on a per-patch level, and various data augmentations, as well as exploring self-supervised learning. Compared to our baseline, we improve predictive performance w.r.t. the pseudo-F-Measure by around 16%. When assessing performance on complete mirrors against a human baseline, our approach yields quantitative similar performance to a human annotator and significantly outperforms existing binarization methods. With our proposed methodology, we streamline the annotation process, enhance its objectivity, and reduce overall workload, offering a valuable contribution to the examination of these historical artifacts and other non-traditional documents.

[131]  arXiv:2404.15949 (cross-list from cs.CL) [pdf, other]
Title: Sequence can Secretly Tell You What to Discard
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs), despite their impressive performance on a wide range of tasks, require significant GPU memory and consume substantial computational resources. In addition to model weights, the memory occupied by KV cache increases linearly with sequence length, becoming a main bottleneck for inference. In this paper, we introduce a novel approach for optimizing the KV cache which significantly reduces its memory footprint. Through a comprehensive investigation, we find that on LLaMA2 series models, (i) the similarity between adjacent tokens' query vectors is remarkably high, and (ii) current query's attention calculation can rely solely on the attention information of a small portion of the preceding queries. Based on these observations, we propose CORM, a KV cache eviction policy that dynamically retains important key-value pairs for inference without finetuning the model. We validate that CORM reduces the inference memory usage of KV cache by up to 70% without noticeable performance degradation across six tasks in LongBench.

[132]  arXiv:2404.15954 (cross-list from cs.IR) [pdf, other]
Title: Mixed Supervised Graph Contrastive Learning for Recommendation
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss and the contrastive loss. This decoupled design can cause inconsistent optimization direction from different losses, which leads to longer convergence time and even sub-optimal performance. Besides, the self-supervised contrastive loss falls short in alleviating the data sparsity issue in RecSys as it learns to differentiate users/items from different views without providing extra supervised collaborative filtering signals during augmentations. In this paper, we propose Mixed Supervised Graph Contrastive Learning for Recommendation (MixSGCL) to address these concerns. MixSGCL originally integrates the training of recommendation and unsupervised contrastive losses into a supervised contrastive learning loss to align the two tasks within one optimization direction. To cope with the data sparsity issue, instead unsupervised augmentation, we further propose node-wise and edge-wise mixup to mine more direct supervised collaborative filtering signals based on existing user-item interactions. Extensive experiments on three real-world datasets demonstrate that MixSGCL surpasses state-of-the-art methods, achieving top performance on both accuracy and efficiency. It validates the effectiveness of MixSGCL with our coupled design on supervised graph contrastive learning.

[133]  arXiv:2404.15959 (cross-list from physics.geo-ph) [pdf, other]
Title: Explainable AI models for predicting liquefaction-induced lateral spreading
Comments: to be published in "Frontiers in Built Environment"
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG)

Earthquake-induced liquefaction can cause substantial lateral spreading, posing threats to infrastructure. Machine learning (ML) can improve lateral spreading prediction models by capturing complex soil characteristics and site conditions. However, the "black box" nature of ML models can hinder their adoption in critical decision-making. This study addresses this limitation by using SHapley Additive exPlanations (SHAP) to interpret an eXtreme Gradient Boosting (XGB) model for lateral spreading prediction, trained on data from the 2011 Christchurch Earthquake. SHAP analysis reveals the factors driving the model's predictions, enhancing transparency and allowing for comparison with established engineering knowledge. The results demonstrate that the XGB model successfully identifies the importance of soil characteristics derived from Cone Penetration Test (CPT) data in predicting lateral spreading, validating its alignment with domain understanding. This work highlights the value of explainable machine learning for reliable and informed decision-making in geotechnical engineering and hazard assessment.

[134]  arXiv:2404.15967 (cross-list from stat.ML) [pdf, other]
Title: Interpretable clustering with the Distinguishability criterion
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

Cluster analysis is a popular unsupervised learning tool used in many disciplines to identify heterogeneous sub-populations within a sample. However, validating cluster analysis results and determining the number of clusters in a data set remains an outstanding problem. In this work, we present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations. Our computational implementation of the Distinguishability criterion corresponds to the Bayes risk of a randomized classifier under the 0-1 loss. We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures, such as hierarchical clustering, k-means, and finite mixture models. We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.

[135]  arXiv:2404.15979 (cross-list from cs.CV) [pdf, other]
Title: On the Fourier analysis in the SO(3) space : EquiLoPO Network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Group Theory (math.GR)

Analyzing volumetric data with rotational invariance or equivariance is an active topic in current research. Existing deep-learning approaches utilize either group convolutional networks limited to discrete rotations or steerable convolutional networks with constrained filter structures. This work proposes a novel equivariant neural network architecture that achieves analytical Equivariance to Local Pattern Orientation on the continuous SO(3) group while allowing unconstrained trainable filters - EquiLoPO Network. Our key innovations are a group convolutional operation leveraging irreducible representations as the Fourier basis and a local activation function in the SO(3) space that provides a well-defined mapping from input to output functions, preserving equivariance. By integrating these operations into a ResNet-style architecture, we propose a model that overcomes the limitations of prior methods. A comprehensive evaluation on diverse 3D medical imaging datasets from MedMNIST3D demonstrates the effectiveness of our approach, which consistently outperforms state of the art. This work suggests the benefits of true rotational equivariance on SO(3) and flexible unconstrained filters enabled by the local activation function, providing a flexible framework for equivariant deep learning on volumetric data with potential applications across domains. Our code is publicly available at \url{https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/EquiLoPO}.

[136]  arXiv:2404.16000 (cross-list from cs.CV) [pdf, other]
Title: A comprehensive and easy-to-use multi-domain multi-task medical imaging meta-dataset (MedIMeta)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, the main challenge of these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters and therefore require extensive preprocessing and standardization, for usage in machine learning. Addressing these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all of which are standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta, demonstrating its utility through fully supervised and cross-domain few-shot learning baselines.

[137]  arXiv:2404.16017 (cross-list from cs.CV) [pdf, other]
Title: RetinaRegNet: A Versatile Approach for Retinal Image Registration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

We introduce the RetinaRegNet model, which can achieve state-of-the-art performance across various retinal image registration tasks. RetinaRegNet does not require training on any retinal images. It begins by establishing point correspondences between two retinal images using image features derived from diffusion models. This process involves the selection of feature points from the moving image using the SIFT algorithm alongside random point sampling. For each selected feature point, a 2D correlation map is computed by assessing the similarity between the feature vector at that point and the feature vectors of all pixels in the fixed image. The pixel with the highest similarity score in the correlation map corresponds to the feature point in the moving image. To remove outliers in the estimated point correspondences, we first applied an inverse consistency constraint, followed by a transformation-based outlier detector. This method proved to outperform the widely used random sample consensus (RANSAC) outlier detector by a significant margin. To handle large deformations, we utilized a two-stage image registration framework. A homography transformation was used in the first stage and a more accurate third-order polynomial transformation was used in the second stage. The model's effectiveness was demonstrated across three retinal image datasets: color fundus images, fluorescein angiography images, and laser speckle flowgraphy images. RetinaRegNet outperformed current state-of-the-art methods in all three datasets. It was especially effective for registering image pairs with large displacement and scaling deformations. This innovation holds promise for various applications in retinal image analysis. Our code is publicly available at https://github.com/mirthAI/RetinaRegNet.

[138]  arXiv:2404.16023 (cross-list from stat.AP) [pdf, other]
Title: Learning Car-Following Behaviors Using Bayesian Matrix Normal Mixture Regression
Comments: 6 pages, Accepted by the 35th IEEE Intelligent Vehicles Symposium
Subjects: Applications (stat.AP); Machine Learning (cs.LG)

Learning and understanding car-following (CF) behaviors are crucial for microscopic traffic simulation. Traditional CF models, though simple, often lack generalization capabilities, while many data-driven methods, despite their robustness, operate as "black boxes" with limited interpretability. To bridge this gap, this work introduces a Bayesian Matrix Normal Mixture Regression (MNMR) model that simultaneously captures feature correlations and temporal dynamics inherent in CF behaviors. This approach is distinguished by its separate learning of row and column covariance matrices within the model framework, offering an insightful perspective into the human driver decision-making processes. Through extensive experiments, we assess the model's performance across various historical steps of inputs, predictive steps of outputs, and model complexities. The results consistently demonstrate our model's adeptness in effectively capturing the intricate correlations and temporal dynamics present during CF. A focused case study further illustrates the model's outperforming interpretability of identifying distinct operational conditions through the learned mean and covariance matrices. This not only underlines our model's effectiveness in understanding complex human driving behaviors in CF scenarios but also highlights its potential as a tool for enhancing the interpretability of CF behaviors in traffic simulations and autonomous driving systems.

[139]  arXiv:2404.16030 (cross-list from cs.CV) [pdf, other]
Title: MoDE: CLIP Data Experts via Clustering
Comments: IEEE CVPR 2024 Camera Ready. Code Link: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with less ($<$35\%) training cost. Meanwhile, MoDE can train all data expert asynchronously and can flexibly include new data experts. The code is available at https://github.com/facebookresearch/MetaCLIP/tree/main/mode.

Replacements for Thu, 25 Apr 24

[140]  arXiv:2207.12201 (replaced) [pdf, other]
Title: Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection
Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[141]  arXiv:2211.11760 (replaced) [pdf, other]
Title: A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning
Comments: Accepted at IJCAI 2023. 9 pages, 6 figures, and 4 tables. Codes are available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[142]  arXiv:2301.05180 (replaced) [pdf, other]
Title: Effective Decision Boundary Learning for Class Incremental Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[143]  arXiv:2305.16556 (replaced) [pdf, other]
Title: LANISTR: Multimodal Learning from Structured and Unstructured Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[144]  arXiv:2306.01843 (replaced) [pdf, other]
Title: Lifting Architectural Constraints of Injective Flows
Comments: Camera-ready version: accepted to ICLR 2024
Subjects: Machine Learning (cs.LG)
[145]  arXiv:2307.12667 (replaced) [pdf, other]
Title: TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers
Subjects: Machine Learning (cs.LG)
[146]  arXiv:2307.15398 (replaced) [pdf, other]
Title: The Initial Screening Order Problem
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
[147]  arXiv:2309.08499 (replaced) [pdf, other]
Title: POCKET: Pruning Random Convolution Kernels for Time Series Classification from a Feature Selection Perspective
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[148]  arXiv:2310.02980 (replaced) [pdf, other]
Title: Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[149]  arXiv:2310.16624 (replaced) [pdf, other]
Title: Free-form Flows: Make Any Architecture a Normalizing Flow
Comments: Camera-ready version: accepted at AISTATS 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[150]  arXiv:2311.00865 (replaced) [pdf, other]
Title: Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning
Comments: published at NeurIPS 2023
Journal-ref: Advances in Neural Information Processing Systems, 36 (2023)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
[151]  arXiv:2311.12399 (replaced) [pdf, other]
Title: A Survey of Graph Meets Large Language Model: Progress and Future Directions
Comments: IJCAI 2024 Survey Track
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
[152]  arXiv:2312.07682 (replaced) [pdf, other]
Title: A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams
Comments: 23 pages, 13 figures, FTC 2024 - Future Technologies Conference 2024
Subjects: Machine Learning (cs.LG)
[153]  arXiv:2312.07983 (replaced) [pdf, other]
Title: Multi-perspective Feedback-attention Coupling Model for Continuous-time Dynamic Graphs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
[154]  arXiv:2312.09744 (replaced) [pdf, other]
Title: Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
[155]  arXiv:2401.15482 (replaced) [pdf, other]
Title: Unsupervised Solution Operator Learning for Mean-Field Games via Sampling-Invariant Parametrizations
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
[156]  arXiv:2402.01055 (replaced) [pdf, other]
Title: Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[157]  arXiv:2402.01799 (replaced) [pdf, ps, other]
Title: Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Comments: Accepted at IJCAI '24 (Survey Track), Updated TGI results
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[158]  arXiv:2402.03170 (replaced) [pdf, other]
Title: Is Mamba Capable of In-Context Learning?
Subjects: Machine Learning (cs.LG)
[159]  arXiv:2402.14081 (replaced) [pdf, other]
Title: Motion Code: Robust Time series Classification and Forecasting via Sparse Variational Multi-Stochastic Processes Learning
Comments: 20 pages, 5 figures, 4 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[160]  arXiv:2402.16196 (replaced) [pdf, other]
Title: Combining Machine Learning with Computational Fluid Dynamics using OpenFOAM and SmartSim
Journal-ref: Meccanica, 2024
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
[161]  arXiv:2403.09570 (replaced) [pdf, other]
Title: Multi-Fidelity Bayesian Optimization With Across-Task Transferable Max-Value Entropy Search
Comments: 12 pages, 8 figures, submitted to IEEE for peer review
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)
[162]  arXiv:2404.03329 (replaced) [pdf, ps, other]
Title: DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data
Comments: Revised version for submission to IISE Transactions
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[163]  arXiv:2404.13506 (replaced) [pdf, other]
Title: Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[164]  arXiv:2404.13634 (replaced) [pdf, other]
Title: Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[165]  arXiv:2404.13964 (replaced) [pdf, other]
Title: An Economic Solution to Copyright Challenges of Generative AI
Subjects: Machine Learning (cs.LG); General Economics (econ.GN); Methodology (stat.ME)
[166]  arXiv:2404.14442 (replaced) [pdf, ps, other]
Title: Unified ODE Analysis of Smooth Q-Learning Algorithms
Authors: Donghwan Lee
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[167]  arXiv:2404.14462 (replaced) [pdf, other]
Title: Towards smaller, faster decoder-only transformers: Architectural variants and their implications
Comments: 8 pages, 6 figures
Subjects: Machine Learning (cs.LG)
[168]  arXiv:2404.15201 (replaced) [pdf, other]
Title: CORE-BEHRT: A Carefully Optimized and Rigorously Evaluated BEHRT
Comments: 11 pages, 5 figures
Subjects: Machine Learning (cs.LG)
[169]  arXiv:2112.10953 (replaced) [pdf, other]
Title: An adaptation of InfoMap to absorbing random walks using absorption-scaled graphs
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Probability (math.PR); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph)
[170]  arXiv:2201.07794 (replaced) [pdf, other]
Title: A Non-Expert's Introduction to Data Ethics for Mathematicians
Authors: Mason A. Porter
Comments: A few small tweaks and a new small paragraph in the Conclusions. This is a book chapter. It is associated with my data-ethics lecture at the 2021 AMS Short Course on Mathematical and Computational Methods for Complex Social Systems
Subjects: History and Overview (math.HO); Computers and Society (cs.CY); Machine Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
[171]  arXiv:2303.17708 (replaced) [pdf, other]
Title: Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
[172]  arXiv:2306.16699 (replaced) [pdf, other]
Title: Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation
Comments: Accepted by ICCAD 2023
Journal-ref: ICCAD 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
[173]  arXiv:2307.08939 (replaced) [pdf, other]
Title: Runtime Stealthy Perception Attacks against DNN-based Adaptive Cruise Control Systems
Comments: 19 pages, 23 figures, 11 tables
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[174]  arXiv:2309.13788 (replaced) [pdf, other]
Title: Can LLM-Generated Misinformation Be Detected?
Authors: Canyu Chen, Kai Shu
Comments: Accepted to Proceedings of ICLR 2024. 9 pages for main paper, 40 pages including appendix. The code, results, dataset for this paper and more resources on "LLMs Meet Misinformation" have been released on the project website: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[175]  arXiv:2310.00749 (replaced) [pdf, other]
Title: SEED: Domain-Specific Data Curation With Large Language Models
Comments: preprint, 20 pages, 4 figures
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
[176]  arXiv:2310.03722 (replaced) [pdf, other]
Title: Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance
Comments: Substantive revision in v3 (Apr 23 2024)
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
[177]  arXiv:2310.11566 (replaced) [pdf, other]
Title: Partially Observable Stochastic Games with Neural Perception Mechanisms
Comments: 42 pages, 6 figures
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[178]  arXiv:2311.08744 (replaced) [pdf, other]
Title: Graph Signal Diffusion Model for Collaborative Filtering
Comments: 11 pages, 8 figures, accepted by SIGIR 2024
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[179]  arXiv:2311.18736 (replaced) [pdf, other]
Title: Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms
Comments: 25 pages, 16 figures
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Optimization and Control (math.OC)
[180]  arXiv:2312.08555 (replaced) [pdf, other]
Title: KDAS: Knowledge Distillation via Attention Supervision Framework for Polyp Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[181]  arXiv:2312.17300 (replaced) [pdf, other]
Title: Improving Intrusion Detection with Domain-Invariant Representation Learning in Latent Space
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[182]  arXiv:2401.12576 (replaced) [pdf, other]
Title: LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations
Comments: Accepted to NAACL 2024 HCI+NLP workshop; camera-ready version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[183]  arXiv:2401.17231 (replaced) [pdf, other]
Title: Achieving More Human Brain-Like Vision via Human EEG Representational Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
[184]  arXiv:2402.01598 (replaced) [pdf, other]
Title: Learning from Two Decades of Blood Pressure Data: Demography-Specific Patterns Across 75 Million Patient Encounters
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Applications (stat.AP)
[185]  arXiv:2403.19011 (replaced) [pdf, other]
Title: Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
[186]  arXiv:2403.19612 (replaced) [pdf, other]
Title: ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[187]  arXiv:2404.05980 (replaced) [pdf, other]
Title: Tackling Structural Hallucination in Image Translation with Local Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[188]  arXiv:2404.06023 (replaced) [pdf, other]
Title: Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA
Comments: ACM SIGMETRICS 2024. 71 pages, 3 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR)
[189]  arXiv:2404.09359 (replaced) [pdf, other]
Title: Exploring Feedback Generation in Automated Skeletal Movement Assessment: A Comprehensive Overview
Authors: Tal Hakim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[190]  arXiv:2404.09454 (replaced) [pdf, other]
Title: Utility-Fairness Trade-Offs and How to Find Them
Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)
[191]  arXiv:2404.10800 (replaced) [pdf, other]
Title: Integrating Graph Neural Networks with Scattering Transform for Anomaly Detection
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[192]  arXiv:2404.11788 (replaced) [pdf, other]
Title: NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Performance (cs.PF)
[193]  arXiv:2404.11868 (replaced) [pdf, other]
Title: OPTiML: Dense Semantic Invariance Using Optimal Transport for Self-Supervised Medical Image Representation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[194]  arXiv:2404.12923 (replaced) [pdf, other]
Title: Probabilistic Numeric SMC Sampling for Bayesian Nonlinear System Identification in Continuous Time
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[195]  arXiv:2404.13220 (replaced) [pdf, ps, other]
Title: Security and Privacy Product Inclusion
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[196]  arXiv:2404.14700 (replaced) [pdf, other]
Title: FlashSpeech: Efficient Zero-Shot Speech Synthesis
Comments: Efficient zero-shot speech synthesis
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[197]  arXiv:2404.14795 (replaced) [pdf, other]
Title: Talk Too Much: Poisoning Large Language Models under Token Limit
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[198]  arXiv:2404.14836 (replaced) [src]
Title: Probabilistic forecasting of power system imbalance using neural network-based ensembles
Comments: One of the co-authors objected with having it on Arxiv already
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[ total of 198 entries: 1-198 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2404, contact, help  (Access key information)