AI Right Now: Multimodal Thinking, Leaner Compute, and Real-World Trust
Big picture first: AI work right now moves on three practical tracks at once: broader thinking (models that read, see, hear, and plan), leaner compute (faster, smaller, cheaper training and inference), and real-world trust (safety, robustness, interpretability). Below are the clearest clusters of ideas and why they matter, written so anyone can follow.
- Foundation models and reasoning
Researchers keep pushing language and multimodal models toward complex, multi-step reasoning (math, physics, science Olympiads) and toward explaining their own internal computations. Two themes repeat: (1) organize smaller specialist modules into an “agent” system, in which a coordinator delegates sub-tasks to workers, and (2) use the model’s own internal signals to guide or check its answers (self-explanation, trajectory-based rewards).
- Why it matters: This makes models better at chained thinking (planning, math) and gives developers new ways to check what the model did.
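The coordinator-and-workers pattern can be sketched in a few lines. This is only an illustration under simplifying assumptions: the worker names (`add_worker`, `mul_worker`), the `"prev"` placeholder, and the plan format are hypothetical, and real systems would use models rather than arithmetic functions as workers. The point is the shape: a coordinator delegates steps and keeps a trace that a developer can inspect afterwards.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical workers: each is a narrow specialist that handles one kind of step.
def add_worker(args: List[float]) -> float:
    return sum(args)

def mul_worker(args: List[float]) -> float:
    product = 1
    for value in args:
        product *= value
    return product

WORKERS: Dict[str, Callable[[List[float]], float]] = {
    "add": add_worker,
    "mul": mul_worker,
}

def coordinator(plan: List[Tuple[str, list]]):
    """Run each (worker_name, args) step in order.

    The placeholder string "prev" in args is replaced by the previous
    step's result. Returns the final answer plus a trace of every step.
    """
    trace = []
    prev = None
    for name, args in plan:
        resolved = [prev if a == "prev" else a for a in args]
        prev = WORKERS[name](resolved)
        trace.append((name, resolved, prev))
    return prev, trace

# (2 + 3) * 4, expressed as two delegated steps.
answer, trace = coordinator([("add", [2, 3]), ("mul", ["prev", 4])])
print(answer)
for step in trace:
    print(step)
```

The trace is the part that supports checking: each entry records which worker ran, what it received, and what it returned.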
- Multimodal sensing and retrieval
Work is unifying sound, vision, text, and even 3D gestures. Instead of separate models for each input type, researchers build interfaces that let several pretrained models cooperate and fuse information, then pull facts from external databases or the web to ground answers (retrieval-augmented methods).
- Examples: speech quality judges, aerial-to-ground image synthesis, sign-language translators, and cross-modal contrastive learning.
- Why it matters: Real systems must understand mixed inputs (audio+video+text) for tasks such as virtual assistants, remote diagnosis, and robotics.
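To make the retrieval-augmented idea concrete, here is a minimal sketch that ranks a tiny in-memory document list against a query and pastes the best match into a prompt. It assumes a bag-of-words "embedding" and cosine similarity purely for illustration; production systems typically use learned embeddings and a vector index. The function names and the sample documents are made up for this example.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts of the lowercased text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=1):
    """Prepend retrieved context so the answer can be grounded in it."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "sign language translation maps gesture sequences to text",
    "speech quality models score the naturalness of synthetic audio",
    "low rank adapters fine tune large models cheaply",
]
query = "how is speech naturalness scored"
top = retrieve(query, docs)
print(build_prompt(query, docs))
```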
- Efficiency and smart fine-tuning
Rather than retrain gigantic models, teams use tiny add-ons (LoRA, sparse adapters), selective layer freezing, or parameter merging to get domain specialization without losing general skills. New dynamic strategies decide “which tokens or which positions to re-compute” so compute goes where it matters.
- Why it matters: Makes deployment cheaper and faster on real hardware, and helps preserve previously learned abilities when adapting to a new domain.
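The LoRA idea mentioned above can be shown with plain lists: the large weight matrix `W` stays frozen, and only two small matrices `A` and `B` are trained, so the effective weight is `W + (alpha / r) * B A`. This is a sketch of the standard formulation, not any particular library's implementation; the sizes, the scaling value `alpha=8`, and the initialization constants are assumptions chosen for illustration.

```python
import random

def lora_forward(x, W, A, B, alpha):
    """Compute y = W x + (alpha / r) * B (A x) for one input vector x."""
    r = len(A)
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]      # frozen path
    low = [sum(a * xi for a, xi in zip(row, x)) for row in A]       # A x, length r
    delta = [sum(b * li for b, li in zip(row, low)) for row in B]   # B (A x)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

d_in, d_out, r = 64, 64, 4
rng = random.Random(0)
W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]      # trainable
B = [[0.0] * r for _ in range(d_out)]                                   # trainable, zero init

x = [rng.gauss(0, 1) for _ in range(d_in)]
y_base = [sum(w * xi for w, xi in zip(row, x)) for row in W]
y_lora = lora_forward(x, W, A, B, alpha=8)

full_params = d_in * d_out        # parameters touched by full fine-tuning
lora_params = r * (d_in + d_out)  # parameters touched by the adapter
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, which is one reason this style of adapter tends to preserve existing abilities at the start of training.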
- Better training signals and alignment
Instead of only scoring text by content, researchers are extracting signals from conversation geometry (how dialogues flow), multi-turn interaction dynamics, and human-like feedback. Those structural signals can be privacy-friendly alternatives or complements to text-based reward models.
- Why it matters: Gives new ways to teach models safe behavior and to measure good versus poor interaction style without collecting full transcripts.
- Robustness, security and provenance
Deployed models face attacks (for example, crafted prompts that induce very long generations and drive up energy and latency costs) and theft (model extraction through black-box queries). Work covers adversarial examples, watermarking and watermark removal, poisoning, and defenses that make models more auditable and resistant.
- Why it matters: Production systems must be able to prove ownership, resist exploitation, and avoid hidden behavior that drives up costs or harms users.
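One simple mitigation for cost-inflating prompts is a hard generation budget. The sketch below is a generic guard, not a defense taken from any specific paper: `next_token` is a hypothetical callable standing in for a model, and the two toy "models" exist only to exercise the loop.

```python
def generate_with_budget(next_token, prompt_tokens, max_new_tokens=32, stop_token="<eos>"):
    """Generate tokens until the stop token appears or the budget runs out.

    Returns the generated tokens and a status string so callers can log
    or alert on requests that exhausted their budget.
    """
    out = []
    for _ in range(max_new_tokens):
        tok = next_token(prompt_tokens + out)
        if tok == stop_token:
            return out, "stopped"
        out.append(tok)
    return out, "budget_exhausted"

def looping_model(tokens):
    """Toy adversarial case: never emits the stop token."""
    return "again"

def polite_model(tokens):
    """Toy normal case: stops once the sequence reaches three tokens."""
    return "<eos>" if len(tokens) >= 3 else "ok"

normal = generate_with_budget(polite_model, ["hi"])
attack = generate_with_budget(looping_model, ["hi"], max_new_tokens=8)
print(normal)
print(attack[1], len(attack[0]))
```

A token cap bounds the worst-case cost per request; it does not detect the attack itself, so the returned status is worth logging.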
- Data: smarter augmentation and synthetic data
Real data are often scarce or costly to collect (medical images, surface electromyography (sEMG) recordings). New methods use diffusion models or other generative tools to produce synthetic examples that are both faithful and diverse, guided by semantic conditions or “sparse-aware” sampling to focus on underrepresented cases.
- Why it matters: Gives more useful training examples for small datasets and reduces overfitting, especially in healthcare or robotics.
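The idea of steering sampling toward underrepresented cases can be illustrated with inverse-frequency weights. This is a deliberately simple stand-in, not the “sparse-aware” sampling used in the research: it only shows how a rare class can be given the same total sampling mass as a common one. The label names are hypothetical.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each example a weight proportional to 1 / (size of its class)."""
    counts = Counter(labels)
    raw = [1.0 / counts[label] for label in labels]
    total = sum(raw)
    return [w / total for w in raw]

labels = ["common"] * 8 + ["rare"] * 2
weights = inverse_frequency_weights(labels)

# Total probability of drawing a "rare" example under the new weights.
rare_mass = sum(w for w, label in zip(weights, labels) if label == "rare")
print(rare_mass)

# Draw a batch with replacement using the rebalanced weights.
rng = random.Random(0)
batch = rng.choices(labels, weights=weights, k=10)
print(Counter(batch))
```

With uniform sampling the rare class would make up 20% of draws on average; with these weights each class is drawn half of the time on average.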
- Benchmarks, datasets, and evaluation tools
Many new, carefully constructed datasets target domain-specific needs (speech naturalness, sign language, bird knowledge tracing, biomedical EHR, multi-view images). Researchers also build more realistic evaluation frameworks that test fairness, long-term sustainability, or cross-domain generalization.
- Why it matters: Good datasets reveal real weaknesses and measure progress that matters to practitioners, e.g., how a model performs on live medical workflows or multimodal reasoning.
- Explainability and introspection
Because black-box outputs are risky, several works train models to explain their internal computations (what features a neuron encodes, how internal activations influence outputs). Some find that models can better explain their own internals than other models can.
- Why it matters: Leads to debugging tools, helps detect shortcuts, and supports human oversight.
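A basic way to ask how an internal activation influences an output is ablation: zero out one hidden unit and see how much the output moves. The sketch below does this on a tiny hand-written ReLU network. The weights and input are arbitrary values chosen for illustration, and this is a generic interpretability probe rather than the self-explanation training described above.

```python
def relu(v):
    return v if v > 0 else 0.0

def forward(x, W1, w2, ablate=None):
    """Two-layer network; optionally zero one hidden unit before the output layer."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if ablate is not None:
        hidden[ablate] = 0.0
    return sum(h * w for h, w in zip(hidden, w2))

W1 = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]  # 3 hidden units, 2 inputs
w2 = [1.0, -2.0, 0.5]                        # output weights
x = [1.0, 2.0]

baseline = forward(x, W1, w2)
# Effect of each unit = how much the output changes when that unit is removed.
effects = [baseline - forward(x, W1, w2, ablate=i) for i in range(len(W1))]
print(baseline, effects)
```

A unit with zero effect on this input (here the first one, which the ReLU switches off) may still matter on other inputs, so ablation results should be read per input or averaged over a dataset.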
- Theory: sparsity, universality, and dynamics
Active theoretical work re-examines common assumptions. Are the dynamics a model must learn really sparse? Do different architectures share a common universal approximation property? What is the true cost and benefit of sparsity priors or iterative refinement?
- Why it matters: Better theory leads to better inductive biases—models that learn faster and generalize more reliably.
- Robotics, control, and multi-agent systems
Applied research shows models coordinating fleets of vehicles, preventing jackknifing in articulated vehicles, learning to place macros in chip design, or deriving game-theory-style equilibria for driving and multi-agent planning. There’s also biologically inspired work (e.g., Physarum transport networks) that helps algorithm design.
- Why it matters: Bridges the gap from planning to safe physical action and real-time control in uncertain environments.
- Medical, speech and domain applications
Targeted systems for clinical diagnostics, speech alignment, medical segmentation, and personalized prosthetic control are being built, often combining domain knowledge with LLMs or multimodal backbones and emphasizing rigorous evaluation and bias checks.
- Why it matters: These are high-value applications where reliability and explainability are essential.
- Systems, hardware, and energy awareness
Work targets practical constraints: making models run on microcontrollers, improving sampling for high-dimensional optimization, guarding against energy-latency attacks, and assessing long-term environmental costs of model updates.
- Why it matters: Real deployment has hard constraints: battery, heat, latency, and regulatory rules.
Quick takeaways for a non-expert:
- AI is becoming multimodal: the same systems are expected to read, listen, see, and act.
- People build smaller “plug-ins” to adapt big models cheaply instead of retraining everything.
- Trust (robustness, explainability, provenance) is now treated as being as important as raw accuracy.
- Domain-specific data and smart synthetic data are the path to reliable real-world deployments.
- New evaluation and alignment methods are emerging so systems behave better in dialog, teamwork, and safety-critical tasks.
If you want one short practical pointer: focus on data quality + modularity. High-quality, well-labeled or intelligently augmented data is the lever that turns these advanced model recipes into useful systems; modular adapters and retrieval modules make those systems cheaper to operate and easier to audit.
Want to go deeper? Pick one cluster above (e.g., safety, multimodal, or efficiency) and scan recent papers or datasets in that area. The field is crowded, but each cluster contains fast-moving, practical ideas you can test in a few weeks.
There is an increasing number of virtual communities and forums available on the web. With social media, people can freely communicate and share their thoughts, ask personal questions, and seek peer-s…
- arXiv Query: search_query=&id_list=2511.07888v1&start=0&max_results=10
A persistent challenge in text classification (TC) is that enhancing model robustness against adversarial attacks typically degrades performance on clean data. We argue that this challenge can be reso…
- arXiv Query: search_query=&id_list=2511.07882v1&start=0&max_results=10
Organisms in nature, such as Cephalopods and Pachyderms, exploit stiffness modulation to achieve amazing dexterity in the control of their appendages. In this paper, we explore the phenomenon of layer…
- 2511.07869v1
…
- arXiv Query: search_query=&id_list=2511.08435v1&start=0&max_results=10
Semi-supervised learning (SSL) enables training of powerful models with the assumption of limited, carefully labelled data and a large amount of unlabeled data to support the learning. In this paper, …
- arXiv Query: search_query=&id_list=2511.08423v1&start=0&max_results=10
A truly universal AI-Generated Image (AIGI) detector must simultaneously generalize across diverse generative models and varied semantic content. Current state-of-the-art methods learn a single, entan…
- arXiv Query: search_query=&id_list=2511.08383v1&start=0&max_results=10
The coexistence of heterogeneous service classes in 5G Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communication (URLLC), and Massive Machine-Type Communication (mMTC) poses major cha…
- arXiv Query: search_query=&id_list=2511.08372v1&start=0&max_results=10
This paper describes the current implementation of the dynamic articulatory model DYNARTmo, which generates continuous articulator movements based on the concept of speech gestures and a corresponding…
- arXiv Query: search_query=&id_list=2511.08371v1&start=0&max_results=10
While Deep Learning (DL) experts often have prior knowledge about which hyperparameter settings yield strong performance, only a few Hyperparameter Optimization (HPO) algorithms can leverage such prior …
- 2511.08361v1
…
- arXiv Query: search_query=&id_list=2511.08258v1&start=0&max_results=10
Generating ground-level images from aerial views is a challenging task due to extreme viewpoint disparity, occlusions, and a limited field of view. We introduce Top2Ground, a novel diffusion-based met…
- arXiv Query: search_query=&id_list=2511.08206v1&start=0&max_results=10
Structured Electronic Health Record (EHR) data stores patient information in relational tables and plays a central role in clinical decision-making. Recent advances have explored the use of large lang…
- arXiv Query: search_query=&id_list=2511.08119v1&start=0&max_results=10
Latent fingerprint identification remains a challenging task due to low image quality, background noise, and partial impressions. In this work, we propose a novel identification approach called Latent…
- arXiv Query: search_query=&id_list=2511.08108v1&start=0&max_results=10
Machine learning is an essential tool for optimizing industrial quality control processes. However, the complexity of machine learning models often limits their practical applicability due to a lack o…
- 2511.08059v1
…
- arXiv Query: search_query=&id_list=2511.08017v1&start=0&max_results=10
Multi-character role-playing aims to equip models with the capability to simulate diverse roles. Existing methods either use one shared parameterized module across all roles or assign a separate param…
- 2511.08007v1
…
- arXiv Query: search_query=&id_list=2511.07988v1&start=0&max_results=10
Recent studies on audio models show brain-tuning - fine-tuning models to better predict corresponding fMRI activity - improves brain alignment and increases performance on downstream semantic and audi…
- 2511.07987v1
…
- 2511.07979v1
…
- arXiv Query: search_query=&id_list=2511.07966v1&start=0&max_results=10
Unsupervised domain adaptation for LiDAR-based 3D object detection (3D UDA) based on the teacher-student architecture with pseudo labels has achieved notable improvements in recent years. Although it …
- 2511.08581v1
…
- arXiv Query: search_query=&id_list=2511.08561v1&start=0&max_results=10
Physics-Informed Neural Networks (PINNs) have emerged as a promising framework for solving forward and inverse problems governed by differential equations. However, their reliability when used in ill-…
- arXiv Query: search_query=&id_list=2511.08523v1&start=0&max_results=10
We consider a Markovian single server queue with impatient customers. There is a customer abandonment cost and a holding cost for customers in the system. We consider two versions of the problem. In t…
- 2511.08522v1
…
- 2511.08513v1
…
- arXiv Query: search_query=&id_list=2511.08499v1&start=0&max_results=10
Despite the growing number of automated vehicles on public roads, operating such systems in open contexts will inevitably involve incidents. This results from an inherent risk in road traffic, which a…
- arXiv Query: search_query=&id_list=2511.08491v1&start=0&max_results=10
With increasingly sophisticated cybersecurity threats and rising demand for network automation, autonomous cybersecurity mechanisms are becoming critical for securing modern networks. The rapid expans…
- arXiv Query: search_query=&id_list=2511.08490v1&start=0&max_results=10
Concentric tube robots (CTRs) offer dexterous motion at millimeter scales, enabling minimally invasive procedures through natural orifices. This work presents a coordinated model-based resection plann…
- 2511.08476v1
…
- arXiv Query: search_query=&id_list=2511.07955v1&start=0&max_results=10
Speech emotion recognition (SER) has advanced significantly thanks to deep-learning methods, while textual information further enhances its performance. However, few studies have focused on the …
- arXiv Query: search_query=&id_list=2511.07953v1&start=0&max_results=10
The notion of regular pair $(A,B)$ for two nonempty closed convex subsets $A$ and~$B$ of a Hilbert space $\mathcal{H}$ was introduced by Borwein and Bauschke in 1993 to ensure convergence (in norm) of…
- arXiv Query: search_query=&id_list=2511.07918v1&start=0&max_results=10
Human speech production encompasses multiple modes such as perceived, overt, whispered, and imagined, each reflecting distinct neural mechanisms. Among these, theta-band synchrony has been closely ass…
- arXiv Query: search_query=&id_list=2511.07881v1&start=0&max_results=10
This paper presents a unified framework for robust three-dimensional (3-D) source localization using a network of sensors equipped with one-dimensional (1-D) linear arrays. While such arrays offer pra…
- arXiv Query: search_query=&id_list=2511.07845v1&start=0&max_results=10
With the growth of data-driven services and expansion of mobile application usage, traditional methods of capacity and resource planning may not be efficient and often fall short in meeting ra…
- arXiv Query: search_query=&id_list=2511.07843v1&start=0&max_results=10
As deep learning methods increasingly utilize sensitive data on a widespread scale, differential privacy (DP) offers formal guarantees to protect against information leakage during model training. A s…
- 2511.08204v1
…
- 2511.08188v1
…
- arXiv Query: search_query=&id_list=2511.08169v1&start=0&max_results=10
Image composition aims to seamlessly integrate a foreground object into a background, where generating realistic and geometrically accurate shadows remains a persistent challenge. While recent diffusi…
- arXiv Query: search_query=&id_list=2511.08165v1&start=0&max_results=10
The change of electric power generation - from synchronous generator (SG) to converter - is generally regarded as the second revolution of the power system. Different from rotor swing of SG in traditional…
- 2511.08163v1
…
- arXiv Query: search_query=&id_list=2511.08154v1&start=0&max_results=10
We revisit the fermion mass problem of the $SU(5)$ grand unified theory using machine learning techniques. The original $SU(5)$ model proposed by Georgi and Glashow is incompatible with the observed f…
- 2511.08117v1
…
- arXiv Query: search_query=&id_list=2511.08109v1&start=0&max_results=10
This paper examines how science fiction destabilises ontological categories by measuring conceptual permeability across the terms human, animal, and machine using masked language modelling (MLM). Draw…
- 2511.08101v1
…
- arXiv Query: search_query=&id_list=2511.08035v1&start=0&max_results=10
Decision-focused learning (DFL) has emerged as a powerful end-to-end alternative to conventional predict-then-optimize (PTO) pipelines by directly optimizing predictive models through downstream decis…
- 2511.08033v1
…
- arXiv Query: search_query=&id_list=2511.07995v1&start=0&max_results=10
In this study, we develop an approach to multivariate time series anomaly detection focused on the transformation of multivariate time series to univariate time series. Several transformation techniqu…
- arXiv Query: search_query=&id_list=2511.07993v1&start=0&max_results=10
With the proliferation of Virtual Reality (VR) technologies and the emergence of the Metaverse, social VR applications have become increasingly prevalent and accessible to the general user base. Servi…
- 2511.07980v1
…
- 2511.07970v1
…
- 2511.08455v1
…
- 2511.08441v1
…
- 2511.08370v1
…
- 2511.08338v1
…
- 2511.08279v1
…
- 2511.08260v1
…
- 2511.08536v1
…
- 2511.08509v1
…
- 2511.08198v1
…
- 2511.08195v1
…
- 2511.08186v1
…
- 2511.08180v1
…
- 2511.08178v1
…
- 2511.08161v1
…
- 2511.08132v1
…
- 2511.08091v1
…
- 2511.08083v1
…
- 2511.08012v1
…
- 2511.07991v1
…
- 2511.07969v1
…
- 2511.08554v1
…
- 2511.07960v1
…
- 2511.07934v1
…
- 2511.07907v1
…
- 2511.07893v1
…
- 2511.07863v1
…
- 2511.07860v1
…
- 2511.08444v1
…
- 2511.08427v1
…
- 2511.08424v1
…
- 2511.08412v1
…
- 2511.08343v1
…
- 2511.08328v1
…
- 2511.08297v1
…
- 2511.08261v1
…
- 2511.08230v1
…
- 2511.08223v1
…
- 2511.08219v1
…
- 2511.08214v1
…
- 2511.08433v1
…
- 2511.08403v1
…
- 2511.08354v1
…
- 2511.08350v1
…
- 2511.08345v1
…
- 2511.08331v1
…
- 2511.08299v1
…
- 2511.08291v1
…
- 2511.08272v1
…
- 2511.08271v1
…
- 2511.08251v1
…
- 2511.08199v1
…
- 2511.08196v1
…
- 2511.08170v1
…
- 2511.08041v1
…
- 2511.08008v1
…
- 2511.07974v1
…
- 2511.07971v1
…
- 2511.07929v1
…
- 2511.07908v1
…
- 2511.07879v1
…
- 2511.08553v1
…
