8 Practical AI Trends: Smarter Generative Models, Multimodal, Efficiency, 3D & Safety
Trend 1 — Smarter image-and-signal generators: diffusion, flows, and “fixes that help”
Generative models are improving in two practical ways. First, the underlying methods have matured: normalizing flows (invertible maps between real data and a simple distribution) and diffusion models (which learn to “undo” noise) are now faster, more stable, and produce higher-quality images. One trick that improved image quality and made training easier is to stop learning a VAE variance term and fix it to a constant, which reduces complexity and makes joint training with flows stable. Second, physical knowledge is being built into the models themselves: one advance embeds the physical blur process directly into a diffusion model to get markedly better deblurring results.
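To make the fixed-variance trick concrete, here is a minimal Python sketch. It assumes the variance in question is the VAE encoder's posterior variance and that the prior is a standard normal; the constant `FIXED_SIGMA` and all function names are illustrative choices, not taken from any paper. With the variance fixed, the KL term depends on the encoder mean only, so there is one less quantity to learn and balance.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

# Hypothetical constant: the encoder no longer predicts a variance at all.
FIXED_SIGMA = 0.5
FIXED_LOG_VAR = 2.0 * math.log(FIXED_SIGMA)

def fixed_variance_kl(mu):
    """Same KL term, but with every dimension's variance pinned to FIXED_SIGMA**2."""
    return kl_to_standard_normal(mu, [FIXED_LOG_VAR] * len(mu))

def sample_latent(mu, rng):
    """Reparameterised sample z = mu + FIXED_SIGMA * eps, with eps ~ N(0, 1)."""
    return [m + FIXED_SIGMA * rng.gauss(0.0, 1.0) for m in mu]
```

In this sketch, `fixed_variance_kl([1.0, 0.0]) - fixed_variance_kl([0.0, 0.0])` comes out to 0.5 (up to floating-point rounding), which is the contribution of the mean alone; the variance part is a constant offset.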
Why it matters: better, faster image generation and restoration means cleaner photos, sharper medical scans, and more realistic synthetic video for animation or simulation — all with less trial-and-error tuning.
Trend 2 — Models that see, read, and act together (multimodal & vision-language)
Researchers are combining images, video, audio and text into unified systems. That includes models that design posters, edit images by dragging points (without needing manual masks), or plan complex actions from language. Work in this area improves two things: (a) how models reason about layout, geometry and aesthetics, and (b) how they edit or generate content controllably (e.g., “move this object here, keep its style”). New tools also let a language model act as a motion planner (turning instructions into explicit 3-D camera and object paths).
Why it matters: expect more interactive design tools (automated poster/layout designers, photo/video editors that follow plain-English directions) and better robots that follow spoken or written commands grounded in visual context.
Trend 3 — Do more with less: efficiency, scaling and on-the-fly serving
Many papers focus on reducing energy, memory, and latency: low-precision training, quantized updates, compact normalizing-flow layers, and techniques that let large models run at lower cost (e.g., sparse Mixture-of-Experts with smarter load balancing). On the systems side, smarter schedulers and dynamic batching methods, which adapt to hardware and request patterns, can substantially increase the number of requests handled per second while keeping latency low.
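The idea behind dynamic batching can be sketched in a few lines of Python. The version below is a simplified offline simulation, not any particular serving system: it groups request arrival times into batches and closes a batch when it is full or when a later request shows that the oldest one has waited too long. `max_batch` and `max_wait` are hypothetical tuning knobs; a real scheduler would adjust them from measured hardware throughput and traffic.

```python
def form_batches(arrival_times, max_batch=8, max_wait=0.05):
    """Group sorted arrival times (in seconds) into batches.

    A batch is closed when it holds max_batch requests, or when a new
    request arrives more than max_wait seconds after the batch's first one.
    """
    batches, current = [], []
    for t in arrival_times:
        if current and t - current[0] > max_wait:
            batches.append(current)  # the oldest request has waited long enough
            current = []
        current.append(t)
        if len(current) == max_batch:
            batches.append(current)  # the batch is full
            current = []
    if current:
        batches.append(current)      # flush whatever is left
    return batches
```

Raising `max_batch` tends to improve throughput on hardware that handles large batches well, while lowering `max_wait` bounds the extra latency any single request can pick up while waiting for company.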
Why it matters: cheaper AI services, smaller models that can run on edge devices, and faster server-side experiences — all reduce cost and carbon footprint while enabling AI on smartphones and factory sensors.
Trend 4 — Making 3D, scenes and rendering practical in real time
Novel 3D representations (for example using tetrahedral meshes) let existing graphics hardware rasterize volumetric scenes quickly and accurately. That enables real-time view synthesis and editing on ordinary consumer hardware. Other work combines mesh-based representations with neural fields to get both precision and speed. For avatars and close-up rendering, systems now switch between low- and high-frequency textures based on camera distance so faces look good up-close while remaining efficient.
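The distance-based texture switch can be illustrated with a small Python sketch. It assumes a smooth blend between a low-frequency and a high-frequency texture value rather than a hard switch, and the `near` and `far` distances are made-up defaults; none of the names come from a specific system.

```python
def high_frequency_weight(camera_distance, near=0.5, far=3.0):
    """Blend weight for the detailed texture: 1.0 up close, 0.0 far away."""
    if far <= near:
        raise ValueError("far must be greater than near")
    t = (far - camera_distance) / (far - near)
    return max(0.0, min(1.0, t))  # clamp to [0, 1]

def blend_texel(low, high, camera_distance):
    """Mix one low-frequency and one high-frequency texel value by distance."""
    w = high_frequency_weight(camera_distance)
    return (1.0 - w) * low + w * high
```

Beyond `far` the detailed texture contributes nothing, so a renderer could skip sampling it entirely, which is where the efficiency gain would come from in this simplified picture.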
Why it matters: faster, higher-fidelity virtual scenes for AR/VR, better avatars for telepresence, and real-time visual effects for games and film.
Trend 5 — Robots and controllers that learn from messy real life
Robotics research is moving past carefully scripted motions. New approaches include (1) learning from mixed-quality teleoperation logs, treating failures as useful data, (2) separating high-level planning from low-level execution so that failures can be avoided at planning time, and (3) jointly designing robot hands and their controllers (co-design) so that mechanical shape and software match. There is also work on active perception (moving sensors the way eyes move), stable control with limited sensing, and controllers that can detect and adapt to actuator faults.
Why it matters: more reliable robots for factories, farms, warehouses and homes — that can handle messy environments, adapt on the fly, and require less manual tuning.
Trend 6 — Teaching big models with less data (distillation, synthetic data, and “single-life” learning)
Collecting and labeling huge datasets is expensive. Several approaches tackle that: dataset distillation (compress a large dataset’s training power into a tiny synthetic set), clever use of text-to-image priors aligned to target distributions, and surprising results showing that training on many hours of a single person's egocentric video can match training on diverse web data. For text models, researchers also explore distilling large corpora into compact prompts or training traces to give small models big-model skills.
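The goal of dataset distillation, a tiny synthetic set that trains the same model as the full data, can be shown with a deliberately simple toy in Python. This is a closed-form special case chosen for illustration, not the method of any paper: for a nearest-centroid classifier, one synthetic point per class (the class mean) reproduces the classifier exactly. Practical methods instead optimise the synthetic samples, for example by matching gradients or training trajectories. All names below are hypothetical.

```python
from collections import defaultdict

def class_means(points):
    """points: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, defaultdict(int)
    for x, y in points:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def distill(points):
    """Toy distillation: keep one synthetic sample per class, the class mean."""
    return [(mean, label) for label, mean in class_means(points).items()]

def predict(centroids, x):
    """Nearest-centroid prediction using squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )

real = [([0.0, 0.0], "a"), ([0.0, 2.0], "a"), ([4.0, 4.0], "b"), ([6.0, 4.0], "b")]
synthetic = distill(real)  # 2 samples instead of 4
```

"Training" on `synthetic` (calling `class_means(synthetic)`) yields the same centroids as training on `real`, which is the property distillation methods try to approximate for much larger models and datasets.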
Why it matters: startups and labs with limited budgets can build useful models with far less data and compute.
Trend 7 — Safety, ownership, and privacy: watermarking, unlearning, and robustness
As AI systems become commercial products, three practical problems arise: (1) how to prove that a model used your data (IP protection), (2) how to make a model forget data it was trained on (machine unlearning), and (3) how to detect or prevent adversarial manipulation. New watermarking approaches fine-tune open models so that their outputs can be detected with a secret key without hurting quality. Attackers, however, have found counterattacks that can reverse “unlearning” steps, so defenders must be careful. Practical defenses include regularized, on-policy fine-tuning and robust verification methods that are cheaper than training itself. There is also a growing focus on distribution-free statistical tests (conformal methods), used for example to check whether a channel's correlations are genuinely quantum or classically simulated, with similarly rigorous checks appearing in AI evaluation.
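Key-based watermark detection can be sketched as follows in Python. This illustrates the generic "green list" idea, in which a secret key pseudo-randomly marks about half of all tokens as green in each context and a detector counts how many green tokens appear; it is not the fine-tuning scheme mentioned above. The function names, the `gamma` fraction, and the toy generator are all assumptions made for illustration.

```python
import hashlib
import hmac
import math

def is_green(key, prev_token, token, gamma=0.5):
    """Keyed pseudo-random test: is `token` on the green list after `prev_token`?"""
    message = f"{prev_token}\x00{token}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def watermark_z_score(key, tokens, gamma=0.5):
    """z-score of the green-token count; large values suggest a watermark."""
    n = len(tokens) - 1
    if n < 1:
        return 0.0
    greens = sum(is_green(key, p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (greens - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))

def toy_watermarked_text(key, vocab, length, start="<s>"):
    """Stand-in for a watermarked generator: pick a green token whenever one exists."""
    tokens = [start]
    for step in range(length):
        prev = tokens[-1]
        # Rotate the candidate order each step so the toy text does not loop.
        shift = step % len(vocab)
        candidates = vocab[shift:] + vocab[:shift]
        choice = next((w for w in candidates if is_green(key, prev, w)), candidates[0])
        tokens.append(choice)
    return tokens
```

Text produced with the key scores far above what chance would give, while the same text checked against a different key looks like unmarked text, which is why the key has to stay secret for detection to serve as evidence.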
Why it matters: better ways to protect training data, to remove problematic samples when required, and to verify models in courts or audits.
Trend 8 — Benchmarks, transparency and fairness (real-world datasets and evaluation rules)
Many papers introduce new datasets (e.g., smartwatch ECGs, prostate biopsies from Iraq, Indian religion question sets) and new frameworks for clearer evaluation (Eval Factsheets, for example, shows how to document benchmarks). Research also reveals surprising failure modes: models trained within a single domain may discard domain-level features entirely, which causes catastrophic failures when they are asked to detect out-of-domain data. Other work shows demographic and language biases in multilingual models. These findings push toward more careful evaluation design and more transparency about what a benchmark can, and cannot, say.
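For readers unfamiliar with out-of-domain detection, the simplest common baseline is to flag an input when the classifier's top softmax probability is low. The Python sketch below shows that baseline only; the threshold is an arbitrary illustrative value, and the failure mode described above is precisely that scores of this kind can become unreliable when a model has seen a single domain.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_softmax_score(logits):
    """Confidence score: the largest class probability."""
    return max(softmax(logits))

def is_out_of_domain(logits, threshold=0.7):
    """Flag inputs whose top class probability falls below the threshold."""
    return max_softmax_score(logits) < threshold
```

With three classes, flat logits such as `[0.0, 0.0, 0.0]` give a top probability of one third and are flagged, while a confident prediction such as `[10.0, 0.0, 0.0]` is not.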
Why it matters: better consumer protection (fewer biased or fragile systems), clearer standards for deploying AI in medicine, government, and business.
Short practical takeaways:
- For users: expect smarter image editors, better phone-health tools, and more helpful multimodal assistants — but watch for privacy and bias.
- For product teams: focus on efficiency (so AI can scale) and on measuring fairness and robustness before launch.
- For policymakers: push for transparent baselines and standard evaluation (Eval Factsheets-style) and require verifiable data provenance for critical models.
What to watch next (simple checklist)
- Is a model validated across varied populations and sensors? (Medical and fairness failures often come from narrow training data.)
- Does the system include ownership and deletion guarantees? (Watermarking or verifiable unlearning are practical signs.)
- Is inference efficient for the target hardware? (If not, cost or latency will bite.)
- Are there clear evaluation documents? (Read the factsheet or benchmark description for hidden constraints.)
Bottom line: recent work is making AI systems more powerful, cheaper to run, and better at combining vision, language and action, while also beginning to grapple with safety, provenance, and fairness in realistic settings. That combination — better algorithms + better instruments for evaluation and protection — is what will make practical AI systems both useful and responsibly deployable in the next few years.
Related Papers
- arXiv Query: search_query=&id_list=2512.04085v1&start=0&max_results=10
We introduce the "single-life" learning paradigm, where we train a distinct vision model exclusively on egocentric videos captured by one individual. We leverage the multiple viewpoints naturally capt…
- arXiv Query: search_query=&id_list=2512.04084v1&start=0&max_results=10
Normalizing Flows (NFs) learn invertible mappings between the data and a Gaussian distribution. Prior works usually suffer from two limitations. First, they add random noise to training samples or VAE…
- arXiv Query: search_query=&id_list=2512.04082v1&start=0&max_results=10
Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances have explored automating this process using…
- arXiv Query: search_query=&id_list=2512.04076v1&start=0&max_results=10
We introduce radiance meshes, a technique for representing radiance fields with constant density tetrahedral cells produced with a Delaunay tetrahedralization. Unlike a Voronoi diagram, a Delaunay tet…
- arXiv Query: search_query=&id_list=2512.04072v1&start=0&max_results=10
Reasoning models leveraging long chains of thought employ various cognitive skills, such as verification of their answers, backtracking, retrying by an alternate method, and more. Previous work has sh…
- arXiv Query: search_query=&id_list=2512.04068v1&start=0&max_results=10
To handle underspecified or ambiguous queries, AI assistants need a policy for managing their uncertainty to determine (a) when to guess the user intent and answer directly, (b) when to enumerate and …
- arXiv Query: search_query=&id_list=2512.04065v1&start=0&max_results=10
In todays increasing world, it is very important to have good hailing services like Ola, Uber, and Rapido as it is very essential for our daily transportation. Users often face difficulties in choosin…
- arXiv Query: search_query=&id_list=2512.04062v1&start=0&max_results=10
The rapid proliferation of benchmarks has created significant challenges in reproducibility, transparency, and informed decision-making. However, unlike datasets and models -- which benefit from struc…
- arXiv Query: search_query=&id_list=2512.04058v1&start=0&max_results=10
The discovery of Bell that there exist quantum correlations that cannot be reproduced classically is one of the most important in the foundations of quantum mechanics, as well as having practical impl…
- arXiv Query: search_query=&id_list=2512.04051v1&start=0&max_results=10
Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit i…
- arXiv Query: search_query=&id_list=2512.04048v1&start=0&max_results=10
Sign Language Production (SLP) is the process of converting the complex input text into a real video. Most previous works focused on the Text2Gloss, Gloss2Pose, Pose2Vid stages, and some concentrated …
- arXiv Query: search_query=&id_list=2512.04047v1&start=0&max_results=10
In democracies, major policy decisions typically require some form of majority or consensus, so elites must secure mass support to govern. Historically, elites could shape support only through limited…
- arXiv Query: search_query=&id_list=2512.04044v1&start=0&max_results=10
Watermarking aims to embed hidden signals in generated text that can be reliably detected when given access to a secret key. Open-weight language models pose acute challenges for such watermarking sch…
- arXiv Query: search_query=&id_list=2512.04039v1&start=0&max_results=10
This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world compute…
- arXiv Query: search_query=&id_list=2512.04034v1&start=0&max_results=10
Why do state-of-the-art OOD detection methods exhibit catastrophic failure when models are trained on single-domain datasets? We provide the first theoretical explanation for this phenomenon through t…
- arXiv Query: search_query=&id_list=2512.04031v1&start=0&max_results=10
This work investigates whether large language models (LLMs) offer advantages over traditional neural networks for astronomical data processing, in regimes with non-Gaussian, non-stationary noise and l…
- arXiv Query: search_query=&id_list=2512.04030v1&start=0&max_results=10
Community currencies (CCs) have been adopting innovative systems to overcome implementational hurdles from issuing paper currencies. Using a qualitative approach, this paper examined this digital tran…
- arXiv Query: search_query=&id_list=2512.04025v1&start=0&max_results=10
Attention mechanisms are the core of foundation models, but their quadratic complexity remains a critical bottleneck for scaling. This challenge has driven the development of efficient attention mecha…
- arXiv Query: search_query=&id_list=2512.04016v1&start=0&max_results=10
Quantum key distribution (QKD) security fundamentally relies on the ability to distinguish genuine quantum correlations from classical eavesdropper simulations, yet existing certification methods lack…
- arXiv Query: search_query=&id_list=2512.04015v1&start=0&max_results=10
Modeling group actions on latent representations enables controllable transformations of high-dimensional image data. Prior works applying group-theoretic priors or modeling transformations typically …
- arXiv Query: search_query=&id_list=2512.04013v1&start=0&max_results=10
As augmented large language models (LLMs) with external tools become increasingly popular in web applications, improving augmented LLM inference serving efficiency and optimizing service-level objecti…
- arXiv Query: search_query=&id_list=2512.04009v1&start=0&max_results=10
In online marketplaces like Airbnb, users frequently engage in comparison shopping before making purchase decisions. Despite the prevalence of this behavior, a significant disconnect persists between …
- arXiv Query: search_query=&id_list=2512.04008v1&start=0&max_results=10
Training with differential privacy (DP) provides a guarantee to members in a dataset that they cannot be identified by users of the released model. However, those data providers, and, in general, the …
- arXiv Query: search_query=&id_list=2512.04007v1&start=0&max_results=10
Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced significantly, there is still a gap in und…
- arXiv Query: search_query=&id_list=2512.04006v1&start=0&max_results=10
Cross-entropy (CE) training loss dominates deep learning practice, yet existing theory often relies on simplifications, either replacing it with squared loss or restricting to convex models, that miss…
- arXiv Query: search_query=&id_list=2512.03995v1&start=0&max_results=10
Animals with foveated vision, including humans, experience microsaccades, small, rapid eye movements that they are not aware of. Inspired by this phenomenon, we develop a method for "Artificial Micros…
- arXiv Query: search_query=&id_list=2512.03990v1&start=0&max_results=10
Vortex-Induced Vibrations (VIVs) of cylindrical structures present significant challenges in various engineering applications, including marine risers, tall buildings, and renewable energy systems. He…
- arXiv Query: search_query=&id_list=2512.03988v1&start=0&max_results=10
Consumer-grade smartwatches offer a new personalized health monitoring option for general consumers globally as cardiovascular diseases continue to prevail as the leading cause of global mortality. Th…
- arXiv Query: search_query=&id_list=2512.03981v1&start=0&max_results=10
Drag-based image editing using generative models provides intuitive control over image structures. However, existing methods rely heavily on manually provided masks and textual prompts to preserve sem…
- arXiv Query: search_query=&id_list=2512.03979v1&start=0&max_results=10
Diffusion models show promise for dynamic scene deblurring; however, existing studies often fail to leverage the intrinsic nature of the blurring process within diffusion models, limiting their full p…
- arXiv Query: search_query=&id_list=2512.03977v1&start=0&max_results=10
Finite abstractions are discrete approximations of dynamical systems, such that the set of abstraction trajectories contains, in a formal sense, all system trajectories. There is a consensus that abst…
- arXiv Query: search_query=&id_list=2512.03975v1&start=0&max_results=10
Online platforms connect users with relevant products and services using ads. A key challenge is that a user's search query often leaves their true intent ambiguous. Typically, platforms passively pre…
- arXiv Query: search_query=&id_list=2512.03967v1&start=0&max_results=10
In the vision domain, dataset distillation arises as a technique to condense a large dataset into a smaller synthetic one that exhibits a similar result in the training process. While image data prese…
- arXiv Query: search_query=&id_list=2512.03964v1&start=0&max_results=10
Tuning-free face personalization methods have developed along two distinct paradigms: text embedding approaches that map facial features into the text embedding space, and adapter-based methods that i…
- arXiv Query: search_query=&id_list=2512.03962v1&start=0&max_results=10
Deep Image Prior (DIP) has recently emerged as a promising one-shot neural-network based image reconstruction method. However, DIP has seen limited application to 3D image reconstruction problems. In …
- arXiv Query: search_query=&id_list=2512.03960v1&start=0&max_results=10
Maximal clique enumeration is a fundamental graph mining task, but its utility is often limited by computational intractability and highly redundant output. To address these challenges, we introduce \…
- arXiv Query: search_query=&id_list=2512.03958v1&start=0&max_results=10
Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks, nevertheless, still heavily relying on manual operations or railway systems for movement. The AgriVLN …
- arXiv Query: search_query=&id_list=2512.03947v1&start=0&max_results=10
Large-scale problems in data science are often modeled with optimization, and the optimization model is usually solved with first-order methods that may converge at a sublinear rate. Therefore, it is …
- arXiv Query: search_query=&id_list=2512.03943v1&start=0&max_results=10
While recent developments in large language models have improved bias detection and classification, sensitive subjects like religion still present challenges because even minor errors can result in se…
- arXiv Query: search_query=&id_list=2512.03937v1&start=0&max_results=10
Social and information networks may become polarized, leading to echo chambers and political gridlock. Accurately measuring this phenomenon is a critical challenge. Existing measures often conflate ge…
- arXiv Query: search_query=&id_list=2512.03934v1&start=0&max_results=10
A fundamental open question asking whether all real-valued strongly quasiconvex functions defined on $\mathbb R^n$ are necessarily continuous, akin to their convex counterparts, is answered in detail …
- arXiv Query: search_query=&id_list=2512.03932v1&start=0&max_results=10
Deep learning-based image restoration has achieved significant success. However, when addressing real-world degradations, model performance is limited by the quality of ground-truth images in datasets…
- arXiv Query: search_query=&id_list=2512.03928v1&start=0&max_results=10
We introduce Density-Informed VAE (DiVAE), a lightweight, data-driven regularizer that aligns the VAE log-prior probability $\log p_Z(z)$ with a log-density estimated from data. Standard VAEs match la…
- arXiv Query: search_query=&id_list=2512.03923v1&start=0&max_results=10
Solving partial differential equations (PDEs) for reservoir seepage is critical for optimizing oil and gas field development and predicting production performance. Traditional numerical methods suffer…
- arXiv Query: search_query=&id_list=2512.03915v1&start=0&max_results=10
In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: r…
- arXiv Query: search_query=&id_list=2512.03913v1&start=0&max_results=10
Prior Vision-Language-Action (VLA) models are typically trained on teleoperated successful demonstrations, while discarding numerous failed attempts that occur naturally during data collection. Howeve…
- arXiv Query: search_query=&id_list=2512.03899v1&start=0&max_results=10
Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from alg…
- arXiv Query: search_query=&id_list=2512.03895v1&start=0&max_results=10
The rapid advancement of artificial intelligence (AI) and deep learning (DL) has catalyzed the emergence of several optimization-driven subfields, notably neuromorphic computing and quantum machine le…
- arXiv Query: search_query=&id_list=2512.03891v1&start=0&max_results=10
Active suspension systems are critical for enhancing vehicle comfort, safety, and stability, yet their performance is often limited by fixed hardware designs and control strategies that cannot adapt t…
- arXiv Query: search_query=&id_list=2512.03886v1&start=0&max_results=10
This paper presents an Autonomous System (AS) architecture for vehicles in a closed circuit. The AS performs precision tasks including computer vision for environment perception, positioning and mappi…
- arXiv Query: search_query=&id_list=2512.03879v1&start=0&max_results=10
Spiking neural networks (SNNs) have emerged as a promising direction in both computational neuroscience and artificial intelligence, offering advantages such as strong biological plausibility and low …
- arXiv Query: search_query=&id_list=2512.03878v1&start=0&max_results=10
The growing global population of older adults, combined with ongoing healthcare workforce shortages, has increased reliance on informal caregivers, including family members and friends who provide unp…
- arXiv Query: search_query=&id_list=2512.03874v1&start=0&max_results=10
Dexterous grasp generation aims to produce grasp poses that align with task requirements and human interpretable grasp semantics. However, achieving semantically controllable dexterous grasp synthesis…
- arXiv Query: search_query=&id_list=2512.03872v1&start=0&max_results=10
This paper investigates wireless systems aided by dual-polarized intelligent surfaces. We compare reconfigurable intelligent surface (RIS), which adjust their reflection matrices, with movable signals…
- arXiv Query: search_query=&id_list=2512.03866v1&start=0&max_results=10
This study presents an agent-based model (ABM) developed to simulate staff and resident interactions within a synthetic aged care facility, capturing movement, task execution, and proximity-based cont…
- arXiv Query: search_query=&id_list=2512.03864v1&start=0&max_results=10
Smart manufacturing can significantly improve efficiency and reduce energy consumption, yet the energy demands of AI models may offset these gains. This study utilizes in-situ sensing-based prediction…
- arXiv Query: search_query=&id_list=2512.03862v1&start=0&max_results=10
While transformer-based architectures have taken computer vision and NLP by storm, they often require a vast amount of parameters and training data to attain strong performance. In this work, we exper…
- arXiv Query: search_query=&id_list=2512.03854v1&start=0&max_results=10
Artificial intelligence (AI) is increasingly used in digital pathology. Publicly available histopathology datasets remain scarce, and those that do exist predominantly represent Western populations. C…
- arXiv Query: search_query=&id_list=2512.03852v1&start=0&max_results=10
Traffic image restoration under adverse weather conditions remains a critical challenge for intelligent transportation systems. Existing methods primarily focus on spatial-domain modeling but neglect …
- arXiv Query: search_query=&id_list=2512.03851v1&start=0&max_results=10
Neural networks have become a widely adopted tool for modeling nonlinear dynamical systems from data. However, the choice of training strategy remains a key design decision, particularly for simulatio…
- arXiv Query: search_query=&id_list=2512.03848v1&start=0&max_results=10
Cardiac image analysis remains fragmented across tasks: anatomical segmentation, disease classification, and grounded clinical report generation are typically handled by separate networks trained unde…
- arXiv Query: search_query=&id_list=2512.03847v1&start=0&max_results=10
Reinforcement learning (RL) has shown strong performance in LLM post-training, but real-world deployment often involves noisy or incomplete supervision. In such settings, complex and unreliable superv…
- arXiv Query: search_query=&id_list=2512.03846v1&start=0&max_results=10
This paper presents a novel fault-tolerant control framework for steam temperature regulation in Heat Recovery Steam Generators (HRSGs) subject to actuator faults. Addressing the critical challenge of…
- arXiv Query: search_query=&id_list=2512.03844v1&start=0&max_results=10
Prevailing Dataset Distillation (DD) methods leveraging generative models confront two fundamental limitations. First, despite pioneering the use of diffusion models in DD and delivering impressive pe…
- arXiv Query: search_query=&id_list=2512.03837v1&start=0&max_results=10
Human action recognition (HAR) in videos has garnered widespread attention due to the rich information in RGB videos. Nevertheless, existing methods for extracting deep features from RGB videos face c…
- arXiv Query: search_query=&id_list=2512.03834v1&start=0&max_results=10
Unet and its variations have been standard in semantic image segmentation, especially for computer assisted radiology. Current Unet architectures iteratively downsample spatial resolution while increa…
- arXiv Query: search_query=&id_list=2512.03827v1&start=0&max_results=10
Proliferation of cheap and accessible cameras makes it possible to measure a subject's breath rate from video footage alone. Recent works on this topic have proposed a variety of approaches for accura…
- arXiv Query: search_query=&id_list=2512.03818v1&start=0&max_results=10
Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output - here, the category assigned to a text - dep…
- arXiv Query: search_query=&id_list=2512.03817v1&start=0&max_results=10
Egyptian hieroglyphs, the ancient Egyptian writing system, are composed entirely of drawings. Translating these glyphs into English poses various challenges, including the fact that a single glyph can…
- arXiv Query: search_query=&id_list=2512.03807v1&start=0&max_results=10
Boolean matrix factorization (BMF) approximates a given binary input matrix as the product of two smaller binary factors. Unlike binary matrix factorization based on standard arithmetic, BMF employs t…
- arXiv Query: search_query=&id_list=2512.03805v1&start=0&max_results=10
Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies have leveraged the robustness of decision-mak…
- arXiv Query: search_query=&id_list=2512.03804v1&start=0&max_results=10
Electrocardiogram is a useful diagnostic signal that can detect cardiac abnormalities by measuring the electrical activity generated by the heart. Due to its rapid, non-invasive, and richly informativ…
- arXiv Query: search_query=&id_list=2512.03803v1&start=0&max_results=10
Contrastive decoding is a lightweight and effective inference-time method that improves the quality of text generation in Large Language Models. However, algorithms such as DoLa (Decoding by Contrasti…
- arXiv Query: search_query=&id_list=2512.03802v1&start=0&max_results=10
Integrated sensing and communication (ISAC) is a promising paradigm for future wireless systems due to spectrum reuse, hardware sharing, and joint waveform design. In dynamic scenes, Doppler shifts de…
- arXiv Query: search_query=&id_list=2512.03787v1&start=0&max_results=10
Clinical pathways are specialized healthcare plans that model patient treatment procedures. They are developed to provide criteria-based progression and standardize patient treatment, thereby improvin…
- arXiv Query: search_query=&id_list=2512.03786v1&start=0&max_results=10
Smartphones and smartwatches are ever-present in daily life, and provide a rich source of information on their users' behaviour. In particular, digital traces derived from the phone's embedded movemen…
- arXiv Query: search_query=&id_list=2512.03784v1&start=0&max_results=10
Sleep disorders have emerged as a critical global health issue, highlighting the urgent need for effective and widely accessible intervention technologies. Non-invasive brain stimulation has garnered …
- arXiv Query: search_query=&id_list=2512.03783v1&start=0&max_results=10
Recent advances in Omni models have enabled unified multimodal perception and generation. However, most existing systems still exhibit rigid reasoning behaviors, either overthinking simple problems or…
- arXiv Query: search_query=&id_list=2512.03779v1&start=0&max_results=10
Parametric representations of various functions are fundamental tools in science and engineering. This paper introduces a fixed-initial-state constant-input dynamical system (FISCIDS) representation, …
- arXiv Query: search_query=&id_list=2512.03772v1&start=0&max_results=10
This paper presents an auto-tuning framework for torque-based Nonlinear Model Predictive Control (nMPC), where the MPC serves as a real-time controller for optimal joint torque commands. The MPC param…
- arXiv Query: search_query=&id_list=2512.03766v1&start=0&max_results=10
The study of networks derived from infrastructure systems has received considerable attention, yet the accessibility of such systems, particularly within public transit networks, remains comparatively…
- arXiv Query: search_query=&id_list=2512.03764v1&start=0&max_results=10
Policy gradient algorithms are widely used in reinforcement learning and belong to the class of approximate dynamic programming methods. This paper studies two key policy gradient algorithms - the Nat…
- arXiv Query: search_query=&id_list=2512.03762v1&start=0&max_results=10
Automatic Heuristic Design (AHD) has gained traction as a promising solution for solving combinatorial optimization problems (COPs). Large Language Models (LLMs) have emerged and become a promising ap…
- arXiv Query: search_query=&id_list=2512.03759v1&start=0&max_results=10
Reinforcement Learning (RL) has proven highly effective for autoregressive language models, but adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. The c…
- arXiv Query: search_query=&id_list=2512.03755v1&start=0&max_results=10
Urban analytics increasingly relies on AI-driven trajectory analysis, yet current approaches suffer from methodological fragmentation: trajectory learning captures movement patterns but ignores spatia…
- arXiv Query: search_query=&id_list=2512.03751v1&start=0&max_results=10
Previously, image interpretation in radiology relied heavily on manual methods. However, manual classification of brain tumor medical images is time-consuming and labor-intensive. Even with shallow co…
- arXiv Query: search_query=&id_list=2512.03747v1&start=0&max_results=10
In controlled industrial environments, ensuring safety and performance during controller tuning is a challenging and critical task. In particular, control loops in compressor-plenum-throttle systems c…
- arXiv Query: search_query=&id_list=2512.03745v1&start=0&max_results=10
Two-stage learning pipeline has achieved promising results in unsupervised visible-infrared person re-identification (USL-VI-ReID). It first performs single-modality learning and then operates cross-m…
- arXiv Query: search_query=&id_list=2512.03743v1&start=0&max_results=10
Dexterous manipulation is limited by both control and design, without consensus as to what makes manipulators best for performing dexterous tasks. This raises a fundamental challenge: how should we de…
- arXiv Query: search_query=&id_list=2512.03739v1&start=0&max_results=10
Multi-stage decision problems under uncertainty can be efficiently solved with the Stochastic Dual Dynamic Programming (SDDP) algorithm. However, traditional implementations require all stage problems…
- arXiv Query: search_query=&id_list=2512.03732v1&start=0&max_results=10
As a key circular economy strategy, remanufacturing allows original equipment manufacturers (OEMs) to reduce waste by restoring used products to "as-new" conditions. This paper investigates an OEM's…
- arXiv Query: search_query=&id_list=2512.03730v1&start=0&max_results=10
Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box and architecture specific. More importantly, while they…
- arXiv Query: search_query=&id_list=2512.03728v1&start=0&max_results=10
The 3rd Generation Partnership Project (3GPP), the standards body for mobile networks, is in the final phase of Release 19 standardization and is beginning Release 20. Artificial Intelligence/Machine…
- arXiv Query: search_query=&id_list=2512.03727v1&start=0&max_results=10
Probabilistic Graphical Models (PGMs) encode conditional dependencies among random variables using a graph (nodes for variables, links for dependencies) and factorize the joint distribution into lower…
- arXiv Query: search_query=&id_list=2512.03726v1&start=0&max_results=10
Let $M$ be a complete connected Riemannian manifold. For $n \geq 0$, we endow the Wasserstein space $P^{(n)}_2(M) = P_2(\ldots P_2(M)\ldots)$, equipped with the Wasserstein distance $W_2$, with a vari…
- arXiv Query: search_query=&id_list=2512.03724v1&start=0&max_results=10
The Vision-Language-Action (VLA) models have demonstrated remarkable performance on embodied tasks and shown promising potential for real-world applications. However, current VLAs still struggle to pr…
- arXiv Query: search_query=&id_list=2512.03719v1&start=0&max_results=10
Over-the-Air Federated Learning (AirFL) is an emerging paradigm that tightly integrates wireless signal processing and distributed machine learning to enable scalable AI at the network edge. By levera…
- arXiv Query: search_query=&id_list=2512.03718v1&start=0&max_results=10
We study the computational problem of computing a fair means clustering of discrete vectors, which admits an equivalent formulation as editing a colored matrix into one with few distinct color-balance…
- arXiv Query: search_query=&id_list=2512.03715v1&start=0&max_results=10
This paper presents DINO-RotateMatch, a deep-learning framework designed to address the challenges of image matching in large-scale 3D reconstruction from unstructured Internet images. The method i…
- arXiv Query: search_query=&id_list=2512.03712v1&start=0&max_results=10
This paper presents a hybrid Sequential Convex Programming (SCP) framework for solving the unbalanced three-phase AC Optimal Power Flow (OPF) problem. The method combines a fixed McCormick outer appro…
- arXiv Query: search_query=&id_list=2512.03707v1&start=0&max_results=10
In collaborative human-robot tasks, safety requires not only avoiding collisions but also ensuring safe, intentional physical contact. We present ContactRL, a reinforcement learning (RL) based framewo…
- arXiv Query: search_query=&id_list=2512.03704v1&start=0&max_results=10
Long-context dialogue systems suffer from State Inertia, where static constraints prevent models from resolving conflicts between evolving user intents and established historical context. To address t…
- arXiv Query: search_query=&id_list=2512.03703v1&start=0&max_results=10
A novel scalable pixel-based reconfigurable beamforming network (PRBFN) that can be used to form a Fluid Antenna System (FAS), referred to as a PRBFN-FAS, is introduced. The concept of FAS has emerged…
- arXiv Query: search_query=&id_list=2512.03701v1&start=0&max_results=10
Perceptual similarity scores that align with human vision are critical for both training and evaluating computer vision models. Deep perceptual losses, such as LPIPS, achieve good alignment but rely o…
- arXiv Query: search_query=&id_list=2512.03696v1&start=0&max_results=10
We propose a novel QTGNN framework for detecting fraudulent transactions in large-scale financial networks. By integrating quantum embedding, variational graph convolutions, and topological data analy…
- arXiv Query: search_query=&id_list=2512.03694v1&start=0&max_results=10
Multi-Agent Systems (MAS) with large language models (LLMs) enable personalized education but risk leaking minors' personally identifiable information (PII) via unstructured dialogue. Existing privacy …
- arXiv Query: search_query=&id_list=2512.03688v1&start=0&max_results=10
We present AITutor-EvalKit, an application that uses language technology to evaluate the pedagogical quality of AI tutors, provides software for demonstration and evaluation, as well as model inspecti…
- arXiv Query: search_query=&id_list=2512.03687v1&start=0&max_results=10
Active visual perception refers to the ability of a system to dynamically engage with its environment through sensing and action, allowing it to modify its behavior in response to specific goals or un…
- arXiv Query: search_query=&id_list=2512.03684v1&start=0&max_results=10
This paper presents an autonomous tomato-harvesting system built around a hybrid robotic gripper that combines six soft auxetic fingers with a rigid exoskeleton and a latex basket to achieve gentle, c…
- arXiv Query: search_query=&id_list=2512.03680v1&start=0&max_results=10
This study proposes a fuzzy-adjusted nonlinear control method based on torque jitter output limit constraints for overhead crane systems with double pendulum effects. The proposed control method can e…
- arXiv Query: search_query=&id_list=2512.03678v1&start=0&max_results=10
While tabular machine learning has achieved remarkable success, temporal distribution shifts pose significant challenges in real-world deployment, as the relationships between features and labels cont…
- arXiv Query: search_query=&id_list=2512.03676v1&start=0&max_results=10
Large language models (LLMs) can reliably distinguish grammatical from ungrammatical sentences, but how grammatical knowledge is represented within the models remains an open question. We investigate …
- arXiv Query: search_query=&id_list=2512.03672v1&start=0&max_results=10
Hydro-Science and Engineering (Hydro-SE) is a critical and irreplaceable domain that secures human water supply, generates clean hydropower energy, and mitigates flood and drought disasters. Featuring…
- arXiv Query: search_query=&id_list=2512.03671v1&start=0&max_results=10
The rise of Artificial Intelligence (AI) language technologies, particularly generative AI (GenAI) chatbots accessible via conversational interfaces, is transforming digital interactions. While these …
- arXiv Query: search_query=&id_list=2512.03661v1&start=0&max_results=10
Activation steering has emerged as a powerful method for guiding the behavior of generative models towards desired outcomes such as toxicity mitigation. However, most existing methods apply interventi…
- arXiv Query: search_query=&id_list=2512.03656v1&start=0&max_results=10
Accurate electricity consumption forecasting is essential for demand management and smart grid operations. This paper introduces a unified deep learning framework that integrates cyclical temporal enc…
- arXiv Query: search_query=&id_list=2512.03653v1&start=0&max_results=10
This study proposes a method to enhance neural network performance when training data and application data are not very similar, e.g., out of distribution problems, as well as pattern and regime shift…
- arXiv Query: search_query=&id_list=2512.03646v1&start=0&max_results=10
We formulate a continuous-time competitive equilibrium model of irreversible capacity investment in which a continuum of heterogeneous producers supplies a single non-durable good subject to exogenous…
- arXiv Query: search_query=&id_list=2512.03643v1&start=0&max_results=10
DeepSeek-OCR demonstrates that rendered text can be reconstructed with high fidelity from a small number of vision tokens. This finding has sparked excitement about vision-based context compression fo…
- arXiv Query: search_query=&id_list=2512.03630v1&start=0&max_results=10
Motion planning schemes are used for planning motions of a manipulator from an initial pose to a final pose during task execution. A motion planning scheme generally comprises a trajectory planni…
- arXiv Query: search_query=&id_list=2512.03627v1&start=0&max_results=10
Despite rapid progress in large-scale language and vision models, AI agents still suffer from a fundamental limitation: they cannot remember. Without reliable memory, agents catastrophically forget pa…
- arXiv Query: search_query=&id_list=2512.03625v1&start=0&max_results=10
Despite the remarkable performance of deep neural networks (DNNs) in image classification, their vulnerability to adversarial attacks remains a critical challenge. Most existing detection methods rel…
- arXiv Query: search_query=&id_list=2512.03623v1&start=0&max_results=10
Despite the promising capability of multimodal foundation models, their application to the generation of meteorological products and services remains nascent. To accelerate aspiration and adoption, we…
- arXiv Query: search_query=&id_list=2512.03620v1&start=0&max_results=10
The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental …
- arXiv Query: search_query=&id_list=2512.03619v1&start=0&max_results=10
Video generation has achieved remarkable progress in visual fidelity and controllability, enabling conditioning on text, layout, or motion. Among these, motion control - specifying object dynamics and…
- arXiv Query: search_query=&id_list=2512.03610v1&start=0&max_results=10
Merging neural networks without retraining is central to federated and distributed learning. Common methods such as weight averaging or Fisher merging often lose accuracy and are unstable across seeds…
- arXiv Query: search_query=&id_list=2512.03607v1&start=0&max_results=10
This paper proposes DeepRule, an integrated framework for automated business rule generation in retail assortment and pricing optimization. Addressing the systematic misalignment between existing theo…
- arXiv Query: search_query=&id_list=2512.03606v1&start=0&max_results=10
Accurate marine wind forecasts are essential for safe navigation, ship routing, and energy operations, yet they remain challenging because observations over the ocean are sparse, heterogeneous, and te…
- arXiv Query: search_query=&id_list=2512.03605v1&start=0&max_results=10
In this paper a position-tracking controller for quadrotors based on perception feedback is developed, which directly uses measurements from onboard sensors such as low cost IMUs and GPS to generate t…
- arXiv Query: search_query=&id_list=2512.03604v1&start=0&max_results=10
Event-Triggered Control (ETC) reduces communication overhead in networked systems by transmitting only when stability requires it. Conventional mechanisms use isotropic error thresholds ($\|e\| \le σ\…
- arXiv Query: search_query=&id_list=2512.03598v1&start=0&max_results=10
Partial dental point clouds often suffer from large missing regions caused by occlusion and limited scanning views, which bias encoder-only global features and force decoders to hallucinate structures…
- arXiv Query: search_query=&id_list=2512.03597v1&start=0&max_results=10
Medical image segmentation is a cornerstone of modern clinical diagnostics. While Vision Transformers that leverage shifted window-based self-attention have established new benchmarks in this field, t…
- arXiv Query: search_query=&id_list=2512.03593v1&start=0&max_results=10
We present CloseUpAvatar, a novel approach for articulated human avatar representation that handles more general camera motions while preserving rendering quality for close-up views. CloseUpAvatar…
- arXiv Query: search_query=&id_list=2512.03592v1&start=0&max_results=10
The RNA inverse folding problem, a key challenge in RNA design, involves identifying nucleotide sequences that can fold into desired secondary structures, which are critical for ensuring molecular sta…
- arXiv Query: search_query=&id_list=2512.03584v1&start=0&max_results=10
This paper presents the VesselEdge system, which leverages federated learning and bandwidth-constrained trajectory compression to enhance maritime situational awareness by extending AIS coverage. Vess…
- arXiv Query: search_query=&id_list=2512.03582v1&start=0&max_results=10
Narratives are the cognitive and emotional scaffolds of propaganda. They organize isolated persuasive techniques into coherent stories that justify actions, attribute blame, and evoke identification w…
- arXiv Query: search_query=&id_list=2512.03580v1&start=0&max_results=10
We propose the Dynamic Optical Test for Bot Identification (DOT-BI): a quick and easy method that uses human perception of motion to differentiate between human respondents and automated systems in su…
- arXiv Query: search_query=&id_list=2512.03579v1&start=0&max_results=10
Optimal transport (OT) and Gromov-Wasserstein (GW) alignment provide interpretable geometric frameworks for comparing, transforming, and aggregating heterogeneous datasets -- tasks ubiquitous in data …
- arXiv Query: search_query=&id_list=2512.03578v1&start=0&max_results=10
Time series extrinsic regression (TSER) refers to the task of predicting a continuous target variable from an input time series. It appears in many domains, including healthcare, finance, environmenta…
- arXiv Query: search_query=&id_list=2512.03574v1&start=0&max_results=10
Scene Text Editing (STE) involves replacing text in a scene image with new target text while preserving both the original text style and background texture. Existing methods suffer from two major chal…
- arXiv Query: search_query=&id_list=2512.03570v1&start=0&max_results=10
Wireless sensor networks (WSNs) are employed across a wide range of industrial applications where ultra-low power consumption is a critical prerequisite. At the same time, these systems must maintain …
- arXiv Query: search_query=&id_list=2512.03568v1&start=0&max_results=10
Conducting usability testing like cognitive walkthrough (CW) can be costly. Recent developments in large language models (LLMs), with visual reasoning and UI navigation capabilities, present opportuni…
- arXiv Query: search_query=&id_list=2512.03566v1&start=0&max_results=10
Articulated object generation has seen increasing advancements, yet existing models often lack the ability to be conditioned on text prompts. To address the significant gap between textual description…
- arXiv Query: search_query=&id_list=2512.03564v1&start=0&max_results=10
Diffusion models are renowned for their state-of-the-art performance in generating synthetic images. However, concerns related to safety, privacy, and copyright highlight the need for machine unlearni…
- arXiv Query: search_query=&id_list=2512.03562v1&start=0&max_results=10
Integrating demand-responsive mobility services with transit systems is recognized as a practical and effective strategy to mitigate their impact on traffic congestion and the environment. This study …
- arXiv Query: search_query=&id_list=2512.03557v1&start=0&max_results=10
Sequential convex programming has been established as an effective framework for solving nonconvex trajectory planning problems. However, its performance is highly sensitive to problem parameters, inc…
- arXiv Query: search_query=&id_list=2512.03555v1&start=0&max_results=10
In component shape optimization, the component properties are often evaluated by computationally expensive simulations. Such optimization becomes unfeasible when it is focused on a global search requi…
- arXiv Query: search_query=&id_list=2512.03545v1&start=0&max_results=10
Industrial installations across several sectors have seen a dramatic increase in productivity, accuracy and efficiency over the last decade due to expanded utilization of medium voltage, variable spee…
- arXiv Query: search_query=&id_list=2512.03542v1&start=0&max_results=10
Multimodal Large Language Models (MLLMs) excel in numerous vision-language tasks yet suffer from hallucinations, producing content inconsistent with input visuals, that undermine reliability in precis…
- arXiv Query: search_query=&id_list=2512.03540v1&start=0&max_results=10
Cooking is a sequential and visually grounded activity, where each step such as chopping, mixing, or frying carries both procedural logic and visual semantics. While recent diffusion models have shown…
- arXiv Query: search_query=&id_list=2512.03539v1&start=0&max_results=10
Manual operation of microscopes for repetitive tasks in cell biology is a significant bottleneck, consuming invaluable expert time, and introducing human error. Automation is essential, and while Digi…
- arXiv Query: search_query=&id_list=2512.03536v1&start=0&max_results=10
This work presents an experimental performance evaluation of a private 5G airfield network under controlled directional SDR jamming attacks targeting UAV-based UE nodes. Using a QualiPoc Android UE, m…
- arXiv Query: search_query=&id_list=2512.03535v1&start=0&max_results=10
This paper studies open-loop and feedback solutions to leader-follower mean field linear-quadratic-Gaussian games with multiplicative noise by the direct approach. The leader-follower game involves a …
- arXiv Query: search_query=&id_list=2512.03525v1&start=0&max_results=10
Compressed sensing enables sparse sampling but relies on generic bases and random measurements, limiting efficiency and reconstruction quality. Optimal sensor placement uses historical data to design t…
- arXiv Query: search_query=&id_list=2512.03524v1&start=0&max_results=10
We study the dynamics and equilibria of a new kind of routing games, where players - drivers of future autonomous vehicles - may switch between individual (HDV) and collective (CAV) routing. In indivi…
- arXiv Query: search_query=&id_list=2512.03522v1&start=0&max_results=10
Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic ambiguity i…
- arXiv Query: search_query=&id_list=2512.03521v1&start=0&max_results=10
Multimodal Emotion Recognition in Conversation (MERC) aims to predict speakers' emotions by integrating textual, acoustic, and visual cues. Existing approaches either struggle to capture complex cross…
- arXiv Query: search_query=&id_list=2512.03519v1&start=0&max_results=10
Developing high-stakes autonomous systems that include Artificial Intelligence (AI) components is complex; the consequences of errors can be catastrophic, yet it is challenging to plan for all operati…
- arXiv Query: search_query=&id_list=2512.03514v1&start=0&max_results=10
Multimodal document retrieval systems have shown strong progress in aligning visual and textual content for semantic search. However, most existing approaches remain heavily English-centric, limiting …
- arXiv Query: search_query=&id_list=2512.03512v1&start=0&max_results=10
Electrical impedance tomography (EIT) provides an attractive solution for large-area tactile sensing due to its minimal wiring and shape flexibility, but its nonlinear inverse problem often leads to s…
- arXiv Query: search_query=&id_list=2512.03510v1&start=0&max_results=10
Crowdsourcing enables scalable autonomous driving map construction, but low-cost sensor noise hinders quality from improving with data volume. We propose CSMapping, a system that produces accurate sem…
- arXiv Query: search_query=&id_list=2512.03509v1&start=0&max_results=10
This paper presents a preliminary investigation into automated dance movement analysis using contemporary computer vision techniques. We propose a proof-of-concept framework that integrates YOLOv8 and…
- arXiv Query: search_query=&id_list=2512.03508v1&start=0&max_results=10
Recent domain generalized semantic segmentation (DGSS) studies have achieved notable improvements by distilling semantic knowledge from Vision-Language Models (VLMs). However, they overlook the semant…
- arXiv Query: search_query=&id_list=2512.03506v1&start=0&max_results=10
Integrated Sensing and Communication (ISAC) has been identified as a key 6G application by ITU and 3GPP. Channel measurement and modeling is a prerequisite for ISAC system design and has attracted wid…
- arXiv Query: search_query=&id_list=2512.03503v1&start=0&max_results=10
While the reasoning capabilities of Large Language Models (LLMs) excel in analytical tasks such as mathematics and code generation, their utility for abstractive summarization remains widely assumed b…
- arXiv Query: search_query=&id_list=2512.03502v1&start=0&max_results=10
Pinching-antenna systems (PASS) have emerged as a promising technology due to their ability to dynamically reconfigure wireless propagation environments. A novel PASS-based multi-user non-orthogonal m…
- arXiv Query: search_query=&id_list=2512.03499v1&start=0&max_results=10
The Segment Anything Model (SAM) has emerged as a powerful visual foundation model for image segmentation. However, adapting SAM to specific downstream tasks, such as medical and agricultural imaging,…
- arXiv Query: search_query=&id_list=2512.03491v1&start=0&max_results=10
…
- arXiv Query: search_query=&id_list=2512.03477v1&start=0&max_results=10
Vision-language models achieve expert-level performance on medical imaging tasks but exhibit significant diagnostic accuracy disparities across demographic groups. We introduce fairness-aware Low-Rank…
- arXiv Query: search_query=&id_list=2512.03471v1&start=0&max_results=10
The global rise in type 2 diabetes underscores the need for scalable and cost-effective screening methods. Current diagnosis requires biochemical assays, which are invasive and costly. Advances in con…
- arXiv Query: search_query=&id_list=2512.03466v1&start=0&max_results=10
Large Language Model (LLM) agents are increasingly studied in multi-turn, multi-agent scenarios, yet most existing setups emphasize open-ended role-play rather than controlled evaluation. We introduce…
- arXiv Query: search_query=&id_list=2512.03465v1&start=0&max_results=10
In this study, we more rigorously evaluated our attack script $\textit{TraceTarnish}$, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To ensure t…
- arXiv Query: search_query=&id_list=2512.03464v1&start=0&max_results=10
In recent years, financial sentiment analysis of public opinion has become increasingly important for market forecasting and risk assessment. However, existing methods often struggle to effectively in…
- arXiv Query: search_query=&id_list=2512.03463v1&start=0&max_results=10
Recent large vision-language models (LVLMs) have been applied to diverse VQA tasks. However, achieving practical performance typically requires task-specific fine-tuning with large numbers of image-te…
- arXiv Query: search_query=&id_list=2512.03460v1&start=0&max_results=10
In cell culture bioprocessing, real-time batch process monitoring (BPM) refers to the continuous tracking and analysis of key process variables such as viable cell density, nutrient levels, metabolite…
- arXiv Query: search_query=&id_list=2512.03459v1&start=0&max_results=10
Human motor control remains agile and robust despite limited sensory information for feedback, a property attributed to the body's ability to perform morphological computation through muscle coordinat…
- arXiv Query: search_query=&id_list=2512.03450v1&start=0&max_results=10
Understanding and representing the structure of 3D objects in an unsupervised manner remains a core challenge in computer vision and graphics. Most existing unsupervised keypoint methods are not desig…
- arXiv Query: search_query=&id_list=2512.03449v1&start=0&max_results=10
Background and Objective: Radiomics of knee MRI requires robust, anatomically meaningful regions of interest (ROIs) that jointly capture cartilage and subchondral bone. Most existing work relies on ma…
