Generated: 2026-02-17 16:53:24 (UTC+8); arXiv announcement time: 2026-02-17 20:00 EST (2026-02-18 09:00 UTC+8)

77 relevant papers today

Keyword: reinforcement learning

Reinforcement Learning-Enabled Dynamic Code Assignment for Ultra-Dense IoT Networks: A NOMA-Based Approach to Massive Device Connectivity

A Safety-Constrained Reinforcement Learning Framework for Reliable Wireless Autonomy

Large Language Model (LLM)-enabled Reinforcement Learning for Wireless Network Optimization

An Overlay Multicast Routing Method Based on Network Situational Awareness and Hierarchical Multi-Agent Reinforcement Learning

Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning

Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains

Securing SIM-Assisted Wireless Networks via Quantum Reinforcement Learning

General learned delegation by clones

Cooperative Edge Caching with Large Language Model in Wireless Networks

Adaptive Value Decomposition: Coordinating a Varying Number of Agents in Urban Systems

FireRed-Image-Edit-1.0 Technical Report

Robust Mean-Field Games with Risk Aversion and Bounded Rationality

Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

On-Policy Supervised Fine-Tuning for Efficient Reasoning

OpAgent: Operator Agent for Web Navigation

Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization

AuTAgent: A Reinforcement Learning Framework for Tool-Augmented Audio Reasoning

Skeleton2Stage: Reward-Guided Fine-Tuning for Physically Plausible Dance Generation

Cast-R1: Learning Tool-Augmented Sequential Decision Policies for Time Series Forecasting

AnomaMind: Agentic Time Series Anomaly Detection with Tool-Augmented Reasoning

Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings

Beyond Words: Evaluating and Bridging Epistemic Divergence in User-Agent Interaction via Theory of Mind

Enabling Option Learning in Sparse Rewards with Hindsight Experience Replay

Probabilistic Reachability Analysis of Multi-scale Voltage Dynamics Using Reinforcement Learning

From Pixels to Policies: Reinforcing Spatial Reasoning in Language Models for Content-Aware Layout Design

Why Code, Why Now: Learnability, Computability, and the Real Limits of Machine Learning

You Can Learn Tokenization End-to-End with Reinforcement Learning

Experiential Reinforcement Learning

QuRL: Efficient Reinforcement Learning with Quantized Rollout

WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL

BRAIN: Bayesian Reasoning via Active Inference for Agentic and Embodied Intelligence in Mobile Networks

CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning

Policy Gradient with Adaptive Entropy Annealing for Continual Fine-Tuning

ForgeryVCR: Visual-Centric Reasoning via Efficient Forensic Tools in MLLMs for Image Forgery Detection and Localization

LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models

Process-Supervised Multi-Agent Reinforcement Learning for Reliable Clinical Reasoning

Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling

UniRef-Image-Edit: Towards Scalable and Consistent Multi-Reference Image Editing

GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Text Before Vision: Staged Knowledge Injection Matters for Agentic RLVR in Ultra-High-Resolution Remote Sensing Understanding

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

GRAIL: Goal Recognition Alignment through Imitation Learning

KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning

Conformal Signal Temporal Logic for Robust Reinforcement Learning Control: A Case Study

Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning

Data-Driven Network LQG Mean Field Games with Heterogeneous Populations via Integral Reinforcement Learning

Zero-Shot Instruction Following in RL via Structured LTL Representations

WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control

AdaptManip: Learning Adaptive Whole-Body Object Lifting and Delivery with Online Recurrent State Estimation

A Q-Learning Approach for Dynamic Resource Management in Three-Tier Vehicular Fog Computing

LACONIC: Length-Aware Constrained Reinforcement Learning for LLM

Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems

Learning Transferability: A Two-Stage Reinforcement Learning Approach for Enhancing Quadruped Robots' Performance in U-Shaped Stair Climbing

TikArt: Aperture-Guided Observation for Fine-Grained Visual Reasoning via Reinforcement Learning

Formally Verifying and Explaining Sepsis Treatment Policies with COOL-MC

TWISTED-RL: Hierarchical Skilled Agents for Knot-Tying without Human Demonstrations

MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation

Fluid-Agent Reinforcement Learning

Simulation-based Learning of Electrical Cabinet Assembly Using Robot Skills

DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving

RNM-TD3: N:M Semi-structured Sparse Reinforcement Learning From Scratch

Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow

GREAT-EER: Graph Edge Attention Network for Emergency Evacuation Responses

Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs

ManeuverNet: A Soft Actor-Critic Framework for Precise Maneuvering of Double-Ackermann-Steering Robots with Optimized Reward Functions

Interactionless Inverse Reinforcement Learning: A Data-Centric Framework for Durable Alignment

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning

On the Learning Dynamics of RLVR at the Edge of Competence

BFS-PO: Best-First Search for Large Reasoning Models

MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design

Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation

Cold-Start Personalization via Training-Free Priors from Structured World Models

Keyword: diffusion policy

HybridFlow: A Two-Step Generative Policy for Robotic Manipulation

Semantic-Contact Fields for Category-Level Generalizable Tactile Tool Manipulation

Learning Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation
