Generated: 2026-01-09 16:35:27 (UTC+8); arXiv release time: 2026-01-09 20:00 EST (2026-01-10 09:00 UTC+8)

45 related papers today

Keyword: reinforcement learning

Cross-Language Speaker Attribute Prediction Using MIL and RL

Making Tunable Parameters State-Dependent in Weather and Climate Models with Reinforcement Learning

A Future Capabilities Agent for Tactical Air Traffic Control

Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control

Survival Dynamics of Neural and Programmatic Policies in Evolutionary Reinforcement Learning

Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay

Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces

Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards

Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization

Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization

Multiagent Reinforcement Learning with Neighbor Action Estimation

TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation

Not All Steps are Informative: On the Linearity of LLMs' RLVR Training

Reasoning Over Space: Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation

Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization

Optimizing Path Planning using Deep Reinforcement Learning for UGVs in Precision Agriculture

Learning Dynamics in RL Post-Training for Language Models

Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead

ResMAS: Resilience Optimization in LLM-based Multi-agent Systems

A Method for Constructing a Digital Transformation Driving Mechanism Based on Semantic Understanding of Large Models

TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning

ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving

AM$^3$Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs

AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search

AgentOCR: Reimagining Agent History via Optical Self-Compression

Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning

SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning

Intelligent resource allocation in wireless networks via deep reinforcement learning

RAAR: Retrieval Augmented Agentic Reasoning for Cross-Domain Misinformation Detection

Flexible Manufacturing Systems Intralogistics: Dynamic Optimization of AGVs and Tool Sharing Using Coloured-Timed Petri Nets and Actor-Critic RL with Actions Masking

SKATER: Synthesized Kinematics for Advanced Traversing Efficiency on a Humanoid Robot via Roller Skate Swizzles

Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following

Safe Reinforcement Learning Beyond Baseline Control: A Hierarchical Framework for Space Triangle Tethered Formation System

Text as a Universal Interface for Transferable Personalization

ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning

A DQN-based model for intelligent network selection in heterogeneous wireless systems

AlgBench: To What Extent Do Large Reasoning Models Understand Algorithms?

On the Hidden Objective Biases of Group-based Reinforcement Learning

Hán Dān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models

Reinforced Efficient Reasoning via Semantically Diverse Exploration

Safe Continual Reinforcement Learning Methods for Nonstationary Environments. Towards a Survey of the State of the Art

Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems

EARL: Energy-Aware Optimization of Liquid State Machines for Pervasive AI

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Keyword: diffusion policy

No results found.