Generated: 2025-10-27 16:30:30 (UTC+8); arXiv release time: 2025-10-27 20:00 EDT (2025-10-28 08:00 UTC+8)

25 relevant papers today

Keyword: reinforcement learning

Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards

Code-enabled language models can outperform reasoning models on diverse tasks

Safety Assessment in Reinforcement Learning via Model Predictive Control

Robust Point Cloud Reinforcement Learning via PCA-Based Canonicalization

A Reinforcement Learning Framework for Robust and Secure LLM Watermarking

On the Sample Complexity of Differentially Private Policy Optimization

Sensing and Storing Less: A MARL-based Solution for Energy Saving in Edge Internet of Things

Confounding Robust Deep Reinforcement Learning: A Causal Approach

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Enhanced Evolutionary Multi-Objective Deep Reinforcement Learning for Reliable and Efficient Wireless Rechargeable Sensor Networks

Uncertainty-Aware Multi-Objective Reinforcement Learning-Guided Diffusion Models for 3D De Novo Molecular Design

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference

How Hard is it to Confuse a World Model?

PARL: Prompt-based Agents for Reinforcement Learning

FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning

Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning

Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning

Unified token representations for sequential decision models

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment

RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models

Enhancing Tactile-based Reinforcement Learning for Robotic Control

DeepAgent: A General Reasoning Agent with Scalable Toolsets

DEEDEE: Fast and Scalable Out-of-Distribution Dynamics Detection

Mechanistic Interpretability for Neural TSP Solvers

Keyword: diffusion policy

No results