Generated: 2025-11-14 16:29:55 (UTC+8); arXiv release time: 2025-11-14 20:00 EST (2025-11-15 09:00 UTC+8)

36 related papers today

Keyword: reinforcement learning

Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Optimistic Reinforcement Learning with Quantile Objectives

SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning

ConstrainedSQL: Training LLMs for Text2SQL via Constrained Reinforcement Learning

Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard

Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy

Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning

Uncertainty-Guided Checkpoint Selection for Reinforcement Finetuning of Large Language Models

In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback

HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning

DemoTuner: Efficient DBMS Knobs Tuning via LLM-Assisted Demonstration Reinforcement Learning

Reinforcing Trustworthiness in Multimodal Emotional Support Systems

Multi-agent In-context Coordination via Decentralized Memory Retrieval

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?

Tree-Based Stochastic Optimization for Solving Large-Scale Urban Network Security Games

Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning

Learning-Based Channel Access in Wi-Fi: A Multi-Armed Bandit Approach

Improved Offline Reinforcement Learning via Quantum Metric Encoding

Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning

Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search

PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning

Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access

Rectify Evaluation Preference: Improving LLMs' Critique on Math Reasoning via Perplexity-aware Reinforcement Learning

MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns

AgentEvolver: Towards Efficient Self-Evolving Agent System

Explaining Decentralized Multi-Agent Reinforcement Learning Policies

Reasoning About Intent for Ambiguous Requests

Strategic Opponent Modeling with Graph Neural Networks, Deep Reinforcement Learning and Probabilistic Topic Modeling

Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

Towards Emotionally Intelligent and Responsible Reinforcement Learning

Instella: Fully Open Language Models with Stellar Performance

Robot Crash Course: Learning Soft and Stylized Falling

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

Keyword: diffusion policy

Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning
