Generated: 2025-11-03 16:31:26 (UTC+8); arXiv release time: 2025-11-03 20:00 EST (2025-11-04 09:00 UTC+8)

26 relevant papers today

Keyword: reinforcement learning

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms

Algorithmic Predation: Equilibrium Analysis in Dynamic Oligopolies with Smooth Market Sharing

e1: Learning Adaptive Control of Reasoning Effort

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation

Adaptive Human-Computer Interaction Strategies Through Reinforcement Learning in Complex

Towards Understanding Self-play for LLM Reasoning

AURA: A Reinforcement Learning Framework for AI-Driven Adaptive Conversational Surveys

Disrupting Networks: Amplifying Social Dissensus via Opinion Perturbation and Large Language Models

ShapleyPipe: Hierarchical Shapley Search for Data Preparation Pipeline Construction

GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models

Inferring trust in recommendation systems from brain, behavioural, and physiological data

A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination

Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines

Reasoning Models Sometimes Output Illegible Chains of Thought

Realistic pedestrian-driver interaction modelling using multi-agent RL with human perceptual-motor constraints

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

Learning Soft Robotic Dynamics with Active Exploration

VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision

Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval

MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems

Keyword: diffusion policy

EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities
