Generated: 2025-12-03 16:32:53 (UTC+8); arXiv announcement time: 2025-12-03 20:00 EST (2025-12-04 09:00 UTC+8)

35 relevant papers today

Keyword: reinforcement learning

Reinforcement Learning for Robotic Safe Control with Force Sensing

Deep Research: A Systematic Survey

Modelling the Doughnut of social and planetary boundaries with frugal machine learning

Improved Training Mechanism for Reinforcement Learning via Online Model Selection

Lightweight Latent Reasoning for Narrative Tasks

CAIRNS: Balancing Readability and Scientific Accuracy in Climate Adaptation Question Answering

FOVA: Offline Federated Reinforcement Learning with Mixed-Quality Data

Beyond Playtesting: A Generative Multi-Agent Simulation System for Massively Multiplayer Online Games

VACoT: Rethinking Visual Data Augmentation with VLMs

Risk-Sensitive Q-Learning in Continuous Time with Application to Dynamic Portfolio Selection

Synthetic Error Injection Fails to Elicit Self-Correction In Language Models

Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

Dynamic Configuration of On-Street Parking Spaces using Multi Agent Reinforcement Learning

Data Curation Through the Lens of Spectral Dynamics: Static Limits, Dynamic Acceleration, and Practical Oracles

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering

A Visual Analytics System to Understand Behaviors of Multi Agents in Reinforcement Learning

Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts

AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies

SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization

Zero-Shot Instruction Following in RL via Structured LTL Representations

RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning

IC-World: In-Context Generation for Shared World Modeling

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

Phase-Adaptive LLM Framework with Multi-Stage Validation for Construction Robot Task Allocation: A Systematic Benchmark Against Traditional Optimization Algorithms

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Taming Camera-Controlled Video Generation with Verifiable Geometry Reward

MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm

OneThinker: All-in-one Reasoning Model for Image and Video

Keyword: diffusion policy

AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning
