Generated: 2026-01-07 16:34:05 (UTC+8); arXiv announcement time: 2026-01-07 20:00 EST (2026-01-08 09:00 UTC+8)

31 relevant papers today

Keyword: reinforcement learning

Improving News Recommendations through Hybrid Sentiment Modelling and Reinforcement Learning

Regional Resource Management for Service Provisioning in LEO Satellite Networks: A Topology Feature-Based DRL Approach

AI-Native Integrated Sensing and Communications for Self-Organizing Wireless Networks: Architectures, Learning Paradigms, and System-Level Design

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

LLM-Enhanced Reinforcement Learning for Time Series Anomaly Detection

Textual Explanations and Their Evaluations for Reinforcement Learning Policy

SWaRL: Safeguard Code Watermarking via Reinforcement Learning

Effective Online 3D Bin Packing with Lookahead Parcels Using Monte Carlo Tree Search

Inferring Causal Graph Temporal Logic Formulas to Expedite Reinforcement Learning in Temporally Extended Tasks

Time-Scaling Is What Agents Need Now

Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies

Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation

MiMo-V2-Flash Technical Report

Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data

SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models

Sample-Efficient Neurosymbolic Deep Reinforcement Learning

SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection

ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis

Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning

The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models

Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning

In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior

Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis

SOP: A Scalable Online Post-Training System for Vision-Language-Action Models

IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation

One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

Unified Thinker: A General Reasoning Modular Core for Image Generation

WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning

MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward

STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning

Keyword: diffusion policy

No results