Generated: 2025-12-16 16:35:47 (UTC+8); arXiv release time: 2025-12-16 20:00 EST (2025-12-17 09:00 UTC+8)

65 related papers today

Keyword: reinforcement learning

Reinforcement Learning for Latent-Space Thinking in LLMs

LLM潜空间思维的强化学习

Hierarchical Task Offloading and Trajectory Optimization in Low-Altitude Intelligent Networks Via Auction and Diffusion-based MARL

低空智能网络中的分层任务卸载与轨迹优化,通过基于拍卖和扩散的MARL实现

WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving

WAM-Diff:一个具备MoE和在线强化学习的掩码扩散VLA框架,用于自动驾驶

Mirror Mode in Fire Emblem: Beating Players at their own Game with Imitation and Reinforcement Learning

火焰纹章中的镜像模式:通过模仿学习与强化学习以其人之道击败玩家

Safe Learning for Contact-Rich Robot Tasks: A Survey from Classical Learning-Based Methods to Safe Foundation Models

为接触丰富机器人任务提供安全学习:从经典基于学习的方法到安全基础模型的综述

Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction

基于进化强化学习的苏格拉底跨学科教学人工智能导师

A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach

基于学习的运动规划综述:迈向数据驱动的最优控制方法

Learning to Extract Context for Context-Aware LLM Inference

学习如何提取上下文以实现上下文感知的大型语言模型推理

Policy Gradient Algorithms for Age-of-Information Cost Minimization

用于信息年龄成本最小化的策略梯度算法

Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning

基于程函(Eikonal)约束的层级拟度量强化学习实现目标到达

Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy

学习跨形态起身:基于统一人形机器人策略的零样本恢复

Moment and Highlight Detection via MLLM Frame Segmentation

通过MLLM帧分割进行时刻和高光检测

A Conflict-Aware Resource Management Framework for the Computing Continuum

计算连续体的冲突感知资源管理框架

The Role of AI in Modern Penetration Testing

人工智能在现代渗透测试中的作用

ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems

ElasticVR:多用户多连接无线虚拟现实(VR)系统中的弹性任务计算

Sim2Real Reinforcement Learning for Soccer skills

Sim2Real 足球技能强化学习

HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments

HetRL:异构环境中LLM的高效强化学习

More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models

超越最终答案:提升视觉语言模型中的视觉提取和逻辑一致性

Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings

开放世界环境中零样本息肉检测的自适应检测器-验证器框架

World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents

世界模型解锁强化学习智能体的最优觅食策略

Coupled Variational Reinforcement Learning for Language Model General Reasoning

语言模型通用推理中的耦合变分强化学习

CogDoc: Towards Unified thinking in Documents

CogDoc:迈向文档中的统一思维

Reassessing the Role of Supervised Fine-Tuning: An Empirical Study in VLM Reasoning

重新评估监督式微调的作用:VLM推理中的实证研究

Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning

协同代码覆盖率与游戏意图:覆盖感知游戏测试与大语言模型引导强化学习

Self-Motivated Growing Neural Network for Adaptive Architecture via Local Structural Plasticity

通过局部结构可塑性的自驱成长神经网络实现适应性架构

CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning

CoDA:基于强化学习的上下文解耦层级智能体

Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks

利用本地智能电表数据进行配电网络电压调节的分布式强化学习

Information-Consistent Language Model Recommendations through Group Relative Policy Optimization

通过群组相对策略优化实现信息一致的语言模型推荐

LLM-based Personalized Portfolio Recommender: Integrating Large Language Models and Reinforcement Learning for Intelligent Investment Strategy Optimization

基于LLM的个性化投资组合推荐工具:整合大型语言模型与强化学习以实现智能投资策略优化

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

QwenLong-L1.5:长上下文推理与记忆管理的训练后配方

Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning

应对积雪带来的挑战:基于鲁棒强化学习的安全自主车道保持

Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations

通过简化维度的感知表征学习地形感知双足行走

What Happens Next? Next Scene Prediction with a Unified Video Model

接下来会发生什么?使用统一视频模型预测下一场景

GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

GTR-Turbo:合并检查点实际上是智能体式VLM训练的免费教师

Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments

基于深度Q学习的智能调度,用于异构数据环境中的ETL优化

M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization

M-GRPO:基于动量锚定策略优化的大型语言模型稳定自监督强化学习

PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations

PvP:基于本体感知-特权对比表征的数据高效人形机器人学习

ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning

ADHint:面向强化学习的带难度先验的自适应提示

Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures

迈向自愈的片上网络:二维环面架构中的强化学习驱动路由

TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning

TraPO:一个半监督式强化学习框架,用于提升LLM推理能力

SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning

SpeakRL:通过强化学习协同语言模型中的推理、表达与行动

SACn: Soft Actor-Critic with n-step Returns

SACn:带n步回报的软演员-评论家算法

Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection

反思偏好优化(RPO):通过提示引导的反思增强同策略(on-policy)对齐

Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving

面向交互式自动驾驶的生成式智能体行为模型的后训练与测试时扩展

SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling

SPARS:一个支持强化学习的高性能计算作业调度电源管理模拟器

AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

AutoTool:面向智能体推理的动态工具选择与集成

Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration

基于内在动机与协同探索的多机器人社交编队导航

Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)

使用双延迟深度确定性策略梯度(TD3)控制双旋翼系统

Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles

6自由度水下航行器位置控制的快速策略学习

Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning

通过演示编辑强化学习实现通用灵巧功能性抓取

QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution

面向5G NR-U/Wi-Fi共存的QoS感知状态增强可学习框架:参数选择与增强型冲突解决的影响

Differentiable Evolutionary Reinforcement Learning

可微分进化强化学习

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Seedance 1.5 pro:原生视听联合生成基础模型

MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph

MedCEG:用关键证据图强化可验证的医学推理

Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM

基于强化学习的6-DoF微重力舱内对接机动:ISS-JEM中Int-Ball2模拟研究

How Low Can You Go? The Data-Light SE Challenge

你能降到多低?Data-Light SE 挑战

Memory in the Age of AI Agents

人工智能代理时代的记忆

MMhops-R1: Multimodal Multi-hop Reasoning

MMhops-R1:多模态多跳推理

Image Diffusion Preview with Consistency Solver

图像扩散预览与一致性求解器

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Nemotron-Cascade:面向通用推理模型的级联强化学习扩展

SCR2-ST: Combine Single Cell with Spatial Transcriptomics for Efficient Active Sampling via Reinforcement Learning

SCR2-ST:结合单细胞与空间转录组学,通过强化学习实现高效主动采样

MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning

MindDrive:一种通过在线强化学习实现自动驾驶的视觉-语言-行动模型

A Scientific Reasoning Model for Organic Synthesis Procedure Generation

用于有机合成步骤生成的科学推理模型

AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection

AgentIAD:面向工业异常检测的工具增强单智能体

Keyword: diffusion policy

World Models Can Leverage Human Videos for Dexterous Manipulation

世界模型可以利用人类视频进行灵巧操作