Generated: 2025-11-24 16:31:35 (UTC+8); arXiv release time: 2025-11-24 20:00 EST (2025-11-25 09:00 UTC+8)

18 relevant papers today

Keyword: reinforcement learning

Improving Latent Reasoning in LLMs via Soft Concept Mixing

When Motion Learns to Listen: Diffusion-Prior Lyapunov Actor-Critic Framework with LLM Guidance for Stable and Robust AUV Control in Underwater Tasks

R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios

Predicting Talent Breakout Rate using Twitter and TV data

Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving

CroTad: A Contrastive Reinforcement Learning Framework for Online Trajectory Anomaly Detection

RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion

MIR: Efficient Exploration in Episodic Multi-Agent Reinforcement Learning via Mutual Intrinsic Reward

FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle

Cross-cultural value alignment frameworks for responsible AI governance: Evidence from China-West comparative analysis

MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning

Convergence and stability of Q-learning in Hierarchical Reinforcement Learning

R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability

Human Imitated Bipedal Locomotion with Frequency Based Gait Generator Network

MorphSeek: Fine-grained Latent Representation-Level Policy Optimization for Deformable Image Registration

Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems

Harnessing Data from Clustered LQR Systems: Personalized and Collaborative Policy Optimization

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination

Keyword: diffusion policy

No results.