Generated: 2026-04-24 17:46:13 (UTC+8); arXiv release time: 2026-04-24 20:00 EDT (2026-04-25 08:00 UTC+8)

21 relevant papers today

Keyword: reinforcement learning

Deep Interest Mining with Cross-Modal Alignment for SemanticID Generation in Generative Recommendation

Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

A Systematic Review and Taxonomy of Reinforcement Learning-Model Predictive Control Integration for Linear Systems

Foveated Reasoning: Stateful, Action-based Visual Focusing for Vision-Language Models

Self-Predictive Representation for Autonomous UAV Object-Goal Navigation

Adaptive Instruction Composition for Automated LLM Red-Teaming

Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment

CAP: Controllable Alignment Prompting for Unlearning in LLMs

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot

ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs

KD-CVG: A Knowledge-Driven Approach for Creative Video Generation

S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images

Dynamical Priors as a Training Objective in Reinforcement Learning

X2-N: A Transformable Wheel-legged Humanoid Robot with Dual-mode Locomotion and Manipulation

Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications

AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

Fairness under uncertainty in sequential decisions

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

Keyword: diffusion policy

No results.