Generated: 2025-10-17 16:29:29 (UTC+8); arXiv release time: 2025-10-17 20:00 EDT (2025-10-18 08:00 UTC+8)

51 relevant papers today

Keyword: reinforcement learning

Joint Active RIS Configuration and User Power Control for Localization: A Neuroevolution-Based Approach

Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL

K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding

A Diffusion-Refined Planner with Reinforcement Learning Priors for Confined-Space Parking

Optimistic Reinforcement Learning-Based Skill Insertions for Task and Motion Planning

STEMS: Spatial-Temporal Enhanced Safe Multi-Agent Coordination for Building Energy Management

ViTacGen: Robotic Pushing with Vision-to-Touch Generation

Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL

Combining Reinforcement Learning and Behavior Trees for NPCs in Video Games with AMD Schola

ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning

Incentive-Based Federated Learning

Spatial Computing Communications for Multi-User Virtual Reality in Distributed Mobile Edge Computing Network

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Towards Agentic Self-Learning LLMs in Search Environment

Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization

Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning

AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading

Learning Human-Humanoid Coordination for Collaborative Object Carrying

Active Measuring in Reinforcement Learning With Delayed Negative Effects

Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL

Large Reasoning Embedding Models: Towards Next-Generation Dense Retrieval Paradigm

Risk-Aware Reinforcement Learning with Bandit-Based Adaptation for Quadrupedal Locomotion

Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control

Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Natural Language Tools: A Natural Language Approach to Tool Calling In Large Language Agents

Learning to Undo: Rollback-Augmented Reinforcement Learning with Reversibility Signals

Agentic Entropy-Balanced Policy Optimization

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models

RLAIF-SPA: Optimizing LLM-based Emotional Speech Synthesis via RLAIF

MR.Rec: Synergizing Memory and Reasoning for Personalized Recommendation Assistant with LLMs

ATGen: Adversarial Reinforcement Learning for Test Case Generation

The Bidding Games: Reinforcement Learning for MEV Extraction on Polygon Blockchain

An Efficient Rubric-based Generative Verifier for Search-Augmented LLMs

Cognitive-Aligned Spatio-Temporal Large Language Models For Next Point-of-Interest Prediction

The Pursuit of Diversity: Multi-Objective Testing of Deep Reinforcement Learning Agents

AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning

Leveraging Neural Descriptor Fields for Learning Contact-Aware Dynamic Recovery

SkyDreamer: Interpretable End-to-End Vision-Based Drone Racing with Model-Based Reinforcement Learning

SimKO: Simple Pass@K Policy Optimization

RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning

Reinforcement Learning with Stochastic Reward Machines

Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improves Without Labels or Model Updates

Reasoning with Sampling: Your Base Model is Smarter Than You Think

A Hard-Label Black-Box Evasion Attack against ML-based Malicious Traffic Detection Systems

VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions

Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Agentic Design of Compositional Machines

Keyword: diffusion policy

VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning
