Generated: 2025-12-02 16:35:20 (UTC+8); arXiv release: 2025-12-02 20:00 EST (2025-12-03 09:00 UTC+8)

76 relevant papers today

Keyword: reinforcement learning

DREAMer-VXS: A Latent World Model for Sample-Efficient AGV Exploration in Stochastic, Unobserved Environments

Perturbation-mitigated USV Navigation with Distributionally Robust Reinforcement Learning

Closing the Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam Questions

Causal Reinforcement Learning based Agent-Patient Interaction with Clinical Domain Knowledge

Socially aware navigation for mobile robots: a survey on deep reinforcement learning approaches

Reinforcement Learning from Implicit Neural Feedback for Human-Aligned Robot Control

SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning

InF-ATPG: Intelligent FFR-Driven ATPG with Advanced Circuit Representation Guided Reinforcement Learning

NetDeTox: Adversarial and Efficient Evasion of Hardware-Security GNNs via RL-LLM Orchestration

A Hierarchical Hybrid AI Approach: Integrating Deep Reinforcement Learning and Scripted Agents in Combat Simulations

Gradient Inversion in Federated Reinforcement Learning

RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs

Provable Memory Efficient Self-Play Algorithm for Model-free Reinforcement Learning

Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning

Learning Causal States Under Partial Observability and Perturbation

Hardware-Software Collaborative Computing of Photonic Spiking Reinforcement Learning for Robotic Continuous Control

Learning What Helps: Task-Aligned Context Selection for Vision Tasks

ESPO: Entropy Importance Sampling Policy Optimization

G-KV: Decoding-Time KV Cache Eviction with Global Attention

Truthful Double Auctions under Approximate VCG: Immediate-Penalty Enforcement in P2P Energy Trading

DQ4FairIM: Fairness-aware Influence Maximization using Deep Reinforcement Learning

List Replicable Reinforcement Learning

SAGE: Semantic-Aware Gray-Box Game Regression Testing with Large Language Models

HAVEN: Hierarchical Adversary-aware Visibility-Enabled Navigation with Cover Utilization using Deep Transformer Q-Networks

Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization

When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF

Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking

MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion

AI Agent for Source Finding by SoFiA-2 for SKA-SDC2

What Is Preference Optimization Doing, How and Why?

Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding

ReJump: A Tree-Jump Representation for Analyzing and Improving LLM Reasoning

Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs

Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning

Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search

AltNet: Addressing the Plasticity-Stability Dilemma in Reinforcement Learning

Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids

Automating the Refinement of Reinforcement Learning Specifications

Adaptive-lambda Subtracted Importance Sampled Scores in Machine Unlearning for DDPMs and VAEs

Reinforcement Learning for Gliding Projectile Guidance and Control

Accelerating Inference of Masked Image Generators via Reinforcement Learning

World Model Robustness via Surprise Recognition

Mode-Conditioning Unlocks Superior Test-Time Scaling

A TinyML Reinforcement Learning Approach for Energy-Efficient Light Control in Low-Cost Greenhouse Systems

Sum Rate Maximization in STAR-RIS-UAV-Assisted Networks: A CA-DDPG Approach for Joint Optimization

CoSineVerifier: Tool-Augmented Answer Verification for Computation-Oriented Scientific Questions

On the Tension Between Optimality and Adversarial Robustness in Policy Optimization

PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards

Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning

CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL

Extending NGU to Multi-Agent RL: A Preliminary Study

Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning

Directed evolution algorithm drives neural prediction

BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Multi-Path Collaborative Reasoning via Reinforcement Learning

Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems

How Does RL Post-training Induce Skill Composition? A Case Study on Countdown

GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

CauSight: Learning to Supersense for Visual Causal Discovery

Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability

Graph Distance as Surprise: Free Energy Minimization in Knowledge Graph Reasoning

New Spiking Architecture for Multi-Modal Decision-Making in Autonomous Vehicles

Rectifying LLM Thought from Lens of Optimization

Agentic Policy Optimization via Instruction-Policy Co-Evolution

GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment

Learned-Rule-Augmented Large Language Model Evaluators

Forecasting in Offline Reinforcement Learning for Non-stationary Environments

RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies

Learning Sim-to-Real Humanoid Locomotion in 15 Minutes

Learning Dexterous Manipulation Skills from Imperfect Simulations

A Diffusion Model Framework for Maximum Entropy Reinforcement Learning

Keyword: diffusion policy

Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning

PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications

A Diffusion Model Framework for Maximum Entropy Reinforcement Learning