Generated: 2025-10-29 16:30:55 (UTC+8); arXiv release: 2025-10-29 20:00 EDT (2025-10-30 08:00 UTC+8)

Today there are 33 relevant papers

Keyword: reinforcement learning

Logic-based Task Representation and Reward Shaping in Multiagent Reinforcement Learning

Debiasing Reward Models by Representation Learning with Guarantees

GIFT: Group-relative Implicit Fine Tuning Integrates GRPO with DPO and UNA

Hybrid Modeling, Sim-to-Real Reinforcement Learning, and Large Language Model Driven Control for Digital Twins

Stand, Walk, Navigate: Recovery-Aware Visual Navigation on a Low-Cost Wheeled Quadruped

Secure Control of Connected and Autonomous Electrified Vehicles Under Adversarial Cyber-Attacks

Latent Chain-of-Thought for Visual Reasoning

Reasoning Visual Language Model for Chest X-Ray Analysis

VOCALoco: Viability-Optimized Cost-aware Adaptive Locomotion

Teaching LLMs to Abstain via Fine-Grained Semantic Confidence Reward

Causal-Aware Generative Adversarial Networks with Reinforcement Learning

ZTRS: Zero-Imitation End-to-end Autonomous Driving with Trajectory Scoring

Reinforcement Learning for Long-Horizon Multi-Turn Search Agents

BMGQ: A Bottom-up Method for Generating Complex Multi-hop Reasoning Questions from Semi-structured Data

PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling

GRAPHIA: Harnessing Social Graph Data to Enhance LLM-Based Social Simulation

Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?

Survey and Tutorial of Reinforcement Learning Methods in Process Systems Engineering

ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model

Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards

MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation

Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings

SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space

Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks

Sample-efficient and Scalable Exploration in Continuous-Time RL

Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks

Towards Quadrupedal Jumping and Walking for Dynamic Locomotion using Reinforcement Learning

Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning

Evolving Diagnostic Agents in a Virtual Clinical Environment

Learning to Drive Safely with Hybrid Options

SPICE: Self-Play In Corpus Environments Improves Reasoning

Greedy Sampling Is Provably Efficient for RLHF

Keyword: diffusion policy

Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation
