Generated: 2025-11-17 16:31:32 (UTC+8); arXiv release time: 2025-11-17 20:00 EST (2025-11-18 09:00 UTC+8)

There are 20 relevant papers today

Keyword: reinforcement learning

A methodological analysis of prompt perturbations and their effect on attack success rates

From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Incorporating Spatial Information into Goal-Conditioned Hierarchical Reinforcement Learning via Graph Representations

When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets

Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis

ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving

Scalable Population Training for Zero-Shot Coordination

VIDEOP2R: Video Understanding from Perception to Reasoning

LoRaCompass: Robust Reinforcement Learning to Efficiently Search for a LoRa Tag

Sashimi-Bot: Autonomous Tri-manual Advanced Manipulation and Cutting of Deformable Objects

STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models

RLSLM: A Hybrid Reinforcement Learning Framework Aligning Rule-Based Social Locomotion Model with Human Social Norms

MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

Robust and Efficient Communication in Multi-Agent Reinforcement Learning

Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning

Context-aware Adaptive Visualizations for Critical Decision Making

Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation

W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search

Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping

Keyword: diffusion policy

No results found