Generated: 2025-10-13 16:31:14 (UTC+8); arXiv release time: 2025-10-13 20:00 EDT (2025-10-14 08:00 UTC+8)

44 relevant papers today

Keyword: reinforcement learning

LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection

GRPO-GCC: Enhancing Cooperation in Spatial Public Goods Games via Group Relative Policy Optimization with Global Cooperation Constraint

PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction

Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting

SAFER-AiD: Saccade-Assisted Foveal-peripheral vision Enhanced Reconstruction for Adversarial Defense

Reinforcement Learning-Based Optimization of CT Acquisition and Reconstruction Parameters Through Virtual Imaging Trials

Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem

Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices

Guiding Exploration in Reinforcement Learning Through LLM-Augmented Observations

Reinforcement Learning-Driven Edge Management for Reliable Multi-view 3D Reconstruction

CDE: Concept-Driven Exploration for Reinforcement Learning

Model-Based Lookahead Reinforcement Learning for in-hand manipulation

Exploring Multi-Temperature Strategies for Token- and Rollout-Level Control in RLVR

HES-SQL: Hybrid Reasoning for Efficient Text-to-SQL with Structural Skeleton Guidance

Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning

Unleashing Perception-Time Scaling to Multimodal Reasoning Models

Diagnosing and Mitigating System Bias in Self-Rewarding RL

Rethinking Reasoning in Document Ranking: Why Chain-of-Thought Falls Short

Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging

DARO: Difficulty-Aware Reweighting Policy Optimization

HERO: Hardware-Efficient RL-based Optimization Framework for NeRF Quantization

TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation

Slim Scheduler: A Runtime-Aware RL and Scheduler System for Efficient CNN Inference

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation

Robust Driving Control for Autonomous Vehicles: An Intelligent General-sum Constrained Adversarial Reinforcement Learning Approach

Sensing, Detection and Localization for Low Altitude UAV: A RF-Based Framework via Multiple BSs Collaboration

Leading the Follower: Learning Persuasive Agents in Social Deduction Games

Agentic-KGR: Co-evolutionary Knowledge Graph Construction through Multi-Agent Reinforcement Learning

Hierarchical Semantic RL: Tackling the Problem of Dynamic Action Space for RL-based Recommendations

Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning

Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models

CLARity: Reasoning Consistency Alone Can Teach Reinforced Experts

Spotlight on Token Perception for Multimodal Reinforcement Learning

Rate optimal learning of equilibria from data

Safety Game: Balancing Safe and Informative Conversations with Blackbox Agentic AI using LP Solvers

Logit Arithmetic Elicits Long Reasoning Capabilities Without Training

HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness

Scalable Multi-Agent Path Finding using Collision-Aware Dynamic Alert Mask and a Hybrid Execution Strategy

Multimodal Policy Internalization for Conversational Agents

Mitigating Overthinking through Reasoning Shaping

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards

Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Keyword: diffusion policy

No results.