Generated: 2025-12-10 16:32:06 (UTC+8); arXiv release time: 2025-12-10 20:00 EST (2025-12-11 09:00 UTC+8)

26 related papers today

Keyword: reinforcement learning

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

Agentic Artificial Intelligence for Ethical Cybersecurity in Uganda: A Reinforcement Learning Framework for Threat Detection in Resource-Constrained Environments

VLD: Visual Language Goal Distance for Reinforcement Learning Navigation

Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care

An Introduction to Deep Reinforcement and Imitation Learning

Training LLMs for Honesty via Confessions

Scalable Offline Model-Based RL with Action Chunks

Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward

Robust Agents in Open-Ended Worlds

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions

rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection

Collaborative Intelligence for UAV-Satellite Network Slicing: Towards a Joint QoS-Energy-Fairness MADRL Optimization

Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks

Turning Threat into Opportunity: DRL-Powered Anti-Jamming via Energy Harvesting in UAV-Disrupted Channels

From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change

Using reinforcement learning to probe the role of feedback in skill acquisition

Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning

Thinking with Images via Self-Calling Agent

Mind to Hand: Purposeful Robotic Control via Embodied Reasoning

Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes

Direct transfer of optimized controllers to similar systems using dimensionless MPC

Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning

Reinforcement Learning From State and Temporal Differences

IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams

No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers

Keyword: diffusion policy

No results found.