Generated: 2025-12-31 16:33:16 (UTC+8); arXiv release time: 2025-12-30 20:00 EST (2025-12-31 09:00 UTC+8)

51 related papers today

Keyword: reinforcement learning

Unbiased Visual Reasoning with Controlled Visual Inputs

带有受控视觉输入的无偏视觉推理

Learning Tennis Strategy Through Curriculum-Based Dueling Double Deep Q-Networks

通过基于课程学习的对决双深度Q网络学习网球策略

Physics-Informed Machine Learning for Transformer Condition Monitoring -- Part I: Basic Concepts, Neural Networks, and Variants

基于物理的机器学习用于变压器状态监测——第一部分:基本概念、神经网络及其变体

Emotion-Inspired Learning Signals (EILS): A Homeostatic Framework for Adaptive Autonomous Agents

情感启发学习信号(EILS):一种适应性自主智能体的稳态框架

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

DiRL:扩散语言模型的高效后训练框架

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

掩蔽教师、强化学生:视觉语言模型的蒸馏

Agentic Software Issue Resolution with Large Language Models: A Survey

基于大型语言模型的智能体式软件问题解决:一项综述

VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning

VideoZoomer:强化学习的时间聚焦用于长视频推理

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

SmartSnap:面向自我验证智能体的主动证据搜寻

PHANTOM: Physics-Aware Adversarial Attacks against Federated Learning-Coordinated EV Charging Management System

PHANTOM:针对联邦学习协调的电动汽车充电管理系统的物理感知对抗攻击

AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing

AFA-LoRA:通过激活函数退火实现LoRA中的非线性适应

RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure

RollArt:通过解耦式基础设施扩展智能体强化学习训练

FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution

FinPercep-RM:用于基于强化学习的真实世界超分辨率的细粒度奖励模型与协同进化课程

Optimal Regulation of Nonlinear Input-Affine Systems via an Integral Reinforcement Learning-Based State-Dependent Riccati Equation Approach

通过基于积分强化学习的状态依赖 Riccati 方程方法对非线性输入仿射系统的最优调控

Memento-II: Learning by Stateful Reflective Memory

Memento-II:通过有状态反思记忆进行学习

Cyber Resilience in Next-Generation Networks: Threat Landscape, Theoretical Foundations, and Design Paradigms

下一代网络中的网络韧性:威胁格局、理论基础与设计范式

FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents

FoldAct:长视野搜索代理的高效稳定上下文折叠

Parallel Diffusion Solver via Residual Dirichlet Policy Optimization

通过残差狄利克雷策略优化实现并行扩散求解器

ReDiF: Reinforced Distillation for Few Step Diffusion

ReDiF:少数步骤扩散的强化蒸馏

TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning

TEACH:基于时间方差的强化学习课程

MARPO: A Reflective Policy Optimization for Multi Agent Reinforcement Learning

MARPO:多智能体强化学习的反思策略优化

AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning

AutoForge:用于智能体强化学习的自动化环境合成

Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks

区块链物联网自适应信任共识:比较RL、DRL和MARL对抗朴素、串通、自适应、拜占庭和潜伏攻击的表现

Reinforcement Networks: novel framework for collaborative Multi-Agent Reinforcement Learning tasks

强化网络:协作多智能体强化学习任务的新框架

SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning

SAMP-HDRL:通过层级深度强化学习实现多代理投资组合管理的分段分配与动量调整效用

Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning

Sat-EnQ:弱Q学习器的满意化集成,实现可靠且计算高效的强化学习

Heterogeneity in Multi-Agent Reinforcement Learning

多智能体强化学习中的异质性

APO: Alpha-Divergence Preference Optimization

APO:阿尔法-散度偏好优化

Diversity or Precision? A Deep Dive into Next Token Prediction

多样性还是精准?深入探讨下一个词元预测

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning

驯服长尾:通过动态词表剪枝实现稳定的大型语言模型强化学习

Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients

基准成功与临床失败:当强化学习优化的是基准,而非患者

A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms

关于大型语言模型混合在线强化与模仿学习的说明:表述与算法

Evaluating Parameter Efficient Methods for RLVR

评估用于RLVR的参数高效方法

A Human-Oriented Cooperative Driving Approach: Integrating Driving Intention, State, and Conflict

以人为本的合作驾驶方法:整合驾驶意图、状态与冲突

ViLaCD-R1: A Vision-Language Framework for Semantic Change Detection in Remote Sensing

ViLaCD-R1:一种用于遥感语义变更检测的视觉语言框架

Agentic AI-Enhanced Semantic Communications: Foundations, Architecture, and Applications

代理人工智能增强语义通信:基础、架构与应用

Splitwise: Collaborative Edge-Cloud Inference for LLMs via Lyapunov-Assisted DRL

Splitwise:通过Lyapunov辅助的DRL实现LLM的边云协同推理

CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation

CME-CAD:异构协作多专家强化学习用于CAD代码生成

AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis

AGRO-SQL:结合高保真数据合成的智能体组相对优化

The World Is Bigger! A Computationally-Embedded Perspective on the Big World Hypothesis

世界更大了!对大世界假说的计算嵌入视角

Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following

将失败重放为成功:面向指令遵循的样本高效强化学习

Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance

利用信息论指导消除奖励模型中的归纳偏置

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

HY-Motion 1.0:扩展流匹配模型用于文本到动作生成

Agentic AI for Autonomous Defense in Software Supply Chain Security: Beyond Provenance to Vulnerability Mitigation

软件供应链安全中用于自主防御的智能体AI:从溯源走向漏洞缓解

Hierarchical Decision Mamba Meets Agentic AI: A Novel Approach for RAN Slicing in 6G

分层决策Mamba遇见代理人工智能:6G中RAN切片的创新方法

PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis

PathFound:激活寻求证据的病理诊断的代理多模态模型

ThinkGen: Generalized Thinking for Visual Generation

ThinkGen:视觉生成的通用思维

ProGuard: Towards Proactive Multimodal Safeguard

ProGuard:迈向主动多模态保障

Le Cam Distortion: A Decision-Theoretic Framework for Robust Transfer Learning

Le Cam 失真:稳健迁移学习的决策理论框架

Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation

Robo-Dopamine:面向高精度机器人操作的通用过程奖励建模

Training AI Co-Scientists Using Rubric Rewards

使用评分细则奖励训练AI协同科学家

Keyword: diffusion policy

No results found.