Generated: 2026-02-27 16:44:28 (UTC+8); arXiv release time: 2026-02-27 20:00 EST (2026-02-28 09:00 UTC+8)

45 relevant papers today

Keyword: reinforcement learning

Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation

SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG

UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs

Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Space Syntax-guided Post-training for Residential Floor Plan Generation

A Mathematical Theory of Agency and Intelligence

Agentic AI for Intent-driven Optimization in Cell-free O-RAN

Multilingual Safety Alignment Via Sparse Weight Editing

Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation

Towards Faithful Industrial RAG: A Reinforced Co-adaptation Framework for Advertising QA

Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking

EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning

Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning

AHBid: An Adaptable Hierarchical Bidding Framework for Cross-Channel Advertising

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue

Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning

Same Words, Different Judgments: Modality Effects on Preference Alignment

RLHFless: Serverless Computing for Efficient RLHF

Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA

Generative Recommendation for Large-Scale Advertising

Pixel2Catch: Multi-Agent Sim-to-Real Transfer for Agile Manipulation with a Single RGB Camera

Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning

Towards Better RL Training Data Utilization via Second-Order Rollout

Transformer Actor-Critic for Efficient Freshness-Aware Resource Allocation

QSIM: Mitigating Overestimation in Multi-Agent Reinforcement Learning via Action Similarity Weighted Q-Learning

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding

FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning

A Perspective on Open Challenges in Deformable Object Manipulation

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Learning-based Multi-agent Race Strategies in Formula 1

GeoWorld: Geometric World Models

Towards Intelligible Human-Robot Interaction: An Active Inference Approach to Occluded Pedestrian Scenarios

Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

A Model-Free Universal AI

SPARR: Simulation-based Policies with Asymmetric Real-world Residuals for Assembly

Physics Informed Viscous Value Representations

Simple Models, Real Swimming: Digital Twins for Tendon-Driven Underwater Robots

MediX-R1: Open Ended Medical Reinforcement Learning

Keyword: diffusion policy

When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering

GraspLDP: Towards Generalizable Grasping Policy via Latent Diffusion
