Generated: 2025-12-01 16:35:37 (UTC+8); arXiv release time: 2025-12-01 20:00 EST (2025-12-02 09:00 UTC+8)

45 related papers today

Keyword: reinforcement learning

GPS: General Per-Sample Prompter

Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks

Factors That Support Grounded Responses in LLM Conversations: A Rapid Review

Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs

Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation

Selecting User Histories to Generate LLM Users for Cold-Start Item Recommendation

MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis

Hybrid Stackelberg Game and Diffusion-based Auction for Two-tier Agentic AI Task Offloading in Internet of Agents

Adaptive Dueling Double Deep Q-networks in Uniswap V3 Replication and Extension with Mamba

Representative Action Selection for Large Action Space: From Bandits to MDPs

Energy Efficient Sleep Mode Optimization in 5G mmWave Networks via Multi Agent Deep Reinforcement Learning

An energy-efficient spiking neural network with continuous learning for self-adaptive brain-machine interface

PROMPTMINER: Black-Box Prompt Stealing against Text-to-Image Generative Models via Reinforcement Learning and Fuzz Optimization

TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices

Guiding the Inner Eye: A Framework for Hierarchical and Flexible Visual Grounded Reasoning

Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information

BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning

Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning

Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning

Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation

Beyond Query-Level Comparison: Fine-Grained Reinforcement Learning for Text-to-SQL with Automated Interpretable Critiques

Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions

Exposing Vulnerabilities in RL: A Novel Stealthy Backdoor Attack through Reward Poisoning

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes

Deadlock-Free Hybrid RL-MAPF Framework for Zero-Shot Multi-Robot Navigation

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering

ORION: Teaching Language Models to Reason Efficiently in the Language of Thought

Switching-time bioprocess control with pulse-width-modulated optogenetics

Language-conditioned world model improves policy generalization by reading environmental descriptions

Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning

Evolutionary Discovery of Heuristic Policies for Traffic Signal Control

Peer-to-Peer Energy Trading in Dairy Farms using Multi-Agent Reinforcement Learning

REVEAL: Reasoning-enhanced Forensic Evidence Analysis for Explainable AI-generated Image Detection

Fault-Tolerant MARL for CAVs under Observation Perturbations for Highway On-Ramp Merging

Adapting Like Humans: A Metacognitive Agent with Test-time Reasoning

Emergent Coordination and Phase Structure in Independent Multi-Agent Reinforcement Learning

Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization

ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts

ThetaEvolve: Test-time Learning on Open Problems

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models

Keyword: diffusion policy

Visual-Geometry Diffusion Policy: Robust Generalization via Complementarity-Aware Multimodal Fusion

CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance
