Generated: 2025-12-09 16:35:29 (UTC+8); arXiv release time: 2025-12-09 20:00 EST (2025-12-10 09:00 UTC+8)

65 related papers today

Keyword: reinforcement learning

Video Models Start to Solve Chess, Maze, Sudoku, Mental Rotation, and Raven' Matrices

FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting

Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring

JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning

Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration

AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems

Auto-exploration for online reinforcement learning

Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning

Learning Without Time-Based Embodiment Resets in Soft-Actor Critic

Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models

A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction

ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models

LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing

VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning

RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs

Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control

Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation

Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input

MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding

A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance

MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning

LightSearcher: Efficient DeepSearch via Experiential Memory

RunawayEvil: Jailbreaking the Image-to-Video Generative Models

The Role of Entropy in Visual Grounding: Analysis and Optimization

PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance

Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning

JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models

An Analysis of Large Language Models for Simulating User Responses in Surveys

Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields

Know your Trajectory -- Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis

Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models

Neuro-Vesicles: Neuromodulation Should Be a Dynamical System, Not a Tensor Decoration

LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding

Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots

TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning

Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models

Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction

MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning

Towards Robust Protective Perturbation against DeepFake Face Swapping

SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks

RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation

PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning

Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin

Training Language Models to Use Prolog as a Tool

Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning

Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models

KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models

From Show Programmes to Data: Designing a Workflow to Make Performing Arts Ephemera Accessible Through Language Models

Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction

Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization

How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations

Model-Based Reinforcement Learning Under Confounding

ReLaX: Reasoning with Latent Exploration for Large Reasoning Models

Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach

Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement

The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds

SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery

DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving

RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning

Keyword: diffusion policy

Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks
