Generated: 2025-11-21 16:31:35 (UTC+8); arXiv release time: 2025-11-21 20:00 EST (2025-11-22 09:00 UTC+8)

32 relevant papers today

Keyword: reinforcement learning

Integrated 4D/5D Digital-Twin Framework for Cost Estimation and Probabilistic Schedule Control: A Texas Mid-Rise Case Study

MACIE: Multi-Agent Causal Intelligence Explainer for Collective Behavior Understanding

Extending Test-Time Scaling: A 3D Perspective with Context, Batch, and Turn

Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs

KRAL: Knowledge and Reasoning Augmented Learning for LLM-assisted Clinical Antimicrobial Therapy

HGCN2SP: Hierarchical Graph Convolutional Network for Two-Stage Stochastic Programming

Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Bellman Memory Units: A neuromorphic framework for synaptic reinforcement learning with an evolving network topology

A Mathematical Framework for Custom Reward Functions in Job Application Evaluation using Reinforcement Learning

A Hybrid Proactive And Predictive Framework For Edge Cloud Resource Management

VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning

SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent

An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs

Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective

Revisiting Fairness-aware Interactive Recommendation: Item Lifecycle as a Control Knob

Optimizing Operation Recipes with Reinforcement Learning for Safe and Interpretable Control of Chemical Processes

Safe and Optimal Variable Impedance Control via Certified Reinforcement Learning

Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Flow-Aided Flight Through Dynamic Clutters From Point To Motion

LAOF: Robust Latent Action Learning with Optical Flow Constraints

A Comparison Between Decision Transformers and Traditional Offline Reinforcement Learning Algorithms

Limitations of Scalarisation in MORL: A Comparative Study in Discrete Environments

Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense

Green Resilience of Cyber-Physical Systems: Doctoral Dissertation

Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization

Stabilizing Policy Gradient Methods via Reward Profiling

Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

Keyword: diffusion policy

No results.