Generated: 2025-11-26 16:31:32 (UTC+8); arXiv release time: 2025-11-26 20:00 EST (2025-11-27 09:00 UTC+8)

45 related papers today

Keyword: reinforcement learning

AI-driven Predictive Shard Allocation for Scalable Next Generation Blockchains

SparOA: Sparse and Operator-aware Hybrid Scheduling for Edge DNN Inference

Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma

VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning

Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories

HunyuanOCR Technical Report

Learning Massively Multitask World Models for Continuous Control

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs

Learning to Clean: Reinforcement Learning for Noisy Label Correction

CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception

Reinforcement Learning with $ω$-Regular Objectives and Constraints

Complex Instruction Following with Diverse Style Policies in Football Games

Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving

Designing Reputation Systems for Manufacturing Data Trading Markets: A Multi-Agent Evaluation with Q-Learning and IRL-Estimated Utilities

Collaborate sim and real: Robot Bin Packing Learning in Real-world and Physical Engine

Optimize Flip Angle Schedules In MR Fingerprinting Using Reinforcement Learning

Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning

Toward Trustworthy Digital Twins in Agentic AI-based Wireless Network Optimization: Challenges, Solutions, and Opportunities

HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning

Boosting Reasoning in Large Multimodal Models via Activation Replay

OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

Energy Costs and Neural Complexity Evolution in Changing Environments

SOMBRL: Scalable and Optimistic Model-Based RL

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

From data to concepts via wiring diagrams

Map-World: Masked Action planning and Path-Integral World Model for Autonomous Driving

Interactive AI NPCs Powered by LLMs: Technical Report for the CPDC Challenge 2025

Leveraging weights signals - Predicting and improving generalizability in reinforcement learning

Quantum-Enhanced Reinforcement Learning for Accelerating Newton-Raphson Convergence with Ising Machines: A Case Study for Power Flow Analysis

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation

DRL-Guided Neural Batch Sampling for Semi-Supervised Pixel-Level Anomaly Detection

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

HAFO: Humanoid Force-Adaptive Control for Intense External Force Interaction Environments

AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models

NNGPT: Rethinking AutoML with Large Language Models

Soft Adaptive Policy Optimization

Complexity Reduction Study Based on RD Costs Approximation for VVC Intra Partitioning

BRIC: Bridging Kinematic Plans and Physical Control at Test Time

DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs

Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning

Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning

MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

Reinforcing Action Policies by Prophesying

RubricRL: Simple Generalizable Rewards for Text-to-Image Generation

Keyword: diffusion policy

No results found.