Junli Wang | UC San Diego

Research

My research focuses on building multimodal digital agents that can perceive, reason, and execute long-horizon tasks across diverse interfaces. I work on:

Foundation models for digital agents, e.g. [Aguvis], [OpenCUA] and [Mocha-Coder].
Scaling digital agent data through large-scale synthetic data mined from the internet, e.g. [AgentTrek], [VideoAgentTrek], and [Jedi].
Evaluating digital agents in the wild, e.g. [OSWorld-Verified], [Computer Agent Arena], and [CocoaBench].
Open-source infrastructure that makes agent rollout and learning practical at scale, such as [NanoRollout].

Beyond my own projects, I have contributed to post-training open-source foundation models for agentic capabilities, including [Qwen3.5] and [Qwen3-Coder]. I mostly work on two areas: agentic RL infrastructure that enables agent rollouts in large-scale digital environments, and SFT data curation for digital agents.

News

[May 2026] Check out our new project [NanoRollout] and model [Mocha-Coder] for efficient digital agent rollouts at scale!
[March 2025] I will join UC San Diego as a PhD student in ~~Fall 2025~~ Winter 2026.

Selected Projects

Qwen3.5: Towards Native Multimodal Agents

Qwen Team

Model Blog

Qwen3-Coder-Next: Pushing Small Hybrid Models on Agentic Coding

Qwen-Coder Team

Model Blog PDF

Qwen3-Coder: Agentic Coding in the World

Qwen-Coder Team

Model Blog

Publications

VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

Dunjie Lu*, Yiheng Xu*, Junli Wang*, Haoyuan Wu, Xinyuan Wang, Zekun Wang, Junlin Yang, Hongjin Su, Jixuan Chen, Junda Chen, Yuchen Mao, Jingren Zhou, Junyang Lin, Binyuan Hui, Tao Yu

TL;DR: VideoAgentTrek mines screen-recorded internet videos into labeled computer-use actions through inverse dynamics, creating web-scale supervision that improves agent pretraining without manual trajectory annotation.

International Conference on Learning Representations (ICLR), 2026.

Project PDF

CocoaBench: Evaluating Unified Digital Agents in the Wild

CocoaBench Team

TL;DR: CocoaBench evaluates unified digital agents on long-horizon tasks that require composing vision, search, and coding, showing that current systems remain far from reliable in the wild.

Blog PDF

Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents

Bowen Wang, Xinyuan Wang, Jiaqi Deng, Tianbao Xie, Ryan Li, Yanzhe Zhang, Junli Wang, Dunjie Lu, Zicheng Gong, Gavin Li, Toh Jing Hua, Wei-Lin Chiang, Ion Stoica, Diyi Yang, Yu Su, Yi Zhang, Zhiguo Wang, Victor Zhong, Tao Yu

TL;DR: Computer Agent Arena is an open-source platform for head-to-head CUA evaluation and a dynamic methodology that converts human preferences into structured feedback in realistic environments.

International Conference on Learning Representations (ICLR), 2026.

PDF

Introducing OSWorld-Verified

OSWorld Team

TL;DR: OSWorld-Verified upgrades OSWorld with repaired tasks, more robust infrastructure, and public verified evaluation to provide more reliable signals for computer-use agent benchmarking.

Blog

OpenCUA: Open Foundations for Computer-Use Agents

Xinyuan Wang*, Bowen Wang*, Dunjie Lu*, Junlin Yang*, Tianbao Xie*, Junli Wang* et al.

TL;DR: OpenCUA releases an open framework for computer-use agents, including demonstration-capture infrastructure, AgentNet data, reasoning-augmented training pipelines, and competitive open-source CUA models.

Neural Information Processing Systems (NeurIPS), 2025.

Project PDF

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu*, Zekun Wang*, Junli Wang*, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong

TL;DR: Aguvis builds a unified vision-based GUI agent that operates directly on screenshots with a standardized action space and structured reasoning, achieving strong cross-platform autonomous computer-use performance.

International Conference on Machine Learning (ICML), 2025.

Project PDF

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Yiheng Xu*, Dunjie Lu*, Zhennan Shen*, Junli Wang, Zekun Wang, Yuchen Mao, Caiming Xiong, Tao Yu

TL;DR: AgentTrek turns web tutorials into verified multimodal GUI agent trajectories, providing a scalable and low-cost alternative to human annotation for training stronger web agents.

International Conference on Learning Representations (ICLR), 2025.

Project PDF

Education

Ph.D. in Computer Science and Engineering, UC San Diego. 2026.1 - Present

B.Eng. in Computer Science and Technology (Affiliated with ), Tsinghua University. 2021.9 - 2025.6