Bingyi Kang's Homepage

Bingyi Kang

Email: bingykang [at] gmail [dot] com

I am a research scientist at TikTok, Seattle. My primary research interests are computer vision, multi-modal models and decision making. My goal is to develop agents that can acquire knowledge from various observations and interact with the physical world. I approach the goal from the following perspectives:

Dealing with arbitrary data in real life (e.g., long-tailed, unlabeled, synthetic, etc).
Recovering (physical and semantic) knowledge of the world from observations.
Effectively and efficiently utilizing the knowledge for interaction.

Previously, I was a research scientist at the Sea AI Lab. I received my PhD from National University of Singapore, advised by Prof. Jiashi Feng. I was also fortunate to have worked as a visiting researcher at UC Berkeley, under the supervision of Prof. Trevor Darrell. During my PhD study, I also interned at Facebook AI Research, working with Saining Xie, Yannis Kalantidis, and Marcus Rohrbach.

I am leading the development of Depth Anything series. We have multiple intern and FTE positions (on 3D&4D foundation models) open for application. Feel free to drop me an email if you are interested.

News

2024-11: Our work, Phyworld, investigating whether video generation models can learn physical laws, has been released. It sparked heated discussions on X and was highlighted by many researchers. See details at this URL.
2024-06: Depth Anything V2 was optimized for Apple Neural Engine and selected as official Core ML Models by Apple. Now, you can use is for fun on-device applications.
2024-06: Depth Anything V1 and V2 were intergrated into transformers, the popular codebase for state-of-the-art machine learning by Hugging Face. See documents at dav1 and dav2.

Recent and Selected Publications

(* denotes equal contribution; † denotes project lead.) For the full publication list, please go to Google Scholar.

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Sili Chen, Hengkai Guo, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, Bingyi Kang

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Project Page / Paper / Code / Demo

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Project Page / Paper / Code / Demo / Interactive Results

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Project Page / Paper / Code

Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models

Xinghang Li, Peiyan Li, Minghuan Liu, Dong Wang, Jirong Liu, Bingyi Kang, Xiao Ma, Tao Kong, Hanbo Zhang, Huaping Liu

Tech Report, 2024

Project Page / Paper / Code / Model / Dataset / Videos

How Far is Video Generation from World Model? -- A Physical Law Perspective

Bingyi Kang*, Yang Yue*, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, Jiashi Feng

*Equal Contribution in alphabetical order

Tech Report, 2024

Project Page / Paper / Code / Data / Video / Media

Classification Done Right for Vision-Language Pre-Training

Zilong Huang, Qinghao Ye, Bingyi Kang, Jiashi Feng, Haoqi Fan

Advances in Neural Information Processing Systems (NeurIPS), 2024

Paper / Code

Image Understanding Makes for A Good Tokenizer for Image Generation

Luting Wang, Yang Zhao, Zijian Zhang, Jiashi Feng, Si Liu, Bingyi Kang†

Advances in Neural Information Processing Systems (NeurIPS), 2024

Paper / Code

Depth Anything V2

Lihe Yang, Bingyi Kang†, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

Advances in Neural Information Processing Systems (NeurIPS), 2024

Project Page / Paper / Code / Demo / Media

MADiff: Offline Multi-agent Learning with Diffusion Models

Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang

Advances in Neural Information Processing Systems (NeurIPS), 2024

Paper / Code

Improving Token-Based World Models with Parallel Observation Prediction

Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor

International Conference on Machine Learning (ICML), Vienna, Austria, 2024

Paper / Code

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang, Bingyi Kang†, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024

Project Page / Paper / Code / Demo / Media

Understanding, predicting and better resolving q-value divergence in offline-rl

Yang Yue*, Rui Lu*, Bingyi Kang*, Shiji Song, Gao Huang

Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, 2023

Project Page / Paper / Code / Slides / Poster

It presents SEEM, a beautiful theoretical framework that perfectly explains the Q-value divergence problem in offline RL. It can reliably predict upcoming divergence through the largest eigenvalue of a kernel matrix and accurately characterize the growth order of diverging Q-values.