Bingyi Kang

Email: bingykang [at] gmail [dot] com

Scholar  |  Twitter  |  GitHub

I am a research scientist at TikTok. My primary research interests are computer vision, multi-modal models and decision making. My goal is to develop agents that can acquire knowledge from various observations and interact with the physical world. I approach the goal from the following perspectives:

  • Dealing with arbitrary real-world data (e.g., long-tailed, unlabeled, or synthetic data).
  • Recovering (physical and semantic) knowledge of the world from observations.
  • Effectively and efficiently utilizing the knowledge for interaction.

Previously, I was a research scientist at the Sea AI Lab. I received my PhD from the National University of Singapore, advised by Prof. Jiashi Feng. I was also fortunate to work as a visiting researcher at UC Berkeley under the supervision of Prof. Trevor Darrell. During my PhD studies, I also interned at Facebook AI Research, working with Saining Xie, Yannis Kalantidis, and Marcus Rohrbach.

I am leading the development of the Depth Anything series. Academic discussion and collaboration are always welcome! We also have open intern positions on 3D and 4D foundation models. Feel free to drop me an email if you are interested.

Recent and Selected Publications

(* denotes equal contribution; † denotes project lead.) For the full publication list, please go to Google Scholar.

How Far is Video Generation from World Model? -- A Physical Law Perspective
Bingyi Kang*, Yang Yue*, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, Jiashi Feng
*Equal contribution, listed in alphabetical order
Tech Report, 2024
Project Page /  Paper /  Code /  Data /  Video /  Media 
Classification Done Right for Vision-Language Pre-Training
Zilong Huang, Qinghao Ye, Bingyi Kang, Jiashi Feng, Haoqi Fan
Advances in Neural Information Processing Systems (NeurIPS), 2024
Paper /  Code
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang, Yang Zhao, Zijian Zhang, Jiashi Feng, Si Liu, Bingyi Kang
Advances in Neural Information Processing Systems (NeurIPS), 2024
Paper /  Code
Depth Anything V2
Lihe Yang, Bingyi Kang†, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
Advances in Neural Information Processing Systems (NeurIPS), 2024
Project Page /  Paper /  Code  /  Demo /  Media
 
MADiff: Offline Multi-agent Learning with Diffusion Models
Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang
Advances in Neural Information Processing Systems (NeurIPS), 2024
Paper /  Code
Improving Token-Based World Models with Parallel Observation Prediction
Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
International Conference on Machine Learning (ICML), Vienna, Austria, 2024
Paper /  Code
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang†, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024
Project Page /  Paper /  Code  /  Demo /  Media
 
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
Yang Yue*, Rui Lu*, Bingyi Kang*, Shiji Song, Gao Huang
Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, 2023
Project Page /  Paper /  Code  /  Slides /  Poster
This paper presents SEEM, a theoretical framework that explains the Q-value divergence problem in offline RL. It predicts upcoming divergence through the largest eigenvalue of a kernel matrix and characterizes the growth order of diverging Q-values.
Efficient Diffusion Policies for Offline Reinforcement Learning
Bingyi Kang*, Xiao Ma*, Chao Du, Tianyu Pang, Shuicheng Yan
Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, 2023
Paper /  Code
Mutual Information Regularized Offline Reinforcement Learning
Xiao Ma*, Bingyi Kang*, Zhongwen Xu, Min Lin, Shuicheng Yan
Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, 2023
Paper /  Code
FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao
Advances in Neural Information Processing Systems (NeurIPS), New Orleans, USA, 2023
Paper /  Code
Exploring Balanced Feature Spaces for Representation Learning
Bingyi Kang, Yu Li, Sa Xie, Zehuan Yuan, Jiashi Feng
International Conference on Learning Representations (ICLR), 2021
Paper /  Code
Decoupling Representation and Classifier for Long-Tailed Recognition
Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
International Conference on Learning Representations (ICLR), 2020
Paper /  Code  /  Slides /  Talk
Few-Shot Object Detection via Feature Reweighting
Bingyi Kang*, Zhuang Liu*, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell
International Conference on Computer Vision (ICCV), 2019
Paper /  Code
Policy Optimization with Demonstrations
Bingyi Kang*, Zequn Jie, Jiashi Feng
International Conference on Machine Learning (ICML), 2018
Paper

Open Projects

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao*, Zhijie Lin*, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang
Open Project, 2023
Project Page /  Paper /  Code

Services

  • Reviewer: ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, TPAMI, etc.