Kevin Qinghong Lin

Ph.D. Student

Show Lab
National University of Singapore

Email: kevin.qh.lin [at] gmail.com

Photo taken on Rottnest Island.

Biography

Hi, I am a Ph.D. student in Show Lab @ NUS, working with Prof. Mike Shou.

My research focuses on Video Understanding and Language Models, aiming to develop assistants to streamline human tasks.

News

Awesome-GUI-Agent

2024 July: MovieSeq got accepted by ECCV 2024.
2024 Jun: Check out our new work on GUI automation: VideoGUI and GUI Narrator.
2024 Jun: EgoVLP received the Egocentric Vision (EgoVis) Distinguished Paper Award.
2024 May: Recognized as a CVPR 2024 Outstanding Reviewer.
2024 Feb: VideoLLM-online, SparseFormer got accepted by CVPR 2024.
2023 Sept: VisorGPT got accepted by NeurIPS 2023.
2023 Aug: EgoVLP received the PREMIA Best Student Paper Award (Gold Award).
2023 July: UniVTG, EgoVLPv2, TL;DR got accepted by ICCV 2023.
2023 Mar: All-in-one, Afformer got accepted by CVPR 2023.
2022 Sept: EgoVLP got accepted by NeurIPS 2022 as Spotlight.
2022 Aug: Joined Show Lab @ NUS to start my Ph.D. journey!
2022 Jun: EgoVLP won double championships at the Joint 1st Ego4D and 10th EPIC Workshop, CVPR 2022. [News]

Preprints

	VideoGUI: A Benchmark for GUI Automation from Instructional Videos. Kevin QH. Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Z. Shou. Preprint, 2024 [project] [paper] [code (tbd)] Can an agent recreate PowerPoint animation effects from instructional videos?
	CosMo: Contrastive Streamlined Multimodal Model With Interleaved Pre-Training. Alex JP. Wang, Linjie Li, Kevin QH. Lin, Jianfeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang and Mike Z. Shou. Preprint, 2023 [project] [paper] [code] [dataset] [MoE] Interleaved vision-text pretraining, with contrastive and generative modeling, MoE scaling.
	AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn. Difei Gao, Lei Ji, Luowei Zhou, Kevin QH. Lin, Joya Chen, Zihan Fan, Mike Z. Shou. Preprint, 2023 [project] [paper] A video agent for general video understanding.

Publications

	Learning Video Context as Interleaved Multimodal Sequences. Kevin QH. Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Z. Shou. ECCV, 2024 [project] [paper] [code] Video in-context learning using interleaved sequences of images, videos, plots and dialogues.
	VideoLLM-online: Towards Large Video-Language Model for Streaming Video. Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin QH. Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Z. Shou. CVPR, 2024 [project] [paper] [code] The first streaming video-language model, achieving 10 FPS for long video online processing.
	Bootstrapping SparseFormers from Vision Foundation Models. Ziteng Gao, Zhan Tong, Kevin QH. Lin, Joya Chen, Mike Z. Shou. CVPR, 2024 [paper] [code] An efficient pathway to transform vision foundation models into SparseFormer.
	VisorGPT: Learning Visual Prior via Generative Pre-Training. Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin QH. Lin, Yefeng Zheng, Linlin Shen, Mike Z. Shou. NeurIPS, 2023 [project] [paper] [code] Model visual prior by language modeling.
	UniVTG: Towards Unified Video-Language Temporal Grounding. Kevin QH. Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex JP. Wang, Rui Yan, Mike Z. Shou. ICCV, 2023 [demo] [paper] [code] The first video temporal grounding pretraining model, unifying diverse temporal labels.
	EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone. Shraman Pramanick, Yale Song, Sayan Nag, Kevin QH. Lin, Hardik Shah, Mike Z. Shou, Rama Chellappa, Pengchuan Zhang. ICCV, 2023 [project] [paper] [code] The new generation of egocentric video-language pre-training.
	Too Large; Data Reduction for Vision-Language Pre-Training. Alex JP. Wang, Kevin QH. Lin, David JH. Zhang, Stan WX. Lei, Mike Z. Shou. ICCV, 2023 [paper] [code] Compress large-scale vision-text dataset into a small, high-quality set.
	Affordance Grounding from Demonstration Video to Target Image. Joya Chen, Difei Gao, Kevin QH. Lin, Mike Z. Shou. CVPR, 2023 [paper] [code] Learning where to interact (affordance) from demonstration videos.
	All in one: Exploring unified video-language pre-training. Alex JP. Wang, Yixiao Ge, Rui Yan, Yuying Ge, Kevin QH. Lin, Satoshi Tsutsui, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Z. Shou. CVPR, 2023 [paper] [code] The first unified video-language pretrained model.
	Egocentric Video-Language Pretraining. Kevin QH. Lin, Alex JP. Wang, M. Soldan, M. Wray, R. Yan, Eric ZC. Xu, D. Gao, R. Tu, W. Zhao, W. Kong, C. Cai, H. Wang, D. Damen, B. Ghanem, W. Liu, Mike Z. Shou. NeurIPS, 2022. Spotlight (1.7%) [project] [paper] [code] [poster] The first egocentric vision-language pretrained model. EgoVis 2022/2023 Distinguished Paper Award & PREMIA Best Student Paper Award 2023. Double champions in Ego4D & Epic-Kitchens CVPR 2022 challenges.

Projects

VLog: Video as a Long Document

[demo] [code] [twitter]
Given a long video, we turn it into a doc containing visual + audio info. By sending this doc to ChatGPT, we can chat over the video!

Honors

Egocentric Vision (EgoVis) Distinguished Paper Award, 2024
CVPR Outstanding Reviewers, 2024
PREMIA Best Student Paper Awards, Gold Award, 2023
Show Lab Annual Award, 2022
NeurIPS Scholar Award, 2022
Tencent Rhino-Bird Research Scholarship, Second Prize, 2022
1st Place on Ego4D - Object State Change Classification Challenge, CVPR, 2022
1st Place on EPIC-Kitchens - Multi-Instance Retrieval Challenge, CVPR, 2022
China National Scholarship, 2018, 2021

Service

Conference Reviewer: CVPR, ICCV, ECCV, ICML, NeurIPS, KDD, EMNLP, AAAI, IJCAI, ICME, ICASSP, ACM MM.
Journal Reviewer: TPAMI, IJCV, TNNLS, TMM, Neurocomputing, Pattern Recognition, etc.
Co-organizer of The AI Talks.