Difei Gao
2020 – today
- 2024
- [j3] Ziyi Bai, Ruiping Wang, Difei Gao, Xilin Chen: Event Graph Guided Compositional Spatial-Temporal Reasoning for Video Question Answering. IEEE Trans. Image Process. 33: 1109-1121 (2024)
- [c20] Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou: AssistGUI: Task-Oriented PC Graphical User Interface Automation. CVPR 2024: 13289-13298
- [c19] Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou: VideoLLM-online: Online Video Large Language Model for Streaming Video. CVPR 2024: 18407-18418
- [c18] Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou: VIT-LENS: Towards Omni-modal Representations. CVPR 2024: 26637-26647
- [c17] Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou: Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces. IJCAI 2024: 5862-5871
- [c16] Difei Gao, Siyuan Hu, Zechen Bai, Qinghong Lin, Mike Zheng Shou: AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation. ACM Multimedia 2024: 11255-11257
- [i28] Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou: Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces. CoRR abs/2401.13516 (2024)
- [i27] Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Mike Zheng Shou: LOVA3: Learning to Visual Question Answering, Asking and Assessment. CoRR abs/2405.14974 (2024)
- [i26] Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou: VideoGUI: A Benchmark for GUI Automation from Instructional Videos. CoRR abs/2406.10227 (2024)
- [i25] Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou: VideoLLM-online: Online Video Large Language Model for Streaming Video. CoRR abs/2406.11816 (2024)
- [i24] Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou: GUI Action Narrator: Where and When Did That Action Take Place? CoRR abs/2406.13719 (2024)
- [i23] Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou: Learning Video Context as Interleaved Multimodal Sequences. CoRR abs/2407.21757 (2024)
- 2023
- [j2] Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen: CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense. IEEE Trans. Pattern Anal. Mach. Intell. 45(5): 5561-5578 (2023)
- [c15] Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou: Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task. AAAI 2023: 1250-1259
- [c14] Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Mike Zheng Shou, Nan Duan: CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding. ACL (1) 2023: 8013-8028
- [c13] Joya Chen, Difei Gao, Kevin Qinghong Lin, Mike Zheng Shou: Affordance Grounding from Demonstration Video to Target Image. CVPR 2023: 6799-6808
- [c12] Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou: MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. CVPR 2023: 14773-14783
- [c11] Muhammet Ilaslan, Chenan Song, Joya Chen, Difei Gao, Weixian Lei, Qianli Xu, Joo Lim, Mike Zheng Shou: GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations. EMNLP 2023: 10462-10479
- [c10] Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou: UniVTG: Towards Unified Video-Language Temporal Grounding. ICCV 2023: 2782-2792
- [c9] Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Difei Gao, Morgan B. Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang: Learning to Learn: How to Continuously Teach Humans and Machines. ICCV 2023: 11674-11685
- [i22] Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Zheng Qin, Mike Zheng Shou: DeepfakeMAE: Facial Part Consistency Aware Masked Autoencoder for Deepfake Video Detection. CoRR abs/2303.01740 (2023)
- [i21] Joya Chen, Difei Gao, Kevin Qinghong Lin, Mike Zheng Shou: Affordance Grounding from Demonstration Video to Target Image. CoRR abs/2303.14644 (2023)
- [i20] Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou: Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection. CoRR abs/2305.05943 (2023)
- [i19] Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou: AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn. CoRR abs/2306.08640 (2023)
- [i18] Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou: GroundNLQ @ Ego4D Natural Language Queries Challenge 2023. CoRR abs/2306.15255 (2023)
- [i17] Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou: UniVTG: Towards Unified Video-Language Temporal Grounding. CoRR abs/2307.16715 (2023)
- [i16] Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou: Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces. CoRR abs/2308.09921 (2023)
- [i15] David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou: Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. CoRR abs/2309.15818 (2023)
- [i14] Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest N. Iandola: CVPR 2023 Text Guided Video Editing Competition. CoRR abs/2310.16003 (2023)
- [i13] Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou: ViT-Lens-2: Gateway to Omni-modal Intelligence. CoRR abs/2311.16081 (2023)
- [i12] Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou: ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation. CoRR abs/2312.13108 (2023)
- 2022
- [c8] Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou: AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant. ECCV (36) 2022: 485-501
- [c7] Yuxuan Wang, Difei Gao, Licheng Yu, Weixian Lei, Matt Feiszli, Mike Zheng Shou: GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval. ECCV (35) 2022: 709-725
- [c6] Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, Lingmin Ran, Mike Zheng Shou: AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant. EMNLP (Findings) 2022: 319-338
- [c5] Kevin Qinghong Lin, Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou: Egocentric Video-Language Pretraining. NeurIPS 2022
- [i11] Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou: AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant. CoRR abs/2203.04203 (2022)
- [i10] Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou: GEB+: A benchmark for generic event boundary captioning, grounding and text-based retrieval. CoRR abs/2204.00486 (2022)
- [i9] Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou: Egocentric Video-Language Pretraining. CoRR abs/2206.01670 (2022)
- [i8] Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou: Egocentric Video-Language Pretraining @ Ego4D Challenge 2022. CoRR abs/2207.01622 (2022)
- [i7] Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou: Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task. CoRR abs/2208.12037 (2022)
- [i6] Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan: CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding. CoRR abs/2209.10918 (2022)
- [i5] Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan: An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022. CoRR abs/2211.08776 (2022)
- [i4] Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou: MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. CoRR abs/2212.09522 (2022)
- 2021
- [c4] Difei Gao, Ruiping Wang, Ziyi Bai, Xilin Chen: Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments. ICCV 2021: 1655-1665
- [i3] Stan Weixian Lei, Yuxuan Wang, Dongxing Mao, Difei Gao, Mike Zheng Shou: AssistSR: Affordance-centric Question-driven Video Segment Retrieval. CoRR abs/2111.15050 (2021)
- 2020
- [j1] Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen: Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space. IEEE J. Sel. Top. Signal Process. 14(3): 494-505 (2020)
- [c3] Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text. CVPR 2020: 12743-12753
- [i2] Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text. CoRR abs/2003.13962 (2020)
2010 – 2019
- 2019
- [i1] Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen: From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense. CoRR abs/1908.02962 (2019)
- 2017
- [c2] Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen: Visual Textbook Network: Watch Carefully before Answering Visual Questions. BMVC 2017
- 2015
- [c1] Difei Gao, Lili Pan, Risheng Liu, Rui Chen, Mei Xie: Correlated warped Gaussian processes for gender-specific age estimation. ICIP 2015: 133-137
last updated on 2024-11-07 20:35 CET by the dblp team
all metadata released as open data under CC0 1.0 license