Recent Vision-Language Models (VLMs) such as ChatGPT, Gemini, Claude, Grok, Qwen, and InternVL have demonstrated remarkable capabilities in text and image analysis, but they still struggle to process and understand video, regardless of their scale. My research focuses on identifying failure cases, curating specialized datasets to evaluate these models, and proposing methods that improve their accuracy. Advancing video understanding is critical for achieving AGI.

Publications (Ph.D.)

Main Proceedings

Workshops

Awards

  • ORCGS Doctoral Fellowship, UCF (2023-2024)

Reviewer Experience

Direct Assignment

  • CVPR ’25

Part of CRCV

  • CVPR ’24
  • ICCV ’25
  • ICLR ’25
  • ICML ’24
  • NeurIPS ’24, ’25