Recent Vision-Language Models (VLMs) such as ChatGPT, Gemini, Claude, Grok, Qwen, and InternVL have demonstrated remarkable capabilities in text and image analysis, but they still struggle to process and understand video data, regardless of their scale. I focus on identifying failure cases, curating specialized datasets to evaluate these models, and proposing methods to improve their accuracy. Advancing video understanding is critical for achieving AGI.
- Education: Ph.D. Student and Graduate Teaching/Research Assistant (Fall 2023 - Present)
Center for Research in Computer Vision (CRCV) - #8 in Computer Vision in the US
University of Central Florida (UCF) - Supervisor: Dr. Yogesh Singh Rawat
- Research focus: Computer Vision, Video Understanding, Action Recognition, Datasets
Publications (Ph.D.)
Main Proceedings
- Punching Bag vs. Punching Person: Motion Transferability in Videos (ICCV ‘25)
Raiyaan Abdullah, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh Singh Rawat
Workshops
- iSafetyBench: A Video-Language Benchmark for Safety in Industrial Environment (ICCVW ‘25)
Raiyaan Abdullah, Yogesh Singh Rawat, Shruti Vyas
- Probing Conceptual Understanding of Large Visual-Language Models (CVPRW ‘24)
Madeline Schiappa, Raiyaan Abdullah, Shehreen Azad, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh Singh Rawat
Awards
- ORCGS Doctoral Fellowship, UCF (2023-2024)
Reviewer experience
Direct assignment
- CVPR ‘25
Part of CRCV
- CVPR ‘24
- ICCV ‘25
- ICLR ‘25
- ICML ‘24
- NeurIPS ‘24, ‘25