Srijan Das

I am an Assistant Professor in the Department of Computer Science at the University of North Carolina at Charlotte. At UNC Charlotte, I work on Video Representation Learning and Robotic Vision. I am a member of the AI4Health Center and one of the founding members of the Charlotte Machine Learning Lab (CharMLab) at UNC Charlotte.

Before this, I was a Postdoctoral Associate at Stony Brook University under the supervision of Michael S. Ryoo. In 2020, I completed my Ph.D. in Computer Science at INRIA, Sophia Antipolis, France under the supervision of Francois Bremond and Monique Thonnat. My Ph.D. thesis is titled "Spatio-temporal attention mechanisms for Action Recognition"; my Defense Presentation is available to watch. I completed my postgraduate studies in Computer Science at the National Institute of Technology (NIT), Rourkela.

Email / CV / Bio / Google Scholar / Twitter / Github

Research

My research focuses on representation learning in long videos, leveraging vision-language models, multimodal inputs, and both egocentric and exocentric viewpoints. I am also interested in vision-language-action models for robot learning and the use of diffusion models for video generation.

News

2025

Jun: 2 papers accepted to ICCV 2025 (GECKO and MaskHand).

May: Outstanding Reviewer for CVPR 2025.

May: Our research "Computer model aims to enhance video technology" aired on WSOC-TV.

Apr: Serving as Area Chair for NeurIPS 2025.

Feb: LLAVIDAL accepted to CVPR 2025. [Featured Story]

Feb: 3rd place in the Elderly Action Recognition Challenge at WACV 2025.

2024

Dec: 2 papers accepted to AAAI 2025.

Oct: 3 papers accepted to NeurIPS 2024 workshops; early version of LLAVIDAL presented at VLM workshop.

Jul: 2 papers accepted to ECCV 2024, 1 to ACM MM 2024.

Apr: DeepFake Generation paper accepted at CVPRW 2024.

Feb: 3 papers accepted to CVPR 2024.

2023

Oct: Paper accepted to WACV 2024.

Aug: DeepFake detection paper at ICCVW 2023 and CLIP for Action Detection at BMVC 2023.

Jul: CMMC received Best Poster Award at MVA 2023.

May: Serving as SPC at AAAI 2024 for the second time.

May: Dominick Reilly awarded Chateaubriand Fellowship.

Feb: First NSF Grant awarded – Link.

Jan: Paper accepted to ISBI 2023 with Stony Brook Medicine collaborators.

2022

Sep: Two papers accepted to NeurIPS 2022.

Aug: Paper accepted to WACV 2023 (first round).

Aug: Joined UNC Charlotte as Assistant Professor.

Lab members

Current PhD Students

Dominick Reilly
Arkaprava Sinha
Manish Kumar Govind (co-supervised with Prof. Pu Wang)
Weston Bondurant (co-supervised with Prof. Stephanie Schuckers)
Wenhao Chi

Other Current Students

Nitin Chandrasekhar (UG student at UNC Charlotte)
Drew O’Donnell (UG student at UNC Charlotte)

Selected Publications (For full list of papers, visit my Google Scholar.)

Preprints
	From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities Dominick Reilly, Manish Kumar Govind, Le Xue, and Srijan Das. Preprint arXiv / Code We leverage the complementary nature of egocentric views to enhance LVLM’s understanding of exocentric ADL videos through online ego2exo distillation.
	MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, and Srijan Das. Preprint arXiv / code MS-Temba is the first Mamba-based architecture for action detection in long untrimmed videos that can be trained and tested on an NVIDIA Jetson Nano.
2025
	GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology Saarthak Kapse, Pushpak Pati, Srikar Yellapragada, Srijan Das, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras, Prateek Prasanna. To Appear in ICCV 2025 arXiv / Code Gigapixel Vision-Concept Knowledge Contrastive pretraining (GECKO) aligns WSIs with a Concept Prior for delivering clinically meaningful interpretability.
	MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Jagdishbhai Patel, Hongfei Xue, Ahmed Helmy, Srijan Das, Pu Wang. To Appear in ICCV 2025 arXiv / Website A novel generative masked model for hand mesh recovery that synthesizes plausible 3D hand meshes.
	LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, Francois Bremond, Le Xue, Srijan Das. CVPR 2025 arXiv / website / code LLAVIDAL, a Large Language Vision Model, incorporates 3D poses and relevant object trajectories to understand the intricate spatiotemporal relationships within ADLs.
	SKI Models: SKeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living Arkaprava Sinha, Dominick Reilly, Francois Bremond, Pu Wang, and Srijan Das. AAAI 2025 arXiv / Code SKI models introduce 3D skeletons into the vision-language embedding space to enable effective zero-shot learning for ADL.
	GenHMR: Generative Human Mesh Recovery Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang, Hongfei Xue, Srijan Das, Chen Chen. AAAI 2025 arXiv / Website A generative framework that reformulates monocular HMR as an image-conditioned generative task.
2024
	Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu. ACM MM 2024 arXiv / code A frequency-aware attention module to unweave skeleton frequency representations for action recognition.
	Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier Prantik Howlader, Srijan Das, Hieu Le, Dimitris Samaras. ECCV 2024 arXiv / code A novel plug-in module designed for existing semi-supervised segmentation frameworks that offers patch-level supervision.
	BAMM: Bidirectional Autoregressive Motion Model Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang, Minwoo Lee, Srijan Das, Chen Chen. ECCV 2024 arXiv / website / code A novel text-to-motion generation framework. BAMM captures rich and bidirectional dependencies among motion tokens.
	Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living Dominick Reilly and Srijan Das. CVPR 2024 arXiv / code We introduce the first Pose Induced Video Transformer: PI-ViT (or π-ViT), a novel approach that augments the RGB representations learned by video transformers with 2D and 3D pose information.
	SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology Saarthak Kapse^, Pushpak Pati^, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi Gupta, Prateek Prasanna. CVPR 2024 arXiv / code Self-Interpretable MIL (SI-MIL), the first interpretable-by-design MIL method for gigapixel WSIs, which provides de novo feature-level interpretations grounded on pathological insights for a WSI.
	Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? Aritra Dutta, Srijan Das, Jacob Nielsen, Rajatsubhra Chakraborty, Mubarak Shah. CVPR 2024 arXiv / Website We present MAVREC, a video dataset where we record synchronized scenes from different perspectives: a ground camera and a drone-mounted camera.
	Attention de-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning Saarthak Kapse, Srijan Das, Jingwei Zhang, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras, Prateek Prasanna. Medical Image Analysis (IF 10.9) arXiv A diversity-inducing pretraining technique, tailored to enhance representation learning in digital pathology.
	Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders Srijan Das, Tanmay Jain, Dominick Reilly, Pranav Balaji, Soumyajit Karmakar, Shyam Marjit, Xiang Li, Abhijit Das, and Michael S. Ryoo. WACV 2024 arXiv / code / Poster / Video This paper shows that jointly optimizing ViTs for the primary task and a Self-Supervised Auxiliary Task is surprisingly beneficial when the amount of training data is limited.
2023
	Attributes-Aware Network for Temporal Action Detection Rui Dai, Srijan Das, Michael S. Ryoo, Francois Bremond. BMVC 2023 arXiv / video This paper explains how to utilize OpenAI's CLIP for long-term action detection in videos.
	Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning Srijan Das and Michael S. Ryoo. 18th International Conference on Machine Vision Applications, July 2023 arXiv / Poster / Best Poster Award This paper focuses on designing video augmentations for self-supervised learning. We propose CMMC, which uses other modalities in videos for data mixing.
	ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints Srijan Das and Michael S. Ryoo. WACV 2023 arXiv A framework for learning self-supervised video representation that is invariant to unseen camera viewpoints.
2022
	Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels? Xiang Li, Jinghuan Shang, Srijan Das, Michael S. Ryoo. NeurIPS 2022 arXiv / code The impact of existing self-supervised losses within a joint learning framework for RL is limited, and no single method dominates all tasks.
	Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space Jinghuan Shang, Srijan Das, Michael S. Ryoo. NeurIPS 2022 arXiv / Project Page / code 3DTRL is a lightweight, plug-and-play layer that recovers 3D information of visual tokens and leverages it for learning viewpoint-agnostic representations.
	Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection Rui Dai, Srijan Das, Saurav Sharma, Luca Minciullo, Lorenzo Garattoni, François Brémond, Gianpiero Francesca. T-PAMI 2022 Project Link / Code TSU is a new untrimmed daily-living dataset consisting of 51 activities performed in a spontaneous manner, captured from non-optimal viewpoints.
	MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond. CVPR 2022 arXiv / code A ConvTransformer network that explores global and local temporal relations at multiple resolutions.
2021
	VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living Srijan Das, Rui Dai, Di Yang, Francois Bremond. TPAMI 2021 arXiv / code VPN++ is an extension of our VPN model (ECCV 2020). VPN++ hallucinates pose-driven features while not requiring costly 3D poses at inference.
	CTRN: Class Temporal Relational Network for Action Detection Rui Dai, Srijan Das, Francois Bremond. BMVC 2021, Oral
	Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection Rui Dai, Srijan Das, Francois Bremond. ICCV 2021
	PDAN: Pyramid Dilated Attention Network for Action Detection. Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca and Francois Bremond. WACV 2021 Code / Video / Poster
2020
	VPN: Learning Video-Pose Embedding for Activities of Daily Living Srijan Das, Saurav Sharma, Rui Dai, Francois Bremond, Monique Thonnat. ECCV 2020 Code
	Looking deeper into Time for Activities of Daily Living Recognition Srijan Das, Monique Thonnat and Francois Bremond. WACV 2020
2019
	Toyota Smarthome: Real World Activities of Daily Living. Srijan Das, Rui Dai, Michal Koperski, Luca Minciullo, Lorenzo Garattoni, Francois Bremond and Gianpiero Francesca. ICCV 2019 Project Link / Code
	Where to focus on for Human Action Recognition? Srijan Das, Arpit Chaudhary, Francois Bremond and Monique Thonnat. WACV 2019

Datasets

Talks

May 2025 Invited Talk on "LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living" in the first special WACV 2025 Meetup Series.
May 2025 Invited Academic Talk on "Improved Reasoning in AI Models for Deepfake Detection" in Martigny Biometrics Workshop co-organised by the European Association for Biometrics (EAB), the Center for Identification Technology Research (CITeR) and the Idiap Research Institute at Idiap in Martigny, Switzerland.
May 2025 Invited research poster presentation at the Computing Community Consortium (CCC) Computing Futures Symposium in Washington, DC, USA.
Mar 2025 Guest Lecture on "Deep Neural Networks" at The University of Michigan-Dearborn.
Mar 2024 Talk on "Computer Vision Projects in CharMLab" at a roundtable discussion on AI held in conjunction with the Defense Alliance of NC (DANC) and the Michael Best Law Firm.
Feb 2024 Invited Online Tech Talk on "From Pixels to Robots: Recipes for Vision-Enabled Robot Learning" at Christ University, Bangalore, India.
Dec 2023 Invited Talk on "Video Understanding using AI" as part of the "AI and ROS for Robotics: Theory and Practice" short-term training program at IIITDM.
Jun 2023 Invited Talk on "Computer Vision for Robot Learning" as part of the "AI and Machine Vision for Robotics" short-term training program at IIITDM. (Virtually)
Apr 2023 Talk on "From Few to More: Enhancing ViT Performance on Limited Data" at PHPC Lab at UNC Charlotte.
Mar 2023 Talk on "From Pixels to Robots: Recipes for Vision-Enabled Robot Learning" at the Seminar on Controls and Robotics at UNC Charlotte.
Jan 2023 Talk on "Quo vadis, computer vision!" at the PhD seminar at UNC Charlotte.
Mar 2022 Invited Talk on "Multiple Modalities are all you need for Video Understanding!" in an AICTE-sponsored Short Term Course at IIITDM Kancheepuram. (Virtually)
Sep 2021 Talk on "Vision for understanding Activities of Daily Living" at SciTech Talks. [video]
Apr 2021 Seminar talk on "How to combine modalities for understanding Activities of Daily Living?" for CSE 600 at Stony Brook University, NY, USA.
Nov 2020 Seminar talk on "How to combine RGB & Poses for understanding Activities of Daily Living?" at Université Lumière Lyon 2.
Nov 2019 Nice Data Science meetup. [slides]
Aug 2018 Summer School Brain Innovation Generation @ UCA. [slides]

Teaching

Spring 2025, Fall 2025 ITCS 4152 Introduction to Computer Vision
Fall 2024 ITCS 6010/8010 Advanced Computer Vision (Topics Course)
Fall 2022, Spring 2023, Fall 2023, Spring 2024 ITCS 4152/5152 Computer Vision
Aug 2021 Surviving the Deep Learning Apocalypse (SKFGI Webinar series 2020)

Video Understanding: How to model Time? [video]
Tips to attend attention! [video]

Jan 2021 Deep Learning Winter School for Computer Vision 2019-20

Introduction to video classification & RNN. [slides] [assignment1] [Practical 1]
Action Classification in videos [slides]
Attention Mechanisms for video analytics [slides] [assignment2] [Practical 2]

Academic Activities

Area Chair for NeurIPS 2025.
Program committee member of AAAI-24 Student Program.
Associate Editor for ICRA 2024.
Member of the DEI committee for CVPR 2023.
Senior Program Committee Member for AAAI 2023 and AAAI 2024.
Session chair for Image Understanding & Activity Recognition session at IPAS 2020.
Mentor for B.E.N.J.I. in the 2019 edition of GirlScript Summer of Code.
Mentor for the Emerging Technology Business Incubator (ETBI) led by NIT Rourkela, a platform envisaged to transform the start-up ecosystem of the region.
Reviewer at ICACIE 2017, 2018, SETIT 2018, KCST 2019, ICAML 2019, AVSS 2019, 2022, WACV 2020, 2021, 2022, CVPR 2021, 2022, 2023, 2024, 2025, ECCV 2022, 2024, ICCV 2021, 2023, 2025, AAAI 2023, NeurIPS 2023, IROS 2021, 2024.
Reviewer at TPAMI, Pattern Recognition, Elsevier Journal of CVIU, Elsevier Journal of FGCS, Elsevier Journal of Computer & Electrical Engineering, MTAP, and Journal of Signal Processing: Image Communication.
Volunteer at ICACNI 2014, ICACNI 2016, ICCV 2019, ICLR 2020 & ICML 2020.