Srijan Das

I am an Assistant Professor in the Department of Computer Science at the University of North Carolina at Charlotte, where I work on video representation learning and robotic vision.

Before this, I was a Postdoctoral Associate at Stony Brook University under the supervision of Michael Ryoo. In 2020, I completed my Ph.D. in Computer Science at INRIA, Sophia Antipolis, France, under the supervision of Francois Bremond and Monique Thonnat. My Ph.D. thesis is titled "Spatio-temporal attention mechanisms for Action Recognition" (watch my Defense Presentation). I completed my postgraduate degree in Computer Science at the National Institute of Technology (NIT), Rourkela.

Email  /  CV  /  Bio  /  Google Scholar  /  Twitter  /  Github


I am interested in computer vision, machine learning, deep learning, and image processing. I have been mostly working on video representation learning including spatio-temporal attention mechanisms, cross-modal attention mechanisms, cross-modal knowledge distillation, and self-supervised learning for applications like action classification in trimmed videos, temporal action detection in untrimmed videos, video retrieval, anomaly detection, and deepfake detection.

[Hiring] I am actively looking for motivated graduate students to conduct research with me at UNC Charlotte. If your research interests align with mine, please apply to the PhD program at UNC Charlotte and mention my name in your application. Feel free to also reach out to me if you are interested in working in this area!

News

  • Sep 2022 - Two papers accepted to NeurIPS 2022.
  • Aug 2022 - One paper accepted to WACV 2023 (first round).
  • Aug 2022 - Joined UNC Charlotte as an Assistant Professor.
  • Jul 2022 - We are organizing a workshop "Artificial Intelligence for Automated Human Health-care and Monitoring" at IEEE FG 2023.
  • Jul 2022 - Serving as a Senior Program Committee member for AAAI 2023.
  • Jun 2022 - Secured 2nd place in the Long-Term Anticipation track of the Ego4D challenge at CVPR 2022. [Report][Code]
  • Apr 2022 - I will join UNC Charlotte as an Assistant Professor in August 2022!
  • Apr 2022 - Toyota Smarthome Untrimmed (TSU) has been accepted to TPAMI.
  • Mar 2022 - One paper accepted to CVPR 2022.
  • Dec 2021 - We have organized a special session on Applications in Healthcare and Health Monitoring at the 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG'21).
  • Dec 2021 - Two papers at FG 2021, on DeepFake detection and video anomaly detection.
  • Nov 2021 - One paper on video anomaly detection presented at AVSS 2021.
  • Nov 2021 - One paper accepted to TPAMI.
  • Oct 2021 - One paper accepted to BMVC 2021 as an oral presentation.
  • Jul 2021 - One paper accepted to ICCV 2021.
  • Apr 2021 - Joined Stony Brook University as a Postdoctoral Associate.
  • Nov 2020 - Session chair for Image Understanding & Activity Recognition session at IPAS 2020.

Selected Publications (for the full list of papers, see my Google Scholar page.)

Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
Xiang Li, Jinghuan Shang, Srijan Das, Michael S. Ryoo.
NeurIPS 2022

The impact of existing self-supervised losses, when combined with RL in a joint learning framework, is limited, and no single method dominates across all tasks.

Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space
Jinghuan Shang, Srijan Das, Michael S. Ryoo.
NeurIPS 2022
arXiv / Project Page

3DTRL is a lightweight, plug-and-play layer that recovers the 3D information of visual tokens and leverages it to learn viewpoint-agnostic representations.

Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection
Rui Dai, Srijan Das, Saurav Sharma, Luca Minciullo, Lorenzo Garattoni, François Brémond, Gianpiero Francesca.
TPAMI
Project Link / Code

TSU is a new untrimmed daily-living dataset consisting of 51 activities performed in a spontaneous manner, captured from non-optimal viewpoints.

CD-Net: Histopathology Representation Learning using Pyramidal Context-Detail Network
Saarthak Kapse, Srijan Das, Prateek Prasanna.
arXiv preprint, March 2022
arXiv / code

A transformer-based Pyramidal Context-Detail Network that leverages complementary information from multiple resolutions of whole slide images.

ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
Srijan Das and Michael S. Ryoo.
WACV 2023

A framework for learning self-supervised video representations that are invariant to unseen camera viewpoints.

STC-mix: Space, Time, Channel mixing for Self-supervised Video Representation
Srijan Das and Michael S. Ryoo.
arXiv preprint, December 2021

This paper focuses on designing video augmentations for self-supervised learning. We analyze the best strategy for mixing videos to create new augmented video samples, and we also propose CMMC to make use of other modalities in videos for data mixing.

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond.
CVPR 2022
arXiv / code

A ConvTransformer network that explores global and local temporal relations at multiple resolutions.

VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
Srijan Das, Rui Dai, Di Yang, Francois Bremond.
TPAMI, 2021
arXiv / code

VPN++ is an extension of our VPN model (ECCV 2020). VPN++ hallucinates pose-driven features without requiring costly 3D poses at inference.

CTRN: Class Temporal Relational Network for Action Detection
Rui Dai, Srijan Das, Francois Bremond.
BMVC 2021, Oral

Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection
Rui Dai, Srijan Das, Francois Bremond.
ICCV 2021

PDAN: Pyramid Dilated Attention Network for Action Detection
Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca and Francois Bremond.
WACV 2021
Code / Video / Poster

VPN: Learning Video-Pose Embedding for Activities of Daily Living
Srijan Das, Saurav Sharma, Rui Dai, Francois Bremond, Monique Thonnat.
ECCV 2020

Looking deeper into Time for Activities of Daily Living Recognition
Srijan Das, Monique Thonnat and Francois Bremond.
WACV 2020

Toyota Smarthome: Real World Activities of Daily Living
Srijan Das, Rui Dai, Michal Koperski, Luca Minciullo, Lorenzo Garattoni, Francois Bremond and Gianpiero Francesca.
ICCV 2019
Project Link / Code

Where to focus on for Human Action Recognition?
Srijan Das, Arpit Chaudhary, Francois Bremond and Monique Thonnat.
WACV 2019

Lab members

  • Dominick Reilly (graduate student at UNCC)
  • Ian Boyle (undergraduate at UNCC)
  • Jacob Nielsen (jointly supervising his Master's thesis with Prof. Aritra Dutta at SDU)
  • Tanmay Jain (undergraduate intern from DTU, India)
  • Shyam Marjit (undergraduate intern from IIITG, India)
  • Soumyajit Karmakar (undergraduate intern from IIITG, India)

Talks

  • Mar 2022    Invited talk, "Multiple Modalities are all you need for Video Understanding!", in an AICTE-sponsored Short Term Course at IIITDM Kancheepuram (virtual).
  • Sep 2021    Talk on "Vision for understanding Activities of Daily Living" at SciTech Talks. [video]
  • Apr 2021    Seminar talk on "How to combine modalities for understanding Activities of Daily Living?" for CSE 600 at Stony Brook University, NY, USA.
  • Nov 2020    Seminar talk on "How to combine RGB & Poses for understanding Activities of Daily Living?" at Université Lumière Lyon 2.
  • Nov 2019    Nice Data Science meetup. [slides]
  • Aug 2018    Summer School Brain Innovation Generation @ UCA. [slides]


Academic Activities

  • Session chair for Image Understanding & Activity Recognition session at IPAS 2020.
  • Mentor for B.E.N.J.I. in the 2019 edition of GirlScript Summer of Code.
  • Mentor for the Emerging Technology Business Incubator (ETBI), led by NIT Rourkela, a platform envisaged to transform the start-up ecosystem of the region.
  • Reviewer at ICACIE 2017, 2018, SETIT 2018, KCST 2019, ICAML 2019, AVSS 2019, WACV 2020, 2021, 2022, CVPR 2021, 2022, ICCV 2021, IROS 2021.
  • Reviewer for TPAMI, Pattern Recognition, Elsevier Journal of CVIU, Elsevier Journal of FGCS, Elsevier Journal of Computer & Electrical Engineering, MTAP, and Journal of Signal Processing: Image Communication.
  • Volunteer at ICACNI 2014, ICACNI 2016, ICCV 2019, ICLR 2020 & ICML 2020.

Thanks to Jon Barron for the theme.