Publications
2025
Jin, Zhangyu; Feng, Andrew; Chemburkar, Ankur; Melo, Celso M. De
PromptGAR: Flexible Promptive Group Activity Recognition Miscellaneous
2025, (arXiv:2503.08933 [cs]).
Tags: Computer Science - Computer Vision and Pattern Recognition
@misc{jin_promptgar_2025,
  title     = {PromptGAR: Flexible Promptive Group Activity Recognition},
  author    = {Zhangyu Jin and Andrew Feng and Ankur Chemburkar and Celso M. De Melo},
  url       = {http://arxiv.org/abs/2503.08933},
  doi       = {10.48550/arXiv.2503.08933},
  year      = {2025},
  date      = {2025-03-01},
  urldate   = {2025-03-20},
  publisher = {arXiv},
  abstract  = {We present PromptGAR, a novel framework that addresses the limitations of current Group Activity Recognition (GAR) approaches by leveraging multi-modal prompts to achieve both input flexibility and high recognition accuracy. The existing approaches suffer from limited real-world applicability due to their reliance on full prompt annotations, the lack of long-term actor consistency, and under-exploration of multi-group scenarios. To bridge the gap, we proposed PromptGAR, which is the first GAR model to provide input flexibility across prompts, frames, and instances without the need for retraining. Specifically, we unify bounding boxes, skeletal keypoints, and areas as point prompts and employ a recognition decoder for cross-updating class and prompt tokens. To ensure long-term consistency for extended activity durations, we also introduce a relative instance attention mechanism that directly encodes instance IDs. Finally, PromptGAR explores the use of area prompts to enable the selective recognition of the particular group activity within videos that contain multiple concurrent groups. Comprehensive evaluations demonstrate that PromptGAR achieves competitive performances both on full prompts and diverse prompt inputs, establishing its effectiveness on input flexibility and generalization ability for real-world applications.},
  note      = {arXiv:2503.08933 [cs]},
  keywords  = {Computer Science - Computer Vision and Pattern Recognition},
  pubstate  = {published},
  tppubtype = {misc}
}
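The abstract mentions a relative instance attention mechanism that encodes instance IDs so that actor identity stays consistent over long clips. The sketch below is only an illustration of the general idea, not the paper's actual mechanism: it assumes a toy attention layer where the bias between two tokens depends solely on whether they share an instance ID (a relative relation), so the output is invariant to how the IDs are labeled. The function name `relative_instance_attention` and all parameters are hypothetical.

```python
import numpy as np

def relative_instance_attention(q, k, v, instance_ids,
                                same_bias=1.0, diff_bias=0.0):
    """Toy single-head attention with an instance-relative bias.

    The bias added to each attention score depends only on whether the
    query and key tokens carry the same instance ID, never on the raw ID
    values, so relabeling the IDs cannot change the result.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # scaled dot-product
    same = instance_ids[:, None] == instance_ids[None, :]
    scores = scores + np.where(same, same_bias, diff_bias)
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because only the "same instance or not" relation enters the bias, feeding IDs `[0, 0, 1]` or `[7, 7, 2]` yields identical outputs, which is one plausible reading of why encoding IDs relatively helps long-term actor consistency.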