Grand Challenges

Accepted Grand Challenges

Below is the current list of accepted grand challenges for ACM Multimedia 2026.

01

MAC’26: The 3rd Micro-Action Analysis Grand Challenge

MAC 2026 aims to advance fine-grained whole-body micro-action understanding for deep psychological and emotional state analysis across diverse scenarios, subtle motion scales, and complex behavioral contexts.

02

AVI Challenge 2026: Assessing True Personality Traits and Cognitive Ability from Asynchronous Video Interviews (AVIs)

The AVI Challenge 2026 invites researchers to develop state-of-the-art methods for multimodal (audio, visual, verbal) algorithmic assessment of true personality traits and cognitive ability.

03

Identity-Preserving Video Generation Challenge

The IPVG grand challenge includes two tracks: Facial Identity-Preserving Video Generation and Sequential Action Identity-Preserving Video Generation. In general, the goal of this grand challenge is two-fold: (a) coalescing community effort around new challenging identity-preserving video generation datasets, and (b) offering a fertile ground for designing controllable generative models to facilitate precise identity binding in video generation, aiming to propel the field toward more accountable and user-steerable video synthesis systems.

04

POLY-SIM: POLYglot Speaker Identification with Missing Modality Grand Challenge 2026

The POLY-SIM 2026 Challenge tasks participants with building speaker identification systems that remain accurate across multiple languages even when facial or vocal data is missing.

05

Enhancing Dynamic Point Clouds in the Wild: A Grand Challenge on Real-World 4D Volumetric Data (UVG-CWI-DQPC)

This grand challenge invites researchers to push the limits of dynamic 4D point cloud enhancement using the UVG-CWI-DQPC dataset, which features real-world sequences with paired low-quality inputs and high-fidelity ground truth.

06

MER 2026: Interlocutor Emotion, Fine-Grained Emotion, Emotion Preference, and Emotion Recognition from Physiological Signals

MER 2026 centers on advancing human emotion understanding by shifting from coarse-grained to fine-grained emotional analysis, and from categorical labels to more nuanced, descriptive expressions.

07

NeuroMM 2026: Interictal Epileptiform Discharge Detection and Localization in Multimodal Neuro-Signals

The NeuroMM 2026 challenge invites researchers to advance trustworthy clinical AI by tackling the detection and localization of Interictal Epileptiform Discharges through a novel framework of multimodal neurophysiological and behavioral signals.

08

GenText-Forensics: Challenge on Explainable Forensics and Adversarial Generation for Text-Centric Images

The GenText-Forensics Challenge establishes a unified benchmark for text-centric multimedia forensics, bridging defense and attack research using the large-scale, multilingual RealText-V2 dataset. The challenge focuses on two key areas: generating structured, explainable reports to improve forensic analysis, and developing high-quality, multilingual AIGC editing models to reveal security vulnerabilities. By combining these opposing approaches, we aim to enhance model interpretability and promote the development of robust, real-world digital media security systems.

09

SMP Challenge - Social Media Prediction Challenge 2026

The SMP Challenge is an annual competition that invites research teams to develop new approaches to popularity forecasting that meaningfully improve people's social lives and business scenarios. The enormous volume of online content leads to overconsumption, while online word-of-mouth helps us efficiently discover interesting news, emerging topics, the latest stories, and notable products in the information ocean. Predicting online popularity has therefore become an emerging and significant task for online media, brand marketing, social influencers, and individuals alike, and it is central to scenarios such as online advertising, social recommendation, and demand forecasting.

10

(AADD Challenge 2026) Adversarial Attacks on Deepfake Detection: Generated Media in Real-World Scenarios

The AADD 2026 Challenge advances research on adversarial attacks that deceive detectors while remaining effective under realistic social-media compression and post-processing conditions.

11

EgoLink: Egocentric Language-Vision Interactive Network Knowledge Challenge

The EgoLink Challenge is an egocentric benchmark that evaluates embodied AI's social intelligence through multi-dimensional multiple-choice questions (MCQs) assessing emotion understanding, causal reasoning, and behavioral intent prediction in real-world human interactions.

12

MPDD-AVG: Multimodal Personality-Aware Depression Detection via Audio-Visual Interview and Gait Analysis

The MPDD-AVG challenge advances personalized depression detection by integrating audio-visual, gait, and personality-trait data across young and elderly populations.

13

Single-Image Guided Multi-Angle Image Synthesis

The Single-Image Guided Multi-Angle Image Synthesis Grand Challenge establishes a new benchmark for single-frame driven novel view generation by evaluating models across spatial consistency, content alignment, pose compliance, and visual quality to enhance controllability in AIGC video production and accelerate industrial deployment of multi-angle synthesis technologies.

14

The Third Edition of Large Vision – Language Model Learning and Applications Grand Challenge (LAVA Challenge)

The primary goal of the LAVA Challenge is to advance the capability of Large Vision-Language Models to accurately interpret and understand complex visual data documents.

15

AdoDAS: A Privacy-Preserving Multimodal Challenge for Adolescent Depression, Anxiety, and Stress Assessment

AdoDAS is a privacy-preserving multimodal challenge for adolescent depression, anxiety, and stress assessment. It combines a standardized reading passage with open-ended interview prompts and provides labels from the DASS-21, including both subscale scores and item-level responses.

16

Deepfake Explainability Challenge

Deepfake detection is increasingly moving beyond simple face swaps toward rich semantic manipulations. This challenge focuses not only on identifying manipulated content but also on predicting explanations tailored to both technical and novice users, enabling models to justify their decisions in an accessible manner. The challenge comprises two tasks: (a) Deepfake Detection and (b) Deepfake Explainability.

17

Machine-oriented Visual Media Quality Assessment

With the development of Embodied AI, machines have replaced humans as the main consumers of visual media, yet existing Image Quality Assessment (IQA) metrics remain focused on human perception and overlook machine task utility. To address this gap, we introduce the Machine-oriented Image Quality Assessment (MoIQA) Challenge, emphasizing simulation comprehension and real-world execution. This challenge aims to advance reliable image understanding for machine preference and support robust Embodied AI applications.

18

REACT 2026 Challenge: The Fourth Personalised Multiple Appropriate Facial Reaction Generation in Dyadic Interactions

Building on the REACT 2023–2025 challenge series, including the large-scale MARS dataset of 137 dyadic interactions (3,105 sessions across five topics), the REACT 2026 challenge advances one-to-many personalised Multiple Appropriate Facial Reaction Generation (MAFRG). It leverages behavioural, affective, Big-Five personality, and EEG signals to develop and benchmark ML models capable of producing diverse, realistic, synchronised, and contextually appropriate human-style listener facial reactions to a given speaker's behaviour.

19

AT-ADD: The Grand Challenge on All-Type Audio Deepfake Detection

The AT-ADD Grand Challenge aims to advance robust speech deepfake countermeasures and all-type audio deepfake detection across diverse real-world conditions, unseen generation methods, and multiple audio types (speech, sound, singing, and music).

20

2nd CASTLE Multimodal Analytics Challenge

The second CASTLE challenge aims to advance the area of multimodal ego- and exo-centric video understanding by offering tasks on object instance search, event instance search, and open question answering on the CASTLE 2024 Dataset.

21

PCBA Standard-to-Real Challenge: Cross-domain VQA for Real-world Manufacturing Inspection

The PCBA Standard-to-Real Challenge benchmarks multimodal models on cross-domain PCBA inspection VQA, providing interpretable and actionable answers for defect identification, quantitative reasoning, and defect cause and treatment analysis.

22

TRIDENT: Tri-modal Deepfake Perception, Detection, and Hallucination Grand Challenge

The TRIDENT Grand Challenge establishes a new standard for tri-modal explainable forensics by evaluating models across the interdependent dimensions of perception, detection, and hallucination to ensure logical consistency in deepfake identification.

23

MultiMediate: Multimodal Behaviour Analysis for Artificial Mediation

MultiMediate'26 aims to facilitate human engagement estimation across a wide variety of cultural backgrounds (9 different languages), social situations (dyadic novice-expert interactions, group discussions, child-child and child-robot play), age groups (adults, children), and notions of engagement (continuous and categorical).

24

AffectiveArt Challenge 2026: Fine-Grained Emotion Understanding and Generation in Artistic Images

Despite rapid progress in AIGC, models still struggle to capture the emotional language of art. The AffectiveArt 2026 Challenge invites researchers to advance emotion-aware artistic generation and fine-grained art emotion understanding.

25

Multimodal Brain-Computer Interface Grand Challenge: EEG-fNIRS based Handwriting-Trajectory Classification

The Multimodal Brain-Computer Interface Grand Challenge focuses on classifying imagined logographic handwriting trajectories from synchronized EEG and fNIRS signals, providing a standardized multimodal benchmark with hidden-test evaluation to advance reproducible brain-computer interface research.