The Science Behind NeuroPeer
How we bridge Meta's TRIBE v2 brain encoding model to neuromarketing — predicting how brains respond to your content without a single participant.
Part I: TRIBE v2
The Foundation Model · Meta FAIR · March 2026
TRIBE v2 (TRansformer for In-silico Brain Experiments v2) is a tri-modal foundation model published by Stéphane d'Ascoli, Jérémy Rapin, and colleagues at Meta FAIR. It predicts human brain activity — specifically, fMRI BOLD signals across the entire cortical surface — from video, audio, and language stimuli.
The key insight: deep neural networks and the primate brain share representational structure. TRIBE v2 doesn't learn perception from scratch — it leverages representational alignment between state-of-the-art foundation models and the brain, using three frozen feature extractors as its sensory front-end.
Architecture
V-JEPA2-Giant
Video encoder. Processes 64-frame segments (4s). Spatiotemporal visual features from a ViT-Giant backbone.
Wav2Vec-BERT 2.0
Audio encoder. Resampled to 2 Hz. Captures acoustic features, speech prosody, and music.
LLaMA 3.2-3B
Text encoder. 1,024-word context mapped to 2 Hz. Contextualized language embeddings.
The three embedding streams are compressed to D=384 each, concatenated into D_model=1152, and fed into a Transformer encoder with 8 layers and 8 attention heads over a 100-second temporal window.
Outputs are decimated from 2 Hz to 1 Hz (matching fMRI acquisition rate) and projected through a Subject Block to 20,484 cortical vertices on the fsaverage5 mesh plus 8,802 subcortical voxels. Each vertex represents the predicted z-scored BOLD signal at a specific point on the brain surface.
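The dimensions above can be traced end to end. The sketch below is illustrative only — the constants come from the description in this section, and forward_shapes is a hypothetical helper that tracks tensor shapes rather than reproducing Meta's implementation:

```python
# Illustrative shape trace of the TRIBE v2 forward pass described above.
# All dimensions come from the text; the helper itself is hypothetical.

FEATURE_RATE_HZ = 2              # encoder streams resampled to 2 Hz
FMRI_RATE_HZ = 1                 # fMRI acquisition rate
WINDOW_S = 100                   # Transformer temporal window (seconds)
D_STREAM = 384                   # per-modality compressed embedding size
N_STREAMS = 3                    # video, audio, text
D_MODEL = D_STREAM * N_STREAMS   # 1152 after concatenation
N_VERTICES = 20_484              # fsaverage5 cortical vertices
N_SUBCORTICAL = 8_802            # subcortical voxels

def forward_shapes(window_s=WINDOW_S):
    """Return (fused input shape, output shape) for one temporal window."""
    t_in = window_s * FEATURE_RATE_HZ              # 200 timesteps at 2 Hz
    fused = (t_in, D_MODEL)                        # concatenated tri-modal stream
    t_out = window_s * FMRI_RATE_HZ                # decimated to 1 Hz
    output = (t_out, N_VERTICES + N_SUBCORTICAL)   # Subject Block projection
    return fused, output

fused, output = forward_shapes()
print(fused)   # (200, 1152)
print(output)  # (100, 29286)
```

The two printed shapes make the decimation step concrete: 200 fused feature frames at 2 Hz collapse to 100 predicted BOLD frames at 1 Hz, each covering all 29,286 cortical and subcortical targets.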
Training & Performance
Trained on 451.6 hours of fMRI from 25 subjects across 4 naturalistic studies — subjects watching movies, listening to podcasts, and viewing silent videos. Evaluated on 1,117.7 hours from 720 subjects.
Zero-shot generalization to new subjects achieves a group correlation near r ≈ 0.4 on HCP 7T — a two-fold improvement over the group-predictivity of individual subjects. Fine-tuning with ≤1 hour of a subject's data yields a 2–4x improvement over linear baselines, and performance follows log-linear scaling with training data, with no plateau yet observed.
The predecessor TRIBE v1 won the Algonauts 2025 competition, placing first among 263 teams. TRIBE v2 extends this with tri-modal input and zero-shot subject generalization.
In-Silico Validation
The model recovers classic neuroscience landmarks with no explicit supervision:
Fusiform Face Area (FFA)
Face-selective region — lights up for faces vs. objects
Parahippocampal Place Area (PPA)
Scene-selective region — responds to places and environments
Temporo-Parietal Junction (TPJ)
Theory of mind — social cognition and emotional processing
Broca's Area
Syntax processing — activated by linguistic structure
Independent component analysis (ICA) on the final Transformer layer reveals five emergent functional networks — primary auditory, language, motion, default mode, and visual — mirroring the brain's own functional architecture without being trained to produce it.
Part II: The Bridge to Neuromarketing
NeuroPeer's Application Layer
TRIBE v2 provides the cortical prediction substrate. Everything below is NeuroPeer's downstream interpretation — mapping raw neural activations to marketing-relevant metrics.
The global neuromarketing market reached $1.74B in 2024 with 9.2% CAGR through 2032. Traditional methods — EEG headsets, eye-trackers, GSR sensors — require physical participants, specialized labs, and weeks of lead time. NeuroPeer replaces all of this with a single API call.
The scientific basis: small-sample brain activity reliably predicts population-level outcomes. Falk et al. (2012) showed that neural responses from just 30 people predict campaign success at scale. Genevsky et al. (2025) demonstrated that NAcc-based affect signals generalize across demographics while behavioral self-reports do not. TRIBE v2's in-silico brain serves as an idealized “neural focus group” — free from the noise of real scanners, wandering thoughts, and individual variability.
The Metric Pipeline
TRIBE v2 outputs 20,484 vertex activations per second. NeuroPeer aggregates these into interpretable neuromarketing metrics through three stages:
Raw vertices are grouped into anatomical regions using the Schaefer-1000 atlas on the fsaverage5 surface. This maps 20,484 vertices onto ~1,000 cortical parcels with known functional labels (visual, auditory, default mode, frontoparietal, etc.).
Parcels are grouped into functional regions of interest (ROIs) based on published neuroscience: Ventral Striatum for reward, Amygdala for emotion, Hippocampus for memory, Dorsal Attention Network for sustained focus, Default Mode Network for disengagement, etc. Mean activation per ROI is computed per second.
ROI time-series are transformed into 18+ marketing-aligned metrics using temporal windowing, peak detection, and cross-regional correlation. For example: Hook Score = NAcc activation in the first 3 seconds; Memory Encoding = hippocampal peak near brand moments; Mind Wandering = default mode network activation during content.
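The three stages above can be sketched in a few lines. This is a simplified illustration under stated assumptions — the region labels, the brand_seconds input, and the exact formulas (mean over the first 3 seconds, peak within ±2 seconds of a brand moment) stand in for NeuroPeer's actual windowing and peak-detection logic:

```python
from collections import defaultdict

def roi_mean(vertex_values, vertex_to_roi):
    """Stages 1-2: average per-vertex activations into labeled regions."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, roi in zip(vertex_values, vertex_to_roi):
        sums[roi] += value
        counts[roi] += 1
    return {r: sums[r] / counts[r] for r in sums}

def hook_score(nacc_series, first_s=3):
    """Stage 3: mean NAcc activation over the opening seconds (1 Hz samples)."""
    return sum(nacc_series[:first_s]) / first_s

def memory_encoding(hipp_series, brand_seconds, window=2):
    """Stage 3: hippocampal peak within +/- `window` s of any brand moment."""
    hits = [hipp_series[t]
            for s in brand_seconds
            for t in range(max(0, s - window),
                           min(len(hipp_series), s + window + 1))]
    return max(hits) if hits else 0.0

def mind_wandering(dmn_series):
    """Stage 3: mean DMN activation across the clip (higher = disengaged)."""
    return sum(dmn_series) / len(dmn_series)

# Toy example: one 1 Hz frame of six "vertices" in two regions,
# then 10-second ROI traces in z-scored units.
frame = [0.8, 1.2, 0.4, -0.1, 0.3, 0.0]
labels = ["NAcc", "NAcc", "DMN", "DMN", "NAcc", "DMN"]
print(roi_mean(frame, labels))

nacc = [0.8, 1.2, 0.9, 0.1, 0.0, 0.2, 0.1, 0.3, 0.2, 0.1]
hipp = [0.0, 0.1, 0.2, 0.1, 1.4, 0.3, 0.2, 0.1, 0.0, 0.1]
print(hook_score(nacc))                          # mean of first 3 s of NAcc
print(memory_encoding(hipp, brand_seconds=[4]))  # brand moment at t = 4 s
```

The same aggregate-then-window pattern extends to every metric in the table below: pick an ROI trace, apply a temporal rule, and report a scalar per clip.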
Brain Region → Metric Mapping
NeuroPeer's application-layer interpretation · not from the TRIBE v2 paper
Ventral Striatum (NAcc)
Hook Score, Reward Prediction — Approach motivation and reward anticipation. Activation in the first 3 seconds predicts scroll-stop behavior.
Tong et al. 2020, PNAS
Anterior Insula (AIns)
Valence Detection, Avoidance Signal — Negative affect and avoidance drive. Inverse signal — high AIns = the viewer wants to leave.
Genevsky et al. 2025, PNAS Nexus
Dorsal Attention Network
Sustained Attention — Top-down attentional control. Tracks whether the viewer is actively engaged or passively watching.
Hasson et al. 2004, Science
Hippocampal Formation
Memory Encoding — Long-term memory consolidation. Predicts brand recall 24–72 hours post-exposure.
Wagner et al. 1998, Science
Default Mode Network
Mind Wandering (inverse) — Self-referential thought and disengagement. High DMN during content = the viewer has checked out.
Christoff et al. 2009, PNAS
Amygdala + Limbic System
Emotional Arousal — Affective intensity regardless of valence. Emotional peaks enhance memory consolidation and sharing.
Nummenmaa et al. 2012, PNAS
Broca's + Wernicke's Areas
Message Clarity — Language comprehension load. High activation with low cognitive load = clear message delivery.
Hickok & Poeppel 2007, Nat Rev Neuro
mPFC + Orbitofrontal Cortex
Aesthetic Quality — Beauty judgment and subjective value computation. Correlates with perceived production quality.
Vartanian & Skov 2014, Neuropsychologia
Temporo-Parietal Junction
Re-engagement, Social Cognition — Theory of mind and attentional reorienting. Fires at narrative twists, pattern interrupts, and social cues.
Corbetta & Shulman 2002, Nat Rev Neuro
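One way to encode the mapping above is as a plain configuration table. The key names and the inverse flag below are hypothetical illustrations of NeuroPeer's application layer, not anything defined by TRIBE v2; inverse marks regions where higher activation signals a worse outcome (anterior insula, default mode network):

```python
# The region -> metric table above as a configuration dict.
# ROI keys and metric names are illustrative, not a published schema.
ROI_METRICS = {
    "ventral_striatum": {"metrics": ["hook_score", "reward_prediction"], "inverse": False},
    "anterior_insula":  {"metrics": ["valence_detection", "avoidance_signal"], "inverse": True},
    "dorsal_attention": {"metrics": ["sustained_attention"], "inverse": False},
    "hippocampus":      {"metrics": ["memory_encoding"], "inverse": False},
    "default_mode":     {"metrics": ["mind_wandering"], "inverse": True},
    "amygdala_limbic":  {"metrics": ["emotional_arousal"], "inverse": False},
    "broca_wernicke":   {"metrics": ["message_clarity"], "inverse": False},
    "mpfc_ofc":         {"metrics": ["aesthetic_quality"], "inverse": False},
    "tpj":              {"metrics": ["re_engagement", "social_cognition"], "inverse": False},
}

def metrics_for(roi):
    """Look up the marketing metrics derived from a given ROI."""
    return ROI_METRICS.get(roi, {"metrics": []})["metrics"]
```

Keeping the interpretation layer in data rather than code makes it easy to audit each mapping against its cited paper and to revise a single ROI without touching the pipeline.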
Part III: The Mission
Why This Matters
Every piece of content is a neural experiment. Most marketers run it blind.
A 30-second pre-roll ad triggers millions of neural computations — visual salience in V1, face recognition in the fusiform gyrus, reward anticipation in the nucleus accumbens, narrative comprehension in the temporal pole, memory encoding in the hippocampus. Each of these happens in the first few seconds, before any click, like, or comment.
Traditional A/B testing measures the output (did they click?) but not the process (why did their brain decide to click?). NeuroPeer provides the process. When your Hook Score is 42 but your Emotional Resonance is 88, you know the problem isn't the content — it's the first 3 seconds. That's a fundamentally different insight than “CTR was low.”
The goal is not to replace human creativity. It's to give creators a neural mirror — a way to see how brains will respond before a single viewer watches. Iterate on the neural response, not the engagement metrics. The engagement follows.
What NeuroPeer Does NOT Do
We do not claim to read individual minds. TRIBE v2 predicts population-average neural responses — an idealized brain, not your brain.
We do not replace real neuroscience research. The metric mappings are grounded in published literature but are interpretive, not diagnostic.
We do not guarantee marketing outcomes. Neural predictions correlate with engagement but are one signal among many — creative strategy, distribution, timing, and audience all matter.
TRIBE v2 is licensed CC BY-NC 4.0. Commercial use of the underlying model requires a licensing agreement with Meta FAIR.
Part IV: Research Foundation
The Papers Behind the Metrics
d'Ascoli, Rapin et al. (2026) — Meta FAIR
“TRIBE v2 predicts fMRI across video/audio/text with zero-shot generalization. 451.6h training data, 20,484 cortical vertices, 1 Hz resolution.”
→ Core inference engine powering all NeuroPeer predictions.
Tong, Mondloch, Bhatt et al. (2020) — PNAS
“NAcc + anterior insula activation at video onset predicts YouTube view frequency at population scale.”
→ Hook Score — first 3 seconds predict scroll-stop behavior.
Genevsky, Yoon & Knutson (2025) — PNAS Nexus
“NAcc-based affect signals generalize across demographics; behavioral self-reports do not.”
→ Neural signals outperform surveys — affect is universal.
Chan, Hiaeshutter-Rice et al. (2024) — Journal of Marketing Research
“Emotion and memory encoding are the earliest neural predictors of ad liking, preceding behavioral responses.”
→ Emotional Resonance and Memory Encoding weighted heavily in the first 10 seconds.
Falk, Berkman & Lieberman (2012) — Psychological Science
“Small-sample brain activity (n=30) predicts population-level media sharing and campaign success.”
→ Validates in-silico prediction of population-level engagement from neural data.
Hasson, Nir, Levy et al. (2004) — Science 303(5664)
“Intersubject synchronization of cortical activity during natural vision correlates with narrative engagement.”
→ Sustained Attention metric — cortical synchrony over time.
Itti & Koch (2001) — Nature Reviews Neuroscience
“Computational saliency maps predict visual attention allocation and fixation patterns.”
→ Visual Hook Strength — saliency-driven attention in first frames.
Wagner, Schacter et al. (1998) — Science 281(5380)
“Hippocampal activity during encoding predicts subsequent memory — the 'subsequent memory effect'.”
→ Memory Encoding metric predicts brand recall.
Nummenmaa, Glerean et al. (2012) — PNAS 109(23)
“Emotions promote social interaction by synchronizing brain activity across individuals.”
→ Emotional Resonance — synchronized limbic activation predicts sharing.
Christoff, Gordon et al. (2009) — PNAS 106(21)
“Default mode network activation during task indicates mind wandering.”
→ Drop-off Risk — DMN activation signals disengagement.
Ready to see what your content does to a brain?
Run a Neural Analysis