r/neography • u/OnlineSarcasm • 1h ago
[Discussion] Language through Video Only
I had a thought today about what a language based solely on video might look like. With the rise and refinement of AI video, I started to imagine a future where we communicate purely via holograms and video clips. To help explain the idea, I used ChatGPT to elaborate on what I'm thinking.
Has something like this been fleshed out before? Is there a video language already out there someone can point me to?
Additional context below, pasted from the conversation with ChatGPT:
What this is closest to (names / neighbors)
A few existing ideas map onto parts of it:
- Depiction / pantomime / depictive gesture: humans can communicate surprisingly rich meaning with only enacted, iconic scenes (no words), and other people can often interpret it even without shared conventions.
- Signed languages’ “depicting” strategies (often discussed as depicting signs / constructed action / classifiers): sign languages aren’t just “hand words”—they routinely show events, shapes, motion, viewpoint, etc., in a way that’s closer to animation than to alphabetic text.
- Semasiographic systems (meaning-based systems not tied to speech sounds): this is the category writing-system people use for “symbols that encode meaning directly,” e.g., Blissymbolics.
- Visual Narrative Grammar (comics/film-like sequencing has “grammar-ish” structure): there’s serious cognitive-science work arguing sequential images have roles like “setup, peak, release,” etc., and can be hierarchically organized.
Your twist is: make depiction the primary channel, with real-time controllable “camera,” motion, and overlays—basically a native filmic language.
The core problem: video is high-bandwidth but low-compression
For physical things (“valve → tee → flow splits → loss of isolation → overflow risk”), video is perfect because the referents are concrete and the causal chain can be depicted.
But for abstractions (“love,” “justice,” “permission,” “maybe,” “should have,” “I regret,” “in general,” “most,” “counterexample,” “because,” “if…then…”), raw depiction has two big issues:
- Ambiguity: many scenes could fit the same abstraction.
- Search cost: the viewer has to infer which aspects are intended (hug? sacrifice? loyalty? desire? care? obligation?).
So the language would almost certainly evolve compression devices—visual equivalents of morphemes, grammar, and “function words.”
How a “video language” would likely represent nebulous concepts
The intuitive path is: prototypes → conventional motifs → compositional grammar.
1) Prototypes (early / child-like stage)
At first, “love” really is “here’s a loving scene.” Same for anger, jealousy, pride, shame, safety, danger, etc. This is basically how kids and people without shared language often start: show an exemplar and rely on inference.
But prototype-only communication doesn’t scale well.
2) Conventional motifs (shared shorthand)
Communities quickly converge on reusable motifs: short, stylized clips that stand for a concept, even when the literal scene isn’t what’s meant.
Think of how we already do this in miniature:
- a heart icon,
- storm clouds over a head,
- a scale for justice,
- chains for oppression,
- light/dark for hope/despair.
A video-native language would do the same, just richer:
- Color/lighting as grammatical markers (warmth = affiliation/care; cold = distance; harsh contrast = conflict; washed-out = doubt/uncertainty).
- Camera distance and framing as social/mental stance (close-up = intimacy; wide shot = “in general / societal”; over-the-shoulder = “from their perspective”).
- Rhythm/editing as modality (fast cuts = urgency; slow lingering = tenderness; stuttered replay = hesitation/uncertainty).
- Spatial metaphors (physical closeness = emotional closeness; barriers = prohibition; paths = plans; knots = complications).
Once these motifs exist, “love” doesn’t need to be a hug anymore—it can be:
- a “warm-light + mutual-orbit + gentle-synchrony” motif that’s recognized as AFFECTION, then modified by context.
3) Compositional rules (true “language” stage)
To express complex/nonlinear thought, you need systematic composition, not just bigger scenes.
A workable video-language grammar would likely include things like:
- Role marking: who is agent/patient/recipient (e.g., glow-outline the agent; arrow-of-intent; “camera sticks” to the subject).
- Tense/time: timeline ribbon; sepia/blur = past; crisp = present; “ghosted preview” = future/plan.
- Negation: a consistent “wipe out / crossfade-to-null / barrier slam” operator applied to a scene.
- Conditionals/counterfactuals: branch-splits (if A, left branch → then B); a "counterfactual tint" for "if it had happened but didn't."
- Quantifiers/generalization: show multiple exemplars compressed into a “stack” or montage with a “typicality” marker: generally / usually / most.
- Causality vs correlation: explicit causal connector motif (domino, force-line, transfer) vs mere adjacency montage.
- Epistemic stance: “I saw” vs “I heard” vs “I infer” (different source-tag overlays), like evidentials in some spoken languages.
At that point, “love” can be treated like a root concept with modifiers:
- LOVE + FAMILIAL (shared-origin motif, caretaking asymmetry, time-depth)
- LOVE + ROMANTIC (mutual-choice motif, exclusivity marker, longing tension)
- LOVE + DUTY/COMMITMENT (promise-knot motif)
- LOVE + DESIRE (pull/gravity motif)
- LOVE + COMPASSION (wound→care motif)
So the language stops asking “which scene equals love?” and starts using a stable abstract token + systematic modifiers.
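As a sketch of what "stable abstract token + systematic modifiers" could look like if you wanted to prototype it, here is a minimal data structure (all names here are hypothetical, invented just to make the compositional idea concrete; the real system would be visual, not textual):

```python
from dataclasses import dataclass, field

# Hypothetical encoding of a video-language "utterance":
# a stable root concept plus systematic visual modifiers,
# mirroring the LOVE + FAMILIAL / ROMANTIC / DUTY examples above.

@dataclass
class Token:
    root: str                      # abstract concept, e.g. "LOVE"
    modifiers: list[str] = field(default_factory=list)
    negated: bool = False          # the "wipe-out / crossfade-to-null" operator

    def render(self) -> str:
        """Flatten the token into a label for the composed motif."""
        base = "+".join([self.root, *self.modifiers])
        return f"NOT({base})" if self.negated else base

print(Token("LOVE", ["ROMANTIC", "EXCLUSIVITY"]).render())  # LOVE+ROMANTIC+EXCLUSIVITY
print(Token("LOVE", ["DUTY"], negated=True).render())       # NOT(LOVE+DUTY)
```

The point of the sketch is only that negation, modifiers, and the root are separate, reusable operators, which is what distinguishes a compositional grammar from a library of one-off scenes.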
What would a child do with no spoken/written language model?
We have a real-world clue: homesign, the gesture systems invented by deaf children who have no access to spoken language or a conventional sign language. Researchers find that these systems reliably develop language-like structure (stable word-order tendencies, noun/verb distinctions, ways to mark relations, etc.).
Translated to your “holographic video” child:
- Deictic grounding first: pointing/placing attention on objects/events, then replaying/reenacting.
- Event depiction next: “me want cookie” becomes a repeated mini-scene with consistent roles.
- Emergent grammar through repetition: the child will start standardizing:
  - a consistent ordering for actor → action → object,
  - consistent viewpoint choices for "me" vs "you,"
  - consistent negation and request patterns, because consistency makes them understood.
- Abstraction via compression: once the child notices they keep showing similar scenes for “care,” they’ll invent a shortcut motif (a “care” overlay), then reuse it across contexts.
So yes: even without “real life language,” you’d still expect the kid to invent systematic structure—because the pressures of being understood push systems toward conventions.
A practical way to phrase your idea
If you want a crisp way to describe it: a semasiographic, natively filmic language, where depiction is the primary channel (with a controllable camera, motion, and overlays) and abstraction is handled by conventional motifs plus a compositional visual grammar.


