Avatar Studio¶
Skill ID: ivx-avatar-studio
name: ivx-avatar-studio
description: >-
  AI-powered avatar creation from a single photo: one-shot 3D head reconstruction,
  expression-driven reenactment, real-time facial animation, and interactive
  conversation with TTS. Based on GAGAvatar (3D Gaussian Splatting). Use when the
  user says "create avatar", "avatar from photo", "3D avatar", "face avatar",
  "avatar studio", "facial reenactment", "expression animation", "talking avatar",
  "avatar chat", "photo to avatar", "real-time avatar", or needs AI-generated
  avatars for games, social apps, or metaverse.
version: "1.0.0"
author: "IntelliVerse-X team@intelli-verse-x.ai"
allowed-tools:
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - Shell
Overview¶
The Avatar Studio creates expressive 3D avatars from a single photo using GAGAvatar (Generalizable and Animatable Gaussian head Avatar), a 3D Gaussian Splatting (3DGS) method. It supports real-time facial reenactment, expression-driven animation, and interactive conversations with TTS, enabling custom player avatars, NPC portraits, streamer avatars, and virtual hosts.
                 Single Photo
                       │
                       ▼
┌── Content-Factory / GAGAvatar ──────────────────────┐
│ 3DGS Reconstruction → Appearance Normalization      │
│ → Expression Transfer → Real-time Rendering         │
│ → TTS Integration → Interactive Chat                │
└──────────────────────┬──────────────────────────────┘
                       │
           ┌───────────┼───────────┐
           ▼           ▼           ▼
       3D Avatar    Animated  Interactive
         Model     Sequences Conversation
        (3DGS)     (MP4/GIF)  (WebSocket)
Content-Factory API¶
Create Avatar from Photo¶
curl -X POST http://localhost:8001/pipelines/avatar_create \
-H "Content-Type: application/json" \
-d '{
"source_photo": "https://example.com/photo.jpg",
"avatar_name": "PlayerAvatar",
"style": "realistic",
"generate_expressions": true,
"ivx_export": true
}'
Drive Avatar with Expressions¶
curl -X POST http://localhost:8001/pipelines/avatar_animate \
-H "Content-Type: application/json" \
-d '{
"avatar_id": "PlayerAvatar",
"driving_video": "driving_expressions.mp4",
"output_format": "mp4"
}'
Interactive Conversation¶
curl -X POST http://localhost:8001/pipelines/avatar_chat \
-H "Content-Type: application/json" \
-d '{
"avatar_id": "PlayerAvatar",
"message": "Welcome to Brain Quest! Ready for a challenge?",
"voice_id": "elli_friendly",
"emotion": "excited"
}'
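The same pipelines can be called from Python. A minimal client sketch using only the standard library, assuming the Content-Factory host and pipeline routes shown in the curl examples above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # assumed Content-Factory host from the examples above

def build_request(pipeline: str, payload: dict,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for a Content-Factory pipeline endpoint."""
    return urllib.request.Request(
        f"{base_url}/pipelines/{pipeline}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_pipeline(pipeline: str, payload: dict) -> dict:
    """POST the payload and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(pipeline, payload)) as resp:
        return json.load(resp)
```

For example, `call_pipeline("avatar_create", {"source_photo": "...", "avatar_name": "PlayerAvatar"})` mirrors the first curl call. The response shape is whatever the running Content-Factory instance returns.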
Generated Output¶
Avatar Package¶
avatars/PlayerAvatar/
├── model/
│ ├── avatar.ply # 3D Gaussian Splatting model
│ ├── appearance_volume.npz # Tri-plane appearance features
│ └── metadata.json # Generation parameters
├── expressions/
│ ├── neutral.png # Rendered expression frames
│ ├── happy.png
│ ├── sad.png
│ ├── angry.png
│ ├── surprised.png
│ └── thinking.png
├── animations/
│ ├── greeting.mp4 # Pre-rendered animation sequences
│ ├── nodding.mp4
│ ├── laughing.mp4
│ └── talking_idle.mp4
├── interactive/
│ ├── chat_config.json # TTS + expression mapping config
│ └── emotion_map.json # Text emotion → facial expression mapping
└── ivx/
├── avatar_meta.json # SDK-compliant avatar metadata
├── expression_sheet.png # 3×2 grid (matches character expression format)
└── thumbnail.png # 512×512 avatar portrait
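A generated package can be sanity-checked against the layout above before shipping it to a client. A sketch, with the expected paths taken from the tree shown here:

```python
from pathlib import Path

# Core files every avatar package should contain, per the layout above.
REQUIRED_FILES = [
    "model/avatar.ply",
    "model/appearance_volume.npz",
    "model/metadata.json",
    "interactive/chat_config.json",
    "interactive/emotion_map.json",
    "ivx/avatar_meta.json",
    "ivx/expression_sheet.png",
    "ivx/thumbnail.png",
]
EXPRESSIONS = ["neutral", "happy", "sad", "angry", "surprised", "thinking"]

def missing_files(avatar_root: str) -> list[str]:
    """Return package-relative paths that are absent from the avatar directory."""
    root = Path(avatar_root)
    expected = REQUIRED_FILES + [f"expressions/{e}.png" for e in EXPRESSIONS]
    return [p for p in expected if not (root / p).is_file()]
```

An empty result means the package passes the layout check; the pre-rendered animations are optional here since their set may vary per avatar.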
Avatar Metadata (ivx/avatar_meta.json)¶
{
"id": "PlayerAvatar",
"display_name": "Player Avatar",
"type": "3dgs_avatar",
"source": "single_photo",
"model": {
"file": "model/avatar.ply",
"format": "3dgs",
"gaussian_count": 50000,
"appearance_volume": "model/appearance_volume.npz"
},
"expressions": {
"neutral": { "file": "expressions/neutral.png", "blendshape_weights": {} },
"happy": { "file": "expressions/happy.png", "blendshape_weights": {"mouthSmile": 0.8, "cheekPuff": 0.3} },
"sad": { "file": "expressions/sad.png", "blendshape_weights": {"mouthFrown": 0.7, "browDown": 0.5} },
"angry": { "file": "expressions/angry.png", "blendshape_weights": {"browDown": 0.9, "jawClench": 0.6} },
"surprised": { "file": "expressions/surprised.png", "blendshape_weights": {"mouthOpen": 0.8, "browUp": 0.9} },
"thinking": { "file": "expressions/thinking.png", "blendshape_weights": {"browUp": 0.3, "eyeSquint": 0.4} }
},
"animations": {
"greeting": { "file": "animations/greeting.mp4", "duration_sec": 2.0, "loop": false },
"nodding": { "file": "animations/nodding.mp4", "duration_sec": 1.5, "loop": true },
"laughing": { "file": "animations/laughing.mp4", "duration_sec": 2.5, "loop": false },
"talking_idle": { "file": "animations/talking_idle.mp4", "duration_sec": 3.0, "loop": true }
},
"voice": {
"provider": "elevenlabs",
"voice_id": "elli_friendly",
"language": "en"
},
"bounds": { "center": [0, 0, 0], "radius": 0.3 },
"tags": ["avatar", "3dgs", "interactive"]
}
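At runtime, a client resolves an emotion to an expression asset via this metadata. A sketch, assuming the paths in `avatar_meta.json` are relative to the package root (the parent of the `ivx/` directory, as in the layout above):

```python
import json
from pathlib import Path

def resolve_expression(meta_path: str, emotion: str,
                       fallback: str = "neutral") -> Path:
    """Look up an emotion in avatar_meta.json and return its expression file path.

    Falls back to the neutral expression for unknown emotions."""
    meta_file = Path(meta_path)
    meta = json.loads(meta_file.read_text())
    expr = meta["expressions"].get(emotion) or meta["expressions"][fallback]
    # Assumption: file paths are package-root relative (ivx/ is one level down).
    return meta_file.parent.parent / expr["file"]
```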
Use Cases¶
1. Custom Player Avatars¶
Players upload a selfie → get a 3D avatar that reacts to gameplay:
- Correct answer → happy expression
- Wrong answer → sad expression
- Victory → celebration animation
- Defeat → disappointed animation
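The event-to-reaction mapping can be a small lookup table. A sketch; the event names and the "celebration"/"disappointed" animation names are illustrative placeholders, while the expression names come from the preset list in this document:

```python
# Hypothetical gameplay-event → avatar-reaction table.
EVENT_REACTIONS = {
    "answer_correct": {"expression": "happy"},
    "answer_wrong":   {"expression": "sad"},
    "victory":        {"animation": "celebration"},
    "defeat":         {"animation": "disappointed"},
}

def react(event: str) -> dict:
    """Return the reaction payload for a gameplay event (neutral if unknown)."""
    return EVENT_REACTIONS.get(event, {"expression": "neutral"})
```

The returned dict can be forwarded to the avatar renderer or to the avatar_animate pipeline as-is.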
2. AI Game Host / NPC¶
Create an AI host character from concept art:
- Real-time lip sync during TTS narration
- Emotion-appropriate expressions based on game context
- Interactive Q&A with players
3. Streamer / Content Creator Avatars¶
Create an avatar from a streamer's photo:
- Real-time facial tracking from webcam
- Expression transfer for live streaming
- Green-screen ready output
4. Social / Metaverse Profiles¶
User avatar for social features:
- Profile picture variants (6 expressions)
- Animated greeting for friend requests
- In-game emotes using expression presets
Expression-Driven Animation¶
The avatar supports real-time expression transfer via ARKit blendshape coefficients:
EXPRESSION_BLENDSHAPES = {
"happy": {"mouthSmile": 0.8, "cheekPuff": 0.3},
"sad": {"mouthFrown": 0.7, "browDown": 0.5},
"angry": {"browDown": 0.9, "jawClench": 0.6},
"surprised": {"mouthOpen": 0.8, "browUp": 0.9},
"thinking": {"browUp": 0.3, "eyeSquint": 0.4},
"excited": {"mouthSmile": 0.9, "browUp": 0.6, "cheekPuff": 0.4},
}
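Presets like these can be linearly interpolated to transition smoothly from one expression to another (the same idea backs crossfade fallbacks on low-end devices). A minimal sketch:

```python
def blend_expressions(a: dict, b: dict, t: float) -> dict:
    """Linearly interpolate blendshape weights from preset a to preset b.

    t in [0, 1]: 0 returns a's weights, 1 returns b's.
    Coefficients missing from a preset are treated as 0.0."""
    keys = set(a) | set(b)
    return {k: (1.0 - t) * a.get(k, 0.0) + t * b.get(k, 0.0) for k in keys}
```

For example, `blend_expressions(EXPRESSION_BLENDSHAPES["happy"], EXPRESSION_BLENDSHAPES["sad"], 0.5)` yields the halfway frame of a happy-to-sad transition.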
Driving Sources¶
| Source | Method | Real-time? |
|---|---|---|
| Webcam | ARKit/MediaPipe face tracking | Yes |
| Video | Pre-recorded expression video | No |
| Text | LLM emotion detection → expression map | Yes |
| Game Events | Event → emotion → expression | Yes |
Interactive Conversation Pipeline¶
User Text Input
        │
        ▼
LLM (emotion detection + response)
        ├── response_text: "Great answer!"
        └── emotion: "excited"
        │
        ▼
ElevenLabs TTS (audio)  +  GAGAvatar (visemes + expression), in parallel
        │
        ▼
Synchronized talking avatar video/stream
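The pipeline above reduces to one orchestration function per conversation turn. A sketch with the LLM, TTS, and renderer injected as callables (the concrete providers are configured separately, so stand-ins are used here):

```python
from typing import Callable

def chat_turn(
    user_text: str,
    llm: Callable[[str], dict],              # user text -> {"text": ..., "emotion": ...}
    tts: Callable[[str], bytes],             # response text -> audio bytes
    animate: Callable[[str, bytes], object], # (emotion, audio) -> video/stream handle
):
    """Run one conversation turn: LLM reply + emotion, then TTS and avatar rendering."""
    reply = llm(user_text)
    audio = tts(reply["text"])
    video = animate(reply["emotion"], audio)
    return reply, audio, video
```

In production the three callables would wrap the configured LLM provider, ElevenLabs TTS, and the GAGAvatar renderer; the signature stays the same.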
Chat Configuration¶
{
"avatar_id": "PlayerAvatar",
"llm_provider": "openai",
"llm_model": "gpt-4o",
"tts_provider": "elevenlabs",
"voice_id": "elli_friendly",
"emotion_detection": true,
"response_format": "audio_video",
"max_response_length": 150,
"personality": "Friendly quiz host who encourages players and celebrates their knowledge"
}
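Before starting a session it is worth validating the chat configuration. A sketch; the required-key set is inferred from the example configuration above and should be treated as an assumption:

```python
# Keys a session cannot start without, inferred from the example config above.
REQUIRED_CHAT_KEYS = {
    "avatar_id", "llm_provider", "llm_model",
    "tts_provider", "voice_id", "response_format",
}

def validate_chat_config(cfg: dict) -> list[str]:
    """Return a sorted list of required keys missing from the config."""
    return sorted(REQUIRED_CHAT_KEYS - cfg.keys())
```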
Platform Notes¶
VR¶
- 3DGS rendering via custom VR shader for stereo output
- Head tracking drives gaze direction
- Spatial audio for voice output
Mobile¶
- Pre-rendered expression frames (static images) for low-end devices
- Real-time mode requires GPU with 3DGS support (high-end only)
- Fallback: 2D expression sheet with crossfade transitions
WebGL¶
- WebGPU required for real-time 3DGS rendering
- Fallback: server-side rendering streamed as video
- Pre-rendered animations for broad compatibility
Console¶
- Full 3DGS rendering on PS5/Xbox Series X
- Switch: pre-rendered fallback
Engine Integration¶
| Engine | 3DGS Rendering | Expression Control | Voice Integration |
|---|---|---|---|
| Unity | Custom renderer (compute shader) | BlendShape API | AudioSource + lip sync |
| Unreal | Niagara particle system | Morph Targets | MetaSound + visemes |
| Godot | Custom shader (particles) | BlendShape tracks | AudioStreamPlayer |
| Web | Three.js + WebGPU | JavaScript blendshape driver | Web Audio API |
Checklist¶
- Content-Factory API accessible
- Source photo provided (frontal, well-lit, neutral expression)
- Avatar created with
ivx_export: true -
avatar_meta.jsongenerated - Expression sheet (3×2 grid) matches SDK format
- Expression PNGs rendered for all 6 emotions
- Pre-rendered animations generated (greeting, nodding, laughing, talking_idle)
- Thumbnail generated (512×512)
- Interactive chat configured (LLM + TTS)
- Voice model selected and tested
- Emotion detection working (text → expression)
- Real-time rendering tested on target platform
- VR: stereo rendering verified
- Mobile: fallback mode tested on low-end devices
- WebGL: WebGPU path or video fallback working
- Console: performance budget verified