Layering & Mixing

Professional voice mixing capabilities coming soon

Coming Soon

Layering and mixing capabilities are currently under development. This feature will allow you to create dialogue scenes, mix voice with music and effects, and produce broadcast-ready audio with professional mixing techniques.

Overview

Professional voice production involves more than recording: it also requires skillful mixing of dialogue, music, and sound effects, plus careful processing. Wubble's layering and mixing capabilities help you create polished, production-ready audio by combining voice with music and effects, applying professional processing, and targeting broadcast loudness standards.

What You Can Create

Multi-character dialogue scenes with natural timing and positioning
Voiceover mixed with background music and sound effects
Podcast episodes with intro/outro music, interviews, and production elements
Broadcast-ready audio meeting professional loudness standards
Audiobook productions with chapter breaks and consistent quality

Why Professional Mixing Matters

Voice Intelligibility

Proper mixing ensures voice remains clear and intelligible even when combined with music and effects. Poor mixing makes content frustrating to listen to.

Professional Sound Quality

Well-mixed audio sounds polished and professional. Listeners tend to associate production quality with the credibility and value of the content.

Broadcast Compliance

Professional platforms have loudness and quality standards. Proper mixing ensures your content meets specifications without manual adjustment.

Listener Experience

Consistent levels, balanced frequency response, and proper dynamics create a pleasant listening experience that keeps audiences engaged.

Dialogue Scene Creation

Create multi-character dialogue scenes with natural conversational flow, realistic timing, and spatial positioning. Perfect for podcast interviews, game dialogue, audio drama, animation, and any content featuring multiple speakers.

How It Works

Wubble's dialogue scene creation handles the complexity of multi-speaker audio:

1. Character Voice Generation

Generates or uses provided voices for each character, keeping each voice's identity consistent throughout the scene

2. Natural Timing

Adds realistic pauses between speakers, natural breath points, and conversational overlaps when appropriate

3. Spatial Positioning

Positions voices in the stereo field to create spatial separation and help listeners distinguish speakers

4. Environmental Context

Applies appropriate room ambience and acoustic characteristics to match the scene setting
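
The natural timing step above can be pictured as a simple scheduling pass over the dialogue lines. The following is a minimal sketch, not Wubble's implementation: the `DialogueLine` type, the `schedule` function, and the pause lengths are all hypothetical, chosen only to illustrate inserting a longer gap when the speaker changes and a shorter one when the same speaker continues.

```python
from dataclasses import dataclass


@dataclass
class DialogueLine:
    speaker: str
    duration: float  # seconds of rendered speech for this line (assumed known)


def schedule(lines, same_speaker_pause=0.25, turn_pause=0.6):
    """Return (speaker, start, end) tuples with a pause inserted between lines.

    The pause values are illustrative defaults, not product settings.
    """
    timeline = []
    cursor = 0.0
    previous = None
    for line in lines:
        if previous is not None:
            # Longer gap on a speaker change, shorter when one speaker continues.
            cursor += turn_pause if line.speaker != previous else same_speaker_pause
        start = cursor
        cursor = start + line.duration
        timeline.append((line.speaker, round(start, 3), round(cursor, 3)))
        previous = line.speaker
    return timeline


if __name__ == "__main__":
    scene = [DialogueLine("A", 2.0), DialogueLine("B", 1.5), DialogueLine("B", 1.0)]
    for speaker, start, end in schedule(scene):
        print(f"{speaker}: {start:.2f}s to {end:.2f}s")
```

A real system would also need breath points and overlaps, which this sketch leaves out to keep the timing idea visible.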

Scene Configuration Options

Natural Timing

Automatic pause insertion between speakers, realistic reaction times, and conversational flow patterns

Spatial Positioning

Position each character in the stereo field (left, center, right) for clear separation and an immersive experience

Room Ambience

Choose from various acoustic environments: studio, living room, outdoor, car, office, large hall, or intimate space

Emotional Dynamics

Characters can have emotional arcs throughout the scene, with delivery adapting to evolving emotional states

🎭

Character Differentiation

Use a distinct voice profile for each character. Spatial positioning and voice characteristics together help listeners track who's speaking.
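
Placing a voice left, center, or right in the stereo field is commonly done with a constant-power pan law, which keeps perceived loudness roughly steady as a source moves. The sketch below assumes that technique; the source does not say which pan law Wubble uses, and `pan_gains` and `place` are hypothetical helper names.

```python
import math


def pan_gains(position):
    """Constant-power pan law.

    position: -1.0 (hard left) through 0.0 (center) to 1.0 (hard right).
    Returns (left_gain, right_gain) with left**2 + right**2 == 1.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be within [-1.0, 1.0]")
    angle = (position + 1.0) * math.pi / 4.0  # maps position to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def place(mono_samples, position):
    """Turn mono samples into (left, right) pairs at the given position."""
    left, right = pan_gains(position)
    return [(sample * left, sample * right) for sample in mono_samples]


if __name__ == "__main__":
    # Hypothetical two-character scene: one voice slightly left, one slightly right.
    for name, position in [("host", -0.4), ("guest", 0.4)]:
        left, right = pan_gains(position)
        print(f"{name}: L={left:.3f} R={right:.3f}")
```

Moderate positions such as plus or minus 0.4 are usually enough to separate speakers without making headphone listening feel lopsided.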

Mixing Voice with Music

Professionally mix voice narration or dialogue with background music. Wubble's intelligent mixing ensures voice remains clear and intelligible while music provides emotional support and production value without competing for attention.

Auto-Ducking & Dynamic Mixing

Auto-ducking automatically reduces music volume when voice is present, creating professional-sounding mixes without manual automation:

  • Voice Detection: AI identifies when voice is present and needs clarity
  • Smooth Ducking: Music level reduces smoothly when voice enters, avoiding abrupt changes
  • Musical Sensitivity: Ducking follows musical phrases so it doesn't cut into a musical moment awkwardly
  • Recovery: Music returns to full level during pauses with natural timing
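
The ducking and recovery behaviour listed above can be sketched as a smoothed gain envelope applied to the music. This is an illustrative sketch only: it assumes voice detection has already produced one boolean per audio frame, and the function name `duck_gains`, the -12 dB duck depth, and the smoothing coefficients are hypothetical values rather than Wubble settings. Musical-phrase awareness is not modelled here.

```python
def duck_gains(voice_active, duck_db=-12.0, attack=0.5, release=0.1):
    """Return one linear music gain per frame.

    voice_active: iterable of booleans, one per frame (True = voice present).
    duck_db: how far the music drops under voice, in dB (assumed value).
    attack / release: smoothing coefficients in (0, 1]; larger moves faster.
    """
    ducked = 10.0 ** (duck_db / 20.0)  # convert dB to a linear gain
    gain = 1.0
    gains = []
    for active in voice_active:
        target = ducked if active else 1.0
        coeff = attack if active else release
        # One-pole smoothing: step part of the way toward the target each frame,
        # so the music fades down and back up instead of jumping.
        gain += coeff * (target - gain)
        gains.append(gain)
    return gains


if __name__ == "__main__":
    frames = [False] * 3 + [True] * 6 + [False] * 6
    for flag, g in zip(frames, duck_gains(frames)):
        print("voice" if flag else "pause", f"{g:.3f}")
```

Using a faster attack than release lets the music get out of the way quickly when speech starts and return gradually during pauses.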

Mix Strategies

Auto-Duck

Automatic music volume reduction when voice is present. Best for podcasts, narration, and voiceover work.

Constant Background

Music stays at a consistent, lower level throughout. Appropriate for light background music that doesn't compete with voice.

Musical Bed

EQ-shaped music that occupies frequency ranges not used by voice. Creates space without ducking.

Segmented

Music plays between voice segments (intro, transitions, outro). Common in structured content like courses or presentations.

🎵

Music Selection Matters

Choose music without heavy vocals or dominant mid-range elements that compete with voice. Instrumental tracks with clear low and high frequencies work best for voice mixing.

Full Production Mixing

Create complete audio productions combining dialogue, music, sound effects, and ambience. Wubble handles the complexity of multi-element mixing, ensuring all components work together cohesively while maintaining voice intelligibility and production standards.

Production Elements

Dialogue

Primary voice content: narration, character dialogue, interviews

Music

Background scores, intros, outros, transitions, stingers

Sound Effects

Spot effects, impacts, UI sounds, foley, action elements

Ambience

Environmental backgrounds, room tone, atmospheric textures

Mixing Styles

Cinematic

Wide dynamic range, spatial depth, room for dramatic impact. Ideal for film, animation, and dramatic audio content.

Broadcast

Controlled dynamics, consistent loudness, high intelligibility. Perfect for TV, radio, and streaming platforms with loudness standards.

Podcast

Optimized for earbuds and mobile listening, with excellent voice clarity and moderate dynamics. Targets the loudness level commonly recommended for podcasts (-16 LUFS).

Audiobook

Voice-focused, minimal distraction from music/effects, sustained listening comfort. Meets ACX and audiobook platform specifications.

Professional Vocal Processing

Apply professional audio processing to voice recordings: de-essing, compression, EQ, reverb, and more. Wubble's intelligent processing enhances voice quality, ensures consistency, and creates polished, broadcast-ready audio.

Essential Processing

De-essing

Reduces harsh sibilant sounds (S, SH, CH) that can be fatiguing and cause distortion. Essential for professional vocal quality.

Compression

Controls dynamic range for consistent volume. Makes quiet parts audible and loud parts controlled. Broadcast compression styles available.

EQ (Equalization)

Shapes frequency balance: removes rumble, adds presence, enhances warmth. Tailored EQ curves for different voice types and contexts.

Reverb

Adds space and depth to voice. Subtle room reverb creates naturalness, while larger spaces add drama. Context-appropriate reverb selection.
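
To make the compression description above concrete, here is a minimal sketch of the static gain curve of a hard-knee downward compressor with make-up gain. It is a textbook formulation, not Wubble's processing chain; the function names, the -18 dB threshold, and the 3:1 ratio are assumptions for illustration, and attack and release timing are omitted.

```python
def compressor_gain_db(level_db, threshold_db=-18.0, ratio=3.0):
    """Gain change in dB applied by a hard-knee downward compressor.

    Levels at or below the threshold pass unchanged; above it, every
    `ratio` dB of input produces only 1 dB of output.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1.0")
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    return over / ratio - over  # negative: the amount of gain reduction


def compress_level_db(level_db, threshold_db=-18.0, ratio=3.0, makeup_db=0.0):
    """Output level in dB after compression and optional make-up gain."""
    return level_db + compressor_gain_db(level_db, threshold_db, ratio) + makeup_db


if __name__ == "__main__":
    for level in (-30.0, -18.0, -6.0):
        print(level, "->", compress_level_db(level, makeup_db=4.0))
```

The gain reduction tames loud passages, and the make-up gain then lifts everything, which is how compression makes quiet parts more audible while keeping loud parts controlled.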

Processing Presets

Broadcast Voice

Radio-ready sound with presence, clarity, and controlled dynamics

Podcast Voice

Optimized for spoken word, excellent intelligibility, earbud-friendly

Cinematic Voice

Rich, full sound with space and depth for dramatic content

Clean & Natural

Minimal processing, maintaining natural voice character

⚙️

Less is Often More

Start with subtle processing and increase only as needed. Over-processing can make a voice sound unnatural or fatiguing. Trust the presets; they're designed by professionals.

Best Practices

Prioritize Voice Intelligibility

Voice should always be clear and understandable. If music or effects compromise intelligibility, they're too loud. Clarity comes first.

Use Reference Monitoring

Test your mix on multiple playback systems: studio monitors, earbuds, phone speakers, car audio. Good mixes translate across all systems.

Meet Platform Standards

Follow the loudness target for your platform: -16 LUFS for podcasts, -14 LUFS for music streaming services, and -23 LUFS for broadcast TV under EBU R 128. Wubble handles this automatically.

Leave Headroom

Don't push peaks to 0 dBFS. Leave 1 to 3 dB of headroom (peaks at -1 to -3 dBFS) to prevent clipping and to absorb the overshoot that lossy encoding can add. Professional mixes have headroom.

Use Appropriate Processing

Match processing to content type. Podcast processing differs from cinematic. Audiobook processing differs from commercial. Choose the right style.

Consider Listening Context

People listen to content in various environments. Podcast listeners often use earbuds in noisy environments. Optimize for your audience's likely listening conditions.

Maintain Consistency

Keep processing and loudness consistent across episodes or chapters. Listeners notice and dislike sudden changes in audio quality or volume.
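
The loudness and headroom guidance above comes down to simple decibel arithmetic. The sketch below assumes you already have an integrated loudness reading (in LUFS) and a peak reading (in dBFS) from a meter; measuring them is outside its scope. The function names and the -1 dBFS ceiling are hypothetical, picked to match the targets mentioned in this section.

```python
def normalization_gain_db(measured_lufs, target_lufs):
    """Gain in dB that moves the measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a dB gain to the linear factor applied to samples."""
    return 10.0 ** (gain_db / 20.0)


def fits_headroom(peak_dbfs, gain_db, ceiling_dbfs=-1.0):
    """True if the peak stays at or below the ceiling after the gain is applied."""
    return peak_dbfs + gain_db <= ceiling_dbfs


if __name__ == "__main__":
    # Hypothetical episode: measured at -20.5 LUFS with peaks at -3.0 dBFS.
    gain = normalization_gain_db(measured_lufs=-20.5, target_lufs=-16.0)
    print(f"gain: {gain:+.1f} dB (x{db_to_linear(gain):.3f})")
    print("fits headroom:", fits_headroom(peak_dbfs=-3.0, gain_db=gain))
```

In this hypothetical case the required gain would push peaks above the ceiling, which is the situation where a limiter or additional compression is needed before the loudness target can be reached.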
