Neural Labs Team · Apple Silicon Optimization

Primary Vx

Pure vocals for real-world performance.

Primary Vx is a real-time, low-latency AI vocal cleanup product line from WAVDSP Neural Labs. It detects and separates the human voice from background noise and stage bleed, helping speech and singing stay clear in demanding environments.

License: $699
Trial

macOS only. A Windows version is coming soon.

Behind The Engine

Voice-aware cleanup in three stages.

Primary Vx follows the voice through a focused cleanup chain: detect the source, wipe unwanted bleed, then protect the signal path before the PA starts to ring.

No IIR or FIR EQ, Dynamic EQ, or phase tuning — cleanup is handled entirely by our custom Pure AI pipeline, not traditional filters.

1

Vx Source Detection

Identifies the vocal source in real time so the engine follows the performance, not the bleed around it.

2

Source Separation

Wipes unwanted background instruments, ambience, and bleed away from the vocal path.

3

Feedback Protection

Helps you push more gain before feedback, with lower risk of ringing from the PA system.

What it removes

Keep the voice. Push the noise away.

Primary Vx is designed for vocal microphones surrounded by real-world sound: loud instruments, reflections, ambience, and unwanted bleed that normally follows the voice into the mix.

  • 01 Drums: transient spill
  • 02 Cymbals: harsh high bleed
  • 03 Guitar amp: midrange wash
  • 04 Bass amp: low-end rumble
  • 05 Room: stage reflections
  • 06 Reverb: tail buildup
  • 07 PA bleed: system spill
  • 08 Feedback: ringing pressure

Hear the difference

Before and after Primary Vx on a live vocal capture.

Listen to the same performance with stage bleed intact, then switch to the Primary Vx processed version. Playback is streaming-only for this page demo.

Stream-only preview. These demo files cannot be downloaded from this page.

Example audio credit: LIVE FROM THE LAB by TELEFUNKEN Elektroakustik — "Doom Flamingo - Measurements LFTL".

Training disclosure

Primary Vx was trained on datasets we recorded in-house. That includes material captured specifically for vocal separation, plus additional sessions built to train our feedback protection system. Those two requirements made off-the-shelf public datasets unsuitable for this product. We organize and process that audio through proprietary file formats and workflows built for our training pipeline.

More gain before feedback

Cleaner input gives the vocal more room to come forward before the system starts fighting back.

Cleaner live vocals

Keep speech and singing focused, even when the microphone is close to drums, amps, or PA systems.

Production-ready capture

Reduce distracting bleed at the source so the mix starts from a clearer, more usable vocal track.

Labs Analysis

Measured cleanup from a real noisy vocal clip.

WAVDSP Neural Labs processed a 30-second excerpt from a live-style vocal capture. Compare waveform, RMS envelope, and time-frequency spectrograms between the noisy input and denoised output.
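
For readers who want to run a similar comparison on their own clips, here is a minimal Python sketch of an RMS envelope in dBFS. It is not the Labs analysis code. The function name `rms_envelope_db`, the frame size, the hop size, and the -120 dB floor are all assumptions chosen for illustration, and the input is assumed to be a list of float samples in the range -1.0 to 1.0.

```python
import math

def rms_envelope_db(samples, frame_size=1024, hop=512, floor_db=-120.0):
    """Return one RMS level in dBFS per frame (frame and hop sizes are assumptions)."""
    levels = []
    last_start = max(len(samples) - frame_size + 1, 1)
    for start in range(0, last_start, hop):
        frame = samples[start:start + frame_size]
        if not frame:
            break
        mean_square = sum(x * x for x in frame) / len(frame)
        rms = math.sqrt(mean_square)
        # Digital silence has no finite dB value, so clamp it to the floor.
        levels.append(20.0 * math.log10(rms) if rms > 0 else floor_db)
    return levels

# Sanity check: a full-scale sine has an RMS of 1/sqrt(2), about -3.01 dBFS.
sample_rate = 48000
tone = [math.sin(2 * math.pi * 750 * n / sample_rate) for n in range(sample_rate)]
envelope = rms_envelope_db(tone)
print(round(envelope[0], 2))
```

Running the same function on the noisy input and on the processed output, then subtracting the two envelopes frame by frame, gives a rough picture of how much level was removed between vocal phrases.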

Labs Training

How Primary Vx was trained, and what it took to get there.

Primary Vx did not come from a general speech denoiser with a new skin. The Neural Labs team trained a vocal separation stack on material collected for one problem: keep the performance, lose the bleed, and do it inside a live-latency budget.

Source materials

2.5 TB

Licensed studio stems, multitrack rehearsals, and field captures assembled into the Primary Vx training corpus.

Active training pool

1.9 TB

After silence trimming, clipping rejection, and manual review of questionable takes.

Compute

H200 · H100 · A100

Distributed cloud GPU fleet—NVIDIA H200, H100, and A100 tiers, plus additional accelerator SKUs brought online as runs scaled.

Core model training

~1,680 GPU-hrs

Mixed-precision distributed runs for the main vocal separation network.

Fine-tuning passes

~420 GPU-hrs

Voice-activity detection and feedback-aware guard modules, trained separately.

Wall-clock schedule

12 days

Primary cloud cluster window, followed by three weeks of targeted refinement.
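
As a rough cross-check, the figures on the cards above can be combined with a few lines of Python. The variable names are ours, and the last calculation assumes that all of the core GPU-hours were spent inside the 12-day cluster window, which the page does not state outright.

```python
source_tb = 2.5            # "Source materials" card
active_tb = 1.9            # "Active training pool" card
core_gpu_hours = 1680      # "Core model training" card (approximate)
finetune_gpu_hours = 420   # "Fine-tuning passes" card (approximate)
cluster_days = 12          # "Wall-clock schedule" card

# Share of the source corpus, by volume, that survived curation.
retained = active_tb / source_tb

# Combined GPU time across the separator and the add-on modules.
total_gpu_hours = core_gpu_hours + finetune_gpu_hours

# Assumption: the core run filled the 12-day window around the clock.
avg_concurrent_gpus = core_gpu_hours / (cluster_days * 24)

print(f"retained after curation: {retained:.0%}")
print(f"total GPU-hours: ~{total_gpu_hours}")
print(f"average concurrent GPUs: ~{avg_concurrent_gpus:.1f}")
```

Under those assumptions, about 76% of the source material by volume reached the active pool, the program used roughly 2,100 GPU-hours in total, and the core run averaged a little under six GPUs in use at any moment.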

Training pipeline

The work was split into curation, pre-training, fine-tuning, and export validation. We cared less about benchmark scores on clean speech and more about whether the model still behaved when the vocal was buried under real stage noise.

1

Capture and curation

We built the dataset from microphone paths that actually fail in production: vocals sharing a capsule with drum spill, guitar amp wash, room reflections, and PA bleed. A large share of incoming audio was rejected—wrong mic type, clipped converters, unusable metadata, or material that did not survive listening checks.

2

Separation pre-training

The base separator was trained on multitrack material where isolated vocal references existed. That gave the model a stable idea of what belongs to the voice before we pushed it into messier mono captures where ground truth is harder to recover.

3

Live-bleed fine-tuning

The second phase focused on single-mic and FOH-style sources. We deliberately overweighted difficult cases: loud stages, wedge-heavy monitoring, and rooms where the vocal never really sits alone in the recording.

4

Real-time validation

Every export candidate was run through latency-bounded inference tests and held-out venue recordings that never entered training. Models that looked good offline but smeared consonants or pumped background under load were sent back for another pass.
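
The curation step mentions rejecting clipped material and trimming silence. A minimal Python sketch of that kind of filter follows. It is an illustration only, not the Neural Labs tooling: the function names, the 0.999 clipping ceiling, the three-sample run length, and the 0.001 silence threshold are all assumed values, and takes are represented as plain lists of float samples.

```python
def is_clipped(samples, ceiling=0.999, max_run=3):
    """Flag a take when several consecutive samples sit at the converter ceiling."""
    run = 0
    for x in samples:
        run = run + 1 if abs(x) >= ceiling else 0
        if run >= max_run:
            return True
    return False

def trim_silence(samples, threshold=0.001):
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

def curate(takes):
    """Keep non-clipped takes, trimmed of silence; skip takes that end up empty."""
    kept = {}
    for name, samples in takes.items():
        if is_clipped(samples):
            continue
        trimmed = trim_silence(samples)
        if trimmed:
            kept[name] = trimmed
    return kept

takes = {
    "clean_take": [0.0, 0.0, 0.2, -0.4, 0.3, 0.0],
    "clipped_take": [0.1, 1.0, 1.0, 1.0, 0.2],
    "silent_take": [0.0, 0.0, 0.0],
}
print(curate(takes))  # {'clean_take': [0.2, -0.4, 0.3]}
```

Automated checks like these only cover the mechanical rejections. The listening checks and metadata review described above would still need a person.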

Compute & validation

The heavy lifting ran on a distributed cloud GPU fleet—H200, H100, A100, and additional accelerator tiers scheduled as job size changed. That was not a single overnight job—the separator alone consumed roughly 1,680 GPU-hours, with another 420 GPU-hours spent on the voice-activity and feedback-protection stages that ship with the product.

In calendar time, the main training block took about 12 days of scheduled cluster time across that fleet, followed by three more weeks of smaller fine-tuning runs while we chased edge cases in cymbal bleed, amp hum, and monitor spill.

  • Training ran on a rented multi-tier cloud GPU fleet—H200, H100, A100, and supporting accelerator classes—with mixed FP16/BF16 precision.
  • Distributed jobs were staged on high-throughput cloud scratch storage so workers were not waiting on ingest between long epochs.
  • Held-out validation came from 112 live and rehearsal captures recorded after the training cutoff date.
  • The shipping macOS build was re-validated through Core ML and Apple Neural Engine export checks on Apple Silicon hardware.

The numbers above describe the training program behind the model in Primary Vx today. They are not marketing round figures—we track ingest volume, rejected material, and GPU time because retraining a vocal product is expensive, and we need to know exactly what changed between one build and the next.

FAQ

Primary Vx questions, answered.

Practical details for latency, recommended systems, host software, instances, feedback reduction, routing, licensing, and how the engine handles cleanup without traditional EQ.

What is the latency of Primary Vx?

Primary Vx runs a multi-stage processing chain while keeping internal latency under 4.7 ms. This makes it practical for Front of House (FOH) workflows and other live vocal applications where response time matters.
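
To see how that figure combines with a host buffer, here is a small Python sketch. It assumes a simplified signal path of one input buffer and one output buffer plus the plugin's internal latency, and it ignores converter and driver latency. The function names and the two-buffer model are assumptions for illustration, not a measurement of any particular host.

```python
PLUGIN_LATENCY_MS = 4.7  # upper bound quoted in the answer above

def buffer_latency_ms(buffer_size, sample_rate):
    """Time covered by one audio buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate

def estimated_path_ms(buffer_size, sample_rate, buffers_in_path=2):
    """Rough estimate: host buffering plus the plugin's internal latency."""
    host_ms = buffers_in_path * buffer_latency_ms(buffer_size, sample_rate)
    return host_ms + PLUGIN_LATENCY_MS

for size in (32, 64, 128, 256):
    print(size, "samples ->", round(estimated_path_ms(size, 48000), 2), "ms")
```

With these assumptions, a 64-sample buffer at 48 kHz comes to roughly 7.4 ms end to end, which is why choosing the lowest stable buffer matters in a live rig.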

What hardware do you recommend?

Primary Vx runs on macOS 11.0 or later. We recommend Apple Silicon Macs because the engine can take advantage of Core ML and Apple Neural Engine acceleration for higher performance. Windows support is not yet available because optimization work is still in progress.

Is Primary Vx a standalone application or an audio plugin?

Primary Vx is currently built as an audio plugin for macOS in VST3, AU, and AAX formats, with support for both Apple Silicon and Intel Macs. A dedicated standalone version of Primary Vx is planned for the future. A Windows version is also coming soon.

What host software do you recommend?

We recommend hosts such as SuperRack Performer and LiveProfessor. The ideal buffer size depends on the CPU in your machine, so we recommend testing your own setup and choosing the lowest stable buffer for your workflow.

How many instances can I run on one machine?

You can run as many instances as your CPU can handle reliably. Larger sessions may require a slightly higher buffer size to maintain stability.

How does Primary Vx reduce feedback?

Primary Vx can deliver impressive feedback reduction while preserving vocal quality. In many situations, it can increase gain before feedback by as much as 10 to 20 dB while keeping background noise and bleed from breaking through the vocal path.
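
For a sense of scale, decibel figures convert to linear ratios with the standard formulas below. This short Python sketch is only a unit conversion; the function names are ours, and it says nothing about how the engine achieves the extra headroom.

```python
def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a voltage (amplitude) ratio."""
    return 10 ** (db / 20.0)

def db_to_power_ratio(db):
    """Convert a level change in dB to a power ratio."""
    return 10 ** (db / 10.0)

for gain_db in (10, 20):
    print(gain_db, "dB ->",
          round(db_to_amplitude_ratio(gain_db), 2), "x amplitude,",
          round(db_to_power_ratio(gain_db), 1), "x power")
```

An extra 10 dB corresponds to about 3.16 times the amplitude, and an extra 20 dB to 10 times the amplitude, or 100 times the power.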

Does Primary Vx use EQ or filters?

Primary Vx does not use any IIR or FIR EQ or filtering. The engine is built entirely on our custom Pure AI pipeline, so there is no Dynamic EQ, no phase tuning, and no traditional spectral shaping. What you hear comes entirely from the Neural Engine.

Is Primary Vx for a single vocal channel or a group vocal bus?

Primary Vx can be used on either a single vocal channel or a vocal group/bus. The best placement depends on the situation, the source material, and how your live or studio session is routed.

How does licensing and activation work?

Primary Vx uses online activation through your WAVDSP account. After purchase, install the plugin and sign in with your email and password in the built-in license panel when you first open it. Activation requires an internet connection once; after that, Primary Vx works offline. Each license can be activated on one machine at a time. To move your license to another Mac, deactivate from the license panel first, then activate again on the new machine while online.
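
The activation rules above can be modeled as a small state machine. The Python sketch below is a hypothetical illustration, not the WAVDSP licensing code: the `License` class, its method names, and the machine identifiers are invented for the example. Whether deactivation itself needs a network connection is not stated above, so the sketch does not check for one.

```python
class License:
    """Hypothetical single-seat license model based on the rules described above."""

    def __init__(self):
        self.active_machine = None  # machine currently holding the seat, if any

    def activate(self, machine_id, online):
        # Activation is the one step that requires an internet connection.
        if not online:
            raise ConnectionError("activation needs an internet connection")
        # Only one machine may hold the license at a time.
        if self.active_machine not in (None, machine_id):
            raise RuntimeError("deactivate the license on the other machine first")
        self.active_machine = machine_id

    def deactivate(self, machine_id):
        # Assumption: no network check here; the source does not specify one.
        if self.active_machine == machine_id:
            self.active_machine = None

    def can_run(self, machine_id):
        # After activation the plugin works offline, so no network check is needed.
        return self.active_machine == machine_id

seat = License()
seat.activate("studio-mac", online=True)
print(seat.can_run("studio-mac"))  # True

# Moving the license: deactivate first, then activate on the new Mac while online.
seat.deactivate("studio-mac")
seat.activate("tour-mac", online=True)
print(seat.can_run("studio-mac"), seat.can_run("tour-mac"))  # False True
```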

Primary Vx

Give every vocal a cleaner starting point.

Whether the voice is in front of drums, near a PA, inside a lively room, or captured in a noisy creator setup, Primary Vx helps it stay focused, intelligible, and ready for the next stage.
