I wanted a CW decoder that worked in the browser. No installs, no accounts, no desktop software from 2004. Just open a page, let it listen, and watch Morse appear as text. So I built one.

This post covers how it works, the problems I hit, and what I’m planning next.

The approach

The decoder runs entirely in the browser using the Web Audio API. The audio chain looks like this:

Microphone/Line-in → AnalyserNode (FFT for waterfall) → BiquadFilter (bandpass) → AudioWorklet (Goertzel filter)

Each stage has a specific job. The AnalyserNode gives us spectrum data for the waterfall display. The bandpass filter narrows the audio to ±100Hz around the detected tone, cutting out noise before it reaches the decoder. The AudioWorklet runs the actual tone detection.

Goertzel filter — narrowband tone detection

The core of the decoder is a Goertzel filter running inside an AudioWorklet. If you haven’t come across Goertzel before, it’s essentially a single-bin FFT — it tells you the magnitude of one specific frequency in a block of samples, without computing the full FFT.

This is ideal for CW. I don’t need to know what every frequency is doing. I just need to know: is there energy at 600Hz (or whatever the sidetone is) right now?

The worklet accumulates samples into blocks of sampleRate / 75 samples (giving roughly 75Hz of detection bandwidth), runs the Goertzel algorithm on each block, and posts the magnitude back to the main thread. At 48kHz that's 640-sample blocks, one every ~13ms — fast enough to track individual dits at 25 WPM.
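In case it's useful, here's the per-block computation in isolation. This is a sketch of the standard Goertzel recurrence, not the worklet's actual code:

```javascript
// Goertzel magnitude of one target frequency over a block of samples.
function goertzelMagnitude(samples, targetFreq, sampleRate) {
  const coeff = 2 * Math.cos(2 * Math.PI * targetFreq / sampleRate);
  let s0 = 0, s1 = 0, s2 = 0;
  for (const x of samples) {
    s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return Math.sqrt(s1 * s1 + s2 * s2 - coeff * s1 * s2);
}

// Example: a 640-sample block (48000 / 75) containing a 600Hz sine
const sampleRate = 48000;
const block = Array.from({ length: 640 }, (_, n) =>
  Math.sin(2 * Math.PI * 600 * n / sampleRate));
const onTone = goertzelMagnitude(block, 600, sampleRate);   // large
const offTone = goertzelMagnitude(block, 900, sampleRate);  // near zero
```

The point of the technique is visible in those last two lines: one dot-product-sized loop tells you how much 600Hz is present, and a tone four bins away barely registers.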

Adaptive timing

Standard Morse timing is simple on paper: a dit is one unit, a dah is three units, inter-element gap is one unit, inter-letter gap is three units, word gap is seven units. But real-world keying — especially hand keying — is never that clean.

The decoder uses rolling buffers of the last 12 dit and dah durations. The threshold between dit and dah is the midpoint of their running averages. This lets the decoder adapt to the operator’s actual speed and style rather than assuming textbook timing.

WPM range clamping prevents the adaptive timing from drifting too far. If the estimated dit duration would imply a WPM outside the configured range, it gets clamped. This stops the decoder from locking onto noise or drifting wildly during pauses.
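A sketch of how that classification can look. The names and the seeding logic are mine, not the decoder's actual code:

```javascript
// Adaptive dit/dah classification: rolling buffers of recent durations,
// threshold at the midpoint of the running averages, clamped to a WPM range.
const HISTORY = 12;

function makeTiming(minWpm = 5, maxWpm = 40) {
  const dits = [], dahs = [];
  const avg = a => a.reduce((s, x) => s + x, 0) / a.length;

  return function classify(durationMs) {
    // Standard timing: dit duration in ms = 1200 / WPM.
    // Seed from mid-range WPM until we have history.
    let ditMs = dits.length ? avg(dits) : 1200 / ((minWpm + maxWpm) / 2);
    // Clamp: an implied WPM outside [minWpm, maxWpm] means we've drifted.
    ditMs = Math.min(1200 / minWpm, Math.max(1200 / maxWpm, ditMs));
    const dahMs = dahs.length ? avg(dahs) : ditMs * 3;
    const threshold = (ditMs + dahMs) / 2;

    const isDah = durationMs > threshold;
    const buf = isDah ? dahs : dits;
    buf.push(durationMs);
    if (buf.length > HISTORY) buf.shift();
    return isDah ? "dah" : "dit";
  };
}
```

Because each classified element feeds back into the buffers, the threshold follows the operator: sloppy 65ms "dits" pull the average up instead of getting misread as dahs.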

Auto-detecting the tone frequency

Early versions required you to manually set the tone frequency. This was awkward — most people don’t know exactly what frequency their sidetone or radio audio is at.

The decoder now auto-detects the tone on startup. It takes 15 FFT snapshots over 1.5 seconds, finds the peak frequency in each, groups them to the nearest 5Hz, and picks the most consistent one. Once detected, the bandpass filter locks in around it.

This uses the same FFT data that drives the waterfall display, so the detected frequency lines up exactly with what you see on screen.
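The grouping step boils down to a vote over rounded peak frequencies. Something like this, with made-up names:

```javascript
// Pick the most consistent peak from repeated FFT snapshots.
function detectTone(peakFrequencies) {
  const counts = new Map();
  for (const f of peakFrequencies) {
    const bucket = Math.round(f / 5) * 5;   // group to nearest 5Hz
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  let best = null, bestCount = 0;
  for (const [freq, count] of counts) {
    if (count > bestCount) { best = freq; bestCount = count; }
  }
  return best;
}

// 15 snapshots: mostly ~600Hz, plus a couple of noise peaks
const peaks = [599, 601, 602, 598, 600, 601, 737, 600,
               599, 602, 601, 412, 600, 598, 601];
detectTone(peaks);   // → 600
```

Voting over buckets rather than averaging matters here: a couple of noise-driven outliers would drag a mean away from the tone, but they can't outvote it.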

The waterfall

The waterfall display uses an AnalyserNode with an FFT size of 4096, giving about 12Hz resolution per bin. Each frame, I read the frequency data, map the 400–900Hz range across the canvas width, and scroll the previous content down by one pixel. The colour mapping uses a power curve for contrast — quiet bins stay dark, strong signals light up in amber.

A frequency marker overlays the detected tone position. When the tone is active (CW key down), the marker brightens.
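For the curious, the bin-to-pixel mapping and the colour curve look roughly like this. The power-curve exponent here is illustrative, not the value I actually ship:

```javascript
// Map the 400–900Hz slice of the FFT across the canvas width and colour
// each bin with a power curve.
const SAMPLE_RATE = 48000;
const FFT_SIZE = 4096;
const BIN_HZ = SAMPLE_RATE / FFT_SIZE;   // ≈ 11.7Hz per bin
const LOW_HZ = 400, HIGH_HZ = 900;

const loBin = Math.floor(LOW_HZ / BIN_HZ);
const hiBin = Math.ceil(HIGH_HZ / BIN_HZ);

// x position on a canvas of the given width for one FFT bin
function binToX(bin, canvasWidth) {
  return Math.round((bin - loBin) / (hiBin - loBin) * canvasWidth);
}

// magnitude (0–255, as from getByteFrequencyData) → amber RGB string.
// The power curve keeps quiet bins dark and stretches strong signals.
function amber(magnitude) {
  const v = Math.pow(magnitude / 255, 2.5);
  return `rgb(${Math.round(255 * v)}, ${Math.round(191 * v)}, 0)`;
}
```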

What’s hard in the browser

The browser is a surprisingly capable DSP environment, but there are real limitations:

Audio thread constraints. The AudioWorklet runs on the audio rendering thread. It processes 128-sample render quanta — you can’t change that. Any computation has to finish within that window or you get glitches. The Goertzel filter is lightweight enough, but more sophisticated DSP (adaptive filters, noise reduction, multipath handling) would push the limits.

No control over the audio input chain. When using a microphone, the browser applies automatic gain control, noise suppression, and echo cancellation by default. I disable these where possible using getUserMedia constraints, but browser support varies. Line-in mode helps, but not everyone has a cable setup.
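The constraints look like this. Note these are requests, not guarantees, so the code has to cope with the browser ignoring them:

```javascript
// Ask the browser to turn off its voice-call processing.
const constraints = {
  audio: {
    autoGainControl: false,
    noiseSuppression: false,
    echoCancellation: false,
  },
};

// In the browser (guarded so the snippet also runs outside one):
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  navigator.mediaDevices.getUserMedia(constraints).then(stream => {
    // Check what the browser actually applied
    const settings = stream.getAudioTracks()[0].getSettings();
    console.log("AGC actually off:", settings.autoGainControl === false);
  });
}
```

Reading back `getSettings()` is the only way to know what you really got; some browsers will silently leave processing enabled.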

Gap measurement precision. The Goertzel block size determines how often we get magnitude updates — roughly every 13ms. That's the resolution of my tone on/off detection. At 20+ WPM, a dit is 60ms. With up to 13ms of uncertainty on each edge, gap measurements between elements can be off by a significant fraction of a dit. This makes reliable letter and word boundary detection harder at higher speeds.
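The arithmetic behind that, using standard PARIS timing:

```javascript
// Standard (PARIS) timing: dit duration in ms = 1200 / WPM.
const ditMs = wpm => 1200 / wpm;

// Goertzel block: 640 samples at 48kHz
const BLOCK_MS = 640 / 48;   // ≈ 13.3ms

ditMs(20);   // 60ms — only ~4.5 blocks long
ditMs(30);   // 40ms — 3 blocks; edge error approaches a third of a dit
```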

Acoustic coupling. Many operators will hold their phone up to a radio speaker. Room reverb, speaker response, and microphone quality all smear the signal. Tones don’t cut cleanly — they ring and decay, which extends the apparent element duration and shortens the apparent gap. This is the single biggest source of decoding errors I’ve seen in testing.

No persistent state. The browser can’t maintain a long-running connection to a radio. Every time you reload the page, the audio context resets, the adaptive timing resets, and detection starts from scratch.

What I’m building next — Python backend

The next version moves the signal processing to a Python backend running on AWS. Lambda won’t work here — WebSocket connections need to stay open for the duration of a decoding session, and Lambda’s execution model (short-lived, stateless invocations) doesn’t suit continuous audio streaming. Instead, I’m looking at Fargate — long-running containers behind an API Gateway WebSocket API, fronted by CloudFront.

The browser will still capture audio and display results, but the heavy lifting moves server-side. Here’s what that unlocks:

Better DSP. Python with NumPy and SciPy gives me access to proper signal processing tools — higher-order bandpass filters, adaptive noise cancellation, and more sophisticated detection algorithms. I can run a full STFT with overlapping windows, apply spectral subtraction for noise reduction, and use matched filtering against ideal dit/dah templates.

Machine learning decoding. With the DSP running in Python, I can experiment with ML-based decoders. A trained model could handle imperfect timing, overlapping signals, and QRM far better than threshold-based detection. The training data exists — generated Morse at various speeds with added noise, fading, and timing jitter.

Persistent sessions. A WebSocket connection means the backend can maintain state across the entire decoding session. The adaptive timing doesn’t reset. Historical context can inform current decoding — if the decoder has already identified the operator’s style over 30 seconds of keying, it can use that to resolve ambiguous elements.

Multi-signal detection. On a busy band, multiple CW signals overlap. The browser decoder picks the strongest tone and ignores everything else. A Python backend could track multiple signals simultaneously, letting me select which one to decode.

Reduced client load. Moving DSP off the browser means the page stays responsive regardless of signal complexity. The client sends raw audio chunks over the WebSocket and receives decoded text back. This also opens the door to mobile use where CPU and battery are more constrained.

The browser-only version will stay available — it works well for clean signals and practice keying, and there’s value in a tool that needs no backend at all. The Python version is for when you want to throw real-world RF at it and expect it to cope.

Try it

The decoder is live at skipzone.co.uk/tools/cw-decoder. It’s very much a v1 — I’m actively testing and tuning it. If you try it and something doesn’t decode right, that’s useful data. The gap sensitivity controls let you adjust letter and word boundary detection in real time, which helps with different keying styles and speeds.

73 de MM7IUY