
🎵 Shazam: the trick is ignoring almost everything
How can your phone recognize a song in seconds, even in a noisy coffee shop? It doesn’t listen to the melody. It doesn’t recognize the lyrics. It does something much cleverer.
🔍 The process in four steps:
- Captures the sound as a digitized air-pressure waveform
- Runs an FFT (Fast Fourier Transform) over short windows of the waveform → turns the audio into a time/frequency spectrogram
- Discards roughly 99% of the spectrogram → keeps only the loudest peaks, creating a sparse “constellation map” of dots
- Generates fingerprints → pairs each dot with nearby ones → 3 numbers (frequency A, frequency B, time difference) = 1 compact hash
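
🧪 A toy version of the four steps, to make them concrete. This is a minimal sketch, not Shazam's actual code: the function names, the frame size, and the `fan_out` parameter are assumptions made for illustration. It uses a slow, naive DFT instead of a real FFT, and it keeps just one peak per time frame, where a real system would pick local maxima across the whole spectrogram.

```python
import cmath
import math

def spectrogram(samples, frame_size=64):
    """Magnitude spectrum of each non-overlapping frame (naive DFT, for clarity)."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):  # keep only the non-negative frequencies
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames

def constellation(spec):
    """Keep only the loudest frequency bin of each frame as a (time, freq) dot."""
    return [(t, max(range(len(mags)), key=mags.__getitem__))
            for t, mags in enumerate(spec)]

def fingerprints(peaks, fan_out=3):
    """Pair each dot with the next few dots: ((freq_a, freq_b, time_diff), anchor_time)."""
    hashes = []
    for i, (t_a, f_a) in enumerate(peaks):
        for t_b, f_b in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f_a, f_b, t_b - t_a), t_a))
    return hashes

# Toy signal (assumption): four frames, each a pure tone sitting on one DFT bin.
frame_size = 64
tone_bins = [5, 12, 8, 20]
samples = [math.sin(2 * math.pi * b * n / frame_size)
           for b in tone_bins for n in range(frame_size)]

peaks = constellation(spectrogram(samples, frame_size))
print(peaks)                    # [(0, 5), (1, 12), (2, 8), (3, 20)]
print(fingerprints(peaks)[0])   # ((5, 12, 1), 0)
```

Each frame holds 32 magnitude values and only one dot survives, which is the "ignore almost everything" idea in miniature.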
🗄️ The magic of the inverted index: Instead of comparing your audio against each song one by one, the system looks up each of your hashes in a giant table where each hash points to the songs that contain it. It’s like the index at the back of a book. Result: a catalog of millions of songs searched in a fraction of a second.
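
🧪 The lookup can be sketched in a few lines. This is an illustrative sketch under assumptions: the catalog, the song IDs, and the `build_index` and `identify` helpers are hypothetical, and a simple vote count stands in for whatever scoring the real service uses.

```python
from collections import Counter, defaultdict

def build_index(catalog):
    """Map each hash to the set of song IDs that contain it."""
    index = defaultdict(set)
    for song_id, hashes in catalog.items():
        for h in hashes:
            index[h].add(song_id)
    return index

def identify(index, query_hashes):
    """Look up each query hash and return the song with the most hits, or None."""
    votes = Counter()
    for h in query_hashes:
        for song_id in index.get(h, ()):  # unknown hashes simply cast no vote
            votes[song_id] += 1
    return votes.most_common(1)[0] if votes else None

# Hypothetical catalog: each hash is (frequency A, frequency B, time difference).
catalog = {
    "song_a": [(5, 12, 1), (12, 8, 1), (8, 20, 1)],
    "song_b": [(7, 9, 2), (9, 30, 1), (5, 12, 1)],
}
index = build_index(catalog)

# A short clip: two hashes come from song_a, one is spurious.
clip = [(12, 8, 1), (8, 20, 1), (40, 41, 3)]
print(identify(index, clip))  # ('song_a', 2)
```

The work done per query depends on the number of hashes in the clip, not on the number of songs in the catalog, which is why the lookup stays fast as the catalog grows.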
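🔊 Why does it work well with noise? Because background noise rarely creates the single loudest peak in any region of the spectrogram, so most of the song’s peaks, and the hashes built from them, survive. Only a fraction of the hashes needs to match.
🎤 Why does it fail when you hum? Because the hashes encode exact frequencies and timing. A hummed tune differs from the original recording in pitch, tempo, and timbre, so it generates different hashes.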
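
🧪 A small experiment shows the contrast between noise and humming. It is a sketch with made-up numbers: the dots, the 3-bin pitch shift, and the 1.5× slowdown are assumptions, and `pair_hashes` is a hypothetical helper that pairs only consecutive dots.

```python
def pair_hashes(peaks):
    """Hash each pair of consecutive dots as (freq_a, freq_b, time_diff)."""
    return {(f_a, f_b, t_b - t_a)
            for (t_a, f_a), (t_b, f_b) in zip(peaks, peaks[1:])}

# Hypothetical (time, frequency bin) dots from the original recording.
recording = [(0, 40), (2, 52), (4, 46), (6, 60)]

# Noisy clip: one dot is replaced by a louder noise peak, the rest survive.
noisy = [(0, 40), (2, 70), (4, 46), (6, 60)]

# Hummed clip: same tune, but 3 bins lower and 1.5x slower.
hummed = [(round(t * 1.5), f - 3) for t, f in recording]

print(pair_hashes(recording) & pair_hashes(noisy))   # {(46, 60, 2)}
print(pair_hashes(recording) & pair_hashes(hummed))  # set()
```

The noisy clip still shares a hash with the recording, while the hummed clip shares none, even though a human would call it the same tune.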
💡 Explanation in a nutshell
Shazam identifies songs by converting a few seconds of audio into a compact fingerprint based on the strongest frequency peaks and their time relationships, then matches it against a database of millions of songs using an inverted index that responds in milliseconds.

