
How Shazam Works: The Connect-the-Dots Trick for Sound


🎵 Shazam: the trick is ignoring almost everything

How can your phone recognize a song in seconds, even in a noisy coffee shop? It doesn’t listen to the melody. It doesn’t recognize the lyrics. It does something much cleverer.

🔍 The process in four steps:

  1. Captures sound as a digitized air pressure waveform
  2. Transforms the waveform with FFTs (Fast Fourier Transforms) over short, overlapping windows → converts audio into a time/frequency spectrogram
  3. Discards about 99% of the spectrogram → keeps only the loudest peaks, creating a sparse “constellation map” of dots
  4. Generates fingerprints → pairs each dot with nearby ones → 3 numbers (frequency A, frequency B, time difference) = 1 compact hash
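
Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not Shazam's actual code: the tiny hand-made spectrogram, the `find_peaks` and `make_hashes` helpers, the loudness threshold, and the `max_dt` pairing window are all assumptions picked for readability. Each hash is kept together with the time of its first dot, which is one plausible way to store it.

```python
from itertools import combinations

# Toy magnitude spectrogram (assumed data): rows are time frames,
# columns are frequency bins. A real one would come from short-time FFTs.
SPECTROGRAM = [
    [0.1, 0.9, 0.2, 0.1, 0.1],
    [0.1, 0.3, 0.2, 0.1, 0.8],
    [0.2, 0.1, 0.7, 0.1, 0.2],
    [0.1, 0.1, 0.2, 0.1, 0.1],
    [0.9, 0.1, 0.1, 0.2, 0.1],
]


def find_peaks(spec, threshold=0.5):
    """Keep cells that are loud and the maximum of their 3x3 neighbourhood."""
    peaks = []
    for t, row in enumerate(spec):
        for f, value in enumerate(row):
            if value < threshold:
                continue
            neighbours = [
                spec[tt][ff]
                for tt in range(max(0, t - 1), min(len(spec), t + 2))
                for ff in range(max(0, f - 1), min(len(row), f + 2))
            ]
            if value == max(neighbours):
                peaks.append((t, f))
    return peaks


def make_hashes(peaks, max_dt=3):
    """Pair each peak with later peaks at most max_dt frames away."""
    hashes = []
    for (t1, f1), (t2, f2) in combinations(sorted(peaks), 2):
        dt = t2 - t1
        if 0 < dt <= max_dt:
            # (frequency A, frequency B, time difference), plus anchor time
            hashes.append(((f1, f2, dt), t1))
    return hashes


peaks = find_peaks(SPECTROGRAM)
hashes = make_hashes(peaks)
print(peaks)   # the constellation map
print(hashes)  # the fingerprints
```

In this toy grid, 4 of the 25 cells survive as dots, and those 4 dots produce 5 hashes. The pair that is 4 frames apart is dropped because it falls outside the pairing window.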

🗄️ The magic of the inverted index: Instead of comparing your audio against each song, the system looks up each of your hashes in a giant table where each hash points to the songs that contain it. It’s like the index at the back of a book. The song that collects the most matching hashes is the answer. Result: a catalog of millions of songs searched in a fraction of a second, without scanning any song in full.
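
The lookup can be sketched with a plain dictionary. Everything here is hypothetical: the song names, the hash values, and the `build_index` and `identify` helpers. The sketch also votes on time alignment (song offset minus clip offset), a refinement the post does not spell out, so treat that part as an assumption about how a real matcher separates true matches from coincidences.

```python
from collections import Counter, defaultdict

# Hypothetical fingerprints: song -> list of (hash, time offset in the song).
SONGS = {
    "song_a": [((1, 4, 1), 10), ((1, 2, 2), 10), ((4, 2, 1), 11), ((2, 0, 2), 12)],
    "song_b": [((3, 3, 1), 5), ((1, 4, 1), 7), ((0, 2, 3), 8)],
}


def build_index(songs):
    """Inverted index: each hash maps to every (song, offset) containing it."""
    index = defaultdict(list)
    for song, fingerprints in songs.items():
        for h, offset in fingerprints:
            index[h].append((song, offset))
    return index


def identify(index, sample):
    """Vote per (song, alignment); a true match agrees on one alignment."""
    votes = Counter()
    for h, sample_offset in sample:
        for song, song_offset in index.get(h, []):
            votes[(song, song_offset - sample_offset)] += 1
    return votes.most_common(1)[0] if votes else None


INDEX = build_index(SONGS)

# A clip that starts 10 frames into song_a; its own offsets start at 0.
# The last hash is noise that matches nothing.
SAMPLE = [((1, 4, 1), 0), ((1, 2, 2), 0), ((4, 2, 1), 1), ((9, 9, 9), 2)]

best = identify(INDEX, SAMPLE)
print(best)
```

Here `song_a` wins with three votes at alignment 10, while `song_b` shares one hash with the clip and gets a single vote. The cost depends on the number of hashes in the clip, not on the number of songs in the catalog.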

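🔊 Why does it work well with noise? Because only the loudest peak in each region of the spectrogram is kept, and background noise is rarely louder than the music there. A match also needs only a fraction of the hashes to survive, not all of them.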
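
A small simulation can make the noise argument concrete. It is only a sketch under stated assumptions: the constellation is random rather than taken from real audio, "noise" is modelled crudely by deleting every fourth dot and adding ten spurious ones, and `make_hashes` is a hypothetical helper.

```python
import random
from itertools import combinations


def make_hashes(peaks, max_dt=3):
    """Set of (freq A, freq B, time difference) for nearby peak pairs."""
    return {
        (f1, f2, t2 - t1)
        for (t1, f1), (t2, f2) in combinations(sorted(peaks), 2)
        if 0 < t2 - t1 <= max_dt
    }


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed clean constellation: one dot per frame, as (time, frequency bin).
clean = [(t, rng.randrange(64)) for t in range(40)]

# Crude noise model: every fourth dot is masked, ten spurious dots appear.
noisy = [p for i, p in enumerate(clean) if i % 4 != 0]
noisy += [(rng.randrange(40), rng.randrange(64)) for _ in range(10)]

clean_hashes = make_hashes(clean)
noisy_hashes = make_hashes(noisy)
survived = len(clean_hashes & noisy_hashes) / len(clean_hashes)
print(f"{survived:.0%} of the clean hashes survive")
```

Losing a quarter of the dots removes every hash that used one of them, yet a large share of the original hashes still appears in the noisy clip. Spurious dots mostly add hashes that match nothing in the database.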

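🎤 Why does it fail when you hum? Because the hashes encode the exact frequencies and timings of the original recording. A hum has a different pitch, tempo, and timbre, so it produces different peaks and therefore different hashes.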
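
The effect shows up even in the most favourable case. In the sketch below, a hum is modelled as the same melody contour with perfect timing, shifted up by just 4 frequency bins. The dot values, the shift, and the `pair_hashes` helper are all assumptions for illustration.

```python
from itertools import combinations


def pair_hashes(peaks, max_dt=3):
    """Set of (freq A, freq B, time difference) for nearby peak pairs."""
    return {
        (f1, f2, t2 - t1)
        for (t1, f1), (t2, f2) in combinations(sorted(peaks), 2)
        if 0 < t2 - t1 <= max_dt
    }


# Assumed constellation of the original recording: (time, frequency bin).
recording = [(0, 20), (1, 25), (2, 22), (3, 30), (4, 27), (5, 20)]

# Assumed hum: identical timing and contour, every dot 4 bins higher.
hummed = [(t, f + 4) for t, f in recording]

recording_hashes = pair_hashes(recording)
shared = recording_hashes & pair_hashes(hummed)
print(len(recording_hashes), len(shared))
```

The recording yields 12 hashes and the shifted version shares none of them, because each hash stores absolute frequencies rather than the intervals between notes.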

💡 Explanation in a nutshell

Shazam identifies songs by converting a few seconds of audio into a compact fingerprint based on the strongest frequency peaks and their time relationships, then matches it against a database of millions of songs using an inverted index that responds in milliseconds.

More information at the link 👇

Also published on LinkedIn.
Juan Pedro Bretti Mandarano
Author