
7 concerning levels of acoustic spying techniques

From Shotgun Mics to Quantum Vibrations

Benn Jordan, a musician and audio engineer with an unusually deep technical toolkit, presents a layered exploration of acoustic surveillance that escalates from consumer-grade directional microphones to theoretical interferometer-based listening devices capable of measuring vibrations at the picometer scale. The piece is structured as a descent -- seven levels of increasing sophistication -- and the format works because each tier genuinely builds on the physics of the last. Jordan is not merely cataloging spy gadgets. He is tracing a line from well-understood acoustic principles to their most extreme and unsettling applications.

The video opens with a demonstration that grounds the entire discussion in practical reality. Mechanical keyboards, the kind favored by programmers and writers, produce subtly distinct sounds for each key. Jordan demonstrates that existing AI tools can decode typed text from keystroke audio alone, even from a recording made outside a window with an entry-level shotgun microphone.

There's an algorithm called Keytap that attempts to take recorded keystrokes, compare them to a data set of English words, and decrypt what you're typing by making guesses that get more confident over time.

This is not speculative. The algorithm exists, it runs locally, and Jordan shows it working. The implication is immediate and uncomfortable: anyone conducting a video conference while typing is potentially broadcasting the contents of their keystrokes to every participant on the call. It is the kind of vulnerability that sounds paranoid until someone demonstrates it on camera.
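The dictionary-matching idea is simple enough to sketch. The snippet below is not the algorithm Jordan demonstrates, only a toy illustration of its core constraint: keystrokes that sound identical must map to the same letter, so the repetition pattern of sound-cluster IDs rules out most candidate words. The function names and miniature dictionary are invented for illustration.

```python
# Toy sketch of dictionary-constrained keystroke decoding -- NOT the real
# algorithm, just its core idea: keystrokes that sound alike must be the same
# letter, so the repetition pattern of cluster IDs filters candidate words.

def pattern(seq):
    """Canonical repetition pattern: 'hello' -> (0, 1, 2, 2, 3)."""
    ids, out = {}, []
    for item in seq:
        out.append(ids.setdefault(item, len(ids)))
    return tuple(out)

def candidates(cluster_ids, dictionary):
    """Words whose letter pattern matches the observed cluster-ID pattern."""
    target = pattern(cluster_ids)
    return [w for w in dictionary
            if len(w) == len(cluster_ids) and pattern(w) == target]

# Five keystrokes clustered by sound similarity; the 3rd and 4th sounded alike.
words = ["hello", "world", "sells", "helps"]
print(candidates([7, 2, 9, 9, 4], words))  # → ['hello']
```

Each additional word narrows the feasible cluster-to-letter mapping, which is why such a decoder's guesses "get more confident over time."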


The Physics of Eavesdropping at Distance

Jordan's tour through directional microphones -- shotgun mics, parabolic dishes, mid-side configurations, and ambisonic arrays -- serves a dual purpose. On one level, it is a practical comparison test at 100 feet. On another, it is an education in how phase cancellation and beamforming work, concepts most people encounter only as marketing language on noise-cancelling headphones.
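Beamforming itself is less exotic than it sounds. Below is a minimal sketch of the simplest variant, delay-and-sum, assuming a plane wave and integer-sample delays (real arrays interpolate fractional delays); the array geometry and signal are invented for the demo.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Point a microphone array along `direction` (unit propagation vector of
    the incoming plane wave): delay each channel so the wave lines up across
    mics, then average. Integer-sample delays keep this sketch simple."""
    delays = mic_positions @ direction / c        # arrival-time offset per mic (s)
    shifts = np.round(delays * fs).astype(int)
    out = np.zeros(signals.shape[1])
    for sig, s in zip(signals, shifts):
        out += np.roll(sig, -s)                   # advance late channels into phase
    return out / len(signals)

# Two mics 0.343 m apart on the x-axis; a 440 Hz wave propagating along +x
# reaches the far mic 1 ms (8 samples at 8 kHz) later.
fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(512) / fs)
signals = np.stack([x, np.roll(x, 8)])
mics = np.array([[0.0, 0.0, 0.0], [0.343, 0.0, 0.0]])
steered = delay_and_sum(signals, mics, np.array([1.0, 0.0, 0.0]), fs)
```

Signals from the steered direction add coherently; everything else partially cancels, which is the phase-cancellation principle the shotgun and parabolic designs exploit acoustically rather than digitally.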

The ambisonic demonstration is particularly striking. Using higher-order ambisonics to create a focused "blob" of sensitivity, Jordan extracts the audio of a small speaker hidden in a concrete lawn ornament, playing Franklin D. Roosevelt speeches -- a speaker so quiet that it is inaudible to a person standing directly next to it.

When we use higher order ambisonics to amplify negative pressure everywhere except for this one spherical harmonic blob, we can hear it rather well.

The comparison with Adobe's commercial voice isolation model is telling. Adobe's AI-powered tool, which represents the state of the art in consumer audio processing, produces garbled results on the same source material. The physics-based approach, rooted in Laplace's equation and spherical harmonics, outperforms the machine learning approach. It is a useful reminder that brute-force AI is not always the superior tool, particularly in domains where the underlying physics is well characterized.
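The "blob" has a concrete mathematical shape. The sketch below computes the axisymmetric "basic" beam pattern of a given ambisonic order from Legendre polynomials; the formula and demo angles are textbook material, not taken from Jordan's setup.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via the Bonnet recurrence."""
    p_prev, p_curr = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr

def hoa_beam_gain(theta, order):
    """Axisymmetric 'basic' ambisonic beam: g = sum_{n=0}^{N} (2n+1) P_n(cos theta).
    On-axis gain is (N + 1)**2, and the main lobe narrows as the order rises."""
    c = math.cos(theta)
    return sum((2 * n + 1) * legendre(n, c) for n in range(order + 1))

# Normalized gain 30 degrees off-axis drops sharply as the order increases --
# higher order means a tighter blob of sensitivity.
for N in (1, 3, 5):
    print(N, hoa_beam_gain(math.radians(30), N) / hoa_beam_gain(0.0, N))
```

This is why the physics-based approach can beat a learned model here: the spatial selectivity falls directly out of well-characterized spherical-harmonic mathematics.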

Invisible Lasers and Visible Concerns

The laser microphone segment represents the video's pivot from commercially available technology to purpose-built surveillance tools. Jordan revisits a project from eight years earlier -- bouncing a visible laser off a window to reconstruct audio inside a room -- and upgrades it to infrared. The technical details matter: infrared lasers are invisible to the human eye, their focused radiation can burn skin and damage DNA, and the pulse width resolution turns out to be superior to visible-light lasers because there is less ambient interference in the infrared spectrum.

Even though I had some unique tools to help me guide the laser, a mistake could injure myself or others. This might sound dramatic, but keep in mind that infrared lasers are not only invisible to the human eye, but their focused radiation can burn skin and damage DNA a lot more than other visible wavelengths.

The experiment works. Jordan successfully reconstructs dialogue from Night of the Living Dead by bouncing an invisible infrared laser off his studio window. He is candid about the difficulty -- calling it "a huge expensive dangerous pain in the ass" -- but the point is made. The technique is real, it functions, and it requires no cooperation from the target.
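The principle is easy to simulate even if the practice is, as Jordan puts it, a pain. The toy model below, with every constant invented for illustration, treats the photodiode signal as a large steady intensity plus a tiny vibration-driven modulation, and recovers the audio by stripping the DC component.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48_000
t = np.arange(fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 300 * t)    # stand-in for audio inside the room

# The window vibrates with that audio; the reflected beam hitting the photodiode
# is a large steady intensity plus a tiny vibration-driven modulation, buried in
# sensor noise. All constants here are illustrative, not measured.
photodiode = 2.0 + 0.01 * speech + 0.001 * rng.standard_normal(fs)

recovered = photodiode - photodiode.mean()    # strip the DC carrier; audio remains
similarity = np.corrcoef(recovered, speech)[0, 1]
print(f"correlation with original audio: {similarity:.2f}")
```

The hard parts in practice are exactly the ones this model assumes away: aiming an invisible beam, keeping the reflection on the sensor, and a signal-to-noise ratio far worse than a hundredth of the DC level.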

Seeing Sound With a High-Speed Camera

The most conceptually ambitious segment involves extracting audio from high-speed video footage of vibrating objects. Jordan films a speaker cone at 1,440 frames per second, tracks the subtle motion of a speck on the cone's surface using open-source physics software called Tracker, exports the positional data to Excel, and converts the resulting time series into a 32-bit float audio file at 1,440 Hz.

The result is recognizable audio reconstructed entirely from visual data. No microphone was involved. Jordan then applies the same technique to a grocery bag sitting near a speaker playing speech, and after noise filtering, manages to extract intelligible words from the bag's vibrations.
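The pipeline is mundane enough to reproduce with the standard library. The sketch below substitutes a synthetic 220 Hz "tracked motion" column for Tracker's actual output and writes 16-bit PCM rather than Jordan's 32-bit float (Python's built-in wave module only writes PCM); everything else follows the steps described above.

```python
import wave
import numpy as np

fps = 1440                                      # camera frame rate = audio sample rate
t = np.arange(fps) / fps                        # one second of "tracked" motion
positions = 1e-3 * np.sin(2 * np.pi * 220 * t)  # stand-in for Tracker's y-position column

signal = positions - positions.mean()           # remove the DC offset (resting position)
signal /= np.max(np.abs(signal))                # normalize to full scale

# 1,440 Hz sampling gives a 720 Hz Nyquist limit -- low, but enough for speech
# to be recognizable.
with wave.open("reconstructed.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)                           # 16-bit PCM stand-in for 32-bit float
    w.setframerate(fps)
    w.writeframes((signal * 32767).astype(np.int16).tobytes())
```

The only specialized ingredient is the camera; the conversion from displacement samples to sound is just a change of interpretation.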

He compares his approach to a well-known MIT technique that extracted audio from potato chip bags using computational photography algorithms. Jordan's manual tracking method, despite being less elegant, produces better results in his tests -- a fact he handles with characteristic self-deprecation.

So, let's check the score here. MIT zero, high school dropout one. Just kidding. Just because I couldn't make it work well doesn't mean that it doesn't work well.

The intellectual honesty here is notable. Jordan claims his result while acknowledging that his failure to replicate the MIT method likely reflects his own limitations rather than flaws in the research. It is the kind of epistemic humility that is rare in science communication, where demonstrations are typically curated for maximum impressiveness.

Time-of-Flight Sensors and the Theoretical Ceiling

The final practical demonstration uses time-of-flight sensors -- the same infrared dot-projection technology found in Xbox Kinect and LiDAR-equipped self-driving cars -- to capture three-dimensional point cloud data of a room at 90 frames per second. By extracting the motion of specific points across frames, Jordan produces what he calls "extremely low-resolution three-dimensional audio that is useful to no one."

But the theoretical extension is where the concern lies. Time-of-flight sensors work at distances up to 3,000 meters. Multiple sensors could be phase-staggered to increase the effective frame rate. Ten sensors at 90 frames per second would yield 900 frames per second, enough for 450 Hz of audio bandwidth -- sufficient for intelligible speech.

Theoretically, considering that time of flight works up to 3,000 m or 1.8 mi with enough determination, you'd be able to do what my laser microphone does, but with the added benefit of isolating different objects for direct vibration analysis. Now, imagine what one could accomplish setting that up across the street from the White House.

The suggestion is provocative but grounded. The individual components -- time-of-flight sensors, point cloud analysis software, phase-staggering techniques -- all exist. The barrier is cost and engineering effort, not physics. For a well-resourced intelligence agency, that is no barrier at all.
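The phase-staggering arithmetic is straightforward to model. The sketch below is an idealized simulation, not real sensor code: it interleaves ten staggered 90 fps streams into one 900 Hz stream, whose 450 Hz Nyquist limit is what makes intelligible speech plausible.

```python
import numpy as np

def interleave(sensor_frames):
    """Merge k frame streams captured with staggered trigger phases (sensor i
    fires 1/(k * base_rate) s after sensor i-1) into one k * base_rate stream."""
    k = len(sensor_frames)
    merged = np.empty(k * len(sensor_frames[0]))
    for i, frames in enumerate(sensor_frames):
        merged[i::k] = frames
    return merged

base, k = 90, 10                              # 10 sensors at 90 fps -> 900 fps merged
t = np.arange(k * base) / (k * base)          # one second at the merged rate
vibration = np.sin(2 * np.pi * 200 * t)       # 200 Hz, under the 450 Hz Nyquist limit
streams = [vibration[i::k] for i in range(k)] # what each staggered sensor would see
merged = interleave(streams)
```

The idealization hides the real engineering problems -- clock synchronization across sensors and registering ten point clouds to the same object -- but none of those problems is physics.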

The Legal Framework Is Not the Safeguard

Jordan's legal discussion near the end is measured and worth scrutiny. He correctly notes that in the United States, the expectation of privacy determines the legality of surveillance. A conversation in a public park is fair game; a conversation inside a car is not, even if the car is visible from a public sidewalk. Using any of the demonstrated techniques to record someone without consent in a private space is, in most states, a felony.

His argument about why government abuse of these techniques is unlikely is less convincing, however. Jordan points out that FBI forensic consultants earn roughly forty dollars an hour and suggests that the economics of surveillance constrain its deployment. This may be true for the specific techniques he demonstrates, which require expensive hardware and significant expertise. But the broader trend -- the decreasing cost of sensors, the increasing capability of AI-driven analysis, and the expansion of always-on recording devices in consumer electronics -- cuts against his reassurance.

His parting observation is sharper and more honest than his optimism about government restraint.

You could worry about the police cherry-picking the information that they present to a judge to obtain a warrant to put a much more affordable bug in your house or wiretap your phone because that is something that happens.

This is the real concern. The exotic techniques Jordan demonstrates are impressive, but the mundane ones -- a hidden phone with Apple's Live Listen feature, a cheap Amazon bug, a compromised smart speaker -- are the actual vectors for abuse. The sophistication of the surveillance is less important than the willingness to deploy it.

Bottom Line

Benn Jordan's exploration is valuable precisely because it is not alarmist. Each technique is demonstrated with its limitations intact, its costs acknowledged, and its practical barriers visible. The cumulative effect is nonetheless sobering. The physics of sound -- the fact that every noise creates vibrations in every nearby object, and those vibrations are in principle recoverable -- means that acoustic privacy is less a right than a temporary condition, maintained only by the gap between what is theoretically possible and what is practically affordable. That gap is closing. The question is not whether these capabilities will become accessible, but who will have them first and what legal frameworks will govern their use. On that question, Jordan is honest enough to offer no reassurance.

Sources

7 concerning levels of acoustic spying techniques

by Benn Jordan

You probably clicked this video because you value your privacy. Or maybe you're just really, really nosy. Or maybe both, and you're just a walking contradiction. Either way, you definitely came to the right place.

Today, I'm going to be telling you and showing you just about everything that I know about spy microphones and acoustic surveillance, which happens to be a lot. We're going to be creating invisible laser microphones. We're going to be turning regular microphones into beamforming eavesdropping devices. We're going to be taking video footage of garbage on the floor through a zoom lens attached to an insane high-speed camera and then later reconstructing the sound in the room from the subtle vibrations in Excel.

Yes, Microsoft Excel. And we're even going to try bouncing millions of photons around the room every second and measuring how long they take to return to the sensor so we could measure the vibration of particular objects. So, congratulations to you. You have reached a stairwell down to the deepest levels of acoustic absurdity.

Welcome. Before we get started, I want to show you something that demonstrates how vulnerable we are to audio attacks. For example, I like mechanical keyboards, but each key has its own slightly unique sound. And hypothetically, somebody could break into my studio when I'm not here and record each and every key and then use AI to decrypt and transcribe everything that I'm typing.

Well, this isn't hypothetical. This is actually something that exists right now and it totally works. Okay, but what if somebody using one of the techniques that you're about to see in this video just recorded you typing without training a model of any sort? Or what if you were typing while video conferencing?

I have bad news for you. There's an algorithm called Keytap that attempts to take recorded keystrokes, compare them to a data set of English words, and decrypt what you're typing by making guesses that get more confident over time. I decided to put an entry-level shotgun mic on one of my cameras and put that outside my window in the woods and see if any of this was even possible. After doing a little bit of noise isolation and correcting some errors in the auto detection, Keytap got to work.

It was really slow, but running it locally helped speed things up quite a bit. ...