
How to make a legit sound camera

Benn Jordan challenges a fundamental assumption of modern engineering: that seeing the invisible requires expensive, proprietary hardware. In a project that blends DIY enthusiasm with rigorous signal processing, Jordan demonstrates that the subtle pixel variations in standard video files contain enough data to visualize sound waves, track heartbeats, and locate acoustic sources for a fraction of the cost of commercial alternatives. This is not just a tech demo; it is a case study in how open-source software and consumer-grade sensors are democratizing high-fidelity data analysis.

The Cost of Seeing Sound

Jordan begins by exposing the absurd economics of the current market. "Acoustic cameras definitely already exist, and they exist in a business-to-business market," he notes, pointing out that the cheapest commercial units start around $5,000 while top-tier models approach $100,000. He argues that this pricing structure is artificial, driven by proprietary hardware rather than the actual complexity of the task. "To me it just doesn't seem like it would need to cost tens of thousands of dollars and require a bunch of proprietary hardware if you have a smartphone that could record 4K video," Jordan says.


The core of Jordan's argument is that the necessary data is already being captured by devices in our pockets. By treating video not as a visual record but as a dense stream of numerical data, one can extract information that the human eye ignores. He builds a functional acoustic imaging system using a 16-channel microphone array, a Raspberry Pi, and a global shutter camera module for under $400. "This acoustic camera comes in below $400 if you already have a laptop, saving you, I don't know, $29,000 off of buying an existing camera with similar features," he concludes. The framing is effective because it shifts the focus from hardware acquisition to algorithmic ingenuity, suggesting that the barrier to entry for advanced diagnostics is now intellectual, not financial.
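The article does not reproduce Jordan's code, but the core idea behind any microphone-array acoustic camera is delay-and-sum beamforming: for each candidate direction, delay every microphone's signal by its expected time of arrival and sum; directions that line up with a real source sum coherently and produce high power. A minimal sketch of that idea (the function name, array geometry, and sample rate here are illustrative assumptions, not Jordan's implementation):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Steer a microphone array toward `direction` by delaying and summing.

    signals:       (n_mics, n_samples) array of time-domain recordings
    mic_positions: (n_mics, 3) microphone coordinates in metres
    direction:     unit vector pointing toward the candidate source
    fs:            sample rate in Hz
    c:             speed of sound in m/s
    """
    n_mics, n_samples = signals.shape
    # Time-of-arrival offset for each mic relative to the array origin.
    delays = mic_positions @ direction / c          # seconds, per mic
    shifts = np.round(delays * fs).astype(int)      # whole-sample delays
    shifts -= shifts.min()                          # make all shifts >= 0
    out = np.zeros(n_samples)
    for sig, s in zip(signals, shifts):
        out[: n_samples - s] += sig[s:]             # align, then sum
    return out / n_mics
```

Scanning a grid of candidate directions and painting the power of each steered output over the matching region of the video frame is what turns a mic array plus a camera into an "acoustic image".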

Critics might note that while the cost savings are real, the reliability of such DIY systems in critical industrial environments remains unproven compared to certified, calibrated commercial units. However, for prototyping and non-critical monitoring, the trade-off is compelling.

"You and I are on this journey together and there are a lot of exploratory steps and we're going to try and accomplish all this as inexpensively as possible."

The Software Bottleneck

The project's greatest hurdle was not the hardware but the software ecosystem. Jordan admits that his lack of discipline as a developer made the process arduous, noting that "anybody with a few years of Python experience will probably get farther in a few hours than I did in the last two weeks." He highlights the fragmentation of open-source tools, specifically the difficulty of managing dependencies for beamforming frameworks like Acoular and SpectAcoular. "If you attempt to get this running without running a virtual environment, you will almost certainly run into some headaches," he warns, a sentiment that resonates with anyone who has tried to assemble complex scientific software stacks.
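Jordan's virtual-environment warning translates into a few standard commands. A minimal setup sketch (the environment name is arbitrary; `acoular` is the PyPI package name for the beamforming framework, and its exact dependency set may differ by version):

```shell
# Create an isolated environment so the framework's pinned dependencies
# don't collide with system-wide Python packages.
python3 -m venv acoustic-cam
source acoustic-cam/bin/activate   # on Windows: acoustic-cam\Scripts\activate
pip install --upgrade pip
pip install acoular                # pulls in numpy, scipy, and friends
```

Everything installed while the environment is active stays inside the `acoustic-cam` directory, which is exactly the isolation that avoids the "headaches" Jordan describes.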

Despite these friction points, the results were tangible. Jordan successfully used the system to locate the source of unwanted noise in his studio and even to identify which specific chickens were making a racket. "I never knew which one of my chickens was making noise; now I do," he quips. This practical application underscores the utility of the technology: it turns abstract audio data into a visual map, allowing users to pinpoint problems without expensive consultants. The narrative choice to include the failures and the messy debugging process adds credibility, preventing the piece from feeling like a polished, unrealistic sales pitch.

Visualizing the Invisible: Pulse and Heat

The most provocative section of Jordan's work extends the concept of motion amplification beyond sound to biological and thermal phenomena. Drawing inspiration from fellow creator Posy's work on "motion extraction," Jordan demonstrates how blending video frames with slight time differences can exaggerate micro-movements. "The subtle differences that your eyes would never notice, that exist between frames, pixel to pixel, in a video of a stationary scene, hold a ton of information," he explains.
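The frame-blending trick described here is simple enough to sketch directly: blend each frame at 50% opacity with an inverted, time-shifted copy of itself. Static pixels cancel to mid-grey, while anything that moved between the two frames stands out. A minimal sketch of that idea (the function name and array layout are assumptions for illustration):

```python
import numpy as np

def motion_extract(frames, delay=1):
    """Blend each frame with an inverted, time-shifted copy of itself.

    frames: (n, height, width) float array with pixel values in [0, 1]
    delay:  how many frames back to compare against; larger delays
            emphasize slower movements
    """
    current = frames[delay:]
    shifted = frames[:-delay]
    # 50% of the current frame + 50% of the inverted earlier frame.
    # Unchanged pixels yield 0.5 * x + 0.5 * (1 - x) = 0.5 exactly.
    return 0.5 * current + 0.5 * (1.0 - shifted)
```

On a perfectly stationary scene the output is a uniform grey; any deviation from 0.5 marks a pixel that changed between the two frames, which is the "hidden" information Jordan is talking about.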

He pushes this further by applying the technique to human physiology. By filtering out vibrations outside the normal pulse range, he claims to have visualized his own heartbeat through a standard webcam. "This means that if we figured out a way to filter out vibrations or oscillations outside of the normal human pulse range, using something like Processing or Python, we could probably find the pulse of anyone sitting still in front of any modern webcam," Jordan says. He even touches on the potential for corporate surveillance, joking that "big companies can use this to gauge how organically excited their employees are about a PowerPoint presentation."
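The filtering step Jordan describes amounts to band-passing a per-frame brightness signal to the plausible human pulse range (roughly 45-180 beats per minute) and reading off the dominant frequency. A minimal sketch, not Jordan's code, using SciPy's Butterworth filter (the function name and BPM bounds are illustrative assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def pulse_band(signal, fps, low_bpm=45, high_bpm=180):
    """Band-pass a per-frame brightness trace to the human pulse range.

    signal: mean brightness of a skin region, one value per video frame
    fps:    video frame rate; must exceed 2 * high_bpm / 60 (Nyquist)
    """
    nyquist = fps / 2.0
    low = (low_bpm / 60.0) / nyquist    # normalized cutoff frequencies
    high = (high_bpm / 60.0) / nyquist
    b, a = butter(2, [low, high], btype="band")
    # filtfilt runs the filter forward and backward: zero phase shift,
    # so the timing of each heartbeat is preserved.
    return filtfilt(b, a, signal)
```

After filtering, the peak of the signal's Fourier spectrum gives the pulse: a peak at 1.2 Hz corresponds to 72 beats per minute.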

This section raises significant privacy concerns that the piece only lightly skims over. While Jordan frames this as a triumph of data extraction, the implication that a camera could remotely monitor vital signs without consent is a double-edged sword. A counterargument worth considering is that the same technology used to diagnose machinery faults could be weaponized for non-consensual biometric surveillance, a risk that requires robust policy and technical safeguards.

"The subtle differences that your eyes would never notice, that exist between frames, pixel to pixel, in a video of a stationary scene, hold a ton of information."

The Physics of Rolling Shutter

Jordan concludes by addressing a critical technical limitation: the rolling-shutter effect common in most digital cameras. He explains that because these sensors read out the frame line by line rather than capturing it all at once, fast-moving data can be distorted. "If something is moving quickly in the shot that you're getting with the video, it will be picked up inaccurately, because different regions of the frame are being captured at different times," he says. To solve this, he advocates for global shutter cameras, which capture all pixels simultaneously, ensuring data integrity for scientific analysis.
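The distortion is easy to quantify: with a rolling shutter, a row's capture time is proportional to its position in the frame. A back-of-the-envelope sketch (the 15 ms readout figure below is an assumed typical value, not a number from the video):

```python
def row_time_offset(row, n_rows, readout_ms):
    """Time at which a rolling shutter samples a given row, in ms,
    relative to the top row of the frame."""
    return (row / n_rows) * readout_ms

# If a 1080-row sensor takes ~15 ms to read a full frame (assumed),
# the middle of the frame is sampled 7.5 ms after the top and the
# bottom ~15 ms after it. For a vibration in the tens of hertz, that
# is a large fraction of one oscillation period, so the same motion
# appears at different phases in different rows of a single frame --
# exactly the inaccuracy a global shutter sensor avoids.
```

For vibration analysis this matters more than for ordinary video, because the analysis assumes every pixel in a frame was sampled at the same instant.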

He also revisits his previous work on schlieren imaging, showing how he adapted the technique to visualize sound waves and heat using simple patterns and high frame rates. "As long as you are not observing photons, and bookmark that for a much more complicated and crazy video in the future," he jokes, before demonstrating how sound pressure waves generate detectable heat. The transition from high-level theory to a literal fireball experiment keeps the tone accessible while maintaining scientific rigor. The piece succeeds in showing that the laws of physics are not the limit; the limit is often just our willingness to look closer at the data we already have.

Bottom Line

Jordan's strongest argument is that the democratization of high-end sensing technology is already underway, driven by software innovation rather than hardware breakthroughs. The piece's biggest vulnerability is the gap between a functional prototype and a reliable, user-friendly product, particularly regarding the complex software dependencies he encountered. Readers should watch for how these open-source methods evolve into standardized tools for industrial maintenance and medical diagnostics, as the potential for non-invasive, low-cost monitoring is now undeniable.

Sources

How to make a legit sound camera

by Benn Jordan

Disclaimer: this video is definitely going to be a very deep, nerdy, sciencey rabbit hole. Some of my science videos are like, "hey, check out this thing that cost me an absurd amount of time and resources, let's check it out and I'll briefly tell you how it works." This is not that at all. You're coming with me on this entire journey, and in fact, at the time of me saying this, I have no idea if this video will be about a fascinating piece of technology that I made or if it'll be about dealing with failure.

But the other day I woke up way too early in the morning and, for some ungodly reason, laid in bed thinking about those really expensive motion amplification systems used to diagnose and predict problems in engineering and machinery. Steve Mould did an excellent video on them. Motion amplification is inarguably very cool technology, but to me it just doesn't seem like it would need to cost tens of thousands of dollars and require a bunch of proprietary hardware if you have a smartphone that could record 4K video. That means that you could record nearly 8.3 million pixels in every single frame. That is a lot of data, and all you're really doing for motion amplification is analyzing changes in pixels. And if you could detect something like this, then maybe you could detect other slight vibrations, like sound.

A big chunk of this video is going to be deciphering hidden layers of information inside everyday video files and streams, such as locating an object that's causing unwanted noise, or even getting people's heartbeat and vitals in real time from their webcams. You and I are on this journey together, and there are a lot of exploratory steps, and we're going to try and accomplish all this as inexpensively as possible. In fact, if you have a smartphone, you could do most of the stuff you'll see in this video with free software, except for this first segment, which will require a few dozen cheap microphones, a laptop, a Raspberry Pi 5, and a global shutter camera module.

Acoustic cameras definitely already exist, and they exist in a business-to-business market. If you wanted to go out and buy an acoustic imaging device today, the cheapest you could get one, at least from what I could find, is about ...