Sound Data Frequency Analyzer
The Sound Data Frequency Analyzer performs high-accuracy detection and plotting of the frequency components of a sequence of digital data, such as digital sound data. It was also used to develop the digital signal processing frequency detector algorithms and to analyze, characterize, plot, and tune the frequency detectors, but that portion would be removed from any released product. In many important ways the analysis has considerably better characteristics than an FFT: it does not have the large process noise errors that an FFT has, and it has much better frequency selectivity. However, it has a much higher processing cost than an FFT when analyzing a broad range of frequencies. Some plots comparing the performance of the developed sound detector with FFT frequency analysis are shown below.
Part of the program was developed for algorithm development and for mathematical evaluation of the frequency measurement algorithms, to obtain data for the SBIR submittal, but the part shown in this document was mostly developed between the time the submittal was made and the time the SBIR evaluation was completed. Since funding did not occur, this project currently has a lower priority than other work, and development is proceeding at a snail's pace. There are many potential markets, and continued development might become a priority in the future if an interested and potentially profitable market becomes readily available; however, development resources are currently limited, and priorities will need to be reevaluated at that time.
This program is under development; it is currently functional in some useful ways but is not complete in others. For example, it can process and analyze 44.1 kHz wav files ripped from a music CD, but it cannot currently take input from the microphone and save its data. It could also use a scroll bar for the frequency range of the Audio Track Plot display, among other improvements. The current trend in software is to release software that works at one operating point and to disregard robustness, usefulness, and interactive flexibility outside that operating point, which seems to me to be less than professional work.
General Sound Analyzer Overview
The following plot set shows frequency plots of sound from a wav file, obtained from a music CD, being played through the Sound Analyzer. The arbitrarily chosen music contains instruments along with vocals.
The bottom plot is the Sound Detector Response for the frequency components of the block of music data currently being processed, displayed as a frequency plot that is continuously updated while the music plays. The value plotted is the maximum detector value in each data block for each analyzed frequency. The plot is very fast and appears to update in real time as long as the processor is not extremely overloaded. The data frequency changes are so fast that data is easily missed, since the update rate is faster than a person's visual perception rate, so a display filter is used. When the next data point comes in, if the new value is larger than the currently displayed value, the new high value is used; otherwise, a major percentage of the current value (settable in the configuration dialog) is kept and the new data is added in. This allows a short spike of frequency data to remain visible long enough for a person to see it before it is gone.
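The display filter described above behaves like a peak hold with decay. A minimal sketch follows, assuming the new data is blended in with weight (1 - keep) so that the two weights sum to one; the exact blending rule is not stated in the text, and the names `display_filter`, `filter_stream`, and `keep` are hypothetical, with `keep` standing in for the percentage set in the configuration dialog.

```python
def display_filter(displayed: float, new: float, keep: float = 0.9) -> float:
    """Peak-hold display filter with decay (hypothetical sketch).

    keep: fraction of the old displayed value retained when the new value
    is lower (stands in for the configuration-dialog setting).
    """
    if new > displayed:
        return new  # a new high value is shown immediately
    # Otherwise keep most of the old value and blend in the new data.
    return keep * displayed + (1.0 - keep) * new


def filter_stream(values, keep=0.9):
    """Apply the display filter to a sequence of per-block detector values."""
    shown = []
    current = 0.0
    for v in values:
        current = display_filter(current, v, keep)
        shown.append(current)
    return shown


# A one-block spike followed by silence fades out over several updates.
print([round(x, 3) for x in filter_stream([0.0, 1.0, 0.0, 0.0, 0.0], keep=0.8)])
# -> [0.0, 1.0, 0.8, 0.64, 0.512]
```

A larger `keep` holds a spike on screen longer at the cost of a slower-reacting display.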
The control bar at the bottom allows the audio stream to be stopped, the data to be examined, the display position to be adjusted, and playback to be restarted. The time position on the right-hand side of the control bar shows the current data position in minutes and seconds. The green data position control bar, which represents the full audio file/block, shows the current data position as a vertical red line, and the range of data for which analyzed data buffers are currently available is shown as the red bar. During playback, the vertical line can be dragged to change the position being analyzed. When playback is stopped, the vertical line represents the data position being examined, and clicking a position on the red bar moves the plots to that data position.
The top plot shows a small number of frequencies from the analysis range, 50 to 2000 Hz in this case (both the frequencies and their count are configurable), plotted as one point for each block analyzed. In this case the plot pen is 1 pixel wide, with the right channel plotted in red and the left channel in black, since green does not show well for this type of plot. The analysis block size is currently set to the arbitrary size of 1024 samples to allow easier comparison with the 1024-point FFT presented later. Below the plot are a scale scroll bar on the left side and a data scroll bar on the right side; at the right are a vertical scale bar at the top and a vertical data position scroll bar at the bottom. In this case the plot is showing blocks 557 to 1100, which is (1100 - 557 + 1) * 1024 = 557,056 samples of data, or 12.6 seconds at a 44.1 kHz sample rate. The structures in the upper plot come from musical instruments and vocals. During playback and processing, this plot is not updated; it updates only when the data is stopped for examination with the magnifying glass button. When stopped, the black vertical line shows the position of the data set plotted in the lower plot. The position being plotted can be changed by clicking somewhere in the upper plot, which moves the cursors and the lower plot to the new position, or with the < and > buttons, which step by one data block at a time unless already at the end.
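The block-to-time arithmetic above can be checked with a few lines of Python. The function name `block_span` is hypothetical; the 1024-sample block size and 44.1 kHz sample rate are taken from the text.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio sample rate
BLOCK_SIZE = 1024        # analysis block size used in this example


def block_span(first_block: int, last_block: int,
               block_size: int = BLOCK_SIZE, fs: int = SAMPLE_RATE_HZ):
    """Return (samples, seconds) covered by blocks first..last inclusive."""
    n_blocks = last_block - first_block + 1  # inclusive range, hence the +1
    samples = n_blocks * block_size
    return samples, samples / fs


samples, seconds = block_span(557, 1100)
print(samples, round(seconds, 1))  # -> 557056 12.6
```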
In data examination mode (data stopped with the magnifying glass button), clicking the plot button (the one with the squiggles) brings up a second window with the frequency detector process data plot, which shows sample-by-sample data and frequency plots for the current plot display location.
The upper plot is the raw audio data being processed, with the right channel plotted in red and the left channel (channel 0) plotted in green.
The lower plot is the frequency plot, showing the range of frequencies set by the scroll bar on the left side. The position line (the black line) corresponds to the position line on the other plots. It stays in the center and moves off center only when close to the start or end of the displayable data. The previous frequency spectrum plot covers the 1024 samples to the right of the position line, which is about two and a half vertical lines in this plot and a quarter of the 4096 samples in the plot.
The light blue plot shows the maximum frequency detection level in this plot, which is the 200 Hz plot. Where the red and green plots (150 and 250 Hz) have close to the same amplitude, the input frequency is very close to 200 Hz, midway between them. In this case the detector's half-power width is 100 Hz, which is +/- 50 Hz, corresponding to the 150 and 250 Hz plots. By using the characteristics of the detector response, the actual instantaneous frequency at each data point can be closely estimated and tracked through time. This data corresponds to the low-frequency spike in the Audio Detector Frequency Spectrum plot above.
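One way to turn two detector amplitudes into a frequency estimate is sketched below. It assumes, hypothetically, a detector power response of 1 / (1 + ((f - fc) / h)^2) with half-power half-width h = 50 Hz (matching the +/- 50 Hz figure above) and a single tone lying between the two detectors; the Sound Detector's actual response shape is not given here. Under that assumption the power ratio of the 150 Hz and 250 Hz detectors falls monotonically from 150 to 250 Hz, so a simple bisection recovers the frequency. The function names are illustrative only.

```python
import math


def detector_power(f, fc, half_width):
    """Assumed (hypothetical) detector power response centred at fc.

    Equals 1 at fc and 0.5 at fc +/- half_width.
    """
    u = (f - fc) / half_width
    return 1.0 / (1.0 + u * u)


def estimate_frequency(a_low, a_high, f_low, f_high, half_width, iters=60):
    """Estimate one tone's frequency from two detector amplitudes.

    a_low and a_high are the amplitudes reported by the detectors centred at
    f_low and f_high. Under the assumed response the power ratio
    P_low / P_high falls monotonically from f_low to f_high, so bisection
    finds the matching frequency. Results are limited to [f_low, f_high].
    """
    target = (a_low / a_high) ** 2  # amplitudes -> power ratio
    lo, hi = f_low, f_high
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        ratio = (detector_power(mid, f_low, half_width)
                 / detector_power(mid, f_high, half_width))
        if ratio > target:
            lo = mid  # low detector still too strong: the tone is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Equal 150 Hz and 250 Hz responses put the tone at the midpoint.
print(round(estimate_frequency(1.0, 1.0, 150.0, 250.0, 50.0), 3))  # -> 200.0

# Amplitudes that a 180 Hz tone would produce under the assumed response.
a150 = math.sqrt(detector_power(180.0, 150.0, 50.0))
a250 = math.sqrt(detector_power(180.0, 250.0, 50.0))
print(round(estimate_frequency(a150, a250, 150.0, 250.0, 50.0), 3))  # -> 180.0
```

If the tone actually lies outside the two detectors, this sketch simply returns the nearer endpoint, so a real tracker would need a third detector or a check on the ratio range.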
The following plot is at the same data position, but the frequency range has been scrolled up to 6050 to 8000 Hz to show the data behind the bumps in that range in the Audio Detector Frequency Spectrum plot above. The vertical scale is kept the same to show the relative differences in detector amplitude.
The sound analyzer can be set to process the data with an FFT. The lower portion of the following plot is the FFT plot of the same data block displayed in the Sound Detector Response plot shown earlier. This block was found by stopping the processing in the same region and then clicking, scrolling, and stepping to the same position, as reported by the data position value in the lower right-hand box. Since the block size is the same for both processing cases, 1024 samples, the same block can be found.
The FFT frequency plot has a much higher variance in signal response across the frequency range than the corresponding Sound Detector Response plot. The data comes from high-quality, low-noise music, and I am inclined to believe that most of the small peaks at the plot floor are responses at frequencies for which no data exists in the music. Although the Sound Detector Response has some aspects that might not be considered linear for short data sequences, I believe its response floor is much less noisy. For the FFT I will simply call this noise process noise. The FFT uses considerably less compute power than the Sound Detector, but the tradeoff is less flexibility and more process noise.
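One well-known mechanism that produces a floor of small values across an FFT spectrum is spectral leakage: in a finite, unwindowed block, a pure tone that does not fall exactly on a bin center spreads energy into every bin. The sketch below uses a plain radix-2 FFT written out in Python (so it needs no third-party libraries) on a 1024-sample block at 44.1 kHz, where bins are about 43.07 Hz apart. It assumes a rectangular (no) window, since the program's FFT windowing is not described, and it illustrates only one possible contributor to what is called process noise here, not a measurement of this program's FFT. The helper names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


FS = 44_100
N = 1024
BIN_HZ = FS / N  # about 43.07 Hz between bins


def tone(freq_hz):
    """One N-sample block of a unit-amplitude sine."""
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def magnitude_spectrum(block):
    """Magnitudes of bins 0..N/2 for one real-valued block."""
    return [abs(c) for c in fft(block)[: len(block) // 2 + 1]]


def floor_level(mags, peak_bin, guard=5):
    """Largest magnitude more than `guard` bins from the peak, relative to the peak."""
    far = [m for k, m in enumerate(mags) if abs(k - peak_bin) > guard]
    return max(far) / mags[peak_bin]


on_bin = magnitude_spectrum(tone(100 * BIN_HZ))     # exactly on bin 100
off_bin = magnitude_spectrum(tone(100.5 * BIN_HZ))  # halfway between bins 100 and 101
print(floor_level(on_bin, 100), floor_level(off_bin, 100))
```

The on-bin tone leaves essentially nothing outside its own bin, while the half-bin tone leaves components several bins away at roughly a tenth of the peak. Windowing trades some of this leakage for a wider main lobe.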
The top audio track data plot shows the FFT data plotted in the same manner as the detector data in the earlier plot; however, the noise floor is much more prevalent. Sample-by-sample data is not available for the FFT, because the FFT computes the combined solution of multiple convolutions for the whole block at once.
Sound Analyzer and Vocalizations
This next plot set is for music that contains mostly singing, with some instruments scattered throughout. In this case, the singing is in a higher frequency range than normal speech.
The Audio Track Plot shows long, continuous areas across the plot. These long flowing sections are vocal areas: a short area at the front that quickly rises in frequency, followed by what look like three vocal areas, which are extended vocalizations compared to normal speech. At the beginning of the vocalizations, the frequency plot at the focus point shows two major power peaks with two minor peaks between them. Also plotted are the lowest response levels for the block, which are the two plots below the maximum response plots. Often the lowest response level is low, and those plots just show as straight lines at the bottom.
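Plotting both the maximum and the lowest response level for each block amounts to a per-block reduction of the detector output. A minimal sketch, assuming the detector output for one frequency is available as a flat list of per-sample values (a hypothetical layout; `block_extremes` is an illustrative name):

```python
def block_extremes(detector_output, block_size=1024):
    """Return a list of (max, min) detector values for each full block.

    Any trailing partial block is ignored in this sketch.
    """
    extremes = []
    for start in range(0, len(detector_output) - block_size + 1, block_size):
        block = detector_output[start:start + block_size]
        extremes.append((max(block), min(block)))
    return extremes


print(block_extremes([0.1, 0.9, 0.4, 0.0, 0.2, 0.3], block_size=3))
# -> [(0.9, 0.1), (0.3, 0.0)]
```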
The detector process data for these peaks shows a large peak in the 400 Hz range on the left side, followed in time (to the right) by a lower-power sound at a higher frequency; these correspond to the two larger frequency peaks discussed earlier.
By scrolling the frequency scroll bar, we can move up in frequency, eliminate the lower-frequency data, and review the higher-frequency data, as seen in the following plot. The second sound in the data peaks in the 800 to 850 Hz range.
The FFT result of the same data block shows the smaller peaks between the two major peaks as more pronounced than the Sound Detector does. It is not clear why there is such a pronounced difference. Some of it may be process noise, but it seems larger than normal process noise; however, the two are different methods of analyzing data, and the time length of the data greatly affects the sound detector's peak response. The output of the detector is not really accumulated power; it responds to the instantaneous power and tracks the response level over time. Just because it does not respond in exactly the same way, or with the same power levels, as an FFT does not mean that it is a poor or invalid detector; it is just different, with characteristics that can be exploited to analyze and characterize real data.
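To illustrate the difference between tracking an instantaneous level and accumulating power over a block, here is a standard per-frequency detector: mix the input down by the target frequency, then smooth it with a one-pole low-pass filter. This is a swapped-in textbook technique, not the Sound Detector's algorithm, which is not described here. The 50 Hz half-width is borrowed from the +/- 50 Hz figure given for the 200 Hz detector, and the function and parameter names are hypothetical. Its output is available sample by sample, with a time constant of roughly fs / (2 * pi * 50), about 140 samples, rather than one value per 1024-sample block.

```python
import cmath
import math


def tone_detector(samples, freq_hz, fs=44_100, half_width_hz=50.0):
    """Per-sample level tracker for one frequency (illustrative sketch).

    Mixes the input down by freq_hz, then smooths it with a one-pole
    low-pass filter whose -3 dB point is about half_width_hz, so the
    output follows the current level near freq_hz instead of summing
    power over a whole block.
    """
    a = math.exp(-2 * math.pi * half_width_hz / fs)  # pole radius
    w = 2 * math.pi * freq_hz / fs                   # radians per sample
    state = 0j
    out = []
    for n, x in enumerate(samples):
        mixed = x * cmath.exp(-1j * w * n)   # shift freq_hz down to 0 Hz
        state = a * state + (1 - a) * mixed  # one-pole low-pass
        out.append(2 * abs(state))           # x2: a real sine splits into +/- frequency halves
    return out


def sine(freq_hz, n=4096, fs=44_100):
    """Unit-amplitude test sine."""
    return [math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n)]


on_tone = tone_detector(sine(2000.0), 2000.0)[-1]
half_power = tone_detector(sine(2050.0), 2000.0)[-1]
off_tone = tone_detector(sine(3000.0), 2000.0)[-1]
print(round(on_tone, 2), round(half_power, 2), round(off_tone, 2))
```

For a 2000 Hz detector, a steady 2000 Hz sine reads close to 1.0, a 2050 Hz sine reads about 0.707 (the half-power point), and a 3000 Hz sine reads only a few percent.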
The S hisssss Sound
While watching the real-time sound analyzer respond to music, I noticed an occasional spike in the 8000 to 9000 Hz range but could not relate it to any musical instrument. Because I was assuming that most of the vocal sound was below 2000 Hz, I also did not relate it to the vocal sounds that were playing at the same time. Such a power spike is shown in the Audio Detector Frequency Spectrum in the following plot.
One day I decided to evaluate these spikes more closely; maybe they were just technical environment noise. The position scroll bar control at the bottom has start and stop position cursors. From the left side, the start cursor can be dragged to the point where the audio should start. From the right side of the position scroll bar control, the stop position cursor can be dragged to where the analysis should stop. These start and stop positions are shown as blue highlighting behind the green data bar, and the area between them is the area to be analyzed. The start and stop positions can be adjusted until the short period to be analyzed is set. This segment can be listened to several times using the move-to-start arrow button and the play button; playback stops when it reaches the stop position. The sound turned out to be a vocalized hissing sound, often at the end of or between words being sung. In this particular case, it is the hiss of a word beginning with an "S". In certain cases, as in this one, there are almost no low-frequency sounds during portions of a hiss, whereas most vocalizations have low frequencies.
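Selecting a start/stop segment ultimately maps two time positions onto the analysis blocks that cover them. A small sketch, assuming the 1024-sample blocks and 44.1 kHz rate used in this document; the positions 1:12.0 and 1:12.5 are made up for the example, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio rate used in this document
BLOCK_SIZE = 1024        # analysis block size used in this document


def position_to_sample(minutes, seconds, fs=SAMPLE_RATE_HZ):
    """Convert a minutes:seconds position to the nearest sample index."""
    return int(round((minutes * 60 + seconds) * fs))


def blocks_in_segment(start_sample, stop_sample, block_size=BLOCK_SIZE):
    """Indices of the blocks that overlap the sample range [start_sample, stop_sample)."""
    first = start_sample // block_size
    last = (stop_sample - 1) // block_size
    return range(first, last + 1)


# Hypothetical half-second segment from 1:12.0 to 1:12.5.
start = position_to_sample(1, 12.0)
stop = position_to_sample(1, 12.5)
segment = blocks_in_segment(start, stop)
print(start, stop, segment[0], segment[-1], len(segment))  # -> 3175200 3197250 3100 3122 23
```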
The processed frequency data shows the hiss starting at a lower frequency and quickly moving up in frequency as time proceeds. Some of these frequencies have a very short detection time because the sound moves quickly through the frequency range; for an FFT covering a much longer time period, they would have very little response, possibly buried in the process noise. The time relationships of these small power signals within the block would also be lost.
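A rough way to see why a fast-moving hiss gives a weak response in a long analysis window: for a tone sweeping linearly at rate R (Hz/s), the time it spends inside a detector band of width B is about B / R. The sweep rate below (5000 Hz in 50 ms) is hypothetical, since the document does not measure it; B = 100 Hz is the half-power width quoted for the 200 Hz detector, and `dwell_time` is an illustrative name.

```python
def dwell_time(bandwidth_hz, sweep_rate_hz_per_s, fs=44_100):
    """Time (seconds, samples) a linear sweep spends inside one detector band."""
    seconds = bandwidth_hz / sweep_rate_hz_per_s
    return seconds, seconds * fs


# Hypothetical hiss rising 5000 Hz in 50 ms, i.e. 100,000 Hz/s.
secs, samples = dwell_time(100.0, 100_000.0)
block_secs = 1024 / 44_100  # one 1024-sample analysis block
print(round(secs * 1000, 2), round(samples, 1), round(block_secs * 1000, 1))
# -> 1.0 44.1 23.2
```

Under these assumed numbers the sweep stays in each 100 Hz band for about 44 samples, roughly 1 ms, while a 1024-sample block spans about 23 ms, so each band's energy occupies only a small fraction of the block.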
The following plot shows the same block of data processed by an FFT.