SoundTouch Project Summary

Updated: 2023-11-29 21:12:02


SoundTouch: Pitch Shifting, Rate Change, and Tempo

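Some time ago I was working on an algorithm that changes pitch without changing speed. I implemented it in the frequency domain, but I never got phase synchronization to work, and the sound quality was poor. Then I came across SoundTouch by chance. This powerful library was exactly what I needed: the work is now finished and packaged as a real-time playback demo, and I am sharing some brief usage notes here.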

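SoundTouch is an open-source audio processing library. It provides three main functions: changing speed, changing pitch, and changing both at once. It can operate on live media streams as well as on audio files. Samples are either 32-bit floating point or 16-bit fixed point, mono or stereo, at sample rates from 8 kHz to 48 kHz. Speed changes are controlled through the tempo parameter, and the library can also detect the beat of music. Its algorithms are used by many well-known programs, such as Audacity and Winamp.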

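The algorithms behind the three functions:

Tempo (change speed, keep pitch): implemented with a WSOLA-type algorithm;

Rate: implemented by interpolation and decimation, i.e. resampling (I have already written a resampling function myself, and many existing ones are available);

Pitch (change pitch, keep speed): a combination of the two, first changing the tempo and then resampling.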
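The "rate" step above is plain resampling. As a minimal sketch of the idea, assuming linear interpolation on a mono float signal (the helper name resampleLinear is hypothetical and this is not the library's own code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Resample a mono signal by linear interpolation.
// rate > 1.0 yields fewer output samples (faster and higher when played at
// the original sample rate); rate < 1.0 yields more output samples.
// Hypothetical helper for illustration only.
std::vector<float> resampleLinear(const std::vector<float>& in, double rate)
{
    assert(rate > 0.0);
    std::vector<float> out;
    if (in.size() < 2) return out;

    double pos = 0.0;                                   // fractional read position
    while (pos <= static_cast<double>(in.size() - 1)) {
        std::size_t i = static_cast<std::size_t>(pos);  // sample to the left
        double frac = pos - static_cast<double>(i);     // distance past it
        float next = (i + 1 < in.size()) ? in[i + 1] : in[i];
        out.push_back(static_cast<float>((1.0 - frac) * in[i] + frac * next));
        pos += rate;                                    // step through the input
    }
    return out;
}
```

A production resampler would low-pass filter before decimating to avoid aliasing, which is the role of the anti-alias filter discussed later in these notes.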

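The main functions are declared in SoundTouch.h. The library is also available as a DLL, together with a usage example that is worth reading and a README file with a detailed introduction.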

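The main points I noted:

1. Before use, a SoundTouch object must be initialized by calling setSampleRate and setChannels to describe the audio stream. The channel count is 1 for mono and 2 for stereo; the sample rate is 8000 to 48000 Hz. After that, the new pitch, tempo, and rate can be set with the corresponding functions.

2. SoundTouch behaves like a FIFO pipe: first in, first out. Use putSamples to feed in samples and receiveSamples to fetch the processed ones. Note that new pitch, tempo, and rate values must be set before calling putSamples, not after the data has already been fed in.

3. SoundTouch processes data in batches, so processing only happens once enough samples have arrived. This introduces some latency: input is not necessarily processed immediately, and the output does not necessarily correspond to the input just supplied.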

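4. The control parameters may be changed during processing, but the library provides no synchronization, so a multithreaded program must add its own.

5. The SoundTouch class uses the TDStretch class to change tempo and the RateTransposer class to change the sample rate. Pitch shifting is done by first stretching the signal with TDStretch, which changes the speed but not the pitch, and then resampling, which yields a pitch change at the original speed.

6. SoundTouch has fixed-point and floating-point implementations. When compiling on Windows, select one in the STTypes.h header: "#define INTEGER_SAMPLES 1" selects 16-bit integer samples, and "#define FLOAT_SAMPLES 1" selects float samples.

7. Using the time-stretch function can introduce a fairly long delay. The documentation mentions about 100 ms of processing delay on low-end machines. On my machine (fairly ordinary by current standards) the delay is far smaller, and results are available almost immediately after putSamples. Pitch shifting combines time-stretch and resampling, so this delay applies there too.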

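8. Several settings can be adjusted: DEFAULT_SEQUENCE_MS, the frame (sequence) length in ms, 40 in the example; DEFAULT_SEEKWINDOW_MS, the length in ms of the window searched for the best overlap position, 15 in the example; SETTING_OVERLAP_MS, the overlap length in ms, 8 in the example; SETTING_USE_QUICKSEEK, whether to use the quick seek method; SETTING_USE_AA_FILTER, whether to use the anti-alias (AA) filter; and SETTING_AA_FILTER_LENGTH, the number of filter taps, 32 by default. All of these have default values and can be left alone, but they can be tuned to meet real-time or sound-quality requirements. See the README that ships with the library for details.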

SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (1)

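Using the SoundTouch audio processing library is extremely simple. After building it, set up the compiler environment. Taking VC as an example: add the include directory under the SoundTouch directory to the include paths, add the lib directory under the SoundTouch directory to the library paths, and then add the header and the library reference in your own header file, as shown below. The _DEBUG macro drives conditional compilation: a debug build links the debug library, and anything else links the release library. The only difference between the two is whether the file name ends in an extra "D".

    #include ...
    #ifdef _DEBUG
    #pragma comment(lib, ...)
    #else
    #pragma comment(lib, ...)
    #endif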

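You can of course add the library directly in the VC project settings instead, as some people prefer.

Most importantly, a namespace must also be declared. The reason lies in how the SoundTouch library is declared, as the analysis below explains.

    using namespace soundtouch;

After that you can define a class variable directly in your own code:

    SoundTouch m_SoundTouch;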

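The SoundTouch class is declared in SoundTouch.h and implemented in SoundTouch.cpp. It derives directly from FIFOProcessor, which in turn derives from the base class FIFOSamplePipe; both parent classes live in FIFOSamplePipe.h and FIFOSamplePipe.cpp. SoundTouch is declared inside the namespace soundtouch, which is the main reason the namespace must be declared when using the library. This seems somewhat redundant to me, since apart from the class the namespace only defines a few constants, such as the version number and version ID.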

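With any library, the usual procedure is to define an object and then perform the necessary initialization, and SoundTouch (ST below) is no exception. Initializing ST is as simple as building it: refer to the bundled SoundStretch example, or to the declaration of the SoundTouch class in the source. Looking only at the parts we will use, the private section declares two class pointers, RateTransposer* and TDStretch*.

RateTransposer derives from FIFOProcessor, which derives from the base class FIFOSamplePipe; TDStretch is similar. Judging by the two class names alone (stretch? transpose rate?), it is easy to guess that the library processes sound by "stretching" it and then "changing the rate". Could this be the well-known tempo change without pitch change? It is indeed, but that is not our concern just yet.

    ......
    private: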

        /// Rate transposer class instance
        class RateTransposer *pRateTransposer;

        /// Time-stretch class instance
        class TDStretch *pTDStretch;

        /// Virtual pitch parameter. Effective rate & tempo are calculated from these parameters.
        float virtualRate;

        /// Virtual pitch parameter. Effective rate & tempo are calculated from these parameters.
        float virtualTempo;

        /// Virtual pitch parameter. Effective rate & tempo are calculated from these parameters.
        float virtualPitch;

        /// Flag: Has sample rate been set?
        BOOL bSrateSet;

        /// Calculates effective rate & tempo values from 'virtualRate', 'virtualTempo' and
        /// 'virtualPitch' parameters.
        void calcEffectiveRateAndTempo();

    protected:
        /// Number of channels
        uint channels;

        /// Effective 'rate' value calculated from 'virtualRate', 'virtualTempo' and 'virtualPitch'
        float rate;

        /// Effective 'tempo' value calculated from 'virtualRate', 'virtualTempo' and 'virtualPitch'
        float tempo;

    public:
        /// Sets new rate control value. Normal rate = 1.0, smaller values
        /// represent slower rate, larger faster rates.
        void setRate(float newRate);

        /// Sets new tempo control value. Normal tempo = 1.0, smaller values
        /// represent slower tempo, larger faster tempo.
        void setTempo(float newTempo);

        /// Sets new rate control value as a difference in percents compared
        /// to the original rate (-50 .. +100 %)
        void setRateChange(float newRate);

        /// Sets new tempo control value as a difference in percents compared
        /// to the original tempo (-50 .. +100 %)
        void setTempoChange(float newTempo);

        /// Sets new pitch control value. Original pitch = 1.0, smaller values
        /// represent lower pitches, larger values higher pitch.
        void setPitch(float newPitch);

        /// Sets pitch change in octaves compared to the original pitch
        /// (-1.00 .. +1.00)
        void setPitchOctaves(float newPitch);

        /// Sets pitch change in semi-tones compared to the original pitch
        /// (-12 .. +12)
        void setPitchSemiTones(int newPitch);
        void setPitchSemiTones(float newPitch);

        /// Sets the number of channels, 1 = mono, 2 = stereo
        void setChannels(uint numChannels);

        /// Sets sample rate.
        void setSampleRate(uint srate);

        /// Changes a setting controlling the processing system behaviour. See the
        /// 'SETTING_...' defines for available setting ID's.
        ///
        /// \return 'TRUE' if the setting was successfully changed
        BOOL setSetting(int settingId,   ///< Setting ID number. see SETTING_... defines.
                        int value        ///< New setting value.
                        );
    ......

Following the SoundStretch example that ships with ST, initialize the SoundTouch object:

    m_SoundTouch.setSampleRate(sampleRate);       // set the sample rate
    m_SoundTouch.setChannels(channels);           // set the number of channels
    m_SoundTouch.setTempoChange(tempoDelta);      // tempo change without pitch change
    m_SoundTouch.setPitchSemiTones(pitchDelta);   // set the pitch change
    m_SoundTouch.setRateChange(rateDelta);        // set the playback rate change

    // quick is a bool; I am not yet sure exactly what USE_QUICKSEEK does.
    m_SoundTouch.setSetting(SETTING_USE_QUICKSEEK, quick);

    // noAntiAlias is a bool; I am not yet sure exactly what USE_AA_FILTER does.
    m_SoundTouch.setSetting(SETTING_USE_AA_FILTER, !(noAntiAlias));

    // speech is also a bool; presumably it should be set for voice-only audio without music.
    if (speech)
    {
        // use settings for speech processing
        m_SoundTouch.setSetting(SETTING_SEQUENCE_MS, 40);
        m_SoundTouch.setSetting(SETTING_SEEKWINDOW_MS, 15);
        m_SoundTouch.setSetting(SETTING_OVERLAP_MS, 8);
        fprintf(stderr, ...);
    }
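The header comments quoted earlier describe the "change" setters as percentages and the pitch setters as octaves or semitones. A minimal sketch of how such values map to plain multiplicative factors, assuming the usual definitions (a percent change p corresponds to 1 + p/100, and one octave is a factor of 2 split into 12 semitones). The function names are illustrative, not library calls:

```cpp
#include <cassert>
#include <cmath>

// Percent change (for example -50 .. +100) to a multiplicative factor.
double percentToFactor(double percent)
{
    return 1.0 + 0.01 * percent;
}

// Pitch shift in octaves to a frequency ratio: one octave doubles the frequency.
double octavesToFactor(double octaves)
{
    return std::pow(2.0, octaves);
}

// Pitch shift in semitones to a frequency ratio: 12 semitones make one octave.
double semitonesToFactor(double semitones)
{
    return octavesToFactor(semitones / 12.0);
}
```

With these definitions a tempo change of -50 % halves the speed, and a pitch change of +12 semitones doubles the frequency.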

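With just these few calls we can already experience the power of ST. Audio is fed in through the method provided by the SoundTouch class:

    putSamples(sampleBuffer, nSamples);

The first argument is a pointer to a block of PCM-encoded audio data; the second is the number of samples to process, which can also be understood as the number of frames.

Note that audio data usually arrives as a byte stream, so the size of one sample depends on the channel count and the bit depth. Suppose sampleBuffer points to a 64-byte PCM buffer of 16-bit, 2-channel audio. One sample then occupies (16*2)/8 = 4 bytes, so the buffer holds 64/4 = 16 samples. With m_SoundTouch.putSamples(sampleBuffer, nSamples) the data has been passed in, but where do we receive the processed audio? For that we use the receiveSamples method provided by SoundTouch.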
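The byte-to-sample arithmetic above can be written as a small helper. This is a sketch only; framesInBuffer is a hypothetical name, not part of the library:

```cpp
#include <cassert>
#include <cstddef>

// Number of sample frames in a PCM byte buffer.
// One frame holds one value per channel, so it occupies
// channels * bitsPerSample / 8 bytes.
std::size_t framesInBuffer(std::size_t numBytes, unsigned channels, unsigned bitsPerSample)
{
    std::size_t bytesPerFrame = static_cast<std::size_t>(channels) * bitsPerSample / 8;
    assert(bytesPerFrame > 0);
    return numBytes / bytesPerFrame;
}
```

For the 64-byte, 16-bit stereo buffer in the text, framesInBuffer(64, 2, 16) gives 16, which is the value to pass as nSamples.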

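    uint receiveSamples(SAMPLETYPE *outBuffer,   ///< Buffer where to copy output samples.
                        uint maxSamples          ///< How many samples to receive at max.
                        );

It also takes two arguments: the first is the buffer that receives the data, and the second is the maximum number of samples to receive.

The comment in the example makes clear that receiveSamples does not necessarily return data right after putSamples, and that at other times more samples may be ready than fit into the buffer (a single call never returns more than maxSamples). The call therefore belongs in a do ... while (...) loop that drains everything available.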

    // Read ready samples from SoundTouch processor & write them output file.
    // NOTES:
    // - 'receiveSamples' doesn't necessarily return any samples at all
    //   during some rounds!
    // - On the other hand, during some round 'receiveSamples' may have more
    //   ready samples than would fit into 'sampleBuffer', and for this reason
    //   the 'receiveSamples' call is iterated for as many times as it
    //   outputs samples.
    do
    {
        nSamples = m_SoundTouch.receiveSamples(sampleBuffer, buffSizeSamples);
        // Write sampleBuffer to a file, or feed it to the sound card buffer for playback.
    } while (nSamples != 0);
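To see why the loop is needed, here is a toy stand-in for a batch-processing FIFO. It is a sketch under stated assumptions: ToyBatchPipe and drain are hypothetical names, and the class only imitates the put/receive behaviour described above by holding input back until a full batch has arrived. It does no audio processing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy pipe: samples become available for output only in complete batches.
class ToyBatchPipe {
public:
    explicit ToyBatchPipe(std::size_t batchSize) : batch_(batchSize) { assert(batchSize > 0); }

    void putSamples(const short* samples, std::size_t n) {
        pending_.insert(pending_.end(), samples, samples + n);
        const std::ptrdiff_t b = static_cast<std::ptrdiff_t>(batch_);
        while (pending_.size() >= batch_) {          // release complete batches only
            ready_.insert(ready_.end(), pending_.begin(), pending_.begin() + b);
            pending_.erase(pending_.begin(), pending_.begin() + b);
        }
    }

    // Copies at most maxSamples ready samples into 'out'; may return 0.
    std::size_t receiveSamples(short* out, std::size_t maxSamples) {
        const std::size_t n = std::min(maxSamples, ready_.size());
        const std::ptrdiff_t d = static_cast<std::ptrdiff_t>(n);
        std::copy(ready_.begin(), ready_.begin() + d, out);
        ready_.erase(ready_.begin(), ready_.begin() + d);
        return n;
    }

private:
    std::size_t batch_;
    std::deque<short> pending_;   // received but not yet released
    std::deque<short> ready_;     // available to receiveSamples
};

// Drain everything that is ready, using a small caller-side buffer.
std::vector<short> drain(ToyBatchPipe& pipe, std::size_t bufSize) {
    assert(bufSize > 0);
    std::vector<short> all, buf(bufSize);
    std::size_t n;
    do {
        n = pipe.receiveSamples(buf.data(), buf.size());
        all.insert(all.end(), buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(n));
    } while (n != 0);
    return all;
}
```

With a batch size of 8, putting 5 samples yields nothing; putting 5 more releases 8 samples, and a 3-sample buffer needs three non-empty receive calls to collect them.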

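SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (2)

Walking through the SoundTouch initialization flow

Define a variable:

    SoundTouch m_SoundTouch;

The derivation chain of SoundTouch:

FIFOSamplePipe -> FIFOProcessor -> SoundTouch (flow [1])

So the base class FIFOSamplePipe is constructed first, then FIFOProcessor, and only then SoundTouch. The author's C++ skill deserves a mention here: class inheritance is used to great effect, and analyzing code like this is a pleasure. First, the base class FIFOSamplePipe, defined as follows:

    class FIFOSamplePipe
    {
    public: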

        // virtual default destructor
        virtual ~FIFOSamplePipe() {}

        /// Returns a pointer to the beginning of the output samples.
        /// This function is provided for accessing the output samples directly.
        /// Please be careful for not to corrupt the book-keeping!
        ///
        /// When using this function to output samples, also remember to 'remove' the
        /// output samples from the buffer by calling the
        /// 'receiveSamples(numSamples)' function
        virtual SAMPLETYPE *ptrBegin() = 0;

        /// Adds 'numSamples' pcs of samples from the 'samples' memory position to
        /// the sample buffer.
        virtual void putSamples(const SAMPLETYPE *samples,  ///< Pointer to samples.
                                uint numSamples             ///< Number of samples to insert.
                                ) = 0;

        // Moves samples from the 'other' pipe instance to this instance.
        void moveSamples(FIFOSamplePipe &other  ///< Other pipe instance where from the receive the data.
                         )
        {
            int oNumSamples = other.numSamples();

            putSamples(other.ptrBegin(), oNumSamples);
            other.receiveSamples(oNumSamples);
        };

        /// Output samples from beginning of the sample buffer. Copies requested samples to
        /// output buffer and removes them from the sample buffer. If there are less than
        /// 'numsample' samples in the buffer, returns all that available.
        ///
        /// \return Number of samples returned.
        virtual uint receiveSamples(SAMPLETYPE *output, ///< Buffer where to copy output samples.
                                    uint maxSamples     ///< How many samples to receive at max.
                                    ) = 0;

        /// Adjusts book-keeping so that given number of samples are removed from beginning of the
        /// sample buffer without copying them anywhere.
        ///
        /// Used to reduce the number of samples in the buffer when accessing the sample buffer directly
        /// with 'ptrBegin' function.
        virtual uint receiveSamples(uint maxSamples   ///< Remove this many samples from the beginning of pipe.
                                    ) = 0;

        /// Returns number of samples currently available.
        virtual uint numSamples() const = 0;

        // Returns nonzero if there aren't any samples available for outputting.
        virtual int isEmpty() const = 0;

        /// Clears all the samples.
        virtual void clear() = 0;
    };

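FIFOSamplePipe does not define a constructor, so the compiler-generated default constructor FIFOSamplePipe() is called implicitly. It performs no initialization, and none is needed. Defining the virtual destructor virtual ~FIFOSamplePipe() {} ensures that when a derived object is created through a base pointer, for example FIFOSamplePipe* a = new FIFOProcessor, destroying it through a also runs the destructor of the derived class FIFOProcessor, so the whole object is destroyed in one go no matter how many levels of inheritance there are. This is what is expected of a base class. Inheritance and polymorphism are among the most powerful parts of C++ and help in writing highly reusable classes. Looking at the declaration, apart from the many pure virtual functions, the only function the base class implements is moveSamples, so any derived class that does not override moveSamples will use this implementation. What it does is simple, and from the comment it is easy to see that this function is the interface through which the derived classes pass data to one another.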

    // Moves samples from the 'other' pipe instance to this instance.
    void moveSamples(FIFOSamplePipe &other  ///< Other pipe instance where from the receive the data.
                     )
    {
        int oNumSamples = other.numSamples();

        putSamples(other.ptrBegin(), oNumSamples);
        other.receiveSamples(oNumSamples);
    };
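The virtual-destructor point made above can be checked with a few lines. This is a sketch; PipeBase, DerivedPipe, and the counter are hypothetical names used only for the demonstration:

```cpp
#include <cassert>

int g_derivedDestroyed = 0;   // counts DerivedPipe destructor calls (demo only)

struct PipeBase {
    virtual ~PipeBase() {}    // virtual, so deleting through a base pointer is safe
};

struct DerivedPipe : PipeBase {
    ~DerivedPipe() override { ++g_derivedDestroyed; }
};

// Create a derived object, then destroy it through a pointer to the base class.
void destroyThroughBase()
{
    PipeBase* p = new DerivedPipe;
    delete p;                 // runs ~DerivedPipe() first, then ~PipeBase()
}
```

If ~PipeBase() were not virtual, deleting a DerivedPipe through a PipeBase* would be undefined behaviour, and in practice the derived destructor would typically be skipped.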

        bufferUnaligned = NULL;
        samplesInBuffer = 0;
        bufferPos = 0;
        channels = (uint)numChannels;
        ensureCapacity(32);     // allocate initial capacity
    }

The FIFOSampleBuffer constructor is called three times. Now the RateTransposer constructor can finally run:

    // Constructor
    RateTransposer::RateTransposer() : FIFOProcessor(&outputBuffer)
    {
        numChannels = 2;
        bUseAAFilter = TRUE;
        fRate = 0;

        // Instantiates the anti-alias filter with default tap length
        // of 32
        pAAFilter = new AAFilter(32);
    }

First, the definition of AAFilter:

    class AAFilter
    {
    protected:
        class FIRFilter *pFIR;

        /// Low-pass filter cut-off frequency, negative = invalid
        double cutoffFreq;

        /// num of filter taps
        uint length;

        /// Calculate the FIR coefficients realizing the given cutoff-frequency
        void calculateCoeffs();

    public:
        AAFilter(uint length);
        ~AAFilter();

        /// Sets new anti-alias filter cut-off edge frequency, scaled to sampling
        /// frequency (nyquist frequency = 0.5). The filter will cut off the
        /// frequencies higher than that.
        void setCutoffFreq(double newCutoffFreq);

        /// Sets number of FIR filter taps, i.e. ~filter complexity
        void setLength(uint newLength);

        uint getLength() const;

        /// Applies the filter to the given sequence of samples.
        /// Note : The amount of outputted samples is by value of 'filter length'
        /// smaller than the amount of input samples.
        uint evaluate(SAMPLETYPE *dest,
                      const SAMPLETYPE *src,
                      uint numSamples,
                      uint numChannels) const;
    };

Its constructor initializes a pointer to class FIRFilter:

    AAFilter::AAFilter(uint len)
    {
        pFIR = FIRFilter::newInstance();
        cutoffFreq = 0.5;
        setLength(len);
    }

Let us first look at the member function newInstance() of the FIRFilter class. Here we find a very useful function, detectCPUextensions(), which tells us which multimedia instruction sets the CPU supports; the comments make it quick to understand. It is one to keep for later. It is implemented in Cpu_detect_x86_win.cpp. Its one shortcoming is that it can only detect CPUs of the x86 architecture, though perhaps I am overthinking this. On my machine (a Celeron CPU) only the MMX instructions are supported.

    FIRFilter * FIRFilter::newInstance()
    {
        uint uExtensions;

        uExtensions = detectCPUextensions();

        // Check if MMX/SSE/3DNow! instruction set extensions supported by CPU

    #ifdef ALLOW_MMX
        // MMX routines available only with integer sample types
        if (uExtensions & SUPPORT_MMX)
        {
            return ::new FIRFilterMMX;
        }
        else
    #endif // ALLOW_MMX

    #ifdef ALLOW_SSE
        if (uExtensions & SUPPORT_SSE)
        {
            // SSE support
            return ::new FIRFilterSSE;
        }
        else
    #endif // ALLOW_SSE

    #ifdef ALLOW_3DNOW
        if (uExtensions & SUPPORT_3DNOW)
        {
            // 3DNow! support
            return ::new FIRFilter3DNow;
        }
        else
    #endif // ALLOW_3DNOW

        {
            // ISA optimizations not supported, use plain C version
            return ::new FIRFilter;
        }
    }

On my machine, therefore, this check constructs and returns a FIRFilterMMX object:

    if (uExtensions & SUPPORT_MMX)
    {
        return ::new FIRFilterMMX;
    }
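The pattern used by newInstance() is a factory that picks a subclass from a bitmask of detected CPU features. A self-contained sketch of the same idea, with hypothetical feature bits and class names standing in for the library's SUPPORT_* flags and filter classes:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical feature bits for illustration.
enum : unsigned { FEATURE_MMX = 0x1, FEATURE_SSE = 0x2 };

struct Filter {
    virtual ~Filter() {}
    virtual std::string name() const { return "plain"; }   // portable fallback
};
struct FilterMMX : Filter { std::string name() const override { return "mmx"; } };
struct FilterSSE : Filter { std::string name() const override { return "sse"; } };

// Return the first optimized implementation the feature bits allow,
// falling back to the portable base class when none is available.
std::unique_ptr<Filter> makeFilter(unsigned features)
{
    if (features & FEATURE_MMX) return std::unique_ptr<Filter>(new FilterMMX);
    if (features & FEATURE_SSE) return std::unique_ptr<Filter>(new FilterSSE);
    return std::unique_ptr<Filter>(new Filter);
}
```

Callers only ever hold a pointer to the base class, so the rest of the code does not need to know which variant was chosen.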

The class definition class FIRFilterMMX : public FIRFilter shows that it derives from FIRFilter. The member function uint FIRFilterMMX::evaluateFilterStereo caught my attention: it uses the MMX instruction set to carry out the core audio computation. This is the core algorithm of the rate change that we are after. For the implementations on other instruction sets see FIRFilter3DNow and FIRFilterSSE; the default is FIRFilter's own evaluateFilterStereo.

    // mmx-optimized version of the filter routine for stereo sound
    uint FIRFilterMMX::evaluateFilterStereo(short *dest, const short *src, uint numSamples) const
    {
        // Create stack copies of the needed member variables for asm routines :
        uint i, j;
        __m64 *pVdest = (__m64*)dest;

        if (length < 2) return 0;

        for (i = 0; i < (numSamples - length) / 2; i ++)
        {
            __m64 accu1;
            __m64 accu2;
            const __m64 *pVsrc = (const __m64*)src;
            const __m64 *pVfilter = (const __m64*)filterCoeffsAlign;

            accu1 = accu2 = _mm_setzero_si64();
            for (j = 0; j < lengthDiv8 * 2; j ++)
            {
                __m64 temp1, temp2;

                temp1 = _mm_unpacklo_pi16(pVsrc[0], pVsrc[1]);  // = l2 l0 r2 r0
                temp2 = _mm_unpackhi_pi16(pVsrc[0], pVsrc[1]);  // = l3 l1 r3 r1
                accu1 = _mm_add_pi32(accu1, _mm_madd_pi16(temp1, pVfilter[0]));  // += l2*f2+l0*f0 r2*f2+r0*f0
                accu1 = _mm_add_pi32(accu1, _mm_madd_pi16(temp2, pVfilter[1]));  // += l3*f3+l1*f1 r3*f3+r1*f1

                temp1 = _mm_unpacklo_pi16(pVsrc[1], pVsrc[2]);  // = l4 l2 r4 r2
                accu2 = _mm_add_pi32(accu2, _mm_madd_pi16(temp2, pVfilter[0]));  // += l3*f2+l1*f0 r3*f2+r1*f0
                accu2 = _mm_add_pi32(accu2, _mm_madd_pi16(temp1, pVfilter[1]));  // += l4*f3+l2*f1 r4*f3+r2*f1

                // accu1 += l2*f2+l0*f0 r2*f2+r0*f0
                //       += l3*f3+l1*f1 r3*f3+r1*f1

                // accu2 += l3*f2+l1*f0 r3*f2+r1*f0
                //          l4*f3+l2*f1 r4*f3+r2*f1

                pVfilter += 2;
                pVsrc += 2;
            }
            // accu >>= resultDivFactor
            accu1 = _mm_srai_pi32(accu1, resultDivFactor);
            accu2 = _mm_srai_pi32(accu2, resultDivFactor);

            // pack 2*2*32bits => 4*16 bits
            pVdest[0] = _mm_packs_pi32(accu1, accu2);
            src += 4;
            pVdest ++;
        }

        _m_empty();  // clear emms state

        return (numSamples & 0xfffffffe) - length;
    }

Consequently, when porting SoundTouch to a CPU without these multimedia instruction sets, such as ARM, FIRFilter's own evaluateFilterStereo function should be used. With that done, RateTransposerInteger() can finally be constructed. Its constructor:

    RateTransposerInteger::RateTransposerInteger() : RateTransposer()
    {
        // Notice: use local function calling syntax for sake of clarity,
        // to indicate the fact that C++ constructor can't call virtual functions.
        RateTransposerInteger::resetRegisters();
        RateTransposerInteger::setRate(1.0f);
    }

performs some necessary initialization. At this point pRateTransposer = RateTransposer::newInstance(); is fully instantiated. As for pTDStretch = TDStretch::newInstance();, that is left for the next part.
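For a port without SIMD, the computation behind the MMX routine reduces to an ordinary FIR sum per channel. A plain C++ sketch under stated assumptions: interleaved 16-bit stereo input, fixed-point coefficients, and a right shift to undo the coefficient scaling. The function firStereo is illustrative and is not the library's code; unlike the MMX version it does not saturate the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar FIR filter for interleaved 16-bit stereo (L R L R ...).
// 'numFrames' is the number of stereo frames in 'src'. The function writes
// numFrames - taps output frames to 'dest' and returns that count.
// Each sum is shifted right by 'divShift' to undo fixed-point coefficient scaling.
std::size_t firStereo(short* dest, const short* src, std::size_t numFrames,
                      const std::vector<short>& coeffs, unsigned divShift)
{
    const std::size_t taps = coeffs.size();
    if (taps == 0 || numFrames <= taps) return 0;

    const std::size_t outFrames = numFrames - taps;
    for (std::size_t i = 0; i < outFrames; ++i) {
        long sumL = 0, sumR = 0;
        for (std::size_t j = 0; j < taps; ++j) {
            sumL += static_cast<long>(src[2 * (i + j)])     * coeffs[j];  // left channel
            sumR += static_cast<long>(src[2 * (i + j) + 1]) * coeffs[j];  // right channel
        }
        dest[2 * i]     = static_cast<short>(sumL >> divShift);
        dest[2 * i + 1] = static_cast<short>(sumR >> divShift);
    }
    return outFrames;
}
```

With two taps of 8192 each (0.5 in a 2^14 scale) and divShift = 14, the filter averages neighbouring frames.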

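SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (3)

Walking through the SoundTouch initialization flow, part 2

This continues the previous part, "Walking through the SoundTouch initialization flow".

The relationship between TDStretch and its base classes: FIFOSamplePipe -> FIFOProcessor -> TDStretch

The SoundTouch member class TDStretch *pTDStretch is initialized in the SoundTouch constructor SoundTouch::SoundTouch():

    pTDStretch = TDStretch::newInstance();

It is constructed by calling the TDStretch member function newInstance(), whose code is:

    TDStretch * TDStretch::newInstance()
    {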

        uint uExtensions;

        uExtensions = detectCPUextensions();

        // Check if MMX/SSE/3DNow! instruction set extensions supported by CPU

    #ifdef ALLOW_MMX
        // MMX routines available only with integer sample types
        if (uExtensions & SUPPORT_MMX)
        {
            return ::new TDStretchMMX;
        }
        else
    #endif // ALLOW_MMX

    #ifdef ALLOW_SSE
        if (uExtensions & SUPPORT_SSE)
        {
            // SSE support
            return ::new TDStretchSSE;
        }
        else
    #endif // ALLOW_SSE

    #ifdef ALLOW_3DNOW
        if (uExtensions & SUPPORT_3DNOW)
        {
            // 3DNow! support
            return ::new TDStretch3DNow;
        }
        else
    #endif // ALLOW_3DNOW

        {
            // ISA optimizations not supported, use plain C version
            return ::new TDStretch;
        }
    }

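Just like pRateTransposer, this detects the CPU's extended instruction sets and constructs a subclass that supports the corresponding multimedia instructions. Three classes are derived for the three instruction sets: TDStretchMMX, TDStretch3DNow, and TDStretchSSE. They work mainly by overriding the TDStretch member function calcCrossCorrStereo; if none of the instruction sets is supported, TDStretch's own calcCrossCorrStereo is used. The floating-point version is double TDStretch::calcCrossCorrStereo(const float *mixingPos, const float *compare) const, and the fixed-point version is long TDStretch::calcCrossCorrStereo(const short *mixingPos, const short *compare) const; the choice is made at compile time through #define INTEGER_SAMPLES or #define FLOAT_SAMPLES. Because TDStretchMMX, TDStretch3DNow, and TDStretchSSE merely override calcCrossCorrStereo and initialize nothing, they have no user-written constructors and are constructed with the compiler-generated defaults. On my Celeron CPU:

    if (uExtensions & SUPPORT_MMX)
    {
        return ::new TDStretchMMX;
    }

this constructs a TDStretchMMX and returns a pointer to it. The construction actually follows this chain:

FIFOSamplePipe -> FIFOProcessor -> TDStretch -> TDStretchMMX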

From the analysis above, the algorithm we need, the one that lengthens or shortens a sound signal, centers on the TDStretch member function TDStretch::calcCrossCorrStereo, with the three CPU-specific optimized versions in the source files 3dnow_win.cpp, mmx_optimized.cpp, and sse_optimized.cpp. What the parameters of calcCrossCorrStereo(const float *mixingPos, const float *compare) actually mean is left for a later part. Back to the SoundTouch constructor SoundTouch::SoundTouch():

    SoundTouch::SoundTouch()
    {
        // Initialize rate transposer and tempo changer instances
        pRateTransposer = RateTransposer::newInstance();
        pTDStretch = TDStretch::newInstance();

        setOutPipe(pTDStretch);

        rate = tempo = 0;

        virtualPitch =
        virtualRate =
        virtualTempo = 1.0;

        calcEffectiveRateAndTempo();

        channels = 0;
        bSrateSet = FALSE;
    }

We now have pRateTransposer, an instance that handles the rate, and pTDStretch, an instance that lengthens or shortens the audio. All that remains is to initialize a few variables. With that, the variable SoundTouch m_SoundTouch; is fully constructed.

SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (4)

A supplement on the SoundTouch construction and initialization flow.

In the SoundTouch constructor we noticed a call to calcEffectiveRateAndTempo():

    SoundTouch::SoundTouch()
    {
        ...
        calcEffectiveRateAndTempo();

        channels = 0;
        bSrateSet = FALSE;
    }

It is also called by six member functions of the SoundTouch class: void setRate(float newRate), void setRateChange(float newRate), void setTempo(float newTempo), void setTempoChange(float newTempo), void setPitch(float newPitch), and void setPitchOctaves(float newPitch). As one would guess, it processes the audio parameters. Looking at calcEffectiveRateAndTempo more closely, its implementation is as follows.

    // Calculates 'effective' rate and tempo values from the
    // nominal control values.

    void SoundTouch::calcEffectiveRateAndTempo()
    {
        float oldTempo = tempo;
        float oldRate = rate;

        tempo = virtualTempo / virtualPitch;
        rate = virtualPitch * virtualRate;

        if (!TEST_FLOAT_EQUAL(rate,oldRate)) pRateTransposer->setRate(rate);
        if (!TEST_FLOAT_EQUAL(tempo, oldTempo)) pTDStretch->setTempo(tempo);

    #ifndef PREVENT_CLICK_AT_RATE_CROSSOVER
        if (rate <= 1.0f)
        {
            if (output != pTDStretch)
            {
                FIFOSamplePipe *tempoOut;

                assert(output == pRateTransposer);
                // move samples in the current output buffer to the output of pTDStretch
                tempoOut = pTDStretch->getOutput();
                tempoOut->moveSamples(*output);
                // move samples in pitch transposer's store buffer to tempo changer's input
                pTDStretch->moveSamples(*pRateTransposer->getStore());

                output = pTDStretch;
            }
        }
        else
    #endif
        {
            if (output != pRateTransposer)
            {
                FIFOSamplePipe *transOut;

                assert(output == pTDStretch);
                // move samples in the current output buffer to the output of pRateTransposer
                transOut = pRateTransposer->getOutput();
                transOut->moveSamples(*output);
                // move samples in tempo changer's input to pitch transposer's input
                pRateTransposer->moveSamples(*pTDStretch->getInput());

                output = pRateTransposer;
            }
        }
    }

In essence it configures the parameters of the two objects pRateTransposer and pTDStretch. With that we have a first overall picture of the sound processing flow.
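The two assignments at the top of calcEffectiveRateAndTempo() are easy to try out in isolation. A small sketch that mirrors only those two formulas (the struct and function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

struct Effective {
    double rate;    // factor handed to the rate transposer
    double tempo;   // factor handed to the time stretcher
};

// Mirrors: tempo = virtualTempo / virtualPitch; rate = virtualPitch * virtualRate;
Effective effectiveRateAndTempo(double virtualRate, double virtualTempo, double virtualPitch)
{
    Effective e;
    e.tempo = virtualTempo / virtualPitch;
    e.rate  = virtualPitch * virtualRate;
    return e;
}
```

For a pure pitch shift of one octave up (virtualPitch = 2, the other two at 1) this gives tempo = 0.5 and rate = 2: the signal is first stretched to twice its length and then resampled at twice the speed, so the duration is unchanged (0.5 * 2 = 1) while the pitch doubles.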

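1. Create a digital low-pass filter, AAFilter, which applies a Hamming window to truncate the filter response.

Let us see how this digital low-pass filter is created. It happens mainly in the constructor of the RateTransposer class, which constructs an AAFilter object: pAAFilter = new AAFilter(32);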

    RateTransposer::RateTransposer() : FIFOProcessor(&outputBuffer)
    {
        numChannels = 2;
        bUseAAFilter = TRUE;
        fRate = 0;

        // Instantiates the anti-alias filter with default tap length
        // of 32
        pAAFilter = new AAFilter(32);
    }

The AAFilter class definition is fairly simple and easy to follow. As analyzed earlier, class FIRFilter *pFIR points to a derived class optimized for whichever extended instruction set the CPU supports; these subclasses merely override the data processing functions. double cutoffFreq; is the low-pass cutoff frequency. calculateCoeffs() is the member function to focus on, since it produces the main parameters of the digital filter.

    class AAFilter
    {
    protected:
        class FIRFilter *pFIR;

        /// Low-pass filter cut-off frequency, negative = invalid
        double cutoffFreq;

        /// num of filter taps
        uint length;

        /// Calculate the FIR coefficients realizing the given cutoff-frequency
        void calculateCoeffs();

    public:
        AAFilter(uint length);
        ~AAFilter();

        /// Sets new anti-alias filter cut-off edge frequency, scaled to sampling
        /// frequency (nyquist frequency = 0.5). The filter will cut off the
        /// frequencies higher than that.
        void setCutoffFreq(double newCutoffFreq);

        /// Sets number of FIR filter taps, i.e. ~filter complexity
        void setLength(uint newLength);

        uint getLength() const;

        /// Applies the filter to the given sequence of samples.
        /// Note : The amount of outputted samples is by value of 'filter length'
        /// smaller than the amount of input samples.
        uint evaluate(SAMPLETYPE *dest,
                      const SAMPLETYPE *src,
                      uint numSamples,
                      uint numChannels) const;
    };

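First, the AAFilter constructor. It creates an instance of the FIR filter and then sets the cutoff frequency to 0.5. Note that this value is normalized to the sampling frequency (Nyquist frequency = 0.5); it is converted to an angular frequency inside calculateCoeffs() through wc = PI * fc2. The constructor then sets the filter length.

    AAFilter::AAFilter(uint len)
    {
        pFIR = FIRFilter::newInstance();
        cutoffFreq = 0.5;
        setLength(len);
    }

The member function setLength, which sets the length, calls the member function calculateCoeffs():

    // Sets number of FIR filter taps
    void AAFilter::setLength(uint newLength)
    {
        length = newLength;
        calculateCoeffs();
    }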

Now for the member function calculateCoeffs(), the core of the digital filter design. The source code is as follows:

    // Calculates coefficients for a low-pass FIR filter using Hamming window
    void AAFilter::calculateCoeffs()
    {
        uint i;
        double cntTemp, temp, tempCoeff, h, w;
        double fc2, wc;
        double scaleCoeff, sum;
        double *work;
        SAMPLETYPE *coeffs;

        assert(length >= 2);
        assert(length % 4 == 0);
        assert(cutoffFreq >= 0);
        assert(cutoffFreq <= 0.5);

        work = new double[length];
        coeffs = new SAMPLETYPE[length];

        fc2 = 2.0 * cutoffFreq;
        wc = PI * fc2;
        tempCoeff = TWOPI / (double)length;

        sum = 0;
        for (i = 0; i < length; i ++)
        {
            cntTemp = (double)i - (double)(length / 2);

            temp = cntTemp * wc;
            if (temp != 0)
            {
                h = fc2 * sin(temp) / temp;                     // sinc function
            }
            else
            {
                h = 1.0;
            }
            w = 0.54 + 0.46 * cos(tempCoeff * cntTemp);       // hamming window

            temp = w * h;
            work[i] = temp;

            // calc net sum of coefficients
            sum += temp;
        }
        ...

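The first half of the function uses assert for some necessary checks: the length must be at least 2 and a multiple of 4, which guarantees that length/2 is an integer, and the cutoff frequency must lie between 0 and 0.5. A Hamming window is then applied. Note that 0.54 + 0.46 * cos(2 * pi * cntTemp / N) looks different from the usual Hamming window 0.54 - 0.46 * cos(2 * pi * n / (N - 1)), but the difference is easy to explain.

With i = 0 .. length-1 and cntTemp = i - (length / 2):

    0.54 + 0.46 * cos(2 * pi * cntTemp / N)
      = 0.54 - 0.46 * cos(2 * pi * cntTemp / N + pi)
      = 0.54 - 0.46 * cos(2 * pi * cntTemp / N + pi * N / N)
      = 0.54 - 0.46 * cos(2 * pi * (cntTemp + N / 2) / N)
      = 0.54 - 0.46 * cos(2 * pi * n / N),    where n = 0 .. N-1

This is nothing more than the identity cos(x) = -cos(x + pi): very simple, yet easy to miss out of habit. As for why N is used rather than N-1, I think the following remark explains it clearly, and I thank a teacher at Harbin University of Science and Technology for it: "If N is used in place of N-1, the center of symmetry N/2 is not an integer and so is not a sample point (because the filter is even-symmetric, N should be odd; in theory this is the only way to choose the parameters of a low-pass filter)." Note that i starts at 0 and stops at length, and that the earlier assert requires length to be at least 2 and a multiple of 4. It is therefore not odd, so (length-1)/2 is never an integer. The filter here can thus be understood as having a length of length+1.

    ......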

        // ensure the sum of coefficients is larger than zero
        assert(sum > 0);

        // ensure we've really designed a lowpass filter...
        assert(work[length/2] > 0);
        assert(work[length/2 + 1] > -1e-6);
        assert(work[length/2 - 1] > -1e-6);

        // Calculate a scaling coefficient in such a way that the result can be
        // divided by 16384
        scaleCoeff = 16384.0f / sum;

        for (i = 0; i < length; i ++)
        {
            // scale & round to nearest integer
            temp = work[i] * scaleCoeff;
            temp += (temp >= 0) ? 0.5 : -0.5;
            // ensure no overfloods
            assert(temp >= -32768 && temp <= 32767);
            coeffs[i] = (SAMPLETYPE)temp;
        }

        // Set coefficients. Use divide factor 14 => divide result by 2^14 = 16384
        pFIR->setCoefficients(coeffs, length, 14);

        delete[] work;
        delete[] coeffs;
    }

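In the second half of the function, the asserts verify that the result really is a valid low-pass filter. The rest is a fixed-point conversion with 2^14 = 16384: the coefficients are scaled up by 16384 (equivalent to a left shift by 14 bits), and the filter result is later shifted right by 14 bits to scale it back, which preserves precision. assert(temp >= -32768 && temp <= 32767); checks that temp fits in a 16-bit integer without overflow. Finally the low-pass coefficients are passed to the FIRxxx class, which from then on can be regarded as an abstract digital low-pass filter. This completes all the initialization, and we can move on to the actual data processing flow.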
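The two halves of the coefficient calculation can be put together in a compact, self-contained sketch. It follows the steps quoted above (windowed sinc, then scaling so that the integer coefficients sum to about 2^14), with one deliberate difference noted in the comments. The function name designLowpassQ14 is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Design a Hamming-windowed sinc low-pass filter and quantise it so that the
// integer coefficients sum to roughly 2^14 = 16384.
// 'cutoff' is normalised to the sampling frequency (Nyquist = 0.5).
std::vector<short> designLowpassQ14(std::size_t length, double cutoff)
{
    assert(length >= 2 && length % 4 == 0);
    assert(cutoff > 0.0 && cutoff <= 0.5);

    const double pi = 3.14159265358979323846;
    const double fc2 = 2.0 * cutoff;
    const double wc = pi * fc2;                      // angular cut-off frequency
    std::vector<double> work(length);
    double sum = 0.0;

    for (std::size_t i = 0; i < length; ++i) {
        const double n = static_cast<double>(i) - static_cast<double>(length / 2);
        const double x = n * wc;
        // sinc term; at x == 0 this sketch uses the limit value fc2
        // (the quoted code uses 1.0, which is the same when cutoff = 0.5)
        const double h = (x != 0.0) ? fc2 * std::sin(x) / x : fc2;
        // Hamming window centred on length / 2
        const double w = 0.54 + 0.46 * std::cos(2.0 * pi * n / static_cast<double>(length));
        work[i] = w * h;
        sum += work[i];
    }
    assert(sum > 0.0);

    std::vector<short> coeffs(length);
    const double scale = 16384.0 / sum;              // unity gain maps to 2^14
    for (std::size_t i = 0; i < length; ++i) {
        double t = work[i] * scale;
        t += (t >= 0.0) ? 0.5 : -0.5;                // round to nearest integer
        assert(t >= -32768.0 && t <= 32767.0);
        coeffs[i] = static_cast<short>(t);
    }
    return coeffs;
}
```

Because the coefficients sum to about 16384, a constant input passes through with unity gain once the filter output is shifted right by 14 bits.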

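SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (5)

Implementation of the rate-change class RateTransposer

Back to the SoundTouch member function void SoundTouch::putSamples(const SAMPLETYPE *samples, uint nSamples). Once a SoundTouch object has been defined, calling this function is all it takes to process audio. Its signature is simple: the first parameter, SAMPLETYPE *samples, points to a buffer of PCM-encoded wave data, and the second, uint nSamples, is the number of samples in that buffer. How to count samples was discussed earlier and is not repeated here. Its implementation: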

    // Adds 'numSamples' pcs of samples from the 'samples' memory position into
    // the input of the object.
    void SoundTouch::putSamples(const SAMPLETYPE *samples, uint nSamples)
    {
        if (bSrateSet == FALSE)
        {
            throw std::runtime_error(...);
        }
        else if (channels == 0)
        {
            throw std::runtime_error(...);
        }
    #ifndef PREVENT_CLICK_AT_RATE_CROSSOVER
        else if (rate <= 1.0f)
        {
            // transpose the rate down, output the transposed sound to tempo changer buffer
            assert(output == pTDStretch);
            pRateTransposer->putSamples(samples, nSamples);
            pTDStretch->moveSamples(*pRateTransposer);
        }
        else
    #endif
        {
            // evaluate the tempo changer, then transpose the rate up,
            assert(output == pRateTransposer);
            pTDStretch->putSamples(samples, nSamples);
            pRateTransposer->moveSamples(*pTDStretch);
        }
    }

The first part can be seen as a check that the SoundTouch object was initialized properly. The part to focus on is:

    #ifndef PREVENT_CLICK_AT_RATE_CROSSOVER
        else if (rate <= 1.0f)
        {
            // transpose the rate down, output the transposed sound to tempo changer buffer
            assert(output == pTDStretch);
            pRateTransposer->putSamples(samples, nSamples);
            pTDStretch->moveSamples(*pRateTransposer);
        }
        else
    #endif
        {
            // evaluate the tempo changer, then transpose the rate up,
            assert(output == pRateTransposer);
            pTDStretch->putSamples(samples, nSamples);
            pRateTransposer->moveSamples(*pTDStretch);
        }

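There is a macro test here, #ifndef PREVENT_CLICK_AT_RATE_CROSSOVER. I am not sure what it is for at the moment, but since nothing in the library defines this macro, it seems the author intended to use it and has not finished the code, so it is unused. rate is the ratio computed by the SoundTouch member function calcEffectiveRateAndTempo described earlier: a value less than or equal to 1 means slower playback, and a value greater than 1 means faster playback, as the comments also suggest. In the rate <= 1.0f case, pRateTransposer's own member function putSamples is called first. Here is its implementation.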

    // Adds 'nSamples' pcs of samples from the 'samples' memory position into
    // the input of the object.
    void RateTransposer::putSamples(const SAMPLETYPE *samples, uint nSamples)
    {
        processSamples(samples, nSamples);
    }

It simply calls the member function processSamples to do the work. Next, the implementation of processSamples:

    // Transposes sample rate by applying anti-alias filter to prevent folding.
    // Returns amount of samples returned in the ...
    // The maximum amount of samples that can be returned at a time is set by
    // the 'set_returnBuffer_size' function.
    void RateTransposer::processSamples(const SAMPLETYPE *src, uint nSamples)
    {
        uint count;
        uint sizeReq;

        if (nSamples == 0) return;
        assert(pAAFilter);

        // If anti-alias filter is turned off, simply transpose without applying
        // the filter
        if (bUseAAFilter == FALSE)
        {
            sizeReq = (uint)((float)nSamples / fRate + 1.0f);
            count = transpose(outputBuffer.ptrEnd(sizeReq), src, nSamples);
            outputBuffer.putSamples(count);
            return;
        }

        // Transpose with anti-alias filter
        if (fRate < 1.0f)
        {
            upsample(src, nSamples);
        }

Source: https://www.bwwdw.com/article/chit.html
