SoundTouch Project Summary

Updated: 2023-11-29 21:12:02


SoundTouch: pitch, rate, and tempo

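Some time ago I was working on an algorithm that changes pitch without changing speed, implemented in the frequency domain. I could never get the phase synchronization right, and the sound quality was poor. Then I came across SoundTouch by chance. This powerful library was a real boost: the work is now finished, including a real-time playback demo, and I am sharing some usage notes here.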

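SoundTouch is an open-source audio processing library. It provides three main functions: changing tempo, changing pitch, and changing both at once (rate). It can process media streams in real time as well as audio files. Samples are either 32-bit floating point or 16-bit fixed point (integer), mono or stereo, at sample rates from 8 kHz to 48 kHz. Speed is controlled through the tempo parameter, and the library can also extract the beat (BPM) of music. Its algorithms are used by many well-known programs, such as Audacity and Winamp.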

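The algorithms behind the three functions:

Tempo (change speed, keep pitch): implemented with a WSOLA-type algorithm;

Rate: implemented by interpolation and decimation (I have already written a resampling function myself, and plenty of existing ones are available);

Pitch (change pitch, keep speed): a combination of the two above: time-stretch first, then resample.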
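As a rough illustration of the "rate" item, resampling by linear interpolation could be sketched as follows. This is not SoundTouch's RateTransposer code: the function name resampleLinear is hypothetical, the signal is mono, and no anti-alias filter is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SoundTouch): resamples a mono signal by
// 'rate' using linear interpolation. rate > 1 yields fewer output samples
// (faster and higher when played back), rate < 1 yields more.
std::vector<float> resampleLinear(const std::vector<float>& in, double rate)
{
    assert(rate > 0.0);
    std::vector<float> out;
    if (in.size() < 2) return out;

    // 'pos' is the fractional read position in the input signal.
    const double last = static_cast<double>(in.size() - 1);
    for (double pos = 0.0; pos <= last; pos += rate) {
        std::size_t i = static_cast<std::size_t>(pos);
        double frac = pos - static_cast<double>(i);
        float next = (i + 1 < in.size()) ? in[i + 1] : in[i];
        // Weighted mix of the two neighbouring input samples.
        out.push_back(static_cast<float>((1.0 - frac) * in[i] + frac * next));
    }
    return out;
}
```

Played back at the original sample rate, the shorter or longer result changes speed and pitch together, which is why pitch shifting pairs this step with a time-stretch.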

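The main functions are all declared in SoundTouch.h. The library is also available as a DLL, and a usage example is provided. The example shows how the library is used, and the README file contains a detailed introduction.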

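My own summary of the main points to note:

1. Initialize the SoundTouch object before use: call setSampleRate and setChannels to set the parameters of the audio stream. Use 1 for mono and 2 for stereo; the sample rate is 8000 to 48000 Hz. After that, set the new pitch, tempo, rate, and so on with the corresponding functions.

2. SoundTouch works like a FIFO pipe: first in, first out. Feed samples in with putSamples and fetch the processed samples with receiveSamples. Note that new pitch, tempo, and rate values must be set before calling putSamples, not after the data has been fed in.

3. SoundTouch processes data in batches, so processing only happens once enough samples have arrived. This introduces some latency: input is not necessarily processed immediately, and the output does not necessarily correspond to the input just supplied.

4. The control parameters can be changed during processing, but the library provides no synchronization for this, so multi-threaded use needs additional synchronization of its own.

5. The SoundTouch class uses the TDStretch class to change tempo and the RateTransposer class to change the sample rate. Pitch shifting first uses TDStretch to stretch the signal, which changes speed but not pitch, and then resamples it, which yields a pitch change at the original speed.

6. SoundTouch has fixed-point and floating-point variants. When compiling on Windows, select one in the STTypes.h header: "#define INTEGER_SAMPLES 1" uses 16-bit integer samples, while "#define FLOAT_SAMPLES 1" uses float samples.

7. The time-stretch function may introduce fairly long latency. The documentation mentions a processing delay of about 100 ms on low-end machines. Estimated on my own machine (an ordinary one at the time), the delay is far smaller, and results are essentially available right after the put call. Pitch shifting combines time-stretch and resampling, so this latency applies to it as well.

8. Several settings can be adjusted: SETTING_SEQUENCE_MS, the sequence (frame) length in ms, 40 in the example; SETTING_SEEKWINDOW_MS, the length in ms of the window searched for the best overlap position, 15 in the example; SETTING_OVERLAP_MS, the overlap length in ms, 8 in the example; SETTING_USE_QUICKSEEK, whether to use the quick seek method; SETTING_USE_AA_FILTER, whether to use the anti-alias (AA) filter; SETTING_AA_FILTER_LENGTH, the number of filter taps, 32 by default. All of these have default values and can be left alone, but they can be tuned for real-time or sound-quality requirements; see the README shipped with the library for details.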

SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (1)

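Using the SoundTouch audio processing library is remarkably simple. After building the library, set up the build environment. Taking VC as an example: add the include directory under the SoundTouch directory to the include paths, add the lib directory under the SoundTouch directory to the library paths, and then add the header and the library reference in your own header file, as shown below. The _DEBUG macro can drive conditional compilation: a DEBUG build links the debug library, anything else links the release library. The only difference between the two file names is a trailing "D".

#include
#ifdef _DEBUG
#pragma comment(lib, \
#else
#pragma comment(lib, \
#endif

You can also add the library directly to the VC project, which some people prefer.

Most importantly, a using directive for the namespace is needed. The reason lies in how the library is declared, as the analysis below explains.

using namespace soundtouch;

After that, a class variable can be defined directly in your own code:

SoundTouch m_SoundTouch;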

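The SoundTouch class is declared in SoundTouch.h and implemented in SoundTouch.cpp. It derives directly from FIFOProcessor, which in turn derives directly from the base class FIFOSamplePipe. The class is declared inside the namespace soundtouch, which is the main reason a using directive is needed when using the library. This feels a bit redundant to me. Apart from the class, the header only defines a few constants, such as the version string and the version ID. Both parent classes are defined in FIFOSamplePipe.h.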

With any library, the usual flow is to define an object and then perform the necessary initialization, and SoundTouch (ST below) is no exception. Initializing ST is as simple as building it. See the bundled SoundStretch example, or the declaration of the SoundTouch class in the source code. Looking only at the parts we will use, the private section declares two class pointers, RateTransposer* and TDStretch*.

RateTransposer derives from FIFOProcessor, which in turn derives directly from the base class FIFOSamplePipe; TDStretch is similar. Judging by the two class names alone (stretching? rate transposing?), it is not hard to guess that the library processes sound by "stretching" it and then "changing its rate". Could this be the well-known speed change without pitch change? It is indeed, but that is not the topic right now.

……
private:
    /// Rate transposer class instance
    class RateTransposer *pRateTransposer;

    /// Time-stretch class instance
    class TDStretch *pTDStretch;

    /// Virtual pitch parameter. Effective rate & tempo are calculated from these parameters.
    float virtualRate;

    /// Virtual pitch parameter. Effective rate & tempo are calculated from these parameters.
    float virtualTempo;

    /// Virtual pitch parameter. Effective rate & tempo are calculated from these parameters.
    float virtualPitch;

    /// Flag: Has sample rate been set?
    BOOL bSrateSet;

    /// Calculates effective rate & tempo values from 'virtualRate', 'virtualTempo' and
    /// 'virtualPitch' parameters.
    void calcEffectiveRateAndTempo();

protected :
    /// Number of channels
    uint channels;

    /// Effective 'rate' value calculated from 'virtualRate', 'virtualTempo' and 'virtualPitch'
    float rate;

    /// Effective 'tempo' value calculated from 'virtualRate', 'virtualTempo' and 'virtualPitch'
    float tempo;

public:
    ……
    /// Sets new rate control value. Normal rate = 1.0, smaller values
    /// represent slower rate, larger faster rates.
    void setRate(float newRate);

    /// Sets new tempo control value. Normal tempo = 1.0, smaller values
    /// represent slower tempo, larger faster tempo.
    void setTempo(float newTempo);

    /// Sets new rate control value as a difference in percents compared
    /// to the original rate (-50 .. +100 %)
    void setRateChange(float newRate);

    /// Sets new tempo control value as a difference in percents compared
    /// to the original tempo (-50 .. +100 %)
    void setTempoChange(float newTempo);

    /// Sets new pitch control value. Original pitch = 1.0, smaller values
    /// represent lower pitches, larger values higher pitch.
    void setPitch(float newPitch);

    /// Sets pitch change in octaves compared to the original pitch
    /// (-1.00 .. +1.00)
    void setPitchOctaves(float newPitch);

    /// Sets pitch change in semi-tones compared to the original pitch
    /// (-12 .. +12)
    void setPitchSemiTones(int newPitch);
    void setPitchSemiTones(float newPitch);

    /// Sets the number of channels, 1 = mono, 2 = stereo
    void setChannels(uint numChannels);

    /// Sets sample rate.
    void setSampleRate(uint srate);

    /// Changes a setting controlling the processing system behaviour. See the
    /// 'SETTING_...' defines for available setting ID's.
    ///
    /// \return 'TRUE' if the setting was successfully changed
    BOOL setSetting(int settingId,   ///< Setting ID number. see SETTING_... defines.
                    int value        ///< New setting value.
                    );
……

Following the SoundStretch example shipped with ST, initialize the SoundTouch object:

m_SoundTouch.setSampleRate(sampleRate);      // set the sample rate of the audio
m_SoundTouch.setChannels(channels);          // set the number of channels
m_SoundTouch.setTempoChange(tempoDelta);     // this is the speed change without pitch change
m_SoundTouch.setPitchSemiTones(pitchDelta);  // set the pitch
m_SoundTouch.setRateChange(rateDelta);       // set the rate

// quick is a bool. SETTING_USE_QUICKSEEK toggles the quick seek method; I have not looked into the details yet.
m_SoundTouch.setSetting(SETTING_USE_QUICKSEEK, quick);

// noAntiAlias is a bool. SETTING_USE_AA_FILTER toggles the anti-alias filter; I have not looked into the details yet.
m_SoundTouch.setSetting(SETTING_USE_AA_FILTER, !(noAntiAlias));

// speech is also a bool. My first guess is that it should be set when the audio is speech only, without music.
if (speech)
{
    // use settings for speech processing
    m_SoundTouch.setSetting(SETTING_SEQUENCE_MS, 40);
    m_SoundTouch.setSetting(SETTING_SEEKWINDOW_MS, 15);
    m_SoundTouch.setSetting(SETTING_OVERLAP_MS, 8);
    fprintf(stderr, \
}

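With just these few calls we can already experience how powerful ST is. Audio is fed in through this SoundTouch member function:

putSamples(sampleBuffer, nSamples);

The first argument points to a block of PCM audio data; the second is the number of samples to process, which can also be thought of as the number of frames.

Note that audio data usually arrives as a byte stream, so the size of one sample depends on the channel count and the bit depth. Suppose sampleBuffer points to a 64-byte PCM buffer holding 16-bit, 2-channel audio. One sample then occupies (16*2)/8 = 4 bytes, so the buffer holds 64/4 = 16 samples, and nSamples is 16 rather than 64. With m_SoundTouch.putSamples(sampleBuffer, nSamples); the data goes in, but where does the processed audio come out? For that, SoundTouch provides the receiveSamples function.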
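The byte-to-sample arithmetic above can be written as a small helper. The name framesInBuffer is hypothetical and not part of SoundTouch; it only assumes uncompressed PCM with a bit depth that is a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of samples (frames) held by a PCM byte buffer.
// One frame contains one value per channel.
std::size_t framesInBuffer(std::size_t bufferBytes, unsigned bitsPerSample, unsigned channels)
{
    assert(bitsPerSample > 0 && bitsPerSample % 8 == 0 && channels > 0);
    std::size_t bytesPerFrame = (bitsPerSample / 8) * channels;
    return bufferBytes / bytesPerFrame;
}
```

For the 64-byte, 16-bit stereo buffer in the text, framesInBuffer(64, 16, 2) gives the 16 that should be passed as nSamples.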

uint receiveSamples(SAMPLETYPE *outBuffer,  ///< Buffer where to copy output samples.
                    uint maxSamples         ///< How many samples to receive at max.
                    );

It also takes two arguments: the first is the buffer that receives the data, the second is the maximum number of samples to receive.

According to the comment quoted next, receiveSamples does not necessarily return data right after putSamples. On the other hand, more samples may be ready than fit into the buffer in a single call. The call therefore belongs in a do…while(…) loop that keeps reading until nothing is left.

// Read ready samples from SoundTouch processor & write them output file.
// NOTES:
// - 'receiveSamples' doesn't necessarily return any samples at all
//   during some rounds!
// - On the other hand, during some round 'receiveSamples' may have more
//   ready samples than would fit into 'sampleBuffer', and for this reason
//   the 'receiveSamples' call is iterated for as many times as it
//   outputs samples.
do
{
    nSamples = m_SoundTouch.receiveSamples(sampleBuffer, buffSizeSamples);
    // write sampleBuffer to a file, or feed it to the sound card buffer for playback
} while (nSamples != 0);
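To see why the loop is needed, here is a toy model of a batch-oriented FIFO. ToyBatchPipe and drain are hypothetical names and this is not how SoundTouch is implemented; the sketch only imitates the two behaviours named in the comment: output appears in batches, and one call may not collect everything that is ready.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a batch-oriented FIFO processor (hypothetical, NOT the
// SoundTouch implementation): input becomes available as output only in
// whole batches of 'batchSize' samples, which imitates processing latency.
class ToyBatchPipe {
public:
    explicit ToyBatchPipe(std::size_t batchSize) : batch_(batchSize) { assert(batchSize > 0); }

    void putSamples(const short* samples, std::size_t n) {
        pending_.insert(pending_.end(), samples, samples + n);
        // Move every complete batch from 'pending_' to 'ready_'.
        std::size_t whole = (pending_.size() / batch_) * batch_;
        ready_.insert(ready_.end(), pending_.begin(), pending_.begin() + whole);
        pending_.erase(pending_.begin(), pending_.begin() + whole);
    }

    // Copies at most 'maxSamples' ready samples; returns how many were copied.
    std::size_t receiveSamples(short* out, std::size_t maxSamples) {
        std::size_t n = std::min(maxSamples, ready_.size());
        std::copy(ready_.begin(), ready_.begin() + n, out);
        ready_.erase(ready_.begin(), ready_.begin() + n);
        return n;
    }

private:
    std::size_t batch_;
    std::vector<short> pending_;  // received but not yet "processed"
    std::vector<short> ready_;    // processed, waiting to be collected
};

// Drains the pipe through a small buffer with the same do/while pattern.
std::vector<short> drain(ToyBatchPipe& pipe, std::size_t bufSize) {
    std::vector<short> all, buf(bufSize);
    std::size_t n;
    do {
        n = pipe.receiveSamples(buf.data(), buf.size());
        all.insert(all.end(), buf.begin(), buf.begin() + n);
    } while (n != 0);
    return all;
}
```

With a batch size of 4, putting 10 samples leaves 2 of them waiting inside the pipe, and a 3-sample buffer needs several receiveSamples calls to collect the 8 that are ready.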

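SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (2)

Walkthrough of the SoundTouch initialization flow

Define a variable: SoundTouch m_SoundTouch;

Inheritance chain of SoundTouch:

FIFOSamplePipe->FIFOProcessor->SoundTouch (flow [1])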

So the base class FIFOSamplePipe is constructed first, then FIFOProcessor, and only then SoundTouch, which derives from FIFOProcessor. I have to say the author's C++ is excellent: class inheritance is used to the fullest here, and analyzing code like this is a pleasure. First, the base class FIFOSamplePipe, defined as follows:

class FIFOSamplePipe
{
public:
    // virtual default destructor
    virtual ~FIFOSamplePipe() {}

    /// Returns a pointer to the beginning of the output samples.
    /// This function is provided for accessing the output samples directly.
    /// Please be careful for not to corrupt the book-keeping!
    ///
    /// When using this function to output samples, also remember to 'remove' the
    /// output samples from the buffer by calling the
    /// 'receiveSamples(numSamples)' function
    virtual SAMPLETYPE *ptrBegin() = 0;

    /// Adds 'numSamples' pcs of samples from the 'samples' memory position to
    /// the sample buffer.
    virtual void putSamples(const SAMPLETYPE *samples,  ///< Pointer to samples.
                            uint numSamples             ///< Number of samples to insert.
                            ) = 0;

    // Moves samples from the 'other' pipe instance to this instance.
    void moveSamples(FIFOSamplePipe &other  ///< Other pipe instance where from the receive the data.
                     )
    {
        int oNumSamples = other.numSamples();

        putSamples(other.ptrBegin(), oNumSamples);
        other.receiveSamples(oNumSamples);
    };

    /// Output samples from beginning of the sample buffer. Copies requested samples to
    /// output buffer and removes them from the sample buffer. If there are less than
    /// 'numsample' samples in the buffer, returns all that available.
    ///
    /// \return Number of samples returned.
    virtual uint receiveSamples(SAMPLETYPE *output,  ///< Buffer where to copy output samples.
                                uint maxSamples      ///< How many samples to receive at max.
                                ) = 0;

    /// Adjusts book-keeping so that given number of samples are removed from beginning of the
    /// sample buffer without copying them anywhere.
    ///
    /// Used to reduce the number of samples in the buffer when accessing the sample buffer directly
    /// with 'ptrBegin' function.
    virtual uint receiveSamples(uint maxSamples  ///< Remove this many samples from the beginning of pipe.
                                ) = 0;

    /// Returns number of samples currently available.
    virtual uint numSamples() const = 0;

    // Returns nonzero if there aren't any samples available for outputting.
    virtual int isEmpty() const = 0;

    /// Clears all the samples.
    virtual void clear() = 0;
};

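No constructor is implemented for FIFOSamplePipe, so the compiler-generated default constructor FIFOSamplePipe() is called implicitly. It presumably performs no initialization, and none is needed. Because the destructor is declared virtual (virtual ~FIFOSamplePipe() {}), an object of a derived class can be created with new and held through a base pointer, for example FIFOSamplePipe* a = new SoundTouch, and deleting a still runs the destructors of the derived classes. However many levels of inheritance there are, everything is destroyed in one go, which is what a base class should guarantee. Inheritance and polymorphism are among the most powerful parts of C++ and help when writing families of closely related classes. Looking at the declaration, almost every member is a pure virtual function; the only one implemented is moveSamples, which every derived class uses unless it overrides it. What it does is simple. From the comment it is easy to see that this function is the interface through which samples are passed from one derived-class instance to another.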
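The effect of the virtual destructor can be checked with a small stand-alone sketch. Pipe and Processor are hypothetical stand-ins for FIFOSamplePipe and a derived class; they are not the library's classes.

```cpp
#include <cassert>
#include <string>

// Records destructor calls so that their order can be inspected.
static std::string g_log;

struct Pipe {                       // plays the role of FIFOSamplePipe
    virtual ~Pipe() { g_log += "~Pipe;"; }
};

struct Processor : Pipe {           // plays the role of a derived class
    ~Processor() override { g_log += "~Processor;"; }
};

// Deletes a derived object through a base-class pointer and reports
// which destructors ran, in order.
std::string destroyThroughBase()
{
    g_log.clear();
    Pipe* p = new Processor;
    delete p;                       // virtual destructor: the derived part goes first
    return g_log;
}
```

Without the virtual keyword on ~Pipe(), deleting through the base pointer would be undefined behaviour, and in practice the derived destructor would typically be skipped.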

// Moves samples from the 'other' pipe instance to this instance.
void moveSamples(FIFOSamplePipe &other  ///< Other pipe instance where from the receive the data.
                 )
{
    int oNumSamples = other.numSamples();

    putSamples(other.ptrBegin(), oNumSamples);
    other.receiveSamples(oNumSamples);
};

    bufferUnaligned = NULL;
    samplesInBuffer = 0;
    bufferPos = 0;
    channels = (uint)numChannels;
    ensureCapacity(32);     // allocate initial capacity
}

The FIFOSampleBuffer constructor is called three times. Now the RateTransposer constructor can finally run:

// Constructor
RateTransposer::RateTransposer() : FIFOProcessor(&outputBuffer)
{
    numChannels = 2;
    bUseAAFilter = TRUE;
    fRate = 0;

    // Instantiates the anti-alias filter with default tap length
    // of 32
    pAAFilter = new AAFilter(32);
}

First, the relevant definition of AAFilter:

class AAFilter
{
protected:
    class FIRFilter *pFIR;

    /// Low-pass filter cut-off frequency, negative = invalid
    double cutoffFreq;

    /// num of filter taps
    uint length;

    /// Calculate the FIR coefficients realizing the given cutoff-frequency
    void calculateCoeffs();

public:
    AAFilter(uint length);
    ~AAFilter();

    /// Sets new anti-alias filter cut-off edge frequency, scaled to sampling
    /// frequency (nyquist frequency = 0.5). The filter will cut off the
    /// frequencies above that.
    void setCutoffFreq(double newCutoffFreq);

    /// Sets number of FIR filter taps, i.e. ~filter complexity
    void setLength(uint newLength);
    uint getLength() const;

    /// Applies the filter to the given sequence of samples.
    /// Note : The amount of outputted samples is by value of 'filter length'
    /// smaller than the amount of input samples.
    uint evaluate(SAMPLETYPE *dest,
                  const SAMPLETYPE *src,
                  uint numSamples,
                  uint numChannels) const;
};

Its constructor initializes a pointer to class FIRFilter:

AAFilter::AAFilter(uint len)
{
    pFIR = FIRFilter::newInstance();
    cutoffFreq = 0.5;
    setLength(len);
}

First, look at the FIRFilter member function newInstance(). Here we find a very useful function, detectCPUextensions(), which tells us which multimedia instruction sets the CPU supports; the comments make it quick to understand. It is worth keeping for later use. It is implemented in Cpu_detect_x86_win.cpp. Its one shortcoming is that it only detects x86 CPUs, though maybe I am overthinking it. My own machine has a Celeron CPU, so only MMX is supported there.

FIRFilter * FIRFilter::newInstance()
{
    uint uExtensions;

    uExtensions = detectCPUextensions();

    // Check if MMX/SSE/3DNow! instruction set extensions supported by CPU

#ifdef ALLOW_MMX
    // MMX routines available only with integer sample types
    if (uExtensions & SUPPORT_MMX)
    {
        return ::new FIRFilterMMX;
    }
    else
#endif // ALLOW_MMX

#ifdef ALLOW_SSE
    if (uExtensions & SUPPORT_SSE)
    {
        // SSE support
        return ::new FIRFilterSSE;
    }
    else
#endif // ALLOW_SSE

#ifdef ALLOW_3DNOW
    if (uExtensions & SUPPORT_3DNOW)
    {
        // 3DNow! support
        return ::new FIRFilter3DNow;
    }
    else
#endif // ALLOW_3DNOW

    {
        // ISA optimizations not supported, use plain C version
        return ::new FIRFilter;
    }
}

On my machine, this check therefore constructs and returns a FIRFilterMMX object:

if (uExtensions & SUPPORT_MMX)
{
    return ::new FIRFilterMMX;
}
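The pattern used by newInstance() is a factory that selects a subclass from runtime feature bits. A minimal sketch of the same idea follows; the flag values and the class names Filter, FilterMmx, and FilterSse are hypothetical and unrelated to the library's SUPPORT_* constants, and smart pointers replace the raw new of the original.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical feature bits; the real library defines its own SUPPORT_* flags.
constexpr unsigned kSupportMmx = 0x1;
constexpr unsigned kSupportSse = 0x2;

struct Filter {
    virtual ~Filter() = default;
    virtual std::string name() const { return "plain"; }
};
struct FilterMmx : Filter { std::string name() const override { return "mmx"; } };
struct FilterSse : Filter { std::string name() const override { return "sse"; } };

// Returns the first implementation the (pretend) CPU supports, checking
// MMX before SSE in the same order as the excerpt, and falls back to the
// plain version otherwise.
std::unique_ptr<Filter> newFilter(unsigned cpuExtensions)
{
    if (cpuExtensions & kSupportMmx) return std::unique_ptr<Filter>(new FilterMmx);
    if (cpuExtensions & kSupportSse) return std::unique_ptr<Filter>(new FilterSse);
    return std::unique_ptr<Filter>(new Filter);
}
```

Callers only ever see the base-class interface, so the optimized subclasses can override a single processing function and nothing else.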

FIRFilterMMX is declared as class FIRFilterMMX : public FIRFilter, so it derives from FIRFilter. The member function uint FIRFilterMMX::evaluateFilterStereo caught my attention: it performs the main filtering computation with MMX instructions. This is the core of the rate algorithm that we are after. For the other instruction sets, see FIRFilter3DNow and FIRFilterSSE; the default is FIRFilter's own evaluateFilterStereo.

// mmx-optimized version of the filter routine for stereo sound
uint FIRFilterMMX::evaluateFilterStereo(short *dest, const short *src, uint numSamples) const
{
    // Create stack copies of the needed member variables for asm routines :
    uint i, j;
    __m64 *pVdest = (__m64*)dest;

    if (length < 2) return 0;

    for (i = 0; i < (numSamples - length) / 2; i ++)
    {
        __m64 accu1;
        __m64 accu2;
        const __m64 *pVsrc = (const __m64*)src;
        const __m64 *pVfilter = (const __m64*)filterCoeffsAlign;

        accu1 = accu2 = _mm_setzero_si64();
        for (j = 0; j < lengthDiv8 * 2; j ++)
        {
            __m64 temp1, temp2;

            temp1 = _mm_unpacklo_pi16(pVsrc[0], pVsrc[1]);  // = l2 l0 r2 r0
            temp2 = _mm_unpackhi_pi16(pVsrc[0], pVsrc[1]);  // = l3 l1 r3 r1
            accu1 = _mm_add_pi32(accu1, _mm_madd_pi16(temp1, pVfilter[0]));  // += l2*f2+l0*f0 r2*f2+r0*f0
            accu1 = _mm_add_pi32(accu1, _mm_madd_pi16(temp2, pVfilter[1]));  // += l3*f3+l1*f1 r3*f3+r1*f1

            temp1 = _mm_unpacklo_pi16(pVsrc[1], pVsrc[2]);  // = l4 l2 r4 r2
            accu2 = _mm_add_pi32(accu2, _mm_madd_pi16(temp2, pVfilter[0]));  // += l3*f2+l1*f0 r3*f2+r1*f0
            accu2 = _mm_add_pi32(accu2, _mm_madd_pi16(temp1, pVfilter[1]));  // += l4*f3+l2*f1 r4*f3+r2*f1

            // accu1 += l2*f2+l0*f0 r2*f2+r0*f0
            //       += l3*f3+l1*f1 r3*f3+r1*f1

            // accu2 += l3*f2+l1*f0 r3*f2+r1*f0
            //          l4*f3+l2*f1 r4*f3+r2*f1

            pVfilter += 2;
            pVsrc += 2;
        }
        // accu >>= resultDivFactor
        accu1 = _mm_srai_pi32(accu1, resultDivFactor);
        accu2 = _mm_srai_pi32(accu2, resultDivFactor);

        // pack 2*2*32bits => 4*16 bits
        pVdest[0] = _mm_packs_pi32(accu1, accu2);
        src += 4;
        pVdest ++;
    }

    _m_empty();  // clear emms state

    return (numSamples & 0xfffffffe) - length;
}
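For comparison, the arithmetic behind the MMX routine can be written as a plain scalar loop. This is a sketch under assumptions, not the library's FIRFilter::evaluateFilterStereo: the name firStereo and its parameter list are hypothetical, coefficients are passed in directly, and the result is truncated rather than saturated as the MMX pack instruction would do.

```cpp
#include <cassert>
#include <cstddef>

// Scalar sketch of a stereo FIR filter on interleaved 16-bit samples
// (L0 R0 L1 R1 ...). 'numFrames' counts stereo frames. Returns the number
// of output frames written, which is numFrames - length.
std::size_t firStereo(short* dest, const short* src, std::size_t numFrames,
                      const short* coeffs, std::size_t length, unsigned divFactor)
{
    if (length < 2 || numFrames < length) return 0;
    std::size_t outFrames = numFrames - length;
    for (std::size_t j = 0; j < outFrames; ++j) {
        long long sumL = 0, sumR = 0;
        for (std::size_t i = 0; i < length; ++i) {
            sumL += static_cast<long long>(src[2 * (j + i)])     * coeffs[i];
            sumR += static_cast<long long>(src[2 * (j + i) + 1]) * coeffs[i];
        }
        // Undo the fixed-point scaling of the coefficients (no saturation here).
        dest[2 * j]     = static_cast<short>(sumL >> divFactor);
        dest[2 * j + 1] = static_cast<short>(sumR >> divFactor);
    }
    return outFrames;
}
```

With two coefficients of 8192 and a shift of 14, each output frame is simply the average of two neighbouring input frames per channel, which makes the routine easy to check by hand.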

Therefore, when porting SoundTouch to a CPU that lacks these multimedia instruction sets, such as ARM, use FIRFilter's own evaluateFilterStereo function. With this done, RateTransposerInteger() itself can finally be constructed. Its constructor:

RateTransposerInteger::RateTransposerInteger() : RateTransposer()
{
    // Notice: use local function calling syntax for sake of clarity,
    // to indicate the fact that C++ constructor can't call virtual functions.
    RateTransposerInteger::resetRegisters();
    RateTransposerInteger::setRate(1.0f);
}

performs some necessary initialization. At this point pRateTransposer = RateTransposer::newInstance(); has finished instantiating. As for pTDStretch = TDStretch::newInstance();, that is covered in the next part.

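SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (3)

Walkthrough of the SoundTouch initialization flow, part 2

This continues the previous part, "Walkthrough of the SoundTouch initialization flow".

Relationship between the TDStretch class and its base classes: FIFOSamplePipe -> FIFOProcessor -> TDStretch

The SoundTouch member variable class TDStretch *pTDStretch is initialized in the SoundTouch constructor SoundTouch::SoundTouch():

pTDStretch = TDStretch::newInstance();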

It is constructed by calling the TDStretch member function newInstance(). The code:

TDStretch * TDStretch::newInstance()
{
    uint uExtensions;

    uExtensions = detectCPUextensions();

    // Check if MMX/SSE/3DNow! instruction set extensions supported by CPU

#ifdef ALLOW_MMX
    // MMX routines available only with integer sample types
    if (uExtensions & SUPPORT_MMX)
    {
        return ::new TDStretchMMX;
    }
    else
#endif // ALLOW_MMX

#ifdef ALLOW_SSE
    if (uExtensions & SUPPORT_SSE)
    {
        // SSE support
        return ::new TDStretchSSE;
    }
    else
#endif // ALLOW_SSE

#ifdef ALLOW_3DNOW
    if (uExtensions & SUPPORT_3DNOW)
    {
        // 3DNow! support
        return ::new TDStretch3DNow;
    }
    else
#endif // ALLOW_3DNOW

    {
        // ISA optimizations not supported, use plain C version
        return ::new TDStretch;
    }
}

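Just like pRateTransposer, this checks the CPU's extended instruction sets and constructs a subclass that uses the matching multimedia instructions. Three classes are derived for the three instruction sets: TDStretchMMX, TDStretch3DNow, and TDStretchSSE. They mainly override the TDStretch member function calcCrossCorrStereo. If none of the instruction sets is supported, TDStretch's own calcCrossCorrStereo is used: the floating-point build uses double TDStretch::calcCrossCorrStereo(const float *mixingPos, const float *compare) const, and the fixed-point build uses long TDStretch::calcCrossCorrStereo(const short *mixingPos, const short *compare) const, selected at compile time with #define INTEGER_SAMPLES or #define FLOAT_SAMPLES. Since TDStretchMMX, TDStretch3DNow, and TDStretchSSE merely override calcCrossCorrStereo and initialize nothing, they have no constructors of their own and are built with the compiler-generated default constructors. For my Celeron CPU:

if (uExtensions & SUPPORT_MMX)
{
    return ::new TDStretchMMX;
}

this constructs a TDStretchMMX and returns a pointer to it. The actual construction order is:

FIFOSamplePipe->FIFOProcessor->TDStretch->TDStretchMMX

From the analysis above, the algorithm we are after, the one that stretches and compresses a sound signal in time, is built around the TDStretch member function TDStretch::calcCrossCorrStereo. The optimized versions for the three CPU types are in the source files 3dnow_win.cpp, Mmx_optimized.cpp, and Sse_optimized.cpp. What the parameters of calcCrossCorrStereo(const float *mixingPos, const float *compare) mean is left for a later analysis. Back to the SoundTouch constructor, SoundTouch::SoundTouch():

SoundTouch::SoundTouch()
{
    // Initialize rate transposer and tempo changer instances
    pRateTransposer = RateTransposer::newInstance();
    pTDStretch = TDStretch::newInstance();

    setOutPipe(pTDStretch);

    rate = tempo = 0;

    virtualPitch =
    virtualRate =
    virtualTempo = 1.0;

    calcEffectiveRateAndTempo();

    channels = 0;
    bSrateSet = FALSE;
}

We now have pRateTransposer, an instance that handles the rate, and pTDStretch, an instance that stretches and compresses the audio in time; all that remains is to initialize a few variables. With that, the variable SoundTouch m_SoundTouch; is fully instantiated.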

SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (4)

A supplement on the SoundTouch construction and initialization flow.

In the SoundTouch constructor there is a call to the function calcEffectiveRateAndTempo():

SoundTouch::SoundTouch()
{
    // Initialize rate transposer and tempo changer instances
    pRateTransposer = RateTransposer::newInstance();
    pTDStretch = TDStretch::newInstance();

    setOutPipe(pTDStretch);

    rate = tempo = 0;

    virtualPitch = virtualRate = virtualTempo = 1.0;

    calcEffectiveRateAndTempo();

    channels = 0;
    bSrateSet = FALSE;
}

It is also called by each of these six SoundTouch member functions: void setRate(float newRate), void setRateChange(float newRate), void setTempo(float newTempo), void setTempoChange(float newTempo), void setPitch(float newPitch), and void setPitchOctaves(float newPitch). It is easy to guess that it does some processing of the audio parameters. Looking at calcEffectiveRateAndTempo more closely, it is implemented as follows.

// Calculates 'effective' rate and tempo values from the
// nominal control values.
void SoundTouch::calcEffectiveRateAndTempo()
{
    float oldTempo = tempo;
    float oldRate = rate;

    tempo = virtualTempo / virtualPitch;
    rate = virtualPitch * virtualRate;

    if (!TEST_FLOAT_EQUAL(rate,oldRate)) pRateTransposer->setRate(rate);
    if (!TEST_FLOAT_EQUAL(tempo, oldTempo)) pTDStretch->setTempo(tempo);

#ifndef PREVENT_CLICK_AT_RATE_CROSSOVER
    if (rate <= 1.0f)
    {
        if (output != pTDStretch)
        {
            FIFOSamplePipe *tempoOut;

            assert(output == pRateTransposer);
            // move samples in the current output buffer to the output of pTDStretch
            tempoOut = pTDStretch->getOutput();
            tempoOut->moveSamples(*output);
            // move samples in pitch transposer's store buffer to tempo changer's input
            pTDStretch->moveSamples(*pRateTransposer->getStore());

            output = pTDStretch;
        }
    }
    else
#endif
    {
        if (output != pRateTransposer)
        {
            FIFOSamplePipe *transOut;

            assert(output == pTDStretch);
            // move samples in the current output buffer to the output of pRateTransposer
            transOut = pRateTransposer->getOutput();
            transOut->moveSamples(*output);
            // move samples in tempo changer's input to pitch transposer's input
            pRateTransposer->moveSamples(*pTDStretch->getInput());

            output = pRateTransposer;
        }
    }
}

In essence it sets parameters on the two objects pRateTransposer and pTDStretch. With that, we also have a first rough picture of the overall sound processing flow.
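The two formulas in calcEffectiveRateAndTempo() can be tried out in isolation. The sketch below copies only the arithmetic. The struct Effective and both function names are hypothetical, and the semitone conversion is the usual equal-temperament assumption (12 semitones per octave, one octave doubles the frequency), not code taken from the library.

```cpp
#include <cassert>
#include <cmath>

struct Effective { double tempo; double rate; };

// Same two formulas as calcEffectiveRateAndTempo() above:
//   tempo = virtualTempo / virtualPitch
//   rate  = virtualPitch * virtualRate
Effective effectiveRateAndTempo(double virtualTempo, double virtualPitch, double virtualRate)
{
    assert(virtualPitch > 0.0);
    return { virtualTempo / virtualPitch, virtualPitch * virtualRate };
}

// Assumed conversion: one octave doubles the frequency, 12 semitones per octave.
double pitchFromSemitones(double semitones)
{
    return std::pow(2.0, semitones / 12.0);
}
```

Raising the pitch by one octave with tempo and rate left at 1.0 gives an effective tempo of 0.5 and an effective rate of 2.0: the time-stretcher makes the signal twice as long, and the rate transposer then plays it twice as fast, so the duration is unchanged while the pitch doubles.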

1. Create a digital low-pass filter, AAFilter, whose coefficients are truncated with a Hamming window.

Let us look at how this digital low-pass filter is created. It happens mainly in the RateTransposer constructor, which constructs an AAFilter object: pAAFilter = new AAFilter(32);

RateTransposer::RateTransposer() : FIFOProcessor(&outputBuffer)
{
    numChannels = 2;
    bUseAAFilter = TRUE;
    fRate = 0;

    // Instantiates the anti-alias filter with default tap length
    // of 32
    pAAFilter = new AAFilter(32);
}

The AAFilter class definition is fairly simple and easy to follow. class FIRFilter *pFIR is, as analyzed earlier, a pointer to a derived class optimized for whichever extended instruction set the CPU supports; again these subclasses only override the data processing functions. double cutoffFreq; is the low-pass cutoff frequency. calculateCoeffs() is the member function to focus on: it produces the main parameters of the digital filter.

class AAFilter
{
protected:
    class FIRFilter *pFIR;

    /// Low-pass filter cut-off frequency, negative = invalid
    double cutoffFreq;

    /// num of filter taps
    uint length;

    /// Calculate the FIR coefficients realizing the given cutoff-frequency
    void calculateCoeffs();

public:
    AAFilter(uint length);
    ~AAFilter();

    /// Sets new anti-alias filter cut-off edge frequency, scaled to sampling
    /// frequency (nyquist frequency = 0.5). The filter will cut off the
    /// frequencies above that.
    void setCutoffFreq(double newCutoffFreq);

    /// Sets number of FIR filter taps, i.e. ~filter complexity
    void setLength(uint newLength);
    uint getLength() const;

    /// Applies the filter to the given sequence of samples.
    /// Note : The amount of outputted samples is by value of 'filter length'
    /// smaller than the amount of input samples.
    uint evaluate(SAMPLETYPE *dest,
                  const SAMPLETYPE *src,
                  uint numSamples,
                  uint numChannels) const;
};

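Now the AAFilter constructor. It first creates a FIR filter instance and then sets the cutoff frequency to 0.5. Note that this value is normalized to the sampling frequency (0.5 is the Nyquist frequency); calculateCoeffs() later converts it to the angular frequency wc = 2 * PI * cutoffFreq. Finally it sets the filter length, that is, the number of taps.

AAFilter::AAFilter(uint len)
{
    pFIR = FIRFilter::newInstance();
    cutoffFreq = 0.5;
    setLength(len);
}

The member function setLength, which sets the length, calls the member function calculateCoeffs():

// Sets number of FIR filter taps
void AAFilter::setLength(uint newLength)
{
    length = newLength;
    calculateCoeffs();
}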

Now for the member function calculateCoeffs(), the core of the whole filter design. The source code:

// Calculates coefficients for a low-pass FIR filter using Hamming window
void AAFilter::calculateCoeffs()
{
    uint i;
    double cntTemp, temp, tempCoeff, h, w;
    double fc2, wc;
    double scaleCoeff, sum;
    double *work;
    SAMPLETYPE *coeffs;

    assert(length >= 2);
    assert(length % 4 == 0);
    assert(cutoffFreq >= 0);
    assert(cutoffFreq <= 0.5);

    work = new double[length];
    coeffs = new SAMPLETYPE[length];

    fc2 = 2.0 * cutoffFreq;
    wc = PI * fc2;
    tempCoeff = TWOPI / (double)length;

    sum = 0;
    for (i = 0; i < length; i ++)
    {
        cntTemp = (double)i - (double)(length / 2);

        temp = cntTemp * wc;
        if (temp != 0)
        {
            h = fc2 * sin(temp) / temp;     // sinc function
        }
        else
        {
            h = 1.0;
        }
        w = 0.54 + 0.46 * cos(tempCoeff * cntTemp);     // hamming window

        temp = w * h;
        work[i] = temp;

        // calc net sum of coefficients
        sum += temp;
    }
    ...

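The first half of the function uses assert for some sanity checks: the length must be at least 2 and a multiple of 4, which guarantees that length/2 is an integer, and the cutoff frequency must lie between 0 and 0.5. It then applies a Hamming window. Note that 0.54 + 0.46 * cos(2 * pi * cntTemp / N) looks different from the usual Hamming window 0.54 - 0.46 * cos(2 * pi * n / (N - 1)), but the difference is easy to explain:

i = 0 .. length-1, and cntTemp = i - (length / 2);
0.54 + 0.46 * cos(2 * pi * cntTemp / N)
= 0.54 - 0.46 * cos(2 * pi * cntTemp / N + pi)
= 0.54 - 0.46 * cos(2 * pi * cntTemp / N + pi * N / N)
= 0.54 - 0.46 * cos(2 * pi * (cntTemp + N / 2) / N)
= 0.54 - 0.46 * cos(2 * pi * n / N), where n = cntTemp + N / 2 = 0 .. N-1

It is nothing more than the identity cos(x) = -cos(x + pi): very simple, yet easy to miss out of habit. As for why N is used instead of N-1, I think the following explanation, for which I thank a teacher at Harbin University of Science and Technology, puts it clearly: "If N were used in place of N-1, the center of symmetry N/2 would not be an integer and so would not be a sample point (the window is even-symmetric and N has to be odd; in theory a low-pass filter's parameters can only be chosen this way)." Here i starts at 0 and runs up to length, and the earlier asserts guarantee that length is at least 2 and a multiple of 4. It is therefore not odd, and (length-1)/2 is never an integer. The filter can thus be thought of as having a length of length+1.

......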

    // ensure the sum of coefficients is larger than zero
    assert(sum > 0);

    // ensure we've really designed a lowpass filter...
    assert(work[length/2] > 0);
    assert(work[length/2 + 1] > -1e-6);
    assert(work[length/2 - 1] > -1e-6);

    // Calculate a scaling coefficient in such a way that the result can be
    // divided by 16384
    scaleCoeff = 16384.0f / sum;

    for (i = 0; i < length; i ++)
    {
        // scale & round to nearest integer
        temp = work[i] * scaleCoeff;
        temp += (temp >= 0) ? 0.5 : -0.5;
        // ensure no overfloods
        assert(temp >= -32768 && temp <= 32767);
        coeffs[i] = (SAMPLETYPE)temp;
    }

    // Set coefficients. Use divide factor 14 => divide result by 2^14 = 16384
    pFIR->setCoefficients(coeffs, length, 14);

    delete[] work;
    delete[] coeffs;
}

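The second half of the function uses assert to verify that the result really is a low-pass filter. The rest converts the coefficients to fixed point with 2^14 = 16384: the coefficients are scaled up by 16384, which is equivalent to a left shift by 14 bits, and the filtered result is later shifted right by 14 bits to undo the scaling, which preserves precision. assert(temp >= -32768 && temp <= 32767); verifies that temp fits into a 16-bit integer without overflow. Finally, the low-pass coefficients are passed into the FIRxxx class, which can then be treated as a digital low-pass filter. With that, all initialization is complete and we can move on to the actual data processing.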
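Both halves of calculateCoeffs() can be reproduced outside the library to inspect the numbers. The sketch below follows the formulas quoted above, including the center tap being set to 1.0; designLowpass and quantizeQ14 are hypothetical names, and plain std::vector replaces the library's buffers and FIRFilter class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Windowed-sinc low-pass design following the formulas quoted above.
// 'cutoff' is normalized to the sampling rate (0 .. 0.5, Nyquist = 0.5).
std::vector<double> designLowpass(std::size_t length, double cutoff)
{
    assert(length >= 2 && length % 4 == 0);
    assert(cutoff >= 0.0 && cutoff <= 0.5);
    const double pi = 3.14159265358979323846;
    const double fc2 = 2.0 * cutoff;
    const double wc = pi * fc2;                       // angular cutoff frequency
    std::vector<double> work(length);
    for (std::size_t i = 0; i < length; ++i) {
        double c = static_cast<double>(i) - static_cast<double>(length / 2);
        double t = c * wc;
        double h = (t != 0.0) ? fc2 * std::sin(t) / t : 1.0;                          // sinc
        double w = 0.54 + 0.46 * std::cos(2.0 * pi * c / static_cast<double>(length)); // Hamming
        work[i] = w * h;
    }
    return work;
}

// Scales the coefficients so that they sum to about 2^14 = 16384 and
// rounds each one to the nearest 16-bit integer.
std::vector<short> quantizeQ14(const std::vector<double>& work)
{
    double sum = 0.0;
    for (double v : work) sum += v;
    assert(sum > 0.0);
    const double scale = 16384.0 / sum;
    std::vector<short> coeffs(work.size());
    for (std::size_t i = 0; i < work.size(); ++i) {
        double t = work[i] * scale;
        t += (t >= 0.0) ? 0.5 : -0.5;                 // round to nearest
        assert(t >= -32768.0 && t <= 32767.0);        // must fit into 16 bits
        coeffs[i] = static_cast<short>(t);
    }
    return coeffs;
}
```

Because the integer coefficients sum to roughly 16384, a constant input passed through the filter and shifted right by 14 bits comes out at about its original level, which is the point of the scaling.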

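SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (5)

Implementation of the rate class RateTransposer

Back to the SoundTouch member function void SoundTouch::putSamples(const SAMPLETYPE *samples, uint nSamples). Once a SoundTouch object has been defined, simply calling this function is enough to run the audio processing. Its call signature is simple: the first parameter, SAMPLETYPE *samples, points to a buffer of PCM-encoded wave data; the second, uint nSamples, is the number of samples in that buffer. How to compute this sample count was discussed earlier and is not repeated here. First, its implementation: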

// Adds 'numSamples' pcs of samples from the 'samples' memory position into
// the input of the object.
void SoundTouch::putSamples(const SAMPLETYPE *samples, uint nSamples)
{
    if (bSrateSet == FALSE)
    {
        throw std::runtime_error(\
    }
    else if (channels == 0)
    {
        throw std::runtime_error(\
    }
#ifndef PREVENT_CLICK_AT_RATE_CROSSOVER
    else if (rate <= 1.0f)
    {
        // transpose the rate down, output the transposed sound to tempo changer buffer
        assert(output == pTDStretch);
        pRateTransposer->putSamples(samples, nSamples);
        pTDStretch->moveSamples(*pRateTransposer);
    }
    else
#endif
    {
        // evaluate the tempo changer, then transpose the rate up,
        assert(output == pRateTransposer);
        pTDStretch->putSamples(samples, nSamples);
        pRateTransposer->moveSamples(*pTDStretch);
    }
}

The first part essentially checks that the SoundTouch object was initialized properly. The part to focus on is:

#ifndef PREVENT_CLICK_AT_RATE_CROSSOVER
    else if (rate <= 1.0f)
    {
        // transpose the rate down, output the transposed sound to tempo changer buffer
        assert(output == pTDStretch);
        pRateTransposer->putSamples(samples, nSamples);
        pTDStretch->moveSamples(*pRateTransposer);
    }
    else
#endif
    {
        // evaluate the tempo changer, then transpose the rate up,
        assert(output == pRateTransposer);
        pTDStretch->putSamples(samples, nSamples);
        pRateTransposer->moveSamples(*pTDStretch);
    }

There is a macro check here, #ifndef PREVENT_CLICK_AT_RATE_CROSSOVER. I am not sure yet what exactly it is for, but since the library never defines this macro, it seems the author intended to use it but had not finished the code, so it is unused. rate is the ratio computed by the SoundTouch member function calcEffectiveRateAndTempo described earlier: a value less than or equal to 1 means slower playback, and a value greater than 1 means faster playback, as the comments also hint. For the case rate <= 1.0f, the pRateTransposer object's own putSamples member function is called first. Its implementation:

// Adds 'nSamples' pcs of samples from the 'samples' memory position into
// the input of the object.
void RateTransposer::putSamples(const SAMPLETYPE *samples, uint nSamples)
{
    processSamples(samples, nSamples);
}

It simply calls the member function processSamples to do the work. Next, the implementation of processSamples:

// Transposes sample rate by applying anti-alias filter to prevent folding.
// Returns amount of samples returned in the \
// The maximum amount of samples that can be returned at a time is set by
// the 'set_returnBuffer_size' function.
void RateTransposer::processSamples(const SAMPLETYPE *src, uint nSamples)
{
    uint count;
    uint sizeReq;

    if (nSamples == 0) return;
    assert(pAAFilter);

    // If anti-alias filter is turned off, simply transpose without applying
    // the filter
    if (bUseAAFilter == FALSE)
    {
        sizeReq = (uint)((float)nSamples / fRate + 1.0f);
        count = transpose(outputBuffer.ptrEnd(sizeReq), src, nSamples);
        outputBuffer.putSamples(count);
        return;
    }

    // Transpose with anti-alias filter
    if (fRate < 1.0f)
    {
        upsample(src, nSamples);
    }

Source: https://www.bwwdw.com/article/chit.html
