-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPyAMDF.py
364 lines (296 loc) · 12.8 KB
/
PyAMDF.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
#!/usr/bin/python
# -*- coding: utf-8 -*-'
"""
Gibbon, Dafydd. 2018. PyAMDF: a do-it-yourself pitch estimator in Python. Bielefeld University.
The fundamental frequency (aka 'F0', 'pitch') of the voice is generated by air flowing through the vocal folds in the larynx. It is interrupted by pauses and by voiceless consonants and is absent in whispering. Where the fundamental frequency is not interrupted, it is a carrier signal, modulated by changing the air pressure during expiration and by changing the rigidity of the vocal folds. This frequency modulation (FM) is entirely analogous to the FM of radio broadcasts, except that the carrier signal of speech is between 50 and 500 Hz, while the carrier signal of FM radio is around 100 MHz, and that the frequency of the modulation in speech is around 3 frequency changes per second, while the modulation of FM radio is the whole of speech, music or other sounds.
The methods of estimating speech F0 fall into two main types, which can also be combined: time domain methods, in which similarities in amplitude patterns between intervals in the waveform are analysed, and frequency domain methods, in which frequency patterns in intervals in the waveform are analysed.
The PyAMDF method is a time-domain method, and uses the Average Magnitude Difference Function, which checks the similarity or difference between magnitudes (the absolute amplitude) of intervals with the average magnitude patterns magnitude patterns in intervals in the waveform with magnitudes of the following intervals. The difference between the beginning of the first interval and the next most similar interval is the inverse of the fundamental frequency.
The code is mainly in traditional functional style, so that those familiar with other programming languages should not have too much difficulty in following it. The structure is:
- Import of supporting library modules
- Definition of parameters
- Inport of a wavefile and calculation of relevant dependent parameters
- Preprocessing of the signal by centre-clipping
- F0 estimation by AMDF
- Postprocessing of the signal by
- Truncating F0 estimates outside a defined voice range
- Smoothing the F0 pattern
- Display of the waveform, a spectrogram and the F0 estimation
The PyAMDF implementation does not include common features of well-known standard F0 estimators such as Praat, RAPT (aka 'get_f0' or 'esps'), Reaper, SWIPE, YAAPT or YIN (descriptions of these can be found on the internet):
- separate voicing detection
- statistical estimation of parameter costs
- octave and suboctave detection
- continuity detection
Consequently, PyAMDF is not as robust or versatile as the F0 estimators named above, but is neverthess useful with careful selection of parameter values.
The implementation is standalone code, intended for use on the command line interface (CLI). It was developed and tested with Ubuntu Linux 16.10, but is intended to be platform-independent, but requires installation of the standard Python library modules NumPy, MatPlotLib and SciPy.
Typical CLI usage of PyAMDFis:
PyAMDF.py somebodysvoice.wav fairlyhigh single
CLI parameter shortcuts for different voice type parameters are provided:
'high', 'fairlyhigh', 'mid', 'fairlylow', 'low'
Other parameters are 'hard-wired' in the code, and can be changed as desired.
Three outputs are provided:
- a screen figure display of waveform, spectrogram and F0 estimation,
- a PNG file with the figure,
- a CSV file of the CSV estimation for input into a spreadsheet.
The code is open source free software. The only condition on using this code is that it may not be patented or copyrighted. If the code is used in publications it may be cited as:
Gibbon, Dafydd. 2018. PyAMDF: a do-it-yourself pitch estimator in Python. Bielefeld University.
Dafydd Gibbon, Bielefeld, 2018-10-10.
"""
# PyAMDF.py
# D. Gibbon
# 2018-10-10
# Pitch Estimator based on Average Magnitude Difference Function
import sys, re
import numpy as np
from scipy.signal import medfilt, hilbert
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt
#==============================================================================
# Parameters - will definitely need tweaking
try:
filename = sys.argv[1]
voice = sys.argv[2]
figtype = sys.argv[3]
filebase = re.sub('.wav','',filename)
print filename,filebase
except:
print 'Usage: PyADMF.py wavfilename high|fairlyhigh|mid|fairlylow|low single|multiple'; exit()
try:
if figtype == 'single':
singlefigure = True
else:
singlefigure = False
framerate = 0.01 # Low pitched voices may need 0.02 s
centreclip = 10
winoffsetdivisor = 20.0
medianwindow = 3
envelopeflag = False
spectmin = 0
spectmax = 5000
spectwinfactor = 20
if voice == 'high':
f0min = 160
f0max = 450
elif voice == 'fairlyhigh':
f0min = 140
f0max = 400
elif voice == 'mid':
f0min = 120
f0max = 350
elif voice == 'fairlylow':
f0min = 100
f0max = 300
framerate = 0.02
elif voice == 'low':
f0min = 50
f0max = 250
framerate = 0.02
else:
print 'Unknown voice type.'; exit()
except:
print 'Parameter error.'; exit()
#==============================================================================
# WAV file input
try:
fs, signal = wav.read(filename)
signallen = len(signal)
signalduration = int(round(1.0*signallen/fs))
sampframeratio = fs * framerate
framelen = np.int(round(fs * framerate))
framecount = np.int(round(1.0 * signallen / framelen))
diffoffset = np.int(round(framelen/winoffsetdivisor))
newsignallen = framecount * framelen
signal = signal[:newsignallen]
spectwin = int(fs / spectwinfactor)
print "Sampling rate: %s Hz"%fs
print "len(signal): %s"%len(signal)
except:
print "Error reading signal."; exit()
#==============================================================================
#==============================================================================
# Preprocessing (centre-clipping)
try:
signal = (abs(signal) > centreclip).astype(np.int) * signal
except:
print "Error preprocessing."; exit()
#==============================================================================
# F0 estimation
try:
irange = range(0, signallen-2*framelen, framelen) # Note truncation.
# Allocate memory for f0list and AMDF list
f0list = np.zeros(framecount)
meandiffs = np.zeros(framelen).tolist()
# Move frame window through signal
count = 0
for framestart in irange:
framestop = framestart + framelen
frame = signal[framestart:framestop]
# Calculate Average Magnitude Difference Function with moving window
indx = 0
for winstart in range(framestart,framestop):
movingwin = signal[winstart:winstart+framelen]
absdiffs = abs(frame - movingwin)
meandiffs[indx] = np.mean(absdiffs)
indx += 1
# Pick smallest absolute difference in frame
smallest = np.min(meandiffs[diffoffset:])
# Get position of the smallest absolute difference
index = meandiffs.index(smallest)
# Divide the sampling rate by the number of samples in the interval
f0 = fs / index # That is: t = index/fs; f0 = 1/t
# Extend f0 list
f0list[count] = f0
count += 1
except:
print "F0 estimation error."; exit()
#==============================================================================
# Post-processing
try:
# Remove f0 values outside defined limits
f0list = (f0list > f0min).astype(int) * f0list
f0list = (f0list < f0max).astype(int) * f0list
# Smooth F0 contour
f0list = medfilt(f0list,medianwindow)
except:
print "Post-processing failed."; exit()
#==============================================================================
#==============================================================================
# Print to CSV file (data in rows)
try:
csvfilename = 'speech-f0estimate-%s.csv'%filebase
f0csv = filebase+','+','.join(map(str,f0list))
header = 'speech-'+filebase+','+','.join(map(str,np.linspace(0,len(f0list),len(f0list))*framerate))
csvfiletext = header+'\n'+f0csv+'\n'
file = open(csvfilename,'w')
file.write(csvfiletext)
print 'Output in CSV file. Open with "soffice %s", choose comma separator, click on the left of row 1, then Insert, then Chart, then format the chart as a scatter plot.'%csvfilename
except:
print "Text file output error."; exit()
#==============================================================================
#==============================================================================
# Figures
try:
sp1figname = 'speech-waveform-%s.png'%filebase
sp2figname = 'speech-amenvelope-%s.png'%filebase
sp3figname = 'speech-f0track-%s.png'%filebase
sp4figname = 'speech-FMandAMenvelopes-%s.png'%filebase
sp5figname = 'speech-spectrogram-%s.png'%filebase
spfullfigname = 'speech-full-%s.png'%filebase
spfignames = csvfilename,sp1figname,sp4figname,sp5figname,spfullfigname
print "Output file names:\n\t%s\n\t%s\n\t%s\n\t%s\n\t%s"%spfignames
if singlefigure:
fig, (sp1,sp2,sp3,sp4,sp5) = plt.subplots(5, 1, figsize=(12,9))
fig.suptitle('DIY: F0 estimation (aka "pitch extraction"). DG, 2010-10-10',fontsize=16)
except:
""; exit()
#==============================================================================
# Plot signal
if True:
if not singlefigure:
fig, (sp1) = plt.subplots(1, 1, figsize=(16,4))
x = np.linspace(0,len(signal),len(signal))/fs
signal = medfilt(signal,31)
signal = signal / float(np.max(abs(signal)))
abssignal = abs(signal)
sp1.scatter(x,signal,color='g',s=2)
if envelopeflag:
sp1.plot(x,abssignal,color='lightgreen')
peakwin = 20
peakarray = [ max(abssignal[i-peakwin:i]) for i in range(peakwin,len(abssignal)) ]
peakarray = np.append(medfilt(peakarray,201),[0]*peakwin)
peakarray = peakarray / max(peakarray)
sp1.plot(x,peakarray,color='r',linewidth=2)
sp1.set_title('Waveform')
sp1.set_xlabel('Time (s)')
signalduration = float(len(signal))/fs
sp1.set_xlim(0,signalduration)
sp1.set_ylim(-1,1)
if not singlefigure:
plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
plt.savefig(sp1figname)
#==============================================================================
# Plot AM envelope
if True:
if not singlefigure:
fig, (sp2) = plt.subplots(1, 1, figsize=(12,4))
x = np.linspace(0,len(signal),len(signal))/fs
signal = medfilt(signal,31)
signal = signal / float(np.max(abs(signal)))
abssignal = abs(signal)
peakwin = 20
amenvelope = [ max(abssignal[i-peakwin:i]) for i in range(peakwin,len(abssignal)) ]
amenvelope = np.append(medfilt(amenvelope,201),[0]*peakwin)
amenvelope = amenvelope / max(amenvelope)
sp2.plot(x,amenvelope,color='r',linewidth=2)
sp2.set_title('Amplitude Modulation (AM) envelope')
sp2.set_xlabel('Time (s)')
# sp2.set_xlim(0,signalduration)
if not singlefigure:
plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
plt.savefig(sp2figname)
#==============================================================================
# F0 estimation
try:
if not singlefigure:
fig, (sp3) = plt.subplots(1, 1, figsize=(12,4))
sp3.set_title('AMDF Frequency Modulation (FM) envelope (F0 estimation, pitch tracking)')
sp3.set_xlabel('Time (s)')
sp3.set_xlim(0,signalduration)
sp3.set_ylim(f0min,f0max)
y = f0list
leny = len(y)
x = np.linspace(0,leny,leny)*framerate
sp3.scatter(x,y,s=5,color='b')
if not singlefigure:
plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
plt.savefig(sp3figname)
except:
print "Problem with F0 estimation."; exit()
#==============================================================================
# F0 estimation with Average Magnitude Difference Function in moving window
try:
if not singlefigure:
fig, (sp4) = plt.subplots(1, 1, figsize=(12,4))
if envelopeflag:
amstring = ' AM envelope & '
else:
amstring = ''
sp4.set_title(amstring+'AMDF Frequency Modulation (FM) envelope (F0 estimation, pitch tracking)')
sp4.set_xlabel('Time (s)')
sp4.set_xlim(0,signalduration)
sp4.set_ylim(f0min,f0max)
y = f0list
leny = len(y)
x = np.linspace(0,leny,leny)*framerate
if envelopeflag:
thinenvelope = amenvelope[::int(sampframeratio)]
y = 0.5*y/max(y)
z = thinenvelope/max(thinenvelope)
sp4.plot(x,z,linewidth=2,color='r')
sp4.set_ylim(0.0,1.1)
sp4.set_yticklabels([])
sp4.scatter(x,y,s=5,color='b')
if not singlefigure:
plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
plt.savefig(sp4figname)
#==============================================================================
# Plot spectrogram
if not singlefigure:
fig, (sp5) = plt.subplots(1, 1, figsize=(12,8))
sp5.set_title('Spectrogram')
sp5.set_xlabel('Time (s)')
sp5.specgram(signal,NFFT=spectwin, Fs=fs)
sp5.axis(ymin=spectmin, ymax=spectmax)
sp5.grid(which='both',axis='both',linewidth="1",linestyle='--')
sp5.set_xlim(0,signalduration)
if not singlefigure:
plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
plt.savefig(sp5figname)
#==============================================================================
# Format and display figure
if singlefigure:
plt.tight_layout(pad=3, w_pad=0.1, h_pad=1)
plt.savefig(spfullfigname)
plt.show()
except:
print "Graphics or CSV output error."; exit()
#===========================================================================EOF