PyAMDF.py

#!/usr/bin/python
# -*- coding: utf-8 -*-'

"""
Gibbon, Dafydd. 2018. PyAMDF: a do-it-yourself pitch estimator in Python. Bielefeld University.

The fundamental frequency (aka 'F0', 'pitch') of the voice is generated by air flowing through the vocal folds in the larynx. It is interrupted by pauses and by voiceless consonants and is absent in whispering. Where the fundamental frequency is not interrupted, it is a carrier signal, modulated by changing the air pressure during expiration and by changing the rigidity of the vocal folds. This frequency modulation (FM) is entirely analogous to the FM of radio broadcasts, except that the carrier signal of speech is between 50 and 500 Hz, while the carrier signal of FM radio is around 100 MHz, and that the frequency of the modulation in speech is around 3 frequency changes per second, while the modulation of FM radio is the whole of speech, music or other sounds.

The methods of estimating speech F0 fall into two main types, which can also be combined: time domain methods, in which similarities in amplitude patterns between intervals in the waveform are analysed, and frequency domain methods, in which frequency patterns in intervals in the waveform are analysed.

The PyAMDF method is a time-domain method, and uses the Average Magnitude Difference Function, which checks the similarity or difference between magnitudes (the absolute amplitude) of intervals with the average magnitude patterns  magnitude patterns in intervals in the waveform with magnitudes of the following intervals. The difference between the beginning of the first interval and the next most similar interval is the inverse of the fundamental frequency.

The code is mainly in traditional functional style, so that those familiar with other programming languages should not have too much difficulty in following it. The structure is:
	- Import of supporting library modules
	- Definition of parameters
	- Inport of a wavefile and calculation of relevant dependent parameters
	- Preprocessing of the signal by centre-clipping
	- F0 estimation by AMDF
	- Postprocessing of the signal by
		- Truncating F0 estimates outside a defined voice range
		- Smoothing the F0 pattern
	- Display of the waveform, a spectrogram and the F0 estimation

The PyAMDF implementation does not include common features of well-known standard F0 estimators such as Praat, RAPT (aka 'get_f0' or 'esps'), Reaper, SWIPE, YAAPT or YIN (descriptions of these can be found on the internet):
	- separate voicing detection
	- statistical estimation of parameter costs
	- octave and suboctave detection
	- continuity detection
Consequently, PyAMDF is not as robust or versatile as the F0 estimators named above, but is neverthess useful with careful selection of parameter values.

The implementation is standalone code, intended for use on the command line interface (CLI). It was developed and tested with Ubuntu Linux 16.10, but is intended to be platform-independent, but requires installation of the standard Python library modules NumPy, MatPlotLib and SciPy.

Typical CLI usage of PyAMDFis:
	PyAMDF.py somebodysvoice.wav fairlyhigh single
CLI parameter shortcuts for different voice type parameters are provided:
	'high', 'fairlyhigh', 'mid', 'fairlylow', 'low'
Other parameters are 'hard-wired' in the code, and can be changed as desired.

Three outputs are provided:
	- a screen figure display of waveform, spectrogram and F0 estimation,
	- a PNG file with the figure,
	- a CSV file of the CSV estimation for input into a spreadsheet.

The code is open source free software. The only condition on using this code is that it may not be patented or copyrighted. If the code is used in publications it may be cited as:

Gibbon, Dafydd. 2018. PyAMDF: a do-it-yourself pitch estimator in Python. Bielefeld University.

Dafydd Gibbon, Bielefeld, 2018-10-10.

"""

# PyAMDF.py
# D. Gibbon
# 2018-10-10
# Pitch Estimator based on Average Magnitude Difference Function

import sys, re
import numpy as np
from scipy.signal import medfilt, hilbert
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt

#==============================================================================
# Parameters - will definitely need tweaking

try:
	filename = sys.argv[1]
	voice = sys.argv[2]
	figtype = sys.argv[3]
	filebase = re.sub('.wav','',filename)
	print filename,filebase
except:
	print 'Usage: PyADMF.py wavfilename high|fairlyhigh|mid|fairlylow|low single|multiple'; exit()

try:
	if figtype == 'single':
		singlefigure = True
	else:
		singlefigure = False
		
	framerate = 0.01	# Low pitched voices may need 0.02 s
	
	centreclip = 10
	winoffsetdivisor = 20.0
	medianwindow = 3

	envelopeflag = False
	
	spectmin = 0
	spectmax = 5000
	spectwinfactor = 20

	if voice == 'high':
		f0min = 160
		f0max = 450

	elif voice == 'fairlyhigh':
		f0min = 140
		f0max = 400

	elif voice == 'mid':
		f0min = 120
		f0max = 350

	elif voice == 'fairlylow':
		f0min = 100
		f0max = 300
		framerate = 0.02

	elif voice == 'low':
		f0min = 50
		f0max = 250
		framerate = 0.02

	else:
		print 'Unknown voice type.'; exit()

except:
	print 'Parameter error.'; exit()

#==============================================================================
# WAV file input

try:
	fs, signal = wav.read(filename)

	signallen = len(signal)
	signalduration = int(round(1.0*signallen/fs))

	sampframeratio = fs * framerate

	framelen = np.int(round(fs * framerate))
	framecount = np.int(round(1.0 * signallen / framelen))
	diffoffset = np.int(round(framelen/winoffsetdivisor))

	newsignallen = framecount * framelen
	signal = signal[:newsignallen]

	spectwin = int(fs / spectwinfactor)

	print "Sampling rate: %s Hz"%fs
	print "len(signal): %s"%len(signal)

except:
	print "Error reading signal."; exit()

#==============================================================================
#==============================================================================
# Preprocessing (centre-clipping)

try:
	signal = (abs(signal) > centreclip).astype(np.int) * signal

except:
	print "Error preprocessing."; exit()

#==============================================================================
# F0 estimation

try:
	irange = range(0, signallen-2*framelen, framelen)	# Note truncation.
# Allocate memory for f0list and AMDF list
	f0list = np.zeros(framecount)
	meandiffs = np.zeros(framelen).tolist()
# Move frame window through signal
	count = 0
	for framestart in irange:
		framestop = framestart + framelen
		frame = signal[framestart:framestop]
# Calculate Average Magnitude Difference Function with moving window
		indx = 0
		for winstart in range(framestart,framestop):
			movingwin = signal[winstart:winstart+framelen]
			absdiffs = abs(frame - movingwin)
			meandiffs[indx] = np.mean(absdiffs)
			indx += 1
# Pick smallest absolute difference in frame
		smallest = np.min(meandiffs[diffoffset:])
# Get position of the smallest absolute difference
		index = meandiffs.index(smallest)
# Divide the sampling rate by the number of samples in the interval
		f0 = fs / index	# That is: t = index/fs; f0 = 1/t
# Extend f0 list
		f0list[count] = f0
		count += 1

except:
	print "F0 estimation error."; exit()

#==============================================================================
# Post-processing

try:
# Remove f0 values outside defined limits
	f0list = (f0list > f0min).astype(int) * f0list
	f0list = (f0list < f0max).astype(int) * f0list
# Smooth F0 contour
	f0list = medfilt(f0list,medianwindow)

except:
	print "Post-processing failed."; exit()

#==============================================================================
#==============================================================================
# Print to CSV file (data in rows)

try:
	csvfilename = 'speech-f0estimate-%s.csv'%filebase
	f0csv = filebase+','+','.join(map(str,f0list))
	
	header = 'speech-'+filebase+','+','.join(map(str,np.linspace(0,len(f0list),len(f0list))*framerate))
	csvfiletext = header+'\n'+f0csv+'\n'
	file = open(csvfilename,'w')
	file.write(csvfiletext)
	print 'Output in CSV file. Open with "soffice %s", choose comma separator, click on the left of row 1, then Insert, then Chart, then format the chart as a scatter plot.'%csvfilename

except:
	print "Text file output error."; exit()

#==============================================================================
#==============================================================================
# Figures

try:
	sp1figname = 'speech-waveform-%s.png'%filebase
	sp2figname = 'speech-amenvelope-%s.png'%filebase
	sp3figname = 'speech-f0track-%s.png'%filebase
	sp4figname = 'speech-FMandAMenvelopes-%s.png'%filebase
	sp5figname = 'speech-spectrogram-%s.png'%filebase
	spfullfigname = 'speech-full-%s.png'%filebase

	spfignames = csvfilename,sp1figname,sp4figname,sp5figname,spfullfigname
	print "Output file names:\n\t%s\n\t%s\n\t%s\n\t%s\n\t%s"%spfignames

	if singlefigure:
		fig, (sp1,sp2,sp3,sp4,sp5) = plt.subplots(5, 1, figsize=(12,9))
		fig.suptitle('DIY: F0 estimation (aka "pitch extraction"). DG, 2010-10-10',fontsize=16)
except:
	""; exit()

#==============================================================================
# Plot signal

if True:
	if not singlefigure:
		fig, (sp1) = plt.subplots(1, 1, figsize=(16,4))

	x = np.linspace(0,len(signal),len(signal))/fs
	signal = medfilt(signal,31)
	signal = signal / float(np.max(abs(signal)))
	abssignal = abs(signal)
	sp1.scatter(x,signal,color='g',s=2)
	if envelopeflag:
		sp1.plot(x,abssignal,color='lightgreen')
		peakwin = 20
		peakarray = [ max(abssignal[i-peakwin:i]) for i in range(peakwin,len(abssignal)) ]
		peakarray = np.append(medfilt(peakarray,201),[0]*peakwin)
		peakarray = peakarray / max(peakarray)
		sp1.plot(x,peakarray,color='r',linewidth=2)
	sp1.set_title('Waveform')
	sp1.set_xlabel('Time (s)')
	signalduration = float(len(signal))/fs
	sp1.set_xlim(0,signalduration)
	sp1.set_ylim(-1,1)

	if not singlefigure:
		plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
		plt.savefig(sp1figname)

#==============================================================================
# Plot AM envelope

if True:
	if not singlefigure:
		fig, (sp2) = plt.subplots(1, 1, figsize=(12,4))
	x = np.linspace(0,len(signal),len(signal))/fs
	signal = medfilt(signal,31)
	signal = signal / float(np.max(abs(signal)))
	abssignal = abs(signal)
	peakwin = 20
	amenvelope = [ max(abssignal[i-peakwin:i]) for i in range(peakwin,len(abssignal)) ]
	amenvelope = np.append(medfilt(amenvelope,201),[0]*peakwin)
	amenvelope = amenvelope / max(amenvelope)
	sp2.plot(x,amenvelope,color='r',linewidth=2)
	sp2.set_title('Amplitude Modulation (AM) envelope')
	sp2.set_xlabel('Time (s)')
#	sp2.set_xlim(0,signalduration)
	if not singlefigure:
		plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
		plt.savefig(sp2figname)

#==============================================================================
# F0 estimation
try:
	if not singlefigure:
		fig, (sp3) = plt.subplots(1, 1, figsize=(12,4))
	sp3.set_title('AMDF Frequency Modulation (FM) envelope (F0 estimation, pitch tracking)')
	sp3.set_xlabel('Time (s)')
	sp3.set_xlim(0,signalduration)
	sp3.set_ylim(f0min,f0max)
	y = f0list
	leny = len(y)
	x = np.linspace(0,leny,leny)*framerate
	sp3.scatter(x,y,s=5,color='b')
	if not singlefigure:
		plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
		plt.savefig(sp3figname)
except:
	print "Problem with F0 estimation."; exit()

#==============================================================================
# F0 estimation with Average Magnitude Difference Function in moving window
try:
	if not singlefigure:
		fig, (sp4) = plt.subplots(1, 1, figsize=(12,4))
	if envelopeflag:
		amstring = ' AM envelope & '
	else:
		amstring = ''
	sp4.set_title(amstring+'AMDF Frequency Modulation (FM) envelope (F0 estimation, pitch tracking)')
	sp4.set_xlabel('Time (s)')
	sp4.set_xlim(0,signalduration)
	sp4.set_ylim(f0min,f0max)
	y = f0list
	leny = len(y)
	x = np.linspace(0,leny,leny)*framerate
	if envelopeflag:
		thinenvelope = amenvelope[::int(sampframeratio)]
		y = 0.5*y/max(y)
		z = thinenvelope/max(thinenvelope)
		sp4.plot(x,z,linewidth=2,color='r')
		sp4.set_ylim(0.0,1.1)
		sp4.set_yticklabels([])
	sp4.scatter(x,y,s=5,color='b')
	if not singlefigure:
		plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
		plt.savefig(sp4figname)

#==============================================================================
# Plot spectrogram

	if not singlefigure:
		fig, (sp5) = plt.subplots(1, 1, figsize=(12,8))
	sp5.set_title('Spectrogram')
	sp5.set_xlabel('Time (s)')
	sp5.specgram(signal,NFFT=spectwin, Fs=fs)
	sp5.axis(ymin=spectmin, ymax=spectmax)
	sp5.grid(which='both',axis='both',linewidth="1",linestyle='--')
	sp5.set_xlim(0,signalduration)
	if not singlefigure:
		plt.tight_layout(pad=1, w_pad=0.1, h_pad=0.1)
		plt.savefig(sp5figname)

#==============================================================================
# Format and display figure
	if singlefigure:

		plt.tight_layout(pad=3, w_pad=0.1, h_pad=1)
		plt.savefig(spfullfigname)
		plt.show()

except:
	print "Graphics or CSV output error."; exit()
#===========================================================================EOF