Wavelet analysis. Part 2

Introduction

This publication deals with wavelet - time series analysis. The basic idea of the wavelet transform meets the specifics of many time series, demonstrating the evolution in time of its main characteristics - the average value, dispersion, periods, amplitudes and phases of harmonic components. The overwhelming majority of the processes studied in various fields of knowledge have the above listed features.

The purpose of this publication is to describe the continuous wavelet transform time series technique using the PyWavelets library.

A bit of history
')
Geophysicist D. Morlet in the late 70s of XX century. faced the problem of analyzing signals from seismic sensors, which contained a high-frequency component (seismic activity) for a short period of time and low-frequency components (a quiet state of the earth's crust) - for a long period. The window Fourier transform allows you to analyze either the high-frequency component or the low-frequency component, but not both components at once.

Therefore, an analysis method was proposed in which the width of the window function increased for low frequencies and decreased for high frequencies. The new window transformation was obtained as a result of stretching (compressing) and shifting in time one generating (the so-called scaling function - scaling function, scalet) function. This generating function was called D. Morlet wavelet.

Wavelet D. Morlet

from pylab import* import scaleogram as scg axes = scg.plot_wav('cmor1-1.5', figsize=(14,3)) show()

Continuous Wavelet Transform (CWT)

Single-level wavelet transform:

coefs, frequencies = pywt.cwt (data, scales, wavelet)

Where:

data: (array_like) -Input signal;

scales (array_like) : - The scale of the wavelet to use. You can use the ratio f = scale2frequency (scale, wavelet) / sampling_period and determine which physical frequency f. Here, f is in hertz when sampling_period in seconds;

wavelet (Wavelet object or name) : - Wavelet to use;

sampling_period (float): - The sampling period for output frequencies (optional). The values calculated for the coefs do not depend on the choice of sampling_period (i.e. scales are not scaled by the period sample.);

coefs (array_like) : - Continuous wavelet - conversion of the input signal for a given scale and wavelet;

frequencies (array_like) : - If the unit of the sampling period is a second and is specified, then the frequencies are output in hertz. Otherwise, a sample period of 1 is assumed.

Note: The size of the coefficient arrays depends on the length of the input array and the lengths of the specified scales.

Get the list of available wavelet names compatible with cwt:

 >>> import pywt >>> wavlist = pywt.wavelist(kind='continuous') >>> wavlist

['cgau1', 'cgau2', 'cgau3', 'cgau4', 'cgau5', 'cgau6', 'cgau7', 'cgau8', 'cmor', 'fbsp', 'gaus1', 'gaus2', ' gaus3 ',' gaus4 ',' gaus5 ',' gaus6 ',' gaus7 ',' gaus8 ',' mexh ',' morl ',' shan ']

For wavelet analysis, the choice of maternal wavelet is one of the key tasks. Therefore, before each choice of the wavelet it is simply necessary to use the following program, which allows to determine the dynamic properties of the wavelet:

Listing 1

 import numpy as np import pywt import matplotlib.pyplot as plt vav='cmor1.5-1.0' wav = pywt.ContinuousWavelet(vav) #  ,      print("       [{}, {}]".format( wav.lower_bound, wav.upper_bound)) width = wav.upper_bound - wav.lower_bound scales = [1, 2, 3, 4, 10, 15] max_len = int(np.max(scales)*width + 1) t = np.arange(max_len) fig, axes = plt.subplots(len(scales), 2, figsize=(12, 6)) for n, scale in enumerate(scales): #       cwt int_psi, x = pywt.integrate_wavelet(wav, precision=10) step = x[1] - x[0] j = np.floor( np.arange(scale * width + 1) / (scale * step)) if np.max(j) >= np.size(int_psi): j = np.delete(j, np.where((j >= np.size(int_psi)))[0]) j = j.astype(np.int) # normalize int_psi     int_psi /= np.abs(int_psi).max() #     filt = int_psi[j][::-1] # CWT          #          . nt = len(filt) t = np.linspace(-nt//2, nt//2, nt) axes[n, 0].plot(t, filt.real, t, filt.imag) axes[n, 0].set_xlim([-max_len//2, max_len//2]) axes[n, 0].set_ylim([-1, 1]) axes[n, 0].text(50, 0.35, 'scale = {}'.format(scale)) f = np.linspace(-np.pi, np.pi, max_len) filt_fft = np.fft.fftshift(np.fft.fft(filt, n=max_len)) filt_fft /= np.abs(filt_fft).max() axes[n, 1].plot(f, np.abs(filt_fft)**2) axes[n, 1].set_xlim([-np.pi, np.pi]) axes[n, 1].set_ylim([0, 1]) axes[n, 1].set_xticks([-np.pi, 0, np.pi]) axes[n, 1].set_xticklabels([r'$-\pi$', '0', r'$\pi$']) axes[n, 1].grid(True, axis='x') axes[n, 1].text(np.pi/2, 0.5, 'scale = {}'.format(scale)) axes[n, 0].set_xlabel('time (samples)') axes[n, 1].set_xlabel('frequency (radians)') axes[0, 0].legend(['real', 'imaginary'], loc='upper left') axes[0, 1].legend(['Power'], loc='upper left') axes[0, 0].set_title('filter: wavelet - %s'%vav) axes[0, 1].set_title(r'|FFT(filter)|$^2$') plt.show()

Further we will consider the wavelet functions and their properties using the above program:

Mexican hat wavelet "mexh":

v a r p h i l e f t (t r i g h t) = f r a c 2 s q r t 3 s q r t [4] p i c d o t e x p^{- f r a c t^{2} 2} c d o t l e f t (1 - t^{2} r i g h t)

$\ varphi \ left (t \ right) = \ frac {2} {\ sqrt {3} \ sqrt [4] {\ pi}} \ cdot exp ^ {- \ frac {t ^ {2}} {2} } \ cdot \ left (1-t ^ {2} \ right)$

Morle Morlet Wavelet:

v a r p h i l e f t (t r i g h t) = e x p^{- f r a c t^{2} 2} c d o t c o s (5 t)

$\ varphi \ left (t \ right) = exp ^ {- \ frac {t ^ {2}} {2}} \ cdot cos (5t)$

Complex Morlet wavelet cmorB-C

v a r p h i l e f t (t r i g h t) = f r a c 1 s q r t p i B e x p^{- f r a c t^{2} B} c d o t e x p^{j 2 p i C t}

$\ varphi \ left (t \ right) = \ frac {1} {\ sqrt {\ pi} B} exp ^ {- \ frac {t ^ {2}} {B}} \ cdot exp ^ {j2 \ pi C t}$

where B is the throughput; C is the center frequency.

Hereinafter (without special instructions) the values of B, C are set with a floating point.

Gaussian wavelets "gausP"

v a r p h i l e f t (t r i g h t) = C c d o t e x p^{- t^{2}}

$\ varphi \ left (t \ right) = C \ cdot exp ^ {- t ^ {2}}$

where: P is an integer from 1 to 8 is inserted into the name of the wavelet, for example "gaus8"; C is a built-in normalization constant.

Complex Gaussian wavelets "cgauP"

v a r p h i l e f t (t r i g h t) = C c d o t e x p^{- t^{2}} c d o t e x p^{- j t}

$\ varphi \ left (t \ right) = C \ cdot exp ^ {- t ^ {2}} \ cdot exp ^ {- jt}$

where: P - an integer from 1 to 8 is inserted into the name of the wavelet, for example, “gaus8” and correspond to the order of the derivative of the wavelet function; C is a built-in normalization constant.

Shannon wavelets "shanB-C"

v a r p h i l e f t (t r i g h t) = s q r t B c d o t f r a c s i n (p i B t) p i B T c d o t e x p^{j 2 p i C t}

$\ varphi \ left (t \ right) = \ sqrt {B} \ cdot \ frac {sin (\ pi B t)} {\ pi B T} \ cdot exp ^ {j 2 pi C t}$

where: B is the width of the strip; C is the center frequency.

CWT in PyWavelets is applied to discrete data - convolutions with samples of the wavelet integral. If scale is too small, then we get a discrete filter with inadequate sampling, which leads to smoothing, as shown in the graph for the 'cmor1.5-1.0' wavelet.

The left column on the graphs shows the discrete filters used in convolution with different scales. The right column is the corresponding Fourier power spectra of each filter. With scales 1 and 2 for the 'cmor1.5-1.0' graph, it can be seen that anti-aliasing occurs due to violation of the Nyquist constraint. This phenomenon may also occur in other wavelets when choosing C and B, therefore, when working with wavelets, you must use the program listing 1.

To associate a given scale with a specific signal frequency, the signal sampling period must be known. Using the pywt.scale2frequency () function, you can convert the list of scales to the corresponding frequencies. The correct choice of scales depends on the selected wavelet, so pywt.scale2frequency () should be used to get an idea of the corresponding frequency range of the signal.

 >>> import pywt >>> dt = 0.01 # 100 Hz sampling >>> frequencies = pywt.scale2frequency('cmor1.5-1.0', [1, 2, 3, 4]) / dt >>> frequencies

array ([100., 50., 33.33333333, 25.])

Wavelet - time series analysis using CWT PyWavelets

The el-Nino dataset is a time series dataset used to track El Nino and contains quarterly measurements of sea surface temperature from 1871 to 1997. To understand the power of a scalegram, let's visualize it for the el-Nino dataset along with the original data of the time series and its Fourier transform.

For the wavelet analysis of time series, you must perform the following steps for the items:

1. Choose maternal wavelet: Select the complex Morlet wavelet “cmorB-C”:

v a r p h i l e f t (t r i g h t) = f r a c 1 s q r t p i B e x p^{- f r a c t^{2} B} c d o t e x p^{j 2 p i C t}

$\ varphi \ left (t \ right) = \ frac {1} {\ sqrt {\ pi} B} exp ^ {- \ frac {t ^ {2}} {B}} \ cdot exp ^ {j2 \ pi C t}$

Bandwidth - B and center frequency - From which we will determine in the next stage.

2. Determine the center frequency, taking dt = 0.25 for our time series:

 import pywt dt = 0.25 # 4 Hz sampling scale=range(1,4) frequencies = pywt.scale2frequency('cmor1.0-0.5', scale) / dt print(frequencies)

[2. 1. 0.66666667]

3. Find the Fourier transform of the mother wavelet cmor1.0-0.5 (we use Listing 1):
Continuous wavelet will be evaluated in the whole range [-8.0, 8.0]

4. Select the data for the time series:

 pd.read_csv("http://paos.colorado.edu/research/wavelets/wave_idl/sst_nino3.dat", sep = "|")

The data is a quarterly measurement of the temperature of the sea surface from 1871 to 1997. For this data: t0 = 1871 dt = 0.25

5. Build a time series with a signal and its moving average on one chart:

Listing 2

 from numpy import * from scipy import * import pandas as pd from pylab import * import pywt def get_ave_values(xvalues, yvalues, n = 5): signal_length = len(xvalues) if signal_length % n == 0: padding_length = 0 else: padding_length = n - signal_length//n % n xarr = array(xvalues) yarr = array(yvalues) xarr.resize(signal_length//n, n) yarr.resize(signal_length//n, n) xarr_reshaped = xarr.reshape((-1,n)) yarr_reshaped = yarr.reshape((-1,n)) x_ave = xarr_reshaped[:,0] y_ave = nanmean(yarr_reshaped, axis=1) return x_ave, y_ave def plot_signal_plus_average(time, signal, average_over = 5): fig, ax = subplots(figsize=(15, 3)) time_ave, signal_ave = get_ave_values(time, signal, average_over) ax.plot(time, signal, label='') ax.plot(time_ave, signal_ave, label = '   (n={})'.format(5)) ax.set_xlim([time[0], time[-1]]) ax.set_ylabel(' ', fontsize=18) ax.set_title(' +   ', fontsize=18) ax.set_xlabel('', fontsize=18) ax.legend() show() df_nino = pd.read_csv("http://paos.colorado.edu/research/wavelets/wave_idl/sst_nino3.dat", sep = "|") N = df_nino.shape[0] t0=1871 dt=0.25 time = arange(0, N) * dt + t0 signal = df_nino.values.squeeze() scales = arange(1, 128) plot_signal_plus_average(time, signal)

6. We carry out the Fourier transform and the modality of the spectrum from the time series:

Listing 3

 from numpy import * from scipy import * import pandas as pd from pylab import * import pywt def get_ave_values(xvalues, yvalues, n = 5): signal_length = len(xvalues) if signal_length % n == 0: padding_length = 0 else: padding_length = n - signal_length//n % n xarr = array(xvalues) yarr = array(yvalues) xarr.resize(signal_length//n, n) yarr.resize(signal_length//n, n) xarr_reshaped = xarr.reshape((-1,n)) yarr_reshaped = yarr.reshape((-1,n)) x_ave = xarr_reshaped[:,0] y_ave = nanmean(yarr_reshaped, axis=1) return x_ave, y_ave def get_fft_values(y_values, T, N, f_s): f_values = linspace(0.0, 1.0/(2.0*T), N//2) fft_values_ = fft(y_values) fft_values = 2.0/N * abs(fft_values_[0:N//2]) return f_values, fft_values def plot_fft_plus_power(time, signal): dt = time[1] - time[0] N = len(signal) fs = 1/dt fig, ax = subplots(figsize=(15, 3)) variance = std(signal)**2 f_values, fft_values = get_fft_values(signal, dt, N, fs) fft_power = variance * abs(fft_values) ** 2 # FFT power spectrum ax.plot(f_values, fft_values, 'r-', label='FFT ') ax.plot(f_values, fft_power, 'k--', linewidth=1, label=' ') ax.set_xlabel('[ / ]', fontsize=18) ax.set_ylabel('', fontsize=18) ax.legend() show() df_nino = pd.read_csv("http://paos.colorado.edu/research/wavelets/wave_idl/sst_nino3.dat", sep = "|") N = df_nino.shape[0] t0=1871 dt=0.25 time = arange(0, N) * dt + t0 signal = df_nino.values.squeeze() scales = arange(1, 128) plot_fft_plus_power(time, signal)

7. Determine the scale: scales = arange (1, 128); levels = [2 ** - 4, 2 ** - 3, 2 ** - 2, 2 ** - 1, 2 ** 0, 2 ** 1, 2 ** 2, 2 ** 3].

8. Build a scalegram:

Listing 4

 from numpy import * import pandas as pd from pylab import * import pywt def plot_wavelet(time, signal, scales, waveletname = 'cmor1.0-0.4', cmap = plt.cm.seismic, title = '-( ) ', ylabel = ' ()', xlabel = ''): dt = time[1] - time[0] [coefficients, frequencies] = pywt.cwt(signal, scales, waveletname, dt) power = (abs(coefficients)) ** 2 period = 1. / frequencies levels = [2**-4 , 2**-3 , 2**-2 , 2**-1 , 2**0 , 2**1 , 2**2 , 2**3] contourlevels = log2(levels) fig, ax = subplots(figsize=(15, 10)) im = ax.contourf(time, log2(period), log2(power), contourlevels, extend='both',cmap=cmap) ax.set_title(title, fontsize=20) ax.set_ylabel(ylabel, fontsize=18) ax.set_xlabel(xlabel, fontsize=18) yticks = 2**arange(np.ceil(log2(period.min())), ceil(log2(period.max()))) ax.set_yticks(log2(yticks)) ax.set_yticklabels(yticks) ax.invert_yaxis() ylim = ax.get_ylim() ax.set_ylim(ylim[0], -1) cbar_ax = fig.add_axes([0.95, 0.5, 0.03, 0.25]) fig.colorbar(im, cax=cbar_ax, orientation="vertical") show() df_nino = pd.read_csv("http://paos.colorado.edu/research/wavelets/wave_idl/sst_nino3.dat", sep = "|") N = df_nino.shape[0] t0=1871 dt=0.25 time = arange(0, N) * dt + t0 signal = df_nino.values.squeeze() scales = arange(1, 128) plot_wavelet(time, signal, scales)

The scale diagram shows that most of the spectrum power is concentrated over a period of 2-8 years, this corresponds to a frequency of 0.125 - 0.5 Hz. In the Fourier spectrum, fashion also concentrates around these frequency values. However, the wavelet transform also gives us temporal information, but the Fourier transform does not.

For example, in the scale diagram we see that before 1920 there were many fluctuations, while between 1960 and 1990 there were not so many. We can also see that over time there is a shift from shorter periods to longer ones.

Using the scaleogram module

Scaleogram is a handy tool for analyzing 1D data with a continuous wavelet transform built on the basis of the PyWavelets library. The module is designed to provide a reliable tool for quick data analysis.

Scaleogram has the following features:

simple call sign for beginners
readable axes and matplotlib net integration
Many options for zooming in, filtering the spectrum, embedding colorbar, etc ...
support of frequency and frequency units in accordance with the marking
speed, uses N * log (N) algorithm for conversions
portability: tested with python2.7 and python3.7
detailed error messages and documentation with examples

However, in the usage examples, the data access is not correctly recorded:

 nino3 = "http://paos.colorado.edu/research/wavelets/wave_idl/sst_nino3.dat"

The following warning appears:

nino3 = pd.read_table (nino3)
which leads to a warning:
Warning (from warnings module):
File "C: /Users/User/Desktop/wavelet/pr.py", line 7
nino3 = pd.read_table (nino3)
FutureWarning: read_table is deprecated, use read_csv instead, passing sep = '\ t'

Access to data should be written like this:

 url="http://paos.colorado.edu/research/wavelets/wave_idl/sst_nino3.dat" nino3 = pd.read_csv(url, sep = "|")

To show the use of the scalar and compare the results with the results of the previous example, let us use the same data - for the quarterly measurements of the sea surface temperature from 1871 to 1997. For this data: t0 = 1871 dt = 0.25

Listing 4

 import pandas as pd import pywt from numpy import* import scaleogram as scg from pylab import* url="sst_nino3.dat" nino3 = pd.read_csv(url, sep = "|") data = nino3.values.squeeze() N = data.size t0 = 1871; dt = 0.25 time = t0 + arange(len(data))*dt wavelet = 'cmor1-0.5' ax = scg.cws(time, data, scales=arange(1, 128), wavelet=wavelet, figsize=(14, 3), cmap="jet", cbar=None, ylabel=' ()', xlabel=" []", yscale="log", title='-  \n( )') ticks = ax.set_yticks([2,4,8, 16,32]) ticks = ax.set_yticklabels([2,4,8, 16,32]) show()

If we compare the scalegram with the received scangram, then, with the exception of the color palette, they are identical, and therefore show the same dynamics of the time series.
Listing 4 is simpler than Listing 3 and is faster.

When the frequency spectrum and the Fourier power spectrum do not provide additional information on the dynamics of the time series, then the use of a scalar is preferable.

findings

An example of a wavelet analysis of a time series using the PyWavelets library is given.

Source: https://habr.com/ru/post/452474/

All Articles