Table Of Contents

Previous topic

pycrtools.tasks.antennaresponse

Next topic

pycrtools.tasks.beamformer2

This Page

pycrtools.tasks.averagespectrum

AverageSpectrum([ws, parfile]) The function will calculate an average spectrum from a list of files and a series of antennas (all averaged into one spectrum).
WorkSpace(**args)
averagespectrum_getfile(current_file_number, ...) To produce an error message in case the file does not exist
make_frequencies(spectrum[, offset, ...]) Calculates the frequencies for the calculated spectrum (outdated)

Average spectrum

Module author: Heino Falcke <h.falcke@astro.ru.nl>

class pycrtools.tasks.averagespectrum.AverageSpectrum(ws=None, parfile=None, **kwargs)

The function will calculate an average spectrum from a list of files and a series of antennas (all averaged into one spectrum).

The task will do a basic quality check of each time series data set and only average good data.

For very large FFTs, and to save memory, one can use a doublefft which means that one big FFT is split into two steps of smaller FFts. This allows almost arbitrary resolution.

The function will go through all files in the list and then loop, one-by-one, over all antennas in the files. Data will be read in in nchunks chunks (i.e. blocks) of size fullsizes. For a single FFT these chunks will be subdivided into nblock blocks to do quality checks (i.e., to see if certain blocks deviate from others). For a double FFT with stride>1 only a fraction of these blocks are read in at once.

The desired frequency resolution is provided with the parameter delta_nu, but this will be rounded off to the nearest value using channel numbers of a power of 2. For single FFTs you can also set the parameter chunksize.

Note: the spectrum has N/2 channels and not N/2+1 channels as is usually the case. This means that the last channel is missing!!

The resulting spectrum is stored in the array Task.power (only complete for stride=1) and written to disk as an cr.hArray with parameters stored in the header dict (use getHeader('AverageSpectrum') to retrieve this.)

Note for single FFTs one can decide whether to average all antennas into one spectrum (e.g., an incoherent ‘station spectrum’), or whether to keep the antennas separate and get an average spectrum per antenna (addantennas=False).

This spectrum can be read back and a baseline can be fitted with FitBaseline.

Incoherent Beam

With the option calc_incoherent_sum one can add the square of all time series data of all antennas into one long time series data (containing the power) for single FFTs. The resulting array Task.incoherent_sum has the length of the entire data set (of one antenna, of course) and has the dimensions [nchunks,chunksize=blocksize]. It will be saved on disk in sp.par.incoherent_sum.

Dynamic Spectrum

With the option calc_dynspec turned on, the task will calculate a dynamic spectrum for each file within the length of the data set. It will add all antennas of a station and then divide the data sets into sections of length t_int within which all chunks of data are intergated. The section is the one time-step in the dynamic spectrum. The result is provided in Task.dynspec and Task.cleanspec (where the dynamic spectrum is divided by an average spectrum, stored in Task.avspec). Avspec and cleanspec are also par-array of dynspec (i.e., Task.dynspsc.par.cleanspec). The dynamic spectrum will also be stored in the output directory (output_subdir=filename.dir) as a separate file.

If multiple files are provided, also a dynamic spectrum is made out of the average spectra of the files being worked on and an average spectrum of all files. (Note, that the files have to be sorted by timestamp in the Task.filenames, which is produced by Task.filefilter.

Quality Checking

To avoid the spectrum being influenced by spikes in the time series, those spikes can be replaced by random numbers, before the FFT is taken (see randomize_peaks).

The quality information (RMS, MEAN, flagged blocks per antennas) is stored in a data ‘quality database’ in text and python form and is also available as Task.quality. (See also: Task.antennacharacteristics, Task.mean, Task.mean_rms, Task.rms, Task.rms_rms, Task.npeaks, Task.npeaks_rms, Task.homogeneity_factor - the latter is printend and is a useful hint if something is wrong)

One can specify the maximum number of blocks in a chunk that are allowed to fail a quality criterium (maxblocksflagged), before the entrie chunk is discarded. The way quality checking (see module :func:qualitycheck) works, is that each chunk is divided into blocks. If a few of these blocks have a mean or rms that deviates too much from the others, they will be flagged. Also, a noise level will be determined where one expects roughly one peak per block above that noise for Gaussian noise. If the actual number of peaks is spikeexcess above that number, flag the block (and perhaps the entire chunk)

Flagged blocks can be easily inspeced using the task method Task.qplot (qplot for quality plot).

If you see an outputline like this:

# Start antenna = 92 (ID= 17011092) - 4 passes:
184 - Mean=  3.98, RMS=  6.35, Npeaks=  211, Nexpected=256.00 (Npeaks/Nexpected=  0.82), nsigma=  2.80, limits=( -2.80,   2.80)
185 - Mean=  3.97, RMS=  6.39, Npeaks=  200, Nexpected=256.00 (Npeaks/Nexpected=  0.78), nsigma=  2.80, limits=( -2.80,   2.80)
186 - Mean=  3.98, RMS=  6.40, Npeaks=  219, Nexpected=256.00 (Npeaks/Nexpected=  0.86), nsigma=  2.80, limits=( -2.80,   2.80)
- Block   514: mean=  0.25, rel. rms=   2.6, npeaks=   16, spikyness=  15.00, spikeexcess= 16.00   ['rms', 'spikeexcess']

this will tell you that Antenna 17011092 was worked on (the 92nd antenna in the data file) and the 186th chunk (block 514) contained some suspicious data (too many spikes). If you want to inspect this, you can call:

>>> Task.qplot(186)

This will plot the chunk and highlight the first flagged block (#514) in that chunk:

>>> Task.qplot(186,1)

would highlight the second flagged block (which does not exist here).

If the chunks are too long to be entirely plotted, use:

>>> Task.qplot(186,all=False).

Sections, Chunks, Blocks

The terminology is as follows: the smallest unit is a block. If stride>1 only every nth block will be read in to do a double fft.

stride*blocks will be one chunk of data. The size of a chunk determines the resolution of an FFT.

Input parameters

addantennas [default value: True ]
If True, add all antennas into one average spectrum, otherwise keep them separate in memory per file.
all_avspec
Average spectrum over all times and all files, used for plotting ...?
all_cleanspec
Cleaned dynamic spectrum.
all_dynspec
Dynamic spectrum from the average spectra of all files (if no doublefft and calc_dynspec=True).
all_dynspec_file
Complete filename including directory to store the final dynamic spectrum of all files.
antennaIDs
Antenna IDs from the current file.
antennas
Array of index numbers for antennas to be processed from the current file.
antennas_end [default value: -1 ]
Maximum antenna number (plus one, zero based ...) to use in each file.
antennas_start [default value: 0 ]
Start with the n-th antenna in each file (see also nantennas_stride). Can be used for selecting odd/even antennas.
antennas_stride [default value: 1 ]
Take only every n-th antenna from antennas list (see also antennas_start). Use 2 to select odd/even.
avspec
Average spectrum over all times, used for plotting ...?
blocklen
The size of a block being read in if stride>1, otherwise self.blocklen*self.nblocks is the chunksize.
blocklen_section
Blocklength of a section
calc_all_dynspec
Calculate a dynamic spectrum from the average spectra of all files as well (if no doublefft)
calc_averagespectrum [default value: True ]
Calculate the averagespectrum (not yet fully implemented ... to not do it).
calc_dynspec [default value: False ]
Calculate a dynamic spectrum as well (if no doublefft)
calc_incoherent_sum [default value: False ]
Calculate the incoherent sum of all antennas (doublefft=False). See incoherent_sum for result.
cdata
Complex array holding the FFT of the time series data and other things
cdata2
Supplementary complex array holding the FFT of the time series for a single FFT
cdataT
Complex array holding the FFT of the time series data and other things transposed.
chunkduration
Length (duration) of one chunk of data.
chunks_per_sect
Number of chunks per section of data.
chunksize_estimated
The desired chunksize for a single FFT. Can be set directly, otherwise estimated from delta_nu. The actual size will be rounded to give a power of two, hence, see chunksize_used for the actual chunksize used.
chunksize_used
The chunksize one would use for a single FFT.
cleanspec
Cleaned dynamic spectrum.
datafile
Data file object pointing to raw data.
datalength
The (maximum) length of one full data set of one antenna per file.
delta_band
Frequency width of one section/band of spectrum
delta_frequency
Separation of two subsequent channels in final spectrum
delta_nu [default value: 1000 ]
Frequency resolution
doplot [default value: False ]
Produce output plots. If doplot>1 plot while processing.
doublefft [default value: False ]
if True split the FFT into two, thereby saving memory.
dynspec
Dynamic spectrum of all antennas (if no doublefft and calc_dynspec=True).
dynspec_file
Complete filename including directory to store the final dynamic spectrum.
dynspec_nspectra_added
Number of spectra added per time step.
dynspec_time
Time axis for dynamic spectrum.
end_frequency
End frequency of spectrum
end_time
End of time axis for dynamic spectrum.
fftdata
Complex array holding the FFT of the time series for a single FFT (Same memory as cdata)
file_start_number [default value: 0 ]
Integer number pointing to the first file in the filenames list with which to start. Can be changed to restart a calculation.
fileext [default value: ”.pcr” ]
Extension of filename for temporary data files (e.g., used if stride > 1.)
filefilter [default value: “$LOFARSOFT/data/lofar/RS307C-readfullsecondtbb1.h5” ]
Unix style filter (i.e., with *, ~, $VARIABLE, etc.), to describe all the files to be processed.
filenames
List of filenames of data file to read raw data from.
freqs
Frequencies
frequencies
Frequency axis for final power spectrum.
full_blocklen
Full block length (blocklen*stride)
fullsize
The full length of the raw time series data used for one spectral chunk (nblocks*blocklen*stride).
header
Header of datafile
incoherent_sum
Incoherent sum of the power in all antennas (timeseries data).
incoherent_sum_file
Complete filename including directory to store the incoherent sum of all antennas.
infileext [default value: ”.h5” ]
Extension of the input filename (later to be removde from the filename)
load_if_file_exists [default value: False ]
If average spectrum file (spectrumfile) already exists, skip calculation and load existing file.
maxblocksflagged [default value: 0 ]
Maximum number of blocks that are allowed to be flagged before the entire spectrum of the chunk is discarded. If =0 then flag a chunk if any block is deviant.
maxnchunks [default value: -1 ]
Maximum number of chunks of raw data to integrate spectrum over (-1 = all).
maxpeak [default value: 7 ]
Maximum height of the maximum in each block (in sigma,i.e. relative to rms) before quality is failed.
meanfactor [default value: 3 ]
Factor by which the mean is allowed to change within one chunk of time series data before it is flagged.
nantennas
The number of antennas to be calculated in one file.
nantennas_file
The actual number of antennas available for calculation in the file.
nbands
Number of bands in spectrum which are separately written to disk, if stride > 1. This is one half of stride, since only the lower half of the spectrum is written to disk.
nblocks
Number of blocks of lenght blocklen the time series data set is split in. The blocks are also used for quality checking.
nblocks_section
Number of blocks per section
nchunks
Number of spectral chunks that are averaged
nchunks_max
Maximum number of spectral chunks to average
nfiles
Number of data files to process.
nsamples_data
Number of samples in raw antenna file
nsects
Number of sections in the data file
nsubblocks
Number of subblocks
ntimeseries_data_added_per_chunk
Number of chunks added in each bin for incoherent sum
output_dir [default value: “” ]
Directory where output file is to be written to (look for filename.dir there).
output_subdir
Directory where output data files are being written to.
peak_rmsfactor [default value: 5 ]
At how many sigmas above the mean will a peak be randomized.
plot_antenna [default value: 0 ]
Which antenna to plot?
plot_filename [default value: “” ]
Base filename to store plot in.
plot_finish
Function to be called after each plot to determine whether to pause or not (see ::func::plotfinish)
plot_name [default value: “” ]
Extra name to be added to plot filename.
plot_zoom_slice
slice(n1,n2) - If a slice is provided use this to plot an additional graph zoomed into the region between these channels along the x-axis
plot_zoom_x
(nu1,nu2) - If a tuple with two numbers is provided use this to plot an additional graph zoomed into the region between these frequencies on the x-axis
plot_zoom_x1 [default value: 0 ]
Use this to plot an additional graph zoomed into the region between plot_zoom_x1 and plot_zoom_x2
plot_zoom_x2 [default value: 0 ]
Use this to plot an additional graph zoomed into the region between plot_zoom_x1 and plot_zoom_x2
plotlen [default value: 4096 ]
How many channels +/- the center value to plot during the calculation (to allow progress checking).
plotskip [default value: 1 ]
Plot only every ‘plotskip’-th spectrum, skip the rest (should not be smaller than 1).
power
Resulting average power spectrum (or part thereof if stride > 1)
power2
Copy of power with 2 dimensions also when antennas are being added.
power_size
Size of the output spectrum in the vector power
quality_db_filename [default value: “qualitydatabase” ]
Root filename of log file containing the derived antenna quality values (uses ‘.py’ and ‘.txt’ extension).
qualitycheck [default value: True ]
Perform basic qualitychecking of raw data and flagging of bad data sets.
randomize_peaks [default value: True ]
Replace all peaks in time series data which are ‘rmsfactor’ above or below the mean with some random number in the same range.
rmsfactor [default value: 3 ]
Factor by which the RMS is allowed to change within one chunk of time series data before it is flagged.
rmsrange
Tuple with absolute min/max values of the rms of the time series that are allowed before a block is flagged. Will be igrored if None.
rmsrange_max [default value: 0 ]
Max value of the rms of the time series that are allowed before a block is flagged. Will be igrored if None.
rmsrange_min [default value: 0 ]
Min value of the rms of the time series that are allowed before a block is flagged. Will be igrored if None.
root_filename
Root filename for output without directory and extensions.
samplerate
Sample length in raw data set.
sectduration
Length in time units of one section of data used to extract one chunk, i.e. on time step in dynamic spectrum.
sectlen
Length of one section of data used to extract one chunk, i.e. on time step in dynamic spectrum.
size_data
Number of samples in raw antenna file
spec
Wrapper array for cdata
specT
Wrapper array for cdataT
specT2
Wrapper array for cdataT
speclen
The length of the spectrum for a single FFT
spectrum_file
Complete filename including directory to store the final spectrum.
spikeexcess [default value: 20 ]
Set maximum allowed ratio of detected over expected peaks per block to this level (1 is roughly what one expects from Gaussian noise).
start_frequency
Start frequency of spectrum
start_time [default value: 0 ]
Start of time axis for dynamic spectrum.
store_spectra [default value: True ]
Write calculated spectra to disk.
stride
If stride>1 divide the FFT processing in n=stride blocks.
stride_n [default value: 0 ]
if >0 then divide the FFT processing in n=2**stride_n blocks. This is slower but uses less memory.
subspeclen
Size of one section/band of the final spectrum
t_int [default value: 0.001 ]
Desired intergration time per spectrum for dynamic spectrum - will be rounded off to get integer number of spectral chunks
t_int_used
Actual time resolution of dynamic spectrum
tdata
Array to read time series data in
tmpfilename [default value: “tmp” ]
Root filename for temporary data files.
tmpspec
Wrapper array for cdata
tmpspecT
Wrapper array for cdataT

Output parameters

antenna_list [default value: {} ]
Dictionary of antenna indices used as input from each filename.
antennacharacteristics [default value: {} ]
A dict with antenna IDs as key, containing quality information about every antenna.
antennas_used
A set of antenna names that were actually included in the average spectrum, excluding the flagged ones.
current_file_number [default value: 0 ]
Integer number pointing to the current file in the filenames list.
filename
Index number of the current file in the list of filenames of raw data files.
homogeneity_factor [default value: 0 ]
=1-(rms_rms/rms+ npeaks_rms/npeaks)/2 - this describes the homogeneity of the data processed. A homogeneity_factor = 1 means that all antenna data were identical, a low factor should make one wonder if something went wrong.
mean [default value: 0 ]
Mean of mean time series values of all antennas.
mean_antenna
Mean value of time series per antenna.
mean_rms [default value: 0 ]
RMS of mean of mean time series values of all antennas.
nantennas_total [default value: 0 ]
Total number of antennas that were processed (flagged or not) in this run.
npeaks [default value: 0 ]
Mean of number of peaks all antennas.
npeaks_antenna
Number of peaks of time series per antenna.
npeaks_rms [default value: 0 ]
RMS of npeaks over all antennas.
nspectraadded
Number of spectra added at the end.
nspectraadded_per_antenna
Number of spectra added per antenna.
nspectraflagged
Number of spectra flagged and not used.
nspectraflagged_per_antenna
Number of spectra flagged and not used per antenna.
output_files
Dict containing the outpufiles created in the task. Keywords are ‘dynamic_spectrum’, ‘incohrent_sum’, ‘average_spectrum’, which in turn return listsof filenames
quality [default value: [] ]
A list containing quality check information about every large chunk of data that was read in. Use Task.qplot(Entry#,flaggedblock=nn) to plot blocks in question.
rms [default value: 0 ]
Mean of RMS time series values of all antennas.
rms_antenna
RMS value of time series per antenna.
rms_rms [default value: 0 ]
RMS of rms of mean time series values of all antennas.
starttime_UTC
Start time of the observation in UTC
starttime_local
Start time of the observation in local time
starttime_unix
Unix time stamp of start time.
starttimes_UTC [default value: [] ]
Start times of the observations in UTC per file
starttimes_local [default value: [] ]
Start times of the observations in local time per file
starttimes_unix [default value: [] ]
Start times of the observations in unix time time per file
class WorkSpace(**args)
AverageSpectrum.dynplot(dynspec=None, plot_cleanspec=None, dmin=None, dmax=None, cmin=None, cmax=None, ylabel=None, xlabel=None, extent=None, zoom_slice=None, zoom_x=None)

Plot the dynamic spectrum. Provide the dynamic spectrum computed by the Task AverageSpectrum as input.

plot_cleanspec None If False, don’t plot the cleaned spectrum (provided as dynspec.par.cleanspec).
dmin   Minimum z-value (intensity) in dynamic spectrum to plot.
dmax   Maximum z-value (intensity) in dynamic spectrum to plot.
cmin   Minimum z-value (intensity) in clean dynamic spectrum to plot.
cmax   Maximum z-value (intensity) in clean dynamic spectrum to plot.
y/xlabel   Label of y and x-axis.
extent   (x0,x1,y0,y1) = define extent/range of x and y axis (just for labelling)
zoom_slice   slice(n0,n1) - Zoom into this slice of the data
zoom_x   (nu0,nu1) - zoom_slice slice corresponds to these frequencies

Example:

>>> tload "AverageSpectrum"; go
>>> dsp=hArrayRead("Saturn.h5.dynspec.pcr")
>>> Task.dynplot(dsp,cmin=2.2*10**-5,cmax=0.002,dmin=6.8,dmax=10)
AverageSpectrum.init()

Initialize the task

AverageSpectrum.qplot(entry=0, flaggedblock=0, block=-1, all=True)

If you see an output line like this:

# Start antenna = 0 (ID= 2000000) - 375 passes:
# Flagged: #0 chunk= 0 , Block    14: mean= 121.68, rel. rms= 970.1, npeaks=    0, spikyness=  -1.00, spikeexcess=  0.00   ['rms', 'mean']
# Flagged: #0 chunk= 0 , Block    15: mean= 139.70, rel. rms=1049.7, npeaks=    0, spikyness=  -1.00, spikeexcess=  0.00   ['rms', 'mean']
# Flagged: #2 chunk= 2 , Block    15: mean= 139.70, rel. rms=1049.7, npeaks=    0, spikyness=  -1.00, spikeexcess=  0.00   ['rms', 'mean']
# Data flagged!
# End   antenna = 0  Time = 14.825337 s  nspectraadded = 374 nspectraflagged = 1

this will tell you that Antenna 2000000 was worked on (the 1st antenna in the data file) and the chunk 0 (blocks 14 and 15 within that chunk) and chunk 2 contained some suspicious data (with and rms and mean which fluctuated too much). If you want to inspect this, you can call:

>>> Task.qplot(0)

This will plot the chunk and highlight the first flagged block (#14) in that chunk:

>>> Task.qplot(0,1)

would highlight the second flagged block (#15)

If the chunks are too long to be entirely plotted, use:

>>> Task.qplot(186,all=False).
AverageSpectrum.run()

Run the program.

class pycrtools.tasks.averagespectrum.WorkSpace(**args)
pycrtools.tasks.averagespectrum.averagespectrum_getfile(current_file_number, filenames, nfiles, filefilter)

To produce an error message in case the file does not exist

pycrtools.tasks.averagespectrum.make_frequencies(spectrum, offset=-1, frequencies=None, setxvalues=True)

Calculates the frequencies for the calculated spectrum (outdated)