pycrtools.tasks.averagespectrum¶

`AverageSpectrum`([ws, parfile])	The function will calculate an average spectrum from a list of files and a series of antennas (all averaged into one spectrum).
`WorkSpace`(**args)
`averagespectrum_getfile`(current_file_number, ...)	To produce an error message in case the file does not exist
`make_frequencies`(spectrum[, offset, ...])	Calculates the frequencies for the calculated spectrum (outdated)

Average spectrum¶

Module author: Heino Falcke <h.falcke@astro.ru.nl>

class pycrtools.tasks.averagespectrum.AverageSpectrum(ws=None, parfile=None, **kwargs)¶

The function will calculate an average spectrum from a list of files and a series of antennas (all averaged into one spectrum).

The task will do a basic quality check of each time series data set and only average good data.

For very large FFTs, and to save memory, one can use a doublefft which means that one big FFT is split into two steps of smaller FFts. This allows almost arbitrary resolution.

The function will go through all files in the list and then loop, one-by-one, over all antennas in the files. Data will be read in in nchunks chunks (i.e. blocks) of size fullsizes. For a single FFT these chunks will be subdivided into nblock blocks to do quality checks (i.e., to see if certain blocks deviate from others). For a double FFT with stride>1 only a fraction of these blocks are read in at once.

The desired frequency resolution is provided with the parameter delta_nu, but this will be rounded off to the nearest value using channel numbers of a power of 2. For single FFTs you can also set the parameter chunksize.

Note: the spectrum has N/2 channels and not N/2+1 channels as is usually the case. This means that the last channel is missing!!

The resulting spectrum is stored in the array Task.power (only complete for stride=1) and written to disk as an cr.hArray with parameters stored in the header dict (use getHeader('AverageSpectrum') to retrieve this.)

Note for single FFTs one can decide whether to average all antennas into one spectrum (e.g., an incoherent ‘station spectrum’), or whether to keep the antennas separate and get an average spectrum per antenna (addantennas=False).

This spectrum can be read back and a baseline can be fitted with FitBaseline.

Incoherent Beam

With the option calc_incoherent_sum one can add the square of all time series data of all antennas into one long time series data (containing the power) for single FFTs. The resulting array Task.incoherent_sum has the length of the entire data set (of one antenna, of course) and has the dimensions [nchunks,chunksize=blocksize]. It will be saved on disk in sp.par.incoherent_sum.

Dynamic Spectrum

With the option calc_dynspec turned on, the task will calculate a dynamic spectrum for each file within the length of the data set. It will add all antennas of a station and then divide the data sets into sections of length t_int within which all chunks of data are intergated. The section is the one time-step in the dynamic spectrum. The result is provided in Task.dynspec and Task.cleanspec (where the dynamic spectrum is divided by an average spectrum, stored in Task.avspec). Avspec and cleanspec are also par-array of dynspec (i.e., Task.dynspsc.par.cleanspec). The dynamic spectrum will also be stored in the output directory (output_subdir=filename.dir) as a separate file.

If multiple files are provided, also a dynamic spectrum is made out of the average spectra of the files being worked on and an average spectrum of all files. (Note, that the files have to be sorted by timestamp in the Task.filenames, which is produced by Task.filefilter.

Quality Checking

To avoid the spectrum being influenced by spikes in the time series, those spikes can be replaced by random numbers, before the FFT is taken (see randomize_peaks).

The quality information (RMS, MEAN, flagged blocks per antennas) is stored in a data ‘quality database’ in text and python form and is also available as Task.quality. (See also: Task.antennacharacteristics, Task.mean, Task.mean_rms, Task.rms, Task.rms_rms, Task.npeaks, Task.npeaks_rms, Task.homogeneity_factor - the latter is printend and is a useful hint if something is wrong)

One can specify the maximum number of blocks in a chunk that are allowed to fail a quality criterium (maxblocksflagged), before the entrie chunk is discarded. The way quality checking (see module :func:qualitycheck) works, is that each chunk is divided into blocks. If a few of these blocks have a mean or rms that deviates too much from the others, they will be flagged. Also, a noise level will be determined where one expects roughly one peak per block above that noise for Gaussian noise. If the actual number of peaks is spikeexcess above that number, flag the block (and perhaps the entire chunk)

Flagged blocks can be easily inspeced using the task method Task.qplot (qplot for quality plot).

If you see an outputline like this:
# Start antenna = 92 (ID= 17011092) - 4 passes:
184 - Mean=  3.98, RMS=  6.35, Npeaks=  211, Nexpected=256.00 (Npeaks/Nexpected=  0.82), nsigma=  2.80, limits=( -2.80,   2.80)
185 - Mean=  3.97, RMS=  6.39, Npeaks=  200, Nexpected=256.00 (Npeaks/Nexpected=  0.78), nsigma=  2.80, limits=( -2.80,   2.80)
186 - Mean=  3.98, RMS=  6.40, Npeaks=  219, Nexpected=256.00 (Npeaks/Nexpected=  0.86), nsigma=  2.80, limits=( -2.80,   2.80)
- Block   514: mean=  0.25, rel. rms=   2.6, npeaks=   16, spikyness=  15.00, spikeexcess= 16.00   ['rms', 'spikeexcess']
this will tell you that Antenna 17011092 was worked on (the 92nd antenna in the data file) and the 186th chunk (block 514) contained some suspicious data (too many spikes). If you want to inspect this, you can call:
>>> Task.qplot(186)
This will plot the chunk and highlight the first flagged block (#514) in that chunk:
>>> Task.qplot(186,1)
would highlight the second flagged block (which does not exist here).

If the chunks are too long to be entirely plotted, use:
>>> Task.qplot(186,all=False).
Sections, Chunks, Blocks

The terminology is as follows: the smallest unit is a block. If stride>1 only every nth block will be read in to do a double fft.

stride*blocks will be one chunk of data. The size of a chunk determines the resolution of an FFT.

Input parameters

addantennas [default value: True ]: If True, add all antennas into one average spectrum, otherwise keep them separate in memory per file.
all_avspec: Average spectrum over all times and all files, used for plotting ...?
all_cleanspec: Cleaned dynamic spectrum.
all_dynspec: Dynamic spectrum from the average spectra of all files (if no doublefft and calc_dynspec=True).
all_dynspec_file: Complete filename including directory to store the final dynamic spectrum of all files.
antennaIDs: Antenna IDs from the current file.
antennas: Array of index numbers for antennas to be processed from the current file.
antennas_end [default value: -1 ]: Maximum antenna number (plus one, zero based ...) to use in each file.
antennas_start [default value: 0 ]: Start with the n-th antenna in each file (see also nantennas_stride). Can be used for selecting odd/even antennas.
antennas_stride [default value: 1 ]: Take only every n-th antenna from antennas list (see also antennas_start). Use 2 to select odd/even.
avspec: Average spectrum over all times, used for plotting ...?
blocklen: The size of a block being read in if stride>1, otherwise self.blocklen*self.nblocks is the chunksize.
blocklen_section: Blocklength of a section
calc_all_dynspec: Calculate a dynamic spectrum from the average spectra of all files as well (if no doublefft)
calc_averagespectrum [default value: True ]: Calculate the averagespectrum (not yet fully implemented ... to not do it).
calc_dynspec [default value: False ]: Calculate a dynamic spectrum as well (if no doublefft)
calc_incoherent_sum [default value: False ]: Calculate the incoherent sum of all antennas (doublefft=False). See incoherent_sum for result.
cdata: Complex array holding the FFT of the time series data and other things
cdata2: Supplementary complex array holding the FFT of the time series for a single FFT
cdataT: Complex array holding the FFT of the time series data and other things transposed.
chunkduration: Length (duration) of one chunk of data.
chunks_per_sect: Number of chunks per section of data.
chunksize_estimated: The desired chunksize for a single FFT. Can be set directly, otherwise estimated from delta_nu. The actual size will be rounded to give a power of two, hence, see chunksize_used for the actual chunksize used.
chunksize_used: The chunksize one would use for a single FFT.
cleanspec: Cleaned dynamic spectrum.
datafile: Data file object pointing to raw data.
datalength: The (maximum) length of one full data set of one antenna per file.
delta_band: Frequency width of one section/band of spectrum
delta_frequency: Separation of two subsequent channels in final spectrum
delta_nu [default value: 1000 ]: Frequency resolution
doplot [default value: False ]: Produce output plots. If doplot>1 plot while processing.
doublefft [default value: False ]: if True split the FFT into two, thereby saving memory.
dynspec: Dynamic spectrum of all antennas (if no doublefft and calc_dynspec=True).
dynspec_file: Complete filename including directory to store the final dynamic spectrum.
dynspec_nspectra_added: Number of spectra added per time step.
dynspec_time: Time axis for dynamic spectrum.
end_frequency: End frequency of spectrum
end_time: End of time axis for dynamic spectrum.
fftdata: Complex array holding the FFT of the time series for a single FFT (Same memory as cdata)
file_start_number [default value: 0 ]: Integer number pointing to the first file in the filenames list with which to start. Can be changed to restart a calculation.
fileext [default value: ”.pcr” ]: Extension of filename for temporary data files (e.g., used if stride > 1.)
filefilter [default value: “$LOFARSOFT/data/lofar/RS307C-readfullsecondtbb1.h5” ]: Unix style filter (i.e., with *, ~, $VARIABLE, etc.), to describe all the files to be processed.
filenames: List of filenames of data file to read raw data from.
freqs: Frequencies
frequencies: Frequency axis for final power spectrum.
full_blocklen: Full block length (blocklen*stride)
fullsize: The full length of the raw time series data used for one spectral chunk (nblocks*blocklen*stride).
header: Header of datafile
incoherent_sum: Incoherent sum of the power in all antennas (timeseries data).
incoherent_sum_file: Complete filename including directory to store the incoherent sum of all antennas.
infileext [default value: ”.h5” ]: Extension of the input filename (later to be removde from the filename)
load_if_file_exists [default value: False ]: If average spectrum file (spectrumfile) already exists, skip calculation and load existing file.
maxblocksflagged [default value: 0 ]: Maximum number of blocks that are allowed to be flagged before the entire spectrum of the chunk is discarded. If =0 then flag a chunk if any block is deviant.
maxnchunks [default value: -1 ]: Maximum number of chunks of raw data to integrate spectrum over (-1 = all).
maxpeak [default value: 7 ]: Maximum height of the maximum in each block (in sigma,i.e. relative to rms) before quality is failed.
meanfactor [default value: 3 ]: Factor by which the mean is allowed to change within one chunk of time series data before it is flagged.
nantennas: The number of antennas to be calculated in one file.
nantennas_file: The actual number of antennas available for calculation in the file.
nbands: Number of bands in spectrum which are separately written to disk, if stride > 1. This is one half of stride, since only the lower half of the spectrum is written to disk.
nblocks: Number of blocks of lenght blocklen the time series data set is split in. The blocks are also used for quality checking.
nblocks_section: Number of blocks per section
nchunks: Number of spectral chunks that are averaged
nchunks_max: Maximum number of spectral chunks to average
nfiles: Number of data files to process.
nsamples_data: Number of samples in raw antenna file
nsects: Number of sections in the data file
nsubblocks: Number of subblocks
ntimeseries_data_added_per_chunk: Number of chunks added in each bin for incoherent sum
output_dir [default value: “” ]: Directory where output file is to be written to (look for filename.dir there).
output_subdir: Directory where output data files are being written to.
peak_rmsfactor [default value: 5 ]: At how many sigmas above the mean will a peak be randomized.
plot_antenna [default value: 0 ]: Which antenna to plot?
plot_filename [default value: “” ]: Base filename to store plot in.
plot_finish: Function to be called after each plot to determine whether to pause or not (see ::func::plotfinish)
plot_name [default value: “” ]: Extra name to be added to plot filename.
plot_zoom_slice: slice(n1,n2) - If a slice is provided use this to plot an additional graph zoomed into the region between these channels along the x-axis
plot_zoom_x: (nu1,nu2) - If a tuple with two numbers is provided use this to plot an additional graph zoomed into the region between these frequencies on the x-axis
plot_zoom_x1 [default value: 0 ]: Use this to plot an additional graph zoomed into the region between plot_zoom_x1 and plot_zoom_x2
plot_zoom_x2 [default value: 0 ]: Use this to plot an additional graph zoomed into the region between plot_zoom_x1 and plot_zoom_x2
plotlen [default value: 4096 ]: How many channels +/- the center value to plot during the calculation (to allow progress checking).
plotskip [default value: 1 ]: Plot only every ‘plotskip’-th spectrum, skip the rest (should not be smaller than 1).
power: Resulting average power spectrum (or part thereof if stride > 1)
power2: Copy of power with 2 dimensions also when antennas are being added.
power_size: Size of the output spectrum in the vector power
quality_db_filename [default value: “qualitydatabase” ]: Root filename of log file containing the derived antenna quality values (uses ‘.py’ and ‘.txt’ extension).
qualitycheck [default value: True ]: Perform basic qualitychecking of raw data and flagging of bad data sets.
randomize_peaks [default value: True ]: Replace all peaks in time series data which are ‘rmsfactor’ above or below the mean with some random number in the same range.
rmsfactor [default value: 3 ]: Factor by which the RMS is allowed to change within one chunk of time series data before it is flagged.
rmsrange: Tuple with absolute min/max values of the rms of the time series that are allowed before a block is flagged. Will be igrored if None.
rmsrange_max [default value: 0 ]: Max value of the rms of the time series that are allowed before a block is flagged. Will be igrored if None.
rmsrange_min [default value: 0 ]: Min value of the rms of the time series that are allowed before a block is flagged. Will be igrored if None.
root_filename: Root filename for output without directory and extensions.
samplerate: Sample length in raw data set.
sectduration: Length in time units of one section of data used to extract one chunk, i.e. on time step in dynamic spectrum.
sectlen: Length of one section of data used to extract one chunk, i.e. on time step in dynamic spectrum.
size_data: Number of samples in raw antenna file
spec: Wrapper array for cdata
specT: Wrapper array for cdataT
specT2: Wrapper array for cdataT
speclen: The length of the spectrum for a single FFT
spectrum_file: Complete filename including directory to store the final spectrum.
spikeexcess [default value: 20 ]: Set maximum allowed ratio of detected over expected peaks per block to this level (1 is roughly what one expects from Gaussian noise).
start_frequency: Start frequency of spectrum
start_time [default value: 0 ]: Start of time axis for dynamic spectrum.
store_spectra [default value: True ]: Write calculated spectra to disk.
stride: If stride>1 divide the FFT processing in n=stride blocks.
stride_n [default value: 0 ]: if >0 then divide the FFT processing in n=2**stride_n blocks. This is slower but uses less memory.
subspeclen: Size of one section/band of the final spectrum
t_int [default value: 0.001 ]: Desired intergration time per spectrum for dynamic spectrum - will be rounded off to get integer number of spectral chunks
t_int_used: Actual time resolution of dynamic spectrum
tdata: Array to read time series data in
tmpfilename [default value: “tmp” ]: Root filename for temporary data files.
tmpspec: Wrapper array for cdata
tmpspecT: Wrapper array for cdataT

Output parameters

antenna_list [default value: {} ]: Dictionary of antenna indices used as input from each filename.
antennacharacteristics [default value: {} ]: A dict with antenna IDs as key, containing quality information about every antenna.
antennas_used: A set of antenna names that were actually included in the average spectrum, excluding the flagged ones.
current_file_number [default value: 0 ]: Integer number pointing to the current file in the filenames list.
filename: Index number of the current file in the list of filenames of raw data files.
homogeneity_factor [default value: 0 ]: =1-(rms_rms/rms+ npeaks_rms/npeaks)/2 - this describes the homogeneity of the data processed. A homogeneity_factor = 1 means that all antenna data were identical, a low factor should make one wonder if something went wrong.
mean [default value: 0 ]: Mean of mean time series values of all antennas.
mean_antenna: Mean value of time series per antenna.
mean_rms [default value: 0 ]: RMS of mean of mean time series values of all antennas.
nantennas_total [default value: 0 ]: Total number of antennas that were processed (flagged or not) in this run.
npeaks [default value: 0 ]: Mean of number of peaks all antennas.
npeaks_antenna: Number of peaks of time series per antenna.
npeaks_rms [default value: 0 ]: RMS of npeaks over all antennas.
nspectraadded: Number of spectra added at the end.
nspectraadded_per_antenna: Number of spectra added per antenna.
nspectraflagged: Number of spectra flagged and not used.
nspectraflagged_per_antenna: Number of spectra flagged and not used per antenna.
output_files: Dict containing the outpufiles created in the task. Keywords are ‘dynamic_spectrum’, ‘incohrent_sum’, ‘average_spectrum’, which in turn return listsof filenames
quality [default value: [] ]: A list containing quality check information about every large chunk of data that was read in. Use Task.qplot(Entry#,flaggedblock=nn) to plot blocks in question.
rms [default value: 0 ]: Mean of RMS time series values of all antennas.
rms_antenna: RMS value of time series per antenna.
rms_rms [default value: 0 ]: RMS of rms of mean time series values of all antennas.
starttime_UTC: Start time of the observation in UTC
starttime_local: Start time of the observation in local time
starttime_unix: Unix time stamp of start time.
starttimes_UTC [default value: [] ]: Start times of the observations in UTC per file
starttimes_local [default value: [] ]: Start times of the observations in local time per file
starttimes_unix [default value: [] ]: Start times of the observations in unix time time per file

class WorkSpace(**args)¶

AverageSpectrum.dynplot(dynspec=None, plot_cleanspec=None, dmin=None, dmax=None, cmin=None, cmax=None, ylabel=None, xlabel=None, extent=None, zoom_slice=None, zoom_x=None)¶

Plot the dynamic spectrum. Provide the dynamic spectrum computed by the Task AverageSpectrum as input.

plot_cleanspec	None	If False, don’t plot the cleaned spectrum (provided as dynspec.par.cleanspec).
dmin		Minimum z-value (intensity) in dynamic spectrum to plot.
dmax		Maximum z-value (intensity) in dynamic spectrum to plot.
cmin		Minimum z-value (intensity) in clean dynamic spectrum to plot.
cmax		Maximum z-value (intensity) in clean dynamic spectrum to plot.
y/xlabel		Label of y and x-axis.
extent		(x0,x1,y0,y1) = define extent/range of x and y axis (just for labelling)
zoom_slice		slice(n0,n1) - Zoom into this slice of the data
zoom_x		(nu0,nu1) - zoom_slice slice corresponds to these frequencies

Example:

>>> tload "AverageSpectrum"; go
>>> dsp=hArrayRead("Saturn.h5.dynspec.pcr")
>>> Task.dynplot(dsp,cmin=2.2*10**-5,cmax=0.002,dmin=6.8,dmax=10)

AverageSpectrum.init()¶: Initialize the task

AverageSpectrum.qplot(entry=0, flaggedblock=0, block=-1, all=True)¶

If you see an output line like this:

# Start antenna = 0 (ID= 2000000) - 375 passes:
# Flagged: #0 chunk= 0 , Block    14: mean= 121.68, rel. rms= 970.1, npeaks=    0, spikyness=  -1.00, spikeexcess=  0.00   ['rms', 'mean']
# Flagged: #0 chunk= 0 , Block    15: mean= 139.70, rel. rms=1049.7, npeaks=    0, spikyness=  -1.00, spikeexcess=  0.00   ['rms', 'mean']
# Flagged: #2 chunk= 2 , Block    15: mean= 139.70, rel. rms=1049.7, npeaks=    0, spikyness=  -1.00, spikeexcess=  0.00   ['rms', 'mean']
# Data flagged!
# End   antenna = 0  Time = 14.825337 s  nspectraadded = 374 nspectraflagged = 1

this will tell you that Antenna 2000000 was worked on (the 1st antenna in the data file) and the chunk 0 (blocks 14 and 15 within that chunk) and chunk 2 contained some suspicious data (with and rms and mean which fluctuated too much). If you want to inspect this, you can call:

>>> Task.qplot(0)

This will plot the chunk and highlight the first flagged block (#14) in that chunk:

>>> Task.qplot(0,1)

would highlight the second flagged block (#15)

If the chunks are too long to be entirely plotted, use:

>>> Task.qplot(186,all=False).

AverageSpectrum.run()¶: Run the program.

class pycrtools.tasks.averagespectrum.WorkSpace(**args)¶

pycrtools.tasks.averagespectrum.averagespectrum_getfile(current_file_number, filenames, nfiles, filefilter)¶: To produce an error message in case the file does not exist

pycrtools.tasks.averagespectrum.make_frequencies(spectrum, offset=-1, frequencies=None, setxvalues=True)¶: Calculates the frequencies for the calculated spectrum (outdated)

Table Of Contents

Previous topic

Next topic

This Page

pycrtools.tasks.averagespectrum¶

Average spectrum¶

Navigation

Table Of Contents

Previous topic

Next topic

This Page

Quick search

pycrtools.tasks.averagespectrum¶

Average spectrum¶

Navigation