Arrays

While the basic underlying data structures are plain STL vectors, in many cases one has to deal with multi-dimensional data. For this case we introduce a new wrapper class, named hArrays, that mimics a multi-dimensional array, but still operates on an underlying vector with an essentially flat, horizontal data structure. Given that a major concern is to minimize duplication of large data structures, the array class shares memory with its associated vector and also with arrays that are derived from it. To avoid this, the data has to be copied explicitly. Access to the various dimensions (rows, columns, etc.) is done via slices, which need to be contiguous in memory. Since the array is vector-based, all methods defined for vectors are also inherited by hArrays and can be applied to slices or even automatically loop over multiple slices (e.g., rows or columns).

Creating Arrays and basic operations

An array is defined using the hArray function. This is a constructor function and not a class of its own. It will return array classes of different types, such as IntArray, FloatArray, ComplexArray, StringArray, BoolArray, referring to the different data types they contain. As for vectors, each array can only contain one type of data, e.g.:

>>> hArray(Type=float, dimensions=[n1, n2, n3...], fill=None) -> FloatArray

where Type can be a Python type, a Python list/tuple (where the first element determines the type), an STL vector, or another hArray.

Dimensions are given as a list of the form [dim1, dim2, dim3, ...]. The underlying vector will automatically be resized to dim1 * dim2 * dim3 * ... to be able to contain all elements. Alternatively, one can provide another array, whose dimensions will be copied.
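The relation between the dimensions and the flat vector can be sketched in plain Python. This is only an illustration, assuming row-major ordering (which the examples on this page suggest); flat_size and flat_index are hypothetical helper names and not part of hArray.

```python
from functools import reduce
import operator


def flat_size(dims):
    """Number of elements needed for an array with the given dimensions."""
    return reduce(operator.mul, dims, 1)


def flat_index(dims, index):
    """Position of a full multi-index in the flat vector (row-major assumed)."""
    pos = 0
    for dim, i in zip(dims, index):
        pos = pos * dim + i
    return pos


# Hypothetical helpers, not hArray API: a 3 x 3 array needs 9 elements,
# and element (1, 2) sits at flat position 5.
size = flat_size([3, 3])
pos = flat_index([3, 3], (1, 2))
```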

The array can be filled with initialization values that can be either a single value, a list, a tuple, or an STL vector of the same type:

>>> v = Vector(range(9))

>>> a = hArray(v, [3, 3])
>>> a
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

One may wonder what the representation of the Array actually means.

The first argument in the representation is the data type, followed by the array dimensions. The fill argument displays the contents of the currently selected slice (while the array may actually hold more data). The trailing comment gives the total size of the underlying vector (len) and the currently active slice (given as start and end index into the vector). An optional asterisk after the slice indicates that the next operation will actually loop over the previously specified slices (see below).

The underlying vector of an array can be retrieved with the vec() method:

>>> a.vec()
Vector(int, 9, fill=[0,1,2,3,4,5,6,7,8])

The arrays have most of the vector methods defined, so you can also add, multiply, etc. with scalars or other arrays:

>>> a * 2
hArray(int, [3, 3], fill=[0,2,4,6,8,10,12,14,16]) # len=9 slice=[0:9])

>>> a * a
hArray(int, [3, 3], fill=[0,1,4,9,16,25,36,49,64]) # len=9 slice=[0:9])

Underlying these operations are the basic hftools functions, e.g. the multiplication is essentially a Python method that first creates a new array and then calls hMul:

>>> tmp_array = a.new()
>>> tmp_array.mul(a, 2)
>>> tmp_array
hArray(int, [3, 3], fill=[0,2,4,6,8,10,12,14,16]) # len=9 slice=[0:9])
>>> a
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

This could also be done by calling the function hMul(tmp_array,a,2), rather than the corresponding method.

An important constraint is that all these functions or methods only work with either vector or array classes; mixing vectors and arrays in the parameters is currently not supported.

Changing dimensions

The dimensions can be obtained and set using the shape() and reshape() methods. The length of the underlying vector must stay the same:

>>> a.reshape([1, 9])
>>> a.shape()
(1,9)

>>> a.reshape([3, 3])
>>> a.shape()
(3,3)
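The constraint that the vector length must stay the same can be written down as a small check. The following is a sketch only; check_reshape is a hypothetical name, and raising a ValueError on a mismatch is an assumption of this sketch, since this page does not state how reshape() itself reacts.

```python
def check_reshape(length, new_dims):
    """Return new_dims as a tuple if they describe exactly `length` elements."""
    total = 1
    for dim in new_dims:
        total *= dim
    if total != length:
        # Assumption of this sketch: a mismatch is reported as ValueError.
        raise ValueError(
            "cannot reshape %d elements into %r" % (length, list(new_dims)))
    return tuple(new_dims)


ok = check_reshape(9, [1, 9])      # 1 * 9 == 9, so this is allowed

try:
    check_reshape(9, [2, 4])       # 2 * 4 == 8, which does not match 9
    rejected = False
except ValueError:
    rejected = True
```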

Memory sharing

Note that the array and the vector share the same memory. Changing an element in the vector:

>>> v[0] = -1
>>> v
Vector(int, 9, fill=[-1,1,2,3,4,5,6,7,8])

>>> a
hArray(int, [3, 3], fill=[-1,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

will also change the corresponding element in the array. The same is true if one creates an array from an array. Both will share the same underlying data vector. They will also share the same size and dimensions:

>>> b = hArray(a)
>>> b[0, 0] = -2

>>> b
hArray(int, [3, 3], fill=[-2,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])
>>> a
hArray(int, [3, 3], fill=[-2,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])
>>> v
Vector(int, 9, fill=[-2,1,2,3,4,5,6,7,8])

>>> v[0] = 0
>>> a
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

To actually make a physically distinct copy, you need to explicitly copy the data over:

>>> c = hArray(int, a)
>>> c.copy(a)
>>> c[0, 0] = -1
>>> c
hArray(int, [3, 3], fill=[-1,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])
>>> a
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

or more simply:

>>> c = hArray(int, a, a)

or, to explicitly set the shape of the array:

>>> c = hArray(int, a.shape(), a)

(the 2nd parameter is for the dimensions, the third one is the fill parameter that initiates the copying).
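The difference between sharing and copying can be mimicked with plain Python lists. FlatView below is a hypothetical stand-in written for this illustration; it is not the hArray class and only models the reference semantics described here.

```python
class FlatView(object):
    """Hypothetical 2D wrapper around a flat list (not the hArray class)."""

    def __init__(self, data, dims):
        self.data = data            # keeps a reference, does not copy
        self.dims = list(dims)

    def get(self, row, col):
        return self.data[row * self.dims[1] + col]

    def set(self, row, col, value):
        self.data[row * self.dims[1] + col] = value


v = list(range(9))
a = FlatView(v, [3, 3])               # shares memory with v
b = FlatView(a.data, a.dims)          # shares memory with a (and v)
c = FlatView(list(a.data), a.dims)    # explicit copy of the data

b.set(0, 0, -2)
shared = (v[0], a.get(0, 0))          # both see the change made through b
independent = c.get(0, 0)             # the copy is unaffected
```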

Note

c = a.copy() is not a valid syntax.

Basic slicing

The main purpose of these arrays is, of course, to be able to access multiple dimensions. This is done using the usual __getitem__() method of Python.

Let us take our two-dimensional array from before:

>>> a
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

The array followed by a single number in square brackets will in principle obtain the corresponding row of the array, here the first one:

>>> a[0]
hArray(int, [3, 3], fill=[0,1,2]) # len=9 slice=[0:3])

We say "in principle" because all this command does is return a new hArray Python object that points to the same data vector but contains a different data slice, which is then used whenever a method operates on the vector:

>>> a[0].vec()
Vector(int, 3, fill=[0,1,2])

This retrieves a copy of the data, since assigning a sub-slice of a vector to another vector actually requires copying the data, as vectors do not know about slicing (yet). Use one-dimensional arrays if all you want is a reference to a slice.

In contrast, a.vec(), without slicing, will give you a reference to the underlying vector.

>>> a.vec()
Vector(int, 9, fill=[0,1,2,3,4,5,6,7,8])

For convenience a[0, 1] will return the value, rather than a one element slice.

>>> a[0, 1]
1

just like a[0, 1:2], as the index range 1:2 contains only 1 element and is therefore interpreted as a single index.

Note

If the slice returns a single element, you cannot use vec() or val().

Note

In the case that a Vector or hArray consists of a single element, vec() and val() do work as intended.

One may wonder why one has to use the extra methods vec() and val() to access the data. The reason is that slicing on its own will return an array (and not a vector), which we still need for other purposes.

Slicing can also be done over multiple elements of one dimension, using the known Python slicing syntax:

>>> a[0, 0:2].val()
[0, 1]

however, currently this is restricted to the last dimension only, in order to point to a contiguous memory slice. Hence:

>>> a[0:2]
hArray(int, [3, 3], fill=[0,1,2,3,4,5]) # len=9 slice=[0:6])

is possible, but when using slices in multiple dimensions, e.g.:

>>> a[1:2, 0:2]
hArray(int, [3, 3], fill=[3,4]) # len=9 slice=[3:5])

the first slice is simply ignored. Instead, its first element is used.

Finally, negative indices count from the end of the slice, i.e.:

>>> a[-1]
hArray(int, [3, 3], fill=[6,7,8]) # len=9 slice=[6:9])

gives the last slice of the first index, while:

>>> a[0:-1]
hArray(int, [3, 3], fill=[0,1,2,3,4,5]) # len=9 slice=[0:6])

gives all but the last slice of the first index.
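The slicing rules described in this section can be summarised by a small function that computes the flat slice=[start:end] range for a given index. This is a sketch reconstructed from the examples on this page, not the actual implementation; flat_bounds is a hypothetical name, and slice steps are ignored.

```python
def flat_bounds(dims, index):
    """Return (start, end) of the flat range selected by `index`."""
    if not isinstance(index, tuple):
        index = (index,)
    # strides[k]: number of flat elements per step along dimension k
    strides = [1] * len(dims)
    for k in range(len(dims) - 2, -1, -1):
        strides[k] = strides[k + 1] * dims[k + 1]

    start = 0
    length = strides[0] * dims[0]           # default: the whole array
    for k, item in enumerate(index):
        is_last = (k == len(index) - 1)
        if isinstance(item, slice):
            lo, hi, _ = item.indices(dims[k])   # resolves negative bounds
            if not is_last:
                hi = lo + 1   # a non-final slice is reduced to its first element
            start += lo * strides[k]
            length = (hi - lo) * strides[k]
        else:
            i = item + dims[k] if item < 0 else item
            start += i * strides[k]
            length = strides[k]
    return start, start + length


dims = [3, 3]
first_row = flat_bounds(dims, 0)                          # like a[0]
last_row = flat_bounds(dims, -1)                          # like a[-1]
two_rows = flat_bounds(dims, slice(0, 2))                 # like a[0:2]
mixed = flat_bounds(dims, (slice(1, 2), slice(0, 2)))     # like a[1:2, 0:2]
```

The four results correspond to the slice=[0:3], slice=[6:9], slice=[0:6] and slice=[3:5] ranges shown in the representations above.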

Selecting & copying parts of the array

Suppose we want to have a subset of elements of an array b that satisfy a certain condition, e.g. have values that lie between two values, on which we later want to do some operation:

>>> b = hArray(int, [15], fill=[100,101,103,107,110,111,112,115,113,110,108,105,103,102,100])

If you want to have a distinct copy of the selected values in b you can use the Select() method, e.g.:

>>> b_s = b.Select("between", 107, 112)
>>> b_s
hArray(int, [4], fill=[110,111,110,108]) # len=4 slice=[0:4])

With the Find() method you obtain the indices of the elements in b. This is useful if you want to do an operation on a conditional subset of the elements of b, e.g.:

>>> b_i = b.Find("between", 107, 112)
>>> b_i
hArray(int, [4], fill=[4,5,9,10]) # len=4 slice=[0:4])

We can use this to modify the selected elements in b. E.g., suppose we want to add 100 to the selected values:

>>> b[b_i] += 100
>>> b
hArray(int, [15], fill=[100,101,103,107,210,211,112,115,113,210,208,105,103,102,100]) # len=15 slice=[0:15])
>>> b_s += 100
>>> b_s
hArray(int, [4], fill=[210,211,210,208]) # len=4 slice=[0:4])
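The difference between the two methods can be illustrated with plain Python lists. The helpers find_between and select_between are hypothetical and only mimic the behaviour shown above; treating both bounds as exclusive is an assumption inferred from the output, where the values 107 and 112 are not selected.

```python
def find_between(values, lower, upper):
    """Indices of the elements strictly between lower and upper."""
    return [i for i, x in enumerate(values) if lower < x < upper]


def select_between(values, lower, upper):
    """Distinct copy of the elements strictly between lower and upper."""
    return [x for x in values if lower < x < upper]


b = [100, 101, 103, 107, 110, 111, 112, 115, 113, 110, 108, 105, 103, 102, 100]
b_i = find_between(b, 107, 112)     # indices into b
b_s = select_between(b, 107, 112)   # independent copy of the values

# Modifying b through the indices changes b itself but not the copy b_s.
for i in b_i:
    b[i] += 100
```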

Note

Note that the Select() method only works on 1-dimensional arrays.

When using the Find() method on a multi-dimensional array it returns a 1D array of indices. To apply an operation on the original multidimensional array using the 1D array of indices you can do the following:

>>> b = hArray(int, [3,5], fill=[100,101,103,107,110,111,112,115,113,110,108,105,103,102,100])
>>> b
hArray(int, [3, 5], fill=[100,101,103,107,110,111,112,115,113,110,108,105,103,102,100]) # len=15 slice=[0:15])
>>> bs = b.shape()
>>> b.reshape((len(b),))
>>> b_i = b.Find("between", 107, 112)
>>> b[b_i] += 100
>>> b.reshape(bs)
>>> b
hArray(int, [3,5], fill=[100,101,103,107,210,211,112,115,113,210,208,105,103,102,100]) # len=15 slice=[0:15])

This creates a 1D representation (soft copy) of the original array, on which the desired operation is then applied.

The 1D representation vector also works when using Select(), e.g.:

>>> b_s = b.Select("between", 107, 112)
>>> b_s += 100
>>> b_s
hArray(int, [4], fill=[210,211,210,208]) # len=4 slice=[0:4])

but beware that the size and dimensionality of the newly created array do not correspond to that of the original array!
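If the multi-dimensional position of each match is needed, the 1D indices can be converted back by repeated division. This is a sketch assuming row-major ordering; unravel is a hypothetical helper and not a method of hArray.

```python
def unravel(flat_index, dims):
    """Convert a flat index into a multi-dimensional index (row-major assumed)."""
    index = []
    for dim in reversed(dims):
        flat_index, i = divmod(flat_index, dim)
        index.append(i)
    return tuple(reversed(index))


# The flat indices 4, 5, 9 and 10 from the example, seen in a 3 x 5 array:
positions = [unravel(i, [3, 5]) for i in [4, 5, 9, 10]]
```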

Applying methods to slices

First of all, we can apply the defined vector functions also to array slices directly, e.g.:

>>> a = hArray(int, [3,3], range(9))
>>> a[0].sum()
Vector(int, 1, fill=[3])

will return the sum over the first row of the array, i.e. the first three elements of the underlying vector. While:

>>> a[0].negate()
>>> a
hArray(int, [3, 3], fill=[0,-1,-2,3,4,5,6,7,8]) # len=9 slice=[0:9])
>>> a[0].negate()
>>> a
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

returns nothing, but will actually change the sign of the first three elements (i.e. the first slice) in the underlying vector.

Ellipses

In principle one could now loop over all slices using a for loop:

>>> for i in range(a.shape()[0]):
...     print "row", i,":", a[i].val()
row 0 : [0, 1, 2]
row 1 : [3, 4, 5]
row 2 : [6, 7, 8]

However, looping over slices in a simple way is already built into the arrays, by appending the ellipsis symbol ... to the dimensions. This will actually put the array in looping mode:

>>> l = a[0:3, ...]
>>> l
hArray(int, [3, 3], fill=[0,1,2]) # len=9 slice=[0:3]*)

which is indicated in the screen representation of the array by an extra asterisk and actually means that one can loop over all the elements of the respective dimension:

>>> iterate = True
>>> while iterate:
...     print "row", l.loop_nslice(), ":", l.val()
...     iterate = l.next().doLoopAgain()
row 0 : [0, 1, 2]
row 1 : [3, 4, 5]
row 2 : [6, 7, 8]

>>> l
hArray(int, [3, 3], fill=[0,1,2]) # len=9 slice=[0:3]*)

This will do exactly the same as the for-loop above.

Here doLoopAgain() returns True as long as the array is in looping mode and has not yet reached the last slice. loop_nslice() returns the slice the array is currently set to (see also loop_i(), loop_start(), loop_end()). next() advances to the next slice until the end is reached (at which point doLoopAgain() returns False). The loop is reset at the next call of next(). Hence, the loop as written above can be run multiple times, and it is automatically reset each time.
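The looping protocol can be modelled with a small stand-alone class. SliceLoop and its methods advance(), loop_again() and current_row() are hypothetical names chosen for this sketch; they only imitate the roles of next(), doLoopAgain() and loop_nslice(). The sketch also assumes that the position wraps back to the first slice as soon as the end is reached, which is a simplification.

```python
class SliceLoop(object):
    """Hypothetical model of an array that loops over its rows."""

    def __init__(self, data, ncols, rows):
        self.data = data
        self.ncols = ncols
        self.rows = list(rows)
        self.pos = 0
        self.again = True

    def current_row(self):
        return self.rows[self.pos]

    def val(self):
        start = self.current_row() * self.ncols
        return self.data[start:start + self.ncols]

    def advance(self):
        self.pos += 1
        if self.pos >= len(self.rows):
            self.pos = 0          # wrap around so the loop can be rerun
            self.again = False    # signal that the end was reached
        else:
            self.again = True
        return self

    def loop_again(self):
        return self.again


def collect(loop):
    """Same while-loop pattern as above, collecting instead of printing."""
    rows = []
    iterate = True
    while iterate:
        rows.append((loop.current_row(), loop.val()))
        iterate = loop.advance().loop_again()
    return rows


l = SliceLoop(list(range(9)), 3, range(3))
first_pass = collect(l)
second_pass = collect(l)   # works again because the position was reset
```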

As an example, let us calculate the mean value of each slice at the top level of our example array, which is simply:

>>> l.mean()
Vector(float, 3, fill=[1,4,7])

In contrast to the same method applied to vectors, where a single value is returned, the return value is now a vector of values, each of which corresponds to the mean of one top-level slice. Hence, the vector has looped automatically over all the slices specified in the definition of the array.

The looping over slices can be more complex taking start, stop, and increment values into account.

>>> a[1:, ...].mean()
Vector(float, 2, fill=[4,7])

will loop over all top-level slices starting at the 2nd slice (slice #1) until the last.

>>> a[:2, ...].mean()
Vector(float, 2, fill=[1,4])

will loop over the first two top-level slices.

>>> a[0:3:2, ...].mean()
Vector(float, 2, fill=[1,7])

will loop over the two top-level slices using an increment of 2, i.e. here take the first and third only (so, here non-contiguous slices can be put to work).

To loop over all slices in one dimension, a shortcut can be used by omitting the slice specification. Hence,

>>> a[...].mean()
Vector(float, 3, fill=[1,4,7])

will do the same as

>>> a[0:, ...].mean()
Vector(float, 3, fill=[1,4,7])

It is even possible to specify an array of indices for the slicing.

>>> a[[0, 2], ...].mean()
Vector(float, 2, fill=[1,7])

will loop over slices 0 and 2.
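The effect of the different loop specifications can be reproduced with plain Python. The function row_means is a hypothetical helper for illustration; it takes the rows to visit as an explicit sequence, which plays the role of the slice or index list in front of the ellipsis.

```python
def row_means(data, ncols, rows):
    """Mean of each selected row of a flat, row-major list."""
    means = []
    for r in rows:
        row = data[r * ncols:(r + 1) * ncols]
        means.append(sum(row) / float(len(row)))
    return means


data = list(range(9))                                # the 3 x 3 example array
all_rows = row_means(data, 3, range(3))              # like a[...]
from_second = row_means(data, 3, range(1, 3))        # like a[1:, ...]
every_other = row_means(data, 3, range(0, 3, 2))     # like a[0:3:2, ...]
by_list = row_means(data, 3, [0, 2])                 # like a[[0, 2], ...]
```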

It is possible to specify a slice after the ellipsis, e.g.,

>>> a[..., 0:2].mean()
Vector(float, 3, fill=[0.5,3.5,6.5])

which means that the mean is taken only from the first two elements of each top-level slice.

Even more complicated: the elements of the slice can be vectors or lists:

>>> a[..., [0, 1]:[2, 3]].mean()
Vector(float, 3, fill=[0.5,4.5,6.5])

over which one can loop as well. Hence, in the operation on the first row, the subslice [0:2] will be taken, while for the second row the subslice [1:3] is used. The third row uses the subslice [0:2] again, since it is looping over the index pairs [0,2] and [1,3].
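The cycling through the start and stop lists can be sketched with itertools.cycle. The helper row_subslice_means is hypothetical; it only reproduces the behaviour described above, namely that the (start, stop) pairs are reused from the beginning once they are exhausted.

```python
from itertools import cycle


def row_subslice_means(data, ncols, nrows, starts, stops):
    """Mean of a per-row subslice, cycling through the (start, stop) pairs."""
    means = []
    bounds = cycle(zip(starts, stops))
    for r in range(nrows):
        lo, hi = next(bounds)
        part = data[r * ncols + lo:r * ncols + hi]
        means.append(sum(part) / float(len(part)))
    return means


# Rows 0 and 2 use [0:2], row 1 uses [1:3].
means = row_subslice_means(list(range(9)), 3, 3, [0, 1], [2, 3])
```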

Parameters of looping arrays

Looping can also be done for methods that require multiple arrays as inputs. In this case the next() method will be applied to every array in the parameter list and looping proceeds until the first array has reached the end. Hence, care has to be taken that the same slice looping is applied to all arrays in the parameter list.

As an example we create a new array of the dimensions of a:

>>> a
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])
>>> x = hArray(int, a)
>>> x
hArray(int, [3, 3], fill=[0,0,0,0,0,0,0,0,0]) # len=9 slice=[0:9])

and fill it with slices from a multiplied by the scalar value 2:

>>> x[[0,2], ...].mul(a[[0, 2], ...], 2)
>>> x
hArray(int, [3, 3], fill=[0,2,4,0,0,0,12,14,16]) # len=9 slice=[0:9])

and indeed now the first and last slice were operated on and filled with the results of the operation.

Forgetting slicing in a parameter can lead to unexpected results. In the following example x is looped over but a is not. Hence, every slice of x is filled from the beginning of a, so that each slice ends up containing only the first three elements of a multiplied by 2:

>>> x.fill(0)
>>> x[...].mul(a, 2)
>>> x
hArray(int, [3, 3], fill=[0,2,4,0,2,4,0,2,4]) # len=9 slice=[0:9])
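Both results can be reproduced with a small model of the looping. The function looped_mul is hypothetical, and the sketch assumes that an argument which is not looped is always read from its beginning; this matches the output shown but is not taken from the implementation.

```python
def looped_mul(out, out_rows, src, src_rows, factor, ncols):
    """Write src * factor into out, one slice per loop step.

    src_rows=None models an argument that is not looped: every step
    then reads from the start of src (assumption of this sketch).
    """
    for step, out_row in enumerate(out_rows):
        src_row = 0 if src_rows is None else src_rows[step]
        for col in range(ncols):
            out[out_row * ncols + col] = src[src_row * ncols + col] * factor
    return out


a = list(range(9))

# Same slice looping on both arrays, like x[[0,2], ...].mul(a[[0, 2], ...], 2)
matched = looped_mul([0] * 9, [0, 2], a, [0, 2], 2, 3)

# Slicing forgotten on the second array, like x[...].mul(a, 2)
forgotten = looped_mul([0] * 9, [0, 1, 2], a, None, 2, 3)
```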

Note

There are currently relatively strict rules on how to change the parameters from a vector to an array.

  1. When going from a vector to an array, all other vectors in the argument list also have to be provided as arrays!

  2. Scalar parameters can be provided as single-valued scalars or as vectors. In the latter case the algorithm will take the first element of the vector.

  3. If one scalar parameter is provided as a vector, all scalar parameters have to be provided as vectors. (They can be of different length or of length unity where, in the latter case, always the same value is taken.)

  4. If an algorithm has a scalar return value, a vector of values will be returned by the same algorithm if invoked with arrays.

  5. If a slice is specified with vectors as elements (e.g. [1, 2, 3]:[5, 6, 7]), both start and stop have to be vectors. The algorithm will then loop over all elements in the lists. E.g.:

    >>> a[..., [0, 1]:[2, 3]].mean()
    Vector(float, 3, fill=[0.5,4.5,6.5])
    

    and:

    >>> a[..., [0, 1]:2].mean()
    

    will result in an ArgumentError.

Units and Scale Factors

Numerical arrays allow one to set a (single) unit for the data. With setUnit(prefix, unit_name) one can specify the name of the unit and the scale factor, which is specified as a string being one of “f”, “p”, “n”, “micro”, “m”, “c”, “d”, “”, “h”, “k”, “M”, “G”, “T”, “P”, “E”, “Z”.

>>> a.setUnit("M", "Hz")
hArray(int, [3, 3], fill=[0,1,2,3,4,5,6,7,8]) # len=9 slice=[0:9])

will set the unit name to MHz without modifying the values in the array (assuming that the values were delivered initially in this unit). However, the scaling can be changed by calling setUnit again (with or without a unit name), e.g.:

>>> a.setUnit("k", "")
hArray(int, [3, 3], fill=[0,1000,2000,3000,4000,5000,6000,7000,8000]) # len=9 slice=[0:9])

which has converted the values to kHz. The name of the unit can be retrieved with:

>>> a.getUnit()
'kHz'

and cleared with clearUnit():

>>> a.clearUnit()
>>> a.getUnit()
''
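The rescaling that happens when the prefix changes follows from the standard SI prefix exponents. The table and the function rescale below are a hypothetical sketch of that arithmetic, not the code behind setUnit().

```python
# Standard SI exponents for the prefixes listed above.
PREFIX_EXPONENT = {
    "f": -15, "p": -12, "n": -9, "micro": -6, "m": -3, "c": -2, "d": -1,
    "": 0, "h": 2, "k": 3, "M": 6, "G": 9, "T": 12, "P": 15, "E": 18, "Z": 21,
}


def rescale(values, old_prefix, new_prefix):
    """Express values given in old_prefix units in new_prefix units."""
    shift = PREFIX_EXPONENT[old_prefix] - PREFIX_EXPONENT[new_prefix]
    factor = 10.0 ** shift
    return [x * factor for x in values]


# Going from MHz to kHz multiplies every value by 10**(6 - 3) = 1000.
in_khz = rescale([0, 1, 2], "M", "k")
```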

Keywords and Values

For documenting the array further and to store certain values, one can store keywords and values in the array. This is done with:

>>> a.setKey("name", "TestArray")

The keywords can be arbitrary strings and the values also arbitrary strings. Thus numbers need to be converted to strings and back. The keyword name is special in the sense that it is a default key that is recognized by a number of other modules (including the __repr__() method governing array output) to briefly describe the data.

The keyword values can be retrieved using getKey():

>>> a.getKey("name")
'TestArray'
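Since values are stored as strings, numbers need a round trip through str() and float() (or int()). The sketch below uses a plain dictionary in place of the array's key store; the key "sample_rate" and its value are made up for illustration.

```python
# A plain dict standing in for the array's keyword store (assumption).
keys = {}
keys["name"] = "TestArray"
keys["sample_rate"] = str(200.0)     # hypothetical key; number stored as string

stored = keys["sample_rate"]         # what a lookup would hand back: a string
rate = float(stored)                 # convert back before doing arithmetic
```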