Version 1.0 of astropy introduces the concept of a “Mixin Column” in tables, which allows appropriate non-Column-based class objects to be integrated within a Table object. These mixin column objects are not converted in any way but are used natively.
The available built-in mixin column classes are Quantity, SkyCoord and Time.
Warning
The interface for using mixin columns is experimental at this point and it is not recommended to use this feature in production code. There are known limitations and some table functionality which is not yet implemented for mixin columns. API changes are likely and since the code is all new there may be some bugs.
As a first example we can create a table and add a time column:
>>> from astropy.table import Table
>>> from astropy.time import Time
>>> t = Table()
>>> t['index'] = [1, 2]
>>> t['time'] = Time(['2001-01-02T12:34:56', '2001-02-03T00:01:02'])
>>> print(t)
index           time
----- -----------------------
    1 2001-01-02T12:34:56.000
    2 2001-02-03T00:01:02.000
The important point here is that the time column is a bona fide Time object:
>>> t['time']
<Time object: scale='utc' format='isot' value=['2001-01-02T12:34:56.000' '2001-02-03T00:01:02.000']>
>>> t['time'].mjd
array([ 51911.52425926, 51943.00071759])
The ability to natively handle Quantity objects within a table makes it easier to manipulate tabular data with units in a natural and robust way. However, this feature introduces an ambiguity because data with a unit (e.g. from a FITS binary table) can be represented as either a Column with a unit attribute or as a Quantity object. In order to retain complete backward compatibility with astropy versions prior to 1.0, a minor variant of the Table class called QTable is available. QTable is exactly the same as Table except that Quantity is the default for any data column with a defined unit.
If you take advantage of the Quantity infrastructure in your analysis then QTable is the preferred way to create tables with units. If instead you use table column units more as a descriptive label then the plain Table class is probably the best class to use.
To illustrate these concepts we first create a standard Table where we supply as input a Time object and a Quantity object with units of m / s. In this case the quantity is converted to a Column (which has a unit attribute but does not have all the features of a Quantity):
>>> import astropy.units as u
>>> t = Table()
>>> t['index'] = [1, 2]
>>> t['time'] = Time(['2001-01-02T12:34:56', '2001-02-03T00:01:02'])
>>> t['velocity'] = [3, 4] * u.m / u.s
>>> print(t)
index           time          velocity
                               m / s
----- ----------------------- --------
    1 2001-01-02T12:34:56.000      3.0
    2 2001-02-03T00:01:02.000      4.0
>>> type(t['velocity'])
<class 'astropy.table.column.Column'>
>>> t['velocity'].unit
Unit("m / s")
>>> (t['velocity'] ** 2).unit # WRONG because Column is not smart about unit
Unit("m / s")
So instead let’s do the same thing using a quantity table QTable:
>>> from astropy.table import QTable
>>> qt = QTable()
>>> qt['index'] = [1, 2]
>>> qt['time'] = Time(['2001-01-02T12:34:56', '2001-02-03T00:01:02'])
>>> qt['velocity'] = [3, 4] * u.m / u.s
Now we print the table again but this time notice that the individual values all have units because this is how Quantity prints a single array element:
>>> print(qt)
index           time           velocity
                                m / s
----- ----------------------- ---------
    1 2001-01-02T12:34:56.000 3.0 m / s
    2 2001-02-03T00:01:02.000 4.0 m / s
The velocity column is now a Quantity and behaves accordingly:
>>> type(qt['velocity'])
<class 'astropy.units.quantity.Quantity'>
>>> qt['velocity'].unit
Unit("m / s")
>>> (qt['velocity'] ** 2).unit # GOOD!
Unit("m2 / s2")
You can easily convert Table to QTable and vice-versa:
>>> qt2 = QTable(t)
>>> type(qt2['velocity'])
<class 'astropy.units.quantity.Quantity'>
>>> t2 = Table(qt2)
>>> type(t2['velocity'])
<class 'astropy.table.column.Column'>
Most common table operations behave as expected when mixin columns are part of the table. However, there are limitations in the current implementation.
Adding or inserting a row
Adding or inserting a row works as expected only for mixin classes that are mutable (data can be changed internally) and that have an insert() method. Quantity supports insert() but Time and SkyCoord do not. If we try to insert a row into the previously defined table, an exception occurs:
>>> qt.add_row((1, '2001-02-03T00:01:02', 5 * u.m / u.s))
Traceback (most recent call last):
...
ValueError: Unable to insert row because of exception in column 'time':
'Time' object has no attribute 'insert'
Initializing from a list of rows or a list of dicts
This mode of initializing a table does not work with mixin columns, so both of the following will fail:
>>> qt = QTable([{'a': 1 * u.m, 'b': 2},
... {'a': 2 * u.m, 'b': 3}])
Traceback (most recent call last):
...
ValueError: setting an array element with a sequence.
>>> qt = QTable(rows=[[1 * u.m, 2],
... [2 * u.m, 3]])
Traceback (most recent call last):
...
ValueError: setting an array element with a sequence.
The problem lies in knowing if and how to assemble the individual elements for each column into an appropriate mixin column. The current code uses numpy to perform this function on numerical or string types, but it obviously does not handle mixin column types like Quantity or SkyCoord.
Masking
Mixin columns do not support masking, but there is limited support for use of mixins within a masked table. In this case a mask attribute is assigned to the mixin column object. This mask is a special object that is a boolean array of False corresponding to the mixin data shape. The mask looks like a normal numpy array but an exception will be raised if True is assigned to any element. The consequences of the limitation are most obvious in the high-level table operations.
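The behavior of such a mask can be illustrated with a small numpy subclass. This is a hypothetical sketch of the idea, not the class astropy uses; the name AllFalseMask is made up for this example.

```python
import numpy as np


class AllFalseMask(np.ndarray):
    """Hypothetical boolean array of False that refuses any True value."""

    def __new__(cls, shape):
        # Start from an ordinary all-False boolean array and view it
        # as this subclass.
        return np.zeros(shape, dtype=bool).view(cls)

    def __setitem__(self, item, value):
        # Reject any assignment that would set an element to True.
        if np.any(value):
            raise ValueError("cannot set any element of this mask to True")
        super().__setitem__(item, value)


mask = AllFalseMask((3,))
mask[0] = False        # allowed: the mask stays all False

try:
    mask[1] = True     # rejected
    raised = False
except ValueError:
    raised = True
```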
High-level table operations
The table below gives a summary of support for high-level operations on tables that contain mixin columns:
Operation | Support |
---|---|
Grouped operations | Not implemented yet, but no fundamental limitation |
Stack vertically | Not implemented yet, pending definition of generic concatenation protocol |
Stack horizontally | Works if output mixin columns do not require masking |
Join | Works if output mixin columns do not require masking; no mixin key columns allowed |
Unique rows | Not implemented yet, uses grouped operations |
Mixin column attributes
For mixin columns, the column attributes name, unit, dtype, format, description and meta are currently stored in a simple dictionary called _astropy_column_attrs. These attributes can be manipulated with the functions col_getattr and col_setattr, which are available in the astropy.table.column module. These functions are not part of the astropy public API and are likely to change in the future.
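The dictionary-based storage can be pictured with the following sketch. The helpers get_col_attr and set_col_attr and the class Payload are hypothetical stand-ins written for this illustration; they are not the astropy functions, whose actual behavior may differ.

```python
# Attribute names listed in the text above.
_COLUMN_ATTRS = ('name', 'unit', 'dtype', 'format', 'description', 'meta')


class Payload:
    """Hypothetical mixin-like class; only the attribute slot matters here."""
    _astropy_column_attrs = None


def set_col_attr(col, attr, value):
    """Store a column attribute in the per-object dictionary."""
    if attr not in _COLUMN_ATTRS:
        raise AttributeError("unknown column attribute: {0}".format(attr))
    if col._astropy_column_attrs is None:
        # Create the dictionary on the instance; the class default stays None.
        col._astropy_column_attrs = {}
    col._astropy_column_attrs[attr] = value


def get_col_attr(col, attr, default=None):
    """Read a column attribute, falling back to ``default`` if unset."""
    if attr not in _COLUMN_ATTRS:
        raise AttributeError("unknown column attribute: {0}".format(attr))
    attrs = col._astropy_column_attrs or {}
    return attrs.get(attr, default)


p = Payload()
set_col_attr(p, 'name', 'flux')
name = get_col_attr(p, 'name')    # 'flux'
unit = get_col_attr(p, 'unit')    # None, because it was never set
```

Keeping the attributes in a separate dictionary means the wrapped class does not need its own name or unit attributes, which may already have a different meaning for that class.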
ASCII table writing
Mixin columns can be written out to file using the astropy.io.ascii module, but the fast C-based writers are not available. Instead the legacy pure-Python writers will be used.
A key idea behind mixin columns is that any class which satisfies a specified protocol can be used. That means many user-defined class objects which handle array-like data can be used natively within a Table. The protocol is relatively simple and requires that a class behave like a minimal numpy array with the following properties:
The Example: ArrayWrapper section shows a working minimal example of a class which can be used as a mixin column. A pandas.Series object can function as a mixin column as well.
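Judging from the ArrayWrapper example, the protocol appears to involve at least __getitem__, __len__, a shape attribute and a class-level _astropy_column_attrs. The hypothetical helper below checks an object for those names. It is a sketch based on that assumption, not the test astropy itself applies.

```python
import numpy as np

# Names assumed to be required, inferred from the ArrayWrapper example.
REQUIRED = ('__getitem__', '__len__', 'shape', '_astropy_column_attrs')


def missing_mixin_features(obj):
    """Return the names in REQUIRED that ``obj`` lacks (hypothetical helper)."""
    return [name for name in REQUIRED if not hasattr(obj, name)]


class Wrapper:
    """Small array wrapper that provides every name in REQUIRED."""
    _astropy_column_attrs = None

    def __init__(self, data):
        self.data = np.asarray(data)

    def __getitem__(self, item):
        # Scalar index: return the element; otherwise return a new wrapper.
        if isinstance(item, (int, np.integer)):
            return self.data[item]
        return self.__class__(self.data[item])

    def __len__(self):
        return len(self.data)

    @property
    def shape(self):
        return self.data.shape


ok = missing_mixin_features(Wrapper([1, 2, 3]))   # []
bad = missing_mixin_features([1, 2, 3])           # ['shape', '_astropy_column_attrs']
```

A plain Python list fails the check because it has no shape attribute and no _astropy_column_attrs slot, even though it supports indexing and len().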
Other interesting possibilities for mixin columns include:
The code listing below shows an example of a data container class which acts as a mixin column class. This class is a simple wrapper around a numpy array. It is used in the astropy mixin test suite and is fully compliant as a mixin column.
import numpy as np
from astropy.table.column import col_getattr, col_setattr


class ArrayWrapper(object):
    """
    Minimal mixin using a simple wrapper around a numpy array
    """
    # Storage slot for the column attributes (name, unit, dtype, ...)
    _astropy_column_attrs = None

    def __init__(self, data):
        self.data = np.array(data)
        col_setattr(self, 'dtype', self.data.dtype)

    def __getitem__(self, item):
        # A scalar index returns the bare element; anything else
        # (slice, index array) returns a new wrapper
        if isinstance(item, (int, np.integer)):
            out = self.data[item]
        else:
            out = self.__class__(self.data[item])
        return out

    def __setitem__(self, item, value):
        self.data[item] = value

    def __len__(self):
        return len(self.data)

    @property
    def shape(self):
        return self.data.shape

    def __repr__(self):
        return ("<{0} name='{1}' data={2}>"
                .format(self.__class__.__name__, col_getattr(self, 'name'), self.data))