.. py:currentmodule:: Orange.data
Class Orange.data.Table holds a list of data instances of type :obj:`Orange.data.Instance`. All instances belong to the same domain (:obj:`Orange.data.Domain`).
Data tables are usually loaded from a file (see :ref:`Orange-data-formats`):
import Orange data = Orange.data.Table("titanic")
Data tables can also be created programmatically, as in the :ref:`code below <example-table-prog1>`.
:obj:`Table` supports most list-like operations: getting, setting, removing data instances, as well as methods :obj:`append` and :obj:`extend`. When setting items, the item must be either the instance of the correct type or a Python list of appropriate length and content to be converted into a data instance of the corresponding domain. Retrieving data instances returns references and not copies: changing the retrieved instance changes the data in the table. Slicing returns ordinary Python lists containing references to data instances, not a new :obj:`Orange.data.Table`.
According to a Python convention, the data table is considered False
when empty.
.. attribute:: domain The domain to which the instances belong. This attribute is read-only.
.. attribute:: owns_instances ``True`` if the table contains the data instances and ``False`` if it contains references to instances owned by another table.
.. attribute:: owner The actual owner of the data when ``own_instances`` is ``False``; ``None`` otherwise.
.. attribute:: version An integer that is increased when instances are added or removed from the table. It does not detect changes of the data.
.. attribute:: random_generator Random generator that is used by method :obj:`random_instance`. If the method is called and ``random_generator`` is ``None``, a new generator is constructed with random seed 0 and stored here for future use.
.. attribute:: attribute_load_status If the table was loaded from a file, this list of flags tells whether the feature descriptors were reused and how they matched. See :ref:`descriptor reuse <variable_descriptor_reuse>` for details.
.. attribute:: meta_attribute_load_status A dictionary holding this same information for meta attributes, with keys corresponding to their ids and values to load statuses.
.. method:: __init__(filename[, create_new_on]) Read data from the given file. If the name includes the extension it must be one of the known file formats (see :ref:`Orange-data-formats`). If no extension is given, the directory is searched for any file with recognized extensions. If the file is not found, Orange will also search the directories specified in the environment variable `ORANGE_DATA_PATH`. The optional flag ``create_new_on`` decides when variable descriptors are reused. See :ref:`descriptor reuse <variable_descriptor_reuse>` for more details. :param filename: the name of the file :type filename: str :param create_new_on: flag specifying when to reuse existing descriptors :type create_new_on: int
.. method:: __init__(domain) Construct an empty data table with the given domain. .. literalinclude:: code/datatable1.py :lines: 7-16 The example :ref:`continues <example-table-prog2>`. :param domain: domain descriptor :type domain: Orange.data.Domain
.. method:: __init__(instances[, references]) Construct a new data table containing the given data instances. These can be given either as another :obj:`Table` or as list of instances represented by list of value or as :obj:`Orange.data.Instance`. If the optional second argument is ``True``, the first argument must be a :obj:`Table`. The new table will contain references to data stored in the given table. If the second argument is omitted or ``False``, data instances are copied. :param instances: data instances :type instances: Table or list :param references: if ``True``, the new table contains references :type references: bool
.. method:: __init__(domain, instances) Construct a new data table with the given domain and initialize it with the given instances. Instances can be given as a :obj:`Table` (if domains do not match, they are converted), as a list containing either instances of :obj:`Orange.data.Instance` or lists. This constructor can also be used for conversion from numpy arrays. The argument ``instances`` can be a numpy array. The number of variables in the domain must match the number of columns. :param domain: domain descriptor :type domain: Orange.data.Domain :param instances: data instances :type instances: Table or list or numpy.array The following example fills the data table created :ref:`above <example-table-prog1>` with some data from a list. .. literalinclude:: code/datatable1.py :lines: 29-34 The following example shows initializing a data table from numpy array. .. literalinclude:: code/datatable1.py :lines: 38-41
.. method:: __init__(tables) Construct a table by combining data instances from a list of tables. All tables must have the same length. Domains are combined so that each (ordinary) feature appears only once in the resulting table. The class attribute is the last class attribute in the list of tables, while all other class attributes are added as ordinary features. For instance, if three tables are merged but the last one is class-less, the class attribute for the new table will come from the second table. Meta attributes for the new domain are merged based on id's: if the same attribute appears under two id's it will be added twice. If, on the opposite, same id appears two different attributes in two tables, this raises an exception. As instances are merged, exception is raised if a features or a meta attribute that appears in multiple tables does not have the same value on all of them; the feature is allowed to have a missing value on one or more (or all) tables. Note that this is not the SQL's join operator as it doesn't try to find matches between the tables but instead merges them row by row. :param tables: tables to be merged into the new table :type tables: list of instances of :obj:`Table` For example, suppose the file merge1.tab contains:: a1 a2 m1 m2 f f f f meta meta 1 2 3 4 5 6 7 8 9 10 11 12 and merge2.tab contains:: a1 a3 m1 m3 f f f f meta meta 1 2.5 3 4.5 5 6.5 7 8.5 9 10.5 11 12.5 The two tables can be loaded, merged and printed out by the following script. .. literalinclude:: code/datatable_merge.py This is what the output looks like:: Domain 1: [a1, a2], {-2:m1, -3:m2} Domain 2: [a1, a3], {-2:m1, -4:m3} Merged: [a1, a2, a3], {-2:m1, -3:m2, -4:m3} [1, 2], {"m1":3, "m2":4} + [1, 2.5], {"m1":3, "m3":4.5} -> [1, 2, 2.5], {"m1":3, "m2":4, "m3":4.5} [5, 6], { "m1":7, "m2":8} + [5, 6.5], {"m1":7, "m3":8.5} -> [5, 6, 6.5], {"m1":7, "m2":8, "m3":8.5} [9, 10], {"m1":11, "m2":12} + [9, 10.5], {"m1":11, "m3":12.5} -> [9, 10, 10.5], {"m1":11, "m2":12, "m3":12.5} Merging succeeds since the values of `a1` and `m1` are the same for all matching instances from both tables.
.. method:: append(instance) Append the given instance to the end of the table. :param instance: instance to be appended :type instance: :obj:`Orange.data.Instance` or a list .. literalinclude:: code/datatable1.py :lines: 21-24
.. method:: extend(instances) Append the given list of instances to the end of the table. :param instances: instances to be appended :type instances: list
.. method:: select(folds[, select, negate=False]) Return a subset of instances as a new :obj:`Table`. The first argument should be a list of the same length as the table; its elements should be integers or bools. The resulting table contains instances corresponding to non-zero elements of the list. If the second argument is given, it must be an integer; method ``select`` will then return the data instances for which the corresponding ``fold``'s elements match the value of the argument ``select``. The third argument, `negate` inverts the selection. It can only be given as a keyword. Note: This method should be used when the selected data instances are going to be modified later on. In all other cases, method :obj:`select_ref` is preferred. :param folds: list of fold indices corresponding to data instances :type folds: list :param select: select which instances to pick :type select: int :param negate: inverts the selection :type negate: bool :rtype: :obj:`Orange.data.Table` One common use of this method is to split the data into folds. A list for the first argument can be prepared using `Orange.data.sample.SubsetIndicesCV`. The following example prepares a simple data table and indices for four-fold cross validation, and then selects the training and testing sets for each fold. .. literalinclude:: code/datatable2.py :lines: 7-27 The printout begins with:: Indices: <1, 0, 2, 2, 0, 1, 0, 3, 1, 3> Fold 0: train [0.000000] [2.000000] [3.000000] [5.000000] [7.000000] [8.000000] [9.000000] : test [1.000000] [4.000000] [6.000000] Another form of calling the method is to use a vector of zero's and one's. .. literalinclude:: code/datatable2.py :lines: 29-31 This prints out:: [0.000000] [1.000000] [9.000000]
.. method:: select_ref(folds[, select, negate=False]) Same as :obj:`select`, except that the resulting table contains references to data instances in the original table instead of its own copy of data. In most cases, this function is preferred over the former since it consumes less memory. :param folds: list of fold indices corresponding to data instances :type folds: list :param select: select which instances to pick :type select: int :param negate: inverts the selection :type negate: bool :rtype: :obj:`Orange.data.Table`
.. method:: get_items(indices) Return a table with data instances indicated by indices. For instance, `data.get_items([0, 1, 9])` returns a table with instances with indices 0, 1 and 9. This function is useful when data is going to be modified. If not, use :obj:`get_items_ref`. :param indices: indices of selected data instances :type indices: list of int's :rtype: :obj:`Orange.data.Table`
.. method:: get_items_ref(indices) Same as above, except that it returns a table with references to data instances. This method is usually preferred over the above one. :param indices: indices of selected data instances :type indices: list of int's :rtype: :obj:`Orange.data.Table`
.. method:: filter(conditions) Return a table with data instances matching the criteria. These can be given in form of keyword arguments or a dictionary; with the latter, additional keyword argument ``negate`` can be given to reverse the selection. Note that method :obj:`filter_ref` is more memory efficient and should be preferred when data is not going to be modified. Young patients from the lenses data set can be selected by :: young = data.filter(age="young") More than one value can be allowed and more than one attribute checked. This selects all patients with age "young" or "psby" who are astigmatic:: young = data.filter(age=["young", "presbyopic"], astigm="y") The following has the same effect:: young = data.filter({"age": ["young", "presbyopic"], "astigm": "y"}) Selection can be reversed only in the latter form, by adding a keyword argument ``negate`` with value 1:: young = data.filter({"age": ["young", "presbyopic"], "astigm": "y"}, negate=1) Filters for continuous features are specified by pairs of values. In dataset "bridges", bridges with lengths between 1000 and 2000 (inclusive) are selected by :: mid = data.filter(LENGTH=(1000, 2000)) Bridges that are shorter or longer than that can be selected by inverting the range. :: mid = data.filter(LENGTH=(2000, 1000))
.. method:: filter(filt) Similar to above, except that conditions are given as :obj:`Orange.core.Filter`.
.. method:: filter_ref(conditions), filter_ref(filter) Same as the above two, except that they return a table with references to instances instead of their copies.
.. method:: filter_bool(conditions), filter_bool(filter) Return a list of bools denoting which data instances are accepted by the conditions or the filter.
.. method:: translate(domain) Return a new data table in which data instances are translated into the given domain. :param domain: new domain :type domain: :obj:`Orange.data.Domain` :rtype: :obj:`Orange.data.Table`
.. method:: translate(variables[, keep_metas]) Similar to above, except that the domain is given by a list of features. If ``keep_metas`` is ``True``, the new data instances will also have all the meta attributes from the original domain. :param variables: variables for the new data :type variables: list :rtype: :obj:`Orange.data.Table`
.. method:: to_numpy(content, weightID, multinominal) Convert a data table to numpy array. Raises an exception if the data contains undefined values. :obj:`to_numpyMA` converts to a masked array where the mask denotes the defined values. (For conversion from numpy, see the constructor.) The function returns a tuple with the array and, depending on arguments, some vectors. The argument ``content`` is a string separated in two parts with a slash. The part to the left of slash describes the content of the array; in the part on the right side lists the vectors. The content is described with the following characters: ``a`` features (without the class); can only appear on the left ``A`` like ``a``, but raises exception if there are no features ``c`` class value represented as an index of the value (0, 1, 2...); if the data has no class, the column is omitted (if ``c`` is to the left of the slash) or the tuple will contain ``None`` instead of the vector. ``C`` like ``c``, but raises exception if the data has no class ``m`` like ``c``, but one column for each target variable in a multi-target domain. ``M`` synonymous to ``m``. ``w`` instance weight; like for ``c`` the column is omitted or ``None`` is returned instead of the vector if the argument ``weightID`` is missing. ``W`` instance weight; raise an exception if ``weightID`` is missing. ``0`` a vector of zeros ``1`` a vector of ones The default content is ``a/cw``: an array with feature values and separate vectors with classes and weights. Specifying an empty string has the same effect. If the elements to the right of the slash repeat, the function returns the same Python object, e.g. in ``acc000/cwww`` the three weight vectors are one and the same Python object, so modifying one will change all three of them. This is the default behaviour on data set iris with 150 data instances described by four features and a class value:: >>> data = orange.ExampleTable("iris") >>> a, c, w = data.toNumpy() >>> a.shape (150, 4) >>> c.shape (150,) >>> print w None >>> a[0] array([ 5.0999999 , 3.5 , 1.39999998, 0.2 ]) >>> c[0] 0.0 For a more complicated example, the array will contain a column with class, features, a vector of ones, two vectors with classes and another vector of zeroes:: >>> a, = data.toNumpy("ca1cc0") >>> a[0] array([ 0., 5.0999999, 3.5 , 1.39999998, 0.2 , 1., 0., 0., 0.]) >>> a[130] array([ 2., 7.4000001, 2.79999995, 6.0999999 , 1.89999998, 1., 2., 2., 0.]) >>> c[120] 2.0 The third argument specifies the treatment of non-continuous non-binary values (binary values are always translated to 0.0 or 1.0). The argument's value can be :obj:`Orange.data.Table.Multinomial_Ignore` (such features are omitted), :obj:`Orange.data.Table.Multinomial_AsOrdinal` (the values' indices are treated as continuous numbers) or :obj:`Orange.data.Table.Multinomial_Error` (an exception is raised if such features are encountered). Default treatment is :obj:`Orange.data.Table.ExampleTable.Multinomial_AsOrdinal`. When the class attribute is discrete and has more than two values, an exception is raised unless multinomial attributes are treated as ordinal. More options for treating multinominal values are available in :obj:`Orange.data.continuization`.
.. method:: to_numpyMA(content, weightID, multinominal) Similar to :obj:`to_numpy` except that it returns a masked array with mask representing the (un)defined values.
.. method:: checksum() Return a CRC32 computed over all discrete and continuous features and class attributes of all data instances. :rtype: int
.. method:: has_missing_values() Return ``True`` if any of data instances has any missing values. Meta attributes are not checked.
.. method:: has_missing_classes() Return ``True`` if any instance miss the class value.
.. method:: random_instance() Return a random instance from the table. Data table's :obj:`random_generator` is used, which is initially seeded to 0, so results are deterministic.
.. method:: remove_duplicates([weightID]) Remove duplicates of data instances. If ``weightID`` is given, a meta attribute is added which contains the number of instances merged into each new instance. :param weightID: id for meta attribute with weight :type weightID: int :rtype: None
.. method:: sort([variables]) Sort the data table. The argument gives the values ordered by importance. If omitted, the order from the domain is used. Values of discrete features are not ordered alphabetically but according to the :obj:`Orange.feature.Discrete.values`. This sorts the data from the bridges data set by the lengths and years of their construction:: data.sort(["LENGTH", "ERECTED"])
.. method:: shuffle() Randomly shuffle the data instances.
.. method:: add_meta_attribute(attr[, value=1]) Add a meta value to all data instances. The first argument can be an integer id, or a string or a variable descriptor of a meta attribute registered in the domain.
.. method:: remove_meta_attribute(attr) Remove a meta attribute from all data instances.