.. py:currentmodule:: Orange.data

Data instances (Instance)

Class Orange.data.Instance holds a data instance. Each data instance corresponds to a domain, which defines its length, data types and values for symbolic indices.

Features

The data instance is described by a list of features defined by the domain descriptor (:obj:`Orange.data.Domain`). Instances support indexing with integer indices, strings (feature names) or variable descriptors.

Since "age" is the first attribute in the dataset lenses, the following statements are equivalent:

>>> import Orange
>>> data = Orange.data.Table("lenses")
>>> age = data.domain["age"]
>>> example = data[0]
>>> print example[0]
young
>>> print example[age]
young
>>> print example["age"]
young

Negative indices do not count from the end of the list as they usually do in Python; they refer to the values of meta attributes instead.

The last element of a data instance is the class label, if the domain has a class. It should be accessed using :obj:`~Orange.data.Instance.get_class()` and :obj:`~Orange.data.Instance.set_class()`.

The list has a fixed length that equals the number of variables.
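The indexing rules above can be mimicked with a small stand-alone model. This is only a sketch: ``ToyInstance`` is a hypothetical class written for illustration, it is not part of Orange, and the feature names other than "age" and "lenses" are placeholders.

```python
# Stand-alone illustration of the indexing rules; not Orange code.
class ToyInstance:
    def __init__(self, names, values, metas=None, meta_names=None):
        self.names = list(names)                  # feature names, in domain order
        self.values = list(values)                # ordinary features, fixed length
        self.metas = dict(metas or {})            # negative id -> meta value
        self.meta_names = dict(meta_names or {})  # name -> id (registered metas only)

    def __getitem__(self, key):
        if isinstance(key, str):
            # Names resolve to ordinary features first, then to registered metas.
            if key in self.names:
                return self.values[self.names.index(key)]
            return self.metas[self.meta_names[key]]
        if key < 0:
            # Negative indices are meta ids, not positions from the end.
            return self.metas[key]
        return self.values[key]


example = ToyInstance(
    names=["age", "prescription", "astigmatic", "tear_rate", "lenses"],
    values=["young", "myope", "no", "reduced", "none"],
    metas={-2: 0.84},
)
print(example[0])      # young
print(example["age"])  # young
print(example[-2])     # 0.84
```

In this model ``example[-1]`` raises ``KeyError`` rather than returning the class label, which mirrors the point that negative indices address meta attributes.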

Meta attributes

Meta attributes provide a way to attach additional information to data instances, for example a patient's id or the number of times the instance was misclassified during some test procedure. The most common additional information is the instance's weight. These attributes do not appear in induced models.

Instances from the same domain do not need to have the same meta attributes. Meta attributes are hence addressed not by position but by their ids, which are represented by negative indices. Ids are generated by the function :obj:`Orange.feature.Descriptor.new_meta_id()`. Ids can be reused for multiple domains.

A domain descriptor can, but does not need to, know about meta descriptors. See the documentation on :obj:`Orange.data.Domain` for more on that.

If the domain associates a particular descriptor with the meta attribute, the descriptor or its name can also be used for indexing. When registering meta attributes with domains, it is recommended to use the same id for the same attribute in all domains.

Meta values can also be loaded from files in tab-delimited format.

Meta attributes are often used as weights. Many procedures, such as learning algorithms, accept the id of the meta attribute defining the weights of instances as an additional argument.

The following example adds a meta attribute with a random value to each data instance.

.. literalinclude:: code/instance-metavar.py
    :lines: 1-

The code prints out:

['young', 'myope', 'no', 'reduced', 'none'], {-2:0.84}

(except for a different random value). The data instance now consists of two parts: ordinary features, which resemble a list since they are addressed by position (e.g. the first value is "young"), and meta values, which behave more like a dictionary, where the id (-2) is the key and 0.84 is the value (of type :obj:`Orange.data.Value`).

To tell the learning algorithm to use the weights, the id needs to be passed along with the data:

bayes = Orange.classification.bayes.NaiveLearner(data, id)

Many other functions accept weights in similar fashion.

The code

print orange.getClassDistribution(data)
print orange.getClassDistribution(data, id)

prints out

<15.000, 5.000, 4.000>
<9.691, 3.232, 1.969>

where the first line is the actual distribution and the second a distribution with random weights assigned to the instances.
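The difference between the two distributions can be sketched in plain Python: the unweighted distribution counts instances, while the weighted one sums the weights per class. The function ``class_distribution`` and the sample data are hypothetical and do not use Orange.

```python
from collections import defaultdict


def class_distribution(classes, weights=None):
    """Sum the weights per class label; each instance weighs 1.0 by default."""
    if weights is None:
        weights = [1.0] * len(classes)
    totals = defaultdict(float)
    for label, weight in zip(classes, weights):
        totals[label] += weight
    return dict(totals)


classes = ["none", "soft", "none", "hard"]
weights = [0.5, 0.25, 0.25, 1.0]   # made-up weights, one per instance

print(class_distribution(classes))           # {'none': 2.0, 'soft': 1.0, 'hard': 1.0}
print(class_distribution(classes, weights))  # {'none': 0.75, 'soft': 0.25, 'hard': 1.0}
```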

Registering the meta attribute using :obj:`Orange.data.Domain.add_meta` changes how the data instance is printed out and how it can be accessed:

w = Orange.feature.Continuous("w")
data.domain.add_meta(id, w)

The meta attribute can now be indexed just like ordinary features. The following three statements are equivalent:

print data[0][id]
print data[0][w]
print data[0]["w"]

Another consequence of registering the meta attribute is that it allows for conversion from Python native types:

ok = Orange.feature.Discrete("ok?", values=["no", "yes"])
ok_id = Orange.feature.Descriptor.new_meta_id()
data.domain.add_meta(ok_id, ok)
data[0][ok_id] = "yes"

The last line fails unless the attribute is registered since Orange does not know which variable descriptor to use to convert the string "yes" to an attribute value.
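Why registration matters can be illustrated without Orange: a string such as "yes" only becomes a value once some descriptor supplies the list of legal symbols. In the sketch below ``ToyDiscrete``, ``registered`` and ``convert`` are hypothetical names, and raising ``TypeError`` is an arbitrary choice, not necessarily the error Orange reports.

```python
class ToyDiscrete:
    """Minimal stand-in for a discrete variable descriptor."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def to_index(self, symbol):
        # Raises ValueError if the symbol is not among the declared values.
        return self.values.index(symbol)


registered = {}  # meta id -> descriptor, playing the role of the domain


def convert(meta_id, raw):
    """Convert a native Python value using the descriptor registered for meta_id."""
    descriptor = registered.get(meta_id)
    if descriptor is None:
        raise TypeError("no descriptor registered for meta id %d" % meta_id)
    return descriptor.to_index(raw)


ok = ToyDiscrete("ok?", ["no", "yes"])
ok_id = -3  # made-up meta id

try:
    convert(ok_id, "yes")
except TypeError as error:
    print(error)              # conversion is impossible before registration

registered[ok_id] = ok
print(convert(ok_id, "yes"))  # 1
```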

Hashing

Data instances compute hashes using CRC32 and can thus be used as keys in dictionaries or collected in Python sets.
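The idea can be sketched with the standard ``zlib.crc32`` function. ``HashableRow`` is a hypothetical class, and hashing the ``repr`` of the values is an assumption made for the example; the bytes that Orange actually feeds into CRC32 are not described here.

```python
import zlib


class HashableRow:
    """Toy row whose hash is a CRC32 checksum of its values."""

    def __init__(self, values):
        self.values = tuple(values)

    def __eq__(self, other):
        return isinstance(other, HashableRow) and self.values == other.values

    def __hash__(self):
        # Equal rows produce equal byte strings and therefore equal checksums.
        return zlib.crc32(repr(self.values).encode("utf-8"))


a = HashableRow(["young", "myope", "no"])
b = HashableRow(["young", "myope", "no"])

counts = {a: 1}
counts[b] = counts.get(b, 0) + 1   # b is found under the key a

print(len(counts), counts[a])      # 1 2
print(len({a, b}))                 # 1
```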

.. attribute:: domain

    The domain to which the data instance corresponds. This
    attribute is read-only.

.. method:: __init__(domain[, values])

    Construct a data instance with the given domain and initialize
    the values. Values are given as a list of
    objects that can be converted into values of corresponding
    variables: strings and integer indices (for discrete variables),
    strings or numbers (for continuous variables), or instances of
    :obj:`Orange.data.Value`.

    If values are omitted, they are set to unknown.

    :param domain: domain descriptor
    :type domain: Orange.data.Domain
    :param values: A list of values
    :type values: list

    The following example loads data on lenses and constructs
    another data instance from the same domain.

    .. literalinclude:: code/instance-construct.py
        :lines: 1-5

    The same can be done using other representations of values:

    .. literalinclude:: code/instance-construct.py
        :lines: 7-8

.. method:: __init__([domain,] instance)

    Construct a new data instance as a shallow copy of the
    original. If a domain descriptor is given, the instance is
    converted to another domain.

    :param domain: domain descriptor
    :type domain: Orange.data.Domain
    :param instance: Data instance
    :type instance: :obj:`Instance`

    The following example constructs a reduced domain and a data
    instance in this domain. ::

        domain_red = Orange.data.Domain(["age", "lenses"], domain)
        inst_red = Orange.data.Instance(domain_red, inst)

.. method:: __init__(domain, instances)

    Construct a new data instance for the given domain, where the
    feature values are found in the provided instances using
    both their ordinary features and meta attributes that are
    registered with their corresponding domains. The new instance
    also includes the meta attributes that appear in the provided
    instances and whose values are not used for the instance's
    features.

    :param domain: domain descriptor
    :type domain: Orange.data.Domain
    :param instances: data instances
    :type instances: list of Orange.data.Instance

    .. literalinclude:: code/instance_merge.py
        :lines: 3-

    The new domain consists of variables from `data1` and `data2`:
    `a1`, `a3` and `m1` are ordinary features, and `m2` and `a2`
    are meta attributes in the new domain. `m2` has the
    same meta attribute id as it has in `data1`, while `a2` gets a
    new meta id. In addition, the new domain has two new
    attributes, `n1` and `n2`.

    Here is the output::

        First example:  [1, 2], {"m1":3, "m2":4}
        Second example:  [1, 2.5], {"m1":3, "m3":4.5}
        Merge:  [1, 2.5, 3, ?], {"a2":2, "m2":4, -5:4.50, "n2":?}


    Since attributes `a1` and `m1` appear in the domains of both
    original instances, the new instance can only be constructed if
    their values match. `a3` comes from the second instance, and
    meta attributes `a2` and `m2` come from the first one. The
    meta attribute `m3` is also copied from the second instance;
    since it is not registered within the new domain, it is
    printed out with an id (-5) instead of with a name. Values of
    the two new attributes are left undefined.
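The merging rules can be approximated in plain Python. This is a simplified sketch under stated assumptions: instances are modelled as dictionaries from names to values, ordinary features and meta attributes are not distinguished, meta ids are ignored, and ``None`` stands for an undefined value. The function ``merge`` is hypothetical and is not the Orange constructor.

```python
def merge(target_names, instances):
    """Build one record for target_names from several name -> value records."""
    merged = {}
    for name in target_names:
        found = [inst[name] for inst in instances if name in inst]
        # A value supplied by several instances must be the same in all of them.
        if any(value != found[0] for value in found):
            raise ValueError("conflicting values for %r" % name)
        merged[name] = found[0] if found else None   # None marks "undefined"
    # Values not requested by the target are carried along as extras.
    for inst in instances:
        for name, value in inst.items():
            merged.setdefault(name, value)
    return merged


first = {"a1": 1, "a2": 2, "m1": 3, "m2": 4}
second = {"a1": 1, "a3": 2.5, "m1": 3, "m3": 4.5}

result = merge(["a1", "a3", "m1", "n1", "a2", "m2", "n2"], [first, second])
print(result)
```

The values agree with the output shown above: `a1`, `a3` and `m1` are filled in, `n1` and `n2` stay undefined, `a2` and `m2` come from the first record, and `m3` is carried over from the second.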

.. method:: native([level])

    Convert the instance into an ordinary Python list. If the
    optional argument `level` is 1 (default), the result is a list of
    instances of :obj:`Orange.data.Value`. If it is 0, it contains
    pure Python objects, that is, strings for discrete variables
    and numbers for continuous ones.

.. method:: compatible(other, ignore_class=False)

    Return ``True`` if the two instances are compatible, that
    is, equal in every feature whose value is known in both of
    them. The optional second argument can be used to omit the
    class from the comparison.
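A minimal sketch of the compatibility test, assuming instances are plain lists with the class as the last element and ``None`` marking a missing value. The function below is an illustration only, not the Orange implementation.

```python
def compatible(first, second, ignore_class=False):
    """True if the lists agree wherever both values are known."""
    if ignore_class:
        # Assumes the class is the last element of each list.
        first, second = first[:-1], second[:-1]
    return all(a is None or b is None or a == b
               for a, b in zip(first, second))


full = ["young", "myope", "no", "none"]
partial = ["young", None, "no", "none"]      # prescription is missing
other_class = ["young", "myope", "no", "soft"]

print(compatible(full, partial))                         # True
print(compatible(full, other_class))                     # False
print(compatible(full, other_class, ignore_class=True))  # True
```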

.. method:: get_class()

    Return the instance's class as :obj:`Orange.data.Value`.

.. method:: get_classes()

    Return the values of multiple classes as a list of
    :obj:`Orange.data.Value`.

.. method:: set_class(value)

    Set the instance's class.

    :param value: the instance's new class
    :type value: :obj:`Orange.data.Value`, number or string

.. method:: set_classes(values)

    Set the values of multiple classes.

    :param values: a list of values; the length must match the number of multiple classes
    :type values: list

.. method:: get_metas([key_type])

    Return a dictionary containing meta values of the data
    instance. The argument ``key_type`` can be ``int`` (default),
    ``str`` or :obj:`Orange.feature.Descriptor` and
    determines whether
    the dictionary keys are meta ids, variable names or
    variable descriptors. In the latter two cases, only registered
    attributes are returned. ::

        data = Orange.data.Table("inquisition2")
        example = data[4]
        print example.get_metas()
        print example.get_metas(int)
        print example.get_metas(str)
        print example.get_metas(Orange.feature.Descriptor)

    :param key_type: the key type; either ``int``, ``str`` or :obj:`~Orange.feature.Descriptor`
    :type key_type: ``type``
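The effect of ``key_type`` can be sketched with two dictionaries, one holding the meta values stored on the instance and one holding the names registered with the domain. ``toy_get_metas`` and the sample ids are hypothetical, and descriptor keys are left out of this sketch.

```python
def toy_get_metas(meta_values, registered_names, key_type=int):
    """Return meta values keyed by id (int) or by registered name (str)."""
    if key_type is int:
        return dict(meta_values)
    if key_type is str:
        # Unregistered meta attributes have no name and are skipped.
        return {
            registered_names[meta_id]: value
            for meta_id, value in meta_values.items()
            if meta_id in registered_names
        }
    raise TypeError("this sketch supports only int and str keys")


meta_values = {-2: 0.84, -5: 4.5}   # id -> value stored on the instance
registered_names = {-2: "w"}        # id -> name known to the domain

print(toy_get_metas(meta_values, registered_names))       # {-2: 0.84, -5: 4.5}
print(toy_get_metas(meta_values, registered_names, str))  # {'w': 0.84}
```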

.. method:: get_metas(optional, [key_type])

    Similar to above, but return a dictionary that contains
    only non-optional attributes (if ``optional`` is 0) or
    only optional attributes.

    :param optional: tells whether to return optional or non-optional attributes
    :type optional: ``bool``
    :param key_type: the key type; either ``int``, ``str`` or :obj:`~Orange.feature.Descriptor`
    :type key_type: ``type``

.. method:: has_meta(attr)

    Return ``True`` if the data instance has the specified meta
    attribute.

    :param attr: meta attribute
    :type attr: ``int`` (meta id), ``str`` or :obj:`~Orange.feature.Descriptor`

.. method:: remove_meta(attr)

    Remove the specified meta attribute.

    :param attr: meta attribute
    :type attr: ``int`` (meta id), ``str`` or :obj:`~Orange.feature.Descriptor`

.. method:: get_weight(attr)

    Return the value of the specified meta attribute. The
    attribute's value must be continuous and is returned as ``float``.

    :param attr: meta attribute
    :type attr: ``int`` (meta id), ``str`` or :obj:`~Orange.feature.Descriptor`

.. method:: set_weight(attr, weight=1)

    Set the value of the specified meta attribute to ``weight``.

    :param attr: meta attribute
    :type attr: ``int`` (meta id), ``str`` or :obj:`~Orange.feature.Descriptor`
    :param weight: weight of instance
    :type weight: ``float``