Domain description (domain)
A domain description stores a list of features, class(es) and meta attribute descriptors. A domain descriptor is attached to every table in Orange to assign names and types to the corresponding columns. Columns in an Orange.data.Table have the roles of attributes (features, independent variables), class(es) (targets, outcomes, dependent variables) and meta attributes; in parallel, the domain descriptor stores their descriptions in collections of variable descriptors of type Orange.data.Variable.
Domain descriptors are also stored in predictive models and other objects to facilitate automated conversions between domains, as described below.
Domains are most often constructed automatically when loading the data or wrapping numpy arrays into Orange's Table.
>>> from Orange.data import Table
>>> iris = Table("iris")
>>> iris.domain
[sepal length, sepal width, petal length, petal width | iris]
- class Orange.data.Domain(attributes, class_vars=None, metas=None, source=None)
- attributes
A tuple of descriptors (instances of Orange.data.Variable) for attributes (features, independent variables).
>>> iris.domain.attributes
(ContinuousVariable('sepal length'), ContinuousVariable('sepal width'), ContinuousVariable('petal length'), ContinuousVariable('petal width'))
- class_var
Class variable if the domain has a single class; None otherwise.
>>> iris.domain.class_var
DiscreteVariable('iris')
- class_vars
A tuple of descriptors for class attributes (outcomes, dependent variables).
>>> iris.domain.class_vars
(DiscreteVariable('iris'),)
- variables
A tuple of attributes and class attributes (the concatenation of the above).
>>> iris.domain.variables
(ContinuousVariable('sepal length'), ContinuousVariable('sepal width'), ContinuousVariable('petal length'), ContinuousVariable('petal width'), DiscreteVariable('iris'))
- metas
List of meta attributes.
- anonymous
True if the domain was constructed when converting a numpy array to Orange.data.Table. Such domains can be converted to and from other domains even if they consist of different variable descriptors, as long as their number and types match.
- __init__(attributes, class_vars=None, metas=None, source=None)
Initialize a new domain descriptor. Arguments give the features and the class attribute(s). They can be described by descriptors (instances of Variable), or by indices or names if the source domain is given.
- Parameters:
attributes (list of Variable) -- a list of attributes
class_vars (Variable or list of Variable) -- target variable or a list of target variables
metas (list of Variable) -- a list of meta attributes
source (Orange.data.Domain) -- the source domain for attributes
- Returns:
a new domain
- Return type:
Orange.data.Domain
The following script constructs a domain with a discrete feature gender and continuous feature age, and a continuous target salary.
>>> from Orange.data import Domain, DiscreteVariable, ContinuousVariable
>>> domain = Domain([DiscreteVariable.make("gender"),
...                  ContinuousVariable.make("age")],
...                 ContinuousVariable.make("salary"))
>>> domain
[gender, age | salary]
This constructs a new domain with some features from the Iris dataset and a new feature color.
>>> new_domain = Domain(["sepal length",
...                      "petal length",
...                      DiscreteVariable.make("color")],
...                     iris.domain.class_var,
...                     source=iris.domain)
>>> new_domain
[sepal length, petal length, color | iris]
- classmethod from_numpy(X, Y=None, metas=None)
Create a domain corresponding to the given numpy arrays. This method is usually invoked from Orange.data.Table.from_numpy().
All attributes are assumed to be continuous and are named "Feature <n>". Target variables are discrete if the only two values are 0 and 1; otherwise they are continuous. Discrete targets are named "Class <n>" and continuous targets are named "Target <n>". The domain is marked as anonymous, so data from any other domain of the same shape can be converted into this one and vice versa.
- Parameters:
X (numpy.ndarray) -- 2-dimensional array with data
Y (numpy.ndarray or None) -- 1- or 2-dimensional data for target
metas (numpy.ndarray or None) -- meta attributes
- Returns:
a new domain
- Return type:
Orange.data.Domain
>>> import numpy as np
>>> from Orange.data import Domain
>>> X = np.arange(20, dtype=float).reshape(5, 4)
>>> Y = np.array([0, 1, 0, 1, 1])
>>> domain = Domain.from_numpy(X, Y)
>>> domain
[Feature 1, Feature 2, Feature 3, Feature 4 | Class 1]
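The naming rule described for from_numpy can be illustrated without Orange. The following is a minimal pure-Python sketch of that rule only, not Orange's implementation: describe_columns is a hypothetical helper, it works on plain lists instead of numpy arrays, and numbering targets by their column position is an assumption.

```python
def describe_columns(X, Y=None):
    """Name columns following the rule described for from_numpy.

    X is a list of rows. Y is a flat list (one target) or a list of rows
    (several targets). Hypothetical helper, not part of Orange.
    """
    n_features = len(X[0]) if X else 0
    features = [f"Feature {i + 1}" for i in range(n_features)]

    targets = []
    if Y is not None:
        # Treat a flat list as a single target column.
        rows = [y if isinstance(y, (list, tuple)) else [y] for y in Y]
        for i, column in enumerate(zip(*rows)):
            # Discrete only if the values present are exactly 0 and 1.
            if set(column) == {0, 1}:
                targets.append((f"Class {i + 1}", "discrete"))
            else:
                # Assumption: targets are numbered by column position.
                targets.append((f"Target {i + 1}", "continuous"))
    return features, targets


features, targets = describe_columns([[1.0, 2.0], [3.0, 4.0]], [0, 1])
print(features)
print(targets)
```

A target column that contains any value other than 0 and 1, or only one of the two, is treated as continuous in this sketch.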
- __getitem__(idx)
Return a variable descriptor from the given argument, which can be a descriptor, index or name. If idx is a descriptor, the function returns this same object.
- Parameters:
idx (int, str or Variable) -- index, name or descriptor
- Returns:
an instance of Variable described by idx
- Return type:
Variable
>>> iris.domain[1:3]
(ContinuousVariable('sepal width'), ContinuousVariable('petal length'))
- __len__()
The number of variables (features and class attributes).
Currently, the length counts only features and class attributes. In the near future, it will also include metas, and __iter__ will behave accordingly.
- __contains__(item)
Return True if the item (str, int, Variable) is in the domain.
>>> "petal length" in iris.domain
True
>>> "age" in iris.domain
False
- index(var)
Return the index of the given variable or meta attribute, represented with an instance of Variable, int or str.
>>> iris.domain.index("petal length")
2
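How a single lookup can accept a descriptor, an index or a name is easier to see in a small model. The sketch below is a hypothetical miniature written in plain Python, not Orange's code: ToyVariable and ToyDomain are invented for illustration, and meta attributes are left out entirely.

```python
class ToyVariable:
    """Hypothetical stand-in for a variable descriptor; only the name matters."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"ToyVariable({self.name!r})"


class ToyDomain:
    """Hypothetical miniature of a domain: attributes, then class variables."""

    def __init__(self, attributes, class_vars=()):
        self.variables = tuple(attributes) + tuple(class_vars)

    def index(self, key):
        """Return the position of key, given as a descriptor, int or name."""
        if isinstance(key, int):
            if not 0 <= key < len(self.variables):
                raise ValueError(f"index {key} out of range")
            return key
        for i, var in enumerate(self.variables):
            # Match the descriptor object itself, or its name.
            if var is key or var.name == key:
                return i
        raise ValueError(f"{key!r} is not in the domain")

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self.variables[key]
        return self.variables[self.index(key)]

    def __contains__(self, key):
        try:
            self.index(key)
        except ValueError:
            return False
        return True

    def __len__(self):
        # Counts features and class variables only, as described above.
        return len(self.variables)


names = ["sepal length", "sepal width", "petal length", "petal width"]
domain = ToyDomain([ToyVariable(n) for n in names], [ToyVariable("iris")])
print(domain.index("petal length"))  # 2
print("age" in domain)               # False
print(domain[1:3])
```

Routing __getitem__ and __contains__ through index keeps the three kinds of keys handled in one place.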
- has_discrete_attributes(include_class=False, include_metas=False)
Return True if domain has any discrete attributes. If include_class is set, the check includes the class attribute(s). If include_metas is set, the check includes the meta attributes.
>>> iris.domain.has_discrete_attributes()
False
>>> iris.domain.has_discrete_attributes(include_class=True)
True
- has_continuous_attributes(include_class=False, include_metas=False)
Return True if domain has any continuous attributes. If include_class is set, the check includes the class attribute(s). If include_metas is set, the check includes the meta attributes.
>>> iris.domain.has_continuous_attributes()
True
Domain conversion
Domain descriptors also convert data instances between different domains.
In a typical scenario, we may want to discretize some continuous data before inducing a model. Discretizers (Orange.preprocess) construct a new data table with attribute descriptors (Orange.data.variable) that include the corresponding functions for conversion from continuous to discrete values. The trained model stores this domain descriptor and uses it to convert instances from the original domain to the discretized one at prediction time.
In general, instances are converted between domains as follows.
- If the target attribute appears in the source domain, the value is copied; two attributes are considered the same if they have the same descriptor.
- If the target attribute descriptor defines a function for value transformation, the value is transformed.
- Otherwise, the value is marked as missing.
Domains in which the anonymous flag is set are an exception to this rule. When the source or the target domain is anonymous, the two domains match if they have the same number of variables and the same variable types. In this case, the data is copied without considering the attribute descriptors.
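The conversion rules above, including the anonymous exception, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than Orange's implementation: ToyVariable, its transform hook and convert_row are hypothetical names, None stands in for a missing value, and an anonymous domain whose shape does not match simply falls back to the general rules.

```python
class ToyVariable:
    """Hypothetical descriptor with a kind and an optional transformation."""

    def __init__(self, name, kind="continuous", transform=None):
        self.name = name
        self.kind = kind
        # Hypothetical hook: a function (row, source) -> value.
        self.transform = transform


def convert_row(row, source, target, anonymous=False):
    """Convert one row from the source to the target list of descriptors."""
    if anonymous:
        same_shape = len(source) == len(target) and all(
            s.kind == t.kind for s, t in zip(source, target)
        )
        if same_shape:
            # Anonymous exception: copy values, ignoring the descriptors.
            return list(row)

    converted = []
    for var in target:
        # Rule 1: the same descriptor (identity, not name) appears in the source.
        position = next((i for i, s in enumerate(source) if s is var), None)
        if position is not None:
            converted.append(row[position])
        # Rule 2: the target descriptor knows how to compute its value.
        elif var.transform is not None:
            converted.append(var.transform(row, source))
        # Rule 3: otherwise the value is missing (None is a stand-in here).
        else:
            converted.append(None)
    return converted


age = ToyVariable("age")
salary = ToyVariable("salary")
age_group = ToyVariable(
    "age group",
    kind="discrete",
    transform=lambda row, source: "young" if row[source.index(age)] < 40 else "old",
)
height = ToyVariable("height")

result = convert_row([35.0, 2100.0], [age, salary], [salary, age_group, height])
print(result)  # [2100.0, 'young', None]
```

Matching by object identity mirrors the statement that two attributes are the same only if they share a descriptor; a second ToyVariable named "age" would not match the first.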