Cubes - Lightweight Python OLAP Framework
Release 0.10
Stefan Urbanek
CONTENTS

1 Introduction
    1.1 Why cubes?
    1.2 Cube, Dimensions, Facts and Measures
    1.3 Architecture
2 Installation
    2.1 Basic Installation
    2.2 Quick Start or Hello World!
    2.3 Customized Installation
3 Logical Model and Metadata
    3.1 Logical Model description
    3.2 Model validation
4 Logical to Physical Model Mapping
    4.1 Implicit Mapping
    4.2 Dimension tables
    4.3 Database Schemas
    4.4 Explicit Mapping
    4.5 Date Data Type
    4.6 Localization
    4.7 Customization of the Implicit
    4.8 Mapping Process Summary
    4.9 Join
    4.10 Aliases
5 Aggregation Browsing and Aggregations
    5.1 ROLAP with SQL backend
    5.2 Cell Details
    5.3 Hierarchies, levels and drilling-down
    5.4 Multiple Hierarchies
6 Creating Cubes
    6.1 Relational Database (SQL)
7 Localization
    7.1 Metadata Localization
    7.2 Data Localization
    7.3 Localized Reporting
8 OLAP Server
    8.1 HTTP API
    8.2 Reports
    8.3 Running and Deployment
9 slicer - Command Line Tool
    9.1 serve
    9.2 model validate
    9.3 model json
    9.4 model extract_locale
    9.5 model translate
    9.6 ddl
    9.7 denormalize
10 Reference
    10.1 Logical Model Reference
    10.2 Aggregation Browser Reference
    10.3 backends Aggregation Browsing Backends
    10.4 HTTP WSGI OLAP Server Reference
    10.5 Utility functions
11 Developing Cubes
    11.1 General
    11.2 New or changed feature checklist
12 Development Notes
    12.1 Fact Table
13 Contact and Getting Help
14 License
15 Indices and tables
Python Module Index
Index
Cubes is a light-weight Python framework and set of tools for Online Analytical Processing (OLAP), multidimensional analysis and browsing of aggregated data. It is part of Data Brewery. Contents:
CHAPTER
ONE
INTRODUCTION
1.1 Why cubes?
Focus on data analysis, in a human way.

The purpose is to provide a framework that gives analysts and application end-users an understandable and natural way of presenting multidimensional data. One of the main features is the logical model, which serves as an abstraction over physical data to provide an end-user layer. It is meant to be used by application builders who want to provide analytical functionality.

Features:

* logical view of analysed data - how analysts look at data, how they think of data, not how the data are physically implemented in the data stores
* hierarchical dimensions (attributes that have hierarchical dependencies, such as category-subcategory or country-region)
* localizable metadata and data (see Localization)
* OLAP and aggregated browsing (default backend is for relational database - ROLAP)
* multidimensional analysis
1.3 Architecture
The framework is composed of four modules and one command-line tool:
* model - Description of data (metadata): dimensions, hierarchies, attributes, labels, localizations.
* browser - Aggregation browsing, slicing-and-dicing, drill-down.
* backends - Actual aggregation implementation and utility functions.
* server - WSGI HTTP server for Cubes
* slicer - command-line tool

Figure 1.1: a data cube
1.3.1 Model
The logical model describes the data from the user's or analyst's perspective: how the data are measured, aggregated and reported. The model is independent of the physical implementation of the data. This physical independence makes it easier to focus on the data instead of on ways to get the data into an understandable form. More information about the logical model can be found in the chapter Logical Model and Metadata. See also the programming reference of the model module.
1.3.2 Browser
The core of the Cubes analytics functionality is the aggregation browser. The browser module contains utility classes and functions that the browser needs to work. More information about the browser can be found in the chapter Aggregation Browsing and Aggregations. See also the programming reference of the browser module.
1.3.3 Backends
Backends provide the actual data aggregation and browsing functionality. Cubes comes with a built-in ROLAP backend which uses an SQL database through SQLAlchemy. The framework is modular and supports multiple database backends, and therefore different ways of computing cubes and of browsing aggregated data. See also the programming reference of the backends module.
1.3.4 Server
Cubes comes with a built-in WSGI HTTP OLAP server called slicer, which provides a JSON API for most of the Cubes framework functionality. The server is based on the Werkzeug WSGI framework. More information about the Slicer server requests can be found in the chapter OLAP Server. See also the programming reference of the server module.
CHAPTER
TWO
INSTALLATION
There are two ways to install Cubes: the basic common installation, recommended mostly for users starting with Cubes, and a customized installation with the requirements explained.

Note: The command-line tool Slicer does not require knowledge of Python. You do not need to know the language if you just want to serve OLAP data.

To satisfy the requirements quickly, install the packages:
pip install sqlalchemy werkzeug
The requirements for SQLAlchemy and Werkzeug are optional and you do not need them if you are going to use another kind of backend. Install:
cd cubes
pip install -r requirements-optional.txt
python setup.py install
CHAPTER
THREE
* URL with a JSON dictionary
* a directory with logical model description files (model, cubes, dimensions) - note that this is the old way of specifying a model and is being deprecated

The model can also be represented as a single JSON file containing all model objects.

The directory contains:

File                        Description
model.json                  Core model information
cube_*cube_name*.json       Cube description, one file per cube
dim_*dimension_name*.json   Dimension description, one file per dimension
3.1.1 Model
The model dictionary contains main model description. The structure is:
{
    "name": "public_procurements",
    "label": "Public Procurements of Slovakia",
    "description": "Contracts of public procurement winners in Slovakia",
    "cubes": [...],
    "dimensions": [...]
}

Key          Description
cubes        list of cube descriptions
dimensions   list of dimension descriptions
name         model name (optional)
label        human readable name - can be used in an application (optional)
description  longer human-readable description of the model (optional)
3.1.2 Cubes
Cube descriptions are stored as a dictionary under the key cubes in the model description dictionary, or in JSON files with the prefix cube_, such as cube_contracts.

Key         Description
name        cube name
measures    list of cube measures (recommended, but might be empty for measure-less, record count only cubes)
dimensions  list of cube dimension names (recommended, but might be empty for dimension-less cubes)
label       human readable name - can be used in an application
details     list of fact details (as Attributes) - attributes that are not relevant to aggregation, but are nice-to-have when displaying facts (might be separately stored)
joins       specification of physical table joins (required for star/snowflake schema)
mappings    mapping of logical attributes to physical attributes
options     backend/workspace options
info        custom info, such as formatting. Not used by cubes framework.

Example:

{
    "name": "date",
    "label": "Dátum",
    "dimensions": [ "date", ... ],
    "measures": [...],
For more information about mappings see Logical to Physical Model Mapping
3.1.3 Dimensions
Dimension descriptions are stored in model dictionary under the key dimensions.
Figure 3.3: Dimension description - attributes.

The dimension description contains keys:

Key                     Description
name                    dimension name, used as identifier
label                   human readable name - can be used in an application
levels                  list of level descriptions
hierarchies             list of dimension hierarchies
hierarchy               if dimension has only one hierarchy, you can specify it under this key
default_hierarchy_name  name of a hierarchy that will be used as default
info                    custom info, such as formatting. Not used by cubes framework.

Example:

{
    "name": "date",
    "label": "Dátum",
    "levels": [ ... ],
    "attributes": [ ... ],
    "hierarchies": [ ... ]
}
Use either hierarchies or hierarchy; using both results in an error.

Hierarchy levels are described as:

Key              Description
name             level name, used as identifier
label            human readable name - can be used in an application
attributes       list of other additional attributes that are related to the level. The attributes are not being used for aggregations, they provide additional useful information.
key              key field of the level (customer number for customer level, region code for region level, year-month for month level). key will be used as a grouping field for aggregations. Key should be unique within level.
label_attribute  name of attribute containing label to be displayed (customer name for customer level, region name for region level, month name for month level)
info             custom info, such as formatting. Not used by cubes framework.

Example of month level of date dimension:
{
    "name": "month",
    "label": "Mesiac",
    "key": "month",
    "label_attribute": "month_name",
    "attributes": ["month", "month_name", "month_sname"]
},
Hierarchies are described as:

Key     Description
name    hierarchy name, used as identifier
label   human readable name - can be used in an application
levels  ordered list of level names from top to bottom - from least detailed to most detailed (for example: from year to day, from country to city)
Example:

"hierarchies": [
    { "name": "default", "levels": ["year", "month"] },
    { "name": "ymd", "levels": ["year", "month", "day"] },
    { "name": "yqmd", "levels": ["year", "quarter", "month", "day"] }
]
3.1.4 Attributes
Measures and dimension level attributes can be specified either as rich metadata or simply as strings. If only a string is specified, then all attribute metadata will have default values and the label will be equal to the attribute name.

Key           Description
name          attribute name (should be unique within a dimension)
label         human readable name - can be used in an application, localizable
order         natural order of the attribute (optional), can be asc or desc
locales       list of locales in which the attribute values are available (optional)
aggregations  list of aggregations to be performed if the attribute is a measure
info          custom info, such as formatting. Not used by cubes framework.
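The expansion of a plain string into full attribute metadata can be pictured with a short sketch. This is not the framework's own code: the function name expand_attribute is hypothetical, and the defaults for order, locales, aggregations and info are assumptions. Only the rule that the label defaults to the attribute name comes from the text above.

```python
def expand_attribute(attr):
    """Return a full metadata dictionary for an attribute.

    `attr` is either a plain string (attribute name) or a dictionary
    with at least the key "name". Hypothetical helper for illustration,
    not part of the cubes API.
    """
    if isinstance(attr, str):
        attr = {"name": attr}
    return {
        "name": attr["name"],
        # Rule from the text: the label defaults to the attribute name.
        "label": attr.get("label", attr["name"]),
        # Assumed defaults for the optional keys.
        "order": attr.get("order"),
        "locales": list(attr.get("locales", [])),
        "aggregations": list(attr.get("aggregations", [])),
        "info": dict(attr.get("info", {})),
    }


attributes = ["group_code", {"name": "group_name", "order": "asc"}]
for attribute in attributes:
    print(expand_attribute(attribute))
```

Both forms end up with the same set of keys, so code that consumes attributes does not have to care how they were written in the model.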
The optional order is used in aggregation browsing and reporting. If specified, then all queries will have results sorted by this field in the specified direction. Level hierarchy is used to order ordered attributes. Only one ordered attribute should be specified per dimension level, otherwise the behavior is unpredictable. This natural (or default) order can later be overridden in reports by explicitly specifying another ordering direction or attribute. Explicit order takes precedence over natural order. For example, you might want to specify that all dates should be ordered by default:

"attributes": [
    {"name": "year", "order": "asc"}
]
Locales is a list of locale names. Say we have a CPV dimension (common procurement vocabulary - EU procurement subject hierarchy) and we are reporting in Slovak, English and Hungarian. The attributes will therefore be specified as:

"attributes": [
    {"name": "group_code"},
    {"name": "group_name", "order": "asc", "locales": ["sk", "en", "hu"]}
]
Group name is localized, but group code is not. You can also see that the result will always be sorted alphabetically by group name in ascending order. See PhysicalAttributeMappings for more information about how logical attributes are mapped to the physical sources. In reports you do not specify a locale for each localized attribute; you specify the locale for the whole report or browsing session. Report queries remain the same for all languages.
results = model.validate()
This will return a list of tuples (result, message) where result might be warning or error. If validation reports errors, the model cannot be used without resulting in failure. If there are warnings, some functionality might fail or might not work as expected. You can validate a model from the command line:
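A minimal sketch of how the returned list might be processed is shown here. It assumes that result is one of the plain strings "warning" or "error"; the sample list stands in for the return value of model.validate(), and the function name split_validation_results is hypothetical.

```python
def split_validation_results(results):
    """Separate (result, message) tuples into errors and warnings."""
    errors = [message for result, message in results if result == "error"]
    warnings = [message for result, message in results if result == "warning"]
    return errors, warnings


# Sample data standing in for the return value of model.validate().
results = [
    ("warning", "No cubes defined"),
    ("error", "Duplicate measure amount in cube contracts"),
]

errors, warnings = split_validation_results(results)
if errors:
    print("Model cannot be used:")
    for message in errors:
        print("  " + message)
for message in warnings:
    print("Warning: " + message)
```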
slicer model validate model.json
3.2.1 Errors
When any of the following validation errors occurs, then it is very probable that use of the model will result in failure.
Error:      Duplicate measure measure in cube cube
Object:     cube
Resolution: Two or more measures have the same name. Make sure that all measure names are unique within the cube, including detail attributes.

Error:      Duplicate detail detail in cube cube
Object:     cube
Resolution: Two or more detail attributes have the same name. Make sure that all detail attribute names are unique within the cube, including measures.

Error:      Duplicate detail detail in cube cube - specified also as measure
Object:     cube
Resolution: A detail attribute has the same name as one of the measures. Make sure that all detail attribute names are unique within the cube, including measures.

Error:      No hierarchies in dimension dimension, more than one levels exist (count)
Object:     dimension
Resolution: There is more than one level specified in the dimension, but no hierarchy is defined. Specify a hierarchy with the expected order of the levels.

Error:      No default hierarchy specified, there is more than one hierarchy in dimension dimension
Object:     dimension
Resolution: Dimension has more than one hierarchy, but none of them is specified as default. Set the default_hierarchy_name to the desired default hierarchy.

Error:      Default hierarchy hierarchy does not exist in dimension dimension
Object:     dimension
Resolution: There is no hierarchy in the dimension with the name specified as default_hierarchy_name. Make sure that the default hierarchy name refers to an existing hierarchy within the dimension.

Error:      Level level in dimension dimension has no attributes
Object:     dimension
Resolution: There are no attributes specified for the level. Set attributes during Level object creation. This error should not appear when creating a model from a file.

Error:      Key key in level level in dimension dimension is not in levels attribute list
Object:     dimension
Resolution: Key should be one of the attributes specified for the level. Either add the key to the attribute list (preferably at the beginning) or choose another attribute as the level key.

Error:      Duplicate attribute attribute in dimension dimension level level (also defined in level another_level)
Object:     dimension
Resolution: attribute is defined in two or more levels in the same dimension. Make sure that attribute names are all unique within one dimension. The most common duplicates are id and name. The recommended fix is to use a level prefix: country_id and country_name.

Error:      Dimension (dim1) of attribute attr does not match with owning dimension dim2
Object:     dimension
Resolution: This might happen when creating a model programmatically. Make sure that an attribute added to the dimension level has its dimension attribute properly set to the dimension it is going to be part of (dim2).

Error:      Dimension dimension is not instance of Attribute
Object:     model
Resolution: When creating a dimension programmatically, make sure that all attributes added to the dimension level are instances of cubes.Attribute. You should not see this error when loading a model from a file.

Object:     model
Resolution: When creating a model programmatically, make sure that all dimensions you add to the model are subclasses of Dimension. You should not see this error when loading a model from a file.

Object:     cube
Resolution: When creating a cube programmatically, make sure that all measures you add to the cube are subclasses of cubes.Attribute. You should not see this error when loading a model from a file.

Object:     cube
Resolution: When creating a cube programmatically, make sure that all detail attributes you add to the cube are subclasses of cubes.Attribute. You should not see this error when loading a model from a file.
The following list contains warning messages from the validation process. It is not recommended to use the model, as some issues might emerge.

Warning:    No cubes defined
Object:     model
Resolution: Model should contain at least one cube
The model construction uses some implicit defaults to satisfy the needs of a working model. The validator identifies where the defaults are going to be applied and adds information about them to the validation results. Consider them to be informative only. The model can be used, just make sure that the defaults reflect expected reality.

Warning:    No hierarchies in dimension dimension, flat level level will be used.
Object:     dimension
Resolution: There are no hierarchies specified in the dimension and there is only one level. A default hierarchy will be created with that one level.

Warning:    Level level in dimension dim has no key attribute specified, first attribute will be used: attr
Object:     dimension
Resolution: Each level should have a key attribute specified. If it does not, then the first attribute from the attribute list will be used as key.
CHAPTER
FOUR
Note: Although this chapter describes examples mostly in terms of the relational database backend, the principles are the same, or very similar, in other backends as well.

For example, take a reference to the attribute name in the dimension product. Which column of which table in which schema contains the value of this dimension attribute?
For data browsing, the Cubes framework has to know where those logical (reported) attributes are physically stored. It needs to know which tables are related to the cube and how they are joined together so that we get a whole view of a fact. The process is done in two steps:

1. joining relevant star/snowflake tables
2. mapping logical attribute to table + column

There are two ways the mapping can be done: implicit and explicit. The simplest, most straightforward and most customizable is the explicit way, where the actual column reference is provided in a mapping dictionary of the cube description.
It is quite common practice that dimension tables have a prefix such as dim_ or dm_. Such a prefix can be specified with the dimension_prefix option.
Basic rules:

* fact table should have the same name as the represented cube
* dimension table should have the same name as the represented dimension, for example: product (singular)
* column name should have the same name as the dimension attribute: name, code, description
* references without a dimension name in them are expected to be in the fact table, for example: amount, discount (see note below for simple flat dimensions)
* if an attribute is localized, then there should be one column per localization and it should have a locale suffix: description_en, description_sk, description_fr (see below for more information)

Flat dimension without details: What about dimensions that have only one attribute, for example when one would not have a full date but just a year? In this case the attribute is kept in the fact table without the need for a separate dimension table. The attribute is treated by the same rule as a measure and is referenced simply as year. This is applied to all dimensions that have only one attribute (representing the key as well). Such a dimension is referred to as flat and without details.

Note for advanced users: this behavior can be disabled by setting simplify_dimension_references to False in the mapper. In that case you will have to have a separate table for the dimension attribute and you will have to reference the attribute by its full name. This might be useful when you know that your dimension will become more detailed.

Note: In backends other than SQL, the implicit mapping might be implemented differently. Refer to the respective backend documentation to learn how the mapping is done.
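The naming rules above can be condensed into a short sketch. It is an illustration under stated assumptions rather than the mapper's actual code: the function name implicit_column and its parameters are hypothetical, and a logical reference is taken to be either "attribute" or "dimension.attribute".

```python
def implicit_column(reference, fact_table, dimension_prefix="", locale=None):
    """Map a logical reference to a (table, column) pair by naming rules.

    Hypothetical helper for illustration, not part of the cubes API.
    """
    if "." in reference:
        dimension, attribute = reference.split(".", 1)
        # Dimension table is named after the dimension, with optional prefix.
        table = dimension_prefix + dimension
    else:
        # Measures and flat dimensions have no dimension name in the
        # reference and are expected in the fact table.
        attribute = reference
        table = fact_table

    if locale:
        # Localized attributes have one column per locale, with a suffix.
        attribute = attribute + "_" + locale
    return (table, attribute)


print(implicit_column("amount", "contracts"))
print(implicit_column("product.name", "contracts", dimension_prefix="dim_"))
print(implicit_column("product.description", "contracts", locale="sk"))
```

The fact table name is passed in explicitly here; under the first rule it would simply be the cube name.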
Both explicit and implicit mappings can specify a default database schema (if you are using Oracle, PostgreSQL or any other database which supports schemas). The mapping process is like this:
Note: In other than SQL backends, the value in the mapping dictionary can be interpreted differently. The (schema, table, column) tuple is used as an example from SQL browser.
According to SQLAlchemy, you can extract in most of the databases: month, day, year, second, hour, doy (day of the year), minute, quarter, dow (day of the week), week, epoch, milliseconds, microseconds, timezone_hour, timezone_minute. Please refer to your database engine documentation for more information. Note: It is still recommended to have a date dimension table.
4.6 Localization
Although localization takes place first in the mapping process, we talk about it at the end, as it might not be such a commonly used feature. From the physical point of view, data localization is trivial and requires language denormalization - that means that each language has to have its own column for each attribute. Localizable attributes are those attributes that have locales specified in their definition. To map logical attributes which are localizable, use a locale suffix for each locale. For example, the attribute name in dimension category has two locales: Slovak (sk) and English (en). Or, for example, product category can be in English, Slovak or German. It is specified in the model like this:
"attributes": [
    {
        "name": "category",
        "locales": ["en", "sk", "de"]
    }
]
In short: if an attribute is localizable and a locale is requested, then the locale suffix is added. If no such localization exists then the default locale is used. Nothing happens to non-localizable attributes.

For such an attribute, three columns should exist in the physical model. There are two ways the columns can be named:

* With implicit mapping, they should have the attribute name with a locale suffix, such as category_sk and category_en (underscore, because it is more common in table column names).
* You can name the columns as you like, but you have to provide an explicit mapping in the mapping dictionary. The key for the localized logical attribute should have a .locale suffix, such as product.category.sk for the Slovak version of the category attribute of dimension product. Here the dot is used because dots separate logical reference parts.
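The locale rule can be sketched as a small function. The name localized_name is hypothetical, and the sketch assumes that the default locale is the first one listed for the attribute, which the text above does not state.

```python
def localized_name(name, locales, requested=None, separator="_"):
    """Append a locale suffix to `name` if the attribute is localizable.

    Hypothetical helper for illustration, not part of the cubes API.
    """
    if not locales:
        # Nothing happens to non-localizable attributes.
        return name
    # Assumption: the first listed locale serves as the default.
    locale = requested if requested in locales else locales[0]
    return name + separator + locale


locales = ["en", "sk", "de"]
# Physical column names for implicit mapping (underscore suffix).
print(localized_name("category", locales, "sk"))
print(localized_name("category", locales, "fr"))
# Key in the explicit mapping dictionary (dot suffix).
print(localized_name("product.category", locales, "sk", separator="."))
```

The same function covers both naming styles because they differ only in the separator: underscore for physical columns, dot for logical references.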
Note: The current implementation of the Cubes framework requires a star or snowflake schema that can be joined into a fully denormalized form just by simple one-key based joins. Therefore all localized attributes have to be stored in their own columns. In other words, you have to denormalize the localized data before using them in Cubes. Read more about Localization.
4.8.1 Joins
The star browser supports star and snowflake database schemas. If you are using either of the two schemas in a relational database, Cubes requires information on how to join the tables. Tables are joined by matching single-column surrogate keys. The framework needs the join information to be able to transform a snowflake schema so that it appears as a single denormalized table with all cube attributes.
4.9 Join
A single join description consists of a reference to the master table and to a table with details. The fact table is an example of a master table, a dimension is an example of a detail table (in a star schema).

Note: As mentioned before, only single column surrogate keys are supported for joins.

The join specification is very simple: you define a column reference for both master and detail. The table reference is in the form table.column:

"joins": [
    {
        "master": "fact_sales.product_key",
        "detail": "dim_product.key"
    }
]
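A minimal sketch of how a join reference might be normalized into a (schema, table, column) tuple follows. The function name physical_reference is hypothetical and this is not the backend's own code. Besides the table.column string form, it also accepts the dictionary form that the following paragraph introduces.

```python
def physical_reference(ref, default_schema=None):
    """Normalize a join reference to a (schema, table, column) tuple.

    Hypothetical helper for illustration, not part of the cubes API.
    """
    if isinstance(ref, dict):
        return (ref.get("schema", default_schema), ref["table"], ref["column"])
    # String form is expected to be exactly "table.column".
    table, column = ref.split(".")
    return (default_schema, table, column)


joins = [
    {
        "master": "fact_sales.product_id",
        "detail": {"schema": "sales", "table": "dim_products", "column": "id"},
    }
]

for join in joins:
    master = physical_reference(join["master"])
    detail = physical_reference(join["detail"])
    print(master, "->", detail)
```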
As in mappings, if you need to mention the database schema explicitly, or for any other reason the table.column reference is not enough, you might write:

"joins": [
    {
        "master": "fact_sales.product_id",
        "detail": {
            "schema": "sales",
            "table": "dim_products",
            "column": "id"
        }
    }
]
4.10 Aliases
What if you need to join same table twice or more times? For example, you have list of organizations and you want to use it as both: supplier and service consumer.
Note that with aliases, in the mappings you refer to the table by the alias specified in the joins, not by the real table name. So after aliasing tables with the previous join specification, the mapping should look like:

"mappings": {
    "supplier.name": "suppliers.org_name",
    "consumer.name": "consumers.org_name"
}
For example, we have a fact table named fact_contracts and a dimension table with categories named dm_categories. To join them we define the following join specification:

"joins": [
    {
        "master": "fact_contracts.category_id",
        "detail": "dm_categories.id"
    }
]
CHAPTER
FIVE
Note: Cubes comes with tutorial helper methods in cubes.tutorial. It is advised not to use them in production; they are provided just to simplify a learner's life.

Prepare the data using the tutorial helpers. This will create a table and populate it with the contents of the CSV file:

>>> from sqlalchemy import create_engine
>>> engine = create_engine("sqlite:///data.sqlite")
>>> create_table_from_csv(engine,
...                       "data.csv",
...                       table_name="irbd_balance",
...                       fields=[
...                           ("category", "string"),
...                           ("category_label", "string"),
...                           ("subcategory", "string"),
...                           ("subcategory_label", "string"),
...                           ("line_item", "string"),
...                           ("year", "integer"),
Download the example model and save it. Load the model:

>>> import cubes
>>> model = cubes.load_model("model.json")
Check whether the model is valid with model.is_valid() - should return True.

>>> model.is_valid()
True
Create a workspace and get a browser instance (in this example it is SQL backend):
>>> workspace = cubes.create_workspace("sql.star", model, engine=engine)
cell defines the context of interest - the part of the cube we are looking at. We start with the whole cube:
>>> cell = cubes.Cell(cube)
Compute the aggregate. Measure fields of the aggregation result have an aggregation suffix. A total record count within the cell is also included as record_count.

>>> result = browser.aggregate(cell)
>>> result.summary["record_count"]
62
>>> result.summary["amount_sum"]
1116860

>>> result = browser.aggregate(cell, drilldown=["item"])
>>> for record in result.drilldown:
...     print record
{u'item.category': u'a', u'item.category_label': u'Assets', u'record_count': 32, u'amount_sum': 55
{u'item.category': u'e', u'item.category_label': u'Equity', u'record_count': 8, u'amount_sum': 775
{u'item.category': u'l', u'item.category_label': u'Liabilities', u'record_count': 22, u'amount_sum
The resulting aggregated attribute name will be constructed from the measure name and the aggregation suffix. For example, the mentioned amount will have three aggregates in the result: amount_sum, amount_min and amount_max in the case described above.

The result of aggregation is a structure containing:

* summary - summary for the aggregated cell
* drilldown - drill down cells, if desired
* total_cell_count - total cells in the drill down, regardless of pagination
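The naming convention and the shape of the result can be illustrated with a small in-memory sketch. It is not the SQL backend: the function aggregate below is a hypothetical stand-in that works on a list of dictionaries, returns a plain dictionary instead of a result object, and drills down by a single key only.

```python
from collections import defaultdict


def aggregate(records, measure, aggregations=("sum",), drilldown=None):
    """Aggregate `measure` over `records` (a list of dictionaries).

    Hypothetical in-memory stand-in for illustration only.
    """
    functions = {"sum": sum, "min": min, "max": max}

    def summarize(rows):
        values = [row[measure] for row in rows]
        summary = {"record_count": len(rows)}
        for name in aggregations:
            # Aggregated field name: measure name + "_" + aggregation suffix.
            summary[measure + "_" + name] = functions[name](values) if values else None
        return summary

    result = {"summary": summarize(records), "drilldown": [],
              "total_cell_count": None}
    if drilldown:
        groups = defaultdict(list)
        for row in records:
            groups[row[drilldown]].append(row)
        for key in sorted(groups):
            cell = {drilldown: key}
            cell.update(summarize(groups[key]))
            result["drilldown"].append(cell)
        result["total_cell_count"] = len(groups)
    return result


facts = [
    {"category": "a", "amount": 10},
    {"category": "a", "amount": 5},
    {"category": "e", "amount": 7},
]
result = aggregate(facts, "amount", ("sum", "min", "max"), drilldown="category")
print(result["summary"])
for cell in result["drilldown"]:
    print(cell)
```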
You might have noticed the two redundant keys: _key and _label - those contain values of the level key attribute and the level label attribute respectively. They are there to simplify the use of the details in the presentation layer, such as templates. Take for example doing only one-dimensional browsing and compare the presentation of breadcrumbs:
labels = [detail["_label"] for detail in cut_details]
Note that this might change a bit: either full detail will be returned or just key and label, depending on an option argument (not yet decided).
5.3.1 Hierarchy
Some dimensions can have multiple levels forming a hierarchy. For example dates have year, month, day; geography has country, region, city; product might have category, subcategory and the product. In our example we have the item dimension with three levels of hierarchy: category, subcategory and line item:
Figure 5.1: Item dimension hierarchy.

The levels are defined in the model:

"levels": [
    {
        "name": "category",
        "label": "Category",
        "attributes": ["category"]
    },
    {
        "name": "subcategory",
        "label": "Sub-category",
        "attributes": ["subcategory"]
    },
    {
        "name": "line_item",
        "label": "Line Item",
        "attributes": ["line_item"]
    }
]
You can see a slight difference between this model description and the previous one: we didn't just specify level names and let cubes fill in the defaults. Here we used an explicit description of each level. name is the level identifier, label is a human-readable label of the level that can be used in end-user applications and attributes is the list of attributes that belong to the level. The first attribute, if not specified otherwise, is the key attribute of the level. Other level description attributes are key and label_attribute. The key specifies the attribute name which contains the key for the level. A key is an id number, code or anything that uniquely identifies the dimension level. label_attribute is the name of an attribute that contains a human-readable value that can be displayed in user-interface elements such as tables or charts.
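How these defaults resolve can be shown with a short sketch. The function name level_key_and_label is hypothetical and works on plain level dictionaries as written in the model. Falling back to the key when no label attribute is given follows the drill-down walkthrough later in this chapter.

```python
def level_key_and_label(level):
    """Return (key, label_attribute) names for a level description.

    Hypothetical helper for illustration, not part of the cubes API.
    """
    attributes = level["attributes"]
    # The first attribute is the key unless "key" says otherwise.
    key = level.get("key", attributes[0])
    # Without a label attribute, the key is used for display.
    label_attribute = level.get("label_attribute", key)
    return key, label_attribute


category = {"name": "category", "label": "Category",
            "attributes": ["category"]}
month = {"name": "month", "key": "month", "label_attribute": "month_name",
         "attributes": ["month", "month_name", "month_sname"]}

print(level_key_and_label(category))
print(level_key_and_label(month))
```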
5.3.2 Preparation
Again, in short we need:

* data in a database
* logical model (see model file) prepared with appropriate mappings
* denormalized view for aggregated browsing (optional)
5.3.3 Drill-down
Drill-down is an action that provides more details about data. Drilling down through a dimension hierarchy expands the next level of the dimension. It can be compared to browsing through your directory structure.

We create a function that recursively traverses a dimension hierarchy and prints out aggregations (count of records in this example) at the currently browsed location.

Attributes:

* cell - cube cell to drill down
* dimension - dimension to be traversed through all levels
* path - current path of the dimension

Path is a list of dimension points (keys) at each level. It is like a file-system path.
def drill_down(cell, dimension, path=[]):
Get the dimension's default hierarchy. Cubes supports multiple hierarchies, for example for date you might have year-month-day or year-quarter-month-day. Most dimensions will have one hierarchy, though.
hierarchy = dimension.hierarchy()
Base path is path to the most detailed element, to the leaf of a tree, to the fact. Can we go deeper in the hierarchy?
if hierarchy.path_is_base(path): return
Get the next level in the hierarchy. levels_for_path returns list of levels according to provided path. When drilldown is set to True then one more level is returned.
levels = hierarchy.levels_for_path(path, drilldown=True)
current_level = levels[-1]
We need to know the name of the level key attribute which contains a path component. If the model does not explicitly specify a key attribute for the level, then the first attribute will be used:
level_key = dimension.attribute_reference(current_level.key)
For prettier display, we get name of attribute which contains label to be displayed for the current level. If there is no label attribute, then key attribute is used.
level_label = dimension.attribute_reference(current_level.label_attribute)
We do the aggregation of the cell... Note: Shell analogy: Think of ls $CELL command in commandline, where $CELL is a directory name. In this function we can think of $CELL to be same as current working directory (pwd)
result = browser.aggregate(cell, drilldown=[dimension])

for record in result.drilldown:
    # Parenthesised form works as a print statement and as a function call
    print("%s%s: %d" % (indent, record[level_label], record["record_count"]))
    ...
And now the drill-down magic. First, construct new path by key attribute value appended to the current path:
drill_path = path[:] + [record[level_key]]
The whole working example, including the complete recursive drill-down function, can be found in the tutorial sources.

Get the full cube (or any part of the cube you like):
cell = browser.full_cube()
Note that because we have changed our source data, we see level codes instead of level names. We will fix that later. Now focus on the drill-down. See that nice hierarchy tree? Now if you slice the cell through year 2010 and do the exact same drill-down:
cell = cell.slice("year", [2010])
drill_down(cell, cube.dimension("item"))
you will get similar tree, but only for year 2010 (obviously).
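The recursion described in this section can be tried without a database. The following is a minimal sketch, assuming an in-memory list of facts in place of a real browser and a plain list of level names in place of a Hierarchy object; the data and the standalone signature of drill_down are made up for illustration and are not the tutorial's code.

```python
from collections import Counter

# Stand-in data: each fact carries the key of every level of the "item"
# dimension. The values are made up for illustration.
LEVELS = ["category", "subcategory", "line_item"]
FACTS = [
    {"category": "a", "subcategory": "da", "line_item": "dfb"},
    {"category": "a", "subcategory": "da", "line_item": "dsl"},
    {"category": "a", "subcategory": "ol", "line_item": "oth"},
    {"category": "e", "subcategory": "re", "line_item": "rev"},
]


def drill_down(facts, levels, path=(), indent=""):
    """Return indented "key: count" lines for every level below path."""
    if len(path) == len(levels):
        # Base path: there is no deeper level to drill into.
        return []
    # Keep only facts inside the current cell (those matching the path).
    cell = [f for f in facts
            if all(f[levels[i]] == key for i, key in enumerate(path))]
    next_level = levels[len(path)]
    counts = Counter(f[next_level] for f in cell)
    lines = []
    for key in sorted(counts):
        lines.append("%s%s: %d" % (indent, key, counts[key]))
        # Recurse with the key appended to the path, one indent deeper.
        lines.extend(drill_down(facts, levels, path + (key,), indent + "    "))
    return lines


print("\n".join(drill_down(FACTS, LEVELS)))
```

Restricting FACTS to one year before the call plays the role of slicing the cell: the tree keeps its shape but the counts change.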
Note the label_attribute keys. They specify which attribute contains the label to be displayed. The key attribute is by default the first attribute in the list. If one wants to use some other attribute, it can be specified in key_attribute.

Because we added two new attributes, we have to add mappings for them:

"mappings": {
    "item.line_item": "line_item",
    "item.subcategory": "subcategory",
    "item.subcategory_label": "subcategory_label",
    "item.category": "category",
    "item.category_label": "category_label"
}
5.3.6 Summary
* hierarchies can have multiple levels
* a hierarchy level is identified by a key attribute
* a hierarchy level can have multiple detail attributes and there is one special detail attribute: the label attribute, used for display in user interfaces
The drilldown argument takes list of three element tuples in form: (dimension, hierarchy, level). The hierarchy and level are optional. If level is None, as in our example, then next level is used. If hierarchy is None then default hierarchy is used.
To specify a hierarchy in cell cuts, just pass the hierarchy argument during cut construction. For example, to specify a cut through week 15 in year 2010:
cut = cubes.PointCut("date", [2010, 15], hierarchy="ywd")
Note: If drilling down a hierarchy and asking cubes for next implicit level the cuts should be using same hierarchy as drilldown. Otherwise exception is raised. For example: if cutting through year-month-day and asking for next level after year in year-week-day hierarchy, exception is raised.
CHAPTER
SIX
CREATING CUBES
The Cubes framework provides functionality for denormalisation and for cube pre-computation. Currently the SQL backend supports denormalisation only and the mongo backend supports cube pre-computation.
See Also: Module backends. More information about cube builders in different database environments. Module model. Logical model description - required for preaggregated cube computation.
CHAPTER
SEVEN
LOCALIZATION
Having its origin in multi-lingual Europe, one of the main features of the Cubes framework is the ability to provide localizable results. There are three levels of localization in each analytical application:

1. Application level - such as buttons or menus
2. Metadata level - such as table header labels
3. Data level - table contents, such as names of categories or procurement types
Figure 7.1: Localization levels.

The application level is out of scope of this framework and is covered by internationalization (i18n) libraries, such as gettext. What is covered in Cubes is the metadata and data level.

Localization in cubes is very simple:

1. Create a master model definition and specify the locale the model is in
2. Specify attributes that are localized (see PhysicalAttributeMappings)
3. Create model translations for each required language
4. Make a cubes function or a tool create translated versions of the master model

To create a localized report, just specify the locale to the browser and create reports as if the model was not localized. See Localized Reporting.
If a translation of a metadata attribute is missing, then the one in the master model description is used. In our case we have the following files:
procurements.json procurements_en.json procurements_hu.json
Figure 7.2: Localization master model and translation files.

To load a model:
import cubes

model_sk = cubes.load_model("procurements.json",
                            translations={
                                "en": "procurements_en.json",
                                "hu": "procurements_hu.json",
                            })
Or you can get translated version of the model by directly passing translation dictionary:
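import json

# Read the translation dictionary from the translation file
with open("procurements_en.json") as handle:
    trans = json.load(handle)

# model is a previously loaded master model
model_en = model.translate("en", trans)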
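The fallback rule stated earlier (a missing translation falls back to the master model description) can be pictured with plain dictionaries. This is a sketch of the idea only: apply_translation is a hypothetical helper, the sample values are made up, and the real translation files may be structured differently.

```python
def apply_translation(master, translation):
    """Return a copy of master with translated values where available.

    Hypothetical helper: keys missing from the translation, or translated
    to an empty value, keep the value from the master description.
    """
    localized = dict(master)
    for key, value in translation.items():
        if key in master and value:
            localized[key] = value
    return localized


# Made-up metadata of a master model and a partial English translation.
master = {"name": "procurements", "label": "Obstaravania",
          "description": "Zmluvy"}
translation_en = {"label": "Procurements"}

print(apply_translation(master, translation_en))
```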
CHAPTER
EIGHT
OLAP SERVER
Cubes framework provides easy to install web service WSGI server with API that covers most of the Cubes logical model metadata and aggregation browsing functionality. Note: Server requires the Werkzeug framework. For more information about how to run the server programmatically, please refer to the server module.
* page - page number for paginated results
* pagesize - size of a page for paginated results
* order - list of attributes to be ordered by
* limit - limit number of results in form limit[,measure[,order_direction]]: limit=5:received_amount_sum:asc (this might not be implemented in all backends)

Reply: A dictionary with keys:

* summary - dictionary of fields/values for summary aggregation
* drilldown - list of drilled-down cells
* total_cell_count - number of total cells in drilldown (after limit, before pagination)
* cell - dictionary representation of the query cell

Example:
{
    "summary": {
        "record_count": 32,
        "amount_sum": 558430
    },
    "drilldown": [
        {
            "record_count": 16,
            "amount_sum": 275420,
            "year": 2009
        },
        {
            "record_count": 16,
            "amount_sum": 283010,
            "year": 2010
        }
    ],
    "total_cell_count": 2,
    "cell": [
        {
            "path": ["a"],
            "type": "point",
            "dimension": "item",
            "level_depth": 1
        }
    ]
}
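A client can read such a reply with the standard json module. The sketch below embeds a shortened copy of the example reply and derives the number of pages from total_cell_count. The helper page_count is hypothetical, and because total_cell_count may be absent in some backends, the helper returns None in that case.

```python
import json
import math

# Shortened copy of the example reply.
REPLY = """
{
    "summary": {"record_count": 32, "amount_sum": 558430},
    "drilldown": [
        {"record_count": 16, "amount_sum": 275420, "year": 2009},
        {"record_count": 16, "amount_sum": 283010, "year": 2010}
    ],
    "total_cell_count": 2
}
"""


def page_count(reply, pagesize):
    """Number of pages needed for the drilldown, or None if unknown."""
    total = reply.get("total_cell_count")
    if total is None:
        # The backend did not report the total number of cells.
        return None
    return int(math.ceil(total / float(pagesize)))


reply = json.loads(REPLY)
for cell in reply["drilldown"]:
    print("%s: %d" % (cell["year"], cell["amount_sum"]))
print("pages of size 1:", page_count(reply, 1))
```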
If pagination is used, then drilldown will not contain more than pagesize cells.

Note that not all backends might implement total_cell_count, or providing this information can be configurable and therefore might be disabled (for example for performance reasons).

GET /cube/<cube>/facts

Return all facts within a cell.

Parameters:

* cut - see /aggregate
* page, pagesize - paginate results
* order - order results
* format - result format: json (default; see note below), csv
* fields - comma separated list of fact fields, by default all fields are returned

Note: Number of facts in JSON is limited to the configuration value of json_record_limit, which is 1000 by default. To get more records, either use pages with size less than the record limit or use an alternate result format, such as csv.

GET /cube/<cube>/fact/<id>

Get single fact with specified id. For example: /fact/1024

GET /cube/<cube>/dimension/<dimension>

Get values for attributes of a dimension.

Parameters:

* cut - see /aggregate
* depth - specify depth (number of levels) to retrieve. If not specified, then all levels are returned
* page, pagesize - paginate results
* order - order results

Response: dictionary with keys dimension (dimension name), depth (level depth) and data (list of records).

Example for /dimension/item?depth=1:
{
    "dimension": "item",
    "depth": 1,
    "data": [
        {
            "item.category": "a",
            "item.category_label": "Assets"
        },
        {
            "item.category": "e",
            "item.category_label": "Equity"
        },
        {
            "item.category": "l",
            "item.category_label": "Liabilities"
        }
    ]
}
GET /cube/<cube>/cell Get details for a cell. Parameters: cut - see /aggregate Response: a dictionary representation of a cell (see cubes.Cell.as_dict()) with keys cube and cuts. cube is cube name and cuts is a list of dictionary representations of cuts. Each cut is represented as:
{
    // Cut type is one of: "point", "range" or "set"
    "type": cut_type,
    "dimension": cut_dimension_name,
    "level_depth": maximal_depth_of_the_cut,

    // Cut type specific keys.

    // Point cut:
    "path": [ ... ],
    "details": [ ... ]

    // Range cut:
    "from": [ ... ],
    "to": [ ... ],
    "details": { "from": [...], "to": [...] }

    // Set cut:
    "paths": [ [...], [...], ... ],
    "details": [ [...], [...], ... ]
}
Each element of the details path contains dimension attributes for the corresponding level. In addition it contains two more keys, _key and _label, which (redundantly) contain the values of the key attribute and the label attribute respectively.

Example for /cell?cut=item:a in the hello_world example:
{
    "cube": "irbd_balance",
    "cuts": [
        {
            "type": "point",
            "dimension": "item",
            "level_depth": 1,
            "path": ["a"],
            "details": [
                {
                    "item.category": "a",
                    "item.category_label": "Assets",
                    "_key": "a",
                    "_label": "Assets"
                }
            ]
        }
    ]
}
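The redundant _key and _label entries make it easy to build breadcrumbs from a cell without knowing the attribute names of each level. A minimal sketch over the example reply, where breadcrumbs is a hypothetical helper that handles point cuts only:

```python
# Cell reply for /cell?cut=item:a, as a Python dictionary.
CELL = {
    "cube": "irbd_balance",
    "cuts": [
        {
            "type": "point",
            "dimension": "item",
            "level_depth": 1,
            "path": ["a"],
            "details": [
                {"item.category": "a", "item.category_label": "Assets",
                 "_key": "a", "_label": "Assets"}
            ],
        }
    ],
}


def breadcrumbs(cell):
    """Return (dimension, key, label) tuples for every point-cut level."""
    crumbs = []
    for cut in cell["cuts"]:
        if cut["type"] != "point":
            # Range and set cuts keep their details in a different shape.
            continue
        for detail in cut["details"]:
            crumbs.append((cut["dimension"], detail["_key"], detail["_label"]))
    return crumbs


print(breadcrumbs(CELL))
```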
GET /cube/<cube>/report

Process multiple requests within one API call. The data should be a JSON containing the report specification, where keys are names of queries and values are dictionaries describing the queries.

report expects the Content-type header to be set to application/json.

See Reports for more information.

GET /cube/<cube>/search/dimension/<dimension>/<query>

Search values of dimensions for query. If dimension is _all then all dimensions are searched. Returns search results as a list of dictionaries with attributes:

Search result:

* dimension - dimension name
* level - level name
* depth - level depth
* level_key - value of key attribute for level
* attribute - dimension attribute name where searched value was found
* value - value of dimension attribute that matches search query
* path - dimension hierarchy path to the found value
* level_label - label for dimension level (value of label_attribute for level)
Note: Requires a search backend to be installed. Parameters that can be used in any request: prettyprint - if set to true, space indentation is added to the JSON output
Open range:
date:2004,1,1-
date:-2005,5,10
Dimension name is followed by colon :, each dimension cut is separated by |, and path for dimension levels is separated by a comma ,. Or in more formal way, here is the BNF for the cut:
<list>      ::= <cut> | <cut> "|" <list>
<cut>       ::= <dimension> ":" <path>
<dimension> ::= <identifier>
<path>      ::= <value> | <value> "," <path>
Note: Why are dimension names not URL parameters? This prevents conflict with other possible frequent URL parameters that might modify page content/API result, such as type, form, source.

To specify other than the default hierarchy use the format dimension@hierarchy; the path then should contain values for the specified hierarchy levels:
date@ywd:2004,25
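The cut grammar is small enough to parse by hand. The following sketch splits a cut string into dimension, optional hierarchy and path. parse_cuts is a hypothetical helper, not a Cubes function: it leaves path values as strings and treats every cut as a point cut, so range cuts are not interpreted.

```python
def parse_cuts(string):
    """Parse "dim[@hierarchy]:v1,v2|dim2:v1" into a list of dictionaries."""
    cuts = []
    for part in string.split("|"):
        # <cut> ::= <dimension> ":" <path>
        dimension, _, path = part.partition(":")
        # Optional hierarchy name follows the dimension after "@".
        dimension, _, hierarchy = dimension.partition("@")
        cuts.append({
            "dimension": dimension,
            "hierarchy": hierarchy or None,
            # <path> ::= <value> | <value> "," <path>
            "path": path.split(","),
        })
    return cuts


for cut in parse_cuts("date@ywd:2004,25|item:a"):
    print(cut)
```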
Following image contains examples of cuts in URLs and how they change by browsing cube aggregates:
8.2 Reports
Report queries are done either by specifying a report name in the request URL or using an HTTP POST request where the posted data are JSON with the report specification.

Keys:

* queries - dictionary of named queries
Figure 8.1: Example of how cuts in URL work and how they should be used in application view templates.
Query specification should contain at least one key: query - which is the query type: aggregate, cell_details, values (for dimension values), facts or fact (for multiple or single fact respectively). The rest of the keys are query dependent. For more information see the AggregationBrowser documentation.

Note: You have to set the content type to application/json.

Result is a dictionary where keys are the query names specified in the report specification and values are result values from each query call.

Example report JSON file with two queries:
{ "queries": { "summary": { "query": "aggregate" }, "by_year": { "query": "aggregate", "drilldown": ["date"], "rollup": "date" } } }
Request:
curl -H "Content-Type: application/json" --data-binary "@report.json" \
    "http://localhost:5000/cube/contracts/report?prettyprint=true&cut=date:2004"
Reply:
{
    "by_year": {
        "total_cell_count": 6,
        "drilldown": [
            {
                "record_count": 4390,
                "requested_amount_sum": 2394804837.56,
                "received_amount_sum": 399136450.0,
                "date.year": "2004"
            },
            ...
            {
                "record_count": 265,
                "requested_amount_sum": 17963333.75,
                "received_amount_sum": 6901530.0,
                "date.year": "2010"
            }
        ],
        "summary": {
            "record_count": 33038,
            "requested_amount_sum": 2412768171.31,
            "received_amount_sum": 2166280591.0
        }
    },
    "summary": {
        "total_cell_count": null,
        "drilldown": {},
        "summary": {
            "date.year": "2004",
            "requested_amount_sum": 2394804837.56,
            "received_amount_sum": 399136450.0,
            "record_count": 4390
        }
    }
}
Explicit specification of a cell (the cuts in the URL parameters are going to be ignored):
{ "cell": [ { "dimension": "date", "type": "range", "from": [2010,9], "to": [2011,9] } ], "queries": { "report": { "query": "aggregate", "drilldown": {"date":"year"} } } }
8.2.1 Roll-up
Report queries might contain a rollup specification which will result in rolling up one or more dimensions to the desired level. This functionality is provided for cases when you would like to report at a higher level of aggregation than the cell you provided is in. It works in a similar way as drill-down in the server's aggregate request, but in the opposite direction (it is like cd .. in a UNIX shell).

Example: You are reporting for year 2010, but you want to have a bar chart with all years. You specify rollup:
... "rollup": "date", ...
Roll-up can be:

* a string - single dimension to be rolled up one level
* an array - list of dimension names to be rolled up one level
* a dictionary where keys are dimension names and values are levels to be rolled up to
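The three accepted forms can be reduced to a single dictionary before processing. This is a sketch of one possible normalization, not the server's code: normalize_rollup is a hypothetical helper, and using None to mean "one level up" is an assumption made for the example.

```python
def normalize_rollup(rollup):
    """Turn a rollup specification into a {dimension: level} dictionary.

    A level of None stands for "roll up one level" (assumed convention).
    """
    if rollup is None:
        return {}
    if isinstance(rollup, str):
        # A string: single dimension rolled up one level.
        return {rollup: None}
    if isinstance(rollup, dict):
        # A dictionary: dimension names mapped to target levels.
        return dict(rollup)
    # An array: several dimensions, each rolled up one level.
    return {name: None for name in rollup}


print(normalize_rollup("date"))
print(normalize_rollup(["date", "item"]))
print(normalize_rollup({"date": "year"}))
```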
Run the server using the Slicer tool (see slicer - Command Line Tool):
slicer serve grants_config.ini
Note: the path in [model] has to be the full path to the model, not relative to the configuration file.

Place the file in the same directory as the following WSGI script (for convenience).

Create a WSGI script /var/www/wsgi/olap/procurements.wsgi:
import os.path
import cubes

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))

# Set the configuration file name (and possibly whole path) here
CONFIG_PATH = os.path.join(CURRENT_DIR, "slicer.ini")

application = cubes.server.create_server(CONFIG_PATH)
Reply:
{ "drilldown": {}, "summary": { "received_amount_sum": 399136450.0, "requested_amount_sum": 2394804837.56, "record_count": 4390 } }
8.3.4 Configuration
Server configuration is stored in .ini files with sections:

[server] - server related configuration, such as host, port

* backend - backend name, use sql for relational database backend
* log - path to a log file
* log_level - level of log details, from least to most: error, warn, info, debug
* json_record_limit - number of rows to limit when generating JSON output with iterable objects, such as facts. Default is 1000. It is recommended to use an alternate response format, such as CSV, to get more records.
* modules - space separated list of modules to be loaded (only used if run by the slicer command)
* prettyprint - default value of prettyprint parameter. Set to true for demonstration purposes.
* host - host where the server runs, defaults to localhost
* port - port on which the server listens, defaults to 5000

[model] - model and cube configuration

* path - path to model .json file
* locales - comma separated list of locales the model is provided in. Currently this variable is optional and it is used only by the experimental sphinx search backend.

[translations] - model translation files; option keys in this section are locale names and values are paths to model translation files. See Localization for more information.
Backend workspace configuration should be in the [workspace] section. See Aggregation Browsing Backends for more information.

Workspace with SQL backend (backend=sql in [server]) options:

* url (required) - database URL in form: adapter://user:password@host:port/database
* schema (optional) - schema containing denormalized views for relational DB cubes
* dimension_prefix (optional) - used by snowflake mapper to find dimension tables when no explicit mapping is specified
* dimension_schema - use this option when dimension tables are stored in a different schema than the fact tables
* fact_prefix (optional) - used by the snowflake mapper to find the fact table for a cube, when no explicit fact table name is specified
* use_denormalization (optional) - browser will use the denormalized view instead of snowflake
* denormalized_view_prefix (optional, advanced) - if denormalization is used, then this prefix is added to the cube name to find the corresponding cube view
* denormalized_view_schema (optional, advanced) - schema where denormalized views are located (use this if the views are in a different schema than the fact tables, otherwise the default schema is going to be used)

Example configuration file:
[server]
reload: yes
log: /var/log/cubes.log
log_level: info
backend: sql

[workspace]
url: postgresql://localhost/data
schema: cubes

[model]
path: ~/models/contracts_model.json
locales: en,sk

[translations]
sk: ~/models/contracts_model-sk.json
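Such a file can be inspected with Python's standard configparser module, which accepts both the "key: value" and "key = value" forms. The sketch below reads the example from a string and applies the documented defaults for host, port and json_record_limit. It shows how the options fit together and is not how the server itself loads its configuration.

```python
import configparser

CONFIG = """
[server]
reload: yes
log: /var/log/cubes.log
log_level: info
backend: sql

[workspace]
url: postgresql://localhost/data
schema: cubes

[model]
path: ~/models/contracts_model.json
locales: en,sk

[translations]
sk: ~/models/contracts_model-sk.json
"""

config = configparser.ConfigParser()
config.read_string(CONFIG)

# Options missing from the file fall back to the documented defaults.
host = config.get("server", "host", fallback="localhost")
port = config.getint("server", "port", fallback=5000)
json_record_limit = config.getint("server", "json_record_limit", fallback=1000)

# locales is a comma separated list; translations maps locale to file path.
locales = config.get("model", "locales", fallback="").split(",")
translations = dict(config.items("translations"))

print(host, port, json_record_limit, locales, translations)
```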
CHAPTER
NINE
or:
slicer command sub_command [sub_command_options]
Commands are:

Command          Description
serve            Start OLAP server
model validate   Validates logical model for OLAP cubes
model json       Create JSON representation of a model (can be used when model is a directory)
extract_locale   Get localizable part of the model
translate        Translate model with translation file
test             Test the model against backend database (experimental)
ddl              Generate DDL for SQL backend (experimental)
9.1 serve
Run Cubes OLAP HTTP server.

Example server configuration file slicer.ini:
[server]
host: localhost
port: 5000
reload: yes
log_level: info

[workspace]
url: sqlite:///tutorial.sqlite
view_prefix: vft_

[model]
path: models/model_04.json
In the [server] section, a space separated list of modules to be imported can be specified under the option modules:
[server]
modules=custom_backend
...
For more information about OLAP HTTP server see OLAP Server
Optional arguments:
-d, --defaults        show defaults
-w, --no-warnings     disable warnings
-t TRANSLATION, --translation TRANSLATION
                      model translation file
For more information see Model Validation in Logical Model and Metadata.

Example output:
loading model wdmmg_model.json
------------------------
cubes: 1
wdmmg
dimensions: 5
date
pog
region
cofog
from
------------------------
found 3 issues
validation results:
warning: No hierarchies in dimension date, flat level year will be used
warning: Level year in dimension date has no key attribute specified
warning: Level from in dimension from has no key attribute specified
0 errors, 3 warnings
The tool output contains a recommendation whether the model can be used:

* model can be used - if there are no errors, no warnings and no defaults used, mostly when the model is explicitly described in every detail
* model can be used, make sure that defaults reflect reality - there are no errors, no warnings, but the model might not be complete and default assumptions are applied
* not recommended to use the model, some issues might emerge - there are just warnings, no validation errors. Some queries or any other operations might produce invalid or unexpected output
* model can not be used - model contains errors and it is unusable
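The four recommendations form a simple decision ladder. The sketch below expresses it as a function of the number of errors, the number of warnings and the number of defaults applied. The function is hypothetical and only restates the list above; it is not the slicer tool's code.

```python
def recommendation(errors, warnings, defaults):
    """Map validation counts to the recommendation text."""
    if errors:
        return "model can not be used"
    if warnings:
        return "not recommended to use the model, some issues might emerge"
    if defaults:
        return "model can be used, make sure that defaults reflect reality"
    return "model can be used"


# The example output above reports 0 errors and 3 warnings.
print(recommendation(errors=0, warnings=3, defaults=0))
```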
9.6 ddl
Note: This is an experimental command.

Generates DDL schema of a model for the SQL backend.

Usage:
slicer ddl [-h] [--dimension-prefix DIMENSION_PREFIX] [--fact-prefix FACT_PREFIX] [--backend BACKEND] url model
positional arguments:
url      SQL database connection URL
model    model reference - can be a local file path or URL
optional arguments:
--dimension-prefix DIMENSION_PREFIX
                      prefix for dimension tables
--fact-prefix FACT_PREFIX
                      prefix for fact tables
--backend BACKEND     backend name (currently limited only to SQL backends)
9.7 denormalize
Usage:
slicer denormalize [-h] [-p PREFIX] [-f] [-m] [-i] [-s SCHEMA] [-c CUBE] config
positional arguments:
config    slicer configuration .ini file
optional arguments:
-h, --help            show this help message and exit
-p PREFIX, --prefix PREFIX
                      prefix for denormalized views (overrides config value)
-f, --force           replace existing views
-m, --materialize     create materialized view (table)
-i, --index           create index for key attributes
-s SCHEMA, --schema SCHEMA
                      target view schema (overrides config value)
-c CUBE, --cube CUBE  cube(s) to be denormalized, if not specified then all in the model
9.7.1 Examples
If you plan to use denormalized views, you have to specify it in the configuration in the [workspace] section:
[workspace]
denormalized_view_prefix = mft_
denormalized_view_schema = denorm_views

# This switch is used by the browser:
use_denormalization = yes
The denormalization will create tables like denorm_views.mft_contracts for a cube named contracts. The browser will use the view if option use_denormalization is set to a true value. Denormalize all cubes in the model:
slicer denormalize slicer.ini
9.7.2 Schema
The schema where the denormalized view is created is the schema specified in the configuration file. The schema is shared with fact tables and views. If you want to have views in a separate schema, specify the denormalized_view_schema option in the configuration.

If for any specific reason you would like to denormalize into a completely different schema than specified in the configuration, you can specify it with the --schema option.
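The naming rule from the examples above (schema, then prefix plus cube name) can be written down in a few lines. denormalized_view_name is a hypothetical helper that only illustrates how the two options combine into a name such as denorm_views.mft_contracts.

```python
def denormalized_view_name(cube, prefix="", schema=None):
    """Build the (optionally schema-qualified) name of a cube's view."""
    name = prefix + cube
    # Qualify with the schema only when one is configured.
    return "%s.%s" % (schema, name) if schema else name


print(denormalized_view_name("contracts", prefix="mft_", schema="denorm_views"))
```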
CHAPTER
TEN
REFERENCE
Contents:
cubes.create_model(model, cubes=None, dimensions=None)
    Create a model from a model description dictionary in model. This is the designated way of creating the model from a dictionary.

    cubes or dimensions are lists of their respective dictionary definitions. If a definition of a cube in cubes or a dimension in dimensions already exists in the model, then ModelError is raised.

    The model dictionary contains the main model description. The structure is:
{
    "name": "public_procurements",
    "label": "Procurements",
    "description": "Procurement Contracts of an Organisation",
    "cubes": [...],
    "dimensions": [...]
}
    Note: Current implementation is the same as passing description to the Model class initialization. In the future all default constructions will be moved here and the initialization methods will require logical model class instances.

cubes.create_cube(desc, dimensions)
    Creates a Cube instance from dictionary description desc with dimension dictionary in dimensions.

    In file based model representation, the cube descriptions are stored in json files with prefix cube_ like cube_contracts, or as a dictionary for key cubes in the model description dictionary.

    JSON example:
{
    "name": "contracts",
    "measures": ["amount"],
    "dimensions": ["date", "contractor", "type"],
    "details": ["contract_name"]
}
cubes.create_dimension(obj, dimensions=None)
    Creates a Dimension instance from obj which can be a Dimension instance or a string or a dictionary. If it is a string, then it represents dimension name, the only level name and the only attribute.

    Keys of a dictionary representation:

    * name: dimension name
    * levels: list of dimension levels (see: cubes.Level)
    * hierarchies or hierarchy: list of dimension hierarchies or list of level names of a single hierarchy. Only one of the two should be specified, otherwise an exception is raised.
    * default_hierarchy_name: name of a hierarchy that will be used when no hierarchy is explicitly specified
    * label: dimension name that will be displayed (human readable)
    * description: human readable dimension description
    * info - custom information dictionary, might be used to store application/front-end specific information (icon, color, ...)
    * template - name of a dimension to be used as template. The dimension is taken from dimensions argument which should be a dictionary of already created dimensions.

    Defaults

    * If no levels are specified during initialization, then dimension name is considered flat, with single attribute.
    * If no hierarchy is specified and levels are specified, then default hierarchy will be created from order of levels
    * If no levels are specified, then one level is created, with name default and dimension will be considered flat

    String representation of a dimension str(dimension) is equal to dimension name.

    Class is not meant to be mutable.

    Raises ModelInconsistencyError when both hierarchy and hierarchies is specified.

cubes.create_level(obj)
    Creates a level from obj which can be a Level instance, string or a dictionary. If it is a string, then the string represents level name and the only one attribute of the level. If obj is a dictionary, then the keys are:

    * name - level name
    * attributes - list of level attributes
    * key - name of key attribute
    * label_attribute - name of label attribute

    Defaults: if no attributes are specified, then one is created with the same name as the level name.

Simple Models

There might be cases where one would like to analyse a simple (denormalised) table. It can be either a table in a database or data from a single CSV file. For convenience, there is a utility function called simple_model that will create a model with just one cube, simple dimensions and a couple of specified measures.

cubes.simple_model(cube_name, dimensions, measures)
    Create a simple model with only one cube with name cube_name and flat dimensions. dimensions is a list of dimension names as strings and measures is a list of measure names, also as strings. This is a convenience method mostly for quick model creation for denormalized views or tables with data from a single CSV file.

    Example:
model = simple_model("contracts",
                     dimensions=["year", "supplier", "subject"],
                     measures=["amount"])
cube = model.cube("contracts")
browser = workspace.create_browser(cube)
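To picture what such a model contains, the sketch below builds a description dictionary in the shape shown for create_model and create_cube: one cube and one flat dimension per name. It is an illustration only. simple_model_description is a hypothetical helper, and the exact dictionary that simple_model builds internally is not documented here.

```python
def simple_model_description(cube_name, dimensions, measures):
    """Build a model description with one cube and flat dimensions.

    Hypothetical helper: each dimension is given as a plain string, which
    stands for a flat dimension with a single level and attribute.
    """
    cube = {
        "name": cube_name,
        "measures": list(measures),
        "dimensions": list(dimensions),
    }
    return {"cubes": [cube], "dimensions": list(dimensions)}


description = simple_model_description(
    "contracts",
    dimensions=["year", "supplier", "subject"],
    measures=["amount"])
print(description)
```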
    * description - longer human-readable description of the model
    * info - custom information dictionary

    add_cube(cube)
        Adds cube to the model and also assigns the model to the cube. If cube has a model assigned and it is not this model, then error is raised.

        Raises ModelInconsistencyError when trying to assign a cube that is already assigned to a different model or if trying to add a dimension with existing name but different specification.

    add_dimension(dimension)
        Add dimension to model. Replace dimension with same name.

    cube(cube)
        Get a cube with name name or coalesce object to a cube.

    dimension(dim)
        Get dimension by name or by object. Raises NoSuchDimensionError when there is no dimension dim.

    is_valid(strict=False)
        Check whether model is valid. Model is considered valid if there are no validation errors. If you want to be sure that there are no warnings as well, set strict to True. If strict is False only errors are considered fatal, if True also warnings will make model invalid.

        Returns True when model is valid, otherwise returns False.

    localizable_dictionary()
        Get model locale dictionary - localizable parts of the model.

    localize(translation)
        Return localized version of model.

    remove_cube(cube)
        Removes cube from the model.

    remove_dimension(dimension)
        Remove a dimension from receiver.

    to_dict(**options)
        Return dictionary representation of the model. All object references within the dictionary are name based.

        * expand_dimensions - if set to True then fully expand dimension information in cubes
        * full_attribute_names - if set to True then attribute names will be written as dimension_name.attribute_name

    validate()
        Validate the model, check for model consistency. Validation result is array of tuples in form: (validation_result, message) where validation_result can be warning or error.

        Returns: array of tuples

Cube

class cubes.Cube(name, dimensions=None, measures=None, model=None, label=None, details=None, mappings=None, joins=None, fact=None, key=None, description=None, options=None, info=None, **kwargs)
    Create a new Cube model.

    Attributes:

    * name: cube name
    * measures: list of measure attributes
    * dimensions: list of dimensions (should be Dimension instances)
    * model: model the cube belongs to
    * label: human readable cube label
    * details: list of detail attributes
    * description - human readable description of the cube
    * key: fact key field (if not specified, then backend default key will be used, mostly id for SQL or _id for document based databases)
    * info - custom information dictionary, might be used to store application/front-end specific information

    Attributes used by backends:

    * mappings - backend-specific logical to physical mapping dictionary
    * joins - backend-specific join specification (used in SQL backend)
    * fact - fact dataset (table) name (physical reference)
    * options - dictionary of other options used by the backend - refer to the backend documentation to see what options are used (for example SQL browser might look here for denormalized_view in case of denormalized browsing)

    add_dimension(dimension)
        Add dimension to cube. Replace dimension with same name. Raises ModelInconsistencyError when dimension with same name already exists in the receiver.

    dimension(obj)
        Get dimension object. If obj is a string, then dimension with given name is returned, otherwise dimension object is returned if it belongs to the cube.

        Raises NoSuchDimensionError when there is no such dimension.

    measure(obj)
        Get measure object. If obj is a string, then measure with given name is returned, otherwise measure object is returned if it belongs to the cube. Returned object is of Attribute type.

        Raises NoSuchAttributeError when there is no such measure or when there are multiple measures with the same name (which also means that the model is not valid).

    remove_dimension(dimension)
        Remove a dimension from receiver. dimension can be either dimension name or dimension object.

    to_dict(expand_dimensions=False, with_mappings=True, **options)
        Convert to a dictionary. If expand_dimensions is True (default is False) then fully expand dimension information. If with_mappings is True (which is default) then joins, mappings, fact and options are included. Should be set to False when returning a dictionary that will be provided in a user interface or through server API.

    validate()
        Validate cube. See Model.validate() for more information.

Dimension, Hierarchy, Level

class cubes.Dimension(name, levels, hierarchies=None, default_hierarchy_name=None, label=None, description=None, info=None, **desc)
    Create a new dimension.

    Attributes:

    * name: dimension name
    * levels: list of dimension levels (see: cubes.Level)
    * hierarchies: list of dimension hierarchies. If no hierarchies are specified, then default one is created from ordered list of levels.
    * default_hierarchy_name: name of a hierarchy that will be used when no hierarchy is explicitly specified
    * label: dimension name that will be displayed (human readable)
    * description: human readable dimension description
    * info - custom information dictionary, might be used to store application/front-end specific information (icon, color, ...)

    Dimension class is not meant to be mutable. All level attributes will have new dimension assigned.

    Note that the dimension will claim ownership of levels and their attributes. You should make sure that you pass a copy of levels if you are cloning another dimension.

    all_attributes()
        Return all dimension attributes regardless of hierarchy. Order is not guaranteed, use cubes.Hierarchy.all_attributes() to get known order. Order of attributes within level is preserved.

    attribute(reference)
        Get dimension attribute from reference.

    default_hierarchy
        Get default hierarchy specified by default_hierarchy_name, if the variable is not set then get a hierarchy with name default.

        Warning: Deprecated. Use Dimension.hierarchy() instead.

    has_details
        Returns True when each level has only one attribute, usually key.

    hierarchy(obj=None)
        Get hierarchy object either by name or as Hierarchy. If obj is None then default hierarchy is returned.

    is_flat
        Is true if dimension has only one level.

    key_attributes()
        Return all dimension key attributes, regardless of hierarchy. Order is not guaranteed, use a hierarchy to have known order.

    level(obj)
        Get level by name or as Level object. This method is used for coalescing value.

    level_names
        Get list of level names. Order is not guaranteed, use a hierarchy to have known order.

    levels
        Get list of all dimension levels. Order is not guaranteed, use a hierarchy to have known order.

    to_dict(**options)
        Return dictionary representation of the dimension.

    validate()
        Validate dimension. See Model.validate() for more information.
class cubes.Hierarchy(name, levels, dimension=None, label=None, info=None)
Dimension hierarchy - specifies order of dimension levels.

Attributes:
name: hierarchy name
dimension: dimension the hierarchy belongs to
label: human readable name
levels: ordered list of levels or level names from dimension
info: custom information dictionary, might be used to store application/front-end specific information

Some collection operations might be used, such as level in hierarchy or hierarchy[index]. String value str(hierarchy) gives the hierarchy name.

all_attributes()
Return all dimension attributes as a single list.

is_last(level)
Returns True if level is the last level of the hierarchy.

key_attributes()
Return all dimension key attributes as a single list.

level_index(level)
Get order index of level. Can be used for ordering and comparing levels within hierarchy.

levels_for_depth(depth, drilldown=False)
Returns levels for given depth. If depth is greater than the number of hierarchy levels, cubes.ArgumentError exception is raised.

levels_for_path(path, drilldown=False)
Returns levels for given path. If path is longer than hierarchy levels, cubes.ArgumentError exception is raised.

next_level(level)
Returns next level in hierarchy after level. If level is the last level, returns None. If level is None, then the first level is returned.

path_is_base(path)
Returns True if path is base path for the hierarchy. Base path is a path where there are no more levels to be added - no drill down possible.

previous_level(level)
Returns previous level in hierarchy before level. If level is the first level or None, returns None.

rollup(path, level=None)
Rolls-up the path to the level. If level is None then path is rolled-up only one level. If level is deeper than the last level of path, the cubes.HierarchyError exception is raised. If level is the same as the path level, nothing happens.

to_dict(**options)
Convert to dictionary. Keys:
name: hierarchy name
label: human readable label (localizable)
levels: level names

class cubes.Level(name, attributes, dimension=None, key=None, sort_key=None, label_attribute=None, label=None, info=None)
Object representing a hierarchy level. Holds all level attributes.
This object is immutable, except for localization. You have to set up all attributes in the initialisation process.

Attributes:
name: level name
dimension: dimension the level is associated with
attributes: list of all level attributes. Raises ModelError when attribute list is empty.
key: name of level key attribute (for example: customer_number for customer level, region_code for region level, month for month level). The key will be used as a grouping field for aggregations. Key should be unique within level. If not specified, then the first attribute is used as key.
sort_key: name of attribute that is going to be used for sorting
label_attribute: name of attribute containing label to be displayed (for example: customer_name for customer level, region_name for region level, month_name for month level)
label: human readable label of the level
info: custom information dictionary, might be used to store application/front-end specific information

attribute(name)
Get attribute by name.

has_details
Is True when level has more than one attribute, for all levels with only one attribute it is False.

to_dict(full_attribute_names=False, **options)
Convert to dictionary.

class cubes.Attribute(name, label=None, locales=None, order=None, description=None, dimension=None, aggregations=None, info=None, format=None, **kwargs)
Cube attribute - represents any fact field/column.

Attributes:
name - attribute name, used as identifier
label - attribute label displayed to a user
locales - list of locales that the attribute is localized to
order - default order of this attribute. If not specified, then the order is unspecified. Possible values are: asc or desc. It is recommended and safe to use Attribute.ASC and Attribute.DESC
aggregations - list of default aggregations to be performed on this attribute if it is a measure. It is backend-specific, but most common might be: sum, min, max, ...
info - custom information dictionary, might be used to store application/front-end specific information
format - application-specific display format information, useful for formatting numeric values of measure attributes

String representation of the Attribute returns its name (without dimension prefix).

cubes.ArgumentError is raised when unknown ordering type is specified.

full_name(dimension=None, locale=None, simplify=True)
Return full name of an attribute as if it was part of dimension. Append locale if it is one of the attribute's locales, otherwise raise cubes.ArgumentError.

ref(simplify=True, locale=None)
Return full attribute reference. Append locale if it is one of the attribute's locales, otherwise raise cubes.ArgumentError.
If simplify is True, then reference to an attribute of a flat dimension without details will be just the dimension name.
Warning: This method might be renamed.

Helper function to coalesce list of attributes, which can be provided as strings or as Attribute objects:

cubes.attribute_list(attributes, dimension=None, attribute_class=None)
Create a list of attributes from a list of strings or dictionaries. See cubes.coalesce_attribute() for more information.

exception ModelError
Exception raised when there is an error with the model and its structure, mostly during model construction.

exception ModelIncosistencyError
Raised when there is an inconsistency in the model structure, mostly when the model was created programmatically in a wrong way by mismatching classes or by misconfiguration.

exception NoSuchDimensionError
Raised when a dimension is requested that does not exist in the model.
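The coalescing that cubes.attribute_list() is described as performing can be illustrated with a small stand-alone sketch. It does not use the Cubes library: SimpleAttribute and coalesce_attributes are hypothetical names introduced only for this example, and the real helper may accept more input forms.

```python
class SimpleAttribute(object):
    """Hypothetical stand-in for cubes.Attribute (illustration only)."""

    def __init__(self, name, label=None, locales=None):
        self.name = name
        self.label = label
        self.locales = locales or []

    def __str__(self):
        # As documented above, the string form is the bare attribute name.
        return self.name


def coalesce_attributes(attributes):
    """Turn a mixed list of strings, dictionaries and attribute objects
    into a list of SimpleAttribute objects."""
    result = []
    for obj in attributes:
        if isinstance(obj, SimpleAttribute):
            result.append(obj)                      # already an object
        elif isinstance(obj, dict):
            result.append(SimpleAttribute(**obj))   # dictionary description
        else:
            result.append(SimpleAttribute(str(obj)))  # plain name
    return result


attrs = coalesce_attributes(["code", {"name": "title", "label": "Title"}])
names = [str(attr) for attr in attrs]
```

Accepting both plain names and dictionaries lets a model description stay short for simple attributes while still allowing labels and locales where they are needed.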
If there are no more levels to be drilled down, an exception is raised. Say your model has three levels of the date dimension: year, month, day and you try to drill down by date at the next level then ValueError will be raised.

Returns an AggregationResult object.

cell_details(cell=None, dimension=None)
Returns details for the cell. Returned object is a list with one element for each cell cut. If dimension is specified, then details only for cuts that use the dimension are returned.
Default implementation calls AggregationBrowser.cut_details() for each cut. Backends might customize this method to make it more efficient.

cut_details(cut)
Returns details for a cut which should be a Cut instance.
PointCut - all attributes for each level in the path
SetCut - list of PointCut results, one per path in the set
RangeCut - PointCut-like results for lower range (from) and upper range (to)
Default implementation uses AggregationBrowser.values() for each path. Backends might customize this method to make it more efficient.

dimension_object(dimension)
Helper function to return proper dimension object as a subclass of Dimension.
Warning: Deprecated. Use cubes.Cube.dimension()
Arguments:
dimension - a dimension object or a string, if it is a string, then dimension object is retrieved from cube

fact(key)
Returns a single fact from cube specified by fact key key.

facts(cell=None, **options)
Return an iterable object with all facts within cell.

features = []
List of browser features as strings.

report(cell, queries)
Bundle multiple requests from queries into a single one.
Keys of queries are custom names of queries which the caller can later use to retrieve the respective query result. Values are dictionaries specifying arguments of the particular query. Each query should contain at least one required value query which contains the name of the query function: aggregate, facts, fact, values and cell (for cell details).
The rest of the values are function specific, please refer to the respective function documentation for more information. Example:
queries = {
    "product_summary": {
        "query": "aggregate",
        "drilldown": "product"
    },
    "year_list": {
        "query": "values",
        "dimension": "date",
        "depth": 1
    }
}
Result is a dictionary where keys will be the query names specified in the report specification and values will be result values from each query call.
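The shape of the report result can be illustrated with a stand-alone sketch. FakeBrowser and run_report are hypothetical names used only here; the sketch assumes that each query is dispatched to the browser method named by its query key, it raises ValueError where the documentation mentions cubes.ArgumentError, and it is not meant to describe how any particular backend implements report().

```python
class FakeBrowser(object):
    """Hypothetical browser returning canned values (illustration only)."""

    def aggregate(self, cell, drilldown=None):
        return {"summary": {"record_count": 3}, "drilldown": drilldown}

    def values(self, cell, dimension, depth=None):
        return [{"year": 2009}, {"year": 2010}]


def run_report(browser, cell, queries):
    """Run every query and collect the results under the query names."""
    if not queries:
        raise ValueError("no queries specified")
    result = {}
    for name, query in queries.items():
        args = dict(query)              # copy, keep the caller's dict intact
        query_type = args.pop("query")  # required key naming the function
        if query_type not in ("aggregate", "values"):
            raise ValueError("unknown query type: %s" % query_type)
        result[name] = getattr(browser, query_type)(cell, **args)
    return result


queries = {
    "product_summary": {"query": "aggregate", "drilldown": "product"},
    "year_list": {"query": "values", "dimension": "date", "depth": 1},
}
report = run_report(FakeBrowser(), None, queries)
```

The caller reads each result back by the name it chose, for example report["year_list"].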
This method provides a convenient way to perform multiple common queries at once. For example, you might always want to have on a page: total transaction count, total transaction amount, drill-down by year and drill-down by transaction type.

Raises cubes.ArgumentError when there are no queries specified or if a query is of unknown type.

Roll-up

Report queries might contain a rollup specification which will result in rolling-up one or more dimensions to the desired level. This functionality is provided for cases when you would like to report at a higher level of aggregation than the cell you provided is in. It works in a similar way as drill down in AggregationBrowser.aggregate() but in the opposite direction (it is like cd .. in a UNIX shell).

Example: You are reporting for year 2010, but you want to have a bar chart with all years. You specify rollup:
... "rollup": "date", ...
Roll-up can be:
a string - single dimension to be rolled up one level
an array - list of dimension names to be rolled-up one level
a dictionary where keys are dimension names and values are levels to be rolled up-to

Future

In the future there might be optimisations added to this method, therefore it will become faster than subsequent separate requests. Also, when used with the Slicer OLAP service server, the overhead of multiple HTTP calls is reduced.

values(cell, dimension, depth=None, paths=None, hierarchy=None, **options)
Return values for dimension with level depth depth. If depth is None, all levels are returned.

Note: Some backends might support only default hierarchy.
Result

The result of aggregated browsing is returned as object:

class cubes.AggregationResult
Result of aggregation or drill down.

Attributes:
cell - cell that this result is aggregate of
summary - dictionary of summary row fields
cells - list of cells that were drilled-down
total_cell_count - number of total cells in drill-down (after limit, before pagination)
measures - measures that were selected in aggregation
remainder - summary of remaining cells (not yet implemented)
levels - aggregation levels for dimensions that were used to drill-down
Note: Implementors of aggregation browsers should populate cell, measures and levels from the aggregate query.

cached()
Return shallow copy of the receiver with cached cells. If cells are an iterator, they are all fetched in a list.

cross_table(onrows, oncolumns, measures=None)
Creates a cross table from result cells. onrows contains list of attribute names to be placed at rows and oncolumns contains list of attribute names to be placed at columns. measures is a list of measures to be put into cells. If measures are not specified, then only record_count is used.
Returns a named tuple with attributes:
columns - labels of columns. The tuples correspond to values of attributes in oncolumns.
rows - labels of rows as list of tuples. The tuples correspond to values of attributes in onrows.
data - list of measure data per row. Each row is a list of measure tuples as specified in measures.
Warning: Experimental implementation. Interface might change - either arguments or result object.
table_rows(dimension, depth=None)
Returns iterator of drilled-down rows which yields a named tuple with named attributes: (key, label, path, is_base, record). depth is last level of interest. If not specified (set to None) then deepest level for dimension is used.
key: value of key dimension attribute at level of interest
label: value of label dimension attribute at level of interest
path: full path for the drilled-down cell
is_base: True when dimension element is base (can not drill down more)
record: all drill-down attributes of the cell
Example use:
for row in result.table_rows(dimension):
    print("%s: %s" % (row.label, row.record["record_count"]))
dimension has to be cubes.Dimension object. Raises TypeError when cut for dimension is not PointCut. to_dict() Return dictionary representation of the aggregation result. Can be used for JSON serialisation.
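To make the structure returned by cross_table() more concrete, here is a minimal stand-alone sketch that pivots a list of plain dictionaries. It only assumes that each drilled-down cell behaves like a dictionary of attribute and measure values; the function below is a hypothetical re-implementation for illustration, not the library code.

```python
from collections import namedtuple

CrossTable = namedtuple("CrossTable", ["columns", "rows", "data"])


def cross_table(cells, onrows, oncolumns, measures=None):
    """Pivot cells into row labels, column labels and a data matrix."""
    measures = measures or ["record_count"]
    matrix = {}
    row_heads = []
    col_heads = []
    for cell in cells:
        row = tuple(cell[attr] for attr in onrows)
        col = tuple(cell[attr] for attr in oncolumns)
        if row not in row_heads:
            row_heads.append(row)
        if col not in col_heads:
            col_heads.append(col)
        # one tuple of measure values per (row, column) combination
        matrix[(row, col)] = tuple(cell[m] for m in measures)

    # missing combinations are left as None
    data = [[matrix.get((row, col)) for col in col_heads]
            for row in row_heads]
    return CrossTable(col_heads, row_heads, data)


cells = [
    {"year": 2009, "category": "a", "record_count": 2},
    {"year": 2009, "category": "b", "record_count": 5},
    {"year": 2010, "category": "a", "record_count": 7},
]
table = cross_table(cells, ["year"], ["category"])
```

Row and column labels are tuples because more than one attribute can be placed on each axis.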
is equivalent to:

cut = cubes.PointCut("date", [2010, 1])
cell = cubes.Cell(cube, [cut])

Reverse operation is cubes.rollup("date").
Works only if the cut for dimension is PointCut. Otherwise the behaviour is undefined.
Returns: new derived cell object.

is_base(dimension, hierarchy=None)
Returns True when cell is base cell for dimension. Cell is base if there is a point cut with path referring to the most detailed level of the dimension hierarchy.

level_depths()
Returns a dictionary of dimension names as keys and level depths (index of deepest level).

multi_slice(cuts)
Create another cell by slicing through multiple slices. cuts is a list of Cut object instances. See also Cell.slice().

point_cut_for_dimension(dimension)
Return first point cut for given dimension.

point_slice(dimension, path)
Create another cell by slicing receiving cell through dimension at path. Receiving object is not modified. If a cut with dimension exists, it is replaced with the new one. If path is an empty list or is None, then the cut for given dimension is removed.
Example:
full_cube = Cell(cube)
contracts_2010 = full_cube.point_slice("date", [2010])
Returns: new derived cell object.
Warning: Deprecated. Use cell.slice() instead with argument PointCut(dimension, path)

rollup(rollup)
Rolls-up cell - goes one or more levels up through dimension hierarchy. It works in a similar way as drill down in AggregationBrowser.aggregate() but in the opposite direction (it is like cd .. in a UNIX shell).
Roll-up can be:
a string - single dimension to be rolled up one level
an array - list of dimension names to be rolled-up one level
a dictionary where keys are dimension names and values are levels to be rolled up-to
Note: Only default hierarchy is currently supported.

rollup_dim(dimension, level=None, hierarchy=None)
Rolls-up cell - goes one or more levels up through dimension hierarchy. If there is no level to go up (we are at the top level), then the cut is removed.
Returns new cell object.
Note: Only default hierarchy is currently supported.
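The three accepted roll-up forms and the effect of rolling up a point path can be sketched without the library. normalize_rollup and rollup_path are hypothetical helper names for this example; the hierarchy is assumed to be a plain ordered list of level names, and ValueError stands in for the library's own exception.

```python
def normalize_rollup(rollup):
    """Return a dictionary: dimension name -> target level name.
    None as the target means "one level up"."""
    if isinstance(rollup, str):
        return {rollup: None}                    # single dimension
    if isinstance(rollup, dict):
        return dict(rollup)                      # explicit target levels
    return dict((name, None) for name in rollup)  # list of dimensions


def rollup_path(level_names, path, level=None):
    """Roll up a point path within a hierarchy given as ordered level names."""
    if level is None:
        return path[:-1]                 # one level up (like cd ..)
    depth = level_names.index(level) + 1
    if depth > len(path):
        raise ValueError("can not roll up to a deeper level")
    return path[:depth]                  # same level: path is unchanged


levels = ["year", "month", "day"]
one_up = rollup_path(levels, [2010, 1, 15])
to_year = rollup_path(levels, [2010, 1, 15], "year")
```

An empty path after roll-up corresponds to the case described above where the cut is removed from the cell.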
slice(cut, dummy=None)
Returns new cell by slicing receiving cell with cut. Cut with the same dimension as cut will be replaced. If there is no cut with the same dimension, then the cut will be appended.

to_dict()
Returns a dictionary representation of the cell.

to_str()
Return string representation of the cell by using standard cuts-to-string conversion.

Cuts

class cubes.PointCut(dimension, path, hierarchy=None)
Object describing way of slicing a cube (cell) through a point in a dimension.

level_depth()
Returns index of deepest level.

to_dict()
Returns dictionary representation of the receiver. The keys are: dimension, type=point and path.

class cubes.RangeCut(dimension, from_path, to_path, hierarchy=None)
Object describing way of slicing a cube (cell) between two points of a dimension that has ordered points. For dimensions with unordered points behaviour is unknown.

level_depth()
Returns index of deepest level which is equivalent to the longest path.

to_dict()
Returns dictionary representation of the receiver. The keys are: dimension, type=range, from and to paths.

class cubes.SetCut(dimension, paths, hierarchy=None)
Object describing way of slicing a cube (cell) through a set of points (paths) of a dimension.

level_depth()
Returns index of deepest level which is equivalent to the longest path.

to_dict()
Returns dictionary representation of the receiver. The keys are: dimension, type=set and the set as a list of paths.

String conversions

In applications where slicing and dicing can be specified in the form of a string, such as arguments of HTTP requests of a web application, there are a couple of helper methods that do the string-to-object conversion:

cubes.cuts_from_string(string)
Return list of cuts specified in string. You can use this function to parse cuts encoded in a URL.
Examples:
date:2004
date:2004,1
date:2004,1|class:5
date:2004,1,1|category:5,10,12|class:5
date:2004,5
date:-2010
Grammar:
<list> ::= <cut> | <cut> '|' <list>
<cut> ::= <dimension> ':' <path>
<dimension> ::= <identifier>
<path> ::= <value> | <value> ',' <path>
The characters |, : and , are configured in CUT_STRING_SEPARATOR, DIMENSION_STRING_SEPARATOR, PATH_STRING_SEPARATOR respectively.

cubes.string_from_cuts(cuts)
Returns a string representing cuts. String can be used in URLs.

cubes.string_from_path(path)
Returns a string representing dimension path. If path is None or empty, then returns an empty string. The path elements are separated by comma (,).
Raises ValueError when path elements contain characters that are not allowed in a path element (allowed are alphanumeric characters and underscore _).

cubes.path_from_string(string)
Returns a dimension point path from string. The path elements are separated by the comma (,) character.
Returns an empty list when string is empty or None.

cubes.levels_from_drilldown(cell, drilldown)
Converts drilldown into a list of levels to be used to drill down. drilldown can be:
list of dimensions
list of dimension level specifier strings (dimension@hierarchy:level)
list of tuples in form (dimension, hierarchy, level)
If drilldown is a list of dimensions or if the level is not specified, then the next level in the cell is considered. The implicit next level is determined from a PointCut for dimension in the cell. For other types of cuts, such as range or set, the next level is the first level of the hierarchy.
Returns a list of tuples: (dimension, levels) where levels is a list of levels to be drilled down.
Note: For backward compatibility the function accepts a dictionary where keys are dimension names and values are level names to drill up to. This argument format is deprecated.
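The grammar above can be followed by hand with a few lines of code. The sketch below is a simplified stand-alone illustration that handles point cuts only and returns plain (dimension, path) tuples; parse_point_cuts is a hypothetical name, the separator constants reuse the names mentioned above with the values given in the text, and the real functions additionally build cut objects, handle other cut types and validate path elements.

```python
CUT_STRING_SEPARATOR = "|"
DIMENSION_STRING_SEPARATOR = ":"
PATH_STRING_SEPARATOR = ","


def path_from_string(string):
    """Split a path string into its elements; empty input gives []."""
    if not string:
        return []
    return string.split(PATH_STRING_SEPARATOR)


def string_from_path(path):
    """Join path elements with the path separator; empty input gives ''."""
    if not path:
        return ""
    return PATH_STRING_SEPARATOR.join(str(element) for element in path)


def parse_point_cuts(string):
    """Parse 'dim:path|dim:path' into a list of (dimension, path) tuples."""
    cuts = []
    if not string:
        return cuts
    for part in string.split(CUT_STRING_SEPARATOR):
        dimension, path = part.split(DIMENSION_STRING_SEPARATOR, 1)
        cuts.append((dimension, path_from_string(path)))
    return cuts


parsed = parse_point_cuts("date:2004,1|class:5")
```

Note that in this sketch the parsed path elements stay strings; converting them to numbers is left to the caller.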
Mapper

class cubes.Mapper(cube, locale=None, schema=None, fact_name=None, **options)
Abstract class for mappers which map logical references to physical references (tables and columns).

Attributes:
cube - mapped cube
simplify_dimension_references - references for flat dimensions (with one level and no details) will be just dimension names, no attribute name. Might be useful when using single-table schema, for example, with a couple of one-column dimensions.
fact_name - fact name, if not specified then cube.name is used
schema - default database schema

all_attributes(expand_locales=False)
Return a list of all attributes of a cube. If expand_locales is True, then localized logical reference is returned for each attribute's locale.

attribute(name)
Returns an attribute with logical reference name.

logical(attribute, locale=None)
Returns logical reference as string for attribute in dimension. If dimension is Null then fact table is assumed. The logical reference might have the following forms:
dimension.attribute - dimension attribute
attribute - fact measure or detail
If simplify_dimension_references is True then the reference for flat dimensions without details is just dimension.
If locale is specified, then locale is added to the reference. This is used by backends and other mappers, it has no real use in end-user browsing.

map_attributes(attributes, expand_locales=False)
Convert attributes to physical attributes. If expand_locales is True then physical reference for every attribute locale is returned.

physical(attribute, locale=None)
Returns physical reference as tuple for attribute, which should be an instance of cubes.model.Attribute. If there is no dimension specified in attribute, then fact table is assumed. The returned tuple has structure: (schema, table, column).
This method should be implemented by Mapper subclasses.

relevant_joins(attributes)
Get relevant joins to the attributes - list of joins that are required to be able to access specified attributes. attributes is a list of three element tuples: (schema, table, attribute).
Subclasses should implement this method.

split_logical(reference)
Returns tuple (dimension, attribute) from logical_reference string. Syntax of the string is: dimension.attribute.
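The reference forms described for logical() and split_logical() can be shown with a short stand-alone sketch. Both functions below are hypothetical illustrations that work on plain strings; in particular, the simplify flag is assumed to be passed by a caller who already knows that the dimension is flat and has no details.

```python
def logical_reference(attribute, dimension=None, locale=None, simplify=False):
    """Build a logical reference string from its parts."""
    if dimension is None:
        reference = attribute                  # fact measure or detail
    elif simplify:
        reference = dimension                  # flat dimension, no details
    else:
        reference = dimension + "." + attribute
    if locale:
        reference = reference + "." + locale   # localized reference
    return reference


def split_logical(reference):
    """Return (dimension, attribute); dimension is None for fact fields."""
    parts = reference.split(".")
    if len(parts) == 1:
        return (None, parts[0])
    return (parts[0], ".".join(parts[1:]))


year_ref = logical_reference("year", "date")
```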
class cubes.SnowflakeMapper(cube, mappings=None, locale=None, schema=None, fact_name=None, dimension_prefix=None, joins=None, dimension_schema=None, **options)
A snowflake schema mapper for a cube. The mapper creates required joins, resolves table names and maps logical references to tables and respective columns.

Attributes:
cube - mapped cube
mappings - dictionary containing mappings
simplify_dimension_references - references for flat dimensions (with one level and no details) will be just dimension names, no attribute name. Might be useful when using single-table schema, for example, with a couple of one-column dimensions.
dimension_prefix - default prefix of dimension tables, used if the default table name is used in physical reference construction
fact_name - fact name, if not specified then cube.name is used
schema - default database schema
dimension_schema - schema where dimension tables are stored (if different from the fact table schema)
mappings is a dictionary where keys are logical attribute references and values are table column references. The keys are mostly in the form:
attribute for measures and fact details
attribute.locale for localized fact details
dimension.attribute for dimension attributes
dimension.attribute.locale for localized dimension attributes
The values might be specified as strings in the form table.column (covering most of the cases) or as a dictionary with keys schema, table and column for more customized references.

physical(attribute, locale=None)
Returns physical reference as tuple for attribute, which should be an instance of cubes.model.Attribute. If there is no dimension specified in attribute, then fact table is assumed. The returned tuple has structure: (schema, table, column).
The algorithm to find the physical reference is as follows:
IF localization is requested:
    IF attribute is localizable:
        IF requested locale is one of attribute locales
            USE requested locale
        ELSE
            USE default attribute locale
    ELSE
        do not localize

IF mappings exist:
    GET string for logical reference
    IF locale:
        append . and locale to the logical reference
    IF mapping value exists for localized logical reference
        USE value as reference

IF no mappings OR no mapping was found:
    column name is attribute name
    IF locale:
        append _ and locale to the column name
    IF dimension specified:
        # Example: date.year -> date.year
        table name is dimension name
        IF there is dimension table prefix
            use the prefix for table name
    ELSE (if no dimension is specified):
        # Example: date -> fact.date
        table name is fact table name
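The steps above can be sketched as a single stand-alone function. This is an illustration under stated assumptions, not the mapper's actual code: physical_reference and all of its parameters are hypothetical, attributes are passed as plain strings, and the first entry of attribute_locales is assumed to be the default attribute locale.

```python
def physical_reference(attribute, dimension=None, locale=None,
                       attribute_locales=(), mappings=None,
                       fact_name="fact", dimension_prefix="", schema=None):
    """Return (schema, table, column) for a logical attribute."""
    # 1. Decide which locale, if any, applies to this attribute.
    if locale and attribute_locales:
        if locale not in attribute_locales:
            # assumption: the first listed locale is the default one
            locale = attribute_locales[0]
    else:
        locale = None                       # do not localize

    # 2. Try an explicit mapping for the (localized) logical reference.
    logical = attribute if dimension is None else dimension + "." + attribute
    if locale:
        logical = logical + "." + locale
    if mappings and logical in mappings:
        value = mappings[logical]
        if isinstance(value, dict):
            return (value.get("schema", schema),
                    value["table"], value["column"])
        table, column = value.split(".")    # "table.column" form
        return (schema, table, column)

    # 3. Fall back to the implicit mapping.
    column = attribute
    if locale:
        column = column + "_" + locale
    if dimension is not None:
        table = dimension_prefix + dimension
    else:
        table = fact_name
    return (schema, table, column)


implicit = physical_reference("year", "date")
```

An explicit mapping always wins over the implicit naming, which keeps conventional schemas configuration-free while still allowing irregular tables to be described.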
relevant_joins(attributes)
Get relevant joins to the attributes - list of joins that are required to be able to access specified attributes. attributes is a list of three element tuples: (schema, table, attribute).

table_map()
Return list of references to all tables. Keys are aliased tables: (schema, aliased_table_name) and values are real tables: (schema, table_name). Included is the fact table and all tables mentioned in joins.
To get list of all physical tables where aliased tables are included only once:
class cubes.DenormalizedMapper(cube, locale=None, schema=None, fact_name=None, denormalized_view_prefix=None, denormalized_view_schema=None, **options)
Creates a mapper for a cube that has data stored in a denormalized view/table.

Attributes:
denormalized_view_prefix - default prefix used for constructing view name from cube name
fact_name - fact name, if not specified then cube.name is used
schema - schema where the denormalized view is stored
fact_schema - database schema for the original fact table

physical(attribute, locale=None)
Returns same name as localized logical reference.

relevant_joins(attributes)
Returns an empty list. No joins are necessary for denormalized view.
browser(cube, locale=None)
Returns a browser for a cube.

browser_for_cube(cube, locale=None)
Creates, configures and returns a browser for a cube.
Note: Use workspace.browser() instead.

create_conformed_rollup(cube, dimension, level=None, hierarchy=None, schema=None, dimension_prefix=None, replace=False)
Extracts dimension values at certain level into a separate table. The new table name will be composed of dimension_prefix, dimension name and suffixed by dimension level. For example a product dimension at category level with prefix dim_ will be called dim_product_category.
Attributes:
dimension - dimension to be extracted
level - grain level
hierarchy - hierarchy to be used
schema - target schema
dimension_prefix - prefix used for the dimension table
replace - if True then existing table will be replaced, otherwise an exception is raised if table already exists.

create_conformed_rollups(cube, dimensions, grain=None, schema=None, dimension_prefix=None, replace=False)
Extract multiple dimensions from a snowflake. See extract_dimension() for more information. grain is a dictionary where keys are dimension names and values are levels, if level is None then all levels are considered.

create_cube_aggregate(cube, table_name=None, dimensions=None, required_dimensions=None, schema=None, replace=False)
Creates an aggregate table. If dimensions is None then all cube's dimensions are considered.
Arguments:
dimensions: list of dimensions to use in the aggregated cuboid, if None then all cube dimensions are used
required_dimensions: list of dimensions that are required for each aggregation (for example a date dimension in most of the cases). The list should be a subset of dimensions.
aggregates_prefix: aggregated table prefix
aggregates_schema: schema where aggregates are stored

create_denormalized_view(cube, view_name=None, materialize=False, replace=False, create_index=False, keys_only=False, schema=None)
Creates a denormalized view named view_name of a cube.
If view_name is None then view name is constructed by pre-pending value of denormalized_view_prefix from workspace options to the cube name. If no prefix is specified in the options, then view name will be equal to the cube name.
Options:
materialize - whether the view is materialized (a table) or regular view
replace - if True then existing table/view will be replaced, otherwise an exception is raised when trying to create view/table with already existing name
create_index - if True then index is created for each key attribute. Can be used only on materialized view, otherwise raises an exception
keys_only - if True then only key attributes are used in the view, all other detail attributes are ignored
schema - target schema of the denormalized view, if not specified, then denormalized_view_schema from options is used if specified, otherwise default workspace schema is used (same schema as fact table schema).

validate_model()
Validate physical representation of model. Returns a list of dictionaries with keys: type, issue, object.
Types might be: join or attribute.
The join issues are:
no_table - there is no table for join
duplicity - either table or alias is specified more than once
The attribute issues are:
no_table - there is no table for attribute
no_column - there is no column for attribute
duplicity - attribute is found more than once

Browser

class cubes.backends.sql.star.SnowflakeBrowser(cube, connectable=None, locale=None, metadata=None, debug=False, **options)
SnowflakeBrowser is a SQL-based AggregationBrowser implementation that can aggregate star and snowflake schemas without need of having explicit view or physical denormalized table.

Attributes:
cube - browsed cube
connectable - SQLAlchemy connectable object (engine or connection)
locale - locale used for browsing
metadata - SQLAlchemy MetaData object
debug - output SQL to the logger at INFO level
options - passed to the mapper and context (see their respective documentation)

Limitations:
only one locale can be used for browsing at a time
locale is implemented as denormalized: one column for each language

aggregate(cell=None, measures=None, drilldown=None, attributes=None, page=None, page_size=None, order=None, **options)
Return aggregated result.
Number of database queries:
without drill-down: 1 (summary)
with drill-down: 3 (summary, drilldown, total drill-down record count)

fact(key_value)
Get a single fact with key key_value from cube.

facts(cell, order=None, page=None, page_size=None)
Return all facts from cell, might be ordered and paginated.
validate()
Validate physical representation of model. Returns a list of dictionaries with keys: type, issue, object.
Types might be: join or attribute.
The join issues are:
no_table - there is no table for join
duplicity - either table or alias is specified more than once
The attribute issues are:
no_table - there is no table for attribute
no_column - there is no column for attribute
duplicity - attribute is found more than once

values(cell, dimension, depth=None, paths=None, hierarchy=None, page=None, page_size=None, order=None, **options)
Return values for dimension with level depth depth. If depth is None, all levels are returned.
Number of database queries: 1.

class cubes.backends.sql.star.QueryContext(cube, mapper, metadata, **options)
Object providing context for constructing queries. Puts together the mapper and physical structure. mapper is used for mapping logical to physical attributes and performing joins. metadata is a sqlalchemy.MetaData instance for getting physical table representations.

Object attributes:
fact_table - the physical fact table - sqlalchemy.Table instance
tables - a dictionary where keys are table references (schema, table) or (schema, alias) to real tables - sqlalchemy.Table instances

Note: To get results as a dictionary, you should zip() the returned rows after statement execution with:

    labels = [column.name for column in statement.columns]
    ...
    record = dict(zip(labels, row))

This is a little overhead for a workaround for SQLAlchemy behaviour in SQLite database. SQLite engine does not respect dots in column names which results in duplicate column name error.

aggregation_statement(cell, measures=None, attributes=None, drilldown=None)
Return a statement for summarized aggregation. whereclause is same as SQLAlchemy whereclause for sqlalchemy.sql.expression.select(). attributes is a list of logical references to attributes to be selected. If it is None then all attributes are used. drilldown has to be a dictionary.
Use levels_from_drilldown() to prepare correct drill-down statement.

aggregations_for_measure(measure)
Returns list of aggregation functions (sqlalchemy) on measure columns. The result columns are labeled as measure + _ + aggregation, for example: amount_sum or discount_min. measure has to be an Attribute instance.
If measure has no explicit aggregations associated, then sum is assumed.

boundary_condition(dim, hierarchy, path, bound, first=None)
Return a Condition tuple for a boundary condition. If bound is 1 then path is considered to be upper bound (operators < and <= are used), otherwise path is considered as lower bound (operators > and >= are used).

column(attribute, locale=None)
Return a column object for attribute. locale is explicit locale to be used. If not specified, then the current browsing/mapping locale is used for localizable attributes.
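The labeling rule for aggregated measure columns and the zip() workaround from the note above can be combined in a small stand-alone sketch. aggregation_labels is a hypothetical helper working on plain strings, and the row values are invented sample data; no database or SQLAlchemy statement is involved.

```python
def aggregation_labels(measure, aggregations=None):
    """Label result columns as measure + "_" + aggregation.
    When no aggregations are given, sum is assumed."""
    aggregations = aggregations or ["sum"]
    return [measure + "_" + aggregation for aggregation in aggregations]


# Labels as they would be collected from the statement columns
labels = ["date.year"] + aggregation_labels("amount", ["sum", "min"])

# A result row is a plain sequence of values in the same order (sample data)
row = (2010, 1250.0, 3.5)

# Pair labels with values to get a dictionary keyed by logical names
record = dict(zip(labels, row))
```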
columns(attributes, expand_locales=False)
Returns list of columns. If expand_locales is True, then one column per attribute locale is added.

condition_for_cell(cell)
Constructs conditions for all cuts in the cell. Returns a named tuple with keys:
condition - SQL conditions
attributes - attributes that are involved in the conditions. This should be used for join construction.
group_by - attributes used for GROUP BY expression

condition_for_point(dim, path, hierarchy=None)
Returns a Condition tuple (attributes, conditions, group_by) for dimension dim point at path. It is a compound condition - one equality condition for each path element in form: level[i].key = path[i]

denormalized_statement(attributes=None, expand_locales=False, include_fact_key=True)
Return a statement (see class description for more information) for denormalized view. whereclause is same as SQLAlchemy whereclause for sqlalchemy.sql.expression.select(). attributes is a list of logical references to attributes to be selected. If it is None then all attributes are used.
Set expand_locales to True to expand all localized attributes.

fact_statement(key_value)
Return a statement for selecting a single fact based on key_value.

join_expression(joins)
Create partial expression on a fact table with joins that can be used as core for a SELECT statement. joins is a list of joins returned from mapper (most probably by Mapper.relevant_joins()).

join_expression_for_attributes(attributes, expand_locales=False)
Returns a join expression for attributes.

range_condition(dim, hierarchy, from_path, to_path)
Return a condition for a hierarchical range (from_path, to_path). Return value is a Condition tuple.

table(schema, table_name)
Return a SQLAlchemy Table instance. If table was already accessed, then existing table is returned. Otherwise new instance is created.
If schema is None then browser's default schema is used.
Helper functions

cubes.backends.sql.star.paginated_statement(statement, page, page_size)
    Returns a paginated statement if page is provided, otherwise returns the same statement.

cubes.backends.sql.star.ordered_statement(statement, order, context)
    Returns a SQL statement which is ordered according to the order. If the statement contains attributes that have a natural order specified, then the natural order is used, if not overridden in the order.

cubes.backends.sql.star.order_column(column, order)
    Orders a column according to order specified as a string.
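The behaviour described for paginated_statement() (paginate only when page is given) can be sketched with the standard sqlite3 module instead of SQLAlchemy. The helper paginated_query(), the facts table and the zero-based page numbering are assumptions made for this example. They are not taken from the Cubes source.

```python
import sqlite3


def paginated_query(query, page=None, page_size=None):
    """Return (sql, params); the query is unchanged when page is None."""
    if page is None:
        return query, ()
    # Assumption: pages are numbered from 0.
    return query + " LIMIT ? OFFSET ?", (page_size, page * page_size)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, amount INTEGER)")
connection.executemany("INSERT INTO facts (amount) VALUES (?)",
                       [(n * 10,) for n in range(1, 8)])

base = "SELECT id FROM facts ORDER BY id"
sql, params = paginated_query(base, page=1, page_size=3)
second_page = [row[0] for row in connection.execute(sql, params)]
print(second_page)  # [4, 5, 6]
```

Pagination is only stable when the statement has a defined order, which is why the example orders by id before applying LIMIT and OFFSET.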
10.3.2 Slicer
This backend is just for backend development demonstration purposes.

class cubes.backends.slicer.SlicerBrowser(cube, url, locale=None)
    Demo backend browser. This backend serves just as an example of a backend. It uses another Slicer server instance for doing all the work. You might use it as a template for your own browser.

    Attributes:
    cube - obligatory, but currently unused here
    url - base URL of the Cubes Slicer OLAP server
cubes.common.localize_attributes(attribs, translations)
    Localizes a list of attributes. translations should be a dictionary with attribute names as keys; the values are dictionaries with localizable attribute metadata, such as label or description.

cubes.common.get_localizable_attributes(obj)
    Returns a dictionary with localizable attributes of obj.

cubes.common.collect_subclasses(parent, suffix=None)
    Collects all subclasses of parent and returns a dictionary where keys are decamelized class names transformed to identifiers and with suffix removed.
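The idea behind collect_subclasses() can be sketched as follows. This is not the Cubes implementation: decamelize(), collect_subclasses_sketch() and the *Store classes are hypothetical, and the exact decamelization rule used by Cubes is assumed here to be "insert an underscore before each inner capital letter, then lowercase".

```python
import re


def decamelize(name):
    """Turn 'CsvFile' into 'csv_file' (hypothetical helper)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def collect_subclasses_sketch(parent, suffix=None):
    """Map identifier-style names to all direct and indirect subclasses."""
    result = {}
    pending = list(parent.__subclasses__())
    while pending:
        cls = pending.pop()
        # Also visit subclasses of subclasses.
        pending.extend(cls.__subclasses__())
        name = cls.__name__
        if suffix and name.endswith(suffix) and name != suffix:
            name = name[:-len(suffix)]
        result[decamelize(name)] = cls
    return result


# Hypothetical class hierarchy used only for this example.
class BaseStore(object):
    pass


class SqlStore(BaseStore):
    pass


class CsvFileStore(BaseStore):
    pass


class GzipCsvFileStore(CsvFileStore):
    pass


found = collect_subclasses_sketch(BaseStore, suffix="Store")
print(sorted(found))  # ['csv_file', 'gzip_csv_file', 'sql']
```

A dictionary like this lets a plug-in class be looked up by a short name taken from configuration, for example found["sql"].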
CHAPTER
ELEVEN
DEVELOPING CUBES
This chapter describes some guidelines for contributing to Cubes.
11.1 General
* If you are puzzled why something is implemented a certain way, ask before complaining. There might be a reason that is not explained and that should be described in the documentation. Or there might not even be a reason for the current implementation at all, and your suggestion might help.
* Until 1.0 the interface is not 100% decided and might change.
* Focus is on usability, simplicity and understandability. There might be places where this is not completely achieved, and this feature of the code should be considered a bug. For example: an overcomplicated interface, too long a call sequence which can be simplified, required over-configuration, ...
* Magic is not bad, if used with caution and if the mechanic is well documented. Also, there should be a way to do it manually for every magical feature.
CHAPTER
TWELVE
DEVELOPMENT NOTES
This chapter contains notes related to Cubes development, such as:

* unresolved design decisions
* suggestions
* proposals for changes
* explanations for certain design decisions

I've included this document as part of the documentation to get more feedback and to help readers understand why certain things are done in a certain way for the time being.
CHAPTER
THIRTEEN
CHAPTER
FOURTEEN
LICENSE
Cubes is licensed under the MIT license with a small addition:
Copyright (c) 2011-2012 Stefan Urbanek, see AUTHORS for more details

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

If your version of the Software supports interaction with it remotely through a computer network, the above copyright notice and this permission notice shall be accessible to all users.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Simply said, if you use it as part of software as a service (SaaS), you have to provide the copyright notice in an about, legal info, credits or some similar kind of page or info box.
CHAPTER
FIFTEEN