Business Intelligence and Databases - Kopie

The document provides an overview of business intelligence and databases. It discusses how business intelligence uses data to help business users make better decisions through applications, technologies, and processes for gathering, storing, accessing, and analyzing data. The four main components of business intelligence are business analytics, data warehousing, business performance management, and user interfaces. Databases are structured collections of data that can be queried and managed through database management systems. Relational and NoSQL databases are discussed.

Uploaded by

f.a.redeker

Business Intelligence and Databases

Lecture 1: introduction to BIDB


 Business intelligence = broad category of applications, technologies, databases,
methodologies, and processes for gathering, storing, accessing, and analyzing data to
help business users make better decisions
 BI uses data to optimize decision making
o Tracking business problems
o Tracking internal processes
o Identifying trends
o Discovering new markets and revenues
 BI helps with the analysis of data to derive actionable information
 BI presents actionable information to executive management to support (better)
decision making
 4 main components BI:
o Business analytics
 A collection of tools and techniques for manipulating, mining, and
analyzing data from the data warehouse
 Data mining: is the process of searching for unknown patterns
and relationships.
 Process mining: supports the analysis of operational processes
based on event logs (sequence of events).
 Artificial intelligence and machine learning techniques can
support prediction tasks
 BA outputs static/dynamic reports and queries, and combines a
variety of tools, methodologies, and technologies
 Aim: to produce output that gives insight on business performance
and improvement needs
 Methods:
 Data processing and data mining
 Visual analytics
 Descriptive, predictive, and prescriptive analysis
 Reporting
 Tools:
 Storing and querying
 Visual representation
 Performance and benchmarking
 Analytics derives information from data
 Outputs:
 Descriptive analytics: insight from historical data with
dashboards and scorecards
 Predictive analytics: predicting future values with statistical
and machine learning techniques
 Prescriptive analytics: recommends decisions using
optimization, simulation, etc.
o Data warehouse
 Cornerstone of BI systems
 Contains source data from Enterprise Systems
 Originally only included historical data that were organized and
summarized
 Nowadays also includes current data, which is used for real-time
decision making
o Business performance management
 Focused on the management, monitoring, and comparing of Key
Performance Indicators (KPI)
 Traditional BI techniques support bottom-up decision-making (from
data to strategy)
 Business Performance Management techniques focus on top-down
decision-making (from strategy to data)
 Balanced Scorecard is a commonly used technique
o User interface
 Provides a comprehensive visual view of the KPIs
 Dashboards are commonly used interfaces
 Contain graphs that show actual performance compared to
desired performance
 Many tool vendors provide BI functionality, ranging from complex
to simplistic
 Strategic BI applications
o Help assess progress in achieving long-term, enterprise-wide goals, such as
increased revenue, reduced costs, improved customer retention etc.
o Strategic dashboards may include KPIs relating to growth, global aspects and
trends
o Balanced Scorecard is often used
o Mostly intended for executives
 Tactical BI applications
o Help analyse short-term initiatives within specific business units, such as
marketing, sales, purchasing, etc.
o Tactical dashboards measure the progress of each strategic
initiative against a predefined goal (e.g., budget)
o At this level it is possible to drill-down to find out why certain targets were
not achieved
o Mostly intended for managers
 Operational BI applications
o Help manage daily business operations
o Process-centric solutions for monitoring and optimizing certain business
processes (e.g.: call center operations, inventory management, etc.)
o Operational dashboards contain real-time charts and data is presented with
strong analytical functionality to support performing root-cause analysis
o Mostly intended for department leaders
Lecture 2: introduction to databases
 A database is an organized collection of structured information
 A database management system (DBMS) is software for creating and managing
databases
 Data independence
o External level: How does the user see the data?
o Logical level: How is the data structured?
o Physical level: How is the data stored?
 Data integrity = a property whereby data is guaranteed to be accurate, complete,
and consistent over its whole life cycle
o Examples of data integrity checks:
 A birthdate field may only contain date values (domain integrity)
 References to other data are kept up to date (referential integrity)
 A (database) transaction is a single logical unit of work that can consist of multiple
operations
o Transactions must be ACID:
 Atomic: Each transaction is either completely executed or everything
stays unchanged
 Consistent: Each transaction brings the database from one valid state
to a new one
 Isolated: Concurrent transactions do not influence each other
 Durable: Once a transaction is completed, the change is permanent,
even in case of a system failure (usually means the result is recorded
in non-volatile memory)
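Atomicity can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in DBMS; the account table, the names, and the transfer helper are hypothetical and only serve the illustration. A transfer consists of two operations; if the second one violates a constraint, the first one is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a balance may never become negative.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (two operations)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst has been rolled back too


ok = transfer(conn, "alice", "bob", 30)       # both updates succeed
failed = transfer(conn, "bob", "alice", 500)  # the debit violates the CHECK
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer the balances are the same as before it, which is the "everything stays unchanged" case of atomicity.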
 In a relational database, data is stored in a set of tables that can be connected with
each other.
o The number of attributes (columns) in a relation (table) is called the arity.
o The number of tuples (rows) in a relation (table) is called cardinality.
o Every relation has a primary key, which uniquely identifies each tuple in
the relation. The primary key can consist of multiple attributes.
o Connections between tuples are established by referencing the primary key
of another tuple. If a tuple contains the primary key (PK) of another relation,
this key is called a foreign key (FK).
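The terms above can be made concrete with a small sketch in plain Python. The two relations (team and athlete), their attributes, and the helper functions are made up for illustration; a real DBMS performs these checks itself.

```python
# Hypothetical relations: team(team_id, sport), athlete(athlete_id, name, team_id)
team_attrs = ("team_id", "sport")
team_rows = [(1, "rowing"), (2, "hockey")]

athlete_attrs = ("athlete_id", "name", "team_id")  # team_id is a foreign key
athlete_rows = [(10, "Ann", 1), (11, "Ben", 2), (12, "Cas", 1)]


def arity(attrs):
    """Number of attributes (columns) of a relation."""
    return len(attrs)


def cardinality(rows):
    """Number of tuples (rows) of a relation."""
    return len(rows)


def is_key(attrs, rows, key):
    """True if the attribute combination `key` identifies every tuple uniquely."""
    idx = [attrs.index(a) for a in key]
    values = [tuple(row[i] for i in idx) for row in rows]
    return len(set(values)) == len(values)


def dangling_references(child_attrs, child_rows, fk, parent_attrs, parent_rows, pk):
    """Child tuples whose foreign key value does not occur as a parent primary key."""
    parent_keys = {row[parent_attrs.index(pk)] for row in parent_rows}
    i = child_attrs.index(fk)
    return [row for row in child_rows if row[i] not in parent_keys]


dangling = dangling_references(
    athlete_attrs, athlete_rows, "team_id", team_attrs, team_rows, "team_id"
)
```

Here athlete has arity 3 and cardinality 3, athlete_id qualifies as a key, team_id does not (two athletes share a team), and no athlete references a missing team.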
 NoSQL databases (non-SQL, non-relational, not only SQL) are databases that use
non-relational data models
o In comparison to relational databases, they usually are more scalable
o Are schemaless, or at least use only a weak schema
o Provide easy data replication
o Most systems favor performance over consistency (no ACID transactions)
 Relational vs. NoSQL
o Relational:
 Relational databases are more efficient for frequent but small transactions and
mostly read-transactions with large payload
 Relational databases offer stronger consistency; there is middleware for NoSQL
systems to support ACID transactions (e.g., CloudTPS)
o NoSQL:
 NoSQL databases are more efficient for large numbers of read and write requests
 NoSQL databases are more scalable and can more easily offer redundancy

 Stages of database design


o Requirement analysis: the process of identifying the needs
and expectations of stakeholders towards a software project
o Conceptual design: ask questions like:
 What objects, individuals, or concepts are relevant for the
application? (~ entities)
 Which relationships exist among these entities?
 What information about these entities and relationships needs to be
stored? (~ attributes)
 Which rules govern these entities and relationships, and which
constraints can be derived from them?
o Logical design
o Physical design
 A database is described by its
 Schema, which describes the structure, and
 Instance, which describes the content.
 Normalization is a process to optimize the organization of data in a database
o by removing redundancy
o in order to avoid inconsistencies.
o The result of the normalization process is a database in a so-called Normal
Form
o Depending on the source, five to seven different normal forms are distinguished:
o Zeroth Normal Form (0NF)
 Data is raw and not normalized, e.g., contains composite values
o First Normal Form (1NF)
 A relation is in 1NF if it contains only atomic values
 I.e., no multi-value or composite attributes
o Second Normal Form (2NF)
 A relation is in 2NF if it is in 1NF and
 All non-key attributes are fully functionally dependent on every candidate
key
 An attribute B is functionally dependent on an attribute A (A → B), iff
each value of A is associated with exactly one value of B
 Fully means: a non-key attribute must depend on the whole of each
candidate key, not on only a part of it
o Third Normal Form (3NF)
 A relation is in 3NF if it is in 2NF and
 For each functional dependency A → B it must be true that either A is
a candidate key or B is part of a candidate key (or both)
 That means, we want to remove transitive dependencies on candidate
keys: A → C because A → B → C
o Boyce Codd Normal Form (BCNF)
 A relation is in BCNF (3.5NF) if it is in 3NF and
 For each non-trivial functional dependency A → B it must be true that A is
a candidate key (or contains one); unlike in 3NF, it is not sufficient that B is
part of a candidate key
 Example: Database of athletes and their teams (let’s assume each
team only competes in one sport, per sport an athlete is only part of
one team, and there are no two athletes with the same name)
o Fourth Normal Form (4NF)
 A relation is in 4NF if it is in BCNF and
 Does not contain any multivalued dependencies on key attributes
 If we have three attributes A, B, and C, we say B has a multivalued
dependency on A if, for a single value of A, multiple values of B can
exist, independent of C.
o Fifth Normal Form (5NF)
 A relation is in 5NF if it is in 4NF and
 it cannot be further split without losing information.
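Whether a functional dependency A → B holds in a given table can be checked mechanically. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the order-line table and the function name holds are made up for illustration. Note that a dependency holding in one sample instance does not prove it holds in general: that follows from the business rules.

```python
def holds(rows, lhs, rhs):
    """True if, in `rows`, each value combination of `lhs` maps to exactly one of `rhs`."""
    seen = {}
    for row in rows:
        a = tuple(row[col] for col in lhs)
        b = tuple(row[col] for col in rhs)
        # setdefault stores b the first time a is seen, otherwise returns the stored value
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical order lines with the composite key (order_id, product_id)
order_lines = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen", "quantity": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Ink", "quantity": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen", "quantity": 5},
]

key_determines_quantity = holds(order_lines, ["order_id", "product_id"], ["quantity"])
partial_dependency = holds(order_lines, ["product_id"], ["product_name"])
order_determines_quantity = holds(order_lines, ["order_id"], ["quantity"])
```

In this sample, product_name depends on product_id alone, i.e., on only a part of the key. That is a partial dependency, so the table is not in 2NF; moving product_id and product_name to a separate product table would remove it.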
 OLAP vs. OLTP

 DW development approaches
o Inmon Model:
 EDW approach
 Top-down
 Starts with an ERD
o Kimball Model:
 Data Mart approach
 Bottom-up
 “Plan big, build small” with one Data Mart built at a time
 Representation of Data in DW
o Dimensional modelling to support high-volume query access
o Star schema: the most commonly used and the simplest style of dimensional
modeling
 Contains a fact table surrounded by and connected to several
dimension tables
 Fact table contains the data we want to include in reports, aggregated
based on values from dimension tables
 Dimension tables contain classification and aggregation information
about the values in the fact table
o Snowflake schema: an extension of star schema where the diagram
resembles a snowflake in shape
 Dimension tables branch out to other dimension tables
 Analysis of data in DW
o Online analytical processing (OLAP)
 Designed for effective and efficient ad-hoc analysis
 OLAP Operations:
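 Drill-down: the opposite of roll-up, i.e., descending the hierarchy or
adding dimensions (e.g., from medals per country to medals per city)
 Roll-up: climbing the hierarchy or reducing the dimensions
(e.g., from medals per city to medals per country)
 Slice: selection of a single value for one dimension of the cube
 Dice: selection of values for two or more dimensions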
 Pivot: Rotates the data axis to view it from different
perspectives
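A star schema and some OLAP operations can be sketched with ordinary SQL. The example below assumes an in-memory SQLite database; the table names (fact_medals, dim_location, dim_sport), the data, and the helper medals_by are made up for illustration. Roll-up and drill-down correspond to grouping by a coarser or finer level of the location dimension, and a slice corresponds to fixing one value of the sport dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_sport (sport_id INTEGER PRIMARY KEY, sport TEXT);
-- The fact table references the dimension tables and holds the measure.
CREATE TABLE fact_medals (
    location_id INTEGER REFERENCES dim_location(location_id),
    sport_id INTEGER REFERENCES dim_sport(sport_id),
    medals INTEGER
);
INSERT INTO dim_location VALUES (1, 'Amsterdam', 'NL'), (2, 'Utrecht', 'NL'), (3, 'Berlin', 'DE');
INSERT INTO dim_sport VALUES (1, 'rowing'), (2, 'hockey');
INSERT INTO fact_medals VALUES (1, 1, 3), (1, 2, 1), (2, 1, 2), (3, 2, 4);
""")


def medals_by(level, sport=None):
    """Aggregate medals per city or per country, optionally sliced on one sport."""
    if level not in ("city", "country"):
        raise ValueError("unknown hierarchy level")
    sql = (
        f"SELECT l.{level}, SUM(f.medals) FROM fact_medals f "
        "JOIN dim_location l ON l.location_id = f.location_id "
        "JOIN dim_sport s ON s.sport_id = f.sport_id "
    )
    params = ()
    if sport is not None:
        sql += "WHERE s.sport = ? "
        params = (sport,)
    sql += f"GROUP BY l.{level} ORDER BY l.{level}"
    return conn.execute(sql, params).fetchall()


per_city = medals_by("city")                      # drill-down level
per_country = medals_by("country")                # roll-up: city -> country
rowing_only = medals_by("country", sport="rowing")  # slice on the sport dimension
```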
Lecture 3: Introduction to SQL
 SQL: structured query language
 In a relational database, data is stored in a set of tables that can be connected with
each other.
 Every SQL statement ends with a semicolon. Don’t forget it.
 SQL keywords are case-insensitive (whether names such as table names are case-sensitive
depends on the DBMS). We usually capitalize keywords of the language for better
readability, but “cReaTe DaTabAsE comPanY;” would work just as well.
 We use the same color coding as most editors to highlight keywords, but your DBMS
does not care about colors.
 Data types and their names may differ between RDBMS
 However, the most important basic types work across most systems (the ranges below are
those used by MySQL); some of them are:
o Numbers:
 INT: Integer [-2,147,483,648 to 2,147,483,647]
 TINYINT: Integer [-128 to 127]
 FLOAT: Floating-point [-3.40282347 × 10^38 to 3.40282347 × 10^38]
o Text:
 VARCHAR(N): String with maximum length N [0-65,535]
 CHAR(N): String with exact length N [0-255]
o Dates:
 DATE: YYYY-MM-DD
 DATETIME: YYYY-MM-DD hh:mm:ss
 Declaring a column as a primary key puts a constraint on this column: it limits which
data can be inserted and how it can be manipulated, specifically:
o it cannot be null and
o must be unique.
 A foreign key constraint prevents you from (accidentally) changing a primary key or deleting an
entry with a primary key that is referenced as a foreign key in your database.
 There are a number of additional constraints in SQL:
o NOT NULL
 NULL is a universal (w.r.t. data type) symbol for an “empty” field; NOT NULL
forbids it in a column
o UNIQUE
o DEFAULT value
o CHECK (condition)
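The constraints above can be tried out in a small sketch. It assumes an in-memory SQLite database driven from Python; the department and employee tables and the helper rejected are made up for illustration. SQLite is more lenient about data types than other RDBMS and only enforces foreign keys after the PRAGMA shown below, so details will differ on other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    salary INT DEFAULT 0 CHECK (salary >= 0),
    dept_id INT NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee (emp_id, name, dept_id) VALUES (1, 'Ann', 1);
""")


def rejected(sql):
    """True if the DBMS refuses the statement because a constraint is violated."""
    try:
        with conn:
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    "UNIQUE": rejected("INSERT INTO department VALUES (2, 'Sales')"),
    "NOT NULL": rejected("INSERT INTO employee (emp_id, name, dept_id) VALUES (2, NULL, 1)"),
    "CHECK": rejected(
        "INSERT INTO employee (emp_id, name, salary, dept_id) VALUES (3, 'Ben', -5, 1)"
    ),
    "FK insert": rejected("INSERT INTO employee (emp_id, name, dept_id) VALUES (4, 'Cas', 99)"),
    "FK delete": rejected("DELETE FROM department WHERE dept_id = 1"),
}
# Ann was inserted without a salary, so the DEFAULT value applies.
default_salary = conn.execute("SELECT salary FROM employee WHERE emp_id = 1").fetchone()[0]
```

Every statement in results violates exactly one constraint and is refused, while the first employee row shows the DEFAULT value being filled in.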
Lecture 4: Data Warehousing
 BI systems rely on a DW as the information source for creating insight to support
managerial decisions
 DW is a collection of integrated, subject-oriented databases designed to support
decision-making functions, where each unit of data is cleansed, in a standardized
format, non-volatile and relevant to some moment in time
 Characteristics DW:
o Subject-oriented: organized by subject (e.g.: sales, products) and contains
only relevant information for decision-making
o Integrated: places information from different sources in a consistent format
while dealing with naming conflicts and discrepancies
o Time variant (time series): maintains historical data which can be used for
forecasting and comparisons (must contain date/time)
o Non-volatile: once data is entered in a DW it cannot be changed (any change
is recorded as new data)
o Web based
 needs an internet connection and a web browser
o Relational/Multidimensional
o Client/Server architectures
 used to manage the inflow and outflow of data between client and
server
o Real-time
o Include Metadata
 A Data Cube allows data to be viewed in multiple dimensions
 Metadata = data about data
o in a data warehouse, metadata describe the contents of a data warehouse
and the manner of its acquisition and use
o types of metadata
 Descriptive metadata
 Adds information about who created a resource
 What the resource is about, what it includes
 Structural metadata
 Includes additional data about the way data elements are
organized
 Their relationships and the structure they exist in
 Administrative metadata
 Provides information about the origin of resources
 Their types and access rights
 Types of DW
o Enterprise Data Warehouse
 Large-scale DW used across the organization for decision-support
 Integrates data from multiple sources into a standard format
o Data Mart
 Small and stores only relevant information for a specific subject or
department
 Dependent data mart: subset of data directly from the EDW
 Independent data mart: small warehouse with data not from the EDW
o Operational Data Store
 Intermediary staging area for a DW
 Can be updated throughout the course of business operations, unlike
the static nature of EDW
 Used for short-term decision-making since it stores only very recent
data
 Data warehouse framework:
o Data sources: independent systems or external providers
o ETL: process to extract, transform, and load data into a DW
o API/Middleware tools: enable access to the DW for SQL queries, analysis,
dashboarding and reporting
 Which architecture is the best?
o Which Database Management System should be used?
o Will parallel processing/partitioning be needed (scalability/speed)?
o Will migration tools be used to load the DW?
o What tools will be used to support data retrieval and analysis?
 Extract Transform Load (ETL) process
o Extraction: reading data from one or more databases
o Transformation: converting extracted data into the form it needs to have in the
DW
o Load: putting the data into the DW
o It is key for integrating data from multiple sources into the DW
o In case of low-quality data (incomplete, not relevant, inconsistent, etc.), data
preprocessing, such as formatting, fixing, filtering (cleansing) is required
o Criteria for selecting ETL tools
 Ability to read from and write to an unlimited number of data
sources/architectures
 Automatic capturing and delivery of metadata
 A history of conforming to open standards
 An easy-to-use interface for the developer and the functional user
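The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration, assuming a CSV export as the data source and an in-memory SQLite database as the "warehouse"; the sample data, the fact_orders table, and the function names are made up. Real ETL tools add scheduling, metadata capture, and many more source types.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, inconsistent casing, a missing amount.
RAW = """order_id,customer,amount,date
1, Alice ,10.50,2024-01-03
2,bob,,2024-01-04
3,Carol,7.25,2024-01-05
"""


def extract(text):
    """Extraction: read the records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: cleanse and convert records into the warehouse format."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # cleansing: drop incomplete records
        clean.append(
            (
                int(row["order_id"]),
                row["customer"].strip().title(),
                float(row["amount"]),
                row["date"],
            )
        )
    return clean


def load(conn, rows):
    """Load: write the transformed records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        " order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

The incomplete second record is filtered out during transformation, so only two cleaned rows reach the warehouse table.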
Lecture 5: BPM
 Business Performance Management (BPM) = An integrated set of processes,
methodologies, metrics and applications designed to drive the overall financial and
operational performance of an organization
 BPM helps organizations with:
o Translating strategies and objectives into plans
o Monitoring performance against strategic plans of the company
o Analyzing variations between actual and planned results
o Adjusting objectives and actions in response to the performed analysis
 BPM components:
o A set of integrated management and analytic processes, supported by
technology
o Tools for businesses to define strategic goals and the associated key
performance indicators (KPIs), e.g., Balanced Scorecard
o Performance Measurement System including methods and tools for
monitoring KPIs, e.g., BI dashboards
 Closed loop process = Links strategy to execution to optimize business performance
o Step 1: strategize – where do we want to go?
 Strategic plan: Is a map that details a course of action for moving the
organization from its current state to its future vision
o Step 2: plan – How do we get there?
 Operational plan: Translates the strategic objectives and goals into a
set of well-defined tactics, resource requirements, and expected
results for a future time period (e.g. for a year interval)
o Step 3: monitor – How are we doing?
 A comprehensive framework for monitoring performance should
address two key issues:
 What to monitor (Goals, KPI, etc.)
 How to monitor
o Step 4: act/adjust – what do we need to do differently?
 Success (or mere survival) depends on reacting to the findings:
 Creating new products
 Entering new markets
 Acquiring new customers/businesses
 Streamlining processes
 How to act/adjust
 Find facts about the problems/bottlenecks using performance
measurement techniques
 Analyze the causes of bottlenecks
o Assignment of resources,
o Completion times,
o Resource utilization, etc.
 Set priorities and assign a problem owner or adjust the
strategy
 Key performance indicator (KPI) = a metric that measures performance
against a goal
 Outcome Metrics: KPIs focused on financial performance
 Operational Metrics: KPIs focused on measuring operational activities and
performance
 Characteristics of KPIs
o Embody a strategic objective
o Measure performance against a target
o Targets have performance ranges
o Ranges are encoded in software for visual display (green, red, yellow, etc.)
o Targets are assigned a completion time frame
o Targets are measured against a baseline or benchmark
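How a target with performance ranges is turned into a traffic-light colour can be sketched as follows. The class, the thresholds, and the example KPIs are made up for illustration; actual BI tools let you configure such ranges per KPI.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    name: str
    target: float   # reaching this value (or better) is "green"
    warning: float  # between warning and target is "yellow", beyond that "red"
    higher_is_better: bool = True

    def status(self, actual):
        """Map an actual value onto the encoded performance range."""
        target, warning = self.target, self.warning
        if not self.higher_is_better:
            # Mirror the values so the same comparisons can be used.
            actual, target, warning = -actual, -target, -warning
        if actual >= target:
            return "green"
        if actual >= warning:
            return "yellow"
        return "red"


retention = KPI("customer retention (%)", target=90, warning=80)
response = KPI("response time (hours)", target=4, warning=8, higher_is_better=False)

retention_colours = [retention.status(v) for v in (92, 85, 70)]
response_colours = [response.status(v) for v in (3, 6, 10)]
```

For retention, higher values are better; for response time, lower values are better, which is why the comparison is mirrored.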
 KPI typology
o Outcome KPIs: measure performance in outputs
o Driver KPIs (also called leading KPIs): measure activities that have significant
impact on outcome KPIs
o Operational KPIs: focus on operational areas dealing with operational
activities and performance
 Operational metric KPI examples:
o Customer performance
 e.g. customer satisfaction, customer retention, speed of issue
resolution, response time
o Service performance
 service renewal rate, return rates, response time
o Process performance
 completion time, throughput, defect/number of products
o Sales plan/forecast
 order-to-fulfilment ratio, total closed contracts, etc.
 Good KPIs should:
o Be focused on key factors
o Balance the needs of all stakeholders (shareholders, employees, partners,
suppliers)
o Have realistic targets,
o Be measurable and time-framed
 How to derive/formulate KPIs: SMART
o Specific: KPIs should measure the areas that have the greatest impact on
your business performance
o Measurable: Ensure that your KPIs can be identified and tracked
o Achievable: The KPIs should be realistic
o Relevant: KPIs should link to overall strategic goals and objectives of a
business
o Time-framed: KPIs should have relevant data (e.g., a timestamp, and/or be
captured systematically) that enables measuring them for specific time
intervals relevant for a business goal.
 BPM methodologies
o Balanced Scorecard (BSC): Performance measurement and management
methodology that helps translate an organization’s financial, customer,
internal process, and learning and growth objectives and targets into a set of
actionable initiatives
o Six Sigma: Performance management methodology aimed at reducing the
number of defects in a business process to as close to zero defects per million
opportunities (DPMO) as possible
o DMAIC performance model
 Define the project goals and customer (internal and external)
deliverables
 Measure the process to determine current performance
 Analyze and determine the root cause(s) of the defects
 Improve the process by eliminating defects
 Control future process performance
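The Six Sigma metric DPMO can be computed in the Measure step as defects divided by the total number of defect opportunities, scaled to one million. The figures below are invented for illustration.

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    total_opportunities = units * opportunities_per_unit
    return defects / total_opportunities * 1_000_000


# Hypothetical process: 1,000 invoices, 5 fields each that can be wrong, 25 errors found.
invoice_dpmo = dpmo(defects=25, units=1000, opportunities_per_unit=5)
```

Here 25 defects in 5,000 opportunities correspond to 5,000 DPMO. The level commonly quoted for "six sigma" quality is 3.4 DPMO, so this hypothetical process would have considerable room for the Improve step.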
 Visualization per aim
o Trend: Column or Line
o Comparison: Area, Bar, Bullet, Column, Line, or Scatter
o Relationship: Line or Scatter
o Distribution: Bar, Boxplot, or Column
o Composition: Donut, Pie, Stacked Bar, or Stacked Column
o Process / sequence: process discovery maps, Line / Dotted chart
 Dashboard vs. Scorecard
o Dashboard:
 Monitor operational performance
 Free form (any measures)
o Scorecard:
 Chart progress against strategic and tactical goals and targets
 Predetermined measures
Lecture 6: data mining
 Queries are precise requests that search the relational database, fetch information,
and display it
 Processing of data: Collect, process, store, retrieve and distribute information
 Data mining = Discovering or “mining” knowledge from large amounts of data
o Data mining seeks to identify four major patterns:
 Predictions: future occurrences of certain events
 Classification learns patterns from past data in order to place
new instances (with unknown labels) into their respective
group
 Decision trees recursively divide a training set until each
division consists entirely or primarily of examples from one
class
 Clusters: grouping of things based on known features
 Association: common co-occurrence of things
 Association rule mining aims to find interesting relationships
between variables (items) in large databases
 Apriori algorithm finds subsets that are common to at least a
minimum number of item sets
 Sequential relationships: time-ordered events
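The idea behind association rule mining with Apriori can be sketched in plain Python. This is a simplified, unoptimized version for illustration; the shopping baskets are invented, and min_support is given as an absolute number of transactions. It works level by level: only itemsets whose smaller subsets are all frequent are considered as candidates.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for all itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Candidates of size k: unions of frequent (k-1)-itemsets ...
        current = {a | b for a in level for b in level if len(a | b) == k}
        # ... whose (k-1)-subsets are all frequent (the Apriori pruning step).
        current = {
            c for c in current
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
frequent = apriori(baskets, min_support=3)
bread_to_butter = confidence(frequent, {"bread"}, {"butter"})
```

With a minimum support of three baskets, the only frequent pair is bread and butter, and the rule "bread → butter" has a confidence of 3/4.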
 Supervised learning: algorithms require a training data set that includes both
independent and dependent variables
 Unsupervised learning: algorithms only require independent variables
 Hypothesis-driven data mining: starts with proposition made by the user, who seeks
to validate its truthfulness
 Discovery-driven data mining: finds patterns, associations and other relationships
that are hidden in the dataset

Lecture 8: (guest lecture) external data


 External data: Every data source that is outside of your organization is considered
external data or third-party data.
 Open data:
o Free to use
o Sources: government, universities, or NGOs
 Public data:
o Limited use
o Source: companies, individuals, and others
 Commercial data:
o Commercial license
o Source: companies, NGOs, or government
Lecture 9: artificial intelligence
 Artificial intelligence (AI) = the ability of a digital computer or computer-controlled
robot to perform tasks commonly associated with intelligent beings
 Rule-based:
o Knowledge from human experts codified in machine-readable format
o If... then...
o Underlying rules must be known
o Decisions are transparent and replicable
o Full control
o Only as “smart” as the rule creator
o Needs structured input
 Machine Learning:
o Statistical analysis of existing data to make predictions about new data
o Supervised or unsupervised (more about this later)
o A dataset is (usually) needed
o Transparency and replicability are (still) problematic
o Can discover previously unknown correlations
o Can process different inputs
 Supervised ML
o Classification:
 The sample space is discrete (countable, i.e., not continuous)
and known beforehand, e.g.:
 True / false
 win / lose / draw
 dog / cat / mouse
o Regression
 The sample space is continuous, e.g.:
 How warm will it be tomorrow?
 A flat of 100 m² costs 1,000 €, 10 m² cost 150 €; how much is
50.27 m²?
 Special: New data can be generated, that has not been seen before
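The flat example can be worked through with a simple linear regression. The sketch below assumes that rent grows linearly with floor area, which is only one possible model, and fits a least-squares line through the two known data points.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data from the example: area in m² (independent), price in € (dependent)
areas = [100, 10]
prices = [1000, 150]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 50.27 + intercept
```

Under this linear assumption the model predicts roughly 530 € for 50.27 m², a value that never occurred in the training data. With more data points the same function would return the best-fitting line rather than an exact one.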
 Unsupervised learning
o K-means clustering
 Segmentation of all input data into k clusters/groups (k must be given)
o Dimensionality reduction
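K-means clustering can be sketched for one-dimensional data in plain Python. This is a simplified version for illustration: it uses the first k points as initial centroids (real implementations choose them more carefully), and the data points are invented.

```python
def k_means(points, k, iterations=100):
    """Segment 1-D points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change
        centroids = new_centroids
    return centroids, clusters


data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = k_means(data, k=2)
```

No labels are needed: the algorithm only sees the independent variable and still separates the small values from the large ones. The number of clusters k has to be supplied by the user.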
 Natural Language Processing (NLP) = the machine processing of natural language.
 Stemming: Reduces an inflected word form to its stem (e.g., bor-ing -> bor, bor-ed ->
bor)
 Lemmatization: Transforms a word into its lemma (i.e., its “entry in the lexicon”)
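The difference between the two can be shown with a toy sketch. The suffix list and the small lexicon below are invented for illustration; real NLP libraries use far more elaborate rules and dictionaries.

```python
SUFFIXES = ("ing", "ed", "s")  # assumption: a tiny, illustrative suffix list
LEXICON = {"went": "go", "mice": "mouse", "better": "good"}  # hypothetical lexicon


def stem(word):
    """Stemming: strip a known suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Lemmatization: look the word form up in a lexicon of lemmas."""
    return LEXICON.get(word, word)


stems = [stem(w) for w in ("boring", "bored", "went")]
lemmas = [lemmatize(w) for w in ("went", "mice")]
```

Stemming works purely on the surface form, so it maps "boring" and "bored" to "bor" but cannot do anything with "went"; lemmatization needs lexical knowledge, which lets it map "went" to "go".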
 Correlation does not imply causation
