Key Notes
Previously, you learned about the four different types of stakeholders you might
encounter as a business intelligence professional:
Project sponsor: A person who provides support and resources for a project and is
accountable for enabling its success.
Developer: A person who uses programming languages to create, execute, test, and
troubleshoot software applications. This includes application software developers and
systems software developers.
Systems analyst: A person who identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to achieve
business goals.
Business stakeholders: Business stakeholders can include one or more of the
following groups of people:
The executive team: The executive team provides strategic and operational
leadership to the company. They set goals, develop strategy, and make sure that
strategy is executed effectively. The executive team might include vice presidents, the
chief marketing officer, and senior-level professionals who help plan and direct the
company’s work.
The customer-facing team: The customer-facing team includes anyone in an
organization who has some level of interaction with customers and potential customers.
Typically they compile information, set expectations, and communicate customer
feedback to other parts of the internal organization.
The data science team: The data science team explores the data that’s already out
there and finds patterns and insights that data scientists can use to uncover future
trends with machine learning. This includes data analysts, data scientists, and data
engineers.
The business
In this scenario, you are a BI professional working with an e-book retail company. The
customer-facing team is interested in using customer data collected from the
company’s e-reading app in order to better understand user reading habits, then
optimize the app accordingly. They have asked you to create a system that will ingest
customer data about purchases and reading time on the app so that the data is
accessible to their analysts. But before you can get started, you need to understand all
of your stakeholders’ needs and goals to help them achieve them.
Developers
The developers are the people who use programming languages to create, execute,
test, and troubleshoot software applications. This includes application software
developers and systems software developers. If your new BI workflow includes software
applications and tools, or you are going to need to create new tools, then you’ll need to
collaborate with the developers. Their goal is to create and manage your business’s
software tools, so they need to understand what tools you plan to use and what you
need those tools to do. For this example, the developers you work with will be the ones
responsible for managing the data captured on the e-reading app.
Systems analyst
The systems analyst identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to
achieve business goals. Their primary goal is to understand how the business
is using its computer hardware and software, cloud services, and related
technologies, and then figure out how to improve these tools. So the systems
analyst will ensure that the data captured by the developers can be
accessed internally as raw data.
Business stakeholders
In addition to the customer-facing team, which is the project sponsor for this
project, there may also be other business stakeholders, such
as project managers, senior-level professionals, and other executives. These
stakeholders are interested in guiding business strategy for the entire
business; their goal is to continue to improve business processes, increase
revenue, and reach company goals. So your work may even reach the chief
technology officer! These are generally people who need bigger-picture
insights that will help them make larger scale decisions as opposed to detail-
oriented insights about software tools or data systems.
Built In: Built In is an online community specifically designed to connect startups and tech
companies with potential employees. This is an excellent resource for finding jobs specifically in the
tech industry, including BI. Built In also has hubs in some U.S. cities and resources for finding
remote positions.
Crunchboard: Crunchboard is a job board hosted by TechCrunch. TechCrunch is also the creator
of CrunchBase, an open database with information about start-up companies in the tech industry.
This is another valuable resource for people looking for jobs specifically in tech.
Dice: Dice is a career marketplace specifically focused on tech professionals in the United States. It
provides insights and information for people on the job search.
DiversityJobs: DiversityJobs is a resource that hosts a job board, career and resume resources,
and community events intended to help underrepresented job seekers with employers currently
hiring. This resource is not tech specific and encompasses a lot of industries.
Diversify Tech: Diversify Tech is a newsletter that is designed to connect underrepresented
people with opportunities in the tech industry, including jobs. Their job board includes positions from
entry-level to senior positions with companies committed to diversity and inclusion in the field.
LinkedIn: You’ve learned about LinkedIn as a great way to start networking and building your
online presence as a BI professional. LinkedIn also has a job board with postings from potential
employers. It has job postings from across the world in all sorts of industries, so you’ll need to
commit some time to finding the right postings for you, but this is a great place to begin your job
search.
You can also search for more specific job boards depending on your needs as a job seeker and your
career interests!
Considering mentorship
Mentors are professionals who share knowledge, skills, and experiences to help you grow and
develop. These people can come in many different forms at different points in your career. They can
be advisors, sounding boards, honest critics, resources, or all of those things. You can even have
multiple mentors to gain more diverse perspectives!
Decide what you are searching for in a mentor. Think about your strengths and
weaknesses, what challenges you have encountered, and how you would like to grow as a BI
professional. Share these ideas with potential mentors who might have had similar experiences and
have guidance to share.
Consider common ground. Often you can find great mentorships with people who share
interests and backgrounds with you. This could include someone who had a similar career path or
even someone from your hometown.
Respect their time. Often, mentors are busy! Make sure the person you are asking to mentor
you has time to support your growth. It’s also important for you to put in the effort necessary to
maintain the relationship and stay connected with them.
Note that mentors don't have to be directly related to BI. It depends on what you want to focus on
with each individual. Mentors can be friends of friends, more experienced coworkers, former
colleagues, or even teammates. For example, if you find a family friend who has a lot of experience
in their own non-BI field, but shares a similar background as you and understands what you're trying
to achieve, that person may become an invaluable mentor to you. Or, you might fortuitously meet
someone at a casual work outing with whom you develop an instant rapport. Again, even if they are
not in the BI field, they may be able to connect you to someone in their company or network who is
in BI.
One great way to reach out is with a friendly email or a message on a professional networking
website. Describe your career goals, explain how you think those goals align with their own
experiences, and talk about something you admire about them professionally. Then you can suggest
a coffee chat, virtual meetup, or email exchange as a first step.
Be sure to check in with yourself. It’s important that you feel like it is a natural fit and that you’re
getting the mentorship you need. Mentor-mentee relationships are equal partnerships, so the more
honest you are with them, the more they can help you. And remember to thank them for their time
and effort!
As you get in touch with potential mentors, you might feel nervous about being a bother or taking up
too much of their time. But mentorship is meaningful for mentors too. They often genuinely want to
help you succeed and are invested in your growth. Your success brings them joy! Many mentors
enjoy recounting their experiences and sharing their successes with you, as well. And mentors often
learn a lot from their mentees. Both sides of the mentoring relationship are meaningful!
Resources
There are a lot of great resources you can use to help you connect with potential mentors. Here are
just a few:
Mentoring websites such as Score.org, MicroMentor.org, or the Mentorship app allow you to
search for mentors with specific credentials that match your needs. You can then arrange dedicated
times to meet up or talk on the phone.
Meetups, or online meetings that are usually local to your geography. Enter a search for “business
intelligence meetups near me” to check out what results you get. There is usually a posted schedule
for upcoming meetings so you can attend virtually. Find out more information about meetups
happening around the world.
Platforms including LinkedIn and Twitter. Use a search on either platform to find data science or
data analysis hashtags to follow. Post your own questions or articles to generate responses and
build connections that way.
Webinars may showcase a panel of speakers and are usually recorded for convenient access and
playback. You can see who is on a webinar panel and follow them too. Plus, a lot of webinars are
free. One interesting pick is the Tableau on Tableau webinar series. Find out how Tableau has used
Tableau in its internal departments.
Conferences present innovative ideas and topics. The cost varies, and some are pricey. But
many offer discounts to students, and some conferences like Women in Analytics aim to increase
the number of under-represented groups in the field.
Associations or societies gather members to promote a field such as business intelligence.
Many memberships are free. The Digital Analytics Association is one example. The Cape Fear
Community College Library also has a list of professional associations for analytics, business
intelligence, and business analysis.
User communities and summits offer events for users of professional tools; this is a chance
to learn from the best. Have you seen the Tableau community?
Nonprofit organizations that promote the ethical use of data science and might offer events
for the professional advancement of their members. The Data Science Association is one example.
Finding and connecting with a mentor is a great way to build your network, access career
opportunities, and learn from someone who has already experienced some of the challenges you’re
facing in your career. Whether your mentor is a senior coworker, someone you connect with on
LinkedIn, or someone from home on a similar career path, mentorship can bring you great benefits
as a BI professional.
Project sponsor: A person who has overall accountability for a project and
establishes the criteria for its success
Systems analyst: A person who identifies ways to design, implement, and advance
information systems in order to ensure that they help make it possible to achieve
business goals
A
Application programming interface (API): A set of functions and procedures
that integrate computer programs, forming a connection that enables them to
communicate
B
Business intelligence (BI): Automating processes and information channels in
order to transform relevant data into actionable insights that are easily available to
decision-makers
D
Data analysts: People who collect, transform, and organize data
Data governance professionals: People who are responsible for the formal
management of an organization’s data assets
Data maturity: The extent to which an organization is able to effectively use its data
in order to extract actionable insights
Data model: A tool for organizing data elements and how they relate to one another
Data pipeline: A series of processes that transports data from different sources to
their final destination for storage and analysis
E
ETL (extract, transform, and load): A type of data pipeline that enables data to
be gathered from source systems, converted into a useful format, and brought into a
data warehouse or other unified destination system
I
Information technology professionals: People who test, install, repair,
upgrade, and maintain hardware and software solutions
Iteration: Repeating a procedure over and over again in order to keep getting closer
to the desired result
K
Key performance indicator (KPI): A quantifiable value, closely linked to
business strategy, which is used to track progress toward a goal
P
Portfolio: A collection of materials that can be shared with potential employers
OLAP
Description: Online Analytical Processing (OLAP) systems are databases that have been primarily optimized for analysis.
Use: Provide user access to data from a variety of source systems; used by BI and other data professionals to support decision-making processes; analyze data from multiple databases; draw actionable insights from data delivered to reporting tables.

OLTP
Description: Online Transaction Processing (OLTP) systems are databases that have been optimized for data processing instead of analysis.
Use: Store transaction data; used by customer-facing employees or customer self-service applications; read, write, and update single rows of data; act as source systems that data pipelines can be pulled from for analysis.
Row-based
Description: Row-based databases are organized by rows.
Use: Traditional, easy-to-write database organization typically used in OLTP systems; writes data very quickly; stores all of a row's values together; easily optimized with indexing.

Columnar
Description: Columnar databases are organized by columns instead of rows.
Use: Newer form of database organization, typically used to support OLAP systems; reads data more quickly and pulls only the necessary data for analysis; stores multiple rows' columns together.
Distributed
Description: Distributed databases are collections of data systems distributed across multiple physical locations.
Use: Easily expanded to address increasing or larger-scale business needs; accessed from different networks; easier to secure than a single-homed database system.

Single-homed
Description: Single-homed databases are databases where all of the data is stored in the same physical location.
Use: Data stored in a single location is easier to access and coordinate cross-team; cuts down on data redundancy; cheaper to maintain than larger, more complex systems.
When validating a database schema, make sure it accounts for the following:
The relevant data: The schema describes how the data is modeled and shaped within the
database and must encompass all of the data being described.
Names and data types for each column: Include names and data types for each column
in each table within the database.
Consistent formatting: Ensure consistent formatting across all data entries. Every entry is an
instance of the schema, so it needs to be consistent.
Unique keys: The schema must use unique keys for each entry within the database. These keys
build connections between the tables and enable users to combine relevant data from across the
entire database.
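To make these checklist items concrete, here is a minimal sketch of a schema that names its columns, assigns data types, and uses unique keys to connect tables. It uses Python's built-in sqlite3 module, and the table and column names are hypothetical examples rather than part of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, just for illustration

# Every column has a name and a data type, and each table has a unique key.
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,  -- unique key for each user entry
    user_name   TEXT NOT NULL,
    signup_date TEXT NOT NULL         -- entries kept in a consistent format, e.g. 'YYYY-MM-DD'
);

CREATE TABLE purchases (
    purchase_id   INTEGER PRIMARY KEY,                -- unique key for each purchase
    user_id       INTEGER NOT NULL,
    purchase_date TEXT NOT NULL,
    amount        REAL NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users (user_id)  -- builds the connection between tables
);
""")
```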
Key takeaways
As you receive more data or business needs change, databases and schemas may also need to
change. Database optimization is an iterative process, which means you may need to check the
schema multiple times throughout the database’s useful life. Use this checklist to help you ensure
that your database schema remains functional.
When considering what checks you need to ensure the quality of your data as it moves through the
pipeline, there are seven elements you should consider:
Completeness: Does the data contain all of the desired components or measures?
Consistency: Is the data compatible and in agreement across all systems?
Conformity: Does the data fit the required destination format?
Accuracy: Does the data conform to the actual entity being measured or described?
Redundancy: Is only the necessary data being moved, transformed, and stored for use?
Timeliness: Is the data current?
Integrity: Is the data accurate, complete, consistent, and trustworthy? (Integrity is influenced by
the previously mentioned qualities.)
Common issues
There are also some common issues you can protect against within your system to ensure the
incoming data doesn’t cause errors or other large-scale problems in your database system:
Check data mapping: Does the data from the source match the data in the target database?
Check for inconsistencies: Are there inconsistencies between the source system and the
target system?
Check for inaccurate data: Is the data correct and does it reflect the actual entity being
measured?
Check for duplicate data: Does this data already exist within the target system?
To address these issues and ensure your data meets all seven elements of quality testing, you can
build intermediate steps into your pipeline that check the loaded data against known parameters. For
example, to ensure the timeliness of the data, you can add a checkpoint that determines if that data
matches the current date; if the incoming data fails this check, there’s an issue upstream that needs
to be flagged. Considering these checks in your design process will ensure your pipeline delivers
quality data and needs less maintenance over time.
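As one illustration of this idea, the sketch below shows what an intermediate checkpoint might look like in Python, assuming the incoming batch arrives as a pandas DataFrame with hypothetical customer_id, purchase_amount, and load_date columns. It demonstrates the pattern of checking completeness, timeliness, and redundancy rather than a complete quality-testing framework.

```python
from datetime import date
import pandas as pd

def check_batch(df: pd.DataFrame) -> list[str]:
    """Run simple quality checks on an incoming batch and return any issues found."""
    issues = []

    # Completeness: the batch should contain every expected column.
    expected_columns = {"customer_id", "purchase_amount", "load_date"}
    missing = expected_columns - set(df.columns)
    if missing:
        issues.append(f"Missing columns: {sorted(missing)}")

    # Timeliness: flag the batch if the load date does not match today's date.
    if "load_date" in df.columns:
        stale = df[pd.to_datetime(df["load_date"]).dt.date != date.today()]
        if not stale.empty:
            issues.append(f"{len(stale)} rows have an out-of-date load_date")

    # Redundancy: flag duplicate rows so only necessary data moves downstream.
    if df.duplicated().any():
        issues.append(f"{df.duplicated().sum()} duplicate rows found")

    return issues
```

Any issues the check returns can be flagged for investigation upstream before the batch is loaded into the destination system.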
Key takeaways
One of the great things about BI is that it gives you tools to automate processes that save time and resources during data analysis. Building quality checks into your ETL pipeline system is one way to do this! Considering the completeness, consistency, conformity, accuracy, redundancy, integrity, and timeliness of the data as it moves from one system to another means you and your team don't have to check the data manually later on.
Schema-validation checklist
In this course, you have been learning about the tools business intelligence professionals use to
ensure conformity from source to destination: schema validation, data dictionaries, and data
lineages. In another reading, you already had the opportunity to explore data dictionaries and
lineages. In this reading, you are going to get a schema validation checklist you can use to guide
your own validation process.
Schema validation is a process used to ensure that the source system data schema matches the
target database data schema. This is important because if the schemas don’t align, it can cause
system failures that are hard to fix. Building schema validation into your workflow is important to
prevent these issues.
Common issues for schema validation
The keys are still valid: Primary and foreign keys build relationships between tables in
relational databases. These keys should continue to function after you have moved data from one
system into another.
The table relationships have been preserved: The keys preserve the relationships
used to connect the tables, so that those tables can still be joined in the target system. It's important to make
sure that these relationships are preserved or that they are transformed to match the target schema.
The conventions are consistent: The conventions for incoming data must be consistent with
the target database’s schema. Data from outside sources might use different conventions for naming
columns in tables, so it's important to align these before they're added to the target system.
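A lightweight way to automate part of this check is to compare incoming data against the target schema's expected column names and data types. The sketch below assumes the data arrives as a pandas DataFrame and uses a hypothetical schema definition; it illustrates the idea rather than any specific tool.

```python
import pandas as pd

# Hypothetical target schema: column name -> expected data type.
target_schema = {"user_id": "int64", "purchase_date": "datetime64[ns]", "amount": "float64"}

def validate_schema(df: pd.DataFrame, expected: dict[str, str]) -> list[str]:
    """Compare an incoming DataFrame against the target schema and report mismatches."""
    problems = []

    # Check that every expected column arrived, and flag unexpected extras.
    missing = set(expected) - set(df.columns)
    extra = set(df.columns) - set(expected)
    if missing:
        problems.append(f"Missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"Unexpected columns: {sorted(extra)}")

    # Check that column data types follow the target schema's conventions.
    for column, expected_dtype in expected.items():
        if column in df.columns and str(df[column].dtype) != expected_dtype:
            problems.append(f"{column}: expected {expected_dtype}, got {df[column].dtype}")

    return problems
```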
Using data dictionaries and lineages
You’ve already learned quite a bit about data dictionaries and lineages. As a refresher, a data
dictionary is a collection of information that describes the content, format, and structure of data
objects within a database, as well as their relationships. And a data lineage is the process of
identifying the origin of data, where it has moved throughout the system, and how it has transformed
over time. These tools are useful because they can help you identify what standards incoming data
should adhere to and track down any errors to the source.
If you'd like to review further, the data dictionaries and lineages reading provides additional information.
Key takeaways
Schema validation is a useful check for ensuring that the data moving from source systems to your
target database is consistent and won’t cause any errors. Building in checks to make sure that the
keys are still valid, the table relationships have been preserved, and the conventions are consistent
before data is delivered will save you time and energy trying to fix these errors later on.
Business rules
As you have been learning, a business rule is a statement that creates a restriction on specific parts
of a database. These rules are developed according to the way an organization uses data. They create
efficiencies, allow for important checks and balances, and sometimes exemplify a business's
values in action. For instance, if a company values cross-functional collaboration,
there may be a rule that representatives from at least two teams must sign off before a
data set is considered complete. Business rules affect what data is collected and stored, how relationships are defined, what kind of
information the database provides, and the security of the data. In this reading, you will learn more
about the development of business rules and see an example of business rules being implemented
in a database system.
For example, let’s say the company you work for has a database that manages purchase order
requests entered by employees. Purchase orders over $1,000 need manager approval. In
order to automate this process, you can impose a ruleset on the database that automatically delivers
requests over $1,000 to a reporting table pending manager approval. Other business rules that may
apply in this example are: prices must be numeric values (data type should be integer); or for a
request to exist, a reason is mandatory (table field may not be null).
In order to fulfill this business requirement, there are three rules at play in this system:
1. Order requests under $1,000 are automatically delivered to the approved product order requests
table
2. Requests over $1,000 are automatically delivered to the requests pending approval table
3. Approved requests are automatically delivered to the approved product order requests table
These rules inherently affect the shape of this database system to cater to the needs of this
particular organization.
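As a rough illustration, the three rules above could be enforced by a small routing step in the pipeline. The sketch below uses hypothetical table names and represents each purchase order request as a Python dictionary; it is one way the logic might look, not a prescribed implementation.

```python
APPROVAL_THRESHOLD = 1_000

def route_request(request: dict) -> str:
    """Return the name of the table a purchase order request should be delivered to."""
    # Business rule: for a request to exist, a reason is mandatory.
    if not request.get("reason"):
        raise ValueError("Request rejected: a reason is mandatory")

    # Business rule: prices must be numeric values.
    amount = float(request["amount"])

    # Rules 1 and 3: requests under the threshold, or already-approved requests,
    # go straight to the approved product order requests table.
    if amount < APPROVAL_THRESHOLD or request.get("approved"):
        return "approved_product_order_requests"

    # Rule 2: larger requests wait in the pending-approval table.
    return "requests_pending_approval"

# Example: a $2,500 request without approval lands in the pending table.
print(route_request({"amount": 2500, "reason": "New laptops", "approved": False}))
```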
Verifying business rules
Once the business rules have been implemented, it’s important to continue to verify that they are
functioning correctly and that data being imported into the target systems follows these rules. These
checks are important because they test that the system is doing the job it needs to, which in this
case is delivering product order requests that need approval to the right stakeholders.
Key takeaways
Business rules determine what data is collected and stored, how relationships are defined, what kind
of information the database provides, and the security of the data. These rules heavily influence how
a database is designed and how it functions after it has been set up. Understanding business rules
and why they are important is useful as a BI professional because this can help you understand how
existing database systems are functioning, design new systems according to business needs, and
maintain them to be useful in the future.
Your database systems are a key part of your ETL pipeline; they include where the data in your
pipeline comes from and where it goes. The ETL pipeline is itself a user, making requests that the
database has to fulfill while managing the load of other users and transactions. So database
performance is not just key to making sure the database itself can manage your organization's
needs; it's also important for the automated BI tools you set up to interact with the database.
These general performance tests are really important; they're how you know your database can
handle data requests for your organization without any problems! But when it comes to database
performance testing while considering your ETL process, there is another important check you
should make: testing the table, column, and row counts, and the query execution plan.
Testing the row and table counts allows you to make sure that the data count matches between the
target and source databases. If there are any mismatches, that could mean that there is a potential
bug within the ETL system. A bug in the system could cause crashes or errors in the data, so
checking the number of tables, columns, and rows of the data in the destination database against
the source data can be a useful way to prevent that.
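Here is one way such a count check might look, assuming both databases are reachable through Python's built-in sqlite3 module and that the list of tables to compare comes from a trusted configuration. The function is a sketch of the idea, not a standard testing API.

```python
import sqlite3

def compare_row_counts(source: sqlite3.Connection,
                       target: sqlite3.Connection,
                       tables: list[str]) -> dict[str, tuple[int, int]]:
    """Return tables whose row counts differ between the source and target databases."""
    mismatches = {}
    for table in tables:  # table names come from a trusted, predefined list
        src_count = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        tgt_count = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if src_count != tgt_count:
            # A mismatch suggests a potential bug somewhere in the ETL system.
            mismatches[table] = (src_count, tgt_count)
    return mismatches
```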
Key takeaways
As a BI professional, you need to know that your database can meet your organization’s needs.
Performance testing is a key part of the process. Not only is performance testing useful during
database building itself, but it’s also important for ensuring that your pipelines are working properly
as well. Remembering to include performance testing as a way to check your pipelines will help you
maintain the automated processes that make data accessible to users!
Scenario
Arsha, a Business Intelligence Analyst at a telecommunications company, built a data pipeline that
merges data from six sources into a single database. While building her pipeline, she incorporated
several defensive checks that ensured that the data was moved and transformed properly.
Among the six source systems were:
1. Customer details
2. Mobile contracts
5. Billing
6. Accounting
All of these datasets had to be harmonized and merged into one target system for business
intelligence analytics. This process required several layers of data harmonization, validation,
reconciliation, and error handling.
Pipeline layers
Pipelines can have many different stages of processing. These stages, or layers, help ensure that
the data is collected, aggregated, transformed, and staged in the most effective and efficient way.
For example, it’s important to make sure you have all the data you need in one place before you
start cleaning it to ensure that you don’t miss anything. There are usually four layers to this process:
staging, harmonization, validation, and reconciliation. After these four layers, the data is brought into
its target database and an error handling report summarizes each step of the process.
Staging layer
First, the original data is brought from the source systems and stored in the staging layer. In this
layer, Arsha ran the following defensive checks:
Compared rows to identify if extra records were created or records were lost
Arsha moved the mismatched records to the error handling report. She included each unconverted
source record, the date and time of its first processing, its last retry date and time, the layer where
the error happened, and a message describing the error. By collecting these records, Arsha was
able to find and fix the origin of the problems. She marked all of the records that moved to the next
layer as “processed.”
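The structure of such an error-handling entry might look something like the sketch below; the field names are hypothetical stand-ins for the details Arsha tracked.

```python
from datetime import datetime

def make_error_record(source_record: dict, layer: str, message: str) -> dict:
    """Build an error-handling entry with the fields described above (illustrative structure)."""
    now = datetime.now()
    return {
        "source_record": source_record,   # the unconverted source record
        "first_processed_at": now,        # date and time of its first processing
        "last_retry_at": now,             # updated on each retry
        "layer": layer,                    # the layer where the error happened
        "error_message": message,          # a message describing the error
    }
```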
Harmonization layer
The harmonization layer is where data normalization routines and record enrichment are
performed. This ensures that data formatting is consistent across all the sources. To harmonize the
data, Arsha ran the following defensive checks:
Split date values to store the year, month, and day in separate columns
Applied conversion and priority rules from the source systems
When a record couldn’t be harmonized, she moved it to Error Handling. She marked all of the
records that moved to the next layer as “processed.”
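For instance, the date-splitting routine could resemble the following pandas sketch, with a hypothetical contract_start column standing in for the real source fields.

```python
import pandas as pd

# Split a date column into separate year, month, and day columns.
records = pd.DataFrame({"contract_start": ["2023-01-15", "2023-06-02"]})

dates = pd.to_datetime(records["contract_start"])
records["start_year"] = dates.dt.year
records["start_month"] = dates.dt.month
records["start_day"] = dates.dt.day
```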
Validations layer
The validations layer is where business rules are validated. As a reminder, a business rule
is a statement that creates a restriction on specific parts of a database. These rules are developed
according to the way an organization uses data. Arsha ran the following defensive checks:
Ensured that values in the “department” column were not null, since “department” is a crucial
dimension
Ensured that values in the “service type” column were within the authorized values to be processed
Again, when a record failed validation, she moved it to error handling. She marked all the
records that moved to the next layer as “processed.”
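Those two validation rules might be expressed as a simple check like the sketch below, where the column names and the set of authorized service types are hypothetical.

```python
import pandas as pd

AUTHORIZED_SERVICE_TYPES = {"mobile", "broadband", "tv"}  # hypothetical authorized values

def validate(records: pd.DataFrame) -> pd.Series:
    """Return a boolean mask marking rows that pass both business-rule checks."""
    # "department" is a crucial dimension, so it must not be null.
    department_ok = records["department"].notna()
    # "service type" must be within the authorized values to be processed.
    service_ok = records["service_type"].isin(AUTHORIZED_SERVICE_TYPES)
    return department_ok & service_ok
```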
Reconciliation layer
The reconciliation layer is where duplicate or illegitimate records are found. Here, Arsha ran
defensive checks to find the following types of records:
Slow-changing dimensions
Historic records
Aggregations
As with the previous layers, Arsha moved the records that didn't pass the reconciliation rules to Error
Handling. After this round of defensive checks, she brought the processed records into the BI and
Analytics database (OLAP).
Completeness: An element of quality testing used to confirm that data contains all desired
components or measures
Conformity: An element of quality testing used to confirm that data fits the required destination
format
Consistency: An element of quality testing used to confirm that data is compatible and in
agreement across all systems
Data dictionary: A collection of information that describes the content, format, and structure of
data objects within a database, as well as their relationships
Data lineage: The process of identifying the origin of data, where it has moved throughout the
system, and how it has transformed over time
Data mapping: The process of matching fields from one data source to another
Integrity: An element of quality testing used to confirm that data is accurate, complete, consistent,
and trustworthy throughout its life cycle
Quality testing: The process of checking data for defects in order to prevent system failures; it
involves the seven validation elements of completeness, consistency, conformity, accuracy,
redundancy, integrity, and timeliness
Redundancy: An element of quality testing used to confirm that no more data than necessary is
moved, transformed, or stored
Schema validation: A process to ensure that the source system data schema matches the
target database data schema
B
Business intelligence (BI): Automating processes and information channels in order to
transform relevant data into actionable insights that are easily available to decision-makers
Business intelligence monitoring: Building and using hardware and software tools to easily
and rapidly analyze data and enable stakeholders to make impactful business decisions
Business intelligence stages: The sequence of stages that determine both BI business value
and organizational data maturity, which are capture, analyze, and monitor
Business intelligence strategy: The management of the people, processes, and tools used
in the business intelligence process
C
Columnar database: A database organized by columns instead of rows
Combined systems: Database systems that store and analyze data in the same place
Contention: When two or more components attempt to use a single resource in a conflicting way
D
Data analysts: People who collect, transform, and organize data
Data availability: The degree or extent to which timely and relevant information is readily
accessible and able to be put to use
Data governance professionals: People who are responsible for the formal management of
an organization’s data assets
Data lake: A database system that stores large amounts of raw data in its original format until it’s
needed
Data mart: A subject-oriented database that can be a subset of a larger data warehouse
Data maturity: The extent to which an organization is able to effectively use its data in order to
extract actionable insights
Data model: A tool for organizing data elements and how they relate to one another
Data partitioning: The process of dividing a database into distinct, logical parts in order to
improve query processing and increase manageability
Data pipeline: A series of processes that transports data from different sources to their final
destination for storage and analysis
Data visibility: The degree or extent to which information can be identified, monitored, and
integrated from disparate internal and external sources
Data warehouse: A specific type of database that consolidates data from multiple source
systems for data consistency, accuracy, and efficient access
Data warehousing specialists: People who develop processes and procedures to effectively
store and organize data
Database migration: Moving data from one source platform to another target database
Deliverable: Any product, service, or result that must be achieved in order to complete a project
Developer: A person who uses programming languages to create, execute, test, and troubleshoot
software applications
Dimension (data modeling): A piece of information that provides more detail and context
regarding a fact
Dimension table: The table where the attributes of the dimensions of a fact are stored
Design pattern: A solution that uses relevant measures and facts to create a model in support of
business needs
Dimensional model: A type of relational model that has been optimized to quickly retrieve data
from a data warehouse
E
ELT (extract, load, and transform): A type of data pipeline that enables data to be gathered
from data lakes, loaded into a unified destination system, and transformed into a useful format
ETL (extract, transform, and load): A type of data pipeline that enables data to be gathered
from source systems, converted into a useful format, and brought into a data warehouse or other
unified destination system
F
Fact: In a dimensional model, a measurement or metric
Fact table: A table that contains measurements or metrics related to a particular event
Foreign key: A field within a database table that is a primary key in another table (Refer to
primary key)
Fragmented data: Data that is broken up into many pieces that are not stored together, often as
a result of using the data frequently or creating, deleting, or modifying files
G
Google DataFlow: A serverless data-processing service that reads data from the source,
transforms it, and writes it in the destination location
I
Index: An organizational tag used to quickly locate data within a database system
Information technology professionals: People who test, install, repair, upgrade, and
maintain hardware and software solutions
Iteration: Repeating a procedure over and over again in order to keep getting closer to the
desired result
K
Key performance indicator (KPI): A quantifiable value, closely linked to business strategy,
which is used to track progress toward a goal
L
Logical data modeling: Representing different tables in the physical data model
M
Metric: A single, quantifiable data point that is used to evaluate performance
O
Object-oriented programming language: A programming language modeled around data
objects
OLAP (Online Analytical Processing) system: A tool that has been optimized for analysis
in addition to processing and can analyze data from multiple databases
OLTP (Online Transaction Processing) database: A type of database that has been
optimized for data processing instead of analysis
Optimization: Maximizing the speed and efficiency with which data is retrieved in order to ensure
high levels of database performance
P
Portfolio: A collection of materials that can be shared with potential employers
Primary key: An identifier in a database that references a column or a group of columns in which
each row uniquely identifies each record in the table (Refer to foreign key)
Project manager: A person who handles a project’s day-to-day steps, scope, schedule, budget,
and resources
Project sponsor: A person who has overall accountability for a project and establishes the
criteria for its success
Q
Query plan: A description of the steps a database system takes in order to execute a query
R
Resources: The hardware and software tools available for use in a database system
Response time: The time it takes for a database to complete a user request
Single-homed database: Database where all of the data is stored in the same physical
location
Snowflake schema: An extension of a star schema with additional dimensions and, often,
subdimensions
Star schema: A schema consisting of one fact table that references any number of dimension
tables
Systems analyst: A person who identifies ways to design, implement, and advance information
systems in order to ensure that they help make it possible to achieve business goals
Systems software developer: A person who develops applications and programs for the
backend processing systems used in organizations
T
Tactic: A method used to enable an accomplishment
Target table: The predetermined location where pipeline data is sent in order to be acted on
Throughput: The overall capability of the database’s hardware and software to process requests
Transferable skill: A capability or proficiency that can be applied from one job to another
V
Vanity metric: Data points that are intended to impress others, but are not indicative of actual
performance and, therefore, cannot reveal any meaningful business insights
W
Workload: The combination of transactions, queries, data warehousing analysis, and system
commands being processed by the database system at any given time
So far in this program, you have learned a lot about available business intelligence tools and how
you can use them as a BI professional. These tools will help you to monitor incoming data, generate
visualizations and dashboards, and empower stakeholders with access to reports. This helps
stakeholders make informed decisions.
These tools also have limitations. They may not be able to process complex demands fast enough
or generate complicated visualizations with a lot of metrics. In this reading, you’ll have an opportunity
to review strengths and limitations of business intelligence tools.
Tool performance
Coming up, you are going to learn more about what affects tool performance. But for now, there are
three elements you should keep in mind:
The scope of a project: How much time needs to be represented in the dashboard? The longer
the timeline, the more data needs to be processed and presented.
The complexity of the metrics: How many key performance indicators need to be captured
by the dashboard? The more complex your metrics, the more processing speed is affected.
The processing speed of your tool: How fast can your tools actually respond to requests?
The more requests, the more burden is placed on the system, which can slow down response times.
Comparing common tools
Tool: Looker Studio
Strengths: Can be connected with most databases and big data platforms; intuitive and simple to use; easily connects to other Google tools.
Limitations: Long loading times for larger dashboards; not as flexible as other tools; requires additional tools for reading data.
Compare scope in different contexts
In a previous video, you were introduced to the idea of scope as it relates to dashboard design. You
may have also encountered the word “scope” in terms of project scope. In the business intelligence
world, you might find the word scope being used in a variety of contexts. And, as you’ll recall from
earlier discussions of context, understanding these contexts is key. In this reading, you’ll get a side-
by-side comparison of project scope and dashboard scope and at what stage in a project you will
likely encounter these terms. This will help as you encounter scope in different contexts as a BI
professional so you know what the expectations are in every situation.
Project scope: Refers to the overall project goals, resources, deliverables, deadlines, collaborators, and stakeholders. It is determined by team leadership, including project sponsors and managers, and outlined at the very beginning of a project to determine the overarching aspects of the project. It involves working with key sponsors and stakeholders to better understand and align on the entire project and its goals.

Dashboard scope: Refers to the breadth of what a dashboard is tracking, including the amount of time and how many metrics it includes. It is determined by BI teams as they consider project and user requirements, and outlined as part of the dashboard creation process based on the specific reporting needs. It involves choosing KPIs, how much time should be represented, and how to make important data available and understandable to decision makers through the dashboard.
Key takeaways
Often, as a BI professional, you will encounter language that means different things in different
contexts. By paying close attention, asking questions, and thinking critically, you can ensure that you
and your team stay on the same page. In this case, the difference between project scope and
dashboard scope is useful to understand as you communicate with stakeholders about their
expectations with the dashboard specifically, and not the entire project.
Pre-aggregating: This is the process of performing calculations on data while it is still in the
database. Pre-aggregating data will transform data into a state that’s closer to what you ultimately
need because some necessary calculations will happen before the data is sent to the data
visualization tool. The trade-off is that your pipeline will involve more steps and your dataset
uploaded into the visualization tool will be less flexible, but your users will get the information they
need more quickly.
Using JOINs: JOINs are used to combine rows from two or more tables based on a related
column. This basically merges tables together before they’re ever used in the dashboard. This can
save a lot of processing load in the actual dashboard. However, if you are trying to join a full table, it
can be more of a burden to the system. This is caused by the dimensionality of the tables. For
example, joining a one million row table with a 100 million row table will most likely generate a lot of
overhead every time the dashboard is updated. So it's important to think carefully about how you use
JOINs to reduce processing load!
Filtering: Filtering is the process of showing only the data that meets a specified criteria while
hiding the rest. Filtering the data early in your dashboard’s processing means that it doesn’t have to
sort through data that isn't actually going to be used. The tradeoff is that less data
is available for your users to view on their own.
Linking to external locations: In cases where you have data in your dashboard that you can
provide context for outside of the dashboard and which can help cut down on the processing load,
you can link out to that location for users to explore on their own.
Avoiding user-defined functions: Users making requests of your dashboard can add a lot of
load to the processing work it’s doing. Consider the kinds of questions that users might have when
designing the dashboard so that you can address them without the users themselves having to input
functions repeatedly.
Deciding between data views and tables: Tables contain actual data. Data views are the
result of a stored data query that preserves business logic and can be queried like a database. Data
views often require much less processing load because they don’t contain actual data, just a view of
the data. This makes them less flexible, so you’ll want to consider how interactive you need the data
in your dashboard to be.
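To show how a couple of these strategies fit together, here is a small sketch that filters and pre-aggregates data in the pipeline before it ever reaches the visualization tool. The file names and columns are hypothetical, and pandas stands in for whatever transformation layer your pipeline actually uses.

```python
import pandas as pd

purchases = pd.read_csv("purchases.csv", parse_dates=["order_date"])  # hypothetical source file

# Filtering early: keep only the window of time the dashboard actually reports on.
recent = purchases[purchases["order_date"] >= "2024-01-01"]

# Pre-aggregating: calculate monthly totals per region in the pipeline, so the
# visualization tool only receives the smaller, summarized dataset.
monthly_totals = (
    recent
    .groupby(["region", pd.Grouper(key="order_date", freq="MS")])["amount"]
    .sum()
    .reset_index()
)

monthly_totals.to_csv("dashboard_feed.csv", index=False)  # smaller upload for the dashboard
```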
Key takeaways
When you are considering dashboard design, you’ll have to consider processing speed and load and
decide how to best balance them to deliver the answers your stakeholders need as quickly as
possible. This can be challenging, but you can apply the strategies described in this reading to
reduce processing load and improve performance
Setting permissions
Tableau gives you the power to set permissions to control how users are interacting with your
dashboards and data sources; you can even use permissions to determine which users can access
which parts of a workbook. Tableau organizes permissions into projects and groups. Basically, this
means you can determine permissions depending on project needs, or by groups of users instead of
person-by-person.
You can also use permission settings to choose what metrics users can interact with, show or hide
different sheet tabs, or even add explanations of the data that can be seen by different users
depending on their specific needs.
To learn more about permissions and how to set them yourself in Tableau, you can check out the
Tableau Online Help article about permissions.
To learn more about user visibility settings and how to set them yourself in Tableau, you can check
out the Tableau Online Help article about managing user visibility.
To learn more about user filters and row-level restrictions and how to set them yourself in Tableau, you can check
out the Tableau Online Help article about user filters and row-level restrictions.
Pain points can change depending on the scale of the dashboard—the larger the scale, the more
additional context is required to make the data understandable for users. However, there are three
general obstacles you might encounter:
Poorly defined use cases: The ways a business intelligence tool is actually used and
implemented by the team are referred to as “use cases.” When designing a dashboard that includes
live-monitoring, it’s important to establish how the different views will be used. For example, if you
only include one “executive view” with no way to drill down into specific information different users
might need, it leaves a lot of the interpreting work to users who may not understand or even need to
understand all of the data.
Isolated snapshots: Snapshots of the latest information can be useful for reports, but if there’s
no way to track the data’s evolution, then these snapshots have a pretty limited utility. Building in
tracking for users to explore will help them understand the snapshots better. Basically, tracking
means including insights about how the data is changing over time.
Lack of comparisons: When creating a dashboard, implementing comparisons can help users
understand whether the visualizations being presented indicate good or bad performance.
Comparisons place KPIs side-by-side in order to easily examine how similar or different they are.
Similar to adding more context to snapshots, adding comparisons is a fast way to ensure users
understand why the data in the dashboard is useful.
Key takeaways
In upcoming activities, you are going to work with stakeholders to create a dashboard designed to
monitor incoming data and provide as close to real-time updates as possible. When designing
dashboards, it’s important to keep the user in mind. Identifying potential pain points they might
encounter and addressing those problems in your design phase is a great way to guide your process
and generate more useful, accessible, and long-lasting solutions for your team.
Who is my audience?
If your intended audience is primarily high-level executives, your presentation should be kept at a
high level. Executives tend to focus on main takeaways that encourage improving, correcting, or
inventing things. Keep your presentation brief and spend most of your time on results and
recommendations, or provide a walkthrough of how they can best use the tools you’ve created. It
can be useful to create an executive summary slide that synthesizes the whole presentation in one
slide.
If your intended audience is composed of stakeholders and managers, they might have more time to
learn about new processes and how you developed the right tools, and they may ask more technical questions.
Be prepared to provide more details with this audience!
If your intended audience is composed of analysts and individual contributors, you will have the most
freedom, and perhaps the most time, to go into more detail about the data, processes, and
results.
Support all members of your audience by making your content accessible for audience members
with diverse abilities, experiences, and backgrounds.
If the goal of your presentation is to request or recommend something at the end, like a sales pitch,
you can have each slide work toward the recommendations at the end.
If the goal of your presentation is to focus on the results of your analysis, each slide can help mark
the path to the results. Be sure to include plenty of views of the data analysis steps to demonstrate
the path you took with the data.
If the goal of your presentation is to provide a report on the data analysis, your slides should clearly
summarize your data and key findings. In this case, it is alright to simply offer the data on its own.
If the goal of your presentation is to showcase how to use new business intelligence tools, your
slides should clearly showcase what your audience needs to understand to start using the tool
themselves.
Knowing exactly what you will say throughout your presentation creates a natural flow to your story,
and helps avoid awkward pauses between topics. Slides that summarize data can also be repetitive;
if you prepare a variety of interesting talking points about the data, you can keep your audience alert
and paying attention.
Being aware of your timing. This applies to the total number of slides and the time you spend on
each slide. A good starting point is to spend 1-2 minutes on summary slides and 3-5 minutes on
slides that generate discussion.
Presenting your data efficiently. Make sure that every slide tells a unique and important part of your
data story. If a slide isn’t that unique, you might think about combining the information on that slide
with another slide.
Saving enough time for questions at the end or allowing enough time to answer questions
throughout your presentation.
Introductions (4 minutes)
Questions (5 minutes)
Service center consolidation is an important cost savings initiative. The aim of this project is to
monitor the impact of service center consolidation on customer response times for continued
improvement.
Slides typically have a logical order (beginning, middle, and end) to fully build the story.
Each slide should logically introduce the slide that follows it. Visual cues from the slides or verbal
cues from your talking points should let the audience know when you will go on to the next slide.
Remember not to use too much text on the slides. When in doubt, refer back to the second tip on
preparing talking points and limiting the text on slides.
The high-level information that people read from the slides shouldn’t be the same as the information
you provide in your talking points. There should be a nice balance between the two to tell a good
story. You don’t want to simply read or say the words on the slides.
For extra visuals on the slides, use animations. For example, you can:
Only display the visual that is relevant to what you are talking about (fade out non-relevant visuals).
Use arrows or callouts to point to a specific area of a visual that you are using.
Recall our example of a purpose statement: Service center consolidation is an important cost
savings initiative. The aim of this project is to monitor the impact of service center consolidation on
customer response times for continued improvement.
Suppose the monitoring reports showed that service center consolidation negatively impacted
customer response times. A call to action might be to examine if processes need to change to bring
customer response times back to what they were before the consolidation.
Predictive analytics
Predictive analytics is a branch of data analytics that uses historical data to identify patterns to
forecast future outcomes that can guide decision-making. The goal of predictive analytics is to
anticipate upcoming events and preemptively make decisions according to those predictions. The
predictions can focus on any point in the future—from weekly measurements to revenue predictions
for the next year.
By feeding historical data into a predictive model, stakeholders can make decisions that aren’t just
based on what has already happened in the past—they can make decisions that take into account
likely future events, too!
One example would be a hotel using predictive analytics to determine staffing needs for major
holidays. In the hospitality industry, there are many variables that might affect staffing decisions,
and being able to predict needs and schedule employees appropriately is key. So, a hotel might use a
predictive model to consider all of these factors to inform staffing decisions.
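As a rough sketch of that idea, a very simple predictive model could relate past holiday occupancy to the number of staff scheduled. The example below uses scikit-learn's linear regression with invented numbers purely for illustration; a real model would draw on many more variables.

```python
from sklearn.linear_model import LinearRegression

# Invented historical data: rooms booked on past holidays and staff scheduled on those days.
occupancy_history = [[120], [250], [310], [480], [520]]
staff_needed = [14, 22, 27, 39, 42]

model = LinearRegression().fit(occupancy_history, staff_needed)

predicted_staff = model.predict([[400]])[0]  # forecast for an expected 400 bookings
print(f"Estimated staff needed: {predicted_staff:.0f}")
```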
Another example could be a marketing team using predictive analysis to time their advertising
campaigns. Based on the successes of previous years, the marketing team can assess what trends
are likely to follow in the coming year and plan accordingly.
Presenting dashboards
As a BI professional, you might not be performing predictive analytics as part of your role. However,
the tools you build to monitor or update data might be helpful for data scientists on your team who
will perform this kind of analysis. By presenting dashboards effectively, you can properly
communicate to stakeholders or data scientists what the next step will be in the data pipeline, and
set them up to take the tools you create to the next level.
Key takeaways
BI professionals collaborate with a variety of different teams and experts to support the business
needs of their organization. Predictive analytics likely will not be a task you perform on the job, but
you may work with teams who do. Understanding the basics will help you consider their needs as
you design tools to support all of the teams who rely on your work!