Data Analyst
Vol. 01
Empowering Youth!
Course Name: Data Analyst
Candidate Eligibility: Diploma / Graduate
Course Duration (in hours): 650
Website: www.stlacad.tech
This student guide contains modules that will help you acquire the relevant knowledge and skills (both generic and domain-specific) for the ‘Data Analyst’ job role. Make sure you have understood and grasped the knowledge in each module before you move on to the next. Comprehensible diagrams and images from the world of work have been included to add visual appeal and to make the text lively and interactive for you. You can also try to create your own illustrations using your imagination or with the help of your trainer.
Let us now see what the sections in the modules have for you.
Section 1: Learning Outcomes
This section introduces you to the learning objectives and knowledge criteria covered in the module. It also tells you what you will learn through the various topics covered in the module.
Section 2: Relevant Knowledge
This section provides you with the knowledge needed to achieve the skill and proficiency to perform the tasks of a Data Analyst. The knowledge developed through the module will enable you to perform activities related to the job market. You should read through the textual information to develop an understanding of the various aspects of the module before you complete the exercise(s).
Section 3: Exercises
Each module has exercises, which you should practice on completion of the learning sessions of the module. You will perform these activities in the classroom, at home, or at the workplace. The activities included in this section will help you develop the knowledge, skills, and attitude that you need to become competent in performing tasks at the workplace. The activities should be done under the supervision of your trainer, who will guide you in completing the tasks and provide feedback to improve your performance.
The review questions included in this section will help you to check your progress. You must be able
to answer all the questions before you proceed to the next module.
A Data Analyst is a professional who can analyze data by applying various tools and techniques
and gathering the required insights.
The techniques and the tools used vary according to the organization or individual.
In brief, if you understand your business and have the capability to perform exploratory data analysis to gather the required information, then you are well placed to pursue a career in Data Analytics.
The next prime question that arises in our mind is: how do we analyze data for our greater good? Well, that is where the term ‘Data Analytics’ comes into the picture. In this course, you will get thorough insights into ‘What is Data Analytics?’, ‘How can you become a Data Analyst?’, ‘How much can you earn as a Data Analyst?’ and much more.
Helps businesses monitor, manage, and collect performance measures to improve decision-making across the organization.
Improves business operations.
Improves consumer engagement and corporate performance, and boosts revenue.
Helps make decisions based on verifiable, data-driven proof.
Gather Hidden Insights: Hidden insights are gathered from the data and analyzed according to the business requirements.
Generate Reports: Reports generated from the data are passed on to the respective teams and individuals, who take further action to grow the business and gain a competitive edge.
Perform Market Analysis: Market analysis must be performed to understand market sentiment and the strengths and weaknesses of competitors.
Improve Business Requirements: Data analysis helps align the business with consumers' expectations, requirements, and experience.
Now that you know the need for Data Analytics, let us have a quick look at what is Data Analytics.
They must have a basic understanding of statistics, a good sense of databases, the ability to create new views, and the perception to visualize the data. Data analytics can be regarded as the basic level of data science.
A Data Analyst delivers value to their companies by taking information about specific topics and
then interpreting, analyzing, and presenting findings in comprehensive reports. So, if you have
the capabilities like collecting data from various sources, analyzing the data, gathering hidden
insights, and generating reports, then you can become a Data Analyst.
A Data Analyst should also possess skills such as Statistics, Data Cleaning, Exploratory Data
Analysis, and Data Visualization. Also, if you know about Machine Learning, then that would make
you stand out from the crowd.
On average, a Data Analyst can expect a salary of ₹404,660 (IND) or $83,878 (US). As experts,
data analysts are often called on to use their skills and tools to provide competitive analysis and
identify trends within industries.
Python:
Python is an open-source, object-oriented programming language that is easy to read, write,
and maintain.
It provides various machine learning and visualization libraries such as Scikit-learn, TensorFlow,
Matplotlib, Pandas, Keras, etc.
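To see how these libraries fit together, here is a minimal sketch (the file name sales.csv and the column name revenue are illustrative assumptions, not part of any real dataset):
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")      # load a dataset into a Pandas DataFrame
print(df.describe())               # quick summary statistics
df["revenue"].plot(kind="hist")    # visualize one column's distribution with Matplotlib
plt.show()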
QlikView:
This tool offers in-memory data processing with the results delivered to the end-users quickly.
A BI data discovery product.
Rapidly develop and deliver interactive guided analytics applications and dashboards.
Ask and answer your own questions and follow your own paths to insight.
Offers in-memory data processing with the results delivered to the end-users quickly.
Also offers data association and data visualization with data being compressed to almost 10%
of its original size.
SAS: A programming language and environment for data manipulation and analytics.
RapidMiner: A powerful, integrated platform that can integrate with any data source type such
as Access, Excel, Microsoft SQL, Oracle, etc. Mostly used for predictive analytics, such as data
mining, text analytics, and machine learning.
KNIME: Konstanz Information Miner (KNIME) is an open-source data analytics platform to
analyze and model data for reporting and integration through its modular data pipeline concept.
OpenRefine: OpenRefine (formerly Google Refine) is data cleaning software that helps you clean up messy data and parse data from websites for analysis.
Apache Spark: One of the most widely used large-scale data processing engines, used for data pipelines and machine learning model development.
1.3 Dealing with Different Types of Data
The different types of data analytics for a company depend on its stage of development. Most
companies are likely already using some sort of analytics, but it typically only affords insights to
make reactive, not proactive, business decisions.
More and more, businesses are adopting sophisticated data analytics solutions with machine
learning capabilities to make better business decisions and help determine market trends and
opportunities.
Organizations that do not start to use data analytics with proactive, future-casting capabilities may
find business performance lacking because they cannot uncover hidden patterns and gain other
insights.
Typically, four main types of data analytics are used. These are:
1. Predictive Data Analytics
Predictive analytics may be the most commonly used category of data analytics.
Businesses use predictive analytics to identify trends, correlations, and causation.
The category can be further broken down into predictive modeling and statistical modeling;
however, it is important to know that the two go hand in hand.
For example, an advertising campaign for t-shirts on Facebook could apply predictive analytics to
determine how closely the conversion rate correlates with a target audience’s geographic area,
income bracket, and interests. From there, predictive modeling could be used to analyze the
statistics for two (or more) target audiences and provide possible revenue values for each
demographic.
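As an illustration of the idea, here is a minimal Python sketch that measures how strongly conversion rate correlates with income bracket (the figures are invented for illustration):
import pandas as pd

# Hypothetical campaign data: one row per audience segment
data = pd.DataFrame({
    "income_bracket": [20, 35, 50, 65, 80],       # thousands of dollars
    "conversion_rate": [0.8, 1.1, 1.9, 2.4, 3.0]  # percent
})
# Pearson correlation between income bracket and conversion rate
print(data["income_bracket"].corr(data["conversion_rate"]))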
Ad Hoc Reports: These are designed by you and usually are not scheduled. They are
generated when there is a need to answer a specific business question. These reports are
useful for obtaining more in-depth information about a specific query.
It focuses on corporate social media profiles, examining the types of people who have liked your
page and other industry pages, as well as other engagement and demographic information.
Hyperspecificity helps give a more complete picture of your social media audience. Chances are
you would not need to view this type of report a second time (unless there’s a major change to
your audience).
Business Understanding:
Before solving any problem in the business domain, it needs to be understood properly. Business understanding forms a concrete base, which in turn leads to easier resolution of queries. We should be clear about the exact problem we are going to solve.
Data Requirements:
The analytical method chosen earlier indicates the necessary data content, formats, and sources to be gathered. During the data requirements process, one should find answers to questions like ‘what’, ‘where’, ‘when’, ‘why’, ‘how’, and ‘who’.
Data Collection:
Collected data can arrive in any random format. So, according to the approach chosen and the output to be obtained, the collected data should be validated. Thus, if required, one can gather more data or discard the irrelevant data.
Data Understanding:
Data understanding answers the question “Is the data collected representative of the problem to be solved?”. Descriptive statistics are calculated over the data to assess its content and quality. This step may lead to reverting to a previous step for correction.
Data Preparation:
Let us understand this concept with two analogies: washing freshly picked vegetables, and taking only the items you want onto your plate at a buffet. Washing vegetables indicates the removal of dirt, i.e., unwanted material, from the data; this is where noise removal is done. Taking only edible items onto the plate means that if we don't need specific data, we should not consider it for further processing. This whole step includes transformation, normalization, etc.
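A minimal sketch of this step in pandas (column names and values are illustrative): dropping incomplete rows is the "washing", selecting columns is the "plate", and min-max scaling is one common form of normalization:
import pandas as pd

df = pd.DataFrame({"age": [25, None, 40, 35],
                   "income": [30000, 45000, None, 60000]})
df = df.dropna()              # remove rows with missing values (noise removal)
df = df[["income"]].copy()    # keep only the column we need for further processing
# min-max normalization: rescale income into the range [0, 1]
df["income_norm"] = (df["income"] - df["income"].min()) / (df["income"].max() - df["income"].min())
print(df)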
Modelling:
Modelling decides whether the data prepared for processing is appropriate or requires more finishing and seasoning. This phase focuses on building predictive or descriptive models.
Evaluation:
Model evaluation is done during model development. It checks the quality of the model and whether it meets the business requirements. It undergoes a diagnostic measure phase (does the model work as intended, and where are modifications required?) and a statistical significance testing phase (which ensures proper data handling and interpretation).
Deployment:
Once the model is effectively evaluated, it is made ready for deployment in the business market. The deployment phase checks how well the model can withstand the external environment and perform better than others.
Going one level deeper, the following skills will help you carve out a niche as a data scientist:
Strong knowledge of Python, SAS, R, Scala
Hands-on experience in SQL database coding
Ability to work with unstructured data from various sources like video and social media
Understand multiple analytical functions
Knowledge of machine learning
Machine Learning
Machine learning can be defined as the practice of using algorithms to extract data, learn from it,
and then forecast future trends for that topic.
Traditional machine learning software performs statistical analysis and predictive analysis to spot patterns and uncover hidden insights in the observed data.
A good example of machine learning implementation is Facebook. Facebook’s machine learning
algorithms gather behavioural information for every user on the social platform. Based on one’s
past behaviour, the algorithm predicts interests and recommends articles and notifications on the
news feed.
Similarly, when Amazon recommends products, or when Netflix recommends movies based on
past behaviours, machine learning is at work.
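A minimal sketch of the same idea with scikit-learn (the behavioural features and labels are invented for illustration; real recommender systems are far more elaborate):
from sklearn.linear_model import LogisticRegression

# Hypothetical past behaviour: [articles_read, minutes_on_topic] -> clicked (1) or not (0)
X = [[1, 2], [3, 10], [5, 25], [0, 1], [6, 30], [2, 4]]
y = [0, 1, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)   # learn from past behaviour
print(model.predict([[4, 20]]))          # predict whether a similar user would click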
Transportation
Big data analytics finds huge applications in the transportation industry.
Governments of different countries use big data to control traffic, optimize route planning, build intelligent transport systems, and manage congestion.
Moreover, the private sector uses big data in revenue management, technological enhancements,
logistics and to gain a competitive advantage. Big data is improving user experiences, and the
massive adoption change has just begun.
ii) Remove the Noisy Data: Noisy data is data containing random error.
Methods to remove noise are:
Binning: Binning methods are applied by sorting values into buckets or bins. Smoothing is performed by consulting the neighbouring values. In smoothing by bin means, each value in a bin is replaced by the mean of the bin; in smoothing by bin medians, each bin value is replaced by the bin median; and in smoothing by bin boundaries, the minimum and maximum values in the bin are the bin boundaries, and each bin value is replaced by the closest boundary value. (A short sketch of smoothing by bin means follows this list.)
Identifying the Outliers
Resolving Inconsistencies
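Here is the sketch of smoothing by bin means mentioned above (the nine values are a classic textbook example; three equal-frequency bins of size three are assumed):
import numpy as np

prices = np.array([4, 8, 15, 21, 21, 24, 25, 28, 34])  # already sorted
bins = prices.reshape(3, 3)                  # three equal-frequency bins
smoothed = np.repeat(bins.mean(axis=1), 3)   # replace each value by its bin mean
print(smoothed)                              # [ 9.  9.  9. 22. 22. 22. 29. 29. 29.]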
Data Integration
When multiple heterogeneous data sources such as databases, data cubes or files are combined
for analysis, this process is called data integration.
This can help in improving the accuracy and speed of the data mining process.
Different databases use different naming conventions for variables, thereby causing redundancies in the combined data.
Additional data cleaning can be performed to remove these redundancies and inconsistencies from the data integration without affecting the reliability of the data.
Data integration can be performed using data migration tools such as Oracle Data Service Integrator, Microsoft SQL, etc.
Data Reduction
This technique is applied to obtain relevant data for analysis from the data collection. The reduced representation is much smaller in volume while maintaining the integrity of the data. Data reduction is performed using methods such as Naive Bayes, decision trees, neural networks, etc.
Some strategies of data reduction are:
Dimensionality Reduction: Reducing the number of attributes in the dataset.
Numerosity Reduction: Replacing the original data volume by smaller forms of data
representation.
Data Compression: Compressed representation of the original data.
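As a brief illustration of dimensionality reduction, one of the strategies above, here is a sketch using PCA from scikit-learn (the random data stands in for a real dataset):
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)    # 100 records with 10 attributes
pca = PCA(n_components=3)      # keep only 3 derived attributes
X_small = pca.fit_transform(X)
print(X_small.shape)           # (100, 3)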
Data Mining
Data Mining is a process to identify interesting patterns and knowledge from a large amount of
data.
In this step, intelligent methods are applied to extract data patterns.
The data is represented in the form of patterns and models are structured using classification and
clustering techniques.
Pattern Evaluation
This step involves identifying interesting patterns representing the knowledge based on
interestingness measures.
Data summarization and visualization methods are used to make the data understandable by the
user.
Knowledge Representation
Knowledge representation is a step where data visualization and knowledge representation tools
are used to represent the mined data.
Data is visualized in the form of reports, tables, etc.
Data Understanding:
This step collects all the data and populates it in the tool (if any tool is used).
The data is listed with its data source, its location, how it was acquired, and any issues encountered.
Data is visualized and queried to check its completeness.
Data Preparation:
This step involves selecting the appropriate data, cleaning, constructing attributes from data,
integrating data from multiple databases.
Modeling:
This step involves selecting a data mining technique (such as a decision tree), generating a test design for evaluating the selected model, building models from the dataset, and assessing the built models with experts to discuss the results.
Evaluation:
This step determines the degree to which the resulting model meets the business requirements.
Evaluation can be done by testing the model on real applications.
The model is reviewed for any mistakes or steps that should be repeated.
Deployment:
In this step, a deployment plan is made; a strategy to monitor and maintain the data mining model's results and check their usefulness is formed; final reports are made; and the whole process is reviewed to check for mistakes and see whether any step should be repeated.
For example, by examining the frequency distribution of different values for each column in a table, a
data analyst could gain insight into the type and use of each column. Cross-column analysis can be
used to expose embedded value dependencies; inter-table analysis allows the analyst to discover
overlapping value sets that represent foreign key relationships between entities.
It should be noted that data wrangling is a somewhat demanding and time-consuming operation, in terms of both computational capacity and human resources. Data wrangling takes up over half of what a data scientist does.
1. Preprocessing: the initial step that occurs right after acquiring the data.
2. Standardizing data into an understandable format. For example, you have a user profile events record, and you need to sort it by type of event and timestamp.
3. Consolidating data from various sources or data sets into a coherent whole. For example, you have an affiliate advertising network, and you need to gather performance statistics for the current stage of the marketing campaign.
4. Matching data with existing data sets. For example, you already have user data for a certain period and unite these sets into a more expansive one.
2. Systematic Sampling
In systematic sampling, every member of the population is given a number, as in simple random sampling.
However, instead of randomly generating numbers, the samples are chosen at regular intervals.
Example: The researcher assigns every member in the company database a number. Instead of
randomly generating numbers, a random starting point (say 5) is selected. From that number
onwards, the researcher selects every, say, 10th person on the list (5, 15, 25, and so on) until the
sample is obtained.
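A minimal sketch of systematic sampling in Python (the population size, starting point, and step mirror the example above):
import numpy as np

population = np.arange(1, 101)        # members numbered 1 to 100
start, step = 5, 10                   # random starting point 5, then every 10th member
sample = population[start - 1::step]  # 5, 15, 25, ..., 95
print(sample)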
3. Stratified Sampling
In stratified sampling, the population is subdivided into subgroups, called strata, based on some
characteristics (age, gender, income, etc.).
After forming a subgroup, you can then use random or systematic sampling to select a sample for
each subgroup.
This method allows you to draw more precise conclusions because it ensures that every subgroup
is properly represented.
Example: If a company has 500 male employees and 100 female employees, the researcher wants
to ensure that the sample reflects the gender as well. So, the population is divided into two
subgroups based on gender.
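A minimal sketch of stratified sampling in pandas (the 500/100 split mirrors the example; a 10% sample per stratum is assumed):
import pandas as pd

employees = pd.DataFrame({"id": range(600),
                          "gender": ["M"] * 500 + ["F"] * 100})
# sample 10% from each stratum so both genders stay proportionally represented
sample = employees.groupby("gender").sample(frac=0.1, random_state=1)
print(sample["gender"].value_counts())   # M: 50, F: 10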
4. Cluster Sampling
In cluster sampling, the population is divided into subgroups, but each subgroup has similar
characteristics to the whole sample.
Instead of selecting a sample from each subgroup, you randomly select an entire subgroup.
This method is helpful when dealing with large and diverse populations.
Example: A company has over a hundred offices in ten cities across the world, each of which has roughly the same number of employees in similar job roles. The researcher randomly selects 2 to 3 offices and uses them as the sample.
1. Convenience Sampling
In this sampling method, the researcher simply selects the individuals who are most easily accessible to them.
This is an easy way to gather data, but there is no way to tell whether the sample is representative of the entire population.
The only criterion involved is that people are available and willing to participate.
Example: The researcher stands outside a company and asks the employees coming in to answer
questions or complete a survey.
3. Purposive Sampling
In purposive sampling, the researcher uses their expertise and judgment to select a sample that
they think is the best fit.
It is often used when the population is very small and the researcher only wants to gain knowledge
about a specific phenomenon rather than make statistical inferences.
Example: The researcher wants to know about the experiences of disabled employees at a
company. So, the sample is purposefully selected from this population.
4. Snowball Sampling
In snowball sampling, the research participants recruit other participants for the study.
It is used when participants required for the research are hard to find.
It is called snowball sampling because like a snowball, it picks up more participants along the way
and gets larger and larger.
Example: The researcher wants to know about the experiences of homeless people in a city. Since
there is no detailed list of homeless people, a probability sample is not possible. The only way to
get the sample is to get in touch with one homeless person who will then put you in touch with other
homeless people in a particular area.
Data as a Service
Traditionally, data has been stored in data stores developed for access by particular applications. When SaaS (software as a service) became popular, DaaS was just beginning.
As with Software-as-a-Service applications, Data as a Service uses cloud technology to give users and applications on-demand access to information, regardless of where the users or applications may be.
Data as a Service is one of the current trends in big data analytics; it will make it simpler for analysts to obtain data for business review tasks and easier for areas throughout a business or industry to share data.
Predictive Analytics
Big data analytics has always been a fundamental approach for companies to gain a competitive edge and accomplish their aims.
They apply basic analytics tools to prepare big data and discover the causes of specific issues.
Predictive methods are implemented to examine current data and historical events in order to understand customers and recognize possible hazards and events for a corporation.
Predictive analysis in big data can predict what may occur in the future.
This strategy is extremely efficient at using analyzed, assembled data to predict customer response. It enables organizations to define the steps they need to take by identifying a customer's next move before the customer even makes it.
Quantum Computing
Processing a huge amount of data with current technology can take a lot of time. Quantum computers, by contrast, calculate the probability of an object's state or an event before it is measured, which means they can process more data than classical computers.
If we could process billions of data points in only a few minutes, we could reduce processing duration immensely, giving organizations the possibility to make timely decisions and attain more desired outcomes.
This could become possible with quantum computing. Using quantum computers to run functional and analytical research across several enterprises can make the industry more precise.
Hybrid Clouds
A hybrid cloud computing system uses an on-premises private cloud and a third-party public cloud, with orchestration between the two interfaces.
The hybrid cloud provides excellent flexibility and more data deployment options by moving workloads between private and public clouds. An organization must have a private cloud to gain adaptability with its desired public cloud.
For that, it has to develop a data center, including servers, storage, a LAN, and a load balancer. The organization has to deploy a virtualization layer/hypervisor to support the VMs and containers, and install a private cloud software layer.
This software layer allows instances to transfer data between the private and public clouds.
Dark Data
Dark data is data that a company does not use in any analytical system. It is gathered through various network operations but is not used to derive insights or make predictions.
Organizations might think this is not useful data because they are not getting any outcome from it, but it may prove to be among the most valuable things they hold.
As data grows day by day, the industry should understand that any unexplored data can be a security risk. The expansion in the amount of dark data can be seen as another trend.
XOps
The aim of XOps (data, ML, model, platform ops) is to achieve efficiencies and economies of scale. XOps is achieved by implementing DevOps best practices, ensuring efficiency, reusability, and repeatability while reducing duplication of technology and processes and enabling automation.
These innovations enable prototypes to be scaled, with flexible design and agile orchestration of governed systems.
Section 3: Exercises
Exercise 1: Sample Data
House_number
Husband_Age
Wife_Age
Husband_Income
Wife_Income
Number_Of_Bedrooms
Electricity_Units
Gas
Number_Of_Children
Internet_Connection
Mode
House_Owned/Rented
Speaking_Language
Decade_Of_House_Built
Problem Statement:
To find out the following (a starter sketch follows this list):
Know the minimum, maximum and average Age of the Wife
Know the median, quantile, variance and standard deviation of Husband Income
Find the frequency of the Number of Children and Number of Bedrooms
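A starter sketch in Python for this exercise (the file name sample_data.csv is hypothetical; it is assumed to contain the columns listed above):
import pandas as pd

df = pd.read_csv("sample_data.csv")
print(df["Wife_Age"].min(), df["Wife_Age"].max(), df["Wife_Age"].mean())
print(df["Husband_Income"].median())
print(df["Husband_Income"].quantile([0.25, 0.5, 0.75]))
print(df["Husband_Income"].var(), df["Husband_Income"].std())
print(df["Number_Of_Children"].value_counts())
print(df["Number_Of_Bedrooms"].value_counts())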
Business analytics means measuring performance to make improvements to the bottom line of the business.
Types of Business Analytics
Let’s delve into the types of Business Analytics. Primarily there are 4 types.
Descriptive Analytics
The first generation of business analytics was based on studying historic data and drawing
inferences about the performance of the business.
This is exactly what descriptive analytics does with the available data.
Summarizing data into a few key metrics gives a reasonable understanding of how well (or not) a business is doing.
Marketing
Business analytics plays an important role in determining the effectiveness of marketing campaigns by generating insights into which kind of campaign is most effective and most penetrative in the market, and how much should be invested in each type of campaign to gain maximum benefit and cut losses.
E Retailing
Today the e-retailing business is expanding like never before, with more and more people preferring to order online rather than visit brick-and-mortar stores, a shift the COVID pandemic has accelerated further.
There are many players in the market, and it becomes necessary for the e-retailer to keep a hawk's eye on the inventories to maintain with suppliers and keep pricing competitive while cutting losses.
Wrike
Wrike is a project management tool that runs in the cloud.
It aids in establishing deadlines, scheduling, and resource allocation.
Business analysts can update and assign tasks from anywhere using the Android and iOS apps.
Oribi
Custom reports, automated event collecting, visitor journey, and email capturing are just a few of
Oribi’s features.
It is appropriate for all types of businesses.
With only a few clicks, a business analyst can simply design marketing funnels and track where
visitors are leaving.
It has event monitoring capabilities and allows you to define conversion targets without writing any
code.
Conditional formatting in Excel enables you to highlight cells with a certain color, depending
on the cell's value.
We can perform the following operations:
Highlight Cells Rules
Clear Rules
Top/Bottom
Conditional Formatting with Formulas
Finding Duplicate and Triplicate Values
Finding Duplicate Rows
Data Bars and Colour Scales
Icon Sets
Manage Rules and Conflicting Rules
You can also use this category (see step 3) to highlight cells that are less than a value, between two
values, equal to a value, cells that contain specific text, dates (today, last week, next month,
etc.), duplicates or unique values.
5. Click OK.
Result: Excel calculates the average (42.5) and formats the cells that are above this average.
You can also use this category (see step 3) to highlight the top n items, the top n percent, the
bottom n items, the bottom n percent or cells that are below average.
Explanation: Always write the formula for the upper-left cell in the selected range. Excel automatically
copies the formula to the other cells. Thus, cell A2 contains the formula =ISODD(A2), cell A3 contains
the formula =ISODD(A3), etc.
Explanation: We fixed the reference to column C by placing a $ symbol in front of the column letter
($C2). As a result, cell B2, C2 and cell D2 also contain the formula =$C2="USA", cell A3, B3, C3 and
D3 contain the formula =$C3="USA", etc.
Result
Explanation:
=COUNTIF($A$1:$C$10,A1) counts the number of names in the range A1:C10 that are equal to
the name in cell A1.
If COUNTIF($A$1:$C$10,A1) = 3, Excel formats cell A1. Always write the formula for the upper-
left cell in the selected range (A1:C10).
Excel automatically copies the formula to the other cells. Thus, cell A2 contains the formula
=COUNTIF($A$1:$C$10,A2)=3, cell A3 =COUNTIF($A$1:$C$10,A3)=3, etc.
Notice how we created an absolute reference ($A$1:$C$10) to fix this reference.
Note: you can use any formula you like. For example, use this formula
=COUNTIF($A$1:$C$10,A1)>3 to highlight names that occur more than 3 times.
The named range Animals refers to the range A1:A10, the named range Continents refers to the
range B1:B10 and the named range Countries refers to the range
C1:C10. =COUNTIFS(Animals,$A1,Continents,$B1,Countries,$C1) counts the number of rows
based on multiple criteria (Leopard, Africa, Zambia).
Explanation:
If COUNTIFS(Animals,$A1,Continents,$B1,Countries,$C1) > 1, in other words, if there are
multiple (Leopard, Africa, Zambia) rows, Excel formats cell A1.
Always write the formula for the upper-left cell in the selected range (A1:C10). Excel automatically
copies the formula to the other cells.
We fixed the reference to each column by placing a $ symbol in front of the column letter ($A1,
$B1 and $C1). As a result, cell A1, B1 and C1 contain the same formula, cell A2, B2 and C2 contain
the formula =COUNTIFS(Animals,$A2,Continents,$B2,Countries,$C2)>1, etc.
In the example below, Excel removes all identical rows (blue) except for the first identical row found
(yellow).
Result:
To update the rules, go to Conditional Formatting > Manage Rules > Edit rules. You can change
the rules according to your preferences.
Note: because we selected cell A1, Excel shows the rule applied to the range A1:A10.
4. From the drop-down list, change Current Selection to This Worksheet, to view all conditional
formatting rules in this worksheet.
Note: click New Rule, Edit Rule and Delete Rule to create, edit and delete rules.
1. The value 95 is higher than 80 but is also the highest value (Top 1). The formats (yellow fill vs
green fill and yellow text color vs green text color) conflict. A higher rule always wins. As a result, the
value 95 is colored yellow.
Result:
Note: only use the Stop If True check boxes for backwards compatibility with earlier versions of
Microsoft Excel.
Drag Fields
The PivotTable Fields pane appears. To get the total amount exported of each product, drag the
following fields to the different areas.
1. Product field to the Rows area.
2. Amount field to the Values area.
3. Country field to the Filters area.
Below you can find the pivot table. Bananas are our main export product. That's how easy pivot tables can
be!
Result:
Note: you can use the standard filter (triangle next to Row Labels) to only show the amounts of
specific products.
3. Choose the type of calculation you want to use. For example, click Count.
4. Click OK.
Result: 16 out of the 28 orders to France were 'Apple' orders.
To easily compare these numbers, create a pivot chart and apply a filter. Maybe this is one step too
far for you at this stage, but it shows you one of the many other powerful pivot table features Excel
has to offer.
4. Click United States to find out which products we export the most to the United States.
10. Use the second slicer. Click the Multi-Select button to select multiple products.
Note: instead of using the Multi-Select button, hold down CTRL to select multiple items.
Conclusion: The total amount of apples exported to Canada equals $24,867 (6 orders) and the total
amount of oranges exported to Canada equals $19,929 (3 orders).
17. Click the icon in the upper-right corner of a slicer to clear the filter.
Note: Any changes you make to the pivot chart are immediately reflected in the pivot table and vice
versa.
3. Choose Pie.
4. Click OK.
Result:
Note: pie charts always use one data series (in this case, Beans). To get a pivot chart of a country,
swap the data over the axis. First, select the chart. Next, on the Design tab, in the Data group, click
Switch Row/Column.
A dashboard simplifies the otherwise complex data you have in your spreadsheet and transforms it into something visual that's far easier for you to grasp and, thus, utilize.
Needless to say, dashboards have a wide array of uses—from budgeting or project management
to marketing or sales reporting.
For a simple example, we have used a dashboard to transform this spreadsheet of first quarter
expenses:
Into this quick pie chart that shares a breakdown of where money was spent during January:
This example is relatively straightforward. But Excel has tons of capabilities to create as complex
of a dashboard as you require.
TIP: It’s best to keep your original dataset somewhere else. That way, if you make an error, you’ll
be able to retrieve the data that you started with.
When you’ve filtered down to only the data that you want, highlight all of the cells of data, hit “copy,”
and then paste only those rows into your “Chart Data” tab of your workbook. That’s the tab that
you’ll pull data from when building your charts.
Why can’t you just select data from your regular “Data” tab? Put simply, because even though
you’ve filtered the data, those other irrelevant rows are still included there (albeit hidden), meaning
they’ll throw things off in your chart.
3. Build your chart
Now that you have only the data that you need, you’re ready to begin building your chart.
Click on the “Dashboard” tab of your worksheet, click the “Insert” button in the toolbar, and then
select the type of chart you want from the menu. In this case, we’re going to use a clustered column
chart.
When you insert the chart, you’ll see a blank box. We’ll cover how to get your data to appear there
in the next step.
TIP: Still aren’t sure which chart option is the best fit for your data? Highlight all of your rows of data
in your “Chart Data” tab and then click “Recommended Charts” within the “Insert” ribbon. Excel will
suggest some charts for you to use.
After we’ve done that? We’ll end up with a tab that shows how much we spend on each item each
month.
Descriptive Analysis
Descriptive statistical analysis involves collecting, interpreting, analyzing, and summarizing data to present it in the form of charts, graphs, and tables.
Rather than drawing conclusions, it simply makes complex data easy to read and understand.
Inferential Analysis
The inferential statistical analysis focuses on drawing meaningful conclusions on the basis of the
data analyzed.
It studies the relationship between different variables or makes predictions for the whole population.
Predictive Analysis
Predictive statistical analysis is a type of statistical analysis that analyzes data to derive past trends
and predict future events on the basis of them.
It uses machine learning algorithms, data mining, data modelling, and artificial intelligence to
conduct the statistical analysis of data.
Prescriptive Analysis
The prescriptive analysis conducts the analysis of data and prescribes the best course of action
based on the results.
It is a type of statistical analysis that helps you make an informed decision.
Causal Analysis
The causal statistical analysis focuses on determining the cause-and-effect relationship between
different variables within the raw data.
In simple words, it determines why something happens and its effect on other variables.
This methodology can be used by businesses to determine the reason for failure.
Regression
Regression is a statistical tool that helps determine the cause-and-effect relationship between the
variables.
It determines the relationship between a dependent and an independent variable.
It is generally used to predict future trends and events.
Simple linear regression is commonly used in forecasting and financial analysis, for example for a company to tell how a change in GDP could affect sales.
Microsoft Excel and other software can do all the calculations, but it's good to know how the mechanics of simple linear regression work.
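For a feel of those mechanics, here is a minimal sketch that fits a line by least squares (the GDP and sales figures are invented for illustration):
import numpy as np

gdp = np.array([1.2, 2.0, 2.5, 3.1, 3.8])      # GDP growth, percent
sales = np.array([100, 112, 118, 126, 135])    # sales, units
slope, intercept = np.polyfit(gdp, sales, 1)   # fit: sales = slope * gdp + intercept
print("slope:", slope, "intercept:", intercept)
print("forecast for GDP = 4.0:", slope * 4.0 + intercept)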
Section 3: Exercises
Exercise 1: Create a 3-dimensional column chart for the following data.
a.
b.
c.
Superhero Laundry
Exercise 4: Create a Heat Map of the following data using Conditional Formatting Colour Scales.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2009 27.9 36.7 42.4 54.5 62.5 67.5 72.7 75.7 66.3 55.0 51.2 35.9
2010 32.5 33.1 48.2 57.9 65.3 74.7 81.3 77.4 71.1 58.1 47.9 32.8
2011 29.7 36.0 42.3 54.3 64.5 72.3 80.2 75.3 70.0 57.1 51.9 43.3
2012 37.3 40.9 50.9 54.8 65.1 71.0 78.8 76.7 68.8 58.0 43.9 41.5
2013 35.1 33.9 40.1 53.0 62.8 72.7 79.8 74.6 67.9 60.2 45.3 38.5
2014 28.6 31.6 37.7 52.3 64.0 72.5 76.1 74.5 69.7 59.6 45.3 40.5
2015 29.9 23.9 38.1 54.3 68.5 71.2 78.8 79.0 74.5 58.0 52.8 50.8
2016 34.5 37.7 48.9 53.3 62.8 72.3 78.7 79.2 71.8 58.8 49.8 38.3
2017 38.0 41.6 39.2 57.2 61.1 72.0 76.8 74.0 70.5 64.1 46.6 33.4
What is Python?
Python is a general-purpose, dynamic, high-level, interpreted programming language.
It supports an object-oriented programming approach to developing applications.
It is simple and easy to learn and provides lots of high-level data structures.
Python is an easy-to-learn yet powerful and versatile scripting language, which makes it attractive for application development.
Python's syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development.
Python supports multiple programming paradigms, including object-oriented, imperative, and functional or procedural programming styles.
Python is not intended to work only in a particular area such as web programming. It is known as a multipurpose programming language because it can be used for the web, enterprise applications, 3D CAD, etc.
We don't need to declare data types for variables because Python is dynamically typed; we can simply write a = 10 to assign an integer value to a variable.
Python makes development and debugging fast because there is no compilation step in Python development, and the edit-test-debug cycle is very fast.
Expressive Language:
Python can perform complex tasks using a few lines of code.
A simple example: for the hello world program, you simply type print("Hello World"). It takes only one line, while Java or C takes multiple lines.
Interpreted Language
Python is an interpreted language; it means the Python program is executed one line at a time.
The advantage of being an interpreted language is that it makes debugging easy and the code portable.
Cross-platform Language
Python can run equally on different platforms such as Windows, Linux, UNIX, and Macintosh, etc.
So, we can say that Python is a portable language.
It enables programmers to develop the software for several competing platforms by writing a
program only once.
Object-Oriented Language
Python supports object-oriented programming, with the concepts of classes and objects.
It supports inheritance, polymorphism, encapsulation, etc.
The object-oriented approach helps the programmer write reusable code and develop applications in less code.
Extensible
This implies that other languages such as C/C++ can be used to compile parts of the code, which can then be used in our Python code.
Python converts the program into byte code, and any platform can run that byte code.
Integrated
It can be easily integrated with languages like C, C++, and JAVA, etc.
Unlike C, C++, or Java, Python runs code line by line, which makes it easy to debug the code.
Embeddable
Code written in other programming languages can be used in Python source code, and Python source code can be embedded in programs written in other languages as well.
Python Applications
Python is known for its general-purpose nature that makes it applicable in almost every domain of
software development.
Python makes its presence felt in every emerging field. It is the fastest-growing programming language and can be used to develop almost any application.
Here, we are specifying application areas where Python can be applied.
Console-based Application
Console-based applications run from the command line or shell.
These are computer programs that are executed using commands.
This kind of application was more popular in the older generation of computers; Python can develop this kind of application very effectively.
Python is famous for its REPL (Read-Eval-Print Loop), which makes it a very suitable language for command-line applications.
Software Development
Python is useful for the software development process.
It works as a support language and can be used for build control and management, testing, etc.
SCons is used for build control.
Buildbot and Apache Gump are used for automated continuous compilation and testing.
Roundup or Trac are used for bug tracking and project management.
3D CAD Applications
CAD (computer-aided design) is used to design engineering-related architecture.
It is used to develop the 3D representation of a part of a system.
Python can create a 3D CAD application by using the following functionalities.
Fandango (Popular)
CAMVOX
HeeksCNC
AnyCAD
RCAM
Enterprise Applications
Python can be used to create applications that can be used within an Enterprise or an Organization.
Some real-time applications are OpenERP, Tryton, Picalo, etc.
Java Program
public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello World");
    }
}
Python Program
On the other hand, we can do this using one statement in Python.
print("Hello World")
Both programs will print the same result, but it takes only one statement without using a semicolon or
curly braces in Python.
For example -
def func():
    statement 1
    statement 2
    ...
    statement N
In the above example, the statements indented to the same level belong to the function. Generally, we use four whitespaces to define indentation.
We can also click on Customize installation to choose the desired location and features. Another important thing: "Install launcher for all users" must be checked.
Here, we get the message "Hello World !" printed on the console.
Using a script file (Script Mode Programming)
The interpreter prompt is best for running single-line statements of code. However, we cannot write the code on the terminal every time; it is not suitable for writing multiple lines of code.
Using script mode, we can write multiple lines of code into a file which can be executed later.
For this purpose, we need to open an editor like Notepad, create a file, and save it with the .py extension, which stands for "Python". Now we will implement the earlier example using script mode.
print("hello world")  # here, we have used the print() function to print the message on the console
To run this file, named first.py, we need to run the following command on the terminal:
python first.py
Step -2: Now, write the code and press "Ctrl+S" to save the file.
Step - 3: After saving the code, we can run it by clicking "Run" or "Run Module". It will display the
output to the shell.
We need to type the python keyword, followed by the file name and hit enter to run the Python file.
Script File
JetBrains provides PyCharm, the most popular and widely used cross-platform IDE for running Python programs.
As we have already stated, PyCharm is a cross-platform IDE, and hence it can be installed on a variety of operating systems. In this section of the tutorial, we will cover the installation process of PyCharm on Windows.
Windows
Here is a step-by-step process on how to download and install Pycharm IDE on Windows:
Step 2) Once the download is complete, run the exe to install PyCharm. The setup wizard should start. Click "Next".
Step 4) On the next screen, you can create a desktop shortcut if you want and click on “Next”.
Step 5) Choose the Start menu folder. Keep JetBrains selected and click on "Install".
Step 7) Once installation is finished, you should see a message screen confirming that PyCharm is installed. If you want to go ahead and run it, check the "Run PyCharm Community Edition" box, then click "Finish".
Step 8) After you click on "Finish", the following screen will appear.
1. You can select the location where you want the project to be created. If you don't want to change the location, keep it as it is, but at least change the name from "untitled" to something more meaningful, like "FirstProject".
2. PyCharm should have found the Python interpreter you installed earlier.
3. Next Click the “Create” Button.
Step 4) A new pop-up will appear. Now type the name of the file you want (here we use "HelloWorld") and hit "OK".
and hit “OK”.
Step 6) Now Go up to the “Run” menu and select “Run” to run your program.
Step 8) Don't worry if you don't have the PyCharm editor installed; you can still run the code from the command prompt. Enter the correct path of the file in the command prompt to run the program.
Operator Description
+ (Addition) Adds two operands. For example, if a = 20, b = 10, then a + b = 30.
- (Subtraction) Subtracts the second operand from the first operand. If the first operand is less than the second operand, the result is negative. For example, if a = 20, b = 10, then a - b = 10.
/ (Division) Returns the quotient after dividing the first operand by the second operand. For example, if a = 20, b = 10, then a / b = 2.0.
* (Multiplication) Multiplies one operand by the other. For example, if a = 20, b = 10, then a * b = 200.
% (Remainder) Returns the remainder after dividing the first operand by the second operand. For example, if a = 20, b = 10, then a % b = 0.
** (Exponent) The exponent operator; it raises the first operand to the power of the second operand.
// (Floor division) Gives the floor value of the quotient produced by dividing the two operands.
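Let's verify these operators with a short program (using the same values as in the table):
a = 20
b = 10
print(a + b)    # 30
print(a - b)    # 10
print(a / b)    # 2.0
print(a * b)    # 200
print(a % b)    # 0
print(a ** b)   # 10240000000000
print(a // b)   # 2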
Comparison Operators
Comparison operators are used to compare the values of two operands and return Boolean True or False accordingly. The comparison operators are described in the following table.
Operator Description
== If the values of the two operands are equal, then the condition becomes true.
!= If the values of the two operands are not equal, then the condition becomes true.
<= If the first operand is less than or equal to the second operand, then the condition becomes true.
>= If the first operand is greater than or equal to the second operand, then the condition becomes true.
> If the first operand is greater than the second operand, then the condition becomes true.
< If the first operand is less than the second operand, then the condition becomes true.
Assignment Operators
*= Multiplies the value of the left operand by the value of the right operand and assigns the result back to the left operand. For example, if a = 10, b = 20, then a *= b is equivalent to a = a * b, and therefore a = 200.
%= Divides the value of the left operand by the value of the right operand and assigns the remainder back to the left operand. For example, if a = 20, b = 10, then a %= b is equivalent to a = a % b, and therefore a = 0.
**= a **= b is equivalent to a = a ** b. For example, if a = 4, b = 2, then a **= b assigns 4 ** 2 = 16 to a.
//= a //= b is equivalent to a = a // b. For example, if a = 4, b = 3, then a //= b assigns 4 // 3 = 1 to a.
Bitwise Operators
For example, if a = 7 and b = 6, then
binary(a) = 0111
binary(b) = 0110
hence, a & b = 0110 (6)
a | b = 0111 (7)
a ^ b = 0001 (1)
~a = 1000 in 4-bit two's complement, which Python reports as -8
Operator Description
& (binary and) If both bits at the same position in the two operands are 1, then 1 is copied to the result; otherwise, 0 is copied.
| (binary or) The resulting bit will be 0 if both bits are 0; otherwise, the resulting bit will be 1.
^ (binary xor) The resulting bit will be 1 if the two bits are different; otherwise, the resulting bit will be 0.
~ (negation) Calculates the negation of each bit of the operand, i.e., if a bit is 0, the resulting bit will be 1, and vice versa.
<< (left shift) The left operand's value is moved left by the number of bits specified in the right operand.
>> (right shift) The left operand's value is moved right by the number of bits specified in the right operand.
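Let's verify the bitwise operators with the values from the example above:
a = 7    # binary 0111
b = 6    # binary 0110
print(a & b)    # 6  (0110)
print(a | b)    # 7  (0111)
print(a ^ b)    # 1  (0001)
print(~a)       # -8
print(a << 1)   # 14 (left shift by one bit)
print(a >> 1)   # 3  (right shift by one bit)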
Logical Operators
The logical operators are used primarily in the expression evaluation to make a decision. Python
supports the following logical operators.
Operator Description
and If both expressions are true, then the condition will be true. If a and b are the two expressions, a → true, b → true => a and b → true.
or If one of the expressions is true, then the condition will be true. If a and b are the two expressions, a → true, b → false => a or b → true.
not If an expression a is true, then not(a) will be false, and vice versa.
Identity Operators
The identity operators are used to check whether two references point to the same object.
Operator Description
is It evaluates to true if the references on both sides point to the same object.
is not It evaluates to true if the references on both sides do not point to the same object.
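A short example of the identity operators (the list values are illustrative):
x = [1, 2, 3]
y = x              # y refers to the same list object as x
z = [1, 2, 3]      # z is an equal but separate object
print(x is y)      # True
print(x is z)      # False
print(x == z)      # True (equal values, different objects)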
Operator Precedence
The precedence of the operators is essential to find out since it enables us to know which operator
should be evaluated first. The precedence table of the operators in Python is given below.
Operator Description
** The exponent operator is given priority over all others used in the expression.
~ + - Negation, unary plus, and unary minus.
* / % // Multiplication, division, modulus (remainder), and floor division.
+ - Binary plus and minus.
>> << Right shift and left shift.
& Binary and.
^ | Binary xor and or.
<= < > >= Comparison operators (less than, less than or equal to, greater than, greater than or equal to).
== != Equality operators.
= %= /= //= -= += *= **= Assignment operators.
is is not Identity operators.
in not in Membership operators.
not or and Logical operators.
print("Welcome to lotus.")
a = 10
# Two objects are passed in print() function
print("a =", a)
b=a
# Three objects are passed in print function
print('a =', a, '= b')
Output:
Welcome to lotus.
a = 10
a = 10 = b
As we can see in the above output, multiple objects can be printed in a single print() statement. We just need to use a comma (,) to separate them.
a = 10
print("a =", a, sep='dddd', end='\n\n\n')
print("a =", a, sep='0', end='$$$$$')
Output:
a =dddd10
a =010$$$$$
In the first print() statement, we use the sep and end arguments. The sep value is printed between the given objects, and the value of the end parameter is printed after the last object. As we can see, the second print() function printed its result after the blank lines produced by end='\n\n\n'.
Python provides the input() function which is used to take input from the user. Let's understand the
following example.
Example 1-
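For instance (the prompt text and variable name are illustrative):
name = input("Enter your name: ")   # input() returns a string by default
print("Hello,", name)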
By default, the input() function takes string input. But what if we want to take other data types as input?
If we want to take the input as an integer number, we need to typecast the result of the input() function into an integer.
Example 2-
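For instance, typecasting the input into an integer (prompts and variable names are illustrative):
a = int(input("Enter first number: "))    # typecast the string input into an integer
b = int(input("Enter second number: "))
print("Sum =", a + b)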
Example
In this example, we use two variables, a and b, as part of the if statement to test whether b is greater than a. As a is 33 and b is 200, we know that 200 is greater than 33, so we print to screen that "b is greater than a".
a = 33
b = 200
if b > a:
    print("b is greater than a")
The elif keyword is Python's way of saying "if the previous conditions were not true, then try this condition".
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
In this example a is equal to b, so the first condition is not true, but the elif condition is true, so we
print to screen that "a and b are equal".
The else keyword catches anything which isn't caught by the preceding conditions.
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")
In this example a is greater than b, so the first condition is not true; the elif condition is also not true, so we go to the else condition and print to screen that "a is greater than b".
Python Loops
Sometimes we may need to alter the flow of the program, and the execution of specific code may need to be repeated several times. For this purpose, programming languages provide various types of loops capable of repeating specific code several times. Consider the following tutorial to understand these statements in detail.
Python For Loop
Python While Loop
Example:
Print each fruit in a fruit list:
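Using the same fruit list as the later examples, the code would be:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)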
The for loop does not require an indexing variable to be set beforehand.
Example:
Loop through the letters in the word "banana":
for x in "banana":
print(x)
Exit the loop when x is "banana", but this time the break comes before the print:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
if x == "banana":
break
print(x)
Example
Exit the loop when i is 3:
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1
Example
Continue to the next iteration if i is 3:
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)
Example
Print a message once the condition is false:
i = 1
while i < 6:
    print(i)
    i += 1
else:
    print("i is no longer less than 6")
Python List
A Python list holds an ordered collection of items; we can store a sequence of items in a list.
A Python list is mutable, which means it can be modified after its creation.
The items of a list are enclosed within square brackets [] and separated by commas. Let's see an example of a list.
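One version of this example, consistent with the output below (the list contents are illustrative):
a = [1, 2, "Peter", 4.50, "Ricky", 5, 6]
b = [1, 2, 3, 4, 5, 6, 7]
print(type(a))
print(type(b))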
Output:
<class 'list'>
<class 'list'>
Python Tuple
Example -
tup = ("Apple", "Mango" , "Orange" , "Banana")
print(type(tup))
print(tup)
Output:
<class 'tuple'>
('Apple', 'Mango', 'Orange', 'Banana')
Example -
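For instance, trying to reassign a tuple item:
tup = ("Apple", "Mango", "Orange", "Banana")
tup[2] = "Grapes"    # TypeError: 'tuple' object does not support item assignment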
The above program throws an error because tuples are an immutable type.
Python String
Example
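One simple sketch that produces the output shown below:
s = "Hi Python"
print(s)
print(s)
print(s)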
Output:
Hi Python
Hi Python
Hi Python
Python doesn't have a separate character data type. A single character written as 'p' is treated as a string of length 1.
Strings are also immutable; we can't change a string after it is declared.
Dictionaries
A Python dictionary is a highly efficient data structure used to store large amounts of data.
It stores data in key-value pairs; each value is stored corresponding to its key.
Keys must be unique, and a value can be of any type, such as integer, list, tuple, etc.
It is a mutable type; we can reassign values after its creation. Below is an example of creating a dictionary in Python.
Example –
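Reconstructed from the output below:
Employee = {"Name": "John", "Age": 29, "salary": 250000, "Company": "GOOGLE"}
print(type(Employee))
print("Printing Employee data ....")
print(Employee)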
Output:
<class 'dict'>
Printing Employee data ....
{'Name': 'John', 'Age': 29, 'salary': 250000, 'Company': 'GOOGLE'}
Python Set
Example -
# Creating Set
Month = {"January", "February", "March", "April", "May", "June", "July"}
print(Month)
print(type(Month))
Output:
{'March', 'July', 'April', 'May', 'June', 'February', 'January'}
<class 'set'>
Indentation in Python
Indentation is one of the most significant concepts of the Python programming language.
Improper use of indentation will result in an "IndentationError" in our code.
Indentation means adding whitespace before a statement where it is needed.
Without indentation, Python doesn't know which statement to execute next.
Indentation also defines which statements belong to which block.
If there is no indentation or the indentation is improper, Python will raise an "IndentationError" and interrupt our code.
Python indentation defines which group of statements belongs to which block.
Programming languages such as C, C++, and Java use curly braces {} to define code blocks.
In Python, statements that are indented to the same level belong to the same block.
We can use four whitespaces to define indentation. Let's see the following lines of code.
list1 = [1, 2, 3, 4, 5]
for i in list1:
    print(i)
    if i == 4:
        break
print("End of for loop")
Output:
1
2
3
4
End of for loop
Explanation:
In the above code, the for loop has a code block, and the if statement has its own code block inside the for loop. Both are indented with four whitespaces. The last print() statement is not indented, which means it doesn't belong to the for loop.
Comment in Python
Comments are essential for documenting the code and helping us and others understand it.
By looking at a comment, we can easily understand the intention of every line that we have written in the code.
We can also find errors easily, fix them, and reuse the code in other applications.
In Python, we can write comments using the # hash character.
The Python interpreter entirely ignores the text that follows a hash character.
A good programmer always uses comments to make the code understandable.
Let's see the following example of a comment.
It is a good idea to add a comment to any line or section of code whose purpose is not obvious. This is a best practice to learn while coding.
Single-Line Comment - Single-Line comment starts with the hash # character followed by text for
further explanation.
# defining the marks of a student
Marks = 90
We can also write a comment next to a code statement. Consider the following example.
Name = "James" # the name of a student is James
Marks = 90 # defining student's marks
Branch = "Computer Science" # defining student branch
Multi-Line Comments - Python doesn't have explicit support for multi-line comments, but we can apply
the hash # character to multiple lines. For example -
# we are defining for loop
# To iterate the given list.
# run this code.
We can also use another way.
"""
This is an example
Of multi-line comment
Using triple-quotes
"""
Python Identifiers
Python identifiers refer to a name used to identify a variable, function, class, module, or
other object. There are a few rules to follow while naming a Python variable.
A variable name must start with either an English letter or underscore (_).
A variable name cannot start with a number.
Special characters are not allowed in the variable name.
The variable's name is case sensitive.
number = 10
print(number)
_a = 100
print(_a)
x_y = 1000
print(x_y)
Output:
10
100
1000
Identifier Naming
Variables are the example of identifiers. An Identifier is used to identify the literals used in the program.
The rules to name an identifier are given below.
The first character of the variable must be an alphabet or underscore ( _ ).
All the characters except the first character may be an alphabet of lower-case(a-z), upper-case (A-
Z), underscore, or digit (0-9).
Identifier name must not contain any white-space, or special character (!, @, #, %, ^, &, *).
Identifier name must not be similar to any keyword defined in the language.
Identifier names are case sensitive; for example, myname and MyName are not the same.
Examples of valid identifiers: a123, _n, n_9, etc.
Examples of invalid identifiers: 1a, n%4, n 9, etc.
Object References
It is necessary to understand how the Python interpreter works when we declare a variable.
The process of treating variables is somewhat different from many other programming languages.
Python is a highly object-oriented programming language; that's why every data item belongs to
a specific type of class. Consider the following example.
print("John")
Output:
John
Python creates a string object and displays it on the console. In the above print statement,
we have created a string object. Let's check its type using the Python built-in type() function.
type("John")
Output:
<class 'str'>
a = 50
b = a
The variable b refers to the same object that a points to because Python does not create another
object.
Let's assign a new value to b. Now both variables will refer to different objects.
a = 50
b = 100
Python manages memory efficiently by letting several variables refer to one object, rather than creating a new object for every assignment of the same value.
Object Identity
In Python, every created object is identified uniquely.
Python guarantees that no two objects will have the same identifier at the same time.
The built-in id() function is used to get the object's identifier. Consider the following example.
a = 50
b=a
print(id(a))
print(id(b))
# Reassigned variable a
a = 500
print(id(a))
Output:
140734982691168
140734982691168
2822056960944
Since we assigned b = a, a and b both point to the same object, and the id() function returned
the same number for both. When we reassigned a to 500, it referred to a new object identifier.
name = "Devansh"
age = 20
marks = 80.50
print(name)
print(age)
print(marks)
Output:
Devansh
20
80.5
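The code for the next output is missing in the source; a sketch, assuming it assigns one value to several variables at once:
x = y = z = 50
print(x)
print(y)
print(z)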
Output:
50
50
50
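The next output suggests assigning multiple values to multiple variables in one statement; a sketch:
a, b, c = 5, 10, 15
print(a)
print(b)
print(c)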
Output:
5
10
15
The values will be assigned in the order in which variables appear.
Local Variable
Local variables are variables that are declared inside a function and have scope only within that function.
Let's understand the following example.
Example -
# Declaring a function
def add():
    # Defining local variables. They have scope only within the function
    a = 20
    b = 30
    c = a + b
    print("The sum is:", c)

# Calling a function
add()
Output:
The sum is: 50
Explanation:
In the above code, we declared a function named add() and assigned a few variables within the
function. These variables will be referred to as the local variables which have scope only inside the
function. If we try to use them outside the function, we get the following error.
add()
Output:
The sum is: 50
print(a)
NameError: name 'a' is not defined
We tried to use a local variable outside its scope; it threw a NameError.
Example -
# Declare a variable and initialize it
x = 101

# Global variable in function
def mainFunction():
    # printing a global variable
    global x
    print(x)
    # modifying a global variable
    x = 'Welcome To lotus'
    print(x)

mainFunction()
print(x)
Output:
101
Welcome To lotus
Welcome To lotus
Explanation:
In the above code, we declare a global variable x and assign a value to it. Next, we defined a function
and accessed the declared variable using the global keyword inside the function. Now we can modify
its value. Then, we assigned a new string value to the variable x.
Now, we called the function and proceeded to print x. It printed the newly assigned value of x.
Delete a variable
We can delete the variable using the del keyword. The syntax is given below.
Syntax -
del <variable_name>
In the following example, we create a variable x and assign a value to it. Then we delete x; if we try to
print it afterward, we get the error "variable x is not defined". The variable x can no longer be used.
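The example code is missing in the source; a sketch consistent with the output below (the value 6 is taken from the output):
x = 6
print(x)
del x
print(x)    # raises NameError: name 'x' is not defined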
Output:
6
Example -
# A Python program to display that we can store
# large numbers in Python
a = 10000000000000000000000000000000000000000000
a=a+1
print(type(a))
print (a)
Output:
<class 'int'>
10000000000000000000000000000000000000000001
As we can see in the above example, we assigned a large integer value to variable a and checked
its type.
It printed <class 'int'>, not long int. Hence, integers are not limited to a fixed number of bits; they
can grow up to the limit of our memory.
Python doesn't have any special data type to store larger numbers.
Python Data Types
Variables can hold values, and every value has a data-type.
Python is a dynamically typed language; hence we do not need to define the type of the variable
while declaring it.
The interpreter implicitly binds the value with its type.
a=5
The variable a holds integer value five and we did not define its type.
Python interpreter will automatically interpret variables a as an integer type.
Python enables us to check the type of the variable used in the program.
Python provides us the type() function, which returns the type of the variable passed.
Consider the following example to define the values of different data types and checking its type.
a=10
b="Hi Python"
c = 10.5
print(type(a))
print(type(b))
print(type(c))
Output:
<class 'int'>
<class 'str'>
<class 'float'>
Numbers
Number stores numeric values.
The integer, float, and complex values belong to a Python Numbers data-type.
Python provides the type() function to know the data-type of the variable.
Similarly, the isinstance() function is used to check whether an object belongs to a particular class.
Python creates Number objects when a number is assigned to a variable. For example;
a=5
print("The type of a", type(a))
b = 40.5
print("The type of b", type(b))
c = 1+3j
print("The type of c", type(c))
print(" c is a complex number", isinstance(1+3j,complex))
Output:
The type of a <class 'int'>
The type of b <class 'float'>
The type of c <class 'complex'>
c is a complex number True
1. Int - An integer value can be of any length, such as 10, 2, 29, -20, -150, etc. Python has no
restriction on the length of an integer. Its value belongs to the int class.
2. Float - Float is used to store floating-point numbers like 1.9, 9.902, 15.2, etc. It is accurate up to 15
decimal places.
3. Complex - A complex number contains an ordered pair, i.e., x + iy, where x and y denote the real
and imaginary parts, respectively. Examples of complex numbers are 2.14j, 2.0 + 2.3j, etc.
Sequence Type
String
A string can be defined as a sequence of characters enclosed in quotation marks.
In Python, we can use single, double, or triple quotes to define a string.
String handling in Python is a straightforward task since Python provides built-in functions and
operators to perform operations in the string.
In the case of string handling, the operator + is used to concatenate two strings as the operation
"hello"+" python" returns "hello python".
The operator * is known as a repetition operator as the operation "Python" *2 returns 'Python
Python'.
Example – 1
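The example code is missing in the source; a sketch consistent with the output below:
str1 = "string using double quotes"
print(str1)
str2 = '''A multiline
string'''
print(str2)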
Output:
string using double quotes
A multiline
string
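Example – 2
The code is again missing in the source; a sketch of string slicing, repetition, and concatenation consistent with the output below:
str1 = 'hello lotus'
print(str1[0:2])              # slicing: prints 'he'
print(str1[4])                # the character at index 4
print(str1 * 2)               # repetition
print(str1 + ' how are you')  # concatenation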
Output:
he
o
hello lotushello lotus
hello lotus how are you
Boolean
Boolean type provides two built-in values, True and False.
These values are used to determine whether the given statement is true or false.
It is denoted by the class bool. True can be represented by any non-zero value or 'T', whereas False
can be represented by 0 or 'F'.
Consider the following example.
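The example code is missing in the source; a sketch that reproduces the output below (note that lowercase false is not defined in Python):
print(type(True))
print(type(False))
print(false)    # raises NameError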
Output:
<class 'bool'>
<class 'bool'>
NameError: name 'false' is not defined
Set
A Python set is an unordered collection of data items.
It is iterable, mutable(can modify after creation), and has unique elements.
In set, the order of the elements is undefined; it may return the changed sequence of the element.
The set is created by using a built-in function set(), or a sequence of elements is passed in the
curly braces and separated by the comma.
It can contain various types of values. Consider the following example.
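The example code is missing in the source; a sketch consistent with the output below (set ordering may vary between runs):
set1 = {'James', 2, 3, 'Python'}
print(set1)
# adding an element
set1.add(10)
print(set1)
# removing an element
set1.remove(2)
print(set1)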
Output:
{3, 'Python', 'James', 2}
{'Python', 'James', 3, 2, 10}
{'Python', 'James', 3, 10}
Dictionary
Dictionary is an unordered set of a key-value pair of items.
It is like an associative array or a hash table where each key stores a specific value.
Key can hold any primitive data type, whereas value is an arbitrary Python object.
The items in the dictionary are separated with the comma (,) and enclosed in the curly braces {}.
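The dictionary used below is not defined in the source; a sketch consistent with the output:
d = {1: 'Jimmy', 2: 'Alex', 3: 'john', 4: 'mike'}
# printing values by key
print("1st name is " + d[1])
print("2nd name is " + d[4])
print(d)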
print (d.keys())
print (d.values())
Output:
1st name is Jimmy
2nd name is mike
{1: 'Jimmy', 2: 'Alex', 3: 'john', 4: 'mike'}
dict_keys([1, 2, 3, 4])
dict_values(['Jimmy', 'Alex', 'john', 'mike'])
# the list definition below is assumed from the output
list1 = [1, "hi", "Python", 2]
print(list1)
print(type(list1))
# List slicing
print(list1[3:])
print(list1[0:2])
# List repetition
print(list1 * 2)
print(list1 * 3)
Output:
[1, 'hi', 'Python', 2]
<class 'list'>
[2]
[1, 'hi']
[1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2]
[1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2]
# the tuple definition below is assumed from the output
tup = ("hi", "Python", 2)
print(type(tup))
print(tup)
# Tuple slicing
print(tup[1:])
print(tup[0:1])
# Tuple repetition
print(tup * 2)
print(tup * 3)
Output:
<class 'tuple'>
('hi', 'Python', 2)
('Python', 2)
('hi',)
('hi', 'Python', 2, 'hi', 'Python', 2)
('hi', 'Python', 2, 'hi', 'Python', 2, 'hi', 'Python', 2)
Example:
a = 10
b = 0
print('a is dividing by Zero')
assert b != 0, "Divide by 0 error"
print(a / b)
Output:
a is dividing by Zero
Runtime Exception:
Traceback (most recent call last):
  File "/home/40545678b342ce3b70beb1224bed345f.py", line 4, in
    assert b != 0, "Divide by 0 error"
AssertionError: Divide by 0 error
8. def - This keyword is used to declare a function in Python. It is followed by the function name.
def my_func(a, b):
    c = a + b
    print(c)

my_func(10, 20)
Output:
30
9. class - It is used to represent a class in Python. A class is the blueprint of its objects. It is a
collection of variables and methods. Consider the following class.
class Myclass:
    # Variables ...
    def function_name(self):
        # statements ...
10. continue - It is used to stop the execution of the current iteration. Consider the following example.
a = 0
while a < 4:
    a += 1
    if a == 2:
        continue
    print(a)
Output:
1
3
4
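11. break - It is used to terminate the loop execution and transfer control to the end of the loop. The original example is missing in the source; a minimal sketch, assuming the keyword here is break, consistent with the output below:
a = 0
while a < 5:
    if a == 3:
        break
    print(a)
    a += 1
print("End of execution")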
Output:
0
1
2
End of execution
12. if - It is used to represent a conditional statement. The execution of a particular block is decided
by the if statement. Consider the following example.
Example
i = 10
if (i < 18):
    print("I am less than 18")
Output:
I am less than 18
13. else - The else statement is used with the if statement. When if statement returns false, then else
block is executed. Consider the following example.
Example:
n = 11
if (n % 2 == 0):
    print("Even")
else:
    print("Odd")
Output:
Odd
14. elif - This keyword is used to check multiple conditions. It is short for else-if. If the previous
condition is false, the next condition is checked until a true condition is found. Consider the following example.
Example:
marks = int(input("Enter the marks:"))
if (marks >= 90):
    print("Excellent")
elif (marks < 90 and marks >= 75):
    print("Very Good")
elif (marks < 75 and marks >= 60):
    print("Good")
else:
    print("Average")
Output:
Enter the marks:85
Very Good
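15. del - It is used to delete a reference to an object. The item and its example are missing in the source; a sketch consistent with the output below:
a = 12
print(a)
del a
print(a)    # raises NameError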
Output:
12
NameError: name 'a' is not defined
16. try, except - The try-except is used to handle the exceptions. The exceptions are run-time errors.
Consider the following example.
Example:
a = 0
try:
    b = 1 / a
except Exception as e:
    print(e)
Output:
division by zero
17. raise - The raise keyword is used to throw an exception forcefully. Consider the following
example.
Example
a = 5
if (a > 2):
    raise Exception('a should not exceed 2')
Output:
Exception: a should not exceed 2
18. finally - The finally keyword is used to define a block that is always executed, whether an exception occurs or not.
Example:
a = 0
b = 5
try:
    c = b / a
    print(c)
except Exception as e:
    print(e)
finally:
    print('Finally always executed')
Output:
division by zero
Finally always executed
19. for, while - Both keywords are used for iteration. The for keyword is used to iterate over the
sequences (list, tuple, dictionary, string). A while loop is executed until the condition returns false.
Consider the following example.
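The example code is missing in the source; a sketch for the for loop, consistent with the first output below (the list contents are assumed):
list1 = [1, 2, 3, 4, 5]
for i in list1:
    print(i)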
Output:
1
2
3
4
5
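And a sketch for the while loop, consistent with the second output below:
i = 0
while i < 5:
    print(i)
    i += 1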
Output:
0
1
2
3
4
20. import - This keyword is used to import modules into the current Python script.
Example:
import math
print(math.sqrt(25))
Output:
5.0
21. from - This keyword is used to import the specific function or attributes in the current Python script.
Example:
from math import sqrt
print(sqrt(25))
Output:
5.0
22. as - It is used to create a name alias. It provides the user-define name while importing a module.
Example:
import calendar as cal
print(cal.month_name[5])
Output:
May
23. pass - The pass keyword is used to execute nothing or create a placeholder for future code. If we
declare an empty class or function, it will throw an error, so we use the pass keyword to declare an
empty class or function.
Example:
class my_class:
    pass

def my_func():
    pass
24. return - The return keyword is used to return the result value, or None, to the calling function.
Example:
def sum(a, b):
    c = a + b
    return c

# the function call is missing in the source; arguments assumed to match the output
print("The sum is:", sum(15, 25))
Output:
The sum is: 40
25. is - This keyword is used to check whether two variables refer to the same object. Consider the following example.
Example
x=5
y=5
a = []
b = []
print(x is y)
print(a is b)
Output:
True
False
Note: Two separately created mutable objects, such as lists, do not refer to the same object.
26. global - The global keyword is used to create or modify a global variable inside a function. Any
function can then access it. Consider the following example.
Example
def my_func():
    global a
    a = 10
    b = 20
    c = a + b
    print(c)

my_func()

def func():
    print(a)

func()
Output:
30
10
27. nonlocal - The nonlocal keyword is used inside a nested function to refer to a variable defined in the enclosing function. Consider the following example.
Example
def outside_function():
    a = 20
    def inside_function():
        nonlocal a
        a = 30
        print("Inner function: ", a)
    inside_function()
    print("Outer function: ", a)

outside_function()
Output:
Inner function: 30
Outer function: 30
28. lambda - The lambda keyword is used to create the anonymous function in Python. It is an inline
function without a name. Consider the following example.
Example
a = lambda x: x**2
for i in range(1, 6):
    print(a(i))
Output:
1
4
9
16
25
29. yield - The yield keyword is used with the Python generator. It stops the function's execution and
returns value to the caller. Consider the following example.
Example
def fun_Generator():
    yield 1
    yield 2
    yield 3

# Driver code to check the above generator function
for value in fun_Generator():
    print(value)
Output:
1
2
3
30. with - The with keyword is used to wrap the execution of a block within a context manager; it is commonly used when working with files.
Example
with open('file_path', 'w') as file:
    file.write('hello world !')
31. None - The None keyword is used to define a null value. Remember that None does not
indicate 0, False, or an empty data type. It is an object of its own data type, NoneType. Consider the following
example.
Example:
def return_none():
    a = 10
    b = 20
    c = a + b

x = return_none()
print(x)
Output:
None
Python Literals
Python Literals can be defined as data that is given in a variable or constant.
Python supports the following literals:
1. String literals:
String literals can be formed by enclosing text in quotes. We can use both single and
double quotes to create a string.
Example:
"Aman" , '12345'
print("x is", x)
print("y is", y)
print("z is", z)
print("a:", a)
print("b:", b)
Output:
x is True
y is False
z is False
a: 11
b: 10
4. Special literals:
Python contains one special literal i.e., None.
None is used to specify a field that is not created. It is also used to mark the end of lists in Python.
5. Literal Collections
Python provides the four types of literal collection such as List literals, Tuple literals, Dict literals, and
Set literals.
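List: The list-literal example code is missing in the source; a sketch consistent with the output below (variable names assumed):
list = ['John', 678, 20.4, 'Peter']
list1 = [456, 'Andrew']
print(list)
print(list + list1)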
Output:
['John', 678, 20.4, 'Peter']
['John', 678, 20.4, 'Peter', 456, 'Andrew']
Dictionary:
Python dictionary stores the data in the key-value pair.
It is enclosed by curly-braces {} and each pair is separated by the commas(,).
Example
dict = {'name': 'Pater', 'Age':18,'Roll_nu':101}
print(dict)
Output:
{'name': 'Pater', 'Age': 18, 'Roll_nu': 101}
Tuple:
A Python tuple is a collection of items of different data types. It is immutable, which means it cannot be modified
after creation.
It is enclosed by the parentheses () and each element is separated by the comma(,).
Example
tup = (10,20,"Dev",[2,3,4])
print(tup)
Output:
(10, 20, 'Dev', [2, 3, 4])
Set:
A Python set is an unordered collection of items.
It is enclosed by the {} and each element is separated by the comma(,).
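The example code is missing in the source; a sketch consistent with the output below:
set1 = {'apple', 'grapes', 'guava', 'papaya'}
print(set1)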
Output:
{'guava', 'apple', 'papaya', 'grapes'}
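The function called below is not defined in this excerpt; a definition consistent with the output might be:
>>> def func():
...     print("I am function func()!")
...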
>>> func()
I am function func()!
In this example, func() appears in all the same contexts as the values "cat" and 42, and the
interpreter handles it just fine.
Note: What you can or can’t do with any object in Python depends to some extent on context. There
are some operations, for example, that work for certain object types but not for others. You can add
two integer objects or concatenate two string objects with the plus operator (+). But the plus operator
isn’t defined for function objects.
For present purposes, what matters is that functions in Python satisfy the two criteria beneficial for
functional programming listed above. You can pass a function to another function as an argument:
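Here inner() and outer() are assumed to be defined along these lines:
>>> def inner():
...     print("I am function inner()!")
...
>>> def outer(function):
...     function()
...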
>>> outer(inner)
I am function inner()!
Python Modules
Python modules are program files that contain Python code or functions.
There are two types of modules in Python: user-defined modules and built-in modules.
A module that the user defines, or in other words our Python code saved with the .py extension, is
treated as a user-defined module.
Built-in modules are predefined modules of Python. To use the functionality of the modules, we
need to import them into our current working program.
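A minimal sketch of a user-defined module (file and function names are assumptions):
# mymodule.py
def greet(name):
    print("Hello,", name)

# main.py - importing and using the module
import mymodule
mymodule.greet("Devansh")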
Python Exceptions
An exception can be defined as an unusual condition in a program resulting in the interruption in
the flow of the program.
Whenever an exception occurs, the program stops the execution, and thus the further code is not
executed.
Therefore, exceptions are run-time errors that the Python script may be unable to handle. An
exception is a Python object that represents an error.
Python CSV
CSV stands for "comma-separated values", which is defined as a simple file format that uses
specific structuring to arrange tabular data.
It stores tabular data, such as a spreadsheet or database, in plain text and is a common format for
data interchange.
A CSV file can be opened in an Excel sheet, and the rows and columns of data define the standard format.
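A minimal sketch of reading a CSV file with Python's built-in csv module (the file name is an assumption):
import csv

with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)    # each row is a list of strings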
Classes and Objects - Python classes are the blueprint of objects. An object is a collection
of data and methods that act on the data.
Inheritance - An inheritance is a technique where one class inherits the properties of other
classes.
Constructor - Python provides a special method __init__() which is known as a constructor.
This method is automatically called when an object is instantiated.
Data Member - A variable that holds data associated with a class and its objects.
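A minimal sketch tying these terms together (the class and attribute names are assumptions):
class Employee:
    def __init__(self, name, age):    # constructor
        self.name = name              # data members
        self.age = age

    def show(self):
        print(self.name, self.age)

e = Employee("John", 29)    # instantiating calls __init__() automatically
e.show()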
Python Iterator
An iterator is simply an object that can be iterated upon.
It returns one object at a time.
It can be implemented using the two special methods, __iter__() and __next__().
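A minimal sketch of a custom iterator (the class name and limit are assumptions):
class Count:
    def __init__(self, limit):
        self.n = 0
        self.limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        if self.n < self.limit:
            self.n += 1
            return self.n
        raise StopIteration

for x in Count(3):
    print(x)    # prints 1, 2, 3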
Python Generators
Generators are the easiest way of creating iterators.
Python Decorators
These are used to modify the behavior of the function.
Decorators provide the flexibility to wrap another function to expand the working of wrapped
function, without permanently modifying it.
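A minimal sketch of a decorator (all names are assumptions):
def my_decorator(func):
    def wrapper():
        print("Before the call")
        func()
        print("After the call")
    return wrapper

@my_decorator
def greet():
    print("Hello!")

greet()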
Python CGI
Python CGI stands for "Common Gateway Interface", which is used to define how to exchange
information between the webserver and a custom Python script.
The Common Gateway Interface is a standard for external gateway programs to interface with the
server, such as HTTP Servers.
NumPy:
It is short for Numerical Python. It is important in scientific computing and analysis; it provides
efficient ndarray objects and linear algebra operations.
Pandas:
Pandas’ name came from panel data.
It helps in making structuring data easier by providing data structure and functions made especially
for it.
Matplotlib:
Is the best-known Python library for providing interactive plots as well as many 2D data
Visualizations.
IPython:
It provides a robust and productive environment for interactive and exploratory computing. It
provides a mechanism to connect to IPython through a web browser, and an infrastructure for
interactive parallel and distributed computing.
import pandas as pd
path=" "
df = pd.read_csv(path)
If the dataset doesn't contain a header, we can specify it in the following way-
df = pd.read_csv(path,header=None)
To look at the first five and last five rows of the dataset, we can make use of
df.head() and df.tail() respectively.
Let's have a look at how we can export the data; for example, to write the DataFrame to an Excel file,
path = " "
df.to_excel(path)
It is a process of collecting raw data from the Web using automated methods, but some websites forbid
scraping, and they have good reasons to protect their data.
Python provides easy ways to make web scraping more powerful.
urllib is a Python standard library that helps in working with the links (URLs) of the web pages we want
to scrape; BeautifulSoup helps in extracting the information from those pages.
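A minimal sketch (the URL is a placeholder; BeautifulSoup comes from the third-party beautifulsoup4 package):
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://example.com").read()
soup = BeautifulSoup(html, "html.parser")
print(soup.title.text)    # print the page title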
Limitations:
It takes a long time to implement and may require regression and decision tree analysis skills,
among others.
It is sometimes hard to determine the product groupings.
Complexity grows exponentially with size.
Functions
mean() function
The mean() function is used to calculate the arithmetic mean of the numbers in the list.
Example
import statistics
# list of positive integer numbers
datasets = [5, 2, 7, 4, 2, 6, 8]
x = statistics.mean(datasets)
# Printing the mean
print("Mean is :", x)
Output:
Mean is : 4.857142857142857
median() function
The median() function is used to return the middle value of the numeric data in the list.
Example
import statistics
datasets = [4, -5, 6, 6, 9, 4, 5, -2]
# Printing median of the
# random data-set
print("Median of data-set is : % s "
% (statistics.median(datasets)))
Output:
Median of data-set is : 4.5
mode() function
The mode() function is used to return the most common data point in the list.
Example
import statistics
# declaring a simple data-set consisting of real valued positive integers.
dataset =[2, 4, 7, 7, 2, 2, 3, 6, 6, 8]
# Printing out the mode of given data-set
print("Calculated Mode % s" % (statistics.mode(dataset)))
Output:
Calculated Mode 2
stdev() function
The stdev() function is used to calculate the standard deviation on a given sample which is available
in the form of the list.
Example
import statistics
# creating a simple data - set
sample = [7, 8, 9, 10, 11]
# Prints standard deviation
print("Standard Deviation of sample is % s "
% (statistics.stdev(sample)))
Output:
Standard Deviation of sample is 1.5811388300841898
median_low()
The median_low function is used to return the low median of numeric data in the list.
Example
import statistics
# simple list of a set of integers
set1 = [4, 6, 2, 5, 7, 7]
# Note: low median will always be a member of the data-set.
# Print low median of the data-set
print("Low median of data-set is % s "
% (statistics.median_low(set1)))
Output:
Low median of data-set is 5
median_high() function
The median_high() function is used to return the high median of the numeric data in the list.
Example
import statistics
# list of set of the integers
dataset = [2, 1, 7, 6, 1, 9]
print("High median of data-set is %s "
% (statistics.median_high(dataset)))
Output:
High median of data-set is 6
Using GroupBy
The groupby() method of pandas can be applied to categorical variables.
It groups the subsets based on different categories. It can involve single or multiple variables.
Let us have a look at an example that would help us to understand how it can be used in Python.
df_att = df[['attribute1', 'attribute2', 'attribute3']]
df_g = df_att.groupby(['attribute1', 'attribute2'], as_index=False).mean()
df_g
import numpy as np
#numpy array
a= np.array([34,67,8,5,33,90,23])
print(a)
Output: [34 67 8 5 33 90 23]
#type
type(a)
Output: numpy.ndarray
print(a[2])
Output: 8
Ones: Creates a NumPy array according to the parameters given, with all elements being 1.
np.ones(5)
array([1., 1., 1., 1., 1.])
np.ones([6,7])
Output:
array([[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.]])
Zeros: Creates a NumPy array according to the parameters given, with all elements being 0.
np.zeros(7)
Output: array([0., 0., 0., 0., 0., 0., 0.])
These functions are simple; they can be used to create sample arrays, which are often needed for
various computational purposes.
Eye: Let us now look at the eye function. This function returns a 2-D array with ones on the diagonal
and zeros elsewhere.
np.eye(5)
Output:
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
y=np.array([6,78,3,56,89])
np.diag(y)
Output:
array([[ 6, 0, 0, 0, 0],
[ 0, 78, 0, 0, 0],
[ 0, 0, 3, 0, 0],
[ 0, 0, 0, 56, 0],
[ 0, 0, 0, 0, 89]])
np.array([1, 2, 3,7] * 3)
Output : array([1, 2, 3, 7, 1, 2, 3, 7, 1, 2, 3, 7])
np.repeat([1, 4, 2, 3], 5)
Output : array([1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3])
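The array p used in the stacking examples below is not defined in the source; a definition consistent with the outputs is:
p = np.ones([2, 3], int)
print(p)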
Output :
[[1 1 1] [1 1 1]]
With vstack(), we can vertically append data, and with hstack(), we can horizontally stack data. Let us
try out some examples.
np.vstack([p, 2*p])
Output :
array([[1, 1, 1],
[1, 1, 1],
[2, 2, 2],
[2, 2, 2]])
Numpy Mathematical Computation
Let us get onto working with NumPy in various mathematical computations.
c=a+b
print(c)
Output :
array([[3. , 3. , 3. ],
[3. , 3. , 3. ],
[2. , 2. , 2. ],
[2. , 2. , 2. ],
[6.5, 6.5, 6.5],
[6.5, 6.5, 6.5],
[4.6, 4.6, 4.6],
[4.6, 4.6, 4.6]])
np.hstack([2*p,5.5*p,9*p])
Output:
array([[2. , 2. , 2. , 5.5, 5.5, 5.5, 9. , 9. , 9. ],
[2. , 2. , 2. , 5.5, 5.5, 5.5, 9. , 9. , 9. ]])
a=np.array([4,6,8])
b=np.array([8,9,7])
c=a*b
print(c)
a=np.array([4,6,8])
b=np.array([8,9,7])
c=a/b
print(c)
c = np.array([6, 7, 5, 8])
d = np.array([4, 3, 7, 8])
# the operation is assumed from the output: the dot product 6*4 + 7*3 + 5*7 + 8*8 = 144
ans = np.dot(c, d)
print(ans)
Output: 144
Let us now look at how to create multi-dimensional NumPy arrays.
z=np.array([[5,7,5,4,5],[6,2,3,4,6]])
print(z)
print(z.T)
Let us create an array using np.arange by giving a few parameters. The first and second parameters
determine the range, and the third is the step interval.
h=np.arange(4,90,5)
print(h)
Output: [ 4 9 14 19 24 29 34 39 44 49 54 59 64 69 74 79 84 89]
test = np.random.randint(0,10,(4,3))
print(test)
Here 0 and 10 indicate the range, and 4,3 is the shape of the matrix/2D array.
test2 = np.random.randint(90,120,(8,3))
test2
# the loop is assumed from the output, which prints one row at a time
for row in test2:
    print(row)
Output:
[106 103 104] [ 96 93 106] [110 108 115] [117 106 114] [ 91 102 103] [ 98 104 92] [112 99 105]
[115 111 118]
test3 = test2**2
test3
Output:
array([[11236, 10609, 10816],
[ 9216, 8649, 11236],
[12100, 11664, 13225],
[13689, 11236, 12996],
[ 8281, 10404, 10609],
[ 9604, 10816, 8464],
[12544, 9801, 11025],
[13225, 12321, 13924]], dtype=int32)
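The code for the next output is missing in the source; it is presumably produced by zipping the two arrays and adding them row by row:
for i, j in zip(test2, test3):
    print(i, '+', j, '=', i + j)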
Output:
[106 103 104] + [11236 10609 10816] = [11342 10712 10920] [ 96 93 106] + [ 9216 8649 11236]
= [ 9312 8742 11342] [110 108 115] + [12100 11664 13225] = [12210 11772 13340] [117 106 114]
+ [13689 11236 12996] = [13806 11342 13110] [ 91 102 103] + [ 8281 10404 10609] = [ 8372
10506 10712] [ 98 104 92] + [ 9604 10816 8464] = [ 9702 10920 8556] [112 99 105] + [12544 9801
11025] = [12656 9900 11130] [115 111 118] + [13225 12321 13924] = [13340 12432 14042]
Conclusion
Thus, we can see that NumPy can be used for various types of mathematical calculations, and the
important thing is that the computation time is much less than with Python lists.
This helps while working on real-life cases with millions of data points.
A NumPy array is also very convenient to use, with a lot of data manipulation tricks and methods.
Installation
Install via pip using the following command,
pip install pandas
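The column-drop example discussed next is not shown in the source; a sketch, assuming the Titanic dataset used throughout this section:
import pandas as pd

data = pd.read_csv('titanic.csv')
df_dropped = data.drop('Survived', axis=1)
df_dropped.head()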
The ‘Survived’ column is dropped in the data. The axis=1 denotes that ‘Survived’ is a column, so
it searches for ‘Survived’ column-wise to drop it.
Drop multiple columns using the following code:
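A sketch of the multiple-column drop (the second column name is an assumption):
df_dropped_multiple = data.drop(['Survived', 'Name'], axis=1)
df_dropped_multiple.head()
And a sketch of the single-row drop described next:
df_row_dropped = data.drop(2, axis=0)
df_row_dropped.head()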
The row with index 2 is dropped in the data. The axis=0 denotes that index 2 is a row, so it searches
for index 2 row-wise to drop it.
Drop multiple rows using the following code:
df_row_dropped_multiple = data.drop([2, 3], axis=0)
df_row_dropped_multiple.head()
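The single-column rename described next is missing in the source; a sketch:
df_renamed = data.rename(columns={'PassengerId': 'Id'})
df_renamed.head()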
The column ‘PassengerId’ is renamed to ‘Id’ in the data. Do not forget to mention the dictionary
inside the columns parameter.
Rename multiple columns using the following code:
df_renamed_multiple = data.rename(
columns={
'PassengerId': 'Id',
'Sex': 'Gender',
}
)
df_renamed_multiple.head()
The columns ‘PassengerId’ and ‘Sex’ are renamed to ‘Id’ and ‘Gender’ respectively.
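The integer-selection code referred to next is missing in the source; by analogy with the float example that follows, it is presumably:
int_data = data.select_dtypes('int')
int_data.head()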
The above code selects all columns with integer data types.
float_data = data.select_dtypes('float')
float_data.head()
The above code selects all columns with float data types.
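The selection described next is missing in the source; presumably:
data.iloc[:5, 0]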
The above code returns the first five rows of the first column.
The ‘:5’ in the iloc denotes the first five rows and the number 0 after the comma denotes the first
column, iloc is used to locate the data using numbers or integers.
data.loc[:5, 'PassengerId']
The above code does the same but we can use the column names directly using loc in pandas.
Here the index 5 is inclusive.
df_dup = data.copy()
# duplicate the first row and append it to the data
row = df_dup.iloc[:1]
df_dup = df_dup.append(row, ignore_index=True)
df_dup
df_dup[df_dup.duplicated()]
df_dup.drop_duplicates()
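The filtering code described next is missing in the source; presumably:
data[data['Pclass'] == 1]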
The above code returns the values which are equal to one in the column ‘Pclass’ in the data.
Select multiple values in the column using the following code:
data[data['Pclass'].isin([1, 0])]
The above code returns the values which are equal to one and zero in the column ‘Pclass’ in the
data.
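The grouping code described next is missing in the source; presumably:
data.groupby('Sex').agg({'PassengerId': 'count'})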
The above code groups the values of the column ‘Sex’ and aggregates the column ‘PassengerId’
by the count of that column.
data.groupby('Sex').agg({'Age':'mean'})
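The mapping code described next is missing in the source; presumably:
data['Survived'].map({0: 'Not-Survived', 1: 'Survived'})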
The above code maps the values 0 to ‘Not-Survived’ and 1 to ‘Survived’. You can alternatively use
the following code to obtain the same results.
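A sketch of the alternative, using replace instead of map:
data['Survived'].replace({0: 'Not-Survived', 1: 'Survived'})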
data.to_csv('/path/to/save/the/data.csv', index=False)
The index=False argument does not save the index as a separate column in the CSV.
Tips Database
The tips database is the record of tips given by customers in a restaurant over two and a half
months in the early 1990s. It contains 7 columns: total bill, tip, sex, smoker, day, time, and size.
Example:
import pandas as pd
# reading the database
data = pd.read_csv("tips.csv")
# printing the top 10 rows
display(data.head(10))
Output:
After installing Matplotlib, let’s see the most commonly used plots using this library.
This graph can be more meaningful if we can add colors and also change the size of the points. We
can do this by using the c and s parameter respectively of the scatter function. We can also show
the color bar using the colorbar() method.
Example:
import pandas as pd
import matplotlib.pyplot as plt
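# the plotting calls are missing in the source; a sketch consistent with the
# description above (the column choices are assumptions)
data = pd.read_csv("tips.csv")
plt.scatter(data['day'], data['tip'], c=data['size'], s=data['total_bill'])
plt.title("Scatter Plot")
plt.colorbar()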
plt.show()
Output:
Example:
import pandas as pd
import matplotlib.pyplot as plt
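# the bar() call is missing in the source; a sketch (column choices are assumptions)
data = pd.read_csv("tips.csv")
plt.bar(data['day'], data['tip'])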
plt.title("Bar Chart")
Output:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database (assumed, as in the earlier examples)
data = pd.read_csv("tips.csv")
# histogram of total_bills
plt.hist(data['total_bill'])
plt.title("Histogram")
plt.show()
Output:
Seaborn is built on the top of Matplotlib; therefore, it can be used with the Matplotlib as well.
Using both Matplotlib and Seaborn together is a very simple process.
We just have to invoke the Seaborn Plotting function as normal, and then we can use Matplotlib’s
customization function.
Note: Seaborn comes loaded with datasets such as tips, iris, etc., but for the sake of this tutorial we
will use Pandas for loading these datasets.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
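# reading the database with Pandas (assumed, as noted above)
data = pd.read_csv("tips.csv")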
# draw lineplot
sns.lineplot(x="sex", y="total_bill", data=data)
plt.show()
Output:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
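# the plotting calls are missing in the source; a scatter plot sketch
# (column choices are assumptions)
data = pd.read_csv("tips.csv")
sns.scatterplot(x='day', y='tip', data=data)
plt.show()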
Output:
You will find that with Matplotlib it is a lot more difficult to color each point of this plot
according to sex. But in a Seaborn scatter plot, it can be done with the help of the hue argument.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
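# a sketch with the hue argument (the original calls are missing in the source)
data = pd.read_csv("tips.csv")
sns.scatterplot(x='day', y='tip', data=data, hue='sex')
plt.show()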
Output:
Example 1:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
Output:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
Output:
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.barplot(x='day',y='tip', data=data,
hue='sex')
plt.show()
Output:
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
plt.show()
Output:
After going through all these plots, you must have noticed that customizing plots using Seaborn
is a lot easier than using Matplotlib.
Since Seaborn is built on top of Matplotlib, we can also use Matplotlib functions while using Seaborn.
Example:
#importing the modules
from bokeh.plotting import figure, output_file, show
from bokeh.palettes import magma
import pandas as pd
color = magma(256)
Example:
# importing the modules
from bokeh.plotting import figure, output_file, show
import pandas as pd
Example:
Output:
Interactive Legends
The click_policy property makes the legend interactive. There are two types of interactivity: hiding the glyphs (click_policy = 'hide') and muting the glyphs (click_policy = 'mute').
Example:
# importing the modules
from bokeh.plotting import figure, output_file, show
import pandas as pd
# instantiating the figure object
graph = figure(title = "Bokeh Bar Chart")
radio_group.js_on_click(CustomJS(code="""
console.log('radio_group: active=' + this.active, this.toString())
"""))
show(button)
show(checkbox_group)
show(radio_group)
Output:
Output:
Plotly
This is the last library on our list, and you might be wondering why Plotly. Here's why –
Plotly has hover tool capabilities that allow us to detect any outliers or anomalies in numerous
data points.
It allows more customization.
It makes the graph visually more attractive.
To install it type the below command in the terminal.
pip install plotly
Example:
import plotly.express as px
import pandas as pd
Output:
Example:
import plotly.express as px
import pandas as pd
Output:
Output:
Example:
import plotly.express as px
import pandas as pd
Output:
Adding Interaction
Just like Bokeh, plotly also provides various interactions. Let’s discuss a few of them.
Output:
Output:
Example:
import plotly.graph_objects as px
import pandas as pd
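# reading the database (assumed, as in the earlier examples)
data = pd.read_csv("tips.csv")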
plot = px.Figure(data=[px.Scatter(
    y=data['tip'],
    mode='lines')
])
plot.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1,
                     step="day",
                     stepmode="backward"),
            ])
        ),
        rangeslider=dict(
            visible=True
        ),
    )
)
plot.show()
Output:
Let’s load the data from the CSV file using pandas:
# Load the data set
dataset = pd.read_csv("world_data_really_tiny.csv")
Use the boxPlotAll() function from functions.py to plot a box plot for each numeric feature:
# View univariate box plots
boxPlotAll(dataset)
And the classComparePlot() function from functions.py to plot a comparative histogram for the two
classes:
y.head()
The screencast below explains the next stages: building and interpreting the model. Watch first, then
read the notes below.
Let's look at the four samples produced and confirm the randomness of the selection:
X_train
X_test
y_train
y_test
Next, assess how well the model predicts happiness using the training data, by “pouring” training
set X into the decision tree:
# Check model performance on training data
predictions = model.predict(X_train)
print(accuracy_score(y_train, predictions))
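A sketch of the corresponding check on the test data, mirroring the step above, which the next paragraph refers to:
# Check model performance on test data
predictions = model.predict(X_test)
print(accuracy_score(y_test, predictions))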
As you can see, the model hasn't performed as well on the test data. It's often the case that models
perform worse on test data, and it shouldn't be surprising.
What Rules Did Sklearn Come Up With?
At this point, the model produced by sklearn is a bit of a black box.
There is no set of rules to examine, but it is possible to inspect the model and visualize the rules.
For this code to work, you need to first install Graphviz by running the following from a terminal
session:
conda install python-graphviz
You can then run the following code, which uses a function from functions.py:
viewDecisionTree(model, X.columns)
And see the decision tree rules:
Data Visualization: Tableau is a data visualization tool, and provides complex computation, data
blending, and dashboarding for creating beautiful data visualizations.
Quickly Create Interactive Visualization: Users can create a very interactive visual by using drag
and drop functionalities of Tableau.
Comfortable in Implementation: Many types of visualization options are available in Tableau,
which enhances the user experience. Tableau is very easy to learn in comparison to Python; even
those who don't have any idea about coding can quickly learn Tableau.
Tableau can Handle Large Amounts of Data: Tableau can easily handle millions of rows of data.
Different types of visualizations can be created from a large amount of data without disturbing the
performance of the dashboards. Also, there is an option in Tableau where the user can make a 'live'
connection to different data sources like SQL, etc.
Use of Other Scripting Languages in Tableau: To avoid performance issues and to do complex
table calculations in Tableau, users can include Python or R. Using a Python script, the user can reduce
the load on the software by performing data cleansing tasks with packages. However, Python is not
a native scripting language accepted by Tableau, so you can import only some of the packages or
visuals.
Mobile Support and Responsive Dashboard: Tableau Dashboard has an excellent reporting
feature that allows you to customize dashboards specifically for devices like mobiles or laptops.
Tableau automatically detects which device the user is viewing the report on and makes
adjustments to ensure that an accurate report is delivered to the right device.
Scheduling of Reports: Tableau does not provide automatic scheduling of reports. That's why
some manual effort is always required when the user needs to update the data in the back
end.
No Custom Visual Imports: In other tools like Power BI, a developer can create custom visuals that
can be easily imported; in Tableau, any new visual must be recreated rather than imported, as Tableau
is not a completely open tool.
Custom Formatting in Tableau: Tableau's conditional formatting is limited, and its 16-column table limit
is very inconvenient for users. Also, there is no way to apply the same formatting to multiple fields
directly; users have to do it manually for each field, which is very time-consuming.
Static and Single Value Parameters: Tableau parameters are static, and a parameter always selects a single
value. Whenever the data changes, these parameters have to be updated
manually every time. There is no option for users to automate the updating of
parameters.
Screen Resolution on Tableau Dashboards: The layout of a dashboard is disturbed if the
Tableau developer's screen resolution is different from the user's screen resolution.
Example: If the dashboard is created at a screen resolution of 1920 X 1080 and viewed at
2560 X 1440, then the layout of the dashboard will be distorted a little bit, because dashboards are not
responsive. So, you will need to create separate dashboards for desktop and mobile.
Tableau Desktop
Tableau Desktop has a rich feature set and allows us to code and customize reports.
Right from creating reports and charts to blending them all to form a dashboard, all the necessary
work is done in Tableau Desktop.
For live data analysis, Tableau Desktop establishes connectivity to the Data Warehouse and
various other types of files.
The dashboards and the workbooks created here can be either shared locally or publicly.
Based on connectivity to data sources and publishing options, Tableau Desktop is classified
into two parts-
Tableau Desktop Personal: The personal version of the Tableau desktop keeps the workbook
private, and the access is limited. The workbooks can't be published online. So, it should be
distributed either offline or in Tableau public.
Tableau Desktop Professional: It is similar to Tableau Desktop Personal. The main difference is that
workbooks created in Tableau Desktop Professional can be published online or to Tableau Server. In the
professional version, there is also full access to all sorts of data types. It is best for those who want to
publish their workbooks to Tableau Server.
Tableau Online
Its functionality is similar to Tableau Server, but the data is stored on servers hosted in the
cloud, which are maintained by the Tableau group.
There is no storage limit on the data which is published in the Tableau Online.
Tableau Online creates a direct link to over 40 data sources that are hosted in the cloud, such as
Hive, MySQL, Spark SQL, Amazon Aurora, and many more.
To publish, both Tableau Server and Tableau Online require workbooks that are created
in Tableau Desktop.
Both Tableau Server and Tableau Online also support data that flows from web applications
such as Google Analytics and Salesforce.com.
Tableau Server
The software is specifically used to share workbooks and visualizations that are created in the
Tableau Desktop application across the organization.
To share dashboards in the Tableau Server, you should first publish your workbook in the Tableau
Desktop. Once the workbook has been uploaded to the server, it will be accessible only to the
authorized users.
It's not necessary that the authorized users have the Tableau Server installed on their machine.
They only require the login credentials by which they can check reports by the web browser.
The security is very high in Tableau server, and it is beneficial for quick and effective sharing of
data.
The admin of the organization has full control over the server.
The organization maintains the hardware and the software.
Tableau Reader
Tableau Reader is a free tool that allows us to view visualizations and workbooks created
using Tableau Desktop or Tableau Public.
The data can be filtered, but modifications and editing are restricted.
There is no security in Tableau Reader, as anyone can view a workbook using it.
If you want to share the dashboards which are created by you, the receiver should have Tableau
Reader to view the document.
1. Data Server
The primary component of the Tableau architecture is the data sources that it can connect to.
Tableau can connect with multiple data sources. It can blend the data from various data sources.
It can connect to an excel file, database, and a web application at the same time.
It can also make the relationship between different types of data sources.
4. Gateway
The gateway directs requests from users to the Tableau components.
When the client sends a request, it is forwarded to the external load balancer for processing.
The gateway works as a distributor of processes to different components. In case of absence of
external load balancer, the gateway also works as a load balancer.
For single server configuration, one gateway or primary server manages all the processes.
For multiple server configurations, one physical system works as a primary server, and others
are used as worker servers.
Only one machine is used as a primary server in Tableau Server environment.
5. Clients
The visualizations and dashboards in Tableau server can be edited and viewed using different clients.
Clients are a web browser, mobile applications, and Tableau Desktop.
Web Browser: Web browsers like Google Chrome, Safari, and Firefox support the Tableau
server. The visualization and contents in the dashboard can be edited by using these web
browsers.
Mobile Application: The dashboard from the server can be interactively visualized using mobile
application and browser. It is used to edit and view the contents in the workbook.
Tableau Desktop: Tableau desktop is a business analytics tool. It is used to view, create, and
publish the dashboard in Tableau server. Users can access the various data source and build
visualization in Tableau desktop.
Tableau Desktop
Tableau Desktop is a paid source, personal edition- $35 per month and professional edition- $70
per month.
Tableau desktop data source can connect to any data source file, including databases, web
applications, and more.
Tableau desktop can also install on Window and Mac operating system.
Data and Visualization are secured in Tableau desktop.
In Tableau desktop, data can extract from various data sources and stored as Tableau extract file.
Tableau desktop uses the details at Professional and Enterprise level.
Step 1: Go to https://www.tableau.com/products/desktop
on your Web browser.
Step 3: Now, enter your Email id and click on the 'Download Free Trial' button.
Step 4: This will start downloading the .exe File for window machine by default.
Step 5: Open the download file, and click on the 'Run' button.
Step 7: A pop message will be shown on the screen to get the approval of the administrator to install
the Tableau software. Click on 'yes' to approve it than installation will be started.
Now, you are all set to use your Tableau desktop on your window machine.
Go to the home page and select the global superstore sales-Excel sheet.
You can open a worksheet page in many ways; for example, if Tableau's icon is displayed on your
desktop and a data source is shown there, dragging the data source icon and dropping it on the
Tableau icon opens Tableau's worksheet page for the selected data source.
Also, you can open as many connections as you need in Tableau by going to the data connection
page or start page and select a new connection.
File Menu
As in any Windows program, the File menu contains the New, Open, Close, Save, Save As, and Print
functions.
The most frequently used feature found in this menu is the Print to pdf option.
This allows us to export our dashboard or worksheet in pdf form.
If you don't remember where Tableau places files, or you want to change the default file-save
location, use the Repository Location option to review the location and change it.
We can quickly create a packaged workbook from the Export Packaged Workbook option.
Data Menu
You can use a data menu if you find some interesting tabular data on a website that you want to
analyze with Tableau.
Highlight and copy the data from the site, then use the Paste Data option to input it into Tableau.
Once pasted, Tableau copies the data from the Windows clipboard and adds a data source
to the data window.
The Edit Relationships menu option is used in data blending.
This menu option is needed if the field names are not identical in two different data sources.
It allows you to define the related fields correctly.
Worksheet Menu
The Export option allows you to export the worksheet as an Excel crosstab, an image, or in Access
database file format.
The Duplicate as Crosstab option creates a crosstab version of the worksheet and places it in a
new worksheet.
Dashboard Menu
The Action Menu is a useful feature that is reachable from both the Worksheet Menu and the
Dashboard Menu.
Analysis Menu
In this menu, you can access the stack marks and aggregate measures options.
These switches allow you to adjust default Tableau behaviors, which is useful if you need to
build non-standard chart types.
The Create Calculated Field and Edit Calculated Field options are used to make new measures and
dimensions that don't exist in your data source.
Map Menu
The Map menu is used to alter the base map color schemes.
Other options in this menu relate to replacing Tableau's standard maps with other map
sources.
You can also import geocoding for custom locations using the geocoding menu.
Toolbar Icon
Toolbar icon below the menu bar can be used to edit the workbook using different features like
redo, undo, new data source, save, slideshow, and so on.
Dimension Shelf
The dimensions present in the data source, for example customer (customer name, segment),
order (order date, order id, ship date, and ship mode), and location (country, state, and city),
can all be viewed in the dimension shelf.
Measure Shelf
The measures present in the data source, for example Discount, Profit, Profit Ratio, Quantity, and
Sales, can all be viewed in the measure shelf.
Page Shelf
Page shelf is used to view the visualization in video format by keeping the related filter on the page
shelf.
Filter Shelf
Filter Shelf is used to filter the graphical view by the help of the measures and dimensions.
Marks Card
The marks card is used to design the visualization. The data components of the visualization, like size,
color, path, shape, label, and tooltip, are used in the visualizations and can be modified in the marks
card.
Worksheet
The worksheet is the space where the actual visualization, design, and functionalities are viewed in
the workbook.
Tableau Repository
Tableau repository is used to store all the files related to the Tableau desktop.
It includes various folders like Connectors, Bookmarks, Data sources, Logs, Extensions, Map
sources, Shapes, Services, Tab Online Sync Client, and Workbooks.
My Tableau repository is located in the file path C:\Users\User\Documents\My Tableau
Repository.
Data Source
We can modify an existing data source, and create or add a new data source, using the 'Data Source'
tab, which is present at the bottom of the Tableau desktop window.
Current Sheet: The current sheet is the sheet of the workbook in which we are currently working. All the
dashboards, worksheets, and storyboards present in the workbook are available in this tab.
New Sheet: The new sheet icon present in the tab is used to create a new worksheet in the
Tableau workbook.
New Dashboard
The new dashboard icon present in the tab is used to create a new dashboard in the Tableau
workbook.
New Storyboard
The new storyboard icon present in the tab is used to create a new storyboard in the Tableau
workbook.
First Sheet
The first sheet icon, present in the tab at the bottom right-hand side of the Tableau desktop window,
is used to visit the first sheet directly.
Previous Sheet
The previous sheet icon is used to return to the previous worksheet from the current sheet.
Next Sheet
The next sheet icon is used to jump to the next worksheet of Tableau desktop.
Show Filmstrip
All the tabs are shown here with their icons by clicking on the show filmstrip.
Show Tabs
This tab includes all tabs such as worksheets, data sources, dashboards, and storyboards.
Here is a workbook that shows the three different data connections given below:
The green line next to the global superstore data connection indicates that it is the active connection
in the worksheet. Thus, the bar chart in the worksheet was created using the dimensions and
measures from that data source.
The Olympic Athletes data connection is a direct connection that is also indicated by the grey
highlights. Those data source fields are currently displayed on the measures and dimensions
shelves. The clipboard data source at the top of the data window was dragged and dropped into
Tableau.
When you create data connections, Tableau will automatically evaluate the fields and place them
on the measures and dimensions shelves.
Usually, Tableau places most of the fields correctly. If something is incorrectly placed, drag the field
to the correct location. Errors sometimes occur when numbers are used to represent dimensions.
In the above figure, focus on the icons next to the fields in the measures and dimension shelves.
These icons denote specific data types.
A calendar with a clock is a date or time field. Numeric values have pound signs, and "abc" icons
indicate text fields. Boolean fields have "True or False" values.
In Tableau, you can aggregate dimensions and measures. Whenever you add measures
to your view, an aggregation is applied to those measures by default.
The type of Aggregation used depends on the context of the view.
If you are not familiar with databases, then refer to the Tableau manual for detailed definitions of
these aggregate types. When you add fields to the visualization, the default aggregation is
displayed.
Tableau allows you to change or alter the aggregation level for a specific view.
To change the default aggregation, do right click on that field inside the data shelf and change its
default by selecting the menu options (default properties or Aggregation).
You can also change the Aggregation of a field for specific use in a worksheet.
The data source used in the above figure is a data extract of an Excel spreadsheet.
It is important to understand that if you depend on a direct connection to Excel, the median and
count (distinct) aggregations would not be available.
Access, Excel, and text files do not support these aggregate types; Tableau's extract engine does
this task.
Aggregating Measures
When you add a measure to the view, Tableau automatically aggregates its value. Average, sum
and median are the common aggregation functions.
The current Aggregation looks like part of the measure's name in the view.
For example: Sales becomes SUM (Sales), and every measure has a default aggregation, which is
set by Tableau when you connect to a data source. You can change or view the default aggregation
for measures.
You can aggregate a measure using Tableau only for relational data sources.
Multidimensional data sources contain data sources which are already aggregated.
In Tableau, the multidimensional data source is supported only in windows.
Set the default Aggregation for Measures
You can set the default aggregation for any measure that is not itself a calculated field containing
an aggregation, such as AVG([Discount]).
A default aggregation is the preferred calculation for summarizing a discrete or continuous field.
The default aggregation is used when you drag a measure to a view automatically.
If it is already selected, click Aggregate Measures once to deselect it. Then, you can see the
changes.
Disaggregating data can be useful for analyzing measures which you want to use both dependently
and independently in the view.
Note: If your data source is very large, then disaggregating the data can result in a significant
performance degradation.
Note: The Count Distinct aggregation does not support the Text File and Microsoft Excel data
sources using the inheritance connection. If you are connected to one of these types of data sources,
then the Count Distinct aggregation is unavailable, and it shows the remark "Requires extract." If you
save the data sources as an extract, you will be able to use the Count Distinct aggregation.
Another way is to view a dimension as an attribute. You can change this by choosing Attribute from the
context menu for the dimension. Tableau then computes the following formula:
IF MIN([dimension]) = MAX([dimension]) THEN MIN([dimension]) ELSE "*" END
This given formula is calculated in Tableau after the data is retrieved from the initial query.
The asterisk (*) is a visual indicator of a special type of Null value; it occurs when there are multiple
values.
Below is an example of using Attribute in a table calculation. This table shows the market, market size,
state, and sales by market, i.e., SUM(Sales). Suppose you want to compute the percent of total
sales that each state contributes to the market. When you add a Percent of Total
table calculation that computes along State, the calculation computes within the black area shown
in the above figure, because the Market Size dimension is partitioning the data.
File Type (Extension) - Purpose
Tableau Workbook (.twb) - A Tableau workbook can hold one or more worksheets, and also holds zero or more stories and dashboards.
Tableau Bookmarks (.tbm) - Tableau bookmarks can hold a single worksheet that can be easily shared and pasted into other workbooks.
Tableau Data Extract (.hyper or .tde) - A Tableau data extract is a local copy of the entire data set. It is used to share the data with others, to work offline, and to improve performance.
Tableau Packaged Data Source (.tdsx) - A Tableau packaged data source is very similar to the Tableau data source, but it has the addition of data along with the connection details.
Tableau Preferences (.tps) - This file stores the color preferences, which are used among all the datasheets. It is also used to generate a customized look for the users.
These files are saved in the associated folders in the My Tableau Repository directory, which is
created in your My Documents folder by default when you install Tableau.
Also, your work files can be saved in other locations, such as a network directory or your desktop.
For example: if you want to have your data on a network server instead of your local machine, you
can point Tableau at that remote repository.
1. Select File then go to Repository Location.
2. Select a new folder that will be the new repository location in the select a repository dialog box.
3. Restart Tableau then it uses the new repository.
Changing the repository location does not move the original repository. Instead, Tableau
creates a new repository where you can store your files.
The picture given below shows all of the data sources available through Tableau's native data
connectors.
In-Memory
Tableau can also process data in-memory by caching it in memory, so it no longer needs to be
connected to the source while analyzing the data.
Of course, there will be a limit on the amount of data cached depending on the availability of the
memory.
Step 3: It connects the Microsoft Excel file to Tableau. The sheets present in the Microsoft Excel file
are shown on the left-hand side of the window.
Creating an Extract
Extraction of the data is done by following the menu:
Data → Extract Data
It opens a dialog with multiple options, such as applying limits on how many rows to extract and
whether to aggregate data for dimensions.
The below figure shows the Extract Data option to you.
Add any filter or select a field among the options, such as Sub-Category, and click the OK button.
Column Alias
Each column of the data source can be assigned an alias, which helps in better understanding the nature
of the column.
Click on the OK button, and after that, you can see the changes in the columns of the data source.
2. Inner Join: An inner join returns the matching rows from the tables that are being joined.
3. Natural Join:
A natural join does not use any comparison operator, and it does not concatenate rows the way a
cross join does.
We can perform a natural join only if at least one common attribute exists between the two relations.
Also, the attributes must have the same name and domain.
A natural join matches rows where the values of the common attributes in both relations are the same.
ii. Right Outer Join: The right outer join operation returns matching rows from the tables being joined,
and also non-matching rows from the right table in the result and places NULL values in the attributes
that come from the left table.
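As a sketch, these join types could be written in SQL as follows, using the Orders and Returns tables from the sample-superstore example discussed below (the Return_Reason column is hypothetical):
-- Inner join: only rows with a matching Order_ID in both tables
SELECT Orders.Order_ID, Returns.Return_Reason
FROM Orders
INNER JOIN Returns ON Orders.Order_ID = Returns.Order_ID;
-- Right outer join: all rows from Returns; NULLs fill the Orders columns where no match exists
SELECT Orders.Order_ID, Returns.Return_Reason
FROM Orders
RIGHT OUTER JOIN Returns ON Orders.Order_ID = Returns.Order_ID;
-- Natural join: matches implicitly on every column that shares a name (here, Order_ID)
SELECT * FROM Orders NATURAL JOIN Returns;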
Go to the Data menu and choose the Microsoft Excel option under Connect.
Then select sample-superstore as a data source and click the Open button.
Drag the Orders and Returns tables from the sheets of the data source to the data pane. After that,
Tableau will automatically create a join between the Orders and Returns tables, which can be changed
later as required.
The below screenshot shows building an inner join between the Orders and Returns tables using the
Order ID field.
Go to the data source under Connect → click on the MS Access database file option and browse for
the sample coffee chain file.
The below screenshot shows the different tables and joins available in the file:
When viewing a visualization, data can be sorted with a single click from a header, an axis,
or a field label.
There are many ways to sort a visualization with single-click sort buttons:
In all cases, one click sorts the data in ascending order, two clicks sort it in descending
order, and three clicks clear the sort.
If the underlying data changes, the sort will update correctly.
In the above example, the sort is applied on Color rows based on the values of Metric A.
If there are hierarchical dimensions, as shown in the above example, this type of sort is applied to the
innermost dimension.
Here, it means that Color rows will sort inside Hue. Dark magenta cannot be sorted to the top of
the viz because it must stay inside the Purple Hue.
In the above example, the sort is applied to the Material columns, such as Paint, Paper, and Fabric,
based on the values of Green, since the header is used for the sort.
2. Click on the A-Z icon to sort alphabetically, or open the menu to see a list of fields by which it is
possible to sort. After choosing a field, the icon switches to the bar icon; click it to sort.
In the above example, the sort is applied to the outermost dimension, Hue, based on Metric B.
(Metric B is aggregated for all the colors inside each Hue, and Hue is sorted so that Purple comes
first, then Green, then Blue.)
2. Choose the appropriate sort button such as ascending or descending order in the toolbar.
In the above example, the sort is applied to Hue unless the Material field is selected before sorting.
In the case of Metric B, the toolbar sort applies to the leftmost measure.
To sort by Metric A, it would be necessary to use another method of sorting or to reverse their
order on the Columns shelf. (To see the effect of sorting by Material, Hue is removed from the view;
this makes it easy to see how the sort is computed.)
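The calculation discussed next appeared as an image in the original; a plausible reconstruction from the components listed below is:
IF [Profit per Day] > 5000 THEN "Highly Profitable"
ELSEIF [Profit per Day] <= 0 THEN "Unprofitable"
ELSE "Profitable"
END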
The component of the above calculation can be further divided into the following:
1. Functions: IF, THEN, ELSEIF, ELSE, and END.
2. Field: Profit per Day.
3. Operators: > and <=.
4. Literal Expression
String Literals: "Highly Profitable", "Unprofitable", and "Profitable".
Numeric Literals: 5000, and 0.
Note: Not all calculations need to contain all four components.
2. Comments: Comments are notes about a calculation or its parts; they are not included in the
computation of the calculation.
To enter a comment in a calculation, use two forward slashes //.
For Example
SUM ([Sales]) / SUM ([Profit]) // Nick's calculation
// to be used for profit ratio
// Do not edit
Tableau Operators
An operator is a symbol for performing specific mathematical and logical operations through the
compiler.
Tableau has a number of operators which are used to create calculated fields and formulas.
Here are the types of operators with their order of precedence of operation:
Division (/): We can divide two numbers with the help of the division operator.
Example: 15 / 5 = 3
Modulo (%): The modulo operator gives you the remainder of a numeric division.
Example: 17 % 2 = 1
Equal to (= or ==): It compares two numbers, two strings, or two dates and returns the
Boolean value True if they are equal, else it returns False.
OR: If either or both of the Boolean expressions on the two sides of the OR operator evaluate to
TRUE, then the result is TRUE. Else the result is FALSE.
Example: [Ship Date] > #April 1, 2018# OR [Profit] > 20000
There is a list of Tableau functions that are categorized into five parts:
1. Number functions
2. String functions
3. Date functions
4. Logical functions
5. Aggregate functions
1. Number Functions
Number functions are used for numeric calculations. They take only numbers as inputs.
Let's see some essential examples of number functions:
Ceiling (Number): It rounds a number to the nearest integer of equal or greater values.
Example: CEILING (4.155) = 5
2. String Functions
String functions are used for the manipulation of the string.
Let's see some essential examples of string functions:
LEN (String): It returns the length of the string.
Example: LEN ("Tableau") = 7
LTrim (String): It returns a string that contains a copy of the specified string with no leading (LTrim)
or trailing (RTrim) spaces.
Example: LTrim (" Tableau") = "Tableau"
REPLACE (String, Substring, Replacement): It searches the string for the substring and replaces it.
If the substring is not found, that string is not changed.
Example: REPLACE ("Green yellow Green", "yellow", "Red") = "Green Red Green"
DATENAME (date_part, date, start_of_week): It returns date_part of date as a string. And the
start_of_week parameter is optional.
Example: DATENAME ('month', #2018-03-15#) = "March"
DAY (date): It returns the day of the given date in integer form.
Example: DAY (#2018-04-12#) = 12
4. Logical Functions
These functions evaluate some single values and produce a Boolean output.
See some essential examples of logical function:
IFNULL (expression1, expression2): If the result is not null, then IFNULL function returns the first
expression, and if it is null, then it returns the second expression.
Example: IFNULL ([Sales], 0) = [Sales]
ISDATE (string): If the string argument can be converted to a date, the ISDATE function returns
TRUE, and if it cannot, it returns FALSE.
Example: ISDATE ("12/06/99") = TRUE
ISDATE ("14/06/99") = FALSE
MIN (expression): The MIN function returns the minimum result for each record.
5. Aggregate Functions
Let's see some essential examples of aggregate functions:
AVG (expression): It returns the average of all the values in the expression. AVG is used only with
numeric fields. And the Null values are ignored.
COUNT (expression): It returns the number of items in a group and the Null values are not
counted.
MEDIAN (expression): It returns the median of an expression over all records. Median can only
be used with numeric fields, and Null values are ignored.
STDEV (expression): It returns the statistical standard deviation of all values in the given
expression based on a sample of the population.
Create a Formula
To visualize the difference between Profit and Discount for different shipping mode of the products,
create a formula that subtracts the Discount from the Profit, as shown in the below image, and the
name of this field is profit_n_discount.
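The formula itself is shown only as an image in the original; as a sketch, the profit_n_discount calculated field would be defined as:
[Profit] - [Discount]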
Click OK, and after dragging the Avg_Sales field to the Rows shelf, you get the following view.
You can change the dropdown value to see only the functions related to strings.
You can change the dropdown value to see only the functions related to dates, as shown in the below
image:
For example, for calculating an average, we need to apply a single method of calculations on an
entire column. These calculations cannot be performed on some selected rows.
Tableau has a feature known as "Quick Table Calculations", which is used to create such
calculations.
Step 2: Right-click on the Measure and choose the option Quick Table Calculation.
Step 3: Choose one option among the following options to be applied to the Measure.
Running Total
Difference
Percent Difference
Percent of Total
Rank
Percentile
Moving Average
Year to Date (YTD) Total
Compound Growth Rate
Year over Year Growth
Year to Date (YTD) Growth
1. Table (Across): It computes across the length of the table and restarts after every partition.
For example, in the below screenshot, the calculation is computed across columns such as "Year
(Order Date)" for every row such as "Month (Order Date)".
3. Table (Across then Down): It computes across the length of the table, and then down the length
of the table.
For example, in the below screenshot, the calculation is computed across columns such as "Year
(Order Date)", down a row such as "Month (Order Date)", and then across columns again for the entire
table.
7. Pane (Down then Across): It computes down an entire pane and then across the pane.
For example, in the below screenshot, the calculation is computed down rows such as "Month (Order
Date)" for the length of the pane, across a column such as "Year (Order Date)", and then down the
length of the pane again.
Drag the Region and State field to the Rows shelf and the calculated field (regional_sales) to the
Text shelf under the Marks card.
Also, drag the Region field to the Color shelf.
This creates the below view, which shows a fixed value for different states because we fixed the
dimension as Region for the calculation of the Sales value.
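The regional_sales field is not defined in the text; assuming Tableau's level-of-detail (LOD) syntax, a plausible definition that fixes the calculation to the Region dimension is:
{ FIXED [Region] : SUM([Sales]) }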
Select one option among these options and click the OK button to apply the filter, as shown in the below
screenshot.
The final view after applying the filter looks like the below screenshot:
After clicking on the OK button, a filter window opens, as shown in the below screenshot.
The below screenshot shows how the quick filters are accessed:
The table given below lists the various quick filters and their uses in Tableau.
After clearing the filter from the filter pane, the worksheet looks like the below screenshot:
Step 2: And, choose the horizontal bar chart from the "Show Me" tab.
Step 3: Again, drag the Sub-Category to the Filters shelf. You get the chart shown in the below
screenshot.
Step 4: Right-click on the Sub-Category field in the Filters shelf, click on the "Edit Filter" option, and
then go to the "Top" tab in the pop-up window.
Step 6: Drag the Category field to the Filters shelf. Right-click on the Category field to edit it and choose
Furniture from the list. It shows three subcategories of products as a result, as shown in the below
screenshot.
Step 8: The above steps produce the final result, which shows the subcategories of products from the
category Furniture.
Step 1: Drag the Segment field and the Sales field to the Column shelf.
Step 2: Next, drag the Sub-Category field to the Rows shelf. Choose the horizontal bar chart option,
and you get the view shown in the below screenshot.
After completing the above steps, you get a view which shows only those subcategories of products
that have the required amount of sales.
Also, this shows all the available Segments where the condition is True, as shown in the below
screenshot.
Step 2: It opens the "Edit Data Source Filters" window. Then, click on the "Add" option of the window,
as shown in the below screenshot.
Step 1: Drag the Sub-Category field to the Rows shelf and the Sales field to the Columns shelf.
Choose the horizontal bar from the "Show Me" tab. Tableau shows the following view:
Step 2: Right-click on the Sub-Category field and go to the "Top" tab. Choose the second radio
option, "By field". From the drop-down, select Top 10 by Sum of Sales.
After completing all the above steps, you will get the following view, which shows the top 10 sub-
categories of products by sales, as shown in the below screenshot.
For example, consider a data source such as sample-superstore, and you want to sort the
dimensions and the measures fields as follows.
Step 1: Add the sample-superstore data source with Tableau and drag the Order table to the pane
shown in the below screenshot.
Step 2: Go to the worksheet and drag the dimension Category to the row shelf and the measure
Sales to the column shelf.
It creates a horizontal bar chart. The Category field is present in the visual in its default order, sorted
based on the data source. We can change the sort order by following the procedure below.
After that, it opens the Sort window. All the options present inside the Sort window are shown below:
In the above example, it sorts the Category field based on the sum of sales in ascending order, and
the sorted data is shown in the below screenshot.
It creates a group whose field name is Category (Group) and adds it to the dimension list. This is
used for visualizing the group of members present in a field.
The below screenshot explains the functionality. The sum of sales is visualized for both Furniture
and Office Supplies.
The fields in the hierarchy are also removed from the hierarchy, and the hierarchy disappears from
the Data pane.
After clicking the "Show Members" option, it will show all the members present in the set shown in
below screenshot.
After clicking on the "Edit Set" option, Edit Set window will be opened with the set name. Now, you
can edit the set shown in below screenshot.
For example, consider a data source such as Sample-Superstore and its dimensions and measures.
Step 1: First, go to the worksheet
Step 4: By default, it creates the bar chart shown in the below screenshot.
Step 3: Also, drag the Profit field to the Color shelf under the Marks card; it produces a different
color for negative bars.
The below-stacked chart appears that shows the distribution of each segment in each bar.
For example, consider a data source such as Sample-Superstore and its dimensions and measures.
Step 1: Select one dimension and one measure to create a simple line chart.
1. Drag the dimension Order Date into Columns Shelf.
2. And Sales into the Rows shelf.
3. It creates the line chart by default, or you can choose the Line chart from the "Show Me" button.
You will see the following line chart, which shows the variation of Sales for different Order Dates, as
shown in the below screenshot.
Step 2: Drag measures Sales and Discount into the Rows shelf.
Drilling one more level into this hierarchy, we get the manufacturer as the label, as shown in the below
screenshot.
Selecting the New Dashboard option or clicking on the Dashboard icon will open a new window
named Dashboard 1. You can change the name of the dashboard as per your liking.
Or, you can select from a list of available fixed dashboard sizes as shown in the screenshot below.
4. Adding a Sheet
Now, we’ll add a sheet onto our empty dashboard. To add a sheet, drag and drop a sheet from
the Sheets column present in the Dashboard tab. It will display all the visualizations we have on that
sheet on our dashboard. If you wish to change or adjust the size and place of the visual/chart/graph,
click on the graph then click on the small downward arrow given at the right. A drop-down list
appears having the option Floating, select it. This will unfix your chart from one position so that you
can adjust it as per your liking.
Then on the selected visual, we make selections. For instance, we select the data point corresponding
to New Jersey in the heat map shown below. As soon as we select it, all the rest of the graphs and
charts change their information and make it relevant to New Jersey. Notice in the Region section, the
only region left is East which is where New Jersey is located.
From the objects pane, we can add a button and also select the action of that button, that is, what
that button should do when you click on it. Select the Edit Button option to explore the options you
can select from for a button object.
For instance, we add a web page of our DataFlair official site as shown in the screenshot below.
We can add filters on this dashboard by clicking on a visual. For instance, we want to add a filter
based on months on the scatter plot showing sales values for different clusters. To add a months
filter, we click on the small downward arrow and then select Filters option. Then we select Months
of Order Date option. You can select any field based on which you wish to create a new filter.
You can make more changes into the filter by right-clicking on it. Also, you can change the type of
filter from the drop-down menu such as Relative Date, Range of Date, Start Date, End Date, Browse
Periods, etc.
Similarly, you can add and edit more filters on the dashboard.
This opens our dashboard in the presentation mode. So far we were working in the Edit Mode. In the
presentation mode, it neatly shows all the visuals and objects that we have added on the dashboard.
We can see how the dashboard will look when we finally present it to others or share it with other
people for analysis.
For instance, we selected the brand Pixel from our list of items from the sub-category field. This
instantly changes the information on the visuals and makes it relevant to only Pixel.
What is SQL?
SQL is a short-form of the Structured Query Language, and it is pronounced as S-Q-L or sometimes
as See-Quell.
This database language is mainly designed for maintaining the data in relational database
management systems.
It is a special tool used by data professionals for handling structured data (data which is stored in
the form of tables). It is also designed for stream processing in relational data stream management
systems (RDSMS).
You can easily create and manipulate the database, access and modify the table rows and
columns, etc.
This query language became an ANSI standard in 1986 and an ISO standard in 1987.
If you want to get a job in the field of data science, then it is the most important query language to
learn.
Big enterprises like Facebook, Instagram, and LinkedIn, use SQL for storing the data in the back-
end.
1. No programming needed
SQL does not require a large number of coding lines for managing the database systems.
We can easily access and maintain the database by using simple SQL syntactical rules.
These simple rules make the SQL user-friendly.
3. Standardized Language
SQL follows the long-established standards of ISO and ANSI, which offer a uniform platform across
the globe to all its users.
4. Portability
The structured query language can be easily used in desktop computers, laptops, tablets, and even
smartphones.
It can also be used with other applications according to the user's requirements.
5. Interactive language
We can easily learn and understand the SQL language.
We can also use this language for communicating with the database because it is a simple query
language.
This language is also used for receiving the answers to complex queries in a few seconds.
1. Cost
The operation cost of some SQL versions is high. That's why some programmers cannot use the
Structured Query Language.
2. Interface is Complex
Another big disadvantage is that the interface of Structured Query Language is difficult, which makes
it difficult for SQL users to use and manage it.
SQL vs No-SQL
The following table describes the differences between SQL and No-SQL, which are necessary to
understand:
1. SQL is a relational database management system, while No-SQL is a non-relational or distributed database management system.
2. The query language used in SQL database systems is Structured Query Language, while No-SQL database systems use a non-declarative query language.
3. The schema of SQL databases is predefined, fixed, and static, while the schema of No-SQL databases is dynamic, suited to unstructured data.
4. SQL databases are vertically scalable, while No-SQL databases are horizontally scalable.
5. The database type of SQL is in the form of tables, i.e., rows and columns, while the database type of No-SQL is in the form of documents, key-value pairs, and graphs.
6. SQL follows the ACID model, while No-SQL follows the BASE model.
7. Complex queries are easily managed in SQL databases, while No-SQL databases cannot handle complex queries.
8. SQL databases are not the best choice for storing hierarchical data, while No-SQL databases are a perfect option for storing hierarchical data.
9. All SQL databases require object-relational mapping, while many No-SQL databases do not.
10. Gauges, CircleCI, and Hootsuite are among the top enterprises using SQL, while Airbnb, Uber, and Kickstarter are among the top enterprises using No-SQL.
11. SQLite, MS-SQL, Oracle, PostgreSQL, and MySQL are examples of SQL database systems, while Redis, MongoDB, HBase, BigTable, CouchDB, and Cassandra are examples of No-SQL database systems.
CREATE Command
This command helps in creating the new database, new table, table view, and other objects of the
database.
UPDATE Command
This command helps in updating or changing the stored data in the database.
DELETE Command
This command helps in removing or erasing the saved records from the database tables.
It erases single or multiple tuples from the tables of the database.
SELECT Command
This command helps in accessing the single or multiple rows from one or multiple tables of the
database.
We can also use this command with the WHERE clause.
DROP Command
This command helps in deleting the entire table, table view, and other objects from the database.
INSERT Command
This command helps in inserting the data or records into the database tables.
We can easily insert the records in single as well as multiple rows of the table.
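As a minimal sketch of these commands working together (the Employees table and its columns are hypothetical):
CREATE TABLE Employees (Emp_Id INT PRIMARY KEY, Emp_Name VARCHAR(50), Emp_Salary INT);  -- CREATE a new table
INSERT INTO Employees (Emp_Id, Emp_Name, Emp_Salary) VALUES (101, 'Tushar', 25000);     -- INSERT a record
SELECT Emp_Name FROM Employees WHERE Emp_Salary > 20000;                                -- SELECT rows with a WHERE clause
UPDATE Employees SET Emp_Salary = 30000 WHERE Emp_Id = 101;                             -- UPDATE stored data
DELETE FROM Employees WHERE Emp_Id = 101;                                               -- DELETE saved records
DROP TABLE Employees;                                                                   -- DROP the entire table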
Following are some of the most important points to remember about SQL syntax:
1. You can write the keywords of SQL in both uppercase and lowercase, but writing the SQL keywords
in uppercase improves the readability of the SQL query.
2. SQL statements or syntax are dependent on text lines. We can place a single SQL statement on
one or multiple text lines.
3. You can perform most of the action in a database with SQL statements.
4. SQL syntax depends on relational algebra and tuple relational calculus.
SQL Statements
SQL statements tell the database what operation you want to perform on the structured data and
what information you would like to access from the database.
The statements of SQL are very simple and easy to use and understand. They are like plain English
but with a particular syntax.
Simple Example of SQL statement:
SELECT "column_name" FROM "table_name";
Each SQL statement begins with any of the SQL keywords and ends with the semicolon (;).
The semicolon is used in the SQL for separating the multiple SQL statements which are going to
execute in the same call.
In this SQL tutorial, we will use the semicolon (;) at the end of each SQL query or statement.
Few commonly used SQL Statements are:
1. Select Statement
2. Update Statement
3. Delete Statement
4. Create Table Statement
5. Alter Table Statement
6. Drop Table Statement
7. Create Database Statement
8. Drop Database Statement
9. Insert Into Statement
10. Truncate Table Statement
11. Describe Statement
12. Distinct Clause
13. Commit Statement
14. Rollback Statement
15. Create Index Statement
16. Drop Index Statement
17. Use Statement
Let's discuss each statement in short one by one with syntax and one example:
2. UPDATE Statement
This SQL statement changes or modifies the stored data in the SQL database.
3. DELETE Statement
This SQL statement deletes the stored data from the SQL database.
This example creates the table Employee_details with five columns or fields in the SQL database.
The fields in the table are Emp_Id, First_Name, Last_Name, Salary, and City.
The Emp_Id column in the table acts as a primary key, which means that the Emp_Id column cannot
contain duplicate values and null values.
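The CREATE TABLE statement for this example appears as an image in the original; a plausible reconstruction (the column sizes are assumptions) is:
CREATE TABLE Employee_details (
Emp_Id INT NOT NULL,
First_Name VARCHAR(50),
Last_Name VARCHAR(50),
Salary INT,
City VARCHAR(50),
PRIMARY KEY (Emp_Id)
);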
5. ALTER TABLE Statement
This SQL statement adds, deletes, and modifies the columns of the table in the SQL database.
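The ALTER TABLE queries themselves are not shown; a minimal MySQL-flavored sketch (the Emp_City column is hypothetical):
ALTER TABLE Employee_details ADD Emp_City VARCHAR(50);       -- add a column
ALTER TABLE Employee_details MODIFY Emp_City VARCHAR(100);   -- modify a column's type
ALTER TABLE Employee_details DROP COLUMN Emp_City;           -- delete a column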
BLOB(size): It is used for BLOBs (Binary Large Objects). It can hold up to 65,535 bytes.
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_Monthlybonus.
Emp Id Emp Name Emp Salary Emp Monthlybonus
101 Tushar 25000 4000
102 Anuj 30000 200
Suppose, we want to add 20,000 to the salary of each employee specified in the table. Then, we
have to write the following query in the SQL:
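The query appears as an image in the original; a plausible reconstruction is:
UPDATE Employee_details
SET Emp_Salary = Emp_Salary + 20000;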
Suppose, we want to add the Salary and monthly bonus columns of the above table, then we have
to write the following query in SQL:
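A plausible reconstruction (the Total alias is an assumption):
SELECT Emp_Id, Emp_Name, Emp_Salary + Emp_Monthlybonus AS Total
FROM Employee_details;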
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_Penalty.
Emp Id Emp Name Emp Salary Penalty
201 Abhay 25000 200
202 Sumit 30000 500
Suppose we want to subtract 5,000 from the salary of each employee given in
the Employee_details table. Then, we have to write the following query in the SQL:
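A plausible reconstruction of the query:
UPDATE Employee_details
SET Emp_Salary = Emp_Salary - 5000;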
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_Penalty.
Emp Id Emp Name Emp Salary Penalty
201 Abhay 25000 200
202 Sumit 30000 500
Suppose, we want to double the salary of each employee given in the Employee_details table.
Then, we have to write the following query in the SQL:
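A plausible reconstruction of the query:
UPDATE Employee_details
SET Emp_Salary = Emp_Salary * 2;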
This example consists of an Employee_details table, which has three columns Emp_Id,
Emp_Name, and Emp_Salary.
Emp Id Emp Name Emp Salary
201 Abhay 25000
202 Sumit 30000
Suppose, we want to half the salary of each employee given in the Employee_details table. For
this operation, we have to write the following query in the SQL:
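A plausible reconstruction of the query:
UPDATE Employee_details
SET Emp_Salary = Emp_Salary / 2;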
This example consists of a Division table, which has three columns Number, First_operand, and
Second_operand.
Number First operand Second operand
1 56 4
2 32 8
3 89 9
4 18 10
5 10 5
If we want to get the remainder by dividing the numbers of First_operand column by the numbers
of Second_operand column, then we have to write the following query in SQL:
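A plausible reconstruction (the Remainder alias is an assumption):
SELECT Number, First_operand % Second_operand AS Remainder
FROM Division;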
This example consists of an Employee_details table, which has three columns Emp_Id,
Emp_Name, and Emp_Salary.
This example consists of an Employee_details table, which has three columns Emp_Id,
Emp_Name, and Emp_Salary.
Emp Id Emp Name Emp Salary
201 Abhay 45000
202 Ankit 45000
203 Bheem 30000
204 Ram 29000
205 Sumit 29000
Suppose, we want to access all the records of those employees from the Employee_details table
whose salary is not 45000. Then, we have to write the following query in the SQL database:
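A plausible reconstruction of the query:
SELECT * FROM Employee_details WHERE Emp_Salary <> 45000;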
This example consists of an Employee_details table, which has three columns Emp_Id,
Emp_Name, and Emp_Salary.
Emp Id Emp Name Emp Salary
201 Abhay 45000
202 Ankit 45000
203 Bheem 30000
204 Ram 29000
205 Sumit 29000
Suppose, we want to access all the records of those employees from the Employee_details table
whose employee id is greater than 202. Then, we have to write the following query in the SQL
database:
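A plausible reconstruction of the query:
SELECT * FROM Employee_details WHERE Emp_Id > 202;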
Let's understand the below example, which explains how to execute the greater than or equal to
operator (>=) in an SQL query:
This example consists of an Employee_details table, which has three columns Emp_Id,
Emp_Name, and Emp_Salary.
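The table data and query for this example are missing from the original; a plausible sketch (the threshold 202 is hypothetical) is:
SELECT * FROM Employee_details WHERE Emp_Id >= 202;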
This example consists of an Employee_details table, which has three columns Emp_Id,
Emp_Name, and Emp_Salary.
Emp Id Emp Name Emp Salary
201 Abhay 45000
202 Ankit 45000
203 Bheem 30000
204 Ram 29000
205 Sumit 29000
Suppose, we want to access all the records of those employees from the Employee_details table
whose employee id is less than 204. For this, we have to write the following query in the SQL
database:
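A plausible reconstruction of the query:
SELECT * FROM Employee_details WHERE Emp_Id < 204;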
This example consists of an Employee_details table, which has three columns Emp_Id,
Emp_Name, and Emp_Salary.
The ALL operator can be used with the following clauses:
1. SELECT
2. HAVING
3. WHERE
Let's understand the below example which explains how to execute ALL logical operators in SQL
query:
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
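The ALL query itself is not shown; a plausible sketch (the condition is hypothetical), returning employees who earn more than every employee in Delhi:
SELECT Emp_Name FROM Employee_details
WHERE Emp_Salary > ALL (SELECT Emp_Salary FROM Employee_details WHERE Emp_City = 'Delhi');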
Let's understand the below example which explains how to execute AND logical operator in SQL
query:
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
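The query appears as an image in the original; a plausible reconstruction is:
SELECT * FROM Employee_details WHERE Emp_Salary = 25000 AND Emp_City = 'Delhi';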
Here, the SQL AND operator with the WHERE clause shows the records of employees whose salary
is 25000 and whose city is Delhi.
Syntax of OR operator:
SELECT column1, ...., columnN FROM table_Name WHERE condition1 OR condition2 OR condition3 OR ....... OR conditionN;
Let’s understand the below example which explains how to execute OR logical operator in SQL query:
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
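The query is not shown in the original; a plausible sketch (the chosen cities are hypothetical) is:
SELECT * FROM Employee_details WHERE Emp_City = 'Delhi' OR Emp_City = 'Kolkata';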
Let's understand the below example which explains how to execute BETWEEN logical operator in
SQL query:
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
Emp Id Emp Name Emp Salary Emp City
201 Abhay 25000 Delhi
202 Ankit 45000 Chandigarh
203 Bheem 30000 Delhi
204 Ram 25000 Delhi
205 Sumit 40000 Kolkata
Suppose, we want to access all the information of those employees from
the Employee_details table who is having salaries between 20000 and 40000. For this, we have
to write the following query in SQL:
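A plausible reconstruction of the query:
SELECT * FROM Employee_details WHERE Emp_Salary BETWEEN 20000 AND 40000;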
Syntax of IN operator:
SELECT column_Name1, column_Name2, ...., column_NameN FROM table_Name WHERE column_Name IN (list_of_values);
Let’s understand the below example which explains how to execute IN logical operator in SQL query:
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
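The query appears as an image in the original; a plausible sketch (the excluded Emp_Id values are hypothetical) is:
SELECT * FROM Employee_details WHERE Emp_Id NOT IN (202, 205);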
Here, we used the SQL NOT IN operator with the Emp_Id column.
Syntax of NOT operator:
SELECT column1, column2, ...., columnN FROM table_Name WHERE NOT condition;
Let's understand the below example which explains how to execute NOT logical operator in SQL
query:
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
Emp Id Emp Name Emp Salary Emp City
201 Abhay 25000 Delhi
202 Ankit 45000 Chandigarh
203 Bheem 30000 Delhi
204 Ram 25000 Delhi
205 Sumit 40000 Kolkata
Suppose we want to show all the information of those employees from
the Employee_details table whose city is neither Delhi nor Chandigarh. For this, we have to write
the following query in SQL:
SELECT * FROM Employee_details WHERE NOT Emp_City = 'Delhi' AND NOT Emp_City = 'Chandigarh';
In this example, we used the SQL NOT operator with the Emp_City column.
ANY Operator
The ANY operator in SQL shows the records when any of the values returned by the sub-query
meet the condition.
The ANY logical operator must match at least one record in the inner query and must be preceded
by any SQL comparison operator.
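A minimal sketch of an ANY query (the condition is hypothetical), returning employees who earn more than at least one employee in Delhi:
SELECT Emp_Name FROM Employee_details
WHERE Emp_Salary > ANY (SELECT Emp_Salary FROM Employee_details WHERE Emp_City = 'Delhi');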
Let's understand the below example which explains how to execute LIKE logical operator in SQL
query:
This example consists of an Employee_details table, which has four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
Emp Id Emp Name Emp Salary Emp City
201 Sanjay 25000 Delhi
202 Ajay 45000 Chandigarh
203 Saket 30000 Delhi
204 Abhay 25000 Delhi
205 Sumit 40000 Kolkata
If we want to show all the information of those employees from the Employee_details table whose
name starts with 's', we have to write the following query in SQL:
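A plausible reconstruction of the query:
SELECT * FROM Employee_details WHERE Emp_Name LIKE 's%';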
If we want to show all the information of those employees from the Employee_details table whose
name ends with 'y', we have to write the following query in SQL:
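A plausible reconstruction of the query:
SELECT * FROM Employee_details WHERE Emp_Name LIKE '%y';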
If we want to show all the information of those employees from the Employee_details table whose
name starts with 'S' and ends with 'y', we have to write the following query in SQL:
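A plausible reconstruction of the query:
SELECT * FROM Employee_details WHERE Emp_Name LIKE 'S%y';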
Union Operator
The SQL UNION operator combines the results of two or more SELECT statements and provides a
single output.
The data type and the number of columns must be the same for each SELECT statement used
with the UNION operator. This operator does not show the duplicate records in the output table.
Let's understand the below example which explains how to execute Union operator in Structured
Query Language:
In this example, we used two tables. Both tables have four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
Emp Id Emp Name Emp Salary Emp City
201 Sanjay 25000 Delhi
202 Ajay 45000 Delhi
203 Saket 30000 Aligarh
Table: Employee_details1
Emp Id Emp Name Emp Salary Emp City
203 Saket 30000 Aligarh
204 Saurabh 40000 Delhi
205 Ram 30000 Kerala
201 Sanjay 25000 Delhi
Table: Employee_details2
Suppose, we want to see the employee name and employee id of each employee from both tables
in a single output. For this, we have to write the following query in SQL:
SELECT Emp_ID, Emp_Name FROM Employee_details1
UNION
SELECT Emp_ID, Emp_Name FROM Employee_details2 ;
Let’s understand the below example which explains how to execute Union ALL operator in Structured
Query Language:
In this example, we used two tables. Both tables have four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
Emp Id Emp Name Emp Salary Emp City
201 Sanjay 25000 Delhi
Table: Employee_details1
Emp Id Emp Name Emp Salary Emp City
203 Saket 30000 Aligarh
204 Saurabh 40000 Delhi
205 Ram 30000 Kerala
201 Sanjay 25000 Delhi
Table: Employee_details2
If we want to see the employee name of each employee from both tables in a single output, we
have to write the following query in SQL:
SELECT Emp_Name FROM Employee_details1
UNION ALL
SELECT Emp_Name FROM Employee_details2 ;
Let's understand the below example, which explains how to execute the INTERSECT operator in
Structured Query Language:
In this example, we used two tables. Both tables have four columns Emp_Id, Emp_Name,
Emp_Salary, and Emp_City.
Emp Id Emp Name Emp Salary Emp City
201 Sanjay 25000 Delhi
Table: Employee_details1
Emp Id Emp Name Emp Salary Emp City
203 Saket 30000 Aligarh
204 Saurabh 40000 Delhi
205 Ram 30000 Kerala
201 Sanjay 25000 Delhi
Table: Employee_details2
Suppose we want to see the common records of employees from both tables in a single output.
For this, we have to write the following query in SQL:
SELECT Emp_Name FROM Employee_details1
INTERSECT
SELECT Emp_Name FROM Employee_details2 ;
Suppose we want to see the names of employees that appear in the first result set but not in the
second. For this, we have to write the following query in SQL:
SELECT Emp_Name FROM Employee_details1
MINUS
SELECT Emp_Name FROM Employee_details2 ;
Suppose, we want to perform the Bitwise OR operator between both the columns of the above
table. For this, we have to write the following query in SQL:
SELECT Column1 | Column2 From TABLE_OR ;
Example 1:
This example creates the Student database. To create the Student database, you have to type the
following command in Structured Query Language:
CREATE DATABASE Student ;
When this query is executed successfully, then it will show the following output:
Database created successfully
You can also verify whether your database has been created in SQL by using the following query:
SHOW DATABASES;
SQL does not allow developers to create the database with the existing database name.
Suppose if you want to create another Student database in the same database system, then the
Create Database statement will show the following error in the output:
Can't create database 'Student'; database exists
So, firstly you have to delete the existing database by using the Drop Statement.
You can also replace the existing database with the help of Replace keyword.
If you want to replace the existing Student database, then you have to type the following SQL query:
CREATE OR REPLACE DATABASE Student ;
When this query is executed successfully, then it will show the following output:
Database created successfully
You can also check that your database is created in SQL by typing the following query:
SHOW DATABASES;
We know that SQL does not allow developers to create the database with the existing database name.
Suppose, we want to create another Employee database in the same database system, firstly, we
have to delete the existing database using a drop statement, or we have to replace the existing
Employee database with the help of the 'replace' keyword.
To replace the existing Employee database with a new Employee database, we have to type the
following query in SQL:
CREATE OR REPLACE DATABASE Employee;
SELECT Database
Suppose database users and administrators want to perform some operations on tables, views,
and indexes on the specific existing database in SQL.
Firstly, they have to select the database on which they want to run the database queries.
Any database user and administrator can easily select the particular database from the current
database server using the USE statement in SQL.
Example 1
Suppose you want to work with the Hospital database. For this, firstly, you have to check whether
the Hospital database exists on the current database server by using the following query:
SHOW DATABASES;
If the Hospital database is shown in the output, then you have to execute the following query to select
the Hospital database:
USE Hospital;
Example 2
Suppose you want to work with another College database in SQL. For this, firstly, you have to check
whether the College database exists on the current database server by using the following query:
SHOW DATABASES;
If the College database is shown in the result, then you have to execute the following query to select
the College database:
USE College;
Example 3
Suppose you want to work with another School database in SQL. For this, firstly, you have to check
whether the School database exists on the current database server by using the following query:
SHOW DATABASES;
If the School database is shown in the result, then you have to execute the following query to select
the School database:
USE School;
Note: A table has a specified number of columns, but can have any number of rows.
A table is the simplest form of data storage. A table is also considered a convenient representation
of relations.
In the above table, "Employee" is the table name, "EMP_NAME", "ADDRESS" and "SALARY" are
the column names. The combination of data of multiple columns forms a row e.g. "Ankit", "Lucknow"
and 15000 are the data of one row.
When a transaction is rolled back, the data associated with a table variable is not rolled back.
A table variable generally uses fewer resources than a temporary table.
A table variable cannot be used as an input or an output parameter.
You can verify whether you have created the table successfully by looking at the message displayed
by the SQL Server; otherwise, you can use the DESC command as follows:
SQL> DESC STUDENTS;
MySQL
CREATE TABLE Employee(
EmployeeID int NOT NULL,
FirstName varchar(255) NOT NULL,
LastName varchar(255),
City varchar(255),
PRIMARY KEY (EmployeeID)
);
Use the following query to define a PRIMARY KEY constraint on multiple columns, and to allow
naming of the PRIMARY KEY constraint.
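The query itself is not shown in the original; a standard sketch (the constraint name and column choice are assumptions):
CREATE TABLE Employee(
EmployeeID int NOT NULL,
FirstName varchar(255) NOT NULL,
LastName varchar(255),
City varchar(255),
CONSTRAINT PK_Employee PRIMARY KEY (EmployeeID, LastName)
);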
On the other hand when we TRUNCATE a table, the table structure remains the same, so you will not
face any of the above problems.
1. Staff Table
2. Payment Table
SQL Injection
The SQL Injection is a code penetration technique that might cause loss to our database.
It is one of the most practiced web hacking techniques, placing malicious code in SQL statements
via webpage input. SQL injection can be used by malicious users to manipulate the application's
web server.
SQL injection generally occurs when we ask a user to input their username/userID. Instead of a
name or ID, the user gives us an SQL statement that we will unknowingly run on our database.
For example, we create a SELECT statement by adding a variable "demoUserID" to a select string.
The variable will be fetched from user input (getRequestString):
demoUserID = getRequestString("UserId");
demoSQL = "SELECT * FROM users WHERE UserId = " + demoUserID;
The SQL code above is valid and will return rows from the users table. The injected condition 1=1
always holds true, so all records are returned.
All the employee data is compromised; now, the malicious user can also similarly delete the
employee records.
Example:
SELECT * FROM Employee WHERE (Username = "" OR 1=1) AND (Password = "" OR 1=1);
Now the malicious user can use the '=' operator sensibly to retrieve private and secure user
information. So instead of the query mentioned above, the following query, when executed, retrieves
protected data not intended to be shown to users:
SELECT * FROM EMPLOYEE WHERE (Employee_name = "" OR 1=1) AND (Password = "" OR 1=1);
SELECT sales_agent,
COUNT(sales_pipeline.close_value) AS total,
COUNT(sales_pipeline.close_value)
FILTER(WHERE sales_pipeline.close_value > 1000) AS `over 1000`
FROM sales_pipeline
WHERE sales_pipeline.deal_stage = "Won"
GROUP BY sales_pipeline.sales_agent
The first several rows of the resulting table would look like this:
sales_agent total over 1000
Wilburn Farren 55 38
Elease Gluck 80 32
SELECT sales_agent,
AVG(close_value)
FROM sales_pipeline
WHERE sales_pipeline.deal_stage = "Won"
GROUP BY sales_agent
ORDER BY AVG(close_value) DESC
The resulting table lists each sales_agent alongside their average close value (avg), in descending order.
Order By Command
The ORDER BY statement in SQL is used to sort the fetched data in either ascending or
descending order, according to one or more columns.
By default ORDER BY sorts the data in ascending order.
We can use the keyword DESC to sort the data in descending order and the keyword ASC to sort
in ascending order.
ORDER BY Syntax
SELECT column1, column2, …
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
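As a sketch, using the Employee_details table from the earlier examples:
SELECT Emp_Name, Emp_Salary FROM Employee_details ORDER BY Emp_Salary DESC;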
SQL Aliases
SQL aliases are used to give a table, or a column in a table, a temporary name.
Aliases are often used to make column names more readable.
An alias only exists for the duration of that query.
An alias is created with the AS keyword.
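A minimal sketch of both a column alias and a table alias (the alias names are assumptions):
SELECT e.Emp_Name AS Name, e.Emp_Salary AS Salary
FROM Employee_details AS e;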
SQL Subqueries
A Subquery or Inner query or a Nested query is a query within another SQL query and embedded
within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to further restrict
the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow −
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless multiple columns are in
the main query for the subquery to compare its selected columns.
An ORDER BY command cannot be used in a subquery, although the main query can use an
ORDER BY. The GROUP BY command can be used to perform the same function as the
ORDER BY in a subquery.
Subqueries that return more than one row can only be used with multiple value operators such
as the IN operator.
The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY,
CLOB, or NCLOB.
A subquery cannot be immediately enclosed in a set function.
The BETWEEN operator cannot be used with a subquery. However, the BETWEEN operator
can be used within the subquery.
UPDATE table_name
SET column_name = new_value
WHERE column_name OPERATOR
   (SELECT column_name
    FROM table_name
    [WHERE condition]);
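A minimal sketch of a subquery used as a condition (the above-average-salary criterion is hypothetical):
SELECT Emp_Name
FROM Employee_details
WHERE Emp_Salary > (SELECT AVG(Emp_Salary) FROM Employee_details);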
For example, the user joe who connects from office.example.com need not be the same person as the
user joe who connects from home.example.com. MySQL handles this by enabling you to distinguish
users on different hosts that happen to have the same name: You can grant one set of privileges for
connections by joe from office.example.com, and a different set of privileges for connections
by joe from home.example.com. To see what privileges a given account has, use the SHOW
GRANTS statement.
For example:
SHOW GRANTS FOR 'joe'@'office.example.com';
SHOW GRANTS FOR 'joe'@'home.example.com';
Internally, the server stores privilege information in the grant tables of the MySQL system
database.
The MySQL server reads the contents of these tables into memory when it starts and bases
access-control decisions on the in-memory copies of the grant tables.
MySQL access control involves two stages when you run a client program that connects to the
server:
Stage 1: The server accepts or rejects the connection based on your identity and whether you
can verify your identity by supplying the correct password.
Stage 2: Assuming that you can connect, the server checks each statement you issue to
determine whether you have sufficient privileges to perform it. For example, if you try to select
rows from a table in a database or drop a table from the database, the server verifies that you
have the SELECT privilege for the table or the DROP privilege for the database.
There are some things that you cannot do with the MySQL privilege system:
You cannot explicitly specify that a given user should be denied access. That is, you cannot
explicitly match a user and then refuse the connection.
You cannot specify that a user has privileges to create or drop tables in a database but not
to create or drop the database itself.
A password applies globally to an account. You cannot associate a password with a specific
object such as a database, table, or routine.
Login Tab
The Login tab provides the following information related to the selected user account:
Login Name: You may create multiple accounts with the same name to connect from different
hosts.
Authentication Type: For standard password or host-based authentication, select Standard.
The caching_sha2_password and SHA256_Password authentication types provide more
secure password encryption than the Standard authentication type.
Starting with MySQL 8.0.4, the caching_sha2_password plugin is the default authentication
plugin for the server. An account that authenticates with caching_sha2_password must use
either a secure connection or an unencrypted connection that supports password exchange
using an RSA key pair.
Limit to Hosts Matching: The % and _ characters may be used as wildcards. The percent sign
(%) matches zero or more characters and the underscore (_) matches a single character.
Password and Confirm Password: To reset a password, type in the new password and then
confirm it. Consider using a password of eight or more characters with mixed-case letters,
numbers, and punctuation marks.
Use Expire Password to require a change of password to use the account.
It is possible for the client host name and user name of an incoming connection to match more
than one row in the user table. The preceding set of examples demonstrates this: Several of the
entries shown match a connection from h1.example.net by fred.
As of MySQL 8.0.23, accounts with an IP address in the host part have this order of specificity:
Accounts that have the host part given as an IP address:
CREATE USER 'user_name'@'127.0.0.1';
CREATE USER 'user_name'@'198.51.100.44';
Accounts that have the host part given as an IP address using CIDR notation:
CREATE USER 'user_name'@'192.0.2.21/8';
CREATE USER 'user_name'@'198.51.100.44/16';
Accounts that have the host part given as an IP address with a subnet mask:
CREATE USER 'user_name'@'192.0.2.0/255.255.255.0';
CREATE USER 'user_name'@'198.51.0.0/255.255.0.0';
The server computes a user's access rights as the combination of:
global privileges
OR database privileges
OR table privileges
OR column privileges
OR routine privileges
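A minimal sketch of granting privileges at different levels for the joe accounts discussed above (the mydb database and orders table are hypothetical):
GRANT SELECT, INSERT ON mydb.* TO 'joe'@'office.example.com';   -- database-level privileges
GRANT SELECT ON mydb.orders TO 'joe'@'home.example.com';        -- table-level privileges
SHOW GRANTS FOR 'joe'@'office.example.com';                     -- verify the result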
Restore and Back-up
Backup
It is important to back up your databases so that you can recover your data and be up and running
again in case problems occur, such as system crashes, hardware failures, or users deleting data
by mistake.
Backups are also essential as a safeguard before upgrading a MySQL installation, and they can
be used to transfer a MySQL installation to another system or to set up replica servers.
MySQL offers a variety of backup strategies from which you can choose the methods that best suit
the requirements for your installation.
Snapshot Backups
Some file system implementations enable “snapshots” to be taken.
These provide logical copies of the file system at a given point in time, without requiring a physical
copy of the entire file system.
MySQL itself does not provide the capability for taking file system snapshots.
It is available through third-party solutions such as Veritas, LVM, or ZFS.
You need only a read lock; this enables other clients to continue to query the tables while you are
making a copy of the files in the database directory.
The flush is needed to ensure that all active index pages are written to disk before you start
the backup.
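A minimal sketch of this file-copy backup sequence in MySQL:
FLUSH TABLES WITH READ LOCK;   -- flush pages to disk and take a read lock
-- copy the files in the database directory while the lock is held
UNLOCK TABLES;                 -- release the lock so writes can resume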
Step 4. In Back Up Database window, select the Backup Type as Full and under Destination,
select Back up to: Disk.
Step 6. Click on Add button to select the destination and name for the database backup file.
Step 7. Select the required folder for the backup file and enter the file name with a .bak extension.
We have now recovered the data to its state as of Tuesday 1 p.m., but we are still missing the
changes made between that time and the crash.
To not lose them, we would have needed to have the MySQL server store its MySQL binary logs
into a safe location (RAID disks, SAN, ...) different from the place where it stores its data files, so
that these logs were not on the destroyed disk. That is, we can start the server with a log-bin option
that specifies a location on a different physical device from the one on which the data directory
resides. That way, the logs are safe even if the device containing the directory is lost.
If we had done this, we would have the gbichot2-bin.000009 file (and any subsequent files) at
hand, and we could apply them using mysqlbinlog and mysql to restore the most recent data
changes with no loss up to the moment of the crash:
$> mysqlbinlog gbichot2-bin.000009 ... | mysql
Importance of BI
Business intelligence is used to improve all parts of a company by improving access to the firm's
data and then using that data to increase profitability.
Companies that practice BI can translate their collected data into insights about their business
processes.
The insights can then be used to create strategic business decisions that improve productivity and
accelerate growth.
Types of BI Tools
BI combines a broad set of data analysis applications that includes:
Mobile BI
Real-time BI
Operational BI
Open-source BI (OSBI)
Collaborative BI
Location intelligence (LI)
Software-as-a-service BI (SaaS BI)
Online analytical processing (OLAP)
Ad hoc analytics
Power BI Versions
Different Power BI versions, like Desktop, Service-based (SaaS), and the mobile Power BI apps, are
used on different platforms.
The Power BI Desktop app is used to create reports.
Power BI Service (Software as a Service - SaaS) is used to publish those reports.
The Power BI mobile app is used to view the reports and dashboards.
Artificial Intelligence
Users can access image recognition and text analytics in Power BI, create machine learning models
using automated machine learning capabilities and integrate with Azure Machine Learning.
Quick Insights
This feature allows users to create subsets of data and automatically apply analytics to that
information.
Customization
This feature allows developers to change the appearance of default visualization and reporting tools
and import new tools into the platform.
Modelling view
This allows users to divide complex data models by subject area into separate diagrams, multiselect
objects and set common properties, view and modify properties in the properties pane, and set display
folders for simpler consumption of complex data models.
1. Power Query: It is used to access, search, and transform public and internal data sources.
2. Power Pivot: Power pivot is used in data modeling for in-memory analytics.
3. Power View: By using the power view, you can analyze, visualize, and display the data as an
interactive data visualization.
4. Power Map: It brings the data to life with interactive geographical visualization.
5. Power BI Service: You can share workbooks and data views which are restored from on-
premises and cloud-based data sources.
6. Power BI Q&A: You can ask any questions and get an immediate response with the natural
language query.
7. Data Management Gateway: You get periodic data refreshers, expose tables, and view data
feeds.
8. Data Catalogue: By using the data catalogue, you can quickly discover and reuse the queries.
Power BI architecture has three phases. The first two phases use ETL (extract, transform, and load)
process to handle the data.
1. Data Integration
An organization needs to deal with the data that comes from different sources.
First, extract the data from different sources which can be your separate database, servers, etc.
Then the data is integrated into a standard format and stored in a common area called the staging
area.
2. Data Processing
The integrated data is still not ready for visualization because it needs processing before it
can be presented.
This data is pre-processed; for example, missing or redundant values are removed
from the data sets.
After that, the business rules will be applied to the data, and it transforms into presentable data.
Then this data will be loaded into the data warehouse.
3. Data presentation
Once the data is loaded and processed, it can be visualized much better with the
various visualizations that Power BI offers.
By using dashboards and reports, we can represent the data more intuitively.
These visual reports help business end-users make business decisions based on the insights.
Power BI Desktop
It is a primary authoring and publishing tool.
Power BI users and developers use it to create brand new models and reports.
The Power BI Desktop tool is available free of cost.
Power BI Service
The Power BI data modules, dashboards, and reports are hosted in the online software as a service
(SaaS).
Sharing, administration, and collaboration happen in the cloud.
The Power BI Service tool requires a Pro license, which costs $10 per user per month (as of
2022).
1. Secure Report Publishing: You can automate data refresh and publish reports, allowing
all the users to access the latest information.
2. No Memory and Speed Constraints: Shifting an existing BI system into a powerful cloud
environment with Power BI Embedded eliminates memory and speed constraints, ensuring that data
is quickly retrievable and analyzed.
3. No Specialized Technical Support required: The Power BI provides quick inquiry and analysis
without the need for specialized technical support. It also supports a powerful natural language
interface and the use of intuitive graphical designer tools.
4. Simple to Use: Power BI is simple to use. Users can pick it up easily, with only a short learning
curve.
5. Constant innovation: The Power BI product is updated every month with new functions and
features.
6. Rich, Personalized Dashboards: The crowning feature of Power BI is its information dashboards,
which can be customized to meet the exact needs of any enterprise. You can easily embed
dashboards and BI reports in applications to provide a unified user experience.
Disadvantages
Here are some disadvantages of Power BI, as shown below:
1. Dashboards and reports can only be shared with users who have the same email domain.
2. Power BI will not merge imported data that is accessed from real-time connections.
3. Power BI accepts a maximum file size of 250 MB, as well as zip files compressed by the
xVelocity in-memory database.
4. Dashboards never accept or pass user, account, or other entity parameters.
5. Very few data sources permit real-time connections to Power BI reports and dashboards.
Step 3: Now, you will be redirected to the Microsoft Store; select the Get button.
Step 5: You will see the "Welcome to Power BI Desktop" screen, where you can register yourself.
Step 6: When you run Power BI Desktop, it displays the home page or welcome screen.
It then starts importing the Excel workbook and creating report-view worksheets, as shown in the
screenshot below.
In the screenshot below, you can see the discount analysis of the imported dataset in the form of
tiles.
A. This report has four pages (or tabs) and you're currently viewing the Sentiment page.
B. On this page are five different visuals and a page title.
C. The Filters pane shows us one filter applied to all report pages. To collapse the Filters pane, select
the arrow (>).
D. The Power BI banner displays the name of the report and the last updated date. Select the arrow
to open a menu that also shows the name of the report owner.
E. The action bar contains actions you can take on this report. For example, you can add a comment,
view a bookmark, or export data from the report. Select More options (...) to reveal a list of additional
report functionality.
On the left side, it shows the categories of all the available data sources. You also have an option
to perform a search at the top.
Let's see all the listed data sources in detail:
1. All
In this category, you can see all the data sources available in Power BI Desktop.
2. File
When you click on the File option, it shows all the flat-file types supported in Power BI Desktop.
Select any file type from the list and click on the Connect button to connect to that file.
3. Database
When you select a database source, you need to pass the server name, user name, and password
to connect. Alternatively, you can connect via a direct SQL query using the Advanced option.
You can also select connectivity mode - Import or DirectQuery.
Import: The Import method allows you to perform data transformations and manipulation. When
you publish the data to the Power BI service (1 GB limit), it is consumed and pushed into the Power
BI Azure backend; the data can be refreshed up to 8 times a day, and a schedule can be set up for
the refresh.
DirectQuery: It limits the options for data manipulation, and the data stays in the SQL database.
DirectQuery is live, so there is no need to schedule refreshes as in the Import method.
6. Online Services
Power BI also allows you to connect to different online services such as Exchange, Salesforce,
Google Analytics, and Facebook.
The following screenshot shows the various options available under Online Services.
Power BI Embedded has benefits for an ISV, its developers, and its customers. For example,
an ISV can start creating visuals for free with Power BI Desktop.
By minimizing the visual analytic development efforts, ISVs achieve faster time to market and
stand out from the competitors with differentiated data experiences.
Also, ISVs can opt to charge a premium for the additional value they create with embedded
analytics.
With Power BI Embedded, your customers don't need to know anything about Power BI. You can
use two different methods to create an embedded application:
Power BI Pro account
Service principal
The Power BI Pro account acts as the master account of your applications (think of it as a proxy
account). It allows you to generate embed tokens, which provide access to your application's
Power BI dashboards and reports.
A service principal can embed Power BI content into an application using an app-only token. It also
allows you to generate embed tokens, which provide access to your application's Power BI
dashboards and reports.
Note: While embedding requires the Power BI service, customers do not need a Power BI account
to view the embedded content in the application.
In Power BI, a cloud service creates a query that requires data from an on-premises data source.
This query from cloud services goes to the gateway cloud service with encrypted credentials.
The gateway cloud service processes and analyzes the request and then forwards it to the Azure
Service Bus.
You don't need to configure the Azure Service Bus separately, because Power BI manages it by default.
The Azure Service Bus keeps all the requests to be forwarded to the on-premises data gateway.
The on-premises data gateway decrypts the credentials for the data source and connects the user
to it.
The on-premises data gateway forwards the query sent from the cloud service to the on-premises
data source.
The data query is executed at a data source that can be SQL Server, SharePoint, files, SSAS, etc.
The result of the query is returned to the on-premises data gateway by the data source.
The On-premises data gateway sends the result back to the cloud service via Azure Service Bus.
Visualization: A visualization is a type of chart or visual built by Power BI designers.
These visuals display the data from datasets and reports.
Examples include line graphs, pie charts, bar charts, and other graphical representations of the
source data, such as data plotted on top of a geographical map.
Reports: A report is a collection of one or more pages of interactive visuals, text, and graphics that
together make up a single report.
For example, a state or city report, sales by country, a profit-by-product report, a logistics
performance report, etc.
Dashboards: A dashboard is a single-page presentation of multiple visualizations with interactive
visuals, text, and graphics. A dashboard collects the most important metrics on one screen to tell
a story or answer a question. The dashboard content comes from one or more datasets and one
or more reports.
For example, pie charts, bar charts, and geographical maps.
Datasets: A dataset is a collection of data used to create visualizations in Power BI.
For example, Oracle or SQL Server tables and Excel sheets.
Tiles: A tile is a single visualization in a report or on a dashboard.
For example, a pie chart in a report or dashboard.
Power BI Report Server is a specific edition of SQL Server Reporting Services that can host Power
BI reports.
To run Power BI Report Server, you don't need a SQL Server installation disk; the Report Server
comes with its own setup files, which you can download. Power BI Report Server can host Power BI
reports as well as Reporting Services (SSRS) reports.
Power BI Report Server comes with its own instance of Power BI Desktop.
This Power BI Desktop edition should be used to create Power BI reports; otherwise, the reports
cannot be hosted on the report server.
The Power BI Desktop report server edition is regularly updated, and its experience is very
similar to the standard Power BI Desktop.
You can download the latest edition of Power BI Report Server from the following
link: https://powerbi.microsoft.com/en-us/report-server/
Types of Functions
Here are some important DAX functions:
a) Aggregate Functions
MIN
This DAX function returns the minimum numeric value in a column, or between two scalar
expressions.
Syntax
MIN(<column>)
MAX
This DAX function returns the maximum numeric value in a column, or between two scalar
expressions.
Syntax
MAX(<column>)
AVERAGE
This DAX function returns the arithmetic mean of the values in a column.
Syntax
AVERAGE(<column>)
SUM
This DAX function adds all the numbers in a column.
Syntax
SUM(<column>)
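To make these functions concrete, here is a minimal sketch of measures built with the aggregate functions above, assuming a hypothetical Sales table with a numeric Amount column (both names are illustrative, not taken from any dataset in this course):

Example
Smallest Sale = MIN(Sales[Amount])
Largest Sale = MAX(Sales[Amount])
Average Sale = AVERAGE(Sales[Amount])
Total Sales = SUM(Sales[Amount])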
b) Count Function
COUNT
This DAX function is used to return the count of items in a column. If the same value appears
multiple times, each occurrence is counted separately, not as a single item.
Syntax
COUNT(<column>)
DISTINCTCOUNT
This DAX function is used to return the distinct count of items in a column. If the same value
appears multiple times, it is counted only once.
Syntax
DISTINCTCOUNT(<column>)
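To see the difference between the two count functions, assume the same hypothetical Sales table, this time with a CustomerID column (again an illustrative name):

Example
Sale Rows = COUNT(Sales[CustomerID]) -- counts every non-blank row
Unique Customers = DISTINCTCOUNT(Sales[CustomerID]) -- counts each customer only once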
c) Date and Time Function
HOUR
This DAX function returns the specified hour as a number from 0 to 23 (12:00 A.M. to 11:00
P.M.).
Syntax
HOUR(<datetime>)
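For example, assuming a hypothetical Orders table with an OrderDateTime column (illustrative names), a calculated column can extract the hour of each order:

Example
Order Hour = HOUR(Orders[OrderDateTime]) -- a 2:30 P.M. order returns 14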
d) Logical Function
AND
This DAX function performs a logical AND (conjunction) on two expressions. For AND to return
true, both specified conditions have to be fulfilled.
Syntax
AND(<logical argument1>,<logical argument2>)
OR
This DAX function performs a logical OR (disjunction) on two expressions. For OR to return true,
either of the two specified conditions has to be fulfilled.
Syntax
OR(<logical argument1>,<logical argument2>)
NOT
This DAX function performs a logical NOT (negation) on a given expression.
Syntax
NOT(<logical argument>)
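As a sketch of how these logical functions combine conditions, assume the hypothetical Sales table again, now with Amount and Quantity columns; each calculated column below returns TRUE or FALSE for every row:

Example
Large Bulk Sale = AND(Sales[Amount] > 1000, Sales[Quantity] > 10)
Needs Review = OR(Sales[Amount] > 5000, Sales[Quantity] < 0)
Small Sale = NOT(Sales[Amount] > 1000)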
e) Text function
CONCATENATE
This DAX function joins two text strings into one text string.
Syntax
CONCATENATE(<text1>, <text2>)
FIXED
This DAX function rounds a number to the specified number of decimals and returns the result
as text.
Syntax
FIXED(<number>, <decimals>, <no_commas>)
REPLACE
This DAX function replaces part of a text string, based on the number of characters you
specify, with a different text string.
Syntax
REPLACE(<old_text>, <start_num>, <num_chars>, <new_text>)
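For example, assuming a hypothetical Customers table with FirstName and LastName text columns (illustrative names only), the three text functions behave as follows:

Example
Full Name = CONCATENATE(Customers[FirstName], CONCATENATE(" ", Customers[LastName])) -- "Jane Doe"
Price Label = FIXED(123.456, 2) -- returns the text "123.46"
Updated Code = REPLACE("AB-2021", 4, 4, "2022") -- returns "AB-2022"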
Calculated Columns
When you create a data model in Power BI Desktop, you can extend a table by creating new
columns.
The content of the columns is defined by a DAX expression, evaluated row by row in the context
of the current row of that table.
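For example, assuming a hypothetical Sales table with Quantity and UnitPrice columns, a calculated column evaluated once for each row could be:

Line Total = Sales[Quantity] * Sales[UnitPrice]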
Measures
There is another way of defining calculations in a DAX model, useful if you need to operate on
aggregate values instead of on a row-by-row basis.
These calculations are called measures. DAX requires a measure to be defined in a table, but the
measure does not really belong to that table: you can move a measure from one table to another
without losing its functionality.
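For example, a measure computes its result over the current filter context rather than over a single row. Assuming the same hypothetical Sales table, a sketch of a revenue measure is:

Total Revenue = SUMX(Sales, Sales[Quantity] * Sales[UnitPrice])

Placed on a report visual, this measure re-aggregates automatically for whatever slicers and filters are active.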
Dashboards and reports differ on a few key features:
Pages: A dashboard has a single page, while a report can have one or more pages.
Data Sources: A dashboard can combine content from one or more datasets and reports, while a
report is based on a single dataset.
Pinning: You can pin existing visuals from reports and other dashboards to a dashboard, but you
cannot pin visuals to a report.
Filtering: A dashboard cannot be filtered or sliced, while a report offers many ways to filter,
highlight, and slice the data.
Set alerts: You can set alerts on dashboard tiles, but not on report visuals.
Subscribe: You can subscribe to both dashboards and report pages.