
# [ Exploratory Data Analysis (EDA) on Power BI ] [ cheatsheet ]

1. Data Import

● Import data from CSV: Source = Csv.Document(File.Contents("file.csv"), [Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])
● Import data from Excel: Source = Excel.Workbook(File.Contents("file.xlsx"), null, true)
● Import data from SQL Server: Source = Sql.Database("server", "database", [Query="SELECT * FROM table"])
● Import data from Web: Source = Web.Page(Web.Contents("https://example.com"))
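
A minimal end-to-end import sketch that chains these steps; the file path and column names (OrderDate, Sales) are illustrative assumptions, not part of the cheatsheet:

let
    // Read the raw CSV (path and delimiter are assumptions for the example)
    Raw = Csv.Document(File.Contents("C:\data\sales.csv"),
        [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.None]),
    // Promote the first row to column headers
    Promoted = Table.PromoteHeaders(Raw, [PromoteAllScalars = true]),
    // Ascribe explicit types before any further EDA steps
    Typed = Table.TransformColumnTypes(Promoted,
        {{"OrderDate", type date}, {"Sales", type number}})
in
    Typed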

2. Data Transformation

● Remove columns: Table.RemoveColumns(Source, {"Column1", "Column2"})
● Rename columns: Table.RenameColumns(Source, {{"OldName", "NewName"}})
● Filter rows: Table.SelectRows(Source, each [Column] > 10)
● Sort rows: Table.Sort(Source, {{"Column", Order.Ascending}})
● Group rows: Table.Group(Source, {"Column"}, {{"Count", each Table.RowCount(_), type number}})
● Merge queries: Table.NestedJoin(Source1, {"Key"}, Source2, {"ForeignKey"}, "NewColumn", JoinKind.Inner)
● Append queries: Table.Combine({Source1, Source2})
● Pivot data: Table.Pivot(Source, List.Distinct(Source[ColumnToPivot]), "ColumnToPivot", "ValueColumn", List.Sum)

3. Data Cleaning

● Remove duplicates: Table.Distinct(Source)
● Replace values: Table.ReplaceValue(Source, "OldValue", "NewValue", Replacer.ReplaceText, {"Column"})
● Fill null values from the row above: Table.FillDown(Source, {"Column"})
● Handle errors: Table.ReplaceErrorValues(Source, {{"Column", "DefaultValue"}})
● Trim whitespace: Table.TransformColumns(Source, {{"Column", Text.Trim, type text}})
● Keep only printable characters: Table.TransformColumns(Source, {{"Column", each Text.Select(_, {"0".."9", "a".."z", "A".."Z", " "}), type text}})
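
A combined cleaning pass; the query name Sales and the columns Customer and Amount are assumptions for the example:

let
    Source = Sales,
    NoDuplicates = Table.Distinct(Source),
    // Trim the text column, then keep only printable characters
    Trimmed = Table.TransformColumns(NoDuplicates, {{"Customer", Text.Trim, type text}}),
    Printable = Table.TransformColumns(Trimmed,
        {{"Customer", each Text.Select(_, {"0".."9", "a".."z", "A".."Z", " "}), type text}}),
    // Replace a known bad label and default any conversion errors to 0
    Replaced = Table.ReplaceValue(Printable, "N/A", "Unknown", Replacer.ReplaceText, {"Customer"}),
    NoErrors = Table.ReplaceErrorValues(Replaced, {{"Amount", 0}})
in
    NoErrors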

4. Data Validation

● Check for null values: Table.AddColumn(Source, "IsNull", each if [Column] = null then "Yes" else "No")
● Check for empty values: Table.AddColumn(Source, "IsEmpty", each if [Column] = "" then "Yes" else "No")
● Check for duplicate values: Table.AddColumn(Source, "IsDuplicate", each if List.Contains(List.RemoveFirstN(Source[Column], List.PositionOf(Source[Column], [Column]) + 1), [Column]) then "Yes" else "No")
● Check for valid data types: Table.AddColumn(Source, "IsValid", each if Value.Is([Column], type text) then "Yes" else "No")
● Check for valid ranges: Table.AddColumn(Source, "IsInRange", each if [Column] >= 0 and [Column] <= 100 then "Yes" else "No")
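
The individual checks can be folded into a single flag column; a sketch assuming a query Scores with a numeric Score column bounded to 0-100 (names and bounds are illustrative):

let
    Source = Scores,
    Checked = Table.AddColumn(Source, "ValidationIssue", each
        if [Score] = null then "Missing"
        else if not Value.Is([Score], type number) then "Wrong type"
        else if [Score] < 0 or [Score] > 100 then "Out of range"
        else "OK"),
    // Keep only the rows that need attention
    Issues = Table.SelectRows(Checked, each [ValidationIssue] <> "OK")
in
    Issues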

5. Data Exploration

● Set column data types: Table.TransformColumnTypes(Source, {{"Column1", type text}, {"Column2", type number}})
● View column statistics: Table.Profile(Source)
● View unique values: Table.Distinct(Table.SelectColumns(Source, {"Column"}))
● View top N rows: Table.FirstN(Source, 10)
● View bottom N rows: Table.LastN(Source, 10)
● View a slice of rows (M has no built-in Table.Sample): Table.Range(Source, 100, 50)
● Flag missing values: Table.AddColumn(Source, "IsMissing", each if [Column] = null then 1 else 0)
● View value distribution: Table.Group(Source, {"Column"}, {{"Count", each Table.RowCount(_), type number}})
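
A quick profiling sketch that combines these steps; the query name Sales and the columns Amount and Region are assumptions:

let
    Source = Sales,
    // Ascribe types first so Table.Profile reports sensible statistics
    Typed = Table.TransformColumnTypes(Source, {{"Amount", type number}, {"Region", type text}}),
    // Per-column statistics (min, max, average, standard deviation, null count, ...)
    Profile = Table.Profile(Typed),
    // Frequency table for one column as a simple distribution view
    RegionCounts = Table.Sort(
        Table.Group(Typed, {"Region"}, {{"Count", each Table.RowCount(_), type number}}),
        {{"Count", Order.Descending}})
in
    [Profile = Profile, Distribution = RegionCounts]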

6. Data Visualization

These queries shape the data that feeds the corresponding Power BI visual; the visual itself is added on the report canvas.

● Prepare data for a bar chart: BarChart = Table.Group(Source, {"Category"}, {{"Value", each List.Sum([Value]), type number}})
● Prepare data for a line chart: LineChart = Table.Group(Source, {"Date"}, {{"Value", each List.Sum([Value]), type number}})
● Prepare data for a pie chart: PieChart = Table.Group(Source, {"Category"}, {{"Value", each List.Sum([Value]), type number}})
● Prepare data for a scatter plot: ScatterPlot = Table.Group(Source, {"X", "Y"}, {{"Value", each List.Sum([Value]), type number}})

● Prepare data for a treemap: Treemap = Table.Group(Source, {"Category", "Subcategory"}, {{"Value", each List.Sum([Value]), type number}})
● Prepare data for a heatmap: Heatmap = Table.Pivot(Table.SelectColumns(Source, {"Row", "Column", "Value"}), List.Distinct(Source[Column]), "Column", "Value", List.Sum)
● Prepare data for a funnel chart: FunnelChart = Table.Group(Source, {"Stage"}, {{"Value", each List.Sum([Value]), type number}})
● Prepare data for a gauge chart: GaugeChart = Table.Group(Source, {"Category"}, {{"Value", each List.Sum([Value]), type number}})

7. Statistical Analysis

● Calculate mean: Mean = List.Average(Source[Column])
● Calculate median: Median = List.Median(Source[Column])
● Calculate mode: Mode = List.Mode(Source[Column])
● Calculate standard deviation: StandardDeviation = List.StandardDeviation(Source[Column])
● Calculate variance (M has no List.Variance): Variance = Number.Power(List.StandardDeviation(Source[Column]), 2)
● Calculate minimum value: Minimum = List.Min(Source[Column])
● Calculate maximum value: Maximum = List.Max(Source[Column])
● Calculate quartiles: Quartiles = {List.Percentile(Source[Column], 0.25), List.Percentile(Source[Column], 0.5), List.Percentile(Source[Column], 0.75)}
● Calculate correlation: Correlation = List.Covariance(Source[Column1], Source[Column2]) / Number.Sqrt(List.Covariance(Source[Column1], Source[Column1]) * List.Covariance(Source[Column2], Source[Column2]))
● Calculate a two-sample t statistic (M has no built-in t-test): TStat = (List.Average(Source[Column1]) - List.Average(Source[Column2])) / Number.Sqrt(Number.Power(List.StandardDeviation(Source[Column1]), 2) / List.Count(Source[Column1]) + Number.Power(List.StandardDeviation(Source[Column2]), 2) / List.Count(Source[Column2]))
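
A sketch that gathers the main descriptive statistics for one numeric column into a single table; the query name Sales and column Amount are assumptions:

let
    Values = Sales[Amount],   // numeric column, illustrative
    Stats = [
        Mean = List.Average(Values),
        Median = List.Median(Values),
        Mode = List.Mode(Values),
        StdDev = List.StandardDeviation(Values),
        Variance = Number.Power(List.StandardDeviation(Values), 2),
        Min = List.Min(Values),
        Max = List.Max(Values),
        Q1 = List.Percentile(Values, 0.25),
        Q3 = List.Percentile(Values, 0.75)
    ]
in
    // One row per statistic (Name/Value columns)
    Record.ToTable(Stats)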

8. Time Series Analysis

● Convert to date type: Table.TransformColumnTypes(Source, {{"Date", type date}})
● Extract year from date: Table.AddColumn(Source, "Year", each Date.Year([Date]), Int64.Type)
● Extract month from date: Table.AddColumn(Source, "Month", each Date.Month([Date]), Int64.Type)
● Extract day from date: Table.AddColumn(Source, "Day", each Date.Day([Date]), Int64.Type)

● Extract day of week: Table.AddColumn(Source, "DayOfWeek", each Date.DayOfWeek([Date]), Int64.Type)
● Extract day of year: Table.AddColumn(Source, "DayOfYear", each Date.DayOfYear([Date]), Int64.Type)
● Extract quarter from date: Table.AddColumn(Source, "Quarter", each Date.QuarterOfYear([Date]), Int64.Type)
● Calculate a 3-row moving average (requires an index column added with Table.AddIndexColumn(Source, "Index", 0, 1)): Table.AddColumn(Source, "MovingAverage", each List.Average(List.Range(Source[Value], List.Max({[Index] - 2, 0}), 3)))
● Aggregate values by year (first step toward year-over-year growth; see the sketch after this list): Table.Group(Source, {"Year"}, {{"Value", each List.Sum([Value]), type number}})
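
A sketch of the year-over-year growth calculation built on the yearly aggregation above; the query name Sales and columns Year and Value are assumptions:

let
    Source = Sales,
    ByYear = Table.Sort(
        Table.Group(Source, {"Year"}, {{"Total", each List.Sum([Value]), type number}}),
        {{"Year", Order.Ascending}}),
    Indexed = Table.AddIndexColumn(ByYear, "Index", 0, 1),
    // Growth = this year's total / previous year's total - 1
    WithGrowth = Table.AddColumn(Indexed, "YoYGrowth", each
        if [Index] = 0 then null else [Total] / Indexed[Total]{[Index] - 1} - 1, type number)
in
    Table.RemoveColumns(WithGrowth, {"Index"})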

9. Geographic Analysis

● Create a location key for a map visual: Map = Table.AddColumn(Source, "Location", each Text.Combine({Text.From([Latitude], "en-US"), ",", Text.From([Longitude], "en-US")}))
● Calculate great-circle distance in km (M has no Number.Radians, so degrees are converted with Number.PI / 180): Distance = (lat1, lon1, lat2, lon2) => 6371 * Number.Acos(Number.Sin(lat1 * Number.PI / 180) * Number.Sin(lat2 * Number.PI / 180) + Number.Cos(lat1 * Number.PI / 180) * Number.Cos(lat2 * Number.PI / 180) * Number.Cos((lon1 - lon2) * Number.PI / 180))
● Identify the nearest location to a reference point (RefLat, RefLon), using the Distance function above (see the sketch after this list): NearestLocation = Table.Min(Table.AddColumn(Source, "Distance", each Distance([Latitude], [Longitude], RefLat, RefLon)), "Distance")
● Prepare data for a choropleth map: ChoroplethMap = Table.Group(Source, {"Region"}, {{"Value", each List.Sum([Value])}})

10. Data Insights

● Identify top N categories: TopCategories = Table.FirstN(Table.Sort(Table.Group(Source, {"Category"}, {{"Value", each List.Sum([Value])}}), {{"Value", Order.Descending}}), 5)
● Identify bottom N categories: BottomCategories = Table.FirstN(Table.Sort(Table.Group(Source, {"Category"}, {{"Value", each List.Sum([Value])}}), {{"Value", Order.Ascending}}), 5)
● Identify trending categories (previous best value per category and date): TrendingCategories = Table.AddColumn(Table.Group(Source, {"Category", "Date"}, {{"Value", each List.Sum([Value])}}), "PreviousValue", (row) => List.Max(Table.SelectRows(Source, each [Date] < row[Date] and [Category] = row[Category])[Value]))
● Identify anomalies: Anomalies = Table.AddColumn(Source, "IsAnomaly", each
if [Value] < List.Average(Source[Value]) - 2 *
List.StandardDeviation(Source[Value]) or [Value] >
List.Average(Source[Value]) + 2 * List.StandardDeviation(Source[Value])
then "Yes" else "No")
● Identify patterns: Patterns = Table.AddColumn(Source, "Pattern", each if
[Value] > 1000 then "High" else if [Value] < 500 then "Low" else
"Medium")

11. Data Quality

● Calculate missing value percentage: MissingValuePercentage = Table.RowCount(Table.SelectRows(Source, each List.Any(Record.FieldValues(_), each _ = null))) / Table.RowCount(Source)
● Calculate duplicate value percentage: DuplicateValuePercentage = (Table.RowCount(Source) - Table.RowCount(Table.Distinct(Source))) / Table.RowCount(Source)
● Calculate outlier percentage: OutlierPercentage = Table.RowCount(Table.SelectRows(Source, each [Value] < List.Average(Source[Value]) - 2 * List.StandardDeviation(Source[Value]) or [Value] > List.Average(Source[Value]) + 2 * List.StandardDeviation(Source[Value]))) / Table.RowCount(Source)
● Identify numbers stored as text: DataTypeMismatches = Table.AddColumn(Source, "DataTypeMismatch", each if Value.Is([Column], type text) and not (try Number.FromText([Column]))[HasError] then "Yes" else "No")
● Flag values that mix "/" and "-" separators: InconsistentFormats = Table.AddColumn(Source, "InconsistentFormat", each if Text.Contains([Column], "/") and Text.Contains([Column], "-") then "Yes" else "No")

12. Data Transformation

● Pivot data: PivotedData = Table.Pivot(Source, List.Distinct(Source[ColumnToPivot]), "ColumnToPivot", "ValueColumn")
● Unpivot data: UnpivotedData = Table.UnpivotOtherColumns(Source, {"KeyColumn"}, "Attribute", "Value")
● Transpose data: TransposedData = Table.Transpose(Source)

● Split column by delimiter: SplitColumn = Table.SplitColumn(Source,
"ColumnToSplit", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),
{"Column1", "Column2"})
● Merge columns: MergedColumn = Table.AddColumn(Source, "MergedColumn",
each Text.Combine({[Column1], [Column2]}, " "))
● Create conditional column: ConditionalColumn = Table.AddColumn(Source,
"ConditionalColumn", each if [Column1] > 10 then "High" else "Low")
● Group and aggregate data: GroupedData = Table.Group(Source,
{"GroupColumn"}, {{"AggregatedValue", each List.Sum([Value]), type
number}})
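
A reshaping sketch that unpivots wide month columns into a tidy layout, re-aggregates, and adds a conditional label; the query name MonthlySales, the Product column, and the month columns are assumptions:

let
    Source = MonthlySales,   // wide table: Product, Jan, Feb, Mar, ...
    // Turn the month columns into Month/Amount pairs
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Product"}, "Month", "Amount"),
    // Aggregate back to one row per product
    Totals = Table.Group(Unpivoted, {"Product"},
        {{"TotalAmount", each List.Sum([Amount]), type number}}),
    // Simple conditional band
    Labeled = Table.AddColumn(Totals, "Band", each if [TotalAmount] > 1000 then "High" else "Low")
in
    Labeled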

13. Data Modeling

● Create a calendar table: CalendarTable = Table.AddColumn(Table.TransformColumnTypes(Table.FromList(List.Dates(#date(2020, 1, 1), 365, #duration(1, 0, 0, 0)), Splitter.SplitByNothing(), {"Date"}), {{"Date", type date}}), "Year", each Date.Year([Date]), Int64.Type)
● Create a date dimension: DateDimension = Table.AddColumn(Table.AddColumn(Table.AddColumn(Table.AddColumn(Table.Distinct(Table.SelectColumns(Table.TransformColumnTypes(Source, {{"Date", type date}}), {"Date"})), "Year", each Date.Year([Date]), Int64.Type), "Month", each Date.Month([Date]), Int64.Type), "Day", each Date.Day([Date]), Int64.Type), "DayOfWeek", each Date.DayOfWeek([Date]), Int64.Type)
● Create a slowly changing dimension (SCD): SCDimension = Table.Distinct(Table.Buffer(Table.NestedJoin(Source, {"ProductID"}, Table.AddIndexColumn(Table.Distinct(Table.SelectColumns(Source, {"ProductID", "ProductName", "Category"})), "Version", 1, 1), {"ProductID"}, "Product", JoinKind.FullOuter)))
● Create a fact table: FactTable = Table.NestedJoin(Source, {"OrderID"},
Table.AddIndexColumn(Table.Distinct(Table.SelectColumns(Source,
{"OrderID", "CustomerID", "ProductID", "OrderDate", "Quantity",
"TotalAmount"})), "FactID", 1, 1), {"OrderID"}, "Fact", JoinKind.Inner)
● Create a star schema: StarSchema = Table.NestedJoin(FactTable, {"CustomerID"}, Table.AddIndexColumn(Table.Distinct(Table.SelectColumns(Source, {"CustomerID", "CustomerName", "Country"})), "CustomerKey", 1, 1), {"CustomerID"}, "Customer", JoinKind.Inner)

14. Advanced Analytics

● Perform market basket analysis: MarketBasket = Table.AddColumn(Table.Group(Table.Distinct(Table.SelectColumns(Source, {"OrderID", "ProductID"})), {"OrderID"}, {{"Products", each Text.Combine(List.Transform([ProductID], Text.From), ","), type text}}), "SupportCount", (row) => Table.RowCount(Table.SelectRows(Source, each List.Contains(Text.Split(row[Products], ","), Text.From([ProductID])))))
● Perform customer segmentation: CustomerSegmentation =
Table.AddColumn(Table.Group(Source, {"CustomerID"}, {{"TotalSpend", each
List.Sum([TotalAmount]), type number}, {"VisitFrequency", each
Table.RowCount(_), type number}, {"Recency", each
Date.From(List.Max([OrderDate])), type date}}), "Segment", each if
[TotalSpend] > 1000 and [VisitFrequency] > 10 and
Date.IsInPreviousNMonths([Recency], 3) then "High Value" else if
[TotalSpend] > 500 and [VisitFrequency] > 5 and
Date.IsInPreviousNMonths([Recency], 6) then "Mid Value" else "Low Value")
● Perform cohort analysis: CohortAnalysis =
Table.Group(Table.AddColumn(Source, "CohortMonth", each
Date.StartOfMonth([OrderDate])), {"CohortMonth", "CustomerID"},
{{"TotalSpend", each List.Sum([TotalAmount]), type number},
{"VisitFrequency", each Table.RowCount(_), type number}})
● Perform RFM analysis: RFMAnalysis = Table.AddColumn(Table.Group(Source,
{"CustomerID"}, {{"Recency", each Date.From(List.Max([OrderDate])), type
date}, {"Frequency", each Table.RowCount(_), type number}, {"Monetary",
each List.Sum([TotalAmount]), type number}}), "RFMScore", each
Text.Combine({Text.Range(Text.From(Date.DayOfYear([Recency])), 0, 1),
Text.Range(Text.From([Frequency]), 0, 1),
Text.Range(Text.From([Monetary]), 0, 1)}))
● Perform customer lifetime value analysis: CustomerLifetimeValue = Table.AddColumn(Table.Group(Source, {"CustomerID"}, {{"TotalSpend", each List.Sum([TotalAmount]), type number}, {"VisitFrequency", each Table.RowCount(_), type number}, {"AverageOrderValue", each List.Average([TotalAmount]), type number}, {"CustomerLifetime", each Duration.Days(Date.From(DateTime.LocalNow()) - List.Min([OrderDate])), type number}}), "CLV", each [AverageOrderValue] * [VisitFrequency] * [CustomerLifetime] / 365)
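
A sketch of a median-based segmentation in the spirit of the RFM and segmentation bullets above; the query name Orders and columns CustomerID, OrderDate, and TotalAmount are assumptions, and the median cut-offs are a deliberately coarse choice:

let
    Source = Orders,
    Grouped = Table.Group(Source, {"CustomerID"}, {
        {"LastOrder", each List.Max([OrderDate]), type date},
        {"Frequency", each Table.RowCount(_), type number},
        {"Monetary", each List.Sum([TotalAmount]), type number}}),
    // Days since the customer's last order
    WithRecency = Table.AddColumn(Grouped, "RecencyDays", each
        Duration.Days(Date.From(DateTime.LocalNow()) - [LastOrder]), type number),
    FreqMedian = List.Median(WithRecency[Frequency]),
    MonMedian = List.Median(WithRecency[Monetary]),
    RecMedian = List.Median(WithRecency[RecencyDays]),
    // Score each customer against the medians
    Scored = Table.AddColumn(WithRecency, "Segment", each
        if [RecencyDays] <= RecMedian and [Frequency] >= FreqMedian and [Monetary] >= MonMedian
        then "High Value"
        else if [Monetary] >= MonMedian then "Mid Value"
        else "Low Value")
in
    Scored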

15. Data Storytelling

● Create a KPI visual: KPIVisual = Table.AddColumn(Table.Group(Source, {"Category"}, {{"TotalSales", each List.Sum([Sales]), type number}, {"TargetSales", each List.Sum([Target]), type number}}), "Status", each if [TotalSales] >= [TargetSales] then "Meeting Target" else "Below Target")

● Create a trend visual (previous-period value per date): TrendVisual = let Grouped = Table.AddIndexColumn(Table.Sort(Table.Group(Source, {"Date"}, {{"Sales", each List.Sum([Sales]), type number}}), {{"Date", Order.Ascending}}), "Index", 0, 1) in Table.AddColumn(Grouped, "PreviousSales", each if [Index] = 0 then null else Grouped[Sales]{[Index] - 1})
● Create a comparison visual: ComparisonVisual = Table.Group(Source,
{"Category", "Date"}, {{"ThisYearSales", each List.Sum([This Year
Sales]), type number}, {"LastYearSales", each List.Sum([Last Year
Sales]), type number}})
● Create a distribution visual: DistributionVisual = Table.Group(Source,
{"AgeGroup"}, {{"Sales", each List.Sum([Sales]), type number}})
● Create a relationship visual: RelationshipVisual =
Table.NestedJoin(Table.Group(Source, {"CustomerID"}, {{"TotalSales", each
List.Sum([Sales]), type number}}), {"CustomerID"}, Table.Group(Source,
{"CustomerID", "ProductCategory"}, {{"CategorySales", each
List.Sum([Sales]), type number}}), {"CustomerID"}, "CustomerProduct",
JoinKind.Inner)
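
A KPI-style sketch that compares actuals against targets and adds a status flag and variance; the query name SalesTargets and columns Category, Sales, and Target are assumptions:

let
    Source = SalesTargets,
    Grouped = Table.Group(Source, {"Category"}, {
        {"TotalSales", each List.Sum([Sales]), type number},
        {"TargetSales", each List.Sum([Target]), type number}}),
    WithStatus = Table.AddColumn(Grouped, "Status", each
        if [TotalSales] >= [TargetSales] then "Meeting Target" else "Below Target"),
    // Percentage over/under target
    WithVariance = Table.AddColumn(WithStatus, "VariancePct", each
        if [TargetSales] = 0 then null else [TotalSales] / [TargetSales] - 1, type number)
in
    WithVariance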

16. Reporting and Dashboard Design

● Create a dynamic title: DynamicTitle = "Sales Analysis - " & Date.ToText(List.Min(Source[OrderDate]), "MMMM yyyy") & " to " & Date.ToText(List.Max(Source[OrderDate]), "MMMM yyyy")
● Create a drill-through report: DrillThroughReport =
Table.SelectRows(Source, each [OrderID] = OrderIDParameter)
● Create a conditional formatting rule: ConditionalFormatting = if [Sales]
< 1000 then "Red" else if [Sales] < 5000 then "Yellow" else "Green"
● Create a tooltip: Tooltip = "Sales: " & Number.ToText([Sales], "$#,0.00") & " | Quantity: " & Number.ToText([Quantity], "#,0")
● Create a custom visual: CustomVisual =
Table.ToColumns(Table.Group(Source, {"Category"}, {{"Sales", each
List.Sum([Sales]), type number}}))
● Create a responsive layout: ResponsiveLayout =
Table.Combine({Table.SelectColumns(Source, {"Category"}),
Table.SelectColumns(Source, {"Sales"})})
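
A dynamic-title sketch; the resulting text can then be surfaced through a card visual or measure. The query name Sales, the OrderDate column, and the date format are assumptions:

let
    Source = Sales,
    MinDate = List.Min(Source[OrderDate]),
    MaxDate = List.Max(Source[OrderDate]),
    // Format the date range for a report header
    Title = "Sales Analysis - "
        & Date.ToText(MinDate, "MMMM yyyy")
        & " to "
        & Date.ToText(MaxDate, "MMMM yyyy")
in
    Title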

17. Data Refresh and Scheduling

● Refresh data manually: use Home > Refresh in Power BI Desktop or "Refresh now" on the dataset in the Power BI Service (M has no Table.Refresh function).
● Schedule data refresh: configure "Scheduled refresh" in the dataset settings of the Power BI Service; scheduling is not expressed in M.
● Incremental refresh filter (uses the reserved RangeStart/RangeEnd datetime parameters; see the sketch after this list): IncrementalRefresh = Table.SelectRows(Source, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
● Buffer a table so downstream steps do not re-query the source within a refresh: Buffered = Table.Buffer(Source)
● Refresh multiple data sources: each query refreshes its own source during a dataset refresh; their results can be combined with Table.Combine({Source1, Source2}).
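
A sketch of the incremental-refresh filter pattern. RangeStart and RangeEnd are the reserved datetime parameters Power BI supplies once incremental refresh is configured; the server, database, table, and OrderDate column names are assumptions:

let
    Source = Sql.Database("server", "database", [Query = "SELECT * FROM Orders"]),
    // Keep only rows inside the refresh window; OrderDate is assumed to be a date column
    Filtered = Table.SelectRows(Source, each
        [OrderDate] >= DateTime.Date(RangeStart) and [OrderDate] < DateTime.Date(RangeEnd)),
    // Buffer so downstream steps do not hit the source again
    Buffered = Table.Buffer(Filtered)
in
    Buffered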

By: Waleed Mousa
