Handout
PART I BACKGROUND
"Hello, World!"
If the string itself needs to contain the double-quote character, the simplest way to accommodate that need is
to escape each double quote by doubling it:
"He said ""run"" and so I did." // evaluates to the string: He said "run" and
so I did.
Strings can also span multiple lines.
"Hello
World"
However, at times (perhaps often), you may find it more convenient to code multi-line strings using a single
line of source code. To do this, use an escape sequence. These sequences start with “#(“, contain an escape
code (or several codes, as we’ll see in a moment) and end with “)”.
cr Carriage Return
lf Line Feed
tab Tab
Below, a line feed character is encoded into a string using an escape sequence.
"Hello#(lf)World"
Multiple escape codes can be combined inside an escape sequence by separating them with commas. However,
no extra whitespace is allowed (interesting—this is one of the few places in Power Query where whitespace
between language elements is relevant).
"Hello#(cr,lf)World"
"Hello#(cr)#(lf)World" // evaluates to the same string as the preceding
OPERATORS
Once you have a value of type text—whether you hand-coded it using a string literal or received it as the output
value from another expression—you can work with it using several operators.
Comparison
Text values support the standard comparison operators (=, <>, >=, >, <=, <). Using them is straightforward.
Concatenation
Text values can be combined (concatenated) using the combination operator (&):
"Good" & " " & "Morning!" // evaluates to: Good Morning!
However, Power Query does not implicitly convert non-text values to text when they are concatenated with
text. The expression below isn’t allowed, because text and numbers can’t be directly concatenated.
"You have " & 0.5 & " left." // error -- the number must first be converted to text
Don’t worry, though! It’s easy to adapt the above so it works. Simply use a library function to convert the
number to text.
"You have " & Text.From(0.5) & " left." // evaluates to: You have 0.5 left.
"You have " & Number.ToText(0.5, "P") & " left." // evaluates to: You have
50.00% left.
Combining text with null produces null.
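For example:
"You have " & null & " left." // null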
EXERCISES
Please study the Text functions https://learn.microsoft.com/en-us/powerquery-m/text-functions
Information
Name Description
Text.InferNumberType Infers the granular number type (Int64.Type, Double.Type, and so on) of a number
encoded in text.
Text Comparisons
Name Description
Text.FromBinary Decodes data from a binary value into a text value using an encoding.
Value.FromText Decodes a value from a textual representation, value, and interprets it as a value
with an appropriate type. Value.FromText takes a text value and returns a
number, a logical value, a null value, a DateTime value, a Duration value, or a text
value. The empty text value is interpreted as a null value.
Extraction
Name Description
Text.Range Returns a number of characters from a text value starting at a zero-based offset and for count
number of characters.
Text.Start Returns the count of characters from the start of a text value.
Text.End Returns the number of characters from the end of a text value.
Modification
Name Description
Text.Insert Returns a text value with newValue inserted into a text value starting at a zero-based
offset.
Text.Remove Removes all occurrences of a character or list of characters from a text value. The
removeChars parameter can be a character value or a list of character values.
Text.ReplaceRange Replaces length characters in a text value starting at a zero-based offset with the new
text value.
Text.Select Selects all occurrences of the given character or list of characters from the input text
value.
Membership
Name Description
Text.Contains Returns true if a text value substring was found within a text value string; otherwise,
false.
Text.EndsWith Returns a logical value indicating whether a text value substring was found at the end
of a string.
Text.PositionOf Returns the first occurrence of substring in a string and returns its position starting at
startOffset.
Text.PositionOfAny Returns the first occurrence of a text value in list and returns its position starting at
startOffset.
Text.StartsWith Returns a logical value indicating whether a text value substring was found at the
beginning of a string.
Transformations
Name Description
Text.BetweenDelimiters Returns the portion of text between the specified startDelimiter and
endDelimiter.
Text.Clean Returns the original text value with non-printable characters removed.
Text.Combine Returns a text value that is the result of joining all text values with each value
separated by a separator.
Text.PadEnd Returns a text value padded at the end with pad to make it at least length
characters.
Text.PadStart Returns a text value padded at the beginning with pad to make it at least length
characters. If pad is not specified, whitespace is used as pad.
Text.Proper Returns a text value with first letters of all words converted to uppercase.
Text.Repeat Returns a text value composed of the input text value repeated a number of
times.
Text.Split Returns a list containing parts of a text value that are delimited by a separator
text value.
Text.SplitAny Returns a list containing parts of a text value that are delimited by any separator
text values.
Text.TrimEnd Removes any occurrences of the characters specified in trimChars from the end
of the original text value.
Text.TrimStart Removes any occurrences of the characters in trimChars from the start of the
original text value.
NUMBER DATA TYPE
You might think that working with numbers would be so simple we’d hardly need to talk about them. However,
there’s a gotcha that can bite: if you’re not careful, you can end up with arithmetic not producing the results
you expect! After we go over M’s syntax for numeric literals, we’ll talk about this potential pain point and how
to keep it from causing unexpected complications.
Also, in M, columns can be tagged to identify the specific kind of numbers they contain. Properly setting this
subtype can improve performance and storage as well as enhance the default formatting used for the
column’s values. We’ll learn how to do this (it’s easy!).
LITERAL SYNTAX
Literal numbers can be typed out as whole numbers:
0
5
-7
Decimals:
2.5
0.5
.5
In exponential form:
1.2e10
1.2e-5
1.2E-5
…and using hexadecimal syntax:
0xFF
0xff
0XFF
0Xff
Above, notice how both the exponent indicator (the “e”) and the characters used in the hexadecimals are case-
insensitive. Upper- and lower-case are allowed and behave identically.
In the case of numbers with decimal points, no digits are required before the decimal point but at least one is
required after it.
.5 // valid - no digit before the decimal
.2e25 // valid - no digit before the decimal
5. // invalid - must have at least one digit after the decimal
2.e25 // invalid - must have at least one digit after the decimal
Special Numbers
In addition to ordinary numbers, the special “numbers” infinity and not-a-number are supported.
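These are written with the literals #infinity and #nan:
#infinity // positive infinity, e.g. the result of 1/0
#nan // not-a-number, e.g. the result of 0/0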
Negation
Numbers can be negated using the unary minus operator—a fancy way of saying that a number can be changed
to its negative by preceding it with a minus sign.
-5
-5e-2
-#infinity // produced by an expression like -1/0
(Note: Syntax-wise, it’s permissible to write -#nan. However, this won’t negate not-a-number; instead, the ‘-’
will quietly be ignored. This makes sense, as not-a-number is a sign-less concept.)
Just a Style
The various styles used to type out numbers are just that: styles. The different options are offered as a
convenience to the mashup author. Each style’s syntax creates an expression that produces a numeric value.
If two expressions evaluate to the same number, their values are equal—even if the expressions are written
using different syntax styles.
Both the arithmetic (+, -, *, /) and equality (=) operators always use double precision. Double
precision sacrifices some accuracy for the sake of efficiency. Decimal precision—potentially slower but
producing more accurate results—is available but must be explicitly requested, when desired.
“Wait a minute!” you might say. “I thought computers don’t lie. This just doesn’t make sense. Are you saying
that two numbers added together might not always equal the sum of those two numbers?!”
What should 0.1 + 0.2 equal? 0.3, correct?! What about a very large number plus 1—shouldn’t it equal that
very large number plus 1?! Well, neither is necessarily so with double precision:
0.1 + 0.2 // 0.30000000000000004
10000000000000000 + 1 // 10000000000000000
In contrast, decimal precision produces the expected, exactly accurate totals.
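To request decimal precision, use the arithmetic Value functions (Value.Add, Value.Subtract, Value.Multiply,
Value.Divide) and pass Precision.Decimal as the optional precision argument. A brief sketch:
Value.Add(0.1, 0.2, Precision.Decimal) // 0.3
Value.Add(10000000000000000, 1, Precision.Decimal) // 10000000000000001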
Remember back in grade school, you learned that 1/3 can’t be precisely represented as a finite decimal. It’s
0.33333333…, with the threes repeating forever and ever. Since we can’t write an infinite number of threes, if
we’re going to work with 1/3 in decimal form we have to pick a level of accuracy that’s good enough for the
situation at hand. For example, if an item is priced as 3 for a dollar, we could represent its price in decimal form
as $0.33.
Did you notice the precision loss? Each item costs 1/3 dollar. However, if you multiply $0.33 times 3, the result
isn’t $1. Instead, it’s $0.99. A loss of $0.01 occurred. This loss could have been avoided if we worked with the
price in its fractional form (1/3 dollar * 3 = exactly $1). However, it’s often more convenient to work with
monetary amounts in decimal form—it’s often so much more convenient that we’re okay with tolerating a
small loss in accuracy so that we can do it.
Notice that the loss of accuracy wasn’t because we were careless in how we did arithmetic. Instead, the
inaccuracy stemmed from the fact that the number system we used (decimal) couldn’t precisely represent the
fractional value 1/3. To work around this limitation, an approximation was used (0.33). While all the math done
on the approximation was 100% correct and the result was 100% predictable, the result was not 100% accurate
because it was based on an approximation.
It’s much the same with computers. In a nutshell, just as we can use fractions or decimals to represent numbers,
computers also have multiple ways to handle numbers. Our choice between decimals and fractions involved
deciding between convenience and exact accuracy. With computers, the options are different, because computers
think differently than the human brain, but the same trade-off applies: convenience vs. accuracy. Since Power
Query mashups are executed by computers, the same choice applies to working with numbers in M.
Specifically, in M, double precision (which conforms to the IEEE Standard for Floating-Point Arithmetic [IEEE
754-2008]) is more convenient for the computer to work with, offering the potential for better computational
performance and more efficient storage. Decimal precision is more accurate.
(For reference: Microsoft Excel does all its math at double precision, with an extra limit imposed on the number
of significant digits. If you are an Excel user, you’ve already been living in, and likely operating successfully in,
a world with double precision ramifications, whether or not you’ve realized it.)
Which to Choose?
Perhaps it would help to think of it this way: If you’re comparing the distance between various stars, does being
exactly accurate to the smallest nuance matter? Likely not, because that level of accuracy is probably irrelevant
considering the scale of the numbers you’re working with (trillions of miles). Given the context, losing a little
preciseness to gain improved efficiency is probably a reasonable choice. On the other hand, you probably don’t
want your bank sacrificing a little accuracy for improved efficiency when they calculate the balance of your
bank account. In this context, accuracy is paramount, regardless of the efficiency cost.
In M, this accuracy is achieved by using Value functions to mandate decimal precision (as demonstrated above).
The necessity of using Value functions to achieve decimal precision applies even when the numbers at hand are
already stored as decimals or as non-floating-point numbers. The standard arithmetic and comparison operators
convert their inputs to double precision, regardless of the source arguments’ storage precision. If you want
decimal precision for operations, the only way to get it is to specify it!
Back to the example of adding 0.1 + 0.2: Let’s pretend these values represent dollar amounts where only at
most the first two digits after the decimal point matter. By rounding away the irrelevant digits, we can convert
the long ugly result of the double precision computation into the 100% accurate result we expected.
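For instance, using Number.Round to keep two decimal places:
Number.Round(0.1 + 0.2, 2) // 0.3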
Let’s jump back to the example of representing 1/3 as a decimal. What if we convert this fraction to $0.333
instead of $0.33? Multiplying $0.333 produces $0.999 which rounded to two digits is $1—the exactly accurate
total we were looking for.
Nice…but now try this: Multiply 10 items times the $0.333 price. The total is $3.33. Add ten more, then yet 10
more. The sum is $9.99 ($3.33 + $3.33 + $3.33). However, that sum represents 30 items at 1/3 dollar each,
which should equal a total of exactly $10. Ouch! Rounding didn’t help us here.
Rounding is an option to keep in your back pocket for when it’s helpful—but be careful that you don’t treat it
as a “one size fits all, solves all problems with double precision math” solution. Using it can be excellent—if you
understand exactly how it will behave in your context. Otherwise, using it can bite!
In the Power Query editor, a column’s number type is chosen from the data type dropdown. Decimal Number
corresponds with M’s base number type; the other numeric options are subtype claims (or facets) of type
number. Choosing one of the dropdown’s options tags the column as containing values of the specific type or
pseudo-subtype and converts the data in that column to align with the chosen type or subtype claim.
For example, if you specify that a column is Whole Number, the column will be tagged as containing whole
numbers and any numbers in that column with decimal components will be rounded (e.g. 1.75 becomes 2).
However, the values in the column will still be of type number; “whole number” (technically, Int64) is just a
claim, not a true data type from the mashup engine’s perspective.
This type/pseudo-subtype tagging can have (significant) benefits outside of Power Query. When the data
output by your mashup is handed to the tool hosting Power Query (e.g. Microsoft Excel, Microsoft Power BI or
Microsoft SSIS), that tool can use this type information to better handle the values in the column, potentially
leading to more efficient storage and calculations as well as improved styling of the column’s values. In the
case of the latter, the host environment might prefix a dollar sign (or other culturally-appropriate currency
indicator) to numbers from a column tagged as currency.
Best Practice Recommendation: Ensure that table columns containing numbers have their type set to the most
specific applicable number type or subtype claim.
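In the query editor, this is done with the column data type dropdown; in M code, the same tagging can be
expressed with Table.TransformColumnTypes. A minimal sketch (Source and Quantity are placeholder names,
not from this handout):
Table.TransformColumnTypes(Source, {{"Quantity", Int64.Type}}) // tags the Quantity column as whole number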
EXERCISE
Please study the Number functions https://learn.microsoft.com/en-us/powerquery-m/number-functions
INFORMATION
Name Description
Byte.From Returns an 8-bit integer number value from the given value.
Int8.From Returns a signed 8-bit integer number value from the given value.
Int16.From Returns a 16-bit integer number value from the given value.
Int32.From Returns a 32-bit integer number value from the given value.
Int64.From Returns a 64-bit integer number value from the given value.
ROUNDING
Name Description
Number.RoundDown Returns the largest integer less than or equal to a number value.
Number.RoundUp Returns the smallest integer greater than or equal to a number value.
OPERATIONS
Name Description
Number.Combinations Returns the number of combinations of a given number of items for the optional
combination size.
Number.IntegerDivide Divides two numbers and returns the whole part of the resulting number.
Number.Mod Divides two numbers and returns the remainder of the resulting number.
Number.Permutations Returns the number of total permutations of a given number of items for the
optional permutation size.
Number.Sign Returns 1 for positive numbers, -1 for negative numbers or 0 for zero.
RANDOM
Name Description
Number.RandomBetween Returns a random number between the two given number values.
TRIGONOMETRY
Name Description
BYTES
Name Description
Number.BitwiseAnd Returns the result of a bitwise AND operation on the provided operands.
Number.BitwiseNot Returns the result of a bitwise NOT operation on the provided operands.
Number.BitwiseShiftLeft Returns the result of a bitwise shift left operation on the operands.
Number.BitwiseShiftRight Returns the result of a bitwise shift right operation on the operands.
Number.BitwiseXor Returns the result of a bitwise XOR operation on the provided operands.
DATE
Type date holds, well, can you guess? A date!
#date(2018, 4, 27) // year, month, day - April 27, 2018
#date(3000, 12, 4) // December 4, 3000
Years between 1 and 9999 are supported. While date is great for a long way into the future, it’s not usable for
dates before Christ (B.C.).
#date(-25, 6, 10) // not allowed - year can’t be before the first century AD
TIME
Type time is used for time values (no surprise here!). Fractions of a second are supported, down to a 100-
nanosecond level of precision.
#time(11, 15, 25) // hh, mm, ss — 11:15:25 AM
#time(13, 0, 0) // 1:00:00 PM
#time(13, 0, 0.53257) // 1:00:00.5325700 PM
Keep in mind that a time value is different from a duration value (which we’ll talk about shortly).
Type time represents a moment in time, a value that can be displayed on the face of a 24-hour
clock. duration represents a quantity of time. A time of 1:00 AM represents, well, exactly that, 1:00 AM. In
contrast, a duration of 1:00 represents the fact that one hour elapsed since the start—but without telling you
when that start was.
time’s range is from the stroke of midnight through the stroke of midnight 24 hours later.
#time( 0, 0, 0) // 12:00:00 AM (stroke of midnight)
#time(24, 0, 0) // 12:00:00 AM (stroke of midnight, 24 hours after starting)
Distinct Midnights?
Both 00:00 and 24:00 refer to midnight. Both refer to exactly the same point on the clock face. So, in one sense,
they are one and the same.
However, from the human perspective, sometimes we want to differentiate between the two. We sometimes
use 24:00 to refer to midnight when, from our perspective, it ends the day and 00:00 to refer to midnight when,
again from our perspective, it begins the day. Say, you want to describe the fact that you worked from 10 pm
to midnight using a 24-hour clock. You might say you worked from 22:00 to 24:00. On the other hand, if you
wanted to say you worked from midnight to 2 AM, you’d probably say 00:00 to 02:00. In this sense, the times
00:00 and 24:00 are different.
So, 00:00 and 24:00 are both the same and different. This paradox spills over into how M handles the two.
In M, both #time(0, 0, 0) and #time(24, 0, 0) refer to the point on the clock face where hour = 0, minute
= 0 and second = 0.
Time.ToRecord(#time( 0, 0, 0)) // [Hour = 0, Minute = 0, Second = 0]
Time.ToRecord(#time(24, 0, 0)) // [Hour = 0, Minute = 0, Second = 0]
However, the two values are not equal and, when converted to numbers, return different values:
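#time( 0, 0, 0) = #time(24, 0, 0) // false
Number.From(#time( 0, 0, 0)) // 0
Number.From(#time(24, 0, 0)) // 1
(The 0 and 1 results follow from the OADate mapping described later, where time’s range runs from 0 up to—but
not including—1, with the 24:00 form mapping to 1.)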
Keep this in mind when comparing time values. If the two midnights should be distinct, the equality operator
will serve you well (as demonstrated by the previous example). If the two should be treated as equivalent,
you’ll need to use more involved logic—perhaps something like:
Time.ToRecord(#time( 0, 0, 0)) = Time.ToRecord(#time( 0, 0, 0)) // true
Time.ToRecord(#time( 0, 0, 0)) = Time.ToRecord(#time(24, 0, 0)) // true
Time.ToRecord(#time(24, 0, 0)) = Time.ToRecord(#time(24, 0, 0)) // true
DATETIME
Combine the ideas of date and time together and you get type datetime.
#datetime(2018, 4, 30, 15, 30, 15) // yyyy, mm, dd, hh, mm, ss -- April 30,
2018 3:30:15 PM
The time part of datetime differs in behavior from type time in one aspect. time supports the special time of
24 hours, 0 minutes, 0 seconds; datetime doesn’t. With datetime, to indicate a point in time exactly 24 hours
after the start of a day, simply set the datetime to the start of the next day.
DATETIMEZONE
datetimezone takes the idea of datetime and adds a time zone to it. The time zone is defined as an offset of
hours and minutes from UTC, not as a friendly name like “Eastern Standard Time” or “Australian Central Time.”
#datetimezone(2018, 4, 30, 15, 30, 15, 4, 0) // yyyy, mm, dd, hh, mm, ss,
hours +/- UTC, minutes +/- UTC -- 4/30/2018 3:30:15 PM +04:00
DURATION
duration represents a quantity of time. It doesn’t hold a specific date or time. Instead, it represents an amount
of elapsed time—days, hours, minutes and seconds (which is also the argument order of the #duration
constructor). A duration initialized as 240 minutes is identical to a duration that’s initialized as 4 hours because
duration knows that 60 minutes equals 1 hour.
#duration(0, 4, 0, 0) // 4h
#duration(0, 0, 240, 0) // the resulting duration is identical to the previous
because it represents the same quantity of time
When initializing, positive and negative argument values can even be combined to produce a duration equal
to their sum:
#duration(2, -24, 0, 0) // duration of 1 day ((2 days) + (-24 hours) = 1 day)
#duration(0, 1, -5, 0) // duration of 55 minutes ((1 hour) + (-5 minutes) = 55
minutes)
COMMONALITIES
Now that we’ve met the entire family—the date/time siblings ( date, time, datetime and datetimezone) and
their cousin duration—let’s look at behaviors and traits that are shared between more than one member of
the family.
COMBINATION
Use the combination operator between a date and a time and what do you end up with? Why, a datetime, of
course!
#date(2018, 4, 30) & #time(15, 30, 10) // datetime of April 30, 2018 3:30:10 PM
Handy when you want to combine a date column and a time column into a datetime column.
CONVERSION
Where it makes sense, date/time sibling types can be converted to other sibling types:
DateTime.From(#datetimezone(2018, 5, 30, 0, 0, 0, 0, 0)) // May 29, 2018 7:00:00 PM
Above, the value output is 5 hours earlier than the input because the output is relative to the local time zone
offset and that offset is five hours earlier than the input’s offset.
MATH
Where it makes sense, the arithmetic operators can be used with temporal values.
Addition
Add together a date/time sibling and a duration and the result is a value that’s the same type as the date/time
sibling type you started with, just with the duration added:
#date(2018, 6, 1) + #duration(1, 2, 0, 0) // June 2, 2018
#time(13, 5, 25) + #duration(0, 0, 0, 35) // 1:06:00 PM
#datetime(1000, 10, 25, 6, 13, 0) + #duration(0, -6, 0, 0) // October 25, 1000 12:13:00 AM
#datetimezone(1000, 10, 25, 6, 13, 0, 0, 0) + #duration(0, -4, 0, 0) // October 25, 1000 2:13:00 AM +00:00
When you add a time and a duration, it may help to think of time as a 24-hour clock face and duration as
spinning the hands on that clock face forward (or backward) the amount of time specified by the duration.
#time(22, 0, 0) + #duration(0, 4, 0, 0) // 2:00 AM
#time( 4, 0, 0) + #duration(2, 0, 0, 0) // 4:00 AM
Notice in the second example, the result is exactly the same value as the initial time. Why? Adding 2 days to
the time of 4:00 AM causes time’s pretend clock hands to move forward 24 hours for the addition of the first
day then 24 hours for the second day. After all that forward motion, the clock hands end up resting on exactly
the same hour, minute and second where they started: 4:00 AM.
Since addition is commutative, whether the date/time sibling or duration comes first doesn’t matter.
Subtraction
duration can also be subtracted from date/time siblings. The returned value will be of the same type as the
date/time sibling in the expression. This time, order matters: duration can be subtracted from a date/time
sibling, but not the other way around.
#time(13, 5, 25) - #duration(0, 0, 0, 25) // 1:05:00 PM
#duration(0, 0, 0, 25) - #time(13, 5, 25) // not allowed -- date/time sibling
cannot be subtracted from duration.
If, instead of subtracting a duration from a date/time sibling, you subtract one date/time sibling from another
value of the same type, what do you end up with? Why, a duration describing the difference between the two
values!
#date(2018, 8, 10) - #date(2018, 8, 5) // duration of 5 days
#time(12, 0, 0) - #time(14, 0, 0) // duration of -2 hours
#datetimezone(2018, 10, 5, 16, 0, 0, 4, 0) - #datetimezone(2018, 10, 5, 15, 0,
0, -4, 0) // duration of -7 hours
In the last example, notice how the resulting duration properly accounted for the time zone offsets.
A duration can be added to or subtracted from another duration.
#duration(1, 5, 0, 0) + #duration(0, 0, 25, 0) // 1d 5h 25m
#duration(1, 5, 0, 0) - #duration(0, 0, 25, 0) // 1d 4h 35m
STRING FORMATTING
Converting a temporal value to text produces a string using a default format.
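For example (the exact output depends on the system’s current culture; the result shown assumes en-US):
Date.ToText(#date(2010, 12, 31)) // 12/31/2010
A format string and a culture can also be passed to control the output: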
Date.ToText(#date(2010, 12, 31), "m", "fr-FR") // 31 décembre
Date.ToText(#date(2010, 12, 31), "m", "ja-JP") // 12 月 31 日
(Duration.ToText doesn’t accept a culture argument. This makes sense since duration’s formatting options
are culture-agnostic.)
One way to ensure that values are rendered consistently across systems is to indicate the reference culture to
use, as we did above. Another option is to use a format string that is culture-agnostic. Format string “o” is one
such string. Using it results in a datetime string being output in a format that stays the same regardless of the
system’s current culture configuration.
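A sketch of the round-trip format in use (assuming the format-string overload of DateTime.ToText):
DateTime.ToText(#datetime(2010, 12, 31, 1, 30, 25), "o") // 2010-12-31T01:30:25.0000000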
An OADate uses a single number to indicate how many days, including fractions of a day, it is ahead or behind
a reference point. An OADate of 2.5 represents a point in time two and a half days after the reference point
while -15.75 indicates fifteen and three-quarter days before the reference point.
For date, datetime and datetimezone, the reference point is start of day December 30, 1899.
Date.From(2.75) // January 1, 1900
DateTime.From(2.75) // January 1, 1900 6:00 PM
Notice above how date ignores the fractional part of the OADate—in effect, it’s truncated off.
Since date doesn’t know anything about time, this makes sense.
Also, in the case of datetimezone, OADates are always interpreted as relative to the system’s current time
zone (which, in the below example, has an offset of -0500).
DateTimeZone.From(2.75) // January 1, 1900 6:00 PM -0500
For time, the reference point of 0 represents 12:00 AM. Time supports OADates from 0 up to (but not
including) 1.
Time.From(0.75) // 6:00 pm
Since an OADate of 1 isn’t a valid input for time, it’s not possible to create the equivalent of #time(24, 0,
0) using an OADate.
Time.From(1) // not allowed
duration’s 0 reference point is the start of the duration. A positive value indicates a positive duration and a
negative value indicates a negative duration.
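For example, Duration.From interprets a number as a quantity of days:
Duration.From(2.5) // #duration(2, 12, 0, 0)
Duration.From(-0.25) // #duration(0, -6, 0, 0)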
Converting in the other direction—from a temporal value to a number—produces the corresponding OADate:
Number.From(#date(2015, 12, 25)) // 42363
Number.From(#time(11, 32, 18.5)) // 0.4807696759259259
Number.From(#datetime(2015, 12, 25, 11, 32, 18.5)) // 42363.480769675924
Number.From(#datetimezone(2015, 12, 25, 11, 32, 18.5, -4, 0)) //
42363.397436342595
Number.From(#duration(35, 10, 15, 25.2)) // 35.427375
As we’ve seen before, when datetimezone is converted to a value that doesn’t contain a time zone offset, the
outputted value is adjusted to be relative to the local system’s time zone.
EXERCISE
Study the date functions https://learn.microsoft.com/en-us/powerquery-m/date-functions
These functions create and manipulate the date component of date, datetime, and
datetimezone values.
Name Description
Date.AddMonths Returns a DateTime value with the month portion incremented by n months.
Date.AddYears Returns a DateTime value with the year portion incremented by n years.
Date.DayOfWeek Returns a number (from 0 to 6) indicating the day of the week of the provided
value.
Date.DayOfYear Returns a number that represents the day of the year from a DateTime value.
Date.DaysInMonth Returns the number of days in the month from a DateTime value.
Date.FromText Creates a Date from local, universal, and custom Date formats.
Date.IsInCurrentDay Indicates whether the given datetime value dateTime occurs during the
current day, as determined by the current date and time on the system.
Date.IsInNextDay Indicates whether the given datetime value dateTime occurs during the next
day, as determined by the current date and time on the system.
Date.IsInNextNDays Indicates whether the given datetime value dateTime occurs during the next
number of days, as determined by the current date and time on the system.
Date.IsInNextNMonths Indicates whether the given datetime value dateTime occurs during the next
number of months, as determined by the current date and time on the
system.
Date.IsInNextNQuarters Indicates whether the given datetime value dateTime occurs during the next
number of quarters, as determined by the current date and time on the
system.
Date.IsInNextNWeeks Indicates whether the given datetime value dateTime occurs during the next
number of weeks, as determined by the current date and time on the system.
Date.IsInNextNYears Indicates whether the given datetime value dateTime occurs during the next
number of years, as determined by the current date and time on the system.
Date.IsInPreviousDay Indicates whether the given datetime value dateTime occurs during the
previous day, as determined by the current date and time on the system.
Date.IsInPreviousNDays Indicates whether the given datetime value dateTime occurs during the
previous number of days, as determined by the current date and time on the
system.
Date.IsInPreviousNMonths Indicates whether the given datetime value dateTime occurs during the
previous number of months, as determined by the current date and time on
the system.
Date.IsInPreviousNQuarters Indicates whether the given datetime value dateTime occurs during the
previous number of quarters, as determined by the current date and time on
the system.
Date.IsInPreviousNWeeks Indicates whether the given datetime value dateTime occurs during the
previous number of weeks, as determined by the current date and time on
the system.
Date.IsInPreviousNYears Indicates whether the given datetime value dateTime occurs during the
previous number of years, as determined by the current date and time on
the system.
Date.IsLeapYear Returns a logical value indicating whether the year portion of a DateTime
value is a leap year.
Date.QuarterOfYear Returns a number between 1 and 4 for the quarter of the year from a
DateTime value.
Date.WeekOfMonth Returns a number for the count of week in the current month.
Date.WeekOfYear Returns a number for the count of week in the current year.
Use these functions to build the DateDimension table.
LOGICAL TYPE
Type logical stores Boolean values: true and false. Not very exciting, technically speaking, but very
important.
true
false
This type is so simple, there’s not much to say about it. Out of the box, when converting values to type logical,
the string “false” converts to false and “true” converts to true (imagine that!). On the number front, 0
translates to false while any other number is turned into true.
Logical.FromText("true") // true
Logical.FromText("false") // false
Logical.FromText("something else") // error - not allowed
Logical.From(0) // false
Logical.From(1) // true
Logical.From(-199) // true
Other values can easily be converted to logical using a simple comparison or if statement.
value = "T" // converts "T" -> true and all other values -> false
if value = "T" then true else if value = "F" then false else null // converts
"T" -> true, "F" -> false and everything else -> null
NULL
Null represents the absence of a value (or an unknown value or an indeterminate value). If null represents a
value that’s not known, is null actually a value after all? (See what I mean? It’s hard to describe.)
Thankfully, we can leave such deep ponderings to the philosophers and computer language theorists. However,
there is a practical aspect to this quandary. How should operators handle null? For example, if two nulls are
compared together (null = null), should the result be true, because identical values are being compared, or
null, because those values represent unknown and comparing unknown with unknown arguably equals an
unknown result?
As you can see, there are at least a couple reasonable ways an operator can handle null. Since an operator
can’t support multiple behaviors simultaneously, language designers must choose between the various
possible behaviors when deciding how a particular operator will work.
In M’s case, direct equality comparisons (operators = and <>) when an argument is null evaluate to true or
false:
null = null // true
null <> null // false
1 = null // false
null <> 1 // true
Using and with a null returns null unless the other argument is false, in which case false is returned.
When or is used with a null, null is returned unless the other argument is true, in which case the result is true.
null and null // null
null and true // null
null and false // false
null or null // null
null or true // true
null or false // null
If null is used as an argument to almost any other operator, including less than or equal to ( <=) and greater
than or equal to (>=), the result is null.
1 > null // null
1 >= null // null
null < null // null
null <= null // null
10 + null // null
null - 16.3 // null
null * 25 // null
8 / null // null
"abc" & null & "def" // null
(The exceptions to “almost any other operator” are is and meta—advanced operators related to getting
information about a value vs. working directly with the value.)
Take the last line of the preceding example. Let’s say you want to concatenate a couple strings and a variable
such that the strings are still combined even when the variable is null. One way to achieve this is to check
whether the variable holds a null. If it does, replace the null with a blank string before concatenating with it.
let
value = null,
NullToBlank = (input) => if (input = null) then "" else input
in
"abc" & NullToBlank(value) & "def" // "abcdef"
Another situation where you might want different null behavior has to do with less than and greater than
comparisons. In M, if a null is compared using a relational operator (>, >=, <, <=), the result is null. This
makes sense from the perspective that it’s not possible to know if an unknown value is greater than or less
than another value because the one value is unknown. However, another valid way of handling this situation
is to rank null values as less than all non-null values.
If you prefer this behavior, you can use Value.Compare to do the comparison. This library function returns 0
if the compared values are equal, -1 if the first value is less than the second and 1 if the first value is greater
than the second. Unlike the relational operators, with this method null is ranked as less than all non-null values.
Value.Compare(1, 1) // 0 (equal)
Value.Compare(10, 1) // 1 (first value greater than second)
Value.Compare(10, 100) // -1 (first value less than second)
null > 1 // null
Value.Compare(null, 1) // -1
null = null // true
Value.Compare(null, null) // 0
"a" < null // null
Value.Compare("a", null) // 1
Before leaving alternative null handling options, there’s one more possibility we’ll consider. Out of the
box, null = null evaluates to true. If you’d rather have null = null evaluate to null,
try Value.NullableEquals.
null = null // true
Value.NullableEquals(null, null) // null
BINARY
You’ll typically see type binary when working with files. Usually, you’ll use a library method (or a chain of
methods) to transform the binary value into something more convenient to process, like a table.
If, for some reason, you want to literally type out a binary value, doing so is easy enough. Both lists of numbers
(integer or hexadecimal) and base 64 encoded text values are supported.
Below, we see the same two bytes written out using three syntaxes.
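A sketch of what these literals look like, using the #binary constructor (the two bytes shown, 0x00 and 0x01,
are illustrative):
#binary({0x00, 0x01}) // list of numbers written in hexadecimal
#binary({0, 1}) // list of numbers written as integers
#binary("AAE=") // base 64 encoded text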
The standard library contains a number of functions for working with binary values. As you might expect, there
are methods that convert values to and from binary. You can compress and uncompress using gzip and deflate.
There’s also a method that attempts to extract content type and, in some cases, encoding and potential CSV
delimiter information (might be useful, say, if you want to find all text files in a folder when they don’t all have
.txt extensions). There’s even a family of functions that can be used to define a custom format parser, for the
odd case when you need to parse a binary value that no library function understands.
PART II ADVANCED
LIST
Type list stores exactly what its name implies: a list of values.
{ 1, 2, 5, 10 }
{ "hello", "hi", "good bye" }
As you might expect, a list can be empty:
{ }
Out of the box, a list’s values don’t need to be of the same type.
Since a list can contain values and a list is itself a value, a list can contain lists which in turn can contain lists
and so forth.
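For example (an illustrative list mixing value types and nesting another list):
{ 1, "two", true, { 3, 4 } }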
Lists can be compared for equality. Two lists are equal when they contain equal items in the same order.
{ 1, 2 } = { 1, 2 } // true
{ 1, 2 } = { 1, 2, 3} // false
{ 4, 5 } = { 5, 4 } // false -- same values but different order
{ 2, 4 } = { 2, 4 } // true
{ 2, 4 } <> { 2, 4 } // false
Greater than (>) and less than (<) comparisons are not supported on lists.
ITEM ACCESS
List items can be accessed using the positional index operator. Simply take a reference to the list of interest
and append the index of the desired list item surrounded by curly braces: SomeList{SomeIndex}.
In M, list indexes (or indices, if you prefer) are 0-based, meaning that the first list item is at index 0, the second
item is at index 1, and so forth. So, to access the first item, use an index of zero.
Assuming Values = { 10, 20, 30 }, the following expressions produce the indicated output:
Values{0} // 10 -- value of 1st item in list
Values{1} // 20 -- value of 2nd item in list
Values{2} // 30 -- value of 3rd item in list
If an attempt is made to access an index larger than what’s in the list, an error is returned.
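Appending a ? to the selection makes it optional, changing the not-found behavior from an error to returning
null (the same pattern reappears later with records and tables):
Values{3} // error -- the list has no item at index 3
Values{3}? // null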
LAZY EVALUATION
Lists are evaluated lazily. The below list doesn’t produce an error, even though the value of an item in it is
defined as an expression that raises an error. We didn’t ask for the value of that item, so no attempt was made
to generate its value. Since the error-raising expression was never invoked, no error was raised.
let
Data = { 1, 2, error "help", 10, 20 }
in
List.Count(Data) // 5
All we asked was “how many items are in the list?,” and that’s all M figured out. Whether or not they are all
valid is a different question, one we didn’t ask and one M’s mashup engine didn’t try to answer.
Also, when values are needed, M’s laziness means that it only evaluates as many list items as are necessary to
produce the requested output. Using Data from the above example, the following expressions do not raise
errors. Neither needs the value of index 2, so the error that would be raised if that item’s expression were
evaluated isn’t raised.
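A sketch of two such expressions (neither touches the item at index 2):
Data{0} // 1
List.FirstN(Data, 2) // { 1, 2 }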
LIBRARY HIGHLIGHTS
As you might expect, the standard library includes a number of methods for working with lists. They cover tasks
from counting items to searching for text, from doing math on a list (sum, product, etc.) to transforming a list
(e.g. remove items, replace items, reverse, etc.), from generating statistics (e.g. average, max, standard
deviation) to testing membership (like “does it contain this value?” or “do all values in the list cause the
provided function to return true?”), as well as supporting set operations (union, intersect, difference, zip) and
sorting. There’s even a family of functions for generating lists of values of other types (handy, say, if you want
a sequential list of datetimes or durations or maybe a list of random numbers, etc.).
REFERENCE
Please study the list functions https://learn.microsoft.com/en-us/powerquery-m/list-functions
Information
Name Description
Selection
Name Description
List.Alternate Returns a list with the items alternated from the original list based on a count,
optional repeatInterval, and an optional offset.
List.Buffer Buffers the list in memory. The result of this call is a stable list, which means it will
have a deterministic count and order of items.
List.Distinct Filters a list down by removing duplicates. An optional equation criteria value can be
specified to control equality comparison. The first value from each equality group is
chosen.
List.FindText Searches a list of values, including record fields, for a text value.
List.First Returns the first item in the list, or the optional default value if the list is empty. If the list is
empty and a default value is not specified, the function returns null.
List.FirstN Returns the first set of items in the list by specifying how many items to return or a
qualifying condition provided by countOrCondition.
List.InsertRange Inserts items from values at the given index in the input list.
List.Last Returns the last item in the list, or the optional default value if the list is empty. If the list is
empty and a default value is not specified, the function returns null.
List.LastN Returns the last set of items in a list by specifying how many items to return or a
qualifying condition.
List.Single Returns the single item of the list or throws an Expression.Error if the list has more
than one item.
List.Skip Skips the first item of the list. Given an empty list, it returns an empty list. This
function takes an optional parameter countOrCondition to support skipping multiple
values.
Transformation functions
Name Description
List.Accumulate Accumulates a result from the list. Starting from the initial value seed this
function applies the accumulator function and returns the final result.
List.RemoveRange Returns a list that removes count items starting at offset. The default count is
1.
List.RemoveFirstN Returns a list with the specified number of elements removed from the list
starting at the first element. The number of elements removed depends on
the optional countOrCondition parameter.
List.RemoveItems Removes items from list1 that are present in list2, and returns a new list.
List.RemoveLastN Returns a list with the specified number of elements removed from the list
starting at the last element. The number of elements removed depends on
the optional countOrCondition parameter.
List.Repeat Returns a list that repeats the contents of an input list count times.
List.ReplaceRange Returns a list that replaces count values in a list with a replaceWith list
starting at an index.
List.ReplaceMatchingItems Replaces occurrences of existing values in the list with new values using the
provided equationCriteria. Old and new values are provided by
the replacements parameters. An optional equation criteria value can be
specified to control equality comparisons. For details of replacement
operations and equation criteria, go to Parameter values.
List.ReplaceValue Searches a list of values for the value and replaces each occurrence with the
replacement value.
List.Split Splits the specified list into a list of lists using the specified page size.
List.Transform Performs the function on each item in the list and returns the new list.
List.TransformMany Returns a list whose elements are projected from the input list.
Membership functions
Since all values can be tested for equality, these functions can operate over
heterogeneous lists.
Name Description
List.PositionOf Finds the first occurrence of a value in a list and returns its position.
List.PositionOfAny Finds the first occurrence of any value in values and returns its position.
Set operations
Name Description
List.Difference Returns the items in list 1 that do not appear in list 2. Duplicate values are supported.
List.Intersect Returns a list from a list of lists and intersects common items in individual lists. Duplicate
values are supported.
List.Union Returns a list from a list of lists and unions the items in the individual lists. The returned
list contains all items in any input lists. Duplicate values are matched as part of the Union.
Ordering
Ordering functions perform comparisons. All values that are compared must be comparable with each
other. This means they must all come from the same datatype (or include null, which always compares
smallest). Otherwise, an Expression.Error is thrown.
Number
Duration
DateTime
Text
Logical
Null
Name Description
List.Max Returns the maximum item in a list, or the optional default value if the list is empty.
List.MaxN Returns the maximum values in the list. The number of values to return or a filtering
condition must be specified.
List.Min Returns the minimum item in a list, or the optional default value if the list is empty.
List.MinN Returns the minimum values in a list. The number of values to return or a filtering condition
may be specified.
List.Percentile Returns one or more sample percentiles corresponding to the given probabilities.
Averages
These functions operate over homogeneous lists of Numbers, DateTimes, and Durations.
Name Description
List.Average Returns an average value from a list in the datatype of the values in the list.
List.Modes Returns all items that appear with the same maximum frequency.
Addition
Name Description
Numerics
Name Description
Generators
Name Description
List.Dates Returns a list of date values of size count, starting at start and adding an increment
to every value.
List.DateTimes Returns a list of datetime values of size count, starting at start and adding an
increment to every value.
List.DateTimeZones Returns a list of datetimezone values of size count, starting at start and adding an
increment to every value.
List.Durations Returns a list of duration values of size count, starting at start and adding an
increment to every value.
List.Numbers Returns a list of numbers of size count, starting at initial, and adds an increment.
The increment defaults to 1.
List.Random Returns a list of count random numbers, with an optional seed parameter.
RECORD
A record allows a set of named fields to be grouped into a unit.
[ FirstName = "Joe", LastName = "Smith", Birthdate = #date(2010, 1, 2) ]
Technically, a record preserves the order of its fields. However, as we’ll see in a moment, field order isn’t
considered when comparing records, so mostly this preservation of field order is a convenience for humans
(e.g. fields will be output on screen in the same order you defined them, making it easier for you to visually
locate data items of interest).
Like a list, a record can be empty:
[ ]
Equality is determined by field name and value. Field position is not considered.
[ a = 1, b = 2] = [a = 1, b = 2] // true
[ a = 1, b = 2] = [b = 2, a = 1 ] // true -- same field names and values, even
though ordering is different
Two records can be merged into one using the combination operator (&):
[ a = 1 ] & [ b = 2 ] // [ a = 1, b = 2 ]
If the same field name is present in both merge inputs, the value associated with the field from the record on
the right is used.
[ a = 1 ] & [ a = 10 ] // [ a = 10 ]
FIELD ACCESS
Remember how lists use {index} to access list items? With records, something similar is used—the lookup
operator, which consists of the field name inside square brackets: SomeRecord[SomeField]
If Value = [ Part = 1355, Description = "Widget", Price = 10.29, Internal Cost = 8.50 ] then
the following expressions will return the noted values:
Value[Part] // 1355
Value[Description] // "Widget"
Value[Price] // 10.29
Value[Internal Cost] // 8.50
Similar to list, appending a ? to the lookup operator changes its not-found behavior from an error to returning
null (technically, this is called “performing an optional field selection”).
Value[NonExistentField] // error - Expression.Error: The field
'NonExistentField' of the record wasn’t found.
Value[NonExistentField]? // null
Within a record, the expression for a field value can reference other fields.
[
FirstName = "Sarah",
LastName = "Smith",
FullName = FirstName & " " & LastName
]
A field’s expression can even reference itself if its name is proceeded by the scoping operator ( @).
“Why would a field want to reference itself?” you might ask. This behavior may not seem intuitive in the context
of a field containing a data value. However, the ability to self-reference comes in handy when the value is a
function because it allows the function to be recursive.
[
AddOne = (x) => if x > 0 then 1 + @AddOne(x - 1) else 0,
AddOneThreeTimes = AddOne(3)
][AddOneThreeTimes] // 3
PROJECTION
In addition to square brackets being used to select record fields, they can also be used to perform record
projection—that is, reshaping a record to contain fewer fields. Below are a couple examples (assume
that Source = [ FieldA = 10, FieldB = 20, FieldC = 30 ]):
Source[[FieldA], [FieldB]] // [ FieldA = 10, FieldB = 20 ] -- FieldC was
removed
Source[[FieldC]] // [ FieldC = 30 ] -- all fields except C were removed
Similar to when [] are used for field selection, with projection, referencing a non-existent field causes an error.
However, if a ? is appended, any non-existent fields referenced by the projection expression will be added to
the output with their values set to null.
Source[[FieldA], [FieldD]] // error - Expression.Error - The field 'FieldD' of
the record wasn't found.
Source[[FieldA], [FieldD]]? // [ FieldA = 10, FieldD = null]
Outside of square brackets, the identifier Street Address needs quoting because it contains a space
and try needs quoting because it’s a keyword. Below, inside the square brackets, quoting these identifiers is
optional:
[#"try" = true, #"Street Address" = "123 Main St."]
[try = true, Street Address = "123 Main St."] // identical in effect to the
preceding
SomeRecord[#"Street Address"]
SomeRecord[Street Address] // identical in effect to the preceding
SomeRecord[#"try"]
SomeRecord[try] // identical in effect to the preceding
Note, however, that M assumes whitespace occurring at the start or end of an unquoted field name can be
ignored and so excludes it from the field name. If, for some reason, you want leading or trailing whitespace to
be a part of a field name, you’ll need to quote it.
LAZY EVALUATION & VALUE FIXING
Like list, record is lazy. If a value isn’t needed, it isn’t evaluated.
[ Price = GetValueFromRemoteServer() ]
Imagine that the first time Price is accessed, the remote server returns 10. Later on while your mashup is still
running, the record’s Price field is accessed again. Perhaps by this point in time,
invoking GetValueFromRemoteServer() would return 11. However, that method is not re-executed. Instead,
the value cached when the field was first accessed (10) is returned.
If, instead, when Price was first accessed, GetValueFromRemoteServer() raised an error due to a temporary
communications glitch, that same error will be re-raised each subsequent time Price is accessed, even if by
the time the subsequent access occurs, the glitch is resolved and GetValueFromRemoteServer() would return
a non-error value if it were invoked.
This value fixing (or caching) provides consistency. Thanks to it, you know that a field’s value will always be the
same throughout your mashup’s execution.
Value caching is not shared across record instances, even if the records have identical fields and field value
expressions. If your code causes the record [ Price = GetValueFromRemoteServer() ] to be generated
twice and Price is accessed on both instances, each will separately
invoke GetValueFromRemoteServer() once and cache the value returned. If the value returned is different
between the two invocations, the two records will have different values for Price.
If the record you are working with is assigned to a variable, each time you access that variable, you’ll access
the same record instance. However, if instead you access a record multiple times by invoking an expression
that retrieves it from an external source (e.g. a database or web service), each retrieval may technically return
a different record instance. If it’s important to be sure that you are always working with the same record
instance, retrieve it once then save it to a variable or, in the case of a list of records, buffer the list.
LIBRARY HIGHLIGHTS
In the standard library, you’ll find several functions for working with records, including methods to add,
rename, reorder and remove fields as well as to transform field values. There is also a method returning a list
of the record’s field names (with field order preserved) and a similar method returning field values.
DYNAMIC OPERATIONS
Above, we used the lookup operator to access field values by hard-coded names. What if, instead, we wanted
to use programmatic logic to choose the field to access? The following doesn’t work because field names inside
square brackets must be strings; variable references aren’t allowed.
let
Item = [Name = "Widget", Wholesale Price = 5, Retail Price = 10],
PriceToUse = "Wholesale Price"
in
Item[PriceToUse] // doesn’t work--doesn’t return the value of field "Wholesale
Price"
To solve this dilemma, the standard library comes to the rescue. Record.Field is the dynamic equivalent of
the lookup operator. Record.FieldOrDefault works like a dynamic lookup operator followed by a question
mark, with the added bonus of optionally allowing you to specify the value to be returned if the field name
doesn’t exist.
Record.Field(Item, PriceToUse) // returns 5
If, instead, PriceToUse is set to “Sale Price” (a value which doesn’t correspond with a field name), then:
Record.Field(Item, PriceToUse) // error - Expression.Error: The field 'Sale
Price' of the record wasn’t found.
Record.FieldOrDefault(Item, PriceToUse) // returns null
Record.FieldOrDefault(Item, PriceToUse, 0) // returns 0
Similarly, if we want to dynamically perform projection, Record.SelectFields is our go-to. There are also
standard library functions to remove fields (instead of projecting by listing the fields desired, specify the
undesired fields and a new record containing all of the other fields will be returned) and to reorder
fields (handy in those few cases where field order matters).
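A minimal sketch of dynamic projection, reusing the Item record from above:
Record.SelectFields(Item, {"Name", "Retail Price"}) // [ Name = "Widget", Retail Price = 10 ]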
LET SUGAR
Ready for a surprise? A let expression is, in essence, syntactic sugar for an implicit record expression.
let
A = 1,
B = 2,
Result = A + B
in
Result
Is equivalent to:
[
A = 1,
B = 2,
Result = A + B
][Result]
It might be good to pause and ponder this for a moment. This fact means that what we know about how records
work also applies to let expressions and vice versa.
For example, we know that a record field’s value is computed on first access then cached. Since let is in essence
a record expression, this same immutability rule applies to it: a let variable’s expression will be evaluated on
first access then its value will be cached. However, for let expressions, we know there’s one exception to
immutability, which occurs when streaming comes into play. This same exception must also apply to records…it
must because let and record share the same behavior.
REFERENCE
Please study the Record functions https://learn.microsoft.com/en-us/powerquery-m/record-functions
Information
Name Description
Record.HasFields Returns true if the field name or field names are present in a record.
Transformations
Name Description
Geography.ToWellKnownText Translates a structured geographic point value into its Well-Known Text
(WKT) representation.
Geometry.ToWellKnownText Translates a structured geometric point value into its Well-Known Text
(WKT) representation.
Record.RemoveFields Returns a new record that removes the given fields. If a specified field does not exist,
an exception is thrown.
Record.RenameFields Returns a new record that renames the fields specified. The resultant
fields will retain their original order. This function supports swapping
and chaining field names. However, all target names plus remaining field
names must constitute a unique set or an error will occur.
Record.ReorderFields Returns a new record that reorders fields relative to each other. Any
fields not specified remain in their original locations. Requires two or
more fields.
Selection
Name Description
Record.Field Returns the value of the given field. This function can be used to dynamically create
field lookup syntax for a given record. In that way it is a dynamic version of the
record[field] syntax.
Record.FieldOrDefault Returns the value of a field from a record, or the default value if the field does not
exist.
Record.SelectFields Returns a new record that contains the fields selected from the input record. The
original order of the fields is maintained.
Serialization
Name Description
Record.FromList Returns a record given a list of field values and a set of fields.
Record.FromTable Returns a record from a table of records containing field names and values.
Record.ToList Returns a list of values containing the field values of the input record.
Record.ToTable Returns a table of records containing field names and values from an input record.
TABLE
POSITIONAL SELECTION
With type list, remember how elements can be accessed by positional index using the selection operator
(think: curly braces)? For example myList{2} returns the third element from myList. (Reminder: M’s
indexes are zero-based, so the first element is considered to be at index 0, the second at index 1, etc.)
Type table also supports the selection operator—except with this type, the index identifies which row to
return. The identified row is returned as a record, with each column represented as a field in the record.
(Records are usually [always?] the way M returns single rows from tables.)
let
Parts = #table(
{ "Code", "Category", "PriceGroup" },//Column name
{
{ 123, "Widget", "A" },
{ 456, "Thingamajig", "B" },
{ 789, "Widget", "B" }
}
)
in
Parts{1} // returns the second row as a record: [Code = 456, Category =
"Thingamajig", PriceGroup = "B"]
(Above, library function #table is used to create a table. In this example, the first argument is a list of column
names. The second is a list of lists which defines row values: each item in the outer list corresponds to a row;
each item in the inner list corresponds with a column value for that row.)
If the requested index does not exist, the behavior is the same as with list: an error is returned but this can be
changed to null by making the selection optional (a.k.a. by appending a ? after the curly braces).
Parts{3} // Expression.Error: There weren't enough elements in the enumeration
to complete the operation.
Parts{3}? // null
The standard library contains numerous functions that work with table rows based on position. Examples
include: Table.SingleRow, Table.First, Table.Last, Table.Skip, Table.FirstN, Table.LastN, Table.Range,
Table.PositionOf, Table.ReverseRows.
VALUE-BASED SELECTION
With tables, the selection operator can also be used to select a single row based on field value(s). To do this,
inside the curly braces, pass a record defining the search criteria to use. Each field name should correspond to
a table column to search and each field value should correspond with the value to search for in that particular
column. It’s unnecessary for the record to contain fields for every column in the table; only those you want
searched need to be included.
Parts{[Code = 123]} // returns [Code = 123, Category = "Widget", PriceGroup =
"A"]
Parts{[Category = "Thingamajig"]} // [Code = 456, Category = "Thingamajig",
PriceGroup = "B"]
Parts{[Category = "Widget", PriceGroup = "B"]} // [Code = 789, Category =
"Widget", PriceGroup = "B"]
The selection operator returns, at most, one row, represented as a record. If the search criteria match more
than a single row, an error is returned. Using an optional selection (by appending the question-mark,
like {[…]}?) does not suppress the error. Optional selection only affects behavior when no match is
found, not when multiple matches are found.
Parts{[Category = "Other"]} // returns Expression.Error: The key didn't match
any rows in the table.
Parts{[Category = "Other" ]}? // returns null
A single column can also be selected by name using field-selection syntax (e.g. Parts[Code]). The list that’s
returned contains the value from each row for the specified column, in the order those values appear in that
column (so the first item in the list corresponds with the value of the column from the first row, the second
item with the value from the second row, etc.).
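For example, using the Parts table defined earlier:
Parts[Code] // returns the list {123, 456, 789}
Parts[Category] // returns the list {"Widget", "Thingamajig", "Widget"}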
PROJECTION
Perhaps you have a table where you only want to work with (or output) some of the columns.
Like record, table supports projection. In the case of tables, projection produces a new table that contains a
reduced set of columns. The projection operator shares square bracket syntax with field/column selection.
Parts[[Code], [Category]] // returns a two-column table consisting of Code &
Category
If a specified column does not exist, an error is returned; however, if the projection is made optional (by
appending a question-mark to the square brackets), the non-existent column(s) will be included in the new
table with values set to null.
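For example (Price is not a column in Parts; it is used here purely to illustrate the behavior):
Parts[[Code], [Price]] // error - the Price column wasn't found
Parts[[Code], [Price]]? // returns a two-column table: Code keeps its values, Price is null in every row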
Column names must be hard-coded. If you’d rather provide the column names as a list, library
function Table.SelectColumns is your friend.
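A minimal sketch, reusing Parts (the list of column names could just as well come from another query):
let
    Wanted = {"Code", "Category"}
in
    Table.SelectColumns(Parts, Wanted) // returns a two-column table consisting of Code & Category
Table.SelectColumns also accepts an optional missingField argument (such as MissingField.UseNull), which
gives behavior similar to optional projection when a requested column doesn't exist.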
COMPARISON
If two tables contain the same number of columns, with the same names, and the same number of rows, where
the same-name columns from each table have the same values in each position (e.g. Col1, index 0 in table 1
equals Col1, index 0 in table 2), the equality operator (=) considers the tables equal. Otherwise (obviously!),
they’re not equal (<>).
#table({"Col1"}, {{1}, {2}}) = #table({"Col1"}, {{1}, {2}}) // true
#table({"Col1"}, {{1}, {2}}) <> #table({"Col1"}, {{1}, {2}}) // false
#table({"Col1"}, {{1}, {2}}) = #table({"Col1"}, {{2}, {1}}) // false--even
though the tables contain the same rows, they are ordered differently, so the
tables are not equivalent
Column schema and metadata do not have to match in order for tables to be considered equal.
let
Table1 = Table.TransformColumnTypes(#table({"Col1"}, {{1}}),{{"Col1", type
any}}),
Table2 = Table.TransformColumnTypes(#table({"Col1"}, {{1}}),{{"Col1", type
number}})
in
Table1 = Table2 // returns true even though the column types differ (any vs
number)
You could think of it this way: M considers two tables to be equal if they contain the same data. Things like
column order, schema details and metadata are accessory items, not data proper, so are not included in the
equality comparison. (For example, you can get the exact same data out of a table regardless of how its columns
are positioned, so column order isn’t factored in to the “are these tables equal?” decision.)
COMBINING TABLES
Two tables can be combined using the combination (&) operator.
#table({"City", "State"}, {{"Chicago", "IL"}}) & #table({"City", "State"},
{{"Washington", "DC"}})
Columns are paired by name, not position. If a column exists in only one of the tables, its values will be set to
null for rows from the other table.
In the table that’s returned, all columns from the first table are outputted first, positioned based on the order
in which they appear in that table (e.g. first column from that table is outputted first, second comes second,
etc.). Then, any columns that are only in the second table are outputted, positioned based on where they
appear in that table (e.g. first column exclusive to that table is outputted, followed by the second exclusive
column, etc.).
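For example (the values are arbitrary):
#table({"City", "State"}, {{"Chicago", "IL"}}) & #table({"State", "Population"}, {{"DC", 700000}})
// returns a three-column table (City, State, Population) whose rows are:
// Chicago, IL, null
// null, DC, 700000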
M does not require that column data types be compatible. The resulting table from the example below contains
a string value and a numeric value in the same column.
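A minimal illustration (the values are arbitrary):
#table({"Col1"}, {{"abc"}}) & #table({"Col1"}, {{1}})
// returns a one-column table whose two rows hold the text value "abc" and the number 1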
These last three behaviors are a bit different from SQL’s UNION ALL operator. With SQL, combining two tables
requires that the tables contain the same number of columns, columns are combined based on position (not
name) and columns being combined must have compatible data types. With Power Query, tables with a
dissimilar number of columns may be concatenated together, columns are paired by name (not position) and
column data types are irrelevant for combination purposes.
EXERCISE
Please study the https://learn.microsoft.com/en-us/powerquery-m/table-functions
Table construction
Name Description
ItemExpression.From Returns the abstract syntax tree (AST) for the body of a function.
ItemExpression.Item An abstract syntax tree (AST) node representing the item in an item expression.
RowExpression.Column Returns an abstract syntax tree (AST) that represents access to a column within a row expression.
RowExpression.From Returns the abstract syntax tree (AST) for the body of a function.
RowExpression.Row An abstract syntax tree (AST) node representing the row in a row expression.
Table.FromColumns Returns a table from a list containing nested lists with the column names and values.
Table.FromList Converts a list into a table by applying the specified splitting function to each item in the list.
Table.FromRows Creates a table from the list where each element of the list is a list that contains the column values for a single row.
Table.FromValue Returns a table with a column containing the provided value or list of values.
Table.FuzzyGroup Groups the rows of a table by fuzzily matching values in the specified column for each row.
Table.FuzzyJoin Joins the rows from the two tables that fuzzy match based on the given keys.
Table.FuzzyNestedJoin Performs a fuzzy join between tables on supplied columns and produces the join result in a new column.
Table.Split Splits the specified table into a list of tables using the specified page size.
Table.View Creates or extends a table with user-defined handlers for query and action operations.
Table.ViewError Creates a modified error record which won't trigger a fallback when thrown by a handler
(via Table.View).
Table.ViewFunction Creates a function that can be intercepted by a handler defined on a view (via Table.View).
Conversions
Name Description
Table.ToColumns Returns a list of nested lists each representing a column of values in the input table.
Table.ToList Converts a table into a list by applying the specified combining function to each row of
values in the table.
Information
Name Description
Table.IsEmpty Returns true if the table does not contain any rows.
Table.Schema Returns a table containing a description of the columns (i.e. the schema) of
the specified table.
Row operations
Name Description
Table.AlternateRows Returns a table containing an alternating pattern of the rows from a table.
Table.Combine Returns a table that is the result of merging a list of tables. The tables must
all have the same row type structure.
Table.FindText Returns a table containing only the rows that have the specified text within
one of their cells or any part thereof.
Table.FirstValue Returns the first column of the first row of the table or a specified default
value.
Table.FromPartitions Returns a table that is the result of combining a set of partitioned tables
into new columns. The type of the column can optionally be specified, the
default is any.
Table.InsertRows Returns a table with the list of rows inserted into the table at an index.
Each row to insert must match the row type of the table.
Table.LastN Returns the last row(s) from a table, depending on the countOrCondition
parameter.
Table.Partition Partitions the table into a list of groups number of tables, based on the
value of the column of each row and a hash function. The hash function is
applied to the value of the column of a row to obtain a hash value for the
row. The hash value modulo groups determines in which of the returned
tables the row will be placed.
Table.Range Returns the specified number of rows from a table starting at an offset.
Table.RemoveFirstN Returns a table with the specified number of rows removed from the table
starting at the first row. The number of rows removed depends on the
optional countOrCondition parameter.
Table.RemoveLastN Returns a table with the specified number of rows removed from the table
starting at the last row. The number of rows removed depends on the
optional countOrCondition parameter.
Table.RemoveRows Returns a table with the specified number of rows removed from the table
starting at an offset.
Table.RemoveRowsWithErrors Returns a table with all rows removed from the table that contain an error
in at least one of the cells in a row.
Table.Repeat Returns a table containing the rows of the table repeated the count
number of times.
Table.ReplaceRows Returns a table where the rows beginning at an offset and continuing for
count are replaced with the provided rows.
Table.SelectRows Returns a table containing only the rows that match a condition.
Table.SelectRowsWithErrors Returns a table with only the rows from table that contain an error in at
least one of the cells in a row.
Table.Skip Returns a table that does not contain the first row or rows of the table.
Table.SplitAt Returns a list containing the first count rows specified and the remaining
rows.
Column operations
Name Description
Table.ColumnsOfType Returns a list with the names of the columns that match the specified
types.
Table.DemoteHeaders Demotes the header row down into the first row of a table.
Table.DuplicateColumn Duplicates a column with the specified name. Values and type are copied
from the source column.
Table.Pivot Given a table and attribute column containing pivotValues, creates new
columns for each of the pivot values and assigns them values from the
valueColumn. An optional aggregationFunction can be provided to handle
multiple occurrences of the same key value in the attribute column.
Table.PrefixColumns Returns a table where the columns have all been prefixed with a text
value.
Table.PromoteHeaders Promotes the first row of the table into its header or column names.
Table.ReorderColumns Returns a table with specific columns in an order relative to one another.
Table.Unpivot Given a list of table columns, transforms those columns into attribute-
value pairs.
Table.UnpivotOtherColumns Translates all columns other than a specified set into attribute-value pairs,
combined with the rest of the values in each row.
Transformation
Name Description
Table.AddIndexColumn Returns a table with a new column with a specific name that, for each
row, contains an index of the row in the table.
Table.AddJoinColumn Performs a nested join between table1 and table2 from specific
columns and produces the join result as a newColumnName column
for each row of table1.
Table.ExpandListColumn Given a column of lists in a table, create a copy of a row for each value
in its list.
Table.ExpandRecordColumn Expands a column of records into columns with each of the values.
Table.FillDown Replaces null values in the specified column or columns of the table
with the most recent non-null value in the column.
Table.FillUp Returns a table from the table specified where the value of the next
cell is propagated to the null values cells above in the column specified.
Table.Group Groups table rows by the values of key columns for each row.
Table.Join Joins the rows of table1 with the rows of table2 based on the equality
of the values of the key columns selected by table1, key1 and table2,
key2.
Table.NestedJoin Joins the rows of the tables based on the equality of the keys. The
results are entered into a new column.
Table.ReplaceErrorValues Replaces the error values in the specified columns with the
corresponding specified value.
Table.SplitColumn Returns a new set of columns from a single column applying a splitter
function to each value.
Table.Transpose Returns a table with columns converted to rows and rows converted
to columns from the input table.
Membership
Name Description
Table.ContainsAll Determines whether all of the specified records appear as rows in the table.
Table.ContainsAny Determines whether any of the specified records appear as rows in the
table.
Table.Distinct Removes duplicate rows from a table, ensuring that all remaining rows are
distinct.
Table.PositionOfAny Determines the position or positions of any of the specified rows within the
table.
Table.ReplaceMatchingRows Replaces specific rows from a table with the new rows.
Ordering
Name Description
Table.Max Returns the largest row or rows from a table using a comparisonCriteria.
Table.MaxN Returns the largest N rows from a table. After the rows are sorted, the
countOrCondition parameter must be specified to further filter the result.
Table.Min Returns the smallest row or rows from a table using a comparisonCriteria.
Table.MinN Returns the smallest N rows in the given table. After the rows are sorted, the
countOrCondition parameter must be specified to further filter the result.
Table.AddRankColumn Appends a column with the ranking of one or more other columns.
Table.Sort Sorts the rows in a table using a comparisonCriteria or a default ordering if one is
not specified.
TABLE THINK I
Why should you concern yourself with how Power Query “thinks” about tables? After all, you write an
expression that outputs the table you want, the mashup engine executes it and everyone is happy without you
having to think about how the engine does its thing…right? Yes—at least until you encounter performance
problems, values change during processing or a firewall error bites—then what do you do?
Understanding how M processes tables is an important asset in developing efficient mashups, avoiding
unexpected data variability and keeping the data privacy layer happy. Streaming, query folding, buffering, table
keys, native query caching and the firewall—all of these relate to how the interpreter thinks
about/processes/handles tables.
There’s so much to cover, we’ll split the list in two. Let’s tackle the first half (streaming, query folding and
buffering) in this post and save the remainder (table keys, native query caching and the firewall) for next time.
Imagine you are the mashup engine. How would you execute the below?
let
Source = SomeDataSourceReturningATable,
Filtered = Table.SelectRows(Source, each [Office] = "Chicago"),
Result = Table.FirstN(Filtered, 3)
in
Result
A simple way would be to retrieve all the rows returned by SomeDataSourceReturningATable and save them
in variable Source. Then, take the contents of that variable, figure out which rows pass the [Office] =
"Chicago" test, and save those in variable Filtered. Lastly, grab the first three rows from Filtered, save
54
Logical? Yes. Efficient? No. Why not? For one, there’s resource usage: While at most three rows will be output,
the system hosting the mashup engine must have enough capacity to store everything returned from the
source (which could be billions of rows). Attempting to do this could lead to that system running out of
resources (e.g. memory or disk space).
Thankfully, Power Query doesn’t handle table expressions in this simplistic way. Instead, M uses streaming,
query folding or a combination of these two techniques. We learned about both back in Paradigm (part 5); in
this post, we’ll try to hone our understanding by delving deeper into their details and walking through several
examples. If the general ideas of what streaming and query folding are isn’t sharp in your mind, it’s probably
worth jumping back to part 5 for a refresher before continuing on.
STREAMING
Let’s say M executes the above expression using streaming….
When Result’s contents are requested, Table.FirstN in the Result step starts by asking the preceding step
(Filtered) for one row of data. When Filtered’s Table.SelectRows receives this request, it turns around
and asks step Source for a row of data, which Source provides. When Filtered’s SelectRows receives this
row, it checks whether it passes Filtered’s [Office] = "Chicago" test. If so, SelectRows returns the row
to Result; if not, it discards the row then requests another, repeating this process until it finds one that passes
the test, which is returned to step Result. Once Result’s FirstN has received a row, it outputs that row then
turns around and asks Filtered for a second row (because it’s looking for a total of three
rows). Filtered’s SelectRows then picks back up where it left off with Source, asking for one row at a time
until it finds another that passes the [Office] = "Chicago" test, which it then passes to Result’s FirstN,
which then outputs it. Lastly, this process is repeated one more time to retrieve the third row FirstN needs to
satisfy how it was programmed.
Each step produces rows one at a time, only requesting as many rows as it needs from the preceding step to
produce the requested row. By working with just enough data (vs. the hypothetical simplistic approach we
started with which stored the entire output of each step in memory), Power Query is able to handle data sets
that are too large to be stored locally and doesn’t waste resources storing rows that ultimately are unnecessary
to produce the requested output.
Internal In-Memory Row Storage
Excepting the data provider (which, for performance reasons, might fetch rows in chunks from the external
data source), none of the operations in the preceding example held rows in memory. When a function
processed a row, it either passed it on or discarded it.
However, this isn’t true for every operation. Let’s say we add a sort step to our example:
let
Source = SomeDataSourceReturningATable,
Filtered = Table.SelectRows(Source, each [Office] = "Chicago"),
Sorted = Table.Sort(Filtered,{{"TotalSales", Order.Descending}}),
Result = Table.FirstN(Sorted , 3)
in
Result
Sorting (generally) requires retrieving all rows from the previous step so that they can be put in proper order.
When the above expression is executed, Result’s Table.FirstN asks Sorted’s Table.Sort for the first row.
To figure out which row to return, Sort gets all rows from Filtered, sorts them, saves the sorted rows in
memory, then returns the first row from the sorted set. Each time Sort is asked for a subsequent row, it returns
the appropriate row from what it has in memory. After Sort returns a row, it will never need it again, so it can
remove that row from its memory (whether or not it purges memory like this is an internal implementation
detail—but at least at the theoretical level it’s allowed to do this).
This internal storing of rows in memory is not a persistent cache; rather, it is limited in scope to a single method
invocation during a single execution of the query. There is no sharing of these held in-memory rowsets when
a function is invoked multiple times, like Table.Sort is below
(both List.Sum(Top3[TotalSales]) and List.Average(Top3[TotalSales]) end up calling it).
let
Source = SomeDataSourceReturningATable,
Filtered = Table.SelectRows(Source, each [Office] = "Chicago"),
Sorted = Table.Sort(Filtered, {{"TotalSales", Order.Descending}}),
Top3 = Table.FirstN(Sorted , 3)
in
{ List.Sum(Top3[TotalSales]), List.Average(Top3[TotalSales]) }
“Which operations hold rows internally?,” you might ask. Table at a time operations, like joins (though not
always nested joins), sorts, grouping, pivot/unpivot, are all suspects—and obviously buffering. (Unfortunately,
other than for buffering, I’m not aware of documentation officially detailing this so anecdotal evidence
gathered from testing and answers to forum posts are what we have to go by.)
Above, the holding of rows was described as “in memory.” Keep in mind that memory can be paged to disk.
Working with memory paged to disk is much, much, much slower than working with memory stored in RAM.
The memory usage point that triggers paging to disk is environment specific. In some environments, paging
starts when a query’s total memory use exceeds 256 MB.
Performance
The order of operations can have a significant impact on how much data must be kept in memory. To see this,
let’s contrast two variations of an expression. Both produce the same output but they can differ significantly
in the local resources used.
let
Source = SomeDataSourceReturningATable,
Sorted = Table.Sort(Source, {{"TotalSales", Order.Descending}}),
Filtered = Table.SelectRows(Sorted, each [Office] = "Chicago"),
Result = Table.FirstN(Filtered, 3)
in
Result
let
Source = SomeDataSourceReturningATable,
Filtered = Table.SelectRows(Source, each [Office] = "Chicago"),
Sorted = Table.Sort(Filtered, {{"TotalSales", Order.Descending}}),
Result = Table.FirstN(Sorted, 3)
in
Result
With the first expression, sorting occurs directly after Source, so all rows from Source are held in memory by
the sort function. The second expression sorts after the Table.SelectRows filter so only rows that pass that
filter are held by the sort. Say Source contains two billion rows, out of which only 500 pass the [Office]
= "Chicago" test. With the first version of the expression, all two billion rows are held by the sort; with the
second version, only those 500 rows are.
Performance Tip: When streaming is in play, if your query contains steps that hold rows in memory, try placing
any applicable filter steps before the row-holding steps. This way, the filtering steps will reduce the quantity of
what needs to be in memory.
QUERY FOLDING
Streaming can involve pulling lots of rows which are later discarded. In the example we’ve been using, it’s
potentially necessary to stream billions of rows from the source to produce the three requested output rows.
If, instead, you directly interacted with the source, you could probably tell it exactly what you wanted and it
would produce just that.
SELECT TOP 3 *
FROM Customers
WHERE Office = 'Chicago'
ORDER BY TotalSales DESC;
In either case, you get back at most 3 results. Potentially billions of rows aren’t sent to you for you to sort
through to find the three you want; instead, that processing occurs on the external system (utilizing any
indexing or caching it may have) and just the final results are sent back (much less data crossing the wire). It
should be intuitively obvious which approach is more efficient.
Thankfully, M’s query folding offers the ability to leverage the performance of native queries without needing
to write them yourself.
Quick recap: Query folding takes one or more Power Query steps and translates them into a native request
which is then executed on the source system (again, for a refresher, jump back to part 5, if needed).
With query folding, if our source system is a SQL database, it’s almost as though the example:
let
Source = SomeDataSourceReturningATable,
Filtered = Table.SelectRows(Source, each [Office] = "Chicago"),
Sorted = Table.Sort(Filtered, {{"TotalSales", Order.Descending}}),
Result = Table.FirstN(Sorted, 3)
in
Result
Is internally replaced by M’s interpreter with something like:
let
Result = Value.NativeQuery(SomeDataSourceReturningATable, "SELECT TOP 3 *
FROM Customers WHERE Office = 'Chicago' ORDER BY TotalSales DESC;")
in
Result
(Technically, the internal mechanism used may work a bit differently, but as far as producing rows go, the net
effect is approximately the same.)
In the preceding example, the native query to use can be statically deduced simply by looking at the expression steps.
Power Query’s query folding can also dynamically factor data in when it produces native requests.
For example, take the below expression, which filters data (MainData) pulled from source A using a list of values
(FilterData) retrieved from source B.
let
MainData = GetFromSourceA(),
FilterData = GetFromSourceB(),
Result = Table.SelectRows(MainData, each List.Contains(FilterData[ID], [ID]))
in
Result
At first glance, the expression in step Result may not look like a candidate for query folding because it
combines data from two sources. Instead, it may seem necessary for the mashup engine to retrieve all data
from both sources then apply the Table.SelectRows filter locally.
However, Power Query can pull data from one source and write that data into the native request it sends to
another source. Say FilterData (from source B) contains only a few rows. Power Query might first pull those
few rows locally, then decide to push data from those rows into the native request it sends to source A. For
example, pretend FilterData contains three rows and those rows’ ID column values are 1, 2 and 3. Power
Query’s query folding might execute step Result by first pulling those values from source B then sending a
query like the following query to source A:
SELECT *
FROM SomeTableInSourceA
WHERE ID IN (1, 2, 3); -- these values were pulled from FilterData (source B)
then written into this query
The above query tells source A just which rows are needed based on filtering data retrieved from source B. By
doing this, it avoids fetching rows that ultimately would have been discarded if the Table.SelectRows filter
were applied locally.
This pulling data from one source then pushing it to another can provide performance benefits and pose
security concerns. Power Query’s environment has a mechanism for managing the latter—something which
we’ll explore shortly.
Folding + Streaming
As we discussed in part 5, not all operations can be query folded. Once a non-foldable operation is encountered
in an expression chain, any potentially foldable operations that come after it won’t be folded back into the data
source that started the chain because the non-foldable operation blocks that folding.
Query folding does not eliminate streaming; rather it folds certain steps into a native request whose results
are then streamed to any subsequent steps that weren’t query folded. To put it another way: With M, tables
are always either streamed or query folded then streamed, never just query folded without then being
streamed.
Performance
This leads to what may be an obvious performance tip: Try to put all foldable operations before any non-
foldable operations so that the maximum amount of processing can be offloaded to the data sources.
Which steps are query folded, as well as how they are folded, can change as the mashup engine improves, as
the quantity of data involved changes, as library/data source functions are revised and as security settings are
changed (more on the latter shortly). So, there may be times where you may find it advantageous to re-try
performance tuning even though you haven’t made any code changes.
TABLES ARE NOT IMMUTABLE
A variable that appears to hold a table (or list) actually just holds a handle to the expression that produces the
table (or list). When accessed, that handle executes logic which produces the requested data. While the handle
is immutable throughout the lifetime of the query’s execution, the data returned when it is invoked is not. This
is because that data is produced on demand each time the handle is invoked. The fact that the data returned
is not immutable can result in values seeming to change during the execution of an M query.
The below expression returns a pair of tables. One holds all customers associated with the Chicago office; the
other contains the three customers with the largest total sales amounts. Both tables are ultimately pulled from
step Source.
let
Source = SomeDataSourceReturningATable,
ChicagoOffice = Table.SelectRows(Source, each [Office] = "Chicago"),
Top3Sales = Table.FirstN(Table.Sort(Source, {{ "TotalSales",
Order.Descending }}), 3),
Result = { ChicagoOffice, Top3Sales }
in
Result
Let’s pretend you ran the above and are looking at the rows returned in the first table ( ChicagoOffice). In
them, you find customer ABC:
CustomerID = 123, Customer = ‘ABC’, Office = ‘Chicago’, TotalSales = 50255
Looking at the rows in the second table (Top3Sales), you also find customer ABC (apparently, it’s assigned to
the Chicago office and is one of your top customers):
CustomerID = 123, Customer = ‘ABC’, Office = ‘Chicago’, TotalSales = 62199
Wait a minute! What in the world?! The same customer has a row in each table but the data in those rows is
different between the two tables. (The value of TotalSales is different between the two.) How could this
happen?
In light of the fact that table and list variables really just hold handles to the expression that produces the table
or list, this behavior makes sense. Both ChicagoOffice and Top3Sales were invoked to produce results.
When invoked, each expression chain called back to the ultimate data source to get data. Total sales for ABC
must have changed between those two calls to the data source (perhaps another sale was processed during
the intervening moment of time).
To recap: Variables that seem to “hold” (e.g. produce or output) a table or list really just hold an immutable
reference to an expression that, when invoked, produces the desired output—the expression is immutable,
but the data returned when it is invoked is not. So, when a particular query pulls from the same source
multiple times during execution, there is the possibility that the data pulled could change between accesses.
If this possibility isn’t acceptable, there are two options: rework the expression to eliminate the multiple
invocations (which may or may not be possible) or manually cache (buffer) the output.
Please don’t go away thinking that Power Query is flawed because it allows this variability. This potential for
variability is a necessary side effect of M not always saving all rows in memory (and it’s a good thing it doesn’t
do that!). Instead, M puts you in control: if there’s a point where a data set needs to be cached to provide
stability, you have the power to do that. By putting you in charge, you control when the associated resource
and performance costs are paid.
BUFFERING
When you need to stabilize a table (or list), how do you do it? By buffering.
BufferedTable = Table.Buffer(SomeExpressionProducingATable)
BufferedList = List.Buffer(SomeExpressionProducingAList)
The first time Table.Buffer is invoked, a buffer is loaded by reading all values from the source and saving them
in memory. The data in memory is then used to service any accesses to that buffer that occur during the query’s
execution (including when multiple method chains in the query reference the same buffer). However, buffers
are not shared across separate executions of the same query.
Adapting our Chicago office + top sales example:
let
Source = SomeDataSourceReturningATable,
BufferedSource = Table.Buffer(Source),
ChicagoOffice = Table.SelectRows(BufferedSource , each [Office] = "Chicago"),
Top3Sales = Table.FirstN(Table.Sort(BufferedSource , {{ "TotalSales",
Order.Descending }}), 3),
Result = { ChicagoOffice, Top3Sales }
in
Result
Above, the first time step BufferedSource is accessed, Table.Buffer will pull all rows from Source and store
them to memory. This stable snapshot of data will then be used to service both
steps ChicagoOffice and Top3Sales. Both steps will be offered the exact same rows, with the exact same
row values, in the exact same order. Then, when the query finishes executing, the buffer is discarded. If the
query is later executed again, a new buffer will be populated from Source.
Of course, buffering all of Source could pose a resource problem, depending on how much data it contains.
Again, the key is that you are in control: you decide when to use buffering, when the benefits it brings are
worth the associated cost.
When you buffer, be conscious of how much data you’re buffering. Minimize this quantity where possible.
Above, if Source is expected to return more than a small number of rows, see if there is any way to apply
filtering before buffering. For example, if you know that all large customers have at least $50,000 in total sales,
you could limit buffering to customers with sales above this amount or a Chicago office assignment by
changing BufferedSource to:
BufferedSource = Table.Buffer(Table.SelectRows(Source, each [Office] =
"Chicago" or [TotalSales] >= 50000))
A small change like this could significantly reduce the quantity buffered, making a vast difference on resource
usage.
PERFORMANCE THOUGHTS
Now that you hopefully have a solid understanding of streaming, query folding and buffering, let’s test that
knowledge by pondering some performance tips. Do the tips below make sense (like why/how each impacts
performance)? Does the order they’re in make sense (like why would it be unlikely that you would want to do
step 3 before step 1)?
The following are not absolute rules but rather a suggested starting place for addressing performance issues.
Please don’t blindly follow these but rather evaluate them in conjunction with an understanding of your
context and how Power Query processes tables.
1. Order steps that are query folded first. This offloads as much processing as possible to the external
data sources. To determine which steps are being folded, in many cases, you’ll need to use a trace tool
or check logs to see the native requests generated, as the UI doesn’t always reveal whether/how steps
are being folded.
2. Next, for those steps that cannot be folded, try for the following order: first filters, then operations
that do not internally hold rows in memory, followed by any remaining operations. This way, you
discard unneeded rows before performing additional processing or in-memory storage.
3. When buffering is necessary, buffer late in the chain of expressions. Why pay the cost for buffering
before that data really needs to be buffered?
4. If the above steps don’t produce the desired performance or cost too much in terms of caching and/or
buffering resources (e.g. you run out of memory!), hand-crafting a native query which you then have
Power Query execute is an alternative. Power Query’s query folding supports a limited set of
possibilities; you may be able to hand-code a native request that incorporates logic that folding doesn’t
know how to produce and so achieve performance or eliminate buffering in a way that isn’t possible
using automatic query folding.
Since data, library/data source function, security and environment changes can also affect performance, there
may be times where you may find it advantageous to re-try performance tuning even though you haven’t made
any code changes.
TABLE THINK II
Last time, we began exploring how Power Query “thinks” about tables, delving more deeply into streaming and
query folding. This time, we’ll continue building our understanding of how tables are processed by learning
about keys, native query result caching and the data protection layer (firewall). We’ll also explore why native
queries may be executed more times than you might expect.
The goal of these two posts is to equip you with a better understanding of the context in which your
mashups are executed—knowledge you can use to author more efficient M queries, avoid unexpected data
changes during processing and keep the data protection layer (firewall) happy.
KEYS
Keys to the kingdom…well, maybe more like a possible key to better performance.
Out of the box, a Power Query function that processes tables uses an algorithm capable of working with any
combination of valid values streamed to it in any order. However, if information about table keys is provided,
an operation may be able to internally optimize itself to be more efficient.
For example, take a join operation between tables A and B on column ID. In table A, the operation encounters
a row with an ID of 1, so it looks in table B for rows with that ID. It finds one, joins it to the row from A and
returns the joined pair. Then, it resumes searching table B, looking for additional ID 1 rows to join to. In
contrast, if the operation knows that table B’s ID column contains unique values, it doesn’t need to search for
additional ID 1 rows in that table because the column’s uniqueness guarantees that there aren’t any more.
“What exactly is a key?” you might ask. For our purposes, we’ll use the following definition: A key is a column
or set of columns whose values identify rows. A unique key is a key whose values uniquely identify a row in the
current table. Out of a table’s unique keys, one may be identified as the primary key, which indicates that it is
the main identifier being used.
For example, imagine a table holding company information. Among its columns there’s one named CompanyID.
Almost certainly, just based on the name, we can guess that CompanyID is a key column—that the column’s
values are used to identify company rows in the table. If each CompanyID value identifies exactly one row in
the table, CompanyID is a unique key and, based on its name, is likely the table’s primary key (primary unique
identifier).
The Power Query language specification defines the ability to annotate a table with key information but—with
one exception (primary keys)—does not specify the kinds of keys that can be identified or what significance
operations should give to the keys that are identified. Generally, it seems like an operation can infer what it
needs about non-unique and foreign keys (keys pointing to rows in other tables) based on what the operation
is being asked to do, without those keys being explicitly tagged. Based on this, it would seem like unique keys,
such as primary keys, are what we should focus on. However, it may be worth tagging other keys when they’re
relevant, just in case the operation uses that information.
Keys can be viewed, defined and replaced using library functions on tables
(Table.Keys, Table.AddKey, Table.ReplaceKeys) and on table types
(Type.TableKeys, Type.AddTableKey, Type.ReplaceTableKeys).
Here’s an example of defining a key on a table then viewing details about the table’s keys:
let
Source = #table(
{"CompanyID", "Name", "Location"},
{
{1, "ABC Company", "Chicago"},
{2, "ABC Company", "Charlotte"},
{3, "Some Other Company", "Cincinnati"}
}
),
KeysTagged = Table.AddKey(Source, {"CompanyID"}, true)
in
Table.Keys(KeysTagged) // returns { [Columns = {"CompanyID"}, Primary = true] }
Data connectors may automatically attach key information to the tables they output. For example, a database
data connector might use the database’s indexes and constraints to determine which columns are keys and
then tag the outputted table accordingly.
Functions may adjust key information on the tables they return. For example, if a table doesn’t have any keys
identified, Table.Distinct tags a primary key on the table it outputs (defined as all columns in the table). This
makes sense: After applying Distinct, the set of values in each row are, well, distinct, and so uniquely identify
the row.
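You can observe this by inspecting the keys of a deduplicated table. A quick sketch:
let
    Source = #table({"Col1", "Col2"}, {{1, "a"}, {1, "a"}, {2, "b"}}),
    Deduplicated = Table.Distinct(Source)
in
    Table.Keys(Deduplicated) // returns { [Columns = {"Col1", "Col2"}, Primary = true] }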
As was mentioned earlier, while the Power Query language provides support for identifying table keys, it’s up
to each operation whether to use this information and, if so, how to use it. Unfortunately, there doesn’t seem
to be any official operation-specific documentation describing when or how key information is used, so we
have to fall back to experimentation to figure out when it’s profitable to identify keys.
To performance tune an operation by providing key information, you might try something like the following:
1. Determine the column(s) used by the function to identify rows. If the operation works with multiple
tables, do this for each input table. For example, if the operation is a join, which column from each
input table is used to match rows between the two tables?
2. Next, for each input table, check whether the values in the identified column(s) uniquely reference
rows in that table. If so, check whether the table has a primary key defined for those column(s). If
not, tag those column(s) as the primary key. For good measure, if the identified column(s) do not
uniquely identify rows, tag them as a non-primary key, just in case the operation at hand uses that
information.
3. After adding keys, rerun the query and see if performance improved (a sketch follows below).
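As a hedged sketch of step 2 applied before a join (the source functions and column names are hypothetical):
let
    Orders = GetOrdersFromSourceA(), // hypothetical source; many rows per customer
    Customers = GetCustomersFromSourceB(), // hypothetical source; CustomerID uniquely identifies each row
    KeyedCustomers = Table.AddKey(Customers, {"CustomerID"}, true), // tag CustomerID as the primary key
    Joined = Table.NestedJoin(Orders, {"CustomerID"}, KeyedCustomers, {"CustomerID"}, "Customer",
        JoinKind.LeftOuter)
in
    Joined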
It only makes sense to try this tuning on operations that identify rows based on column values. Knowledge of
keys won’t help a function call like Table.FirstN(Source, 5) know which five rows to return!
What library functions do with table key information is an internal implementation detail that can change as
the library is updated, so at times you may find it advantageous to re-try this performance tuning even though
you haven’t changed your code.
HOST ENVIRONMENT
Mashups are executed in the context of an environment—a host program like Microsoft Power BI or Microsoft
Excel. This host environment can incorporate functionality that’s specially built to (hopefully!) improve
performance or that can have a side effect of reducing performance. While not strictly a part of either the
Power Query language or the standard library, these environmental factors are important to understand in
order to most effectively work with tables.
data back to disk doesn’t always make sense). Even for supported data sources, caching may not always be
used (such as when the same native query is executed multiple times before any instances of that native query
have completed).
Persistent caches, by definition, are intended to be shared. How broadly they are shared is also environment
specific. Sharing possibilities include just within a single mashup query during a single execution, across all
mashups during a single refresh and across multiple refreshes. Currently, Microsoft Power BI Desktop and
Microsoft Excel may share these caches between all M queries during the same refresh while older versions of
Excel may only share them within a single M query during a single refresh. In contrast, both Power BI’s and
Excel’s query editors can preserve their caches across refreshes (in query editor, ever notice a warning about
the preview being x days old…that’s the persistent cache coming into play).
Technically, persistent caching may protect you from the data variability issue discussed in the previous post
(see Tables Are Not Immutable in part 12). However, I’d strongly suggest not to rely on this caching to protect
you—since you’re not guaranteed that it will always be in play. If data stability during processing is important,
make sure your expressions are coded to provide stable data.
You may not need to worry much about this caching kicking in. If it kicks in, it does so quietly and hopefully
gives your mashups a performance boost—a bonus—without you having to do anything. If you’re tracing native
requests and are puzzled by why you only see one request where you’d expect to see that request repeated
several times, it may be that persistent caching eliminated the need for that native request to go all the way
back to source multiple times.
On the other hand, if caching stops where you’re used to it being in play, you may notice a performance loss.
If performance mysteriously slows down after your dataset grows past a certain point, it could be that some of
the results from the native queries it executes have become too large for the persistent cache, resulting in
repeat native query invocations all being sent back to the external source. If this occurs, you might consider
increasing the size limit on the persistent cache (if your environment allows that to be configured).
From the security standpoint, persistent caching may result in data being left around even after the report that
loaded that data has been deleted. The persistent cache isn’t stored inside the report’s file, so deleting the
report doesn’t remove its data from the cache. To guard against this, you’ll need to manually clear your
environment’s persistent cache after deleting reports.
To optimize the likelihood of persistent caching, you could disable parallel loading of tables (if your host
environment allows this). At the cost of a longer refresh run time, this decreases the likelihood of the same
native query being executed simultaneously and so increases the chance that repeated invocations of the same
native query will be serviced out of the cache. While not something I’d recommend doing by default, disabling
parallel loading is an option to consider if repetitive native queries are incurring significant performance costs.
[Further Reading/References: TechNet forum post 1, TechNet forum post 2, TechNet forum post 3 (all by a
Power Query team member)]
Firewall
When first running a query that pulls from multiple data sources, you’re asked to set each source’s privacy level.
“What are privacy levels?,” you wonder, so you check the documentation where you learn that these levels
control the level of isolation between data sources.
This leaves you puzzled. Aren’t data sources intrinsically isolated? After all, when you use M to pull from
multiple data sources, M’s doing the combining, so each source is isolated from all the rest, right?
But what about query folding? With query folding, data returned by one source may be written into the native
query (or native request) sent to another source.
To borrow an example from the last post, when executing the below, data could be separately pulled from the
two sources then filtered locally or—if query folding comes into play—the appropriate filtering data could be
pulled from one source then written into the native query that’s sent to the other source.
let
TableFromSourceA = GetFromSourceA(),
TableFromSourceB = GetFromSourceB(),
Result = Table.SelectRows(TableFromSourceA, each
List.Contains(TableFromSourceB[ID], [ID]))
in
Result
If query folding occurs, a native query similar to the following might be generated for source A (assuming that
the source is a SQL database):
SELECT *
FROM SomeTable
WHERE ID IN (1, 2, 3); -- these three values were extracted from the results of
executing TableFromSourceB
Above, query folding resulted in data being pulled from one source then pushed to the other. Sometimes this
data disclosure is acceptable and the performance benefit is delightful. Other times, depending on factors such
as the type of data (confidential healthcare records, trade secrets, etc.) and the trustworthiness of the data
sources, quiet leaking of data across sources like this could be a major security issue and so must not be allowed
regardless of performance impact.
Privacy levels are the mechanism for controlling the scope of data sharing that’s allowed across data sources
during query folding. Privacy levels are not intended to keep you from combining between sources or to stop
you from purposefully writing code that pulls data from one source then hands it to another—they exist
solely to control query folding of data from one source into native queries sent to another source.
Privacy level public indicates that data from the source can be freely shared with other sources during query
folding. Organizational level sources can only have their data exposed to other organizational level sources
during folding. Data from private sources cannot be folded to any other source, even to other private sources.
[References: Power BI, Excel]
Privacy levels have performance impacts. Allowing data to be shared across sources can bring performance
advantages from query folding; blocking sharing can result in the cost of a larger than strictly necessary set of
data being fetched and buffered locally before being combined.
Privacy levels also have coding impacts. Behind the scenes, when privacy levels are enabled, the Power Query
interpreter divides your code into units called partitions (for now, just think “groupings of my code”). Then, it
rewrites any code references that access data from other partitions to pass that data through Power Query’s
firewall. This allows the firewall to act as a gatekeeper, controlling the flow of data between partitions. When
cross-data source query folding needs to be blocked, the fact that cross-partition data flows through the
firewall allows the firewall to buffer that data at the relevant partition boundary. Since buffering blocks query
folding at the point where it occurs, this action keeps query folding from occurring across the partition
boundary and so prevents leaking data between sources.
The way the data protection layer is currently designed, the following rule must be complied with to ensure
that firewall logic can be inserted in the appropriate locations:
Either a partition may contain data sources with compatible privacy levels (i.e. where privacy levels allow folding
between the sources) or the partition may reference other partitions—but not both.
If the first part of that rule is violated—meaning that, within a partition, there is more than one data source
and the sources do not all have compatible privacy levels, an error along the lines of the following will be
returned:
Formula.Firewall: Query 'ImportantData' (step 'Source') is accessing data sources that have
privacy levels which cannot be used together. Please rebuild this data combination.
On the other hand, if the “not both” part of the rule is violated—that is, if a partition contains a data source
and references another partition—an error something like the below will be returned:
Formula.Firewall: Query 'ImportData' (step 'Source') references other queries or steps, so
it may not directly access a data source. Please rebuild this data combination.
In either case, don’t worry! You can combine the two sources; the data protection layer just needs your code
to be re-worked so that it can insert the gatekeeping firewall code in the appropriate places to ensure that the
firewall can do its job.
To resolve, the code that combines between sources can’t be in the same partition as the sources and
incompatible sources can’t be together in the same partition. The key to adjusting code to comply with these
requirements is understanding where partition boundaries are drawn.
Unfortunately, the rules that define how code is partitioned are complex. A shortcut solution: If you encounter
one of these errors, place the code specific to each data source in a separate query (one query per data source)
and then reference those queries from another query that combines between them. This will result in partition
boundaries that are aligned in a way that works while saving you from wading through the complex specifics
of partitioning.
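Sticking with the placeholder sources used earlier, the shortcut might look like three separate queries (the
query names are arbitrary):
// Query "SourceAData" - touches only data source A
let
    MainData = GetFromSourceA()
in
    MainData
// Query "SourceBData" - touches only data source B
let
    FilterData = GetFromSourceB()
in
    FilterData
// Query "Combined" - references the other two queries and touches no data source directly, so each
// partition either contains a data source or references other partitions, never both
let
    Result = Table.SelectRows(SourceAData, each List.Contains(SourceBData[ID], [ID]))
in
    Result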
[Further Reading/Reference: TechNet forum post (by a Power Query team member)]
Technically, if all data sources have a public privacy level or if all have an organization privacy level, they can
be freely combined with each other, so the data protection layer isn’t doing any protecting. Disabling it gets it
out of the way, avoiding the coding constraints it imposes and any performance overhead it incurs.
Is disabling the data protection layer a good idea? Only you can answer that question. I’d hesitate, though.
What if a data source is added down the road where data sharing should be prevented? The person adding the
new source correctly configures its privacy level but doesn’t realize that privacy levels are being ignored. Query
folding ends up quietly leaking data. Ouch! Leaving privacy level enforcement enabled gives you a valuable
future-proofing protection. I’d encourage you to not go around disabling it by default but instead only turn it
off when you encounter a problem where disabling is the only reasonable solution and you thoroughly
understand the impact (risk) of making that change.
EXTRA NATIVE QUERY INVOCATIONS
If you trace the native queries (or native requests) sent from Power Query to external sources, you might be
surprised. Where it looks like a mashup would execute a particular native query once, you might find that the
native request, or variants of it, are invoked multiple times.
“Why?” has to do with internal implementation details of the host environment and/or the functions being
used. Perhaps a promote headers operation pulled from source twice—once to get data to derive header
names and a second time to stream the result set. An environment might want the schema describing an M
query’s results before starting to pull data. Maybe the firewall requested a chunk of rows for analysis to help
it decide how to partition code. For various reasons, a native query (or variants of that query) may be executed
multiple times where you’d only expect it to be invoked once.
Sometimes the data connector can optimize the “extra” native query invocations so that they only pull a subset
of data (e.g. pull a zero-row result set from the database when just schema information is needed). When the
connector cannot optimize, these “extra” requests can get costly because the full native request may be
executed multiple times in its entirety.
Depending on the connector and external source, native requests involving entities like stored procedures,
temporary tables and basic (vs. OData) API calls may not be automatically optimizable (or only partly
optimizable) due to limitations of the external source. If lack of automatic optimization is causing performance
complications, you may be able to implement your own optimization logic using Table.View.
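As a rough, hedged sketch (GetNativeResult is a hypothetical helper that runs a SQL string against the source;
it is not a standard library function), a Table.View handler can answer a row-limit request with a cheaper native
query of its own:
let
    View = Table.View(
        null,
        [
            GetType = () => type table [CustomerID = Int64.Type, TotalSales = number],
            GetRows = () => GetNativeResult("SELECT CustomerID, TotalSales FROM dbo.Customers"),
            // if a later step only needs the first count rows, push that limit into the native query
            OnTake = (count as number) => GetNativeResult("SELECT TOP " & Number.ToText(count)
                & " CustomerID, TotalSales FROM dbo.Customers")
        ]
    )
in
    Table.FirstN(View, 3) // serviced by OnTake, so only three rows are requested from the source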
The possibility of extra native query executions is one reason why it is strongly advisable not to use Power
Query to execute native queries that modify data. You might think “On a scheduled basis, my report needs to
pull some data but, before that can happen, a table needs to be updated. I know what I’ll do—I’ll schedule a
report refresh that calls a stored procedure which first performs the update then returns the report’s data.”
Not a good idea—your table may end up updated more times than expected because of extra native query
invocations.
let is also an expression that produces a value. However, let lets us define intermediate expressions whose
results are assigned to variables. These intermediate expressions can then be used to produce the final
value returned by the let expression.
let
Multiplicand = 10,
Multiplier = 20
in
Multiplicand * Multiplier
Since a let expression is an expression that produces a value, let expressions can be used wherever values are
expected. This means we can assign them to variables, nest them inside other let expressions, and use them to
produce values for function call arguments.
let
x = 20,
y =
let
a = 10,
b = 20
in
a + b
in
x * y
let
x = 20
in
x * (let a = 10, b = 20 in a + b)
Date.AddDays(
DateTime.LocalNow(),
let
a = 2,b = 6
in (a + b) * b
)
FUNCTIONS: DEFINING
A function’s definition starts with the list of expected parameters inside a pair of parenthesis, followed by =>,
followed by the function body expression (or function body, for short). This expression defines how to compute
the value that’s returned when the function is invoked. Any parameters defined for the function become
variables inside this expression which it can use when computing the return value.
let
Input1 = 10,
Input2 = 15,
Input3 = 30,
SecretFormula = (a, b, x) =>
let
Output = x * (a + b),
OutputDoubled = Output * 2
in
Output,
Result = SecretFormula(Input1 , 15, 30)
in
Result
Above we see a function defined inside a let expression. Below, there’s a function with a function inside it.
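A minimal sketch of what that can look like (the names here are illustrative):
let
    ScaleAndOffset = (a, b, x) =>
        let
            Double = (value) => value * 2,   // a function defined inside another function's body
            Output = Double(x * (a + b))
        in
            Output,
    Result = ScaleAndOffset(10, 15, 30)
in
    Result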
TOP-LEVEL FUNCTION
In contrast to a nested function, a top-level function stands on its own. It can be referenced from other
expressions (including other functions) but isn’t defined inside another expression.
By default, a value must be provided for each parameter when the function is invoked. If you’d like certain
arguments to be optional, simply precede them with optional when defining the function’s parameter list.
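For instance, here's a sketch of a function with one optional parameter (the Multiply name and its default behavior are illustrative); an omitted optional argument arrives as null:
let
    Multiply = (value as number, optional factor as nullable number) =>
        value * (if factor = null then 2 else factor)  // when factor is omitted, default to doubling
in
    { Multiply(5), Multiply(5, 3) }  // { 10, 15 }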
CLEANTEXT FUNCTION
(input as text) =>
let
set1 = {"A".."Z"},
set2 = {"a".."z"},
set3 = List.Transform({0..9}, each Number.ToText(_) ),
set4 = {" ", "-", "_", ".", "'"},
allowedChars = set1 & set2 & set3 & set4,
output = Text.Select(input, allowedChars)
in
output
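Assuming the query above is saved under the name CleanText, invoking it might look like this (illustrative input):
CleanText("Namé: 123!?")  // "Nam 123" (the é, colon, exclamation point and question mark are not in the allowed set, so they are removed)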
EXERCISE WRITE A FUNCTION TO EXTRACT TEXT WITHIN A PAIR OF SIGNALS
Given a string “There are several texts Nt(Jack the Ripper) and other texts …”, extract the
ARGUMENT TYPES
In the function signature, we also see the parameters’ types and the function’s return type. By default, they’re
all type any (for SecretFormula above, all three parameters and the return value default to type any).
You can be more specific about types when defining functions. This function specifies types for its parameters
(both are number) and return value (text):
(PriceEach as number, Quantity as number) as text =>
"Total Cost: " & Number.ToText(PriceEach * Quantity)
PASSING FUNCTIONS
The ability to pass a function into another function is powerful. The other function can implement a generic
algorithm that’s widely applicable, then use the function passed into it to customize its behavior so that it’s
relevant to our particular situation.
Take, for example, the idea of adding a new column to a table. Table.AddColumn implements the generic
formula which makes this possible. However, we want to customize Table.AddColumn’s behavior so that we
can control the values used for the new column. To enable us to do this, Table.AddColumn allows us to pass it
a function as a parameter. It then invokes this function once per table row, passing it the current row as its
argument and using the value it returns as the new column’s value for that row.
let
Source = #table( {"Col1", "Col2"}, { {1, 2}, {3, 4} } ),
ColumnCreator = (row) => row[Col1] + row[Col2],
AddColumn = Table.AddColumn(Source, "RowTotal", ColumnCreator)
in
AddColumn
INLINE DEFINITION
Since a function is an expression and expressions are allowed in parameter lists, we can define functions inline,
directly in a parameter list.
Below, the new column function is defined in the argument list instead of first being assigned to a variable. As
far as Table.AddColumn is concerned, the effect is the same as the previous example.
let
Source = #table( {"Col1", "Col2"}, { {1, 2}, {3, 4} } ),
AddColumn = Table.AddColumn(Source, "RowTotal", (row) => row[Col1] +
row[Col2])
in
AddColumn
SHORTCUTS: EACH & _
In life, each person is special. In Power Query M, each is also special—because it simplifies a common M code
pattern.
Defining a function that accepts a single argument is such a common need in Power Query M that the language
defines a shortcut to simplify it: Keyword each is shorthand for (_) =>.
Since we haven’t talked about records yet, we’re jumping ahead of ourselves—but I’ll go ahead and let you in
on another shortcut: [FieldName] without a name directly before it is shorthand for _[FieldName].
Each of the below statements is equivalent. Each successive statement uses more concise syntax which makes
it easier to read.
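For instance, applying the shortcuts step by step to the new-column function from the earlier AddColumn examples:
(row) => row[Col1] + row[Col2]
(_) => _[Col1] + _[Col2]
each _[Col1] + _[Col2]
each [Col1] + [Col2]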
let
Source = #table( {"Col1", "Col2"}, { {1, 2}, {3, 4} } ),
AddColumn = Table.AddColumn(Source, "RowTotal", each [Col1] + [Col2])
in
AddColumn
Why the name each? My guess is the name comes from the fact that each is often used to simplify function
definition where the function will be invoked once per item in the input data set (for
example, Table.AddColumn invokes a single-argument function once for each table row). Regardless of its
etymology, each can be used any time you want to define a single argument function, whether or not it will be
called once per item.
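For example, each works just as well for an ordinary single-argument function that you invoke yourself:
let
    AddOne = each _ + 1  // same as writing: (_) => _ + 1
in
    AddOne(5)  // 6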
RETURNING FUNCTIONS
Functions can also return functions.
() => (x, y) => x * y
Does the above look pointless to you?! Returning functions becomes much more advantageous when we take
advantage of something called closure. A closure allows a function to remember the values of variables that
were in scope when it was defined.
Below, when we invoke the outer function and pass it a value for x, the inner function that’s returned
remembers the value of x. When we invoke the inner function, we only need to pass it a value for y. It will then
multiply the remembered x value by y.
(x) => (y) => (x * y)
Invoking the above, passing 5 as the value for x returns a function that behaves as though it were defined like
this (notice how the function remembers the value of x that was used when it was generated):
(y) =>
let
x = 5
in
(x * y)
Library function List.Transform expects two arguments. First, the source list; then a function that will be
invoked once per list item to transform that item to its new value. This function will be passed the value of the
current list item as its argument.
We want to transform a list of numeric values, reducing them by a certain percentage. One way to do this is to
define a function that accepts the discount percentage and returns a function that accepts a value and reduces
it by the remembered discount percentage. This returned function will be passed in to List.Transform.
let
Source = { 1, 2, 3, 4, 5 },
CalculatorGenerator = (discountPercentage) =>
(value) => (1 - discountPercentage) * value,
HalfOff = CalculatorGenerator(0.5),
Result = List.Transform(Source, HalfOff)
in
Result
CONTROL STRUCTURE
Nope. That’s not a typo in the title. In the Power Query world, there aren’t control structures (plural); there’s
just one control structure (singular). We’re about to examine its simplicity. As to the “missing” control
structures (which you may be used to from other programming languages), we’ll explore ways of implementing
similar functionality the M way.
CONDITIONAL EXPRESSIONS – IF
What if you want one or the other of two expressions to be executed, with the choice of which based on some
condition? Power Query’s if expression allows you to do just this!
if TestExpression
then ExpressionToUseWhenTestEvaluatesToTrue
else ExpressionToUseWhenTestEvaluatesToFalse
When evaluated, the value of the test expression is first computed. If the result is a logical true, the then
expression is evaluated and its value returned. If the test produces false, the else expression is evaluated and
its value returned. If test evaluates to something other than a logical value ( true or false), an error is raised.
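For instance (the exact wording of the error depends on your mashup engine version):
if 5 > 3 then "bigger" else "not bigger"  // "bigger"
if 5 then "bigger" else "not bigger"      // raises an error: 5 is not a logical value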
Pretend we have a table of test scores. There’s a need to add a column indicating whether each score is a pass
or a fail. A conditional expression comes to the rescue:
let
Source = GetTestScoresFromSource(),
EvaluateScore = (score) => if score >= 0.7 then "Pass" else "Fail",
Result = Table.AddColumn(Source, "Result", each EvaluateScore([Score]))
in
Result
Some programming languages allow an if expression without an else clause. This isn’t allowed in Power
Query. In M, every expression (including if expressions) must either return a value or raise an error.
Allowing if without else would result in nothing happening when the if test evaluates to false, which
would break how M works.
Power Query has exactly one general syntax form for if. Unlike some other languages, there’s no short-form
general purpose syntax variation (so no alternative syntax like (score >= 0.85)? "Pass" : "Fail").
However, there is a shortcut syntax (well, technically, an operator) for null coalescing.
Suppose you want to return a value so long as it is not null; but if it is null, you instead want to return an
alternate value. You could code this out in long form using an if statement—or more succinctly using the null
coalescing operator (??). Both achieve the same effect of returning ValueA unless it is null, in which
case ValueB is returned.
if ValueA <> null then ValueA else ValueB
ValueA ?? ValueB
(The null coalescing operator is a new addition to Power Query. It has not been officially documented yet and
may not be available in all environments.)
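As a quick illustration:
null ?? 10      // 10
"abc" ?? "xyz"  // "abc"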
Speaking of syntax, there’s also no special elseif syntax. Instead, to apply another conditional expression
when the first condition fails, simply put an if statement in the preceding statement’s else clause. Repeat
this as many times as necessary to achieve the desired effect.
(Grade) =>
if Grade= "A" then 0.9
else if Grade= "B" then 0.8
else if Grade= "C" then 0.7
else if Grade= "D" then 0.6
else 0 // Grade= "F"
Chaining more than a few if statements together like this can get verbose and violate the Don’t Repeat
Yourself (DRY) principle. If you’re familiar with other programming languages, you might want to replace a set
of if ... else ifs with a switch statement. M doesn’t provide a switch-like syntax option. Instead, perhaps
you can treat the set of conditionals as a set of lookup values, placing them in a table or record then looking
up the value of interest.
let
Source = "C",
fxgradeLookup = (Grade) =>
let
Map = #table({"LetterGrade", "Score"}, {{"A", 0.9}, {"B", 0.8}, {"C",
0.7}, {"D", 0.6}, {"F", 0}})
in
Map{[LetterGrade = Grade]}[Score],
Result=fxgradeLookup(Source)
in
Result
Or, in this case, even more concisely, using a record:
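A sketch of what that record-based lookup could look like:
let
    Source = "C",
    GradeScores = [A = 0.9, B = 0.8, C = 0.7, D = 0.6, F = 0],
    Result = Record.Field(GradeScores, Source)
in
    Result  // 0.7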
(Figures: Customer group 1, Customer group 2, Customer group 3, Customer group 4)
EXERCISE WRITE A FUNCTION TO CALCULATE THE PERSONAL INCOME TAX (PIT)
Given the below table for personal income tax calculation.
- The family circumstance deduction for the taxpayer themselves is 11 million VND/month (equivalent to 132 million
VND/year), and for each dependent it is 4.4 million VND/month.
Given the total taxable income and the number of dependents, write a function to calculate the PIT for
that person.
LOOPING CONSTRUCTS?
As far as control structures go, if is it. There are no other control structures, including no language syntax for
looping constructs like foreach, for or while/do while (which you might be familiar with from other
programming languages).
You might find this surprising and perhaps be inclined to think that M is a very immature language to be missing
such fundamental concepts. Not so fast! It just might be that functional programming in general and/or M in
particular are different from what you’re used to and so rely on different means to achieve the same ultimate
effect.
foreach
Traditionally, a foreach expression executes a block of code for each item in a collection (like each item in a
list or each row in a table). Power Query is built to process list items and table rows. However, instead of
explicitly coding low-level looping details, you declaratively use library functions to define what you want done
at a higher level, and let the language and library take care of applying that processing to the relevant items or
rows.
For example, to apply a transform to each value in a column, it’s not necessary to write a foreach which loops
through the table’s rows one at a time, updating the column of interest in each row as it’s iterated over. Instead,
you use Table.TransformColumns to say “apply the given transform to the specified column” and M takes
care of the menial, row-by-row application of that transformation. Similarly, to add a new column, you don’t
iterate through the table’s rows, adding the column to each row one at a time. Instead, you simply declare
your intention using Table.AddColumn, providing that method with an expression defining the new column,
and M takes care of the per-row application.
Since your focus is declaring intent vs. coding up row-by-row processing to implement your intent, the resulting
syntax may end up more succinct. For example, instead of imperatively using foreach to sum up values in a
column, in the M world you’d simply declare that intent by applying List.Sum to the column.
// Instead of this:
var total = 0;
foreach(var row in SomeTable) {
total += row[ColumnOfInterest];
}
// You do this:
List.Sum(SomeTable[ColumnOfInterest])
Notice how the non-M example involved changing a variable. It’s a good thing Power Query doesn’t require us
to write code like this, otherwise we’d be in trouble. Writing iterative code (like the above) often requires
modifying variables—but with M, this isn’t allowed, for simple variables are immutable!
Similarly, instead of hand-coding a for or while loop, you can generate the sequence of values you need (for
example, with List.Numbers or List.Generate) and then apply declarative processing to the resulting sequence
of values. To implement break-like behavior, use an appropriate library function (like List.Select) to stop
processing when you encounter the desired state.
For example, suppose you’re waiting for a long-running remote job to complete. You have a function which
checks the job’s status by calling a web service, returning null if the job is still running and returning the job’s
results if it has completed. You want to repetitively query the source until either you receive a non-null
response (indicating completion) or you’ve tried a certain number of times (a safety cutoff).
In an imperative programming language, you might use a for loop (or maybe a do while loop) with
a break clause. In M, you can pull this off using something like:
(MaxAttempts, DelayBetweenAttempts) =>
let
Numbers = List.Numbers(1, MaxAttempts),
WebServiceCalls = List.Transform(Numbers, each
Function.InvokeAfter(CallWebService, if _ > 1 then DelayBetweenAttempts else
#duration(0,0,0,0))),
OnlySuccessful = List.Select(WebServiceCalls, each _ <> null),
Result = List.First(OnlySuccessful, null)
in
Result
Here the beauty of streaming shines. When executed, Result’s List.First asks OnlySuccessful for a single
list item. OnlySuccessful’s List.Select pulls one value at a time from WebServiceCalls until it finds one
that isn’t null, which it returns to Result’s List.First.
Each time a value is pulled from WebServiceCalls, List.Transform invokes the CallWebService function,
which calls the web service. The maximum number of times List.Transform will do this is constrained by the
quantity of values in Numbers.
Once the first non-null value has been returned, streaming stops and so calls to the web service also stop. This
is true even if unprocessed values remain in Numbers. Since streaming is complete, nothing causes these extra
numbers to be streamed through the transform and so trigger web service calls. The net effect is that the web
service is queried until it returns a non-null result, up to MaxAttempts number of times. Beautiful!
If your need is to loop indefinitely until the desired state is reached (vs. up to a fixed number of times, like the
above example), you can drive that iteration using an infinite series produced by List.Generate:
List.Generate(() => null, each true, each null)
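List.Generate can also stop based on a condition rather than a fixed count. For example, this sketch keeps doubling a value until it passes 1000:
List.Generate(
    () => 1,         // start at 1
    each _ <= 1000,  // keep looping while the value hasn't passed 1000
    each _ * 2       // double the value on each pass
)
// {1, 2, 4, 8, 16, 32, 64, 128, 256, 512}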
Just keep in mind if you need to loop many, many times, chances are there’s either a non-iterative approach
you could (should) be using or you’re doing something outside of Power Query’s forte (beyond mashing up
data). In the latter case you likely will be better off using a more appropriate language/technology.
(Bonus, before we leave the topic of looping: For an alternate implementation of wait-retry logic that wires in
the take action [e.g. call web service] and check for success steps using arguments to List.Generate,
see Value.WaitFor’s description and definition.)
LIST ACCUMULATE FUNCTION
= List.Accumulate( // List.Accumulate,
list as list, // by using the items in this list,
seed as any, // transforms the seed (starting) value
accumulator as function // with this function
) as any
List.Accumulate takes a list of values, a starting value and an accumulator function. Everything begins with
the seed (the starting value). List.Accumulate then applies the accumulator function (3rd argument) to the
running state, once per list item: on each pass, the function receives the current state and the current list item
and returns the new state, so each transformation builds on top of the previous one. After the last item has
been processed, the final state is returned as the result. For example, summing { 1, 2, 3 } with a seed of 0
proceeds as 0 + 1 = 1, then 1 + 2 = 3, then 3 + 3 = 6, and 6 is returned.
Note that several of the examples below are not best practice; most can be done more easily with dedicated
library functions. For learning purposes, however, they work wonderfully.
= List.Accumulate(
    { 1, 2, 3, 4, 5 },                       // the list used as input
    0,                                       // the starting value
    ( state, current ) => state + current )  // logic to apply
LIST GENERATE
Parameter 1
"initial"
The "initial" parameter is a function that specifies the start value of the loop. In addition to a scalar, i.e. a
number, this start value can also be a "Structured Value", i.e. a "Table", "Record" or a "List".
Parameter 2
"condition"
The "condition" parameter is a function that makes List.Generate() a kind of "Do-Loop" loop. Here you
have to specify the execution condition of the loop. As long as the condition is met, the loop will continue.
Parameter 3
"next"
The "next" parameter is a function that defines which operation will be performed on each new loop
iteration. The start value from "initial" or the intermediate result of the last iteration is changed in each
loop pass. This happens as long as "condition" is fulfilled.
Parameter 4
"selector"
The "selector" parameter is the only optional parameter of List.Generate(). Specifying "selector" you can
set the structure in which the final result should be returned. For example, if you process a "record" with
several columns using List.Generate(), you can specify which of these columns should be included in the
output of the final result. If you do not specify "selector", the complete structure will be returned.
APPLICATION
You can use List.Accumulate to perform simple operations. To replicate the List.Sum function, you use the
following List.Accumulate formula:
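For instance, a one-line equivalent of List.Sum might look like this:
= List.Accumulate( {3, 7, 2}, 0, ( state, current ) => state + current )  // 12, same as List.Sum({3, 7, 2})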
EXAMPLE 1 GENERATE
let
    StartFunction = () => 1,
    TestFunction = each _ <= 10,
    IncrementFunction = each _ + 1,
    MyList = List.Generate(StartFunction, TestFunction, IncrementFunction)
in
    MyList // {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
EXAMPLE 2 ACCUMULATE
let
    Source = List.Accumulate(
        { "a", 5, "c", "%", 100 },   // the list used as input
        null,                        // the starting value
        ( state, current ) => state & "," & Text.From( current ) ?? current
    )
in
    Source // returns a,5,c,%,100
EXAMPLE 3 FIND MAX VALUE
let
    Result = List.Accumulate(
        { 1, 49, -400, 150, 60 },   // the list used as input
        0,                          // the starting value
        ( state, current ) =>
            if state < current      // if the running maximum (state) is smaller than the current item,
            then current            // the current item becomes the new maximum;
            else state              // otherwise keep the existing maximum
    )
in
    Result // returns 150
EXAMPLE 5 MULTIPLE REPLACEMENT
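A common pattern here is to walk a list of {old, new} pairs with List.Accumulate, applying Text.Replace once per pair. A minimal sketch (with illustrative data):
let
    Replacements = { {"ä", "a"}, {"ö", "o"}, {"ü", "u"} },
    Result = List.Accumulate(
        Replacements,
        "Jürgen Müller",                  // the seed is the text being cleaned up
        ( state, current ) => Text.Replace( state, current{0}, current{1} )
    )
in
    Result  // "Jurgen Muller"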
EXAMPLE 6 HIERARCHY
Given a table, add an additional column C that repeats column A’s values, filling down so that rows where A is
null take the most recent non-null value above them.
let
Source = Excel.Workbook(File.Contents(ExcelDirectory & "\Hierarchy.xlsx"),
null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1",
Int64.Type}, {"Column2", Int64.Type}, {"Column3", Int64.Type}, {"Column4", type
text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Column3"}),
tab = Table.RenameColumns(#"Removed Columns",{{"Column1", "A"}, {"Column2",
"B"}, {"Column4", "D"}}),
colC = List.Generate(()=>[k=0, C=tab[A]{0},i=0], each
[i]<Table.RowCount(tab), each [k= if tab[A]{i} is null then [k] else i,
C=tab[A]{k}, i=[i]+1], each [C]),
tabC= Table.FromColumns(Table.ToColumns(tab)&{colC}, {"A","B","D","C"})
in
tabC
Expected result is
Add another column OrderDetailLine to represent the ID of each row within an order (basket), in the format
<OrderID>-ID. For example, if OrderID is CA-2016-152156, then the OrderDetailLine values are CA-2016-152156-1,
CA-2016-152156-2, … The expected result is shown below:
EXERCISE GENERATE ID FOR A GROUP 3
Modify the previous example with Table.Buffer
ERROR HANDLING
Your Power Query is skipping merrily along its mashup way. Then, bam! Something bad happens! Uh oh! What
do you do when an error raises its ugly head? Or, for that matter, what if code you write detects an anomaly
and you want to announce this fact in an informative manner?
Thankfully, M has error handling capabilities, allowing you to both raise and handle runtime errors. We’ll learn
how to do both.
Important: If you’re familiar with the idea of an exception from other programming languages, Power Query’s
error handling is different in at least one significant respect from what you may be familiar with.
ANNOUNCING AN ERROR
In Power Query, each expression must produce something. Ideally, this is the expected value. However, there’s
an alternative: an expression can raise an error, which is a special way of indicating that the expression could
not produce a value.
The main way to raise an error is by using keyword error accompanied with a record describing the problem.
error [
Reason = "Business Rule Violated",
Message = "Item codes must start with a letter",
Detail = "Non-conforming Item Code: 456"
]
In the error definition record, five fields are
relevant: Reason, Message, Message.Format, Message.Parameters and Detail. Technically, all these fields
are optional, and any extra fields included in the error definition record will be ignored.
Special behavior applies to field Reason and the Message* trio of fields:
Reason—If this field is missing, the error that’s raised will have its reason defaulted to
“Expression.Error” (at least, this is true with the version of the mashup engine I’m using—technically,
the language specification doesn’t mandate this defaulting).
Message* Fields—Two options are available for defining the error’s message: directly specify
a Message, or use Message.Format + Message.Parameters to define a structured error message, as sketched
below (see New M Feature: Structured Error Messages for more details).
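For instance, a structured message might be raised like this (a sketch; it assumes Text.Format-style #{0} placeholders and illustrative field values):
error [
    Reason = "Business Rule Violated",
    Message.Format = "Non-conforming Item Code: #{0} (expected prefix: #{1})",
    Message.Parameters = { "456", "A" },
    Detail = "Non-conforming Item Code: 456"
]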
As an alternative to creating the error definition record by hand, helper method Error.Record can be used to
build the record. The function’s first argument maps to field Reason. The second maps to either field Message or, if
a list is passed as Error.Record’s fourth argument, to Message.Format. Arguments three and four map
to Detail and Message.Parameters, respectively. Unlike the above build-your-own-record
approach, Error.Record requires that you provide a Reason; its other arguments are optional.
error Error.Record("Business Rule Violated", "Item codes must start with a letter", "456")
It’s up to you as to whether you prefer to create error definition records using [...] syntax or
with Error.Record. In either case, ultimately, a record is being created which you hand over to error when
you’re ready for the error to be raised.
Both of the above examples produce an equivalent error:
Looking at the above screenshot, it’s easy to see how the three values that were provided map to the error
messaging that’s displayed.
In lieu of a record, error also accepts a string. The resulting error will have its Message set to the provided string
and its Reason set to “Expression.Error” (at least, that’s the default Reason with the mashup engine version
I’m using—technically, the language specification doesn’t mandate this defaulting).
error "help!"
Ellipsis Shortcut
There’s also a shortcut operator for raising errors which comes in handy during development.
Let’s say you want to test a mashup that’s under development where you haven’t yet implemented every
branch of each expression. Of course, since each branch must either return a value or raise an error, you can’t
test run your query without putting something as a placeholder in those unimplemented branches, but what
should you use?
When you encounter a situation like this, consider the ellipsis operator (...). When invoked, ... raises an
error something like “Expression.Error: Not Implemented” or “Expression.Error: Value was not specified” (the
exact wording depends on your mashup engine version).
Here’s a bit of code where the developer hasn’t yet implemented the if statement’s else branch so is
using ... as a placeholder:
if Value then DoSomething() else ... // when Value evaluates to false, "..." is
called, which raises the placeholder error
(Notice how keyword error is not used. The ellipsis operator both defines and raises the error. Very short,
sweet and simple to use.)
SPECIAL BEHAVIOR
What exactly happens when an error is raised? What special behavior does raising an error entail that sets it
apart from simply returning an ordinary value?
SomeFunction(GetValue())
When evaluated under normal circumstances, first GetValue() is executed. Then, the value it produces is
passed into SomeFunction(). Lastly, SomeFunction()’s result is returned as the expression’s output.
Heaven forbid, but suppose instead that GetValue() raises an error. Immediately, further execution of the
expression stops. SomeFunction() is not called. Instead, GetValue()’s error becomes the expression’s
output: it is propagated (a.k.a. raised) to whomever or whatever invoked the expression.
What happens next depends on whether that whomever or whatever can hold a value: the error may be
contained or may become the mashup’s top-level error. Only in case of the latter does the error cause the
mashup as a whole to terminate.
Error Containment
If the error is encountered by an expression that defines something holding a value (like the expression for a
record field, a table cell or a let variable), the error is contained by that something—its effects are limited to
that something and any logic that attempts to access that something’s value.
Below, the effects of GetValue()’s error are contained to the portion of the larger mashup affected by it. The
error does not terminate the entire mashup; rather, the mashup completes successfully and returns a valid
record. Only FieldB and FieldC are errored because they are the only “somethings” affected by the error.
let
GetValue = () => error "Something bad happened!",
DoSomething = (input) => input + 1,
Result = [
FieldA = 25,
FieldB = DoSomething(GetValue()),
FieldC = FieldA + FieldB
]
in
Result
This containment of errors brings with it another special behavior: When an error is contained, the error is
saved into the something that contains it. Throughout the remainder of the mashup’s execution, any attempt
to access that something’s value causes the saved error to be re-raised. When an access attempt occurs, the
logic that originally caused the error is not re-evaluated to see if it now will produce a valid value; that logic is
skipped and the previously saved error is simply re-raised.
Below, Data’s GetDataFromWebService() is only evaluated once, even though Data itself is accessed twice.
The second access attempt receives the error saved from the first access.
let
Data = GetDataFromWebService() // raises an error
in
{ List.Sum(Data[Amount]), List.Max(Data[TransactionDate]) }
Top-Level Errors
When an error is encountered, if nothing contains it, the error is propagated from the mashup’s top-level
expression (the mashup’s output clause) to the host environment as the mashup’s result. Execution of the
mashup then stops.
This mashup’s top-level expression errors. Nothing is present to contain the error, so the mashup dies,
outputting the error as its result:
let
GetValue= () => error "Something bad happened!",
SomeFunction = (input) => input + 1
in
SomeFunction(GetValue())
The below mashup’s error is first contained in Result but then the top-level expression accesses Result which
results in the error being re-raised to the top-level expression. Since nothing contains the error this time, it
becomes the mashup’s output—like the preceding, the mashup dies with the error.
let
GetValue= () => error "Something bad happened!",
SomeFunction = (input) => input + 1,
Result = SomeFunction(GetValue())
in
Result
Error containment is a great behavior considering M’s target use case: processing data. Suppose the expression
defining a table column value errors for one cell out of the entire table. In an exception-based world, this error
might cause all processing to terminate. In M’s world, the error simply affects that single cell and any code that
accesses that cell. Processing continues and the decision of whether the error is significant is left to whatever
code consumes the cell’s value.
In fact, due to M’s laziness, if nothing ever attempts to use that cell’s value, its expression may not be evaluated,
and so the error never raised. Why should the mashup engine waste effort computing something that will just
be thrown away untouched?
let
Data = #table({"Col1"}, {{"SomeValue"}, { error "bad" }})
in
Table.RowCount(Data)
Above, row and column values are not needed to produce the requested output (the count of rows), so the
second row’s error expression has no effect.
While error containment is a great default behavior, what if it doesn’t suit your needs? In particular, with tables,
what if it’s important to differentiate between rows with errors and those without? Perhaps you’re not
accessing row contents directly, so aren’t doing anything that would trigger error propagation, but still want
to know which rows have an error somewhere in them and which do
not. Table.SelectRowsWithErrors and Table.RemoveRowsWithErrors are likely just what you need.
let
Data = #table({"Col1"}, {{"SomeValue"}, { error "bad" }})
in
[
RowsWithErrors = Table.RowCount(Table.SelectRowsWithErrors(Data)),
RowsWithoutErrors = Table.RowCount(Table.RemoveRowsWithErrors(Data))
]
HANDLING ERRORS
With an understanding of raising errors tucked away, what do you do if you’re handed an error? Surely there’s
a graceful way to handle it—some way to try to resolve it!
That’s it—that’s the keyword: try. try allows you to attempt to handle an error by taking remedial action.
try comes in three main variants:
// try otherwise
try ExpressionToTry otherwise FallbackExpression
// try catch
try ExpressionToTry catch (e) => FunctionBody
try ExpressionToTry catch () => FunctionBody
// plain try
try ExpressionToTry
try otherwise
The first version, try otherwise, tries to execute the expression to try. If that expression returns a
value, try simply returns that value. If, instead, the expression errors, that error is ignored, the otherwise
expression is evaluated and whatever that expression produces becomes the output of the try
otherwise expression. In essence, if the first expression (the “to try” expression) errors, fallback to the second
expression (the “otherwise” expression).
try Number.FromText(input) otherwise 0
If Number.FromText returns a value, then that value is returned from try. Instead, if Number.FromText raises
an error, try handles that error, replacing it with the output produced by the otherwise expression (in this
case, the value 0). So, if input can be parsed to a number, that number is returned; otherwise, a default value
of 0 is returned.
Keep in mind that only the expression directly to the right of try will have its errors caught and replaced. If the
otherwise expression returns an error, that error won’t be handled by the try coming before it. Of course,
since the otherwise expression is itself just an expression, you could put a try inside that expression to handle
errors raised from it.
try GetFromPrimary()
otherwise try GetFromSecondary()
otherwise "Having problems with both servers. Take the rest of the day off."
Try otherwise works well in situations like text-to-number parsing but it can leave something to be desired in
more complex scenarios. Why? The catch is that the otherwise is indiscriminate: it replaces any error by
evaluating the fallback expression. Sometimes, the desired remedial action differs based on the specifics of the
error encountered.
try catch
try catch allows us to handle this possibility. If the tried expression completes successfully (i.e. it returns a
value), the value it produces is output. If, instead, the expression being tried raises an error, the catch function
is invoked. This sounds very much like try otherwise, and it is—except for one very significant difference.
The catch function can be defined as accepting zero arguments or one argument. If a zero-argument function
is used, then try catch is identical in behavior to try otherwise.
// both are equivalent in behavior
try Number.FromText(input) catch () => 0
try Number.FromText(input) otherwise 0
On the other hand, if the catch function is defined as accepting an argument, then when that function is
invoked, it will be passed a record with details about the error that just occurred. This presents the possibility
to dynamically adapt how the error is handled based on its specifics—a significant ability not possible with try
otherwise.
let
    Source =
        try GetDataFromPrimary()
        catch (e) =>
            // if the error is because primary is unreachable, fall back to secondary
            if e[Reason] = "External Source Error" and e[Message] = "Server is unreachable"
            then GetDataFromSecondary()
            // for any other error, re-raise it
            else error e
in
    Source
The catch function has a few constraints:
- It must be defined inline. Defining the function elsewhere and then simply referencing it by name isn’t allowed.
- Its parameter list must be defined using parenthesis syntax. The each shortcut isn’t allowed.
- Type assertions may not be used in the definition.
try
Last but not least, plain vanilla try evaluates the provided expression, then returns a record with details about
the expression’s result.
try SomeExpression
If the tried expression completed successfully, the record try outputs is in the form of:
[
HasError = false,
Value = (whatever value the tried expression returned)
]
For example:
let
DoSomething = () => 45,
Result = try DoSomething()
in
Result // [HasError = false, Value = 45]
If the tried expression raised an error, the returned record looks like:
[
HasError = true,
Error = (error details record)
]
Example:
let
DoSomething = () => error "bad",
Result = try DoSomething()
in
Result
// [
//   HasError = true,
//   Error = [
//     Reason = "Expression.Error",
//     Message = "bad",
//     Detail = null,
//     Message.Format = "bad",
//     Message.Parameters = null
//   ]
// ]
Prior to try catch being added to M, implementing conditional remediation logic required using try with some
boilerplate code, resulting in verbose expressions like:
let
    Primary = try GetDataFromPrimary(),
    Source =
        // if primary is good, use what it returns
        if Primary[HasError] = false
        then Primary[Value]
        // if the error is because primary is unreachable, fall back to secondary
        else if Primary[Error][Reason] = "External Source Error"
                and Primary[Error][Message] = "Server is unreachable"
        then GetDataFromSecondary()
        // for any other error, re-raise it
        else error Primary[Error]
in
    Source
Scope
In order to have an effect, error handling must occur at the level where the error is encountered. Error handling has
no effect on errors that are contained at a different level.
let
Data = #table({"Amount"}, {{10}, {error "help!"}, {error "save me!"}})
in
try Data otherwise 0
This try doesn’t do anything useful for this mashup. Apparently, the developer hoped it would replace any
column errors with zero, but that’s not how it was applied. The way things were wired up, if the expression
defining Data raises an error, try will replace that error with zero. However, in this case, Data returns a valid
table. True, there are cells in that table with errors, but those errors are contained at the cell level. Since they
do not affect Data’s table-level expression, the try at the table expression level has no practical effect.
try does help with the following, but its effect may not be what the developer intended.
let
Data = #table({"Amount"}, {{10}, {error "help!"}, {error "save me!"}})
in
try List.Sum(Data[Amount]) otherwise 0
Above, List.Sum iterates through the values in Data[Amount], adding them up. If an expression defining an
item value raises an error, that error is propagated out of List.Sum, causing the summation as a whole to
abort. try handles this error, returning 0 in place of the total List.Sum would have output in error-free
circumstances.
If that was the intention, great! However, if the intention was to replace any erroring items with 0 while
allowing the summation as a whole to complete, try must be applied so that it handles errors at the table cell
level—it needs to be wired in to receive errors from column value expressions.
At first glance, Table.TransformColumns(Data, {"Amount", (input) => try input otherwise 0}) might
seem like an option. Perhaps surprisingly, this logic does not catch errors raised by column value expressions.
Why not? A function’s arguments are eagerly evaluated before their values are passed into the function. If that
evaluation results in an error, the function is not invoked so never sees the error; instead, the error is
propagated out to the caller. In the case of Table.TransformColumns, if a column value expression raises an
error, the transformation function (e.g. (input) => ...) is not called, so its try cannot handle the error;
instead, the error is propagated back to Table.TransformColumns.
The problem is that the column value expression needs to be evaluated inside the try. To achieve this, try
stepping back to the row level. Wire in a function that receives a reference to the entire row. Then, inside your
function, use the row reference to access the column’s value, wrapped in a try expression. Now, any errors
raised as a result of that evaluation will be propagated to your try expression which can then handle them
appropriately.
It’s not simple, but one of the simplest ways to get a column’s value via a row reference, work with it, then
save the resulting output back to the table is to replace the column of interest by
using Table.AddColumn followed by Table.RemoveColumns + Table.RenameColumns:
let
Data = #table({"Amount"}, {{10}, {error "help!"}, {error "save me!"}}),
ErrorsReplacedWithZero = Table.AddColumn(Data, "NewAmount", (row) => try
row[Amount] otherwise 0),
RemoveOldAmount = Table.RemoveColumns(ErrorsReplacedWithZero, {"Amount"}),
RenameNewAmount = Table.RenameColumns(RemoveOldAmount, {"NewAmount",
"Amount"})
in
List.Sum(RenameNewAmount[Amount]) // returns 10
I agree with you—the above is a complex solution to achieve something that seems like it should be
straightforward. If you want to use an elaborate try, unfortunately, some form of working with the table at
the row level is required. However, if all you need is to simply replace any error in a particular column with a
default value (which is all the above example’s try does), Table.ReplaceErrorValues is your friend.
let
Data = #table({"Amount"}, {{10}, {error "help!"}, {error "save me!"}}),
ErrorsReplacedWithZero = Table.ReplaceErrorValues(Data, {{"Amount", 0}}) //
replaces any errors in column Amount with 0
in
List.Sum(ErrorsReplacedWithZero[Amount]) // returns 10
Applying similar behavior to items in a list is more complex. There’s no List.ReplaceErrorValues library
function and List.Transform(Data, (input) => ...) doesn’t help for the same reason
that Table.TransformColumns doesn’t help with tables. Instead, the simplest solution may be to turn the list
into a table, handle the error appropriately, then convert the table back to a list.
let
Data = {10, error "help!", error "save me!"},
DataAsTable = Table.FromValue(Data),
ErrorsReplacedWithZero = Table.ReplaceErrorValues(DataAsTable, {{"Value",
0}}),
BackToList = ErrorsReplacedWithZero[Value]
in
List.Sum(BackToList) // returns 10
RULE VIOLATIONS
You may not find yourself raising errors that often. Typically, the errors you encounter may come from data
connectors and library functions. Don’t forget, though, that you can use errors to announce violations of
expectations, such as to signify that a particular data item failed to conform to a business rule.
Say you’re processing a CSV file where values in the ItemCode column should always start with an “A”. Early in
your mashup, you could check values for conformance to this rule, replacing abnormal values with errors. Later
processing steps which access the column will be alerted if they attempt to work with rule-violating values
(because of the errors that will be raised).
let
Data = GetData(), // for testing use: #table({"ItemCode"}, {{"1"}, {"A2"}})
Validated = Table.TransformColumns(Data, {"ItemCode", each if
Text.StartsWith(_, "A") then _ else error Error.Record("Invalid Data",
"ItemCode does not start with expected letter", _) })
in
Validated
This approach may be of particular interest when writing a base query that several other queries will pull from,
as it allows you to centralize your validation (think: the DRY principle) while ensuring that anyone attempting
to use the erroneous data is forcibly alerted to the presence of the anomalies.
By no means is this the only way to centralize validation logic. Another option is simply to define an extra
column for the rule, set to true or false, based on whether the rule is complied with:
let
Data = GetData(), // for testing use: #table({"ItemCode"}, {{"1"}, {"A2"}})
Validated = Table.AddColumn(Data, "ValidItemCode", each
Text.StartsWith(_[ItemCode], "A"), type logical)
in
Validated
With this option, logic that cares whether ItemCode is valid is responsible for checking ValidItemCode. If the
developer forgets to perform this check, invalid data may be treated as valid. In contrast, the replace-invalid-
data-with-errors approach ensures that logic attempting to access an invalid value is forced to reckon with its
nonconformance (because the access attempt raises an error).
Whether either of these options is appropriate will depend on your context.