
Logic Hacks For The Optimization of Term

Uploaded by

sheologian
Copyright
© © All Rights Reserved

Logic Hacks for the

Optimization of
Terminal Devices
Po-Chun Huang ( 黃柏鈞 )
Department of Electronic Engineering,
National Taipei University of Technology
Disclaimer
• The copyright and intellectual properties of all materials, except
for those produced by the lecturer (Po-Chun Huang), belong to
the original authors of the sources.
• A majority of the materials in these slides are revised from Small
Memory Software: Patterns for Systems with Limited Memory by
Charles Weir and James Noble, Addison-Wesley Professional,
2000.

2
Agenda
• Basic operations
• Intermediate arithmetic
• Multiplication & division
• Next steps

3
Reference Book
• Henry S. Warren Jr., Hacker's Delight, 2nd
Edition, Addison-Wesley, 2012.


4
What’s Horrible?
• How can we do the following in the fewest steps in
C or assembly language?
• Double or halve a number.
• Round up an integer to the next power of 2.
• Round down an integer to the previous power of 2.
• Compute the mean of two integers, rounded down.
• Compute the mean of two integers, rounded up.
• We will cover these tomorrow.

5
We’ve seen how numbers are
represented in a modern fashion.
How about texts?

6
Alphanumeric Codes
• Extended Binary Coded Decimal Interchange Code (EBCDIC)
• Used by early IBM mainframe computers.
• Drawback: English letters in EBCDIC are not consecutive in
numerical values.
• American Standard Code for Information Interchange (ASCII)
• Current standard for encoding characters.
• Limitation: ASCII only supports English letters and basic punctuation
marks. It has no support for Chinese characters.
• Unicode
• Latest standard that supports almost all languages.
• Issue: Many existing programming libraries still do not provide
native support for Unicode…
ASCII Code

[Figure: standard ASCII table]
Some Details
• ASCII code covers printable characters and
non-printable characters.
• Printable characters are visible on screen, such as
‘a’ to ‘z’, ‘A’ to ‘Z’, and ‘0’ to ‘9’.
• Non-printable characters, a.k.a., control characters,
have special purposes, e.g.,
• Carriage return (CR) and line feed (LF) for line endings
• Bell (BEL) for triggering the beeper
• Acknowledge (ACK) and synchronous idle (SYN) for
network communication
• Null (NUL) for string endings
• Horizontal tab (HT) and vertical tab (VT) for positioning
9
Design Ideas of ASCII
• Digits ‘0’–‘9’ are arranged from 48 (0x30) to 57 (0x39).
→ A digit character can be converted to its numerical
value by subtracting 48 from the character code.
• ‘A’ to ‘Z’ are also contiguous from 65 (0x41), while ‘a’ to
‘z’ are contiguous from 97 (0x61).
→ The upper-case form of a lower-case letter can be
obtained by subtracting 32, and vice versa by adding 32.
• [Discussion] Can we do even better? Yes! Since 32 = 00100000 in binary, we can do
“ANI 11011111” (clear bit 5 to get upper case) and “ORI 00100000” (set bit 5 to get lower case),
respectively.
• These are even faster than subtraction or addition.
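The masking idea can be sketched in C as follows. This is only an illustration: the function names are hypothetical, and each function assumes its input is already an ASCII digit or letter (no range checking is done).

```c
/* Hypothetical helpers; inputs are assumed to be valid ASCII digits/letters. */

/* '0'..'9' occupy codes 48..57, so subtracting '0' (48) gives the value. */
static int digit_value(char c) { return c - '0'; }

/* Upper and lower case differ only in bit 5 (0x20 = 00100000). */
static char to_upper_ascii(char c) { return (char)(c & ~0x20); } /* AND 11011111 */
static char to_lower_ascii(char c) { return (char)(c | 0x20); }  /* OR  00100000 */
```

Applying `to_upper_ascii` to a letter that is already upper case leaves it unchanged, which a plain "subtract 32" would not.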
10
Design Ideas of ASCII (Cont’d)
• Why doesn’t ASCII use escape sequences to reduce the code
length of each character?
• Escape sequences are not reliable. Why?
• Consider a string with one character smudged on paper or cut from
paper. We may still recognize most of the contents.
 Some redundancy must exist, but where?
• JPEG and some sound formats are resistant to minor corruptions. What
are the “corruptions” here?
• [Research problem] How about our data structures on main
memory?
• [Research problem] Consider more sophisticated file formats, such
as .odt or .mp3. How to make them “green” (consuming less energy
as being used)? 11
Design Tradeoff for Environments
• Why is EBCDIC still used even after ASCII is developed?
• Punch cards become physically vulnerable when there are too many holes on the
same row or column!
• [Discussion] Can you suggest a solution, or at least some characteristics of the viable
solution?
• Thus, besides reducing the code length, there is more to think about
in real-world applications!
• See http://opass.logdown.com/posts/1300438-the-story-of-auto-beverage-machine-23
for an inspiring example!

[Figure: a punch card]
How Large Are the Books in ASCII?
• Marcel Proust, À la recherche du temps perdu
(Remembrance of Things Past) is
about 7.7 MB.
• F. Scott Fitzgerald, The Great Gatsby
is about 300 KB.
• Margaret Mitchell, Gone with the Wind
is about 2.5 MB.
• We are able to create more information than ever!
What a pity, most of the information we create is
garbage…
13
Characteristics of Unicode
• Universality
• Efficiency
• Characters, not glyphs
• Semantics
• Plain text

14
Characteristics of Unicode (Cont’d)
• Logical order
• Unification
• Dynamic composition
• Stability
• Convertibility

15
Thank You!
Any Questions?
231958 Digital Logic Design, Lecture 3:
Boolean Algebra &
Switching Algebra

Po-Chun Huang ( 黃柏鈞 )


Department of Electronic Engineering,
National Taipei University of Technology
Huntington’s Axioms

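• Set $B$ contains at least 2 elements $a, b$ with $a \neq b$.
• Closure properties: For any $a, b \in B$, we have $a + b \in B$ and
$a \cdot b \in B$.
• Commutative laws for $+$ and $\cdot$: For any $a, b \in B$, $a + b = b + a$ and $a \cdot b = b \cdot a$.
• Identity elements 0 and 1: For any $a \in B$, $a + 0 = a$ and $a \cdot 1 = a$.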

18
Huntington’s Axioms (Cont’d)

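• Distributive laws: For any $a, b, c \in B$, $a \cdot (b + c) = a \cdot b + a \cdot c$ and $a + (b \cdot c) = (a + b) \cdot (a + c)$.
• Complementation: For any $a \in B$, there exists a
complement of $a$, denoted as $a'$, such that $a + a' = 1$ and $a \cdot a' = 0$.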

19
Observations
• Each of the above axioms is independent
and consistent with each other.
• So, we cannot prove or disprove them.
• [Discussion] Can we remove any axiom
from the systems? Do we need to add
anything such as the associative laws ?

20
Operator Precedence
• Operators have an order in computation (highest precedence first):
1. Parentheses $(\ )$
2. Complement $'$
3. AND $\cdot$
4. OR $+$

21
Duality Principle
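• Suppose that a statement $S$ holds (is true), then the dual $S^D$
is defined as $S$ with all 0s and 1s exchanged,
and with all $+$ and $\cdot$ exchanged. For example, the dual of $a + 0 = a$ is $a \cdot 1 = a$.
• Duality principle: $S$ holds iff $S^D$ holds.
(iff: if and only if)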

22
Idempotent Laws
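• For any $a \in B$, we have $a + a = a$ and $a \cdot a = a$.

• Proof:
• $a + a = (a + a) \cdot 1 = (a + a)(a + a') = a + a \cdot a' = a + 0 = a$, quod erat demonstrandum (QED).
• $a \cdot a = a$ can be proven by the duality principle.
• [Exercise] Prove $a \cdot a = a$ from scratch!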

23
Boundedness Theorems
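• For each $a \in B$, $a + 1 = 1$ and $a \cdot 0 = 0$.
0 and 1 are the lower and
upper “bounds” in this
system.

• Proof:
• $a + 1 = 1 \cdot (a + 1) = (a + a')(a + 1) = a + a' \cdot 1 = a + a' = 1$. QED.
• From the duality principle we also have $a \cdot 0 = 0$.
• [Discussion] Prove $a \cdot 0 = 0$ from scratch!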

24
Uniqueness of Complements
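• In Boolean algebra, the complement of an element
is unique. That is, suppose that both $b$ and $c$ are
complements of $a$, then $b = c$.
• Proof.
• Suppose that both $b$ and $c$ are complements of $a$, then
we have $a + b = 1$, $a \cdot b = 0$, $a + c = 1$, and $a \cdot c = 0$.
• $b = b \cdot 1 = b(a + c) = ba + bc = 0 + bc = bc$.
• $c = c \cdot 1 = c(a + b) = ca + cb = 0 + cb = bc$.
• So $b = c$. The complement of any $a$ is unique. QED.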

25
Complementation
• In Boolean algebra, $0' = 1$ and $1' = 0$.
• Proof.
• $0 + 1 = 1$ and $0 \cdot 1 = 0$, so 1 is the complement of 0, and vice versa. QED.

26
Involution
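• For all $a \in B$, $(a')' = a$.
• Proof.
• From above, we know that $a' + a = 1$, $a' \cdot a = 0$, $a' + (a')' = 1$, and $a' \cdot (a')' = 0$.
• So, both $a$ and $(a')'$ are complements of $a'$.
• Because the complement of an element is
unique, we have $(a')' = a$. QED.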

27
Associativity Laws

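• For any $a, b, c \in B$, we have $a + (b + c) = (a + b) + c$ and $a \cdot (b \cdot c) = (a \cdot b) \cdot c$.

• Equivalently, we say that for the
same operator $+$ or $\cdot$, the
computation order does not
affect the results.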
28
Associativity Laws (Cont’d)
• Proof. .
• Let and . We want to prove that .
• We first prove that : .
• Likewise we have , , , , , and .
• Now we have .
• On the other side, we have .
• So . Likewise we prove . So . QED.

29
Associativity Laws (Cont’d)
•Proof. .
• We have several possible proofs.
• By duality principle.
• By analogous argument as in the
previous page.
• By normal, “textbook” proofs on our
textbook.
• Try yourself!
30
DeMorgan’s Theorems
• For any $a, b \in B$, we have $(a + b)' = a' \cdot b'$ and $(a \cdot b)' = a' + b'$.

31
DeMorgan’s Theorems (Cont’d)

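• Proof of $(a + b)' = a' \cdot b'$.
• We start by confirming that $a' \cdot b'$ is a
complement of $a + b$:
$(a + b) + a'b' = (a + b + a')(a + b + b') = (1 + b)(a + 1) = 1$,
$(a + b) \cdot a'b' = a a' b' + b a' b' = 0 + 0 = 0$.
• So $a' \cdot b'$ is a complement of $a + b$.
• Since the complement is unique, we have $(a + b)' = a' \cdot b'$.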

32
DeMorgan’s Theorems (Cont’d)
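• Proof of $(a \cdot b)' = a' + b'$.
• Try yourself!
• DeMorgan’s Theorems can be used on more than
2 operands, such as:
$(a + b + c)' = a' \cdot b' \cdot c'$ and $(a \cdot b \cdot c)' = a' + b' + c'$.
• You may verify yourself!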

33
Absorption Rules
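• For any $a, b \in B$, we have $a + a \cdot b = a$ and $a \cdot (a + b) = a$.

• In addition, we have $a + a' \cdot b = a + b$ and $a \cdot (a' + b) = a \cdot b$.

• Prove them yourself!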


34
Consensus Equalities
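• Claims. $ab + a'c + bc = ab + a'c$ and $(a + b)(a' + c)(b + c) = (a + b)(a' + c)$.

$a$ and its complement $a'$ appear with $b$ and $c$, so the
consensus terms ($bc$ and $b + c$) can be removed.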

• Please try to prove yourself.


35
Some Exercises
• Prove the following equations:

• Why? .

36
Some Exercises (Cont’d)

• Why? .

• Why? Based on (1), , and


• So .

37
Switching Algebra
• Switching algebra is a Boolean algebra
system defined over the minimal set $B = \{0, 1\}$.
• The operator $+$ is named “OR” ($a + b = 1$ iff $a = 1$ or $b = 1$)
• The operator $\cdot$ is named “AND” ($a \cdot b = 1$ iff $a = 1$ and $b = 1$)
• The operator $'$ is named “NOT” ($a' = 1$ iff $a = 0$)

38
Switching Algebra (Cont’d)
• Due to the limited value range of $\{0, 1\}$, a way of
proving switching algebra statements is to use a
truth table to exhaustively list all possible
combinations of values.
• Example: Prove $a + ab = a$.

a  b  ab  a+ab  a
0  0  0   0     0
0  1  0   0     0
1  0  0   1     1
1  1  1   1     1
39
Proving by Perfect Induction
(i.e., the Truth Table)
• Prove $ab' + b = a + b$.
• Method 1: algebraic proof. $ab' + b = (a + b)(b' + b) = (a + b) \cdot 1 = a + b$.
• Method 2: perfect induction.

a  b  b'  ab'  ab'+b  a+b
0  0  1   0    0      0
0  1  0   0    1      1
1  0  1   1    1      1
1  1  0   0    1      1
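Perfect induction is mechanical, so it can be delegated to a program. The sketch below (function names are hypothetical) enumerates every 0/1 assignment and checks two identities from this lecture, DeMorgan's theorem and the absorption rule; it is an illustration of the method, not a general-purpose prover.

```c
/* Perfect induction: try every combination of 0/1 values. */

/* Returns 1 if (a + b)' == a' . b' for all a, b in {0, 1}. */
static int demorgan_holds(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (!(a | b) != ((!a) & (!b)))
                return 0;
    return 1;
}

/* Returns 1 if a + a.b == a for all a, b in {0, 1}. */
static int absorption_holds(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if ((a | (a & b)) != a)
                return 0;
    return 1;
}
```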
40
Cancelling the Cancellation Law
• In classical arithmetic, we have the cancellation
law: For any $a, b, c$, $a + b = a + c$ implies $b = c$. Unfortunately, this does
not hold in switching algebra: Letting $a = 1$ and $b \neq c$, we
have $a + b = a + c = 1$. But obviously, $a$ cannot be cancelled out.
• For switching algebra we have the revised
cancellation law: If $a + b = a + c$ and $a \cdot b = a \cdot c$, then $a$ can be cancelled
out, and we obtain $b = c$.

41
Proving the Cancellation Law
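• Prove that if $a + b = a + c$ and $a \cdot b = a \cdot c$, then $b = c$.
• $b = b(a + b) = b(a + c) = ab + bc = ac + bc = (a + b)c = (a + c)c = c$. QED.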
• Prove that if and , then .
• [Exercise] Your turn!
• Duality principle also applies.

42
Switching Expressions
• A switching expression is:
1. A switching variable or constants 0 or 1, or
2. Letting $E_1$ and $E_2$ be any switching expressions, then $E_1 + E_2$, $E_1 \cdot E_2$,
or $E_1'$ are also switching expressions.
3. All other combinations are not switching
expressions.
• The value of a switching expression can be
(very intuitively) obtained by substituting the
values of all involved variables $x_1, \dots, x_n$, which have
$2^n$ possible combinations in total.
43
Switching Functions
• A switching function is a mapping from
the value combinations of its $n$ variables to $\{0, 1\}$,
that is, $f: \{0, 1\}^n \to \{0, 1\}$.
• [Exercise] How many possible $n$-variable
switching functions are there? $2^{2^n}$.
• NOT, AND, and OR functions can also be
applied on switching functions, like Boolean
variables.
44
Uniqueness of Truth Table
• Although there might be multiple algebraic
representations of a switching function, a
specific truth table represents only one,
unique switching function.
• Thus, if multiple switching functions (with
different algebraic representations) have
the same truth table, they are essentially
the same switching function.
45
Other Operators in
Switching Algebra
• AND, OR, and NOT can represent any
switching functions, by definition.
• There are also other (somewhat redundant)
operators:
• NAND
• NOR
• XOR (are the inputs different?)
• XNOR (are the inputs the same?)

46
Peculiarities of NAND and NOR
• NAND and NOR operators can apply
commutative property, but not associative
property.
• [Exercise] Verify using DeMorgan’s Theorems.
• However, we can regard NAND as AND-before-
NOT, and NOR as OR-before-NOT, so as to
extend NAND and NOR to 3+ operands:

47
XOR and XNOR
• XOR and XNOR have the commutative
property and the associative property, just like
AND and OR.
• Prove by truth table yourself!
• In the case with more than 2 operands,
• Even number of operands: XOR and XNOR
are complements.
• Odd number of operands: XOR and XNOR
are the same.
• [Discussion] Why?
48
Some Definitions
• Bitwise AND: &
• Bitwise OR: |
• Bitwise NOT:
• Bitwise XOR: ( in C++)
• Equivalence:
• Unsigned shift: (with zero-padding)
• Signed shift: (with sign-bit padding)

49
Zero vs. Sign Padding
• Left shifts
• Shifting an integer (w/ NBC) left, the
rightmost bits obviously need to fill with
bit-0s. Why?
• Left shift should have the semantic of
“multiplied by 2.”
• Example: , .

50
Zero vs. Sign Padding (Cont’d)
• Right shifts
• For unsigned integers (w/ NBC), right shifts should have the
semantic “(integral) divided by 2.”
• , Should it be , or ?
• , while . The latter is obviously wrong. So, right shifts should have
the MSBs padded with bit-0s for all the time.
• For signed integers (w/ NBC and 2’s complement), likewise,
right shifts should have the semantic “(integral) divided by 2.”
• , Should it be , or ?
• , which is wrong. , which is correct. So, right shifts should have the
MSBs padded with the sign bits.
• [Exercise] Verify the positive signed integer case yourself!
51
Review: 1’s & 2’s Complements

[Figure: two's-complement number representation]
Basic Operations I
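• Turn the rightmost bit-1 off: x & (x − 1)
• Suppose x = ??…?10…0 (one bit-1 followed by zero or more trailing bit-0s),
then x − 1 = ??…?01…1, so x & (x − 1) = ??…?00…0.
• In the other case, x = 0, then x & (x − 1) = 0 (0 ANDed with anything is 0).
• Application: Use x & (x − 1) to determine whether an integer
is a power of 2 (note that x = 0 also passes this test):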
bool isPowerOf2( unsigned int x ) {
return !(x&(x-1)); // 3 cycles
}
53
Basic Operations I (Cont’d)
• Turn the rightmost bit-0 on: x | (x + 1)
• Turn a continuous chunk of bit-1s at the end of a
number off: x & (x + 1)
• For example, if x = 10100111, then x + 1 = 10101000, and x & (x + 1) = 10100000.
• If there are no 1s at the end, the result is x unchanged.
• Turn a continuous chunk of bit-0s at the end of a
number on: x | (x − 1)
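The four operations on this slide and the previous one can be sketched as C functions. The formulas are the standard ones from the reference book; the function names are hypothetical, and 32-bit unsigned words are assumed.

```c
#include <stdint.h>

/* Hypothetical names; each function is a single AND/OR with x +/- 1. */
static uint32_t clear_lowest_one(uint32_t x)    { return x & (x - 1); } /* 01011000 -> 01010000 */
static uint32_t set_lowest_zero(uint32_t x)     { return x | (x + 1); } /* 10100111 -> 10101111 */
static uint32_t clear_trailing_ones(uint32_t x) { return x & (x + 1); } /* 10100111 -> 10100000 */
static uint32_t set_trailing_zeros(uint32_t x)  { return x | (x - 1); } /* 10101000 -> 10101111 */
```

Unsigned arithmetic is used on purpose: `x - 1` and `x + 1` wrap around without undefined behavior at 0 and at the all-ones word.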

54
Basic Operations II
• ~x & (x + 1): sets a 1 at the position of the first bit-0 met when
scanning from the right, and clears all other bits to 0
• Example: 10100111 → 00001000
• Discussion: Why?
• The parenthesis is not needed:
• : ________ (exercise)

55
Basic Operations II (Cont’d)
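• ~x & (x − 1) or ~(x | −x) or (x & −x) − 1
• All trailing 0s become 1s and every other bit becomes
0; if x has no trailing 0s, the output is all 0s (and if x = 0, the output is all 1s).
• Why? x − 1 turns ??…?100…0 into ??…?011…1,
while NOT x is ?’?’…?’011…1.
• The ??…? part ANDed with its inverse becomes 0.
• The rightmost bit-1 becomes bit-0 after inversion,
so this position also outputs 0.
• Only the lowest bit-0s become 1 both after inversion and after subtracting
1, so the AND outputs 1 there.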
56
Basic Operations II (Cont’d)
• : ________ (exercise)
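• x & (−x): isolate the rightmost bit-1, leaving all other
bits being 0.
• Example: 01011000 → 00001000
• Why? The trailing 10…0 becomes 01…1 after NOT and becomes 10…0 again after adding 1
(−x = ~x + 1), while all higher bits stay inverted. So only the rightmost 1 is preserved,
becoming the only 1 in the whole number.
• [Exercise] What is the output if x = 0?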
• : ________ (exercise)
• : ________ (exercise)
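A short C sketch of the isolating operations discussed on these slides, assuming the formulas x & −x, ~x & (x − 1), and ~x & (x + 1). The function names are hypothetical, and the negation is written as `0u - x` so that it stays in well-defined unsigned arithmetic.

```c
#include <stdint.h>

/* Keep only the rightmost bit-1: 01011000 -> 00001000 (0 stays 0). */
static uint32_t isolate_lowest_one(uint32_t x) { return x & (0u - x); }

/* Mask of the trailing bit-0s: 01011000 -> 00000111. */
static uint32_t trailing_zero_mask(uint32_t x) { return ~x & (x - 1); }

/* Single 1 at the position of the rightmost bit-0: 10100111 -> 00001000. */
static uint32_t lowest_zero_bit(uint32_t x) { return ~x & (x + 1); }
```

Note the boundary case: `trailing_zero_mask(0)` yields the all-ones word, because every bit of 0 is a trailing zero.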
57
Lemmas from DeMorgan’s Theorems
• ! can be applied to each bit of a number, thus:

• (Why?)


(property of 2’s complement)
• and
(XOR’s definition)
• and (Why?)
58
Why Does ?
• By 2’s complement’s definition, is the
representation of .
• By 2’s complement’s definition, .
• So we know that .
• Moving the “” from LHS to RHS yields .

*LHS = left-hand side; RHS=right-hand side. 59


Why Does ?
• By 2’s complement’s definition, is the
representation of .
• So .

*LHS = left-hand side; RHS=right-hand side. 60


Parallizability Theorem
• A mapping from one bit stream (“source”) to
another (“result”) can be computed in parallel iff
each bit in the result can be obtained by just the
bits at the corresponding position or any of the
lower bits in the source.

Sources

Result …
61
More on Addition/Subtraction
• (from 2’s complement’s definition)


Works for number


computation in real
machines, not switching
algebra…

62
Average of Integers
• The average of integers x and y (rounded down) is
(x & y) + ((x ^ y) >> 1)
• The average of integers x and y (rounded up) is
(x | y) − ((x ^ y) >> 1)
• These formulas take only 4 cycles, and never
trigger overflows! Why do they work?
• Of course, if the carry bit can be pushed back
automatically by the CPU, (x + y) >> 1 is even more efficient!
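As a sketch, the two overflow-free averages can be written in C as below, assuming 32-bit unsigned operands (the function names are hypothetical). The point of the exercise is that `(x + y) / 2` would wrap around for large inputs, while these forms never form the full sum.

```c
#include <stdint.h>

/* floor((x + y) / 2): shared bits count fully, differing bits count half. */
static uint32_t avg_floor(uint32_t x, uint32_t y) {
    return (x & y) + ((x ^ y) >> 1);
}

/* ceil((x + y) / 2): start from all set bits and subtract half the differing ones. */
static uint32_t avg_ceil(uint32_t x, uint32_t y) {
    return (x | y) - ((x ^ y) >> 1);
}
```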

63
Average of Integers (Cont’d)
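• Why does (x & y) + ((x ^ y) >> 1) make sense?
• Consider the corresponding bits x_i and y_i of x and y.
• If both are 1 (x_i & y_i = 1), the corresponding bit of the average is 1.
• If they differ (x_i ^ y_i = 1), the average of this bit position does not affect the corresponding bit;
it only affects the bit one position lower, so we shift right by one bit.
• Why does the other formula work?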

64
Rounding Manners
• Rounding for integer computation can be
done in four manners:
• Rounding down removes anything less than
1. This is the default behavior.
• Rounding up treats anything less than 1 as
1.
• Rounding toward 0. This is called
“truncated average.”
• Rounding toward ±∞ (away from 0)
65
Truncated Average
(Average Rounded toward 0)

• A truncated average is the average


rounded toward 0, despite the average to be
positive or negative.
• A truncated average of ints and can be
computed as follows:
// Assume that and are signed ints

66
Truncated Average (Cont’d)
• The basic idea of the previous algorithm is to compute
the round-down average and fix the LSB bit for the
negative case, in which round-down round-toward-0.

只有和的最低位會有效…

Extract the sign Conditionally


process actions w/o
bit from . using branching. GJ!
• [HW] Use C or C++ to implement the following
function for truncated average:
int truncAvg( int x, int y );
67
Sign Function
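• The sign function of int x is defined as:
sign(x) = −1, if x < 0
           0, if x = 0
           1, if x > 0
• The sign function of int x can be obtained by (x >>s 31) | (−x >>u 31),
where >>s is the signed shift and >>u is the unsigned shift.
(Assumed 32-bit integers.)
• [Discussion] Why does this work?
• If x > 0, x >>s 31 = 0, and −x >>u 31 = 1. So the result is 1.
• Conversely, if x < 0, x >>s 31 = −1 (all bits 1), and −x >>u 31 = 0. So the result is −1.
• If x = 0, every term is 0. So the result is 0.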
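A hedged C sketch of the shift-based sign function follows. It assumes 32-bit integers and, importantly, that `>>` on a negative signed value is an arithmetic (sign-propagating) shift, which the C standard leaves implementation-defined. The function names are hypothetical; a comparison-based variant is included for contrast.

```c
#include <stdint.h>

/* Assumption: x >> 31 on a negative int32_t yields -1 (arithmetic shift). */
static int sign_shift(int32_t x) {
    uint32_t neg = 0u - (uint32_t)x;   /* -x, computed without signed overflow */
    return (int)((x >> 31) | (int32_t)(neg >> 31));
}

/* Portable alternative built from comparison predicates. */
static int sign_cmp(int32_t x) {
    return (x > 0) - (x < 0);
}
```

Computing the negation in unsigned arithmetic keeps the most negative value well defined, where a plain `-x` would overflow.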
Sign Function (Cont’d)
• On some CPUs or programming languages, there
is no support for signed shifts .  cannot be used.
• We can use the followings instead:
• (4 cycles)
• (3 cycles)
• (3 cycles)
• How about ?
• This works except for . Why? Because there is no
corresponding representation for .
69
Three-valued Compare Function
cmp(x, y) = −1, if x < y
             0, if x = y
             1, if x > y
• The three-valued compare function can also be
implemented by the sign function. But how?
• If comparison predicates are available, we can
use:
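One common way to build the three-valued compare from comparison predicates is sketched below (the name `cmp3` is hypothetical). In C, each comparison evaluates to 0 or 1, so their difference is −1, 0, or 1.

```c
/* Three-valued compare: -1 if x < y, 0 if x == y, 1 if x > y. */
static int cmp3(int x, int y) {
    return (x > y) - (x < y);
}
```

Unlike taking the sign of `x - y`, this form cannot overflow when x and y have opposite signs and large magnitudes.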

70
Exchanging Registers w/o Temporary Storage

• To exchange x and y, we can use t ← x, x ← y, then y ← t, where t is
a temporary variable.
• To avoid using temporary variables, we use x ← x ^ y, y ← y ^ x, x ← x ^ y.
• [Discussion] Why does this work?
• [Exercise] Does this work? x ← x + y, y ← x − y, x ← x − y. No, due to
potential overflow.
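The XOR exchange can be sketched in C as follows (the function name is hypothetical). One caveat that is easy to miss: if both pointers refer to the same location, the first XOR zeroes the value, so the sketch guards against that case.

```c
#include <stdint.h>

/* Exchange *a and *b without a temporary variable. */
static void xor_swap(uint32_t *a, uint32_t *b) {
    if (a == b) return;   /* x ^ x == 0 would destroy the value */
    *a ^= *b;             /* a = a0 ^ b0 */
    *b ^= *a;             /* b = b0 ^ a0 ^ b0 = a0 */
    *a ^= *b;             /* a = a0 ^ b0 ^ a0 = b0 */
}
```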

71
Adjust to Known Power of 2
• Suppose we want to adjust the value of
unsigned int x down to a multiple of a known
power of 2, say, 8 (= 2^3).
• Use x & −8 or (x >> 3) << 3 for this.
• Why? −8 = 11…1000 (there are 3 zeros at the LSBs).
• How about adjusting the value up?
• Use (x + 7) & −8 or x + (−x & 7)
for this.
Adjust to Known Power of 2 (Cont’d)
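• How to adjust a signed int x to the nearest multiple of the power of 2
toward 0?
• Combine the above two formulas:
t = (x >> 31) & 7; // signed shift
return (x + t) & -8;

If x ≥ 0, t = 0. So the result is x & −8 (rounded down).
Otherwise, t = 7, and the result is (x + 7) & −8 (rounded up).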
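The three adjustments can be sketched in C for the power of 2 equal to 8. The function names are hypothetical; the signed variant assumes 32-bit two's-complement integers and an arithmetic right shift for negative values (implementation-defined in C).

```c
#include <stdint.h>

/* ~7u has the same bit pattern as -8: all ones except the low three bits. */
static uint32_t round_down8(uint32_t x) { return x & ~7u; }
static uint32_t round_up8(uint32_t x)   { return (x + 7u) & ~7u; }

/* Round a signed value to a multiple of 8, toward 0.
   Assumption: x >> 31 is an arithmetic shift (0 or -1). */
static int32_t round_toward_zero8(int32_t x) {
    int32_t t = (x >> 31) & 7;   /* 0 for x >= 0, 7 for x < 0 */
    return (x + t) & -8;
}
```

`round_up8` wraps to 0 for inputs within 7 of the largest unsigned value, so callers near that limit need a separate check.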

73
flp2 and clp2
• Define flp2() and clp2() as the function that
adjust down/up to the nearest power of 2:

74
flp2 Implementations
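• Branched implementations of flp2:
→ Method 1: let y start from the largest possible value and shift it
right (unsigned, i.e., divide by 2) until it is no greater than x.
  y = 0x80000000;
  while(y>x){
    y>>=1;
  }
  return y;
→ Method 2: turn off the lowest bit-1 of x one at a time until only one
bit-1 (i.e., the highest bit-1) is left.
  do{
    y = x;
    x = x&(x-1);
  }
  while(x!=0);
  return y;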
75
flp2 Implementations (Cont’d)
• A pseudo-branchless implementation of flp2:
 Method 3

76
flp2 Implementations (Cont’d)
• Why does method 3 work?

Propagate the highest


bit-1 in downward
for one bit.
Considering the
highest bit only, we
have 00…01??...?
After Line 3 we have
00…011?...?
77
flp2 Implementations (Cont’d)

Before Line 4 we have


00…011???...?
After Line 4 we have
00…01111?...?
So, after Lines 4–8…

Aha!


78
flp2 Implementations (Cont’d)
• We may also compose a proof like this:
• Because there are only | but not &, a bit-1 in may never be
changed to bit-0. So we consider bit-0s in .
• In Line 3, a bit-0 can remain if its adjacent higher bit is also bit-
0. In other words, there are two consecutive bit-0s “??...?
00??...?” in .
• Likewise, in Lines 4, except for that there are 3 consecutive bit-
0s immediately before a bit , becomes 1, and vice versa.
• So, in Line 7, all 32 bits must be 0 to prevent the LSB from
becoming 1. This means that, if in the beginning, all bits lower
than the highest bit-1 in are set after Line 7.
• Line 8 then clears all but the highest bit-1s to make the result a
power of 2. 79
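The smear-then-clear procedure analyzed above can be sketched in C as follows. This follows the well-known formulation from the reference book and may differ in layout from the listing on the original slide; 32-bit unsigned words are assumed.

```c
#include <stdint.h>

/* Round x down to the nearest power of 2 (flp2(0) == 0). */
static uint32_t flp2(uint32_t x) {
    x = x | (x >> 1);    /* copy the highest bit-1 one position down */
    x = x | (x >> 2);    /* now the top 4 positions below it are covered */
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);   /* every bit below the highest bit-1 is set */
    return x - (x >> 1); /* clear all but the highest bit-1 */
}
```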
clp2 Implementations
• A pseudo-branchless implementation of clp2:
unsigned clp2(unsigned x) {
x = x - 1;
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
return x + 1;
}
• [Exercise] Why is it correct?
80
flp2 ↔ clp2
• We may compute flp2 or clp2 from the
other, which is faster:
• clp2() = flp2(), if
= flp2(), if .
• flp2() = clp2(), if
= clp2(), if .
• [Remark] We may compute flp2 and clp2
from nlz (number of leading zeros)
operations; how? 81
Boundary Check
• In array implementations, we often need to check
if a ≤ x ≤ b, where a and b are the legal boundaries of the array.
• Consider the case int myArr[10]. Assume that
the legal range of subscripts is [1, 10].
• We can check whether x − 1 ≤u 9 to check if x is a legal
subscript for myArr. (‘≤u’ treats the operands as
unsigned ints and performs the comparison.)
• In general, if a ≤ b, the two checks a ≤ x ≤ b and x − a ≤u b − a are
equivalent. Why?
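A minimal C sketch of the single-comparison range check, assuming a ≤ b (the function name is hypothetical). All subtractions are done in unsigned arithmetic, so a value below a wraps around to a huge number and fails the test.

```c
/* Returns 1 if a <= x <= b, using one unsigned comparison. Requires a <= b. */
static int in_range(int x, int a, int b) {
    return (unsigned)x - (unsigned)a <= (unsigned)b - (unsigned)a;
}
```

For the `myArr` example with legal subscripts [1, 10], the check is `in_range(i, 1, 10)`, which reduces to `(unsigned)(i - 1) <= 9`.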
82
Count the Number of Bit-1s
• We want to calculate the number of bit-1s in a 32-
bit integer x. This is useful for bitmaps, a common
data structure in memory or storage management.
• We can use divide-and-conquer to partition the
problem into smaller scales:
x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
x = (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF);

• This is called the pop operation (population count).
Implementations of pop
• The code:
x = (x&0x55555555) + ((x>>1)&0x55555555);
x = (x&0x33333333) + ((x>>2)&0x33333333);
x = (x&0x0F0F0F0F) + ((x>>4)&0x0F0F0F0F);
x = (x&0x00FF00FF) + ((x>>8)&0x00FF00FF);
x = (x&0x0000FFFF) + ((x>>16)&0x0000FFFF);
can be optimized as:
int pop(unsigned x) {
  x = x - ((x>>1)&0x55555555);
  x = (x&0x33333333)+((x>>2)&0x33333333);
  x = (x+(x>>4)) & 0x0F0F0F0F;
  x = x + (x>>8);
  x = x + (x>>16);
  return x & 0x0000003F;
}

• [Discussion] Why?
84
Implementations of pop (Cont’d)
• Based on our old trick x & (x − 1), we have the 2nd version
of pop, as follows:
int pop(unsigned int x){
int n=0;
while(x!=0){
n++;
x=x&(x-1);
}
return n;
}
85
Table Lookup Version of pop
int pop(unsigned x) {
static char table[256] = {
0, 1, 1, 2, 1, 2, 2, 3,
1, 2, 2, 3, 2, 3, 3, 4,
...
4, 5, 5, 6, 5, 6, 6, 7,
5, 6, 6, 7, 6, 7, 7, 8};
return table[x & 0xFF] +
table[(x >> 8) & 0xFF] +
table[(x >> 16) & 0xFF] +
table[(x >> 24)];
}
86
Application of pop
• pop can be used to compute the Hamming
distance of two integers x and y:
pop(x ^ y).
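As a sketch, the Hamming distance can be built directly on the optimized population count shown earlier. The names `pop32` and `hamming` are hypothetical, and 32-bit words are assumed.

```c
#include <stdint.h>

/* Population count, following the optimized divide-and-conquer version. */
static int pop32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    x = x + (x >> 8);
    x = x + (x >> 16);
    return (int)(x & 0x3Fu);
}

/* Number of bit positions in which x and y differ. */
static int hamming(uint32_t x, uint32_t y) {
    return pop32(x ^ y);   /* XOR marks exactly the differing positions */
}
```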

87
Compare pop of Two Integers
• We can compare the population counts of two
integers x and y without really computing pop(x)
and pop(y), based on x & (x − 1):
int popCmpr(unsigned xp, unsigned yp) {
unsigned x, y;
x = xp & ~yp; // Clear bits where
y = yp & ~xp; // both are 1.
while (1) {
if (x == 0) return y | -y;
if (y == 0) return 1;
x = x & (x - 1); // Clear a bit
y = y & (y - 1); // from each.
}}
88
Parity of an Integer
• The parity of an integer x is defined as whether
the number of bit-1s in x is even or odd; if there is
an even number of bit-1s, the parity is 0, else it is 1.
• Of course we can compute pop(x) and get the
LSB of the result. However, this is slow. The
following would be faster:
y = x^(x>>1);
y = y^(y>>2);
y = y^(y>>4);
y = y^(y>>8);
y = y^(y>>16);
89
Parity of an Integer (Cont’d)
• Line 1 computes the parity of every two adjacent
bits and stores it in the 2nd (lower) bit of each pair.
• Likewise, Line 2 computes the parity of every four
adjacent bits in the 4th (lowest) bit of each group, and so on.
• After Line 5, the LSB (i.e., the 32nd bit) of the 32-bit
integer y contains the parity of the whole number.
• This can be used for parity check (XOR).
y = x^(x>>1);
y = y^(y>>2);
y = y^(y>>4);
y = y^(y>>8);
y = y^(y>>16);
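Wrapped into a function, the folding steps look like the sketch below (the name `parity32` is hypothetical). Each step XORs the running value `y` with a shifted copy of itself, and only the least significant bit of the final result is meaningful.

```c
#include <stdint.h>

/* Returns 1 if x has an odd number of bit-1s, 0 otherwise. */
static unsigned parity32(uint32_t x) {
    uint32_t y = x ^ (x >> 1);  /* parity of each adjacent pair */
    y ^= y >> 2;                /* parity of each group of 4 */
    y ^= y >> 4;
    y ^= y >> 8;
    y ^= y >> 16;               /* LSB now holds the parity of all 32 bits */
    return y & 1u;
}
```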
90
Count Leading Bit-0s
• Brute-force, pseudo-branchless method:
if(!x) return 32;
n=0;
if(x<=0x0000FFFF) {n+=16; x<<=16;}
if(x<=0x00FFFFFF) {n+=8; x<<=8;}
if(x<=0x0FFFFFFF) {n+=4; x<<=4;}
if(x<=0x3FFFFFFF) {n+=2; x<<=2;}
if(x<=0x7FFFFFFF) {n+=1; x<<=1;} // the final x<<=1 is unnecessary
return n;
Count Leading Bit-0s (Cont’d)
• We can perform pseudo-branchless computation
for this w/o using large constants:
int nlz(unsigned int x){
int n;
if(x==0) return 32;
n=0;
if((x>>16)==0) {n+=16; x<<=16;}
if((x>>24)==0) {n+=8; x<<=8;}
if((x>>28)==0) {n+=4; x<<=4;}
if((x>>30)==0) {n+=2; x<<=2;}
if((x>>31)==0) {n+=1; x<<=1;} // the final x<<=1 is unnecessary
return n;
}
92
Count Leading Bit-0s (Cont’d)
• We can perform pseudo-branchless computation
for this:
int nlz(unsigned int x){
int n;
if(x==0) return 32;
n=1;
if((x>>16)==0) {n+=16; x<<=16;}
if((x>>24)==0) {n+=8; x<<=8;}
if((x>>28)==0) {n+=4; x<<=4;}
if((x>>30)==0) {n+=2; x<<=2;}
n=n-(x>>31);
return n;
}
93
Count Leading Bit-0s (Cont’d)
• Reversed version of the previous algorithm:
int nlz(unsigned int x){
unsigned int y;
int n;
n=32;
y=x>>16; if(y!=0){n-=16; x=y;}
y=x>>8; if(y!=0){n-=8; x=y;}
y=x>>4; if(y!=0){n-=4; x=y;}
y=x>>2; if(y!=0){n-=2; x=y;}
y=x>>1; if(y!=0){n-=1; x=y;}
return n-x;
} No longer to compute
then here;
94
Count Leading Bit-0s (Cont’d)
• Some “optimization”:
int nlz(unsigned int x){
unsigned int y;
int n;
n=32;
y=x>>16; if(y!=0){n-=16; x=y;}
y=x>>8; if(y!=0){n-=8; x=y;}
y=x>>4; if(y!=0){n-=4; x=y;}
y=x>>2; if(y!=0){n-=2; x=y;}
y=x>>1; if(y!=0) return n-2;
return n-x;
}
• [Exercise] Can you rewrite the function with iterations like for,
while, or do…while? 95
Count Leading Bit-0s (Cont’d)
• Clearer but slower version with loop:
int nlz(unsigned x) {
unsigned y;
int n, c;
n = 32;
c = 16;
do {
y = x >> c; if (y != 0) {
n = n - c; x = y;
}
c = c >> 1;
} while (c != 0);
return n - x;
}
96
Count Leading Bit-0s (Cont’d)
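• Tabular method: a tradeoff between time and space!
static char t[256]={
  0,
  1,
  2,2,
  3,3,3,3,
  4,4,4,4,4,4,4,4,
  8,8,…,8}; // t[b] = number of significant bits in byte b; middle entries elided as on the slide

int nlz(unsigned int x){
  int n;
  if(!x) return 32;
  n=8;
  if((x>>16)==0) {n+=16; x<<=16;}
  if((x>>24)==0) {n+=8; x<<=8;}
  return n-t[x>>24]; // index the table with the top byte only
}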
97
Count Leading Bit-0s (Cont’d)
• Use pop(x) to implement nlz(x):

int nlz(unsigned int x){
  x = x | (x>>1);
  x = x | (x>>2);
  x = x | (x>>4);
  x = x | (x>>8);
  x = x | (x>>16);
  return pop(~x); // ~x is the bitwise NOT (1’s complement) of x
}

98
Compare Leading Bit-0s
• We can compare the number of leading
zeros of two integers without actually
computing them:
• nlz(x) = nlz(y), iff (x ^ y) ≤u (x & y)
• nlz(x) < nlz(y), iff (x & ~y) >u y
• nlz(x) ≤ nlz(y), iff (y & ~x) ≤u x

99
Relation with lg

• nlz(x) can be used to compute the
“integer” log:
• $\lfloor \log_2 x \rfloor = 31 - \mathrm{nlz}(x)$
• $\lceil \log_2 x \rceil = 32 - \mathrm{nlz}(x - 1)$

• [Discussion] Why?
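The relation between nlz and the integer logarithm can be sketched in C as below. The names `nlz32`, `floor_log2`, and `ceil_log2` are hypothetical; `nlz32` follows the loop version shown earlier, and both logarithm functions assume x > 0.

```c
#include <stdint.h>

/* Number of leading zeros of a 32-bit word (nlz32(0) == 32). */
static int nlz32(uint32_t x) {
    int n = 32;
    for (int c = 16; c != 0; c >>= 1) {
        uint32_t y = x >> c;
        if (y != 0) { n -= c; x = y; }  /* the highest bit-1 is in the upper half */
    }
    return n - (int)x;                   /* here x is 0 or 1 */
}

/* Both functions assume x > 0. */
static int floor_log2(uint32_t x) { return 31 - nlz32(x); }
static int ceil_log2(uint32_t x)  { return 32 - nlz32(x - 1); }
```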
100
Count Trailing Bit-0s
• Method 1: Use nlz: 32-nlz(~x & (x-1))
• ~x & (x-1): all trailing 0s become 1s, and every other bit is cleared to 0
• Method 2: Use pop: pop(~x & (x-1))
• Method 3: a reversed version of the earlier nlz method:
int ntz( unsigned int x ){
int n=1;
if(x==0) return 32;
if((x&0x0000FFFF) == 0){ n+=16; x>>=16; }
if((x&0x000000FF) == 0){ n+=8; x>>=8; }
if((x&0x0000000F) == 0){ n+=4; x>>=4; }
if((x&0x00000003) == 0){ n+=2; x>>=2; }
return n-(x&1);
}
101
Count Trailing Bit-0s (Cont’d)
• Method 4: this method uses brute force. It enumerates all possible
numbers of trailing zeros by ANDing with small constants, and handles
each case separately.
int ntz( char x ){
  if(x&15){
    if(x&3){
      if(x&1) return 0;
      else return 1;
    }
    else if(x&4) return 2;
    else return 3;
  }
  else if(x&0x30){
    if(x&0x10) return 4;
    else return 5;
  }
  else if(x&0x40) return 6;
  else if(x) return 7;
  else return 8;
}
102
Gaudet’s Algorithm

int ntz(unsigned x) {
  unsigned y, bz, b4, b3, b2, b1, b0;
  y = x & -x; // Isolate rightmost 1-bit.
  bz = y ? 0 : 1; // 1 if y = 0.
  // The five tests below are independent and can run in parallel!
  b4 = (y & 0x0000FFFF) ? 0 : 16;
  b3 = (y & 0x00FF00FF) ? 0 : 8;
  b2 = (y & 0x0F0F0F0F) ? 0 : 4;
  b1 = (y & 0x33333333) ? 0 : 2;
  b0 = (y & 0x55555555) ? 0 : 1;
  return bz + b4 + b3 + b2 + b1 + b0;
}

103
Seal’s Algorithm

#define u 99 // A ‘u’ means an unused entry (99 is a placeholder value).
int ntz(unsigned x) {
  static char table[64] = {
    32, 0, 1,12, 2, 6, u,13, 3, u, 7, u, u, u, u,14,
    10, 4, u, u, 8, u, u,25, u, u, u, u, u,21,27,15,
    31,11, 5, u, u, u, u, u, 9, u, u,24, u, u,20,26,
    30, u, u, u, u,23, u,19, 29, u,22,18,28,17,16,u };
  x = (x & -x)*0x0450FBAF;
  return table[x >> 26];
}

104
Seal’s Algorithm (Cont’d)
• 0x0450FBAF is a magic number because
0x0450FBAF = 17 × 65 × 65535.
• x × (2^k + 1) can be obtained by (x << k) + x.
• x × (2^k − 1) can be obtained by (x << k) − x.
• So x = (x & -x)*0x0450FBAF; can be computed
in 9 instructions, including the assignment.
• With this table size (64 entries), 0x0450FBAF is
the best choice (because multiplying by it requires
the fewest cycles).
105
Dubé’s Algorithm
int ntz(unsigned x) {
static char table[32] =
{ 0, 1, 2, 24, 3, 19, 6, 25,
22, 4, 20, 10, 16, 7, 12, 26,
31, 23, 18, 5, 21, 9, 15, 11,
30, 17, 8, 14, 29, 13, 28, 27};
if (x == 0) return 32;
x = (x & -x)*0x04D7651F;
return table[x >> 27];
}
106
Dubé’s Algorithm (Cont’d)
• Why choose 0x04D7651F?
• 0x04D7651F = (2047 × 5 × 256 + 1) × 31
• Dubé’s algorithm is slower but uses a smaller
table.

107
Find Leading All-zero Bytes
int zbytel( unsigned int x ){
  if((x>>24)==0) return 0;
  if((x & 0x00FF0000)==0) return 1;
  if((x & 0x0000FF00)==0) return 2;
  if((x & 0x000000FF)==0) return 3;
  else return 4;
}
• [Discussion] Why does this work?
108
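Find the First Substring with
n+ Consecutive Bit-1s
int ffstrl( unsigned x, int n ){
  int k, p=0 /*position*/;
  while(x!=0){
    k=nlz(x); // skip over leading 0s (if any)
    x<<=k;
    p+=k;
    k=nlz(~x); // count the next group of 1s
    if(k>=n) return p;
    x<<=k; // not enough 1s: skip over them
    p+=k;
  }
  return 32; // no such substring
}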
• [Discussion] Why does this work? 109
Find the First Substring with
+
Consecutive Bit-1s (Cont’d)
int ffstrl( unsigned int x, int n ){
int s;
while(n>1){
s=n>>1;
x=x&(x<<s);
n=n-s;
}
return nlz(x);
}
• [Discussion] Why does this work? 110
Find the First Substring with
Longest Consecutive Bit-1s
int maxstrl( unsigned int x ){
int k;
for(k=0; x!=0; k++) x=x&(x<<1);
return k;
}

• [Discussion] Why does this work?

111
Switching Between Two Values
• Suppose that the variable x could be only one of the two
values a and b. It is common to write
if(x==a){x=b;} else{x=a;}
or
x=(x==a)?b:a;
• This would be better by reducing the number of branch
instructions:
x=a+b-x; or
x=a^b^x;
• [Discussion] Why do these work?
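Both branch-free toggles can be sketched in C as below (the function names are hypothetical). They assume x currently equals a or b; for any other x the result is meaningless. The additive form can overflow for large signed values, which is one reason to prefer the XOR form.

```c
/* Precondition for both: x == a or x == b. */

/* a + b - x: subtracting the current value leaves the other one. */
static int toggle_add(int x, int a, int b) { return a + b - x; }

/* a ^ b ^ x: XOR-ing out the current value leaves the other one. */
static int toggle_xor(int x, int a, int b) { return a ^ b ^ x; }
```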
112
Functional Complete
• A set of logical connectives or Boolean
operators is called functionally complete, iff all
possible truth tables can be obtained by
combining members of the set.
• For example, {AND, NOT}, {OR, NOT},
{AND, OR, NOT}, {NAND}, {NOR} are all
functionally complete sets.
• {NAND} and {NOR} are called singleton sets because they
contain only one member.
113
Functional Complete (Cont’d)
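• We can use the members of a set $S$ to
“implement” a functionally complete set $T$ to
prove that $S$ is also functionally complete.
• For example,
• {AND &, NOT ’}. $a + b = (a' \cdot b')'$. QED.
• {NAND $\uparrow$}. $a' = a \uparrow a$. $a \cdot b = (a \uparrow b) \uparrow (a \uparrow b)$. QED.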
• [Exercise] Prove that NOR is functional
complete.
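The NAND construction can be checked by perfect induction in a few lines of C. This is a sketch with hypothetical function names, operating on 0/1 values only; it builds NOT, AND, and OR from NAND alone and compares them against the built-in operators.

```c
/* All inputs are assumed to be 0 or 1. */
static int nand_(int a, int b) { return !(a & b); }

static int not_from_nand(int a)        { return nand_(a, a); }
static int and_from_nand(int a, int b) { return nand_(nand_(a, b), nand_(a, b)); }
static int or_from_nand(int a, int b)  { return nand_(nand_(a, a), nand_(b, b)); }

/* Returns 1 if the three constructions match NOT, AND, OR on every input. */
static int nand_basis_ok(void) {
    for (int a = 0; a <= 1; a++) {
        if (not_from_nand(a) != !a) return 0;
        for (int b = 0; b <= 1; b++) {
            if (and_from_nand(a, b) != (a & b)) return 0;
            if (or_from_nand(a, b) != (a | b)) return 0;
        }
    }
    return 1;
}
```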
114
Thank You!
Any Questions?
233717 Microcomputer Principles and Applications, Lecture 5:
Multiply and Divide

Po-Chun Huang ( 黃柏鈞 )


Department of Electronic Engineering,
National Taipei University of Technology
Multiply by Constants
• Multiplication by constants is easy.
• For example, multiplying integer x by 7 can be
done like this: 7x = (x << 3) − x.
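The shift-and-subtract idea can be sketched in C as follows. The function names are hypothetical, and the multiply-by-10 variant is an extra illustration of the same decomposition, not something taken from the slide. Unsigned arithmetic is used so that overflow wraps modulo 2^32 instead of being undefined.

```c
#include <stdint.h>

/* 7x = 8x - x */
static uint32_t mul7(uint32_t x)  { return (x << 3) - x; }

/* 10x = 8x + 2x (illustrative second example) */
static uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); }
```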

117
Unsigned Multiplication
void mulmns(unsigned short w[], unsigned short u[],
            unsigned short v[], int m, int n) {
  unsigned int k, t;
  int i, j;
  for (i = 0; i < m; i++) { w[i] = 0; }
  for (j = 0; j < n; j++) {
    k = 0; // Carry into the next 16-bit digit.
    for (i = 0; i < m; i++) {
      // The cast avoids signed overflow from integer promotion.
      t = (unsigned int)u[i] * v[j] + w[i+j] + k;
      w[i + j] = t; // (I.e., t & 0xFFFF).
      k = t >> 16;
    }
    w[j + m] = k;
  }
}
118
Signed Multiplication
• Signed multiplication can be done
(inefficiently) by performing negation and
multiplication. After the multiplication we
just restore the sign.

119
3 types of Signed Division

Round toward 0 (truncating) · Non-negative remainder (modulus division) · Round down (floor)

Classical definition
of integer division.

120
Theorems on Floor &
Ceiling Functions

121
Conversion between
Ceiling and Floor

122
Cancelling Nested
Ceiling and Floor

123
“Error” Theorems
• We’ll use this for quick division later.

and

124
Long and Short Divisions
• Short division:
• Long division:

125
Basic Concepts for
Multiply and Division

• Multiplying or dividing a number by a
power of 2, say 2^k, is generally simple; just
shift left or right by k bits,
respectively.
• How do we perform a divide operation
by a number that is not a power of 2?

126
Division by 3
• Division by 3 can be done like this:
li M,0x55555556 Load magic number, (2**32+2)/3.
mulhs q,M,n q = floor(M*n/2**32).
shri t,n,31 Add 1 to q if
add q,q,t n is negative.
muli t,q,3 Compute remainder from
sub r,n,t r = n - q*3.

• Why?

• By we know that does


not affect the result of division.
127
Division by 5
• Division by 5 can be done like this:
li M,0x66666667 Load magic number, (2**33+3)/5.
mulhs q,M,n q = floor(M*n/2**32).
shrsi q,q,1
shri t,n,31 Add 1 to q if
add q,q,t n is negative.
muli t,q,5 Compute remainder from
sub r,n,t r = n - q*5.
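The two assembly sequences translate to C roughly as sketched below. The helper `mulhs` (high 32 bits of the signed 64-bit product) mirrors the instruction of the same name; the other function names are hypothetical. The sketch assumes 32-bit two's-complement integers and an arithmetic right shift for negative values, which C leaves implementation-defined.

```c
#include <stdint.h>

/* High 32 bits of the signed product a*b, i.e. floor(a*b / 2**32). */
static int32_t mulhs(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Signed division by 3, truncating toward 0. */
static int32_t div3(int32_t n) {
    int32_t q = mulhs(0x55555556, n);        /* magic number (2**32 + 2) / 3 */
    return q + (int32_t)((uint32_t)n >> 31); /* add 1 if n is negative */
}

/* Signed division by 5, truncating toward 0. */
static int32_t div5(int32_t n) {
    int32_t q = mulhs(0x66666667, n) >> 1;   /* magic number (2**33 + 3) / 5 */
    return q + (int32_t)((uint32_t)n >> 31); /* add 1 if n is negative */
}
```

The remainder follows as in the assembly listing: `r = n - q * 3` or `r = n - q * 5`.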

128
Thank You!
Any Questions?