Logic Hacks for the Optimization of Terminal Devices
Po-Chun Huang ( 黃柏鈞 )
Department of Electronic Engineering,
National Taipei University of Technology
Disclaimer
• The copyright and intellectual properties of all materials, except
for those produced by the lecturer (Po-Chun Huang), belong to
the original authors of the sources.
• A majority of the materials in these slides are revised from Small
Memory Software: Patterns for Systems with Limited Memory by
Charles Weir and James Noble, Addison-Wesley Professional,
2000.
2
Agenda
• Basic operations
• Intermediate arithmetic
• Multiplication & division
• Next steps
3
Reference Book
• Henry S. Warren Jr., Hacker's Delight, 2nd
Edition, Addison-Wesley, 2012.
4
What’s Horrible?
• How can we do the following in the fewest steps in
C or assembly language?
• Double or halve a number.
• Round up an integer to the next power of 2.
• Round down an integer to the previous power of 2.
• Compute the mean of two integers, rounded down.
• Compute the mean of two integers, rounded up.
• We will cover these tomorrow.
5
We’ve seen how numbers are
represented in a modern fashion.
How about texts?
6
Alphanumeric Codes
• Extended Binary Coded Decimal Interchange Code (EBCDIC)
• Used by early IBM mainframe computers.
• Drawback: English letters in EBCDIC are not consecutive in
numerical values.
• American Standard Code for Information Interchange (ASCII)
• Current standard for encoding characters.
• Limitation: ASCII only supports English letters and basic punctuation
marks. There is no support for Chinese characters.
• Unicode
• Latest standard that supports almost all languages.
• Issue: Many existing programming libraries still do not provide
native support for Unicode…
ASCII Code
https://cdn.shopify.com/s/files/1/1014/5789/files/Standard-ASCII-Table_large.jpg?10669400161723642407 8
Some Details
• ASCII code covers printable characters and
non-printable characters.
• Printable characters are visible on screen, such as
‘a’ to ‘z’, ‘A’ to ‘Z’, and ‘0’ to ‘9’.
• Non-printable characters, a.k.a., control characters,
have special purposes, e.g.,
• Carriage return (CR) and line feed (LF) for line endings
• Bell (BEL) for triggering the beeper
• Acknowledge (ACK) and synchronous idle (SYN) for
network communication
• Null (NUL) for string endings
• Horizontal tab (HT) and vertical tab (VT) for positioning
9
Design Ideas of ASCII
• Digits ‘0’–‘9’ are arranged from 48 to 57.
A digit character can be converted to its numerical
value by subtracting 48 from the char value.
• ‘A’ to ‘Z’ are also contiguous from 65 to 90, while ‘a’ to
‘z’ are from 97 to 122.
The capital of a lower-case letter can be
obtained by subtracting 32, and vice versa by adding 32.
• [Discussion] Can we do even better? Yes! We can do
“ANI 11011111” (clear bit 5) and “ORI 00100000” (set bit 5),
respectively.
• These are no slower than subtraction or addition, and they leave a
letter that is already in the target case unchanged.
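The conversions described on this slide can be sketched in C as follows. This is only a minimal sketch that assumes an ASCII execution character set; the helper names digit_value, to_upper_ascii, and to_lower_ascii are illustrative and do not come from the slides.

```c
#include <assert.h>

/* Assumes ASCII: '0' is 48, 'A' is 65, 'a' is 97. */

/* Numeric value of a digit character, e.g. '7' -> 7. */
int digit_value(char c) {
    assert(c >= '0' && c <= '9');   /* precondition: c is a digit */
    return c - 48;
}

/* Clear bit 5 (0x20): maps 'a'..'z' to 'A'..'Z' and leaves 'A'..'Z' alone.
   Only meaningful when c is a letter. */
char to_upper_ascii(char c) {
    return (char)(c & 0xDF);        /* 0xDF = 11011111 in binary */
}

/* Set bit 5 (0x20): maps 'A'..'Z' to 'a'..'z' and leaves 'a'..'z' alone.
   Only meaningful when c is a letter. */
char to_lower_ascii(char c) {
    return (char)(c | 0x20);        /* 0x20 = 00100000 in binary */
}
```

Because the mask operations are idempotent on letters, the caller does not need to test the current case first, which is what a subtract-32 version would require.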
10
Design Ideas of ASCII (Cont’d)
• Why doesn’t ASCII use escape sequences to reduce the code
length of each character?
• Escape sequences are not reliable. Why?
• Consider a string with one character smudged on paper or cut from
paper. We may still recognize most of the contents.
Some redundancy must exist, but where?
• JPEG and some sound formats are resistant to minor corruptions. What
are the “corruptions” here?
• [Research problem] How about our data structures in main
memory?
• [Research problem] Consider more sophisticated file formats, such
as .odt or .mp3. How can we make them “green” (consuming less energy
when in use)?
Design Tradeoff for Environments
• Why was EBCDIC still used even after ASCII was developed?
• Punch cards become physically vulnerable when there are too many holes on the
same row or column!
• [Discussion] Can you suggest a solution, or at least some characteristics of a viable
solution?
• Thus, besides reducing the code length, there is more to think about
in real-world applications!
• See http://opass.logdown.com/posts/1300438-the-story-of-auto-beverage-
machine-23 for an inspiring example!
A punch card
http://faculty.washington.edu/rjl/uwamath583s11/sphinx/notes/html/_images/punch-card.png 12
How Large Are the Books in ASCII?
• Marcel Proust, À la recherche du temps perdu
(Remembrance of Things Past, 追憶逝水年華 ) is
about 7.7 MB.
• F. Scott Fitzgerald, The Great Gatsby ( 大亨小傳 )
is about 300 KB.
• Margaret Mitchell, Gone with the Wind ( 飄 , 或譯
亂世佳人 ) is about 2.5 MB.
• We are able to create more information than ever!
What a pity that most of the information we create is
garbage…
13
Characteristics of Unicode
• Universality
• Efficiency
• Characters, not glyphs
• Semantics
• Plain text
14
Characteristics of Unicode (Cont’d)
• Logical order
• Unification
• Dynamic composition
• Stability
• Convertibility
15
Thank You!
Any Questions?
231958 Digital Logic Design, Lecture 3:
Boolean Algebra &
Switching Algebra
18
Huntington’s Axioms (Cont’d)
19
Observations
• Each of the above axioms is independent of
and consistent with the others.
• So, we can neither prove nor disprove any of them from the rest.
• [Discussion] Can we remove any axiom
from the system? Do we need to add
anything, such as the associative laws?
20
Operator Precedence
• Operators have order in computation:
• ’
Complement
21
Duality Principle
• Suppose that a statement $S$ holds (is true); then its dual $S^D$
is defined as $S$ with all 0s and 1s exchanged,
and with all $+$ and $\cdot$ exchanged.
• Duality principle: $S$ holds iff $S^D$ holds.
(iff: if and only if)
22
Idempotent Laws
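• For any $a \in B$, we have $a + a = a$ and $a \cdot a = a$.
• Proof:
• $a + a = (a + a) \cdot 1 = (a + a) \cdot (a + a') = a + a \cdot a' = a + 0 = a$, quod erat demonstrandum (QED).
• $a \cdot a = a$ can be proven by the duality principle.
• [Exercise] Prove $a \cdot a = a$ from scratch!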
23
Boundedness Theorems
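• For each $a \in B$, $a + 1 = 1$ and $a \cdot 0 = 0$.
0 and 1 are the lower and
upper “bounds” in this
system.
• Proof:
• $a + 1 = (a + 1) \cdot 1 = (a + 1) \cdot (a + a') = a + 1 \cdot a' = a + a' = 1$. QED.
• From the duality principle we also have $a \cdot 0 = 0$.
• [Discussion] Prove $a \cdot 0 = 0$ from scratch!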
24
Uniqueness of Complements
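• In Boolean algebra, the complement of an element
is unique. That is, suppose that both $b$ and $c$ are
complements of $a$, then $b = c$.
• Proof.
• Suppose that both $b$ and $c$ are complements of $a$, then
we have $a + b = 1$, $a \cdot b = 0$, $a + c = 1$, and $a \cdot c = 0$.
• $b = b \cdot 1 = b \cdot (a + c) = b \cdot a + b \cdot c = 0 + b \cdot c = b \cdot c$.
• $c = c \cdot 1 = c \cdot (a + b) = c \cdot a + c \cdot b = 0 + c \cdot b = b \cdot c$.
• So $b = c$. The complement of any $a$ is unique. QED.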
25
Complementation
• In Boolean algebra, $0' = 1$ and $1' = 0$.
• Proof.
• $0 + 1 = 1$ and $0 \cdot 1 = 0$, so 1 is a complement of 0, and vice versa. QED.
26
Involution
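• For all $a \in B$, $(a')' = a$.
• Proof.
• From above, we know that $a' + a = 1$, $a' \cdot a = 0$, $a' + (a')' = 1$, and $a' \cdot (a')' = 0$.
• So, both $a$ and $(a')'$ are complements of $a'$.
• Because the complement of an element is
unique, we have $(a')' = a$. QED.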
27
Associativity Laws
29
Associativity Laws (Cont’d)
•Proof. .
• We have several possible proofs.
• By duality principle.
• By analogous argument as in the
previous page.
• By normal, “textbook” proofs on our
textbook.
• Try yourself!
30
DeMorgan’s Theorems
• For any $a, b \in B$, we have $(a + b)' = a' \cdot b'$ and $(a \cdot b)' = a' + b'$.
31
DeMorgan’s Theorems (Cont’d)
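• Proof of $(a + b)' = a' \cdot b'$.
• We start by confirming that $a' \cdot b'$ is a
complement of $a + b$: $(a + b) + a' \cdot b' = (a + b + a') \cdot (a + b + b') = 1 \cdot 1 = 1$, and $(a + b) \cdot (a' \cdot b') = a \cdot a' \cdot b' + b \cdot a' \cdot b' = 0 + 0 = 0$.
• So $a' \cdot b'$ is a complement of $a + b$.
• Since the complement is unique, we have $(a + b)' = a' \cdot b'$.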
32
DeMorgan’s Theorems (Cont’d)
• Proof of $(a \cdot b)' = a' + b'$.
• Try yourself!
• DeMorgan’s Theorems can be used on over
2 operands, such as: $(a + b + c)' = a' \cdot b' \cdot c'$
33
Absorption Rules
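• For any $a, b \in B$, we have $a + a \cdot b = a$ and $a \cdot (a + b) = a$.
• Why? $a + a \cdot b = a \cdot 1 + a \cdot b = a \cdot (1 + b) = a \cdot 1 = a$.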
36
Some Exercises (Cont’d)
• Why? .
37
Switching Algebra
• Switching algebra is a Boolean algebra
system defined over the minimal set $\{0, 1\}$.
• The operator $+$ is named “OR” ($a + b = 1$ iff $a = 1$ or $b = 1$)
• The operator $\cdot$ is named “AND” ($a \cdot b = 1$ iff $a = 1$ and $b = 1$)
• The operator $'$ is named “NOT” ($a' = 1$ iff $a = 0$)
38
Switching Algebra (Cont’d)
• Due to the limited value range of $\{0, 1\}$, a way of
proving switching algebra statements is to use a
truth table to exhaustively list all possible
combinations of values.
• Example: Prove $a + a \cdot b = a$.
$a$  $b$  $a \cdot b$  $a + a \cdot b$  $a$
0  0  0  0  0
0  1  0  0  0
1  0  0  1  1
1  1  1  1  1
39
Proving by Perfect Induction
(i.e., the Truth Table)
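• Prove $a \cdot b' + b = a + b$.
• Method 1: algebraic proof. $a \cdot b' + b = (a + b) \cdot (b' + b) = (a + b) \cdot 1 = a + b$.
• Method 2: perfect induction.
$a$  $b$  $b'$  $a \cdot b'$  $a \cdot b' + b$  $a + b$
0  0  1  0  0  0
0  1  0  0  1  1
1  0  1  1  1  1
1  1  0  0  1  1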
40
Cancelling the Cancellation Law
• In classical arithmetic, we have the cancellation
law: For any , we have . Unfortunately, this does
not hold on switching algebra: Letting and , we
have . But obviously, cannot be cancelled out.
• For switching algebra we have the revised
cancellation law: If and , then can be cancelled
out, and we obtain .
41
Proving the Cancellation Law
• Prove that if and , then .
• . QED.
• Prove that if and , then .
• [Exercise] Your turn!
• Duality principle also applies.
42
Switching Expressions
• A switching expression is:
1. A switching variable or the constant 0 or 1, or
2. Letting $E_1$ and $E_2$ be any switching expressions, then $E_1 + E_2$, $E_1 \cdot E_2$,
or $E_1'$ are also switching expressions.
3. All other combinations are not switching
expressions.
• The value of a switching expression can be
(very intuitively) obtained by substituting the
values of all involved variables $x_1, \ldots, x_n$, which have
$2^n$ possible combinations in total.
43
Switching Functions
• A switching function is a mapping from
the value combinations of its $n$ variables to $\{0, 1\}$,
that is, $f: \{0, 1\}^n \to \{0, 1\}$.
• [Exercise] How many possible $n$-variable
switching functions are there? $2^{2^n}$.
• NOT, AND, and OR can also be
applied to switching functions, like Boolean
variables.
44
Uniqueness of Truth Table
• Although there might be multiple algebraic
representations of a switching function, a
specific truth table represents only one,
unique switching function.
• Thus, if multiple switching functions (with
different algebraic representations) have
the same truth table, they are essentially
the same switching function.
45
Other Operators in
Switching Algebra
• AND, OR, and NOT can represent any
switching functions, by definition.
• There are also other (somewhat redundant)
operators:
• NAND
• NOR
• XOR ( 是否不同 ?)
• XNOR ( 是否相同 ?)
46
Peculiarities of NAND and NOR
• NAND and NOR operators can apply
commutative property, but not associative
property.
• [Exercise] Verify using DeMorgan’s Theorems.
• However, we can regard NAND as AND-before-
NOT, and NOR as OR-before-NOT, so as to
extend NAND and NOR to 3+ operands:
47
XOR and XNOR
• XOR and XNOR have the commutative
property and the associative property, just like
AND and OR.
• Prove by truth table yourself!
• In the case with over 2 operands,
• Even number of operands: XOR and XNOR
are complements.
• Odd number of operands: XOR and XNOR
are the same.
• [Discussion] Why?
48
Some Definitions
• Bitwise AND: $\&$
• Bitwise OR: $\mid$
• Bitwise NOT: $\neg$ (~ in C++)
• Bitwise XOR: $\oplus$ (^ in C++)
• Equivalence: $\equiv$
• Unsigned shift: $\stackrel{u}{\gg}$ (with zero-padding)
• Signed shift: $\stackrel{s}{\gg}$ (with sign-bit padding)
49
Zero vs. Sign Padding
• Left shifts
• Shifting an integer (w/ NBC) left, the
rightmost bits obviously need to be filled with
bit-0s. Why?
• Left shift should have the semantics of
“multiplied by 2.”
• Example: , .
50
Zero vs. Sign Padding (Cont’d)
• Right shifts
• For unsigned integers (w/ NBC), right shifts should have the
semantic “(integral) divided by 2.”
• , Should it be , or ?
• , while . The latter is obviously wrong. So, right shifts should have
the MSBs padded with bit-0s for all the time.
• For signed integers (w/ NBC and 2’s complement), likewise,
right shifts should have the semantic “(integral) divided by 2.”
• , Should it be , or ?
• , which is wrong. , which is correct. So, right shifts should have the
MSBs padded with the sign bits.
• [Exercise] Verify the positive signed integer case yourself!
51
Review: 1’s & 2’s Complements
https://www3.ntu.edu.sg/home/ehchua/programming/java/images/DataRep_TwoComplement.png 52
Basic Operations I
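• Turn the rightmost bit-1 off: $x \mathbin{\&} (x - 1)$
• Suppose $x$ = ??…?10…0 (one bit-1 followed by zero or more trailing bit-0s),
then $x - 1$ = ??…?01…1, so $x \mathbin{\&} (x - 1)$ = ??…?00…0.
• In the other case, $x = 0$, then $x \mathbin{\&} (x - 1) = 0$ (0 ANDed with anything is 0).
• Application: Use $x \mathbin{\&} (x - 1)$ to determine whether an integer
is a power of 2 (the extra test excludes $x = 0$, which also yields 0):
bool isPowerOf2( unsigned int x ) {
return x && !(x&(x-1)); // true iff x has exactly one bit-1
}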
53
Basic Operations I (Cont’d)
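• Turn the rightmost bit-0 on: $x \mid (x + 1)$
• Turn a continuous chunk of bit-1s at the end of a
number off: $x \mathbin{\&} (x + 1)$
• For example, if $x$ = 10100111, then $x + 1$ = 10101000, and $x \mathbin{\&} (x + 1)$ = 10100000.
• If there are no 1s at the end, the result is $x$ itself. (The result is 0 only if $x$ is 0 or of the form $2^n - 1$.)
• Turn a continuous chunk of bit-0s at the end of a
number on: $x \mid (x - 1)$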
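The four basic operations of these two slides can be sketched in C as below. The formulas are the standard ones from the reference book (Hacker's Delight); the function names are illustrative assumptions, and 32-bit unsigned arithmetic is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Turn the rightmost bit-1 off:        01011000 -> 01010000 */
uint32_t clear_rightmost_one(uint32_t x) { return x & (x - 1); }

/* Turn the rightmost bit-0 on:         10100111 -> 10101111 */
uint32_t set_rightmost_zero(uint32_t x)  { return x | (x + 1); }

/* Turn the trailing run of bit-1s off: 10100111 -> 10100000 */
uint32_t clear_trailing_ones(uint32_t x) { return x & (x + 1); }

/* Turn the trailing run of bit-0s on:  10101000 -> 10101111 */
uint32_t set_trailing_zeros(uint32_t x)  { return x | (x - 1); }
```

For instance, assert(clear_trailing_ones(0xA7u) == 0xA0u) holds, and an input without trailing 1s such as 0xA8 comes back unchanged.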
54
Basic Operations II
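• $\neg x \mathbin{\&} (x + 1)$: sets the position of the rightmost bit-0
to 1 and clears all other bits to 0
• Example: 10100111 ⇒ 00001000
• Discussion: Why?
• The parenthesis is not needed in C, since + binds tighter than &: ~x & x + 1
• $\neg x \mid (x - 1)$: ________ (exercise)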
55
Basic Operations II (Cont’d)
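• $\neg x \mathbin{\&} (x - 1)$, or $\neg(x \mid -x)$, or $(x \mathbin{\&} -x) - 1$
• All trailing 0s become 1s and all other bits become
0. If $x$ has no trailing 0s, the output is all 0s; if $x = 0$, the output is all 1s.
• Why? $x - 1$ turns ??...?100…0 into ??...?011…1,
while $\neg x$ is ?’?’…?’011…1.
• The ??...? part is ANDed with its own complement, which gives 0.
• The rightmost bit-1 becomes bit-0 in $\neg x$,
so this position also outputs 0.
• Only the trailing bit-0s become 1 both after complementing and after subtracting 1,
so the AND outputs 1 there.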
56
Basic Operations II (Cont’d)
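• $\neg x \mid (x + 1)$: ________ (exercise)
• $x \mathbin{\&} (-x)$: isolate the rightmost bit-1, leaving all other
bits 0.
• $-x = \neg x + 1$
• Why? The trailing 10…0 of $x$ becomes 01…1 after NOT, and 10…0 again after adding 1,
while all higher bits stay complemented. So only the rightmost 1 survives the AND and becomes
the only 1 in the whole number.
• [Exercise] What is the output if $x = 0$?
• $x \oplus (x - 1)$: ________ (exercise)
• $x \oplus (x + 1)$: ________ (exercise)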
57
Lemmas from DeMorgan’s Theorems
• ! can be applied to each bit of a number, thus:
• (Why?)
•
(property of 2’s complement)
• and
(XOR’s definition)
• and (Why?)
58
Why Does ?
• By 2’s complement’s definition, is the
representation of .
• By 2’s complement’s definition, .
• So we know that .
• Moving the “” from LHS to RHS yields .
Result …
61
More on Addition/Subtraction
• (from 2’s complement’s definition)
62
Average of Integers
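• The average of unsigned integers $x$ and $y$ (rounded down) is $(x \mathbin{\&} y) + ((x \oplus y) \stackrel{u}{\gg} 1)$, and the average rounded up is $(x \mid y) - ((x \oplus y) \stackrel{u}{\gg} 1)$. Neither formula overflows.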
63
Average of Integers (Cont’d)
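• Why does $(x \mathbin{\&} y) + ((x \oplus y) \stackrel{u}{\gg} 1)$ make sense?
• Consider the corresponding bits $x_i$, $y_i$ of $x$, $y$.
• If $x_i = y_i$, the average of this bit pair is $x_i \mathbin{\&} y_i$, which stays at bit position $i$.
• If $x_i \ne y_i$, the average of this bit pair (one half) does not affect bit $i$ of the result;
it only contributes to the next lower bit, so $x \oplus y$ is shifted right by one.
• Why does the other formula work?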
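A minimal C sketch of the two overflow-free averages follows. It assumes 32-bit unsigned operands and uses the well-known formulas from the reference book; the names avg_floor and avg_ceil are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* floor((x + y) / 2): bits shared by x and y count fully,
   bits that differ count half, hence the shift by one. */
uint32_t avg_floor(uint32_t x, uint32_t y) {
    return (x & y) + ((x ^ y) >> 1);
}

/* ceil((x + y) / 2): start from all bits set in either operand
   and subtract half of the bits that differ. */
uint32_t avg_ceil(uint32_t x, uint32_t y) {
    return (x | y) - ((x ^ y) >> 1);
}
```

Unlike the naive (x + y) / 2, neither function wraps around when x + y exceeds the 32-bit range, e.g. avg_floor(0xFFFFFFFF, 0xFFFFFFFD) gives 0xFFFFFFFE.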
64
Rounding Manners
• Rounding for integer computation can be
done in four manners:
• Rounding down removes any fractional part (anything less than
1). This is the default behavior.
• Rounding up treats any nonzero fractional part as
1.
• Rounding toward 0. The resulting average is called the
“truncated average.”
• Rounding toward $\pm\infty$ (away from 0)
65
Truncated Average
(Average Rounded toward 0)
66
Truncated Average (Cont’d)
• The basic idea of the previous algorithm is to compute
the round-down average and fix the LSB for the
negative case, in which round-down $\ne$ round-toward-0.
Only the lowest bit takes effect in the correction…
70
Exchanging Registers w/o Temporary Storage
71
Adjust to Known Power of 2
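• Suppose we want to adjust the value of
unsigned int $x$ down to a multiple of a known
power of 2, say, 8 ($= 2^3$).
• Use $x \mathbin{\&} -8$ or $(x \stackrel{u}{\gg} 3) \ll 3$ for this.
• Why? $-8 = 11\ldots1000_2$ (there are 3 zeros at the LSBs).
• How about adjusting the value up?
• Use $(x + 7) \mathbin{\&} -8$ or $x + (-x \mathbin{\&} 7)$
for this.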
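As a hedged illustration, the sketch below rounds an unsigned value down or up to a multiple of 8. The value 8 is only an assumed example of a "known power of 2", and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round down to a multiple of 8: clear the three lowest bits.
   ~7u has the same bit pattern as -8 in 2's complement. */
uint32_t round_down_8(uint32_t x) { return x & ~7u; }

/* Round up to a multiple of 8: add 7, then round down.
   Wraps around to 0 when x > 0xFFFFFFF8. */
uint32_t round_up_8(uint32_t x) { return (x + 7u) & ~7u; }

/* Alternative round-up: add the distance to the next multiple,
   which is (-x) & 7 in unsigned (modular) arithmetic. */
uint32_t round_up_8_alt(uint32_t x) { return x + ((0u - x) & 7u); }
```

For example, 13 rounds down to 8 and up to 16, while 16 is left unchanged by all three functions.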
Adjust to Known Power of 2 (Cont’d)
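• How to adjust a signed integer to the nearest multiple of a known power of 2
toward 0?
• Combine the above two formulas:
$t = (x \stackrel{s}{\gg} 31) \mathbin{\&} 7$;
return $(x + t) \mathbin{\&} -8$;
If $x \ge 0$, $t = 0$. So the result is $x \mathbin{\&} -8$ (adjusted down).
Otherwise, $t = 7$, and the result is $(x + 7) \mathbin{\&} -8$ (adjusted up).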
73
flp2 and clp2
• Define flp2($x$) and clp2($x$) as the functions that
adjust $x$ down/up to the nearest power of 2:
74
flp2 Implementations
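• Branched implementations of flp2:
Method 1
y = 0x80000000;
while(y>x){
y>>=1;
}
return y;
Let y start from the largest possible value and keep shifting it right (unsigned, i.e., /2) until it is no greater than x.
Method 2
do{
y = x;
x = x&(x-1);
}
while(x!=0);
return y;
Turn off the lowest bit-1 of x one at a time until only one bit-1 is left (i.e., the highest bit-1).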
75
flp2 Implementations (Cont’d)
• A pseudo-branchless implementation of flp2:
Method 3
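The code for Method 3 did not survive in these notes. A sketch of the usual branch-free formulation from the reference book (Hacker's Delight) is given below; treat it as an assumption about what the slide showed, with 32-bit unsigned integers assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a power of 2 (flp2(0) == 0).
   The OR steps smear the highest bit-1 into every lower position;
   the last step keeps only that highest bit. */
uint32_t flp2(uint32_t x) {
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);
    return x - (x >> 1);
}
```

For example, flp2(5) yields 4 and flp2(0xFFFFFFFF) yields 0x80000000.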
76
flp2 Implementations (Cont’d)
• Why does method 3 work?
Aha!
78
flp2 Implementations (Cont’d)
• We may also compose a proof like this:
• Because there are only | but no &, a bit-1 in $x$ can never be
changed to bit-0. So we only consider the bit-0s in $x$.
• In Line 3, a bit-0 remains only if its adjacent higher bit is also bit-0.
In other words, there are two consecutive bit-0s “??...?00??...?” in $x$.
• Likewise, in Line 4, a bit stays 0 only if the 3 bits immediately above it
are also bit-0s; otherwise it becomes 1.
• So, in Line 7, all 32 bits must be 0 to prevent the LSB from
becoming 1. This means that, if $x \ne 0$ in the beginning, all bits lower
than the highest bit-1 in $x$ are set after Line 7.
• Line 8 then clears all but the highest bit-1 to make the result a
power of 2.
clp2 Implementations
• A pseudo-branchless implementation of clp2:
unsigned clp2(unsigned x) {
x = x - 1;
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
return x + 1;
}
• [Exercise] Why is it correct?
80
flp2 ↔ clp2
• We may compute flp2 or clp2 from the
other, which is faster:
• clp2() = flp2(), if
= flp2(), if .
• flp2() = clp2(), if
= clp2(), if .
• [Remark] We may compute flp2 and clp2
from nlz (number of leading zeros)
operations; how? 81
Boundary Check
• In array implementations, we often need to check
if $a \le x \le b$, where $a, b$ are the legal bounds of the array subscript.
• Consider the case int myArr[10]. Assume that
the legal range of subscripts is [1, 10].
• We can check whether $x - 1 \stackrel{u}{\le} 9$ to check if $x$ is a legal
subscript for myArr. (‘$\stackrel{u}{\le}$’ treats the operands as
unsigned ints and performs the comparison.)
• In general, if $a \le b$, the two checks $a \le x \le b$ and $x - a \stackrel{u}{\le} b - a$ are
equivalent. Why?
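The single-comparison bounds check can be sketched in C as follows. This is a minimal sketch assuming 32-bit integers; the name in_range is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 iff a <= x <= b, assuming a <= b.
   Subtracting a maps the legal range onto [0, b - a]; any x below a
   wraps around to a huge unsigned value and fails the comparison. */
int in_range(int32_t x, int32_t a, int32_t b) {
    assert(a <= b);   /* precondition from the slide */
    return ((uint32_t)x - (uint32_t)a) <= ((uint32_t)b - (uint32_t)a);
}
```

With the subscript range [1, 10] from the slide, in_range(x, 1, 10) is the test (unsigned)(x - 1) <= 9, which replaces two signed comparisons with one unsigned comparison.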
82
Count the Number of Bit-1s
• We want to calculate the number of bit-1s in a 32-
bit integer $x$. This is useful for bitmaps, a common
data structure in memory or storage management.
• We can use divide-and-conquer to partition the
problem into smaller scales:
x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
x = (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF);
• [Discussion] Why?
84
Implementations of pop (Cont’d)
• Based on our old trick $x \mathbin{\&} (x - 1)$, we have the 2nd version
of pop, as follows:
int pop(unsigned int x){
int n=0;
while(x!=0){
n++;
x=x&(x-1);
}
return n;
}
85
Table Lookup Version of pop
int pop(unsigned x) {
static char table[256] = {
0, 1, 1, 2, 1, 2, 2, 3,
1, 2, 2, 3, 2, 3, 3, 4,
...
4, 5, 5, 6, 5, 6, 6, 7,
5, 6, 6, 7, 6, 7, 7, 8};
return table[x & 0xFF] +
table[(x >> 8) & 0xFF] +
table[(x >> 16) & 0xFF] +
table[(x >> 24)];
}
86
Application of pop
• pop can be used to compute the Hamming
distance of two integers $x, y$:
pop($x \oplus y$).
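A short, hedged C sketch of the Hamming distance via pop is shown below. It reuses the loop-based pop from the earlier slide; the name hamming is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Population count: clear the rightmost bit-1 until none is left. */
int pop(uint32_t x) {
    int n = 0;
    while (x != 0) {
        n++;
        x = x & (x - 1);
    }
    return n;
}

/* Hamming distance: XOR marks the differing bit positions,
   and pop counts them. */
int hamming(uint32_t x, uint32_t y) {
    return pop(x ^ y);
}
```

For example, 5 (101) and 6 (110) differ in two positions, so hamming(5, 6) is 2.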
87
Compare pop of Two Integers
• We can compare the population counts of two
integers $x$ and $y$ without really computing pop($x$)
and pop($y$), based on clearing the common bit-1s and then repeatedly applying $x \mathbin{\&} (x - 1)$:
int popCmpr(unsigned xp, unsigned yp) {
unsigned x, y;
x = xp & ~yp; // Clear bits where
y = yp & ~xp; // both are 1.
while (1) {
if (x == 0) return y | -y;
if (y == 0) return 1;
x = x & (x - 1); // Clear a bit
y = y & (y - 1); // from each.
}}
88
Parity of an Integer
• The parity of an integer $x$ is defined as whether
the number of bit-1s in $x$ is even or odd; if there
is an even number of bit-1s, the parity is 0, else it is 1.
• Of course we can compute pop($x$) and take the
LSB of the result. However, this is slow. The
following would be faster (the parity ends up in the LSB of y):
y = x^(x>>1);
y = y^(y>>2);
y = y^(y>>4);
y = y^(y>>8);
y = y^(y>>16);
89
Parity of an Integer (Cont’d)
• Line 1 leaves in each bit the parity of that bit and its
higher neighbor (two adjacent bits).
• Likewise, Line 2 leaves in each bit the parity of four
adjacent bits, and so on.
• After Line 5, the LSB (i.e., the 32nd bit counted from the left) of the 32-bit
integer contains the parity of the whole number.
• This can be used for parity check (XOR).
y = x^(x>>1);
y = y^(y>>2);
y = y^(y>>4);
y = y^(y>>8);
y = y^(y>>16);
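Wrapped into a function, the XOR-folding idea can be sketched as below. Each step folds the running value y onto itself, so the LSB of y finally holds the XOR of all 32 bits. The function name parity is illustrative, and 32-bit unsigned input is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if x has an odd number of bit-1s, else 0. */
int parity(uint32_t x) {
    uint32_t y = x ^ (x >> 1);   /* each bit: parity of 2 adjacent bits  */
    y = y ^ (y >> 2);            /* each bit: parity of 4 adjacent bits  */
    y = y ^ (y >> 4);            /* ... of 8 adjacent bits               */
    y = y ^ (y >> 8);            /* ... of 16 adjacent bits              */
    y = y ^ (y >> 16);           /* LSB: parity of all 32 bits           */
    return (int)(y & 1u);
}
```

Only the LSB of y is meaningful at the end; the upper bits hold partial parities and are masked off.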
90
Count Leading Bit-0s
• Brute-force, pseudo-branchless method:
if(!x) return 32;
n=0;
if(x<=0x0000FFFF) {n+=16; x<<=16;}
if(x<=0x00FFFFFF) {n+=8; x<<=8;}
if(x<=0x0FFFFFFF) {n+=4; x<<=4;}
if(x<=0x3FFFFFFF) {n+=2; x<<=2;}
if(x<=0x7FFFFFFF) {n+=1; x<<=1;} // the final x<<=1 is unnecessary
return n;
Count Leading Bit-0s (Cont’d)
• We can perform pseudo-branchless computation
for this w/o using large constants:
int nlz(unsigned int x){
int n;
if(x==0) return 32;
n=0;
if((x>>16)==0) {n+=16; x<<=16;}
if((x>>24)==0) {n+=8; x<<=8;}
if((x>>28)==0) {n+=4; x<<=4;}
if((x>>30)==0) {n+=2; x<<=2;}
if((x>>31)==0) {n+=1; x<<=1;} // the final x<<=1 is unnecessary
return n;
}
92
Count Leading Bit-0s (Cont’d)
• We can perform pseudo-branchless computation
for this:
int nlz(unsigned int x){
int n;
if(x==0) return 32;
n=1;
if((x>>16)==0) {n+=16; x<<=16;}
if((x>>24)==0) {n+=8; x<<=8;}
if((x>>28)==0) {n+=4; x<<=4;}
if((x>>30)==0) {n+=2; x<<=2;}
n=n-(x>>31);
return n;
}
93
Count Leading Bit-0s (Cont’d)
• Reversed version of the previous algorithm:
int nlz(unsigned int x){
unsigned int y;
int n;
n=32;
y=x>>16; if(y!=0){n-=16; x=y;}
y=x>>8; if(y!=0){n-=8; x=y;}
y=x>>4; if(y!=0){n-=4; x=y;}
y=x>>2; if(y!=0){n-=2; x=y;}
y=x>>1; if(y!=0){n-=1; x=y;}
return n-x;
} No longer to compute
then here;
94
Count Leading Bit-0s (Cont’d)
• Some “optimization”:
int nlz(unsigned int x){
unsigned int y;
int n;
n=32;
y=x>>16; if(y!=0){n-=16; x=y;}
y=x>>8; if(y!=0){n-=8; x=y;}
y=x>>4; if(y!=0){n-=4; x=y;}
y=x>>2; if(y!=0){n-=2; x=y;}
y=x>>1; if(y!=0) return n-2;
return n-x;
}
• [Exercise] Can you rewrite the function with iterations like for,
while, or do…while? 95
Count Leading Bit-0s (Cont’d)
• Clearer but slower version with loop:
int nlz(unsigned x) {
unsigned y;
int n, c;
n = 32;
c = 16;
do {
y = x >> c; if (y != 0) {
n = n - c; x = y;
}
c = c >> 1;
} while (c != 0);
return n - x;
}
96
Count Leading Bit-0s (Cont’d)
• Tabular method: static char t[256]={
0,
1,
2,2,
3,3,3,3,
4,4,4,4,4,4,4,4,
8,8,…,8};
98
Compare Leading Bit-0s
• We can compare the number of leading
zeros of two integers without actually
computing them:
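• nlz($x$) $=$ nlz($y$), iff $(x \oplus y) \stackrel{u}{\le} (x \mathbin{\&} y)$
• nlz($x$) $<$ nlz($y$), iff $(x \mathbin{\&} \neg y) \stackrel{u}{>} y$
• nlz($x$) $\le$ nlz($y$), iff $(y \mathbin{\&} \neg x) \stackrel{u}{\le} x$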
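The three comparisons can be checked mechanically. The sketch below uses the formulas from the reference book (Hacker's Delight) and compares them against a straightforward reference nlz; the names nlz_ref, nlz_eq, nlz_lt, nlz_le, and check_nlz_compare are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward reference nlz, used only to check the comparisons. */
int nlz_ref(uint32_t x) {
    int n = 0;
    while (n < 32 && (x & 0x80000000u) == 0) { n++; x <<= 1; }
    return n;
}

/* Comparisons of leading-zero counts that never compute nlz. */
int nlz_eq(uint32_t x, uint32_t y) { return (x ^ y) <= (x & y); } /* nlz(x) == nlz(y) */
int nlz_lt(uint32_t x, uint32_t y) { return (x & ~y) > y; }       /* nlz(x) <  nlz(y) */
int nlz_le(uint32_t x, uint32_t y) { return (y & ~x) <= x; }      /* nlz(x) <= nlz(y) */

/* Returns 1 if all three predicates agree with nlz_ref for all x, y < limit. */
int check_nlz_compare(uint32_t limit) {
    for (uint32_t x = 0; x < limit; x++) {
        for (uint32_t y = 0; y < limit; y++) {
            int a = nlz_ref(x), b = nlz_ref(y);
            if (nlz_eq(x, y) != (a == b)) return 0;
            if (nlz_lt(x, y) != (a < b))  return 0;
            if (nlz_le(x, y) != (a <= b)) return 0;
        }
    }
    return 1;
}
```

The intuition is that the position of the highest bit-1 decides every comparison: for example, x & ~y keeps the top bit of x only when y does not have a bit-1 there.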
99
Relation with lg
• [Discussion] Why?
100
Count Trailing Bit-0s
• Method 1: Use nlz: 32-nlz(~x & (x-1))
• ~x & (x-1): all trailing 0s become 1s, and all other bits are cleared to 0
• Method 2: Use pop: pop(~x & (x-1))
• Method 3: the reversed-order version of the previous (nlz) method:
int ntz( unsigned int x ){
int n=1;
if(x==0) return 32;
if((x&0x0000FFFF) == 0){ n+=16; x>>=16; }
if((x&0x000000FF) == 0){ n+=8; x>>=8; }
if((x&0x0000000F) == 0){ n+=4; x>>=4; }
if((x&0x00000003) == 0){ n+=2; x>>=2; }
return n-(x&1);
}
101
Count Trailing Bit-0s (Cont’d)
• Method 4:
int ntz( char x ){
if(x&15){
if(x&3){
if(x&1) return 0;
else return 1;
}
else if(x&4) return 2;
else return 3;
}
else if(x&0x30){
if(x&0x10) return 4;
else return 5;
}
else if(x&0x40) return 6;
else if(x) return 7;
else return 8;
}
This method is brute force: by ANDing with small constants, it enumerates every possible number of trailing zeros and handles each case separately.
102
Gaudet’s Algorithm
int ntz(unsigned x) {
unsigned y, bz, b4, b3, b2, b1, b0;
y = x & -x; // Isolate rightmost 1-bit.
bz = y ? 0 : 1; // 1 if y = 0.
// The five tests below are independent of each other, so they can run in parallel.
b4 = (y & 0x0000FFFF) ? 0 : 16;
b3 = (y & 0x00FF00FF) ? 0 : 8;
b2 = (y & 0x0F0F0F0F) ? 0 : 4;
b1 = (y & 0x33333333) ? 0 : 2;
b0 = (y & 0x55555555) ? 0 : 1;
return bz + b4 + b3 + b2 + b1 + b0;
}
103
Seal’s Algorithm
#define u 99 // A ‘u’ means an unused entry (any value works).
int ntz(unsigned x) {
static char table[64] = {
32, 0, 1,12, 2, 6, u,13, 3, u, 7, u, u, u, u,14,
10, 4, u, u, 8, u, u,25, u, u, u, u, u,21,27,15,
31,11, 5, u, u, u, u, u, 9, u, u,24, u, u,20,26,
30, u, u, u, u,23, u,19, 29, u,22,18,28,17,16,u };
x = (x & -x)*0x0450FBAF;
return table[x >> 26];
}
104
Seal’s Algorithm (Cont’d)
• 0x0450FBAF is a magic number because
0x0450FBAF $= 17 \times 65 \times 65535$.
• $17x$ can be obtained by $(x \ll 4) + x$, and $65x$ by $(x \ll 6) + x$.
• $65535x$ can be obtained by $(x \ll 16) - x$.
• So x = (x & -x)*0x0450FBAF; can be computed
in 9 instructions, including the assignment.
• With this table size (64 entries), 0x0450FBAF is
the best choice (because multiplying it requires
the least cycles).
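The factorization can be turned into a multiplication-free routine. The sketch below is illustrative (the name mul_magic is an assumption) and relies on 32-bit unsigned wraparound, which is also how the multiplication in the algorithm behaves.

```c
#include <assert.h>
#include <stdint.h>

/* Computes x * 0x0450FBAF (mod 2^32) with shifts, adds, and one subtract,
   using 0x0450FBAF = 17 * 65 * 65535. */
uint32_t mul_magic(uint32_t x) {
    x = (x << 4) + x;     /* x * 17    */
    x = (x << 6) + x;     /* x * 65    */
    x = (x << 16) - x;    /* x * 65535 */
    return x;
}
```

Since all steps are exact modulo 2^32, mul_magic(x) equals x * 0x0450FBAFu for every 32-bit x, for instance mul_magic(1) is 0x0450FBAF itself.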
105
Dubé’s Algorithm
int ntz(unsigned x) {
static char table[32] =
{ 0, 1, 2, 24, 3, 19, 6, 25,
22, 4, 20, 10, 16, 7, 12, 26,
31, 23, 18, 5, 21, 9, 15, 11,
30, 17, 8, 14, 29, 13, 28, 27};
if (x == 0) return 32;
x = (x & -x)*0x04D7651F;
return table[x >> 27];
}
106
Dubé’s Algorithm (Cont’d)
• Why choose 0x04D7651F?
• 0x04D7651F $= (2047 \times 5 \times 256 + 1) \times 31$
• Dubé’s algorithm is slower but takes a smaller
table.
107
Find Leading All-zero Bytes
int zbytel( unsigned int x ){
if((x>>24)==0) return 0;
if((x & 0x00FF0000)==0) return 1;
if((x & 0x0000FF00)==0) return 2;
if((x & 0x000000FF)==0) return 3;
else return 4;
}
• [Discussion] Why does this work?
108
Find the First Substring with
$\ge n$ Consecutive Bit-1s
int ffstrl( unsigned x, int n ){
int k, p=0 /*position*/;
while(x!=0){
k=nlz(x); // skip the leading 0s
x<<=k;
p+=k;
k=nlz(~x); // length of the leading run of 1s
if(k>=n) return p;
x<<=k;
p+=k;
}
return 32; // no such substring
}
• [Discussion] Why does this work? 109
Find the First Substring with
$\ge n$ Consecutive Bit-1s (Cont’d)
int ffstrl( unsigned int x, int n ){
int s;
while(n>1){
s=n>>1;
x=x&(x<<s);
n=n-s;
}
return nlz(x);
}
• [Discussion] Why does this work? 110
Find the Length of the
Longest Substring of Consecutive Bit-1s
int maxstrl( unsigned int x ){
int k;
for(k=0; x!=0; k++) x=x&(x<<1);
return k;
}
111
Switching Between Two Values
• Suppose that the variable $x$ could be only one of the two
values $a$ and $b$. It is common to write
if(x==a){x=b;} else{x=a;}
or
x=(x==a)?b:a;
• This can be improved by removing the branch
instructions:
x=a+b-x; or
x=a^b^x;
• [Discussion] Why do these work?
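Both branch-free forms can be sketched in C as follows. Unsigned 32-bit values are an assumption made here so that a + b may wrap around without undefined behavior; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* If x == a the result is b, and if x == b the result is a.
   a + b - x cancels whichever value x currently holds. */
uint32_t toggle_sum(uint32_t x, uint32_t a, uint32_t b) {
    return a + b - x;
}

/* Same idea with XOR: x ^ x cancels to 0, leaving the other value. */
uint32_t toggle_xor(uint32_t x, uint32_t a, uint32_t b) {
    return a ^ b ^ x;
}
```

When a and b are constants, a + b or a ^ b folds into a single constant, so each toggle costs one instruction and no branch.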
112
Functional Completeness
• A set of logical connectives or Boolean
operators is called functionally complete, iff all
possible truth tables can be obtained by
combining members of the set.
• For example, {AND, NOT}, {OR, NOT},
{AND, OR, NOT}, {NAND}, {NOR} are all
functionally complete sets.
• {NAND} and {NOR} are called singleton sets because they
contain only one member.
113
Functional Complete (Cont’d)
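• We can use the members of a set $S$ to
“implement” a functionally complete set $T$ to
prove that $S$ is also functionally complete.
• For example,
• {AND &, NOT ’}. $a + b = (a' \mathbin{\&} b')'$. QED.
• {NAND $\uparrow$}. $a' = a \uparrow a$. $a \mathbin{\&} b = (a \uparrow b) \uparrow (a \uparrow b)$. QED.
• [Exercise] Prove that {NOR} is functionally
complete.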
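The NAND case can be checked by perfect induction in C. The sketch below builds NOT, AND, and OR from a single nand function and compares them with the built-in operators over all inputs; all names are illustrative.

```c
#include <assert.h>

/* The only primitive: NAND on values 0 and 1. */
int nand(int a, int b) { return !(a && b); }

/* NOT, AND, and OR implemented with NAND only. */
int not_n(int a)        { return nand(a, a); }
int and_n(int a, int b) { return nand(nand(a, b), nand(a, b)); }
int or_n(int a, int b)  { return nand(nand(a, a), nand(b, b)); }

/* Perfect induction: returns 1 if the NAND-built operators match
   the built-in ones for every combination of inputs. */
int check_nand_complete(void) {
    for (int a = 0; a <= 1; a++) {
        if (not_n(a) != !a) return 0;
        for (int b = 0; b <= 1; b++) {
            if (and_n(a, b) != (a && b)) return 0;
            if (or_n(a, b)  != (a || b)) return 0;
        }
    }
    return 1;
}
```

Because {AND, OR, NOT} is functionally complete, reproducing these three operators from NAND alone is enough to show that {NAND} is functionally complete as well.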
114
233717 Principles and Applications of Microcomputers, Lecture 5:
Multiply and Divide
117
Unsigned Multiplication
void mulmns(unsigned short w[], unsigned short u[],
unsigned short v[], int m, int n) {
unsigned int k, t, b;
int i, j;
for (i = 0; i < m; i++) { w[i] = 0; }
for (j = 0; j < n; j++) {
k = 0;
for (i = 0; i < m; i++) {
t = (unsigned int)u[i] * v[j] + w[i+j] + k; // cast avoids signed overflow after integer promotion
w[i + j] = t; // (I.e., t & 0xFFFF).
k = t >> 16;
}
w[j + m] = k;
}
}
118
Signed Multiplication
• Signed multiplication can be done
(inefficiently) by performing negation and
multiplication. After the multiplication we
just restore the sign.
119
3 Types of Signed Division
• Round toward 0 (truncating)
• Nonnegative remainder (modulus division)
• Round down (floor)
Classical definition
of integer division.
120
Theorems on Floor &
Ceiling Functions
121
Conversion between
Ceiling and Floor
122
Cancelling Nested
Ceiling and Floor
123
“Error” Theorems
• We’ll use this for quick division later.
and
124
Long and Short Divisions
• Short division:
• Long division:
125
Basic Concepts for
Multiply and Division
126
Division by 3
• Division by 3 can be done like this:
li M,0x55555556 Load magic number, (2**32+2)/3.
mulhs q,M,n q = floor(M*n/2**32).
shri t,n,31 Add 1 to q if
add q,q,t n is negative.
muli t,q,3 Compute remainder from
sub r,n,t r = n - q*3.
• Why?
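The assembly sequence can be mirrored in C as a hedged sketch. It assumes that right-shifting a negative 64-bit value is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers; the names div3 and rem3 are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division by 3 (truncating toward 0) without a divide instruction. */
int32_t div3(int32_t n) {
    const int64_t M = 0x55555556;            /* magic number (2^32 + 2) / 3      */
    int32_t q = (int32_t)((M * n) >> 32);    /* mulhs: q = floor(M * n / 2^32)   */
    q += (int32_t)((uint32_t)n >> 31);       /* shri + add: add 1 if n < 0       */
    return q;
}

/* Remainder recovered from the quotient: r = n - q * 3. */
int32_t rem3(int32_t n) {
    return n - div3(n) * 3;
}
```

The magic number is slightly larger than 2^32 / 3, so the high word of the product is the floor of n / 3 for nonnegative n; for negative n the floor is one too small, which the sign-bit correction fixes.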
128