Floating Point To Fixed Point Conversion
Fixed-Point Design
Fixed-Point Data Types
In digital hardware, numbers are stored in binary words. A binary word is a fixed-length sequence of bits (1's and 0's). How hardware components or software functions interpret this sequence of 1's and 0's is defined by the data type. Binary numbers are represented as either fixed-point or floating-point data types. To implement an algorithm in hardware, for example a communication algorithm, the algorithm is first converted to the fixed-point domain and then described in a Hardware Description Language (HDL). HDL coding requires the size of every variable and register to be stated explicitly, and each register must be large enough to represent the value of its parameter with the desired precision.
A fixed-point data type makes explicit what happens in the hardware. When an algorithm is represented in the floating-point domain, every variable occupies 64 bits (the default double-precision type in MATLAB), so all operations are carried out on wide words. Implementing such wide words directly in hardware is impractical, because a large number of flip-flops requires more silicon area and consumes more power. To solve this problem, the algorithm is converted to the fixed-point domain. In the fixed-point domain a pair (W,F) is assigned to each parameter of the algorithm, where W is the word length and F is the fractional length of the parameter. Larger W and F give better performance and a lower bit error rate (BER), but the design needs a larger silicon area. Smaller W and F, on the other hand, give a higher BER but less area. Suitable values of (W,F) must therefore be chosen for each parameter of the algorithm. To do this, a simulation of the algorithm is run to obtain the dynamic range of the parameters. The simulation results indicate the dynamic range of the variables and the number of bits for W and F needed to represent the variables with the desired precision.
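As a rough illustration of how a recorded dynamic range can be turned into a candidate (W,F) pair, the sketch below applies a common rule of thumb. The sample vector x, the target resolution step, and all variable names are hypothetical, and the rule itself is an assumption rather than the procedure used in this tutorial. Any pair obtained this way should still be checked against the BER simulation.

```matlab
% Hypothetical logged values of one variable from a floating-point simulation
x    = [-9.613 3.421 0.75 -2.3 6.2188];
step = 0.05;                          % assumed: coarsest acceptable resolution

maxAbs  = max(abs(x));                % dynamic range observed in the simulation
intBits = floor(log2(maxAbs)) + 2;    % integer bits including the sign bit (signed format)
F = ceil(-log2(step));                % smallest F with 2^-F <= step
W = intBits + F;                      % candidate word length

fprintf('(W,F) = (%d,%d), range [%g, %g)\n', W, F, -2^(intBits-1), 2^(intBits-1));
```

The rule does not account for rounding pushing a value just past the top of the range, so a margin bit may be needed for variables that come close to a power of two.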
According to the previous section, a fixed-point data type is characterized by the word length in bits, the position of the binary point, and whether it is signed or unsigned. The position of the binary point is the means by which fixed-point values are scaled and interpreted.
For example, a binary representation of a generalized fixed-point number (either signed or unsigned) is shown below:
$b_{wl-1}\; b_{wl-2}\; \cdots\; b_3 \;.\; b_2\; b_1\; b_0$
Where:
$b_i$ is the $i$th binary digit
$wl$ is the word length in bits
$b_{wl-1}$ is the location of the most significant, or highest, bit (MSB)
$b_0$ is the location of the least significant, or lowest, bit (LSB).
The binary point is shown three places to the left of the LSB. In this example, therefore, the number is said to have three fractional bits, or a fraction length of three.
Fixed-point data types can be either signed or unsigned. Signed binary fixed-point numbers are typically represented in one of these ways:
Sign/magnitude
One's complement
Two's complement
Two's complement is the most common representation of signed fixed-point numbers and is the only representation used by Fixed-Point Toolbox in MATLAB.
Fixed-point numbers can be encoded according to the following scheme:
$\text{real-value} = 2^{-\text{fraction length}} \times \text{stored integer}$    (1)
where the stored integer is the raw binary number, in which the binary point is assumed to be at the far right of the word.
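Equation (1) can be tried directly in base MATLAB. The following is a minimal sketch for an unsigned (non-negative) word. The bit pattern and the fraction length are illustrative values, and bin2dec simply reads the word as a raw integer with the binary point at the far right.

```matlab
% Apply equation (1): real-value = 2^(-fraction length) * stored integer
word = '011100111001111';     % raw 15-bit word, binary point assumed at the far right
F    = 12;                    % fraction length (illustrative)

storedInteger = bin2dec(word);              % raw integer value of the word
realValue     = storedInteger * 2^(-F);     % scale by the position of the binary point

fprintf('stored integer = %d, real value = %.4f\n', storedInteger, realValue);
```

A word whose MSB is set would additionally need a two's-complement correction before the scaling, which this sketch leaves out.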
Conversion of an algorithm from the floating-point domain to the fixed-point domain can be done with the MATLAB Fixed-Point Toolbox.
Fixed-Point Toolbox provides fixed-point data types in MATLAB and enables algorithm development by providing fixed-point arithmetic. Fixed-Point Toolbox enables you to create the following types of objects:
fi: Defines a fixed-point numeric object in the MATLAB workspace. Each fi object is composed of value data, a fimath object, and a numerictype object.
fimath: Governs how overloaded arithmetic operators work with fi objects
fipref: Defines the display, logging, and data type override preferences of fi objects
numerictype: Defines the data type and scaling attributes of fi objects
quantizer: Quantizes data sets
Complicated algorithms normally have many variables, so the number of fixed-point objects grows significantly. Moreover, in some cases a long simulation is needed to obtain the BER curves of the algorithm. In such cases, fixed-point simulation with the MATLAB Fixed-Point Toolbox needs a large amount of memory, time, and CPU usage, and in most of these cases it crashes.
To solve this problem, a simple method for floating-point to fixed-point conversion is proposed in this tutorial. Simulation results with this method and with the MATLAB Fixed-Point Toolbox are the same, but simulation with the proposed method is significantly faster. For example, one iteration of a K-Best algorithm simulation takes 237 seconds with the MATLAB Fixed-Point Toolbox but only 36 seconds with the proposed method. For a long simulation, for example 5000 iterations, the MATLAB Fixed-Point Toolbox therefore does not work well.
Floating-point to Fixed-point conversion:
In this part a simple method for floating-point to fixed-point conversion is described. We then consider the various arithmetic operations, give a number of examples for them, and finally compare their results with the results of the MATLAB Fixed-Point Toolbox.
To convert a floating-point value to the corresponding fixed-point value, use the following steps.
Consider a floating-point variable, $a$:
Step 1: Calculate $b = a \times 2^{F}$, where F is the fractional length of the variable. Note that $b$ is represented in decimal.
Step 2: Round the value of $b$ to the nearest integer value. For example:
round(3.56) = 4
round(-1.9) = -2
round(-1.5) = -2
Step 3: Convert $b$ from decimal to binary representation and name the new variable $c$.
Step 4: Assume that $c$ needs n bits to represent the value of $b$ in binary. The values of W and F, on the other hand, are obtained from the simulation, so W should be equal to or larger than n. If a smaller value is chosen for W, $c$ must be truncated. If W is larger than n, (W - n) zero bits are added to the left of $c$ (for a negative value in two's complement, the sign bit is replicated instead).
Now suppose the simulation has been run carefully and suitable values of (W,F) have been obtained, so that W is equal to or larger than n. Then (W - n) zeros are added to the left of $c$, and the F bits of $c$ from position 0 to F-1 are selected as the fractional part of the fixed-point variable. The conversion from floating point to fixed point is thus completed by finding the position of the binary point in $c$. To verify the result, the same conversion can be done with the MATLAB Fixed-Point Toolbox. The results of both methods are the same, but the proposed method is faster, because the MATLAB method calls a large number of fixed-point functions and creates many fixed-point objects, which are time consuming and need a large amount of memory.
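The four steps can be sketched in a few lines of base MATLAB, without any toolbox objects. The value a and the pair (W,F) below are illustrative. The mod call is an addition of this sketch (not part of the steps as stated) so that negative integers map to their two's-complement code, and no overflow check is made, so W is assumed to be large enough.

```matlab
a = 3.013;  W = 8;  F = 3;            % illustrative value and format

b = a * 2^F;                          % Step 1: scale by 2^F
q = round(b);                         % Step 2: round to the nearest integer
u = mod(q, 2^W);                      % map negative integers to their two's-complement code
c = dec2bin(u, W);                    % Steps 3 and 4: binary string padded to W bits
cDot = [c(1:W-F) '.' c(W-F+1:end)];   % mark the binary point for display only

fprintf('%s  ->  real value %g\n', cDot, q * 2^(-F));
```

Because only round, mod, and dec2bin are involved, the same lines can be applied to whole arrays of simulation data, which is where the speed advantage over per-variable fixed-point objects would come from.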
In the following section various examples are given for different arithmetic operations such as addition, subtraction, multiplication, and norm. In each case the operation is done with both methods, and the results are shown to be the same.
Note:
In the following examples Method 1 denotes the MATLAB Fixed-Point Toolbox and Method 2 denotes the above method.
The dot in the binary representation is used to separate the fractional part and the integer part of the variable, but it is not a part of the variable.
Example 1)
This example shows that the value of (W,F) should be chosen carefully from the simulation (according to the dynamic range of the variables).
Example 2)
This example shows the conversion of a floating-point value to a fixed-point value, then finds the corresponding binary value, and finally shows the conversion of the binary value back to the corresponding real value by (1).
Method 1:
fi(3.613,1,15,12) = 3.613, convert to binary with bin(): 011.100111001111, (W,F) = (15,12)
$(011100111001111)_b = (14799)_d$
convert to decimal by (1): $14799 \times 2^{-12} = 3.613$
Example 3)
Thisexampleshowsconversionofafloatingpointvaluetocorrespondingfixedpointvalue
intwomethods.Bothpositiveandnegativevaluesarecoveredinthisexample.
o = S.u1S,(w, F) = (8,S)
Method 1:
i (S.u1S,1,8,S) = S.uuconverttobinarywithbin()uuu11.uuu
Method 2:
Step1:b = o 2
P
= S.u1S 2
+3
= 24.1u4u
Step2: rounJ(24.1u4u) = 24
Step3:c = Jcc2bin(b) = 11uuu
Step4:c = uuu11.uuu
Inbothmethods:rcol :oluc = intcgcr :oluc 2
-P
Example 4)
$a = 9.51432$, (W,F) = (12,7)
Method 1:
fi(9.51432,1,12,7) = 9.5156, convert to binary with bin(): 01001.1000010
Method 2:
Step 1: $b = a \times 2^{F} = 9.51432 \times 2^{+7} = 1217.8329$
Step 2: round(1217.8329) = 1218
Step 3: c = dec2bin(b) = 010011000010
Step 4: c = 01001.1000010
Example 5)
$a = -9.0514$, (W,F) = (14,9)
Method 1:
fi(-9.0514,1,14,9) = -9.0508, convert to binary with bin(): 10110.111100110
Method 2:
Step 1: $b = a \times 2^{F} = -9.0514 \times 2^{+9} = -4634.3$
Step 2: round(-4634.3) = -4634
Step 3: c = dec2bin(b) = 10110111100110
Step 4: c = 10110.111100110
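For a negative value, the binary word in Steps 3 and 4 is the W-bit two's-complement code of the rounded integer. The sketch below shows one way to obtain that code in base MATLAB and to decode it again. Using mod to wrap the integer into the range 0 to 2^W - 1 is a choice made for this sketch, not something prescribed by the method above.

```matlab
W = 14;  F = 9;
q = round(-9.0514 * 2^F);         % Steps 1 and 2

u    = mod(q, 2^W);               % unsigned code of the W-bit two's-complement word
bits = dec2bin(u, W);             % Steps 3 and 4 as a W-character string

% Decode: words with the MSB set represent negative integers
back = bin2dec(bits);
if back >= 2^(W-1)
    back = back - 2^W;
end
realValue = back * 2^(-F);        % equation (1) applied to the signed integer

fprintf('%s.%s  ->  %d  ->  %.4f\n', bits(1:W-F), bits(W-F+1:end), back, realValue);
```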
Example 6) Multiplication 1
This example shows the conversion of a floating-point multiplication to a fixed-point multiplication. To perform this conversion:
1st: Each operand is converted to fixed point using only Step 1 and Step 2.
2nd: Perform the multiplication with the new values.
3rd: Apply Step 3 and Step 4 to the multiplication result.
Step 1: $e = b \times 2^{F} = 2 \times 2^{+2} = 8$
Step 2: round(8) = 8
$c = a \times b$
Step 1: $e = b \times 2^{F} = 3.2456 \times 2^{+9} = 1661.7472$
Step 2: round(1661.7472) = 1662
$c = a \times b$
$\text{mult} = \text{round}(d) \times \text{round}(e) = 68 \times 1662 = 113016$
Step 3: c = dec2bin(mult) = 011011100101111000
Step 4: c = 000110.11100101111000
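The multiplication above can be reproduced on the scaled integers alone. In the sketch below the fractional length of the first operand (Fd = 5) is an inference from the 14 fractional bits of the Step 4 result and the 9 fractional bits of the second operand; it is not stated explicitly in the text. The word length Wmult is likewise chosen only to match the displayed word.

```matlab
Fd = 5;  Fe = 9;                  % fractional lengths of the operands (Fd is inferred)
d  = 68;                          % scaled first operand after Steps 1 and 2
e  = round(3.2456 * 2^Fe);        % scaled second operand after Steps 1 and 2

mult  = d * e;                    % integer multiplication
Fmult = Fd + Fe;                  % fractional lengths add under multiplication
Wmult = 20;                       % word length used for display in this sketch

bits      = dec2bin(mult, Wmult);
bitsDot   = [bits(1:Wmult-Fmult) '.' bits(Wmult-Fmult+1:end)];
realValue = mult * 2^(-Fmult);    % equation (1) with the product's fractional length

fprintf('%d  ->  %s  ->  %.4f\n', mult, bitsDot, realValue);
```

The point to note is that the product carries Fd + Fe fractional bits, so the binary point of the result sits further to the left than in either operand.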
Example 8) Addition 1
This example shows the conversion of a floating-point addition to a fixed-point addition. To perform this conversion:
1st: Align the binary points of the operands by adding zeros on the right side of the operand that has the smaller fractional length.
2nd: Each operand is converted to fixed point using only Step 1 and Step 2.
3rd: Perform the addition with the new values.
4th: Apply Step 3 and Step 4 to the addition result.
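The alignment in the first item amounts to multiplying the stored integer of the operand with the smaller fractional length by a power of two. The sketch below shows this under one reading of the procedure, in which the operands are quantized first and the zeros are then appended. The second operand (1.4 with 5 fractional bits) is hypothetical and chosen only to make the alignment visible.

```matlab
Fx = 3;  Fy = 5;                      % fractional lengths of the two operands
xq = round(2.3 * 2^Fx);               % stored integer with 3 fractional bits
yq = round(1.4 * 2^Fy);               % hypothetical second operand with 5 fractional bits

Fsum     = max(Fx, Fy);               % common fractional length after alignment
xAligned = xq * 2^(Fsum - Fx);        % append (Fsum - Fx) zeros on the right
yAligned = yq * 2^(Fsum - Fy);        % already aligned, unchanged

total     = xAligned + yAligned;      % integer addition on aligned operands
realValue = total * 2^(-Fsum);        % equation (1) with the common fractional length

fprintf('%d + %d = %d  ->  %g\n', xAligned, yAligned, total, realValue);
```

Appending zeros does not recover precision that the coarser operand lost when it was quantized; it only makes the two integers refer to the same scale.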
Step 1: $e = b \times 2^{F} = 2.3 \times 2^{+3} = 18.4$
Step 2: round(18.4) = 18
$c = a + b$
Method 1:
d = fi(-9.613,1,10,5) = -9.625, e = fi(-3.421,1,8,5) = -3.4063
add = d + e = -13.0313, convert to binary with bin(): c = 110010.11111
(W,F) = (11,5)
Method 2:
Step 1: $d = a \times 2^{F} = -9.613 \times 2^{+5} = -307.616$
Step 2: round(-307.616) = -308
Step 1: $e = b \times 2^{F} = -3.421 \times 2^{+5} = -109.472$
Step 2: round(-109.472) = -109
$c = a + b$
add = round(d) + round(e) = (-308) + (-109) = -417
Step 3: c = dec2bin(add) = 11001011111
Step 4: c = 110010.11111
Method 1:
d = fi(-9.613,1,10,5) = -9.625, e = fi(+3.421,1,8,5) = 3.4063
add = d + e = -6.2188, convert to binary with bin(): c = 111001.11001
(W,F) = (11,5)
Method 2:
Step 1: $d = a \times 2^{F} = -9.613 \times 2^{+5} = -307.616$
Step 2: round(-307.616) = -308
Step 1: $e = b \times 2^{F} = 3.421 \times 2^{+5} = 109.472$
Step 2: round(109.472) = 109
$c = a + b$
add = round(d) + round(e) = (-308) + (109) = -199
Step 3: c = dec2bin(add) = 11100111001
Step 4: c = 111001.11001
Method 1:
d = fi(+9.613,1,10,5) = +9.625, e = fi(-3.421,1,8,5) = -3.4063
add = d + e = +6.2188, convert to binary with bin(): c = 000110.00111
(W,F) = (11,5)
Method 2:
Step 1: $d = a \times 2^{F} = +9.613 \times 2^{+5} = +307.616$
Step 2: round(+307.616) = 308
Step 1: $e = b \times 2^{F} = -3.421 \times 2^{+5} = -109.472$
Step 2: round(-109.472) = -109
$c = a + b$
add = round(d) + round(e) = (+308) + (-109) = +199
Step 3: c = dec2bin(add) = 00011000111
Step 4: c = 000110.00111
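The signed additions with operands 9.613 and 3.421 can be checked together in a short loop. This is a sketch in base MATLAB with hypothetical variable names. As in Method 2, both operands use F = 5 and the sum is displayed as an 11-bit two's-complement word, with mod used here to produce the code of negative sums.

```matlab
F = 5;  W = 11;                               % format of the sum
pairs = [-9.613 -3.421; -9.613 3.421; 9.613 -3.421];

sums  = zeros(size(pairs, 1), 1);
words = cell(size(pairs, 1), 1);
for k = 1:size(pairs, 1)
    d = round(pairs(k,1) * 2^F);              % Steps 1 and 2 on the first operand
    e = round(pairs(k,2) * 2^F);              % Steps 1 and 2 on the second operand
    sums(k)  = d + e;                         % integer addition
    words{k} = dec2bin(mod(sums(k), 2^W), W); % W-bit two's-complement word
    fprintf('%5d  %s.%s\n', sums(k), words{k}(1:W-F), words{k}(W-F+1:end));
end
```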
Method 1:
b = fi(3.25 + 4.26i, 1, 8, 4) = 3.2500 + 4.2500i
c = abs(b) = 5.3750, convert to binary with bin(): bin(c) = 0101.0110
Method 2:
Step 1: $d = \mathrm{Re}\{b\} \times 2^{F} = 3.25 \times 2^{+4} = 52$
$e = \mathrm{Im}\{b\} \times 2^{F} = 4.26 \times 2^{+4} = 68.16$
Step 2: round(52) = 52
round(68.16) = 68
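The remaining steps of Method 2 for the norm are not shown here, so the following is only one plausible way to finish the computation, stated as an assumption: take the magnitude of the scaled integers, round it, and interpret the result with the same fractional length F. Since both parts are scaled by 2^F, the magnitude is scaled by 2^F as well.

```matlab
F = 4;  W = 8;
d = round(3.25 * 2^F);                % scaled real part after Steps 1 and 2
e = round(4.26 * 2^F);                % scaled imaginary part after Steps 1 and 2

mag       = round(sqrt(d^2 + e^2));   % assumed: magnitude of the scaled integers, rounded
bits      = dec2bin(mag, W);          % W-bit word of the result
realValue = mag * 2^(-F);             % equation (1)

fprintf('%s.%s  ->  %g\n', bits(1:W-F), bits(W-F+1:end), realValue);
```

With these inputs the sketch arrives at the same word as Method 1, which is consistent with the claim that both methods agree, although it does not confirm that the original procedure takes exactly these steps.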
Floating-point to fixed-point conversion of an algorithm
In this section the conversion of an algorithm from the floating-point domain to the fixed-point domain is shown, using a simple piece of code as the example.
The corresponding equation, which is described in the following MATLAB code, is:
Method 2:
function PED = FixedPED3(R,S,C,Z)
R_Frac=8;                       %The Fractional Length and
R_WordLength=12;                %the Word Length of the parameters (W,F)
S_Frac=0;
S_WordLength=4;
C_Frac=14;
C_WordLength=15;
Z_Frac=12;
Z_WordLength=16;
RCS_Frac=R_Frac+S_Frac+C_Frac;
%PED_inter1_Frac=max(Z_Frac,RCS_Frac);
R_fi0=R*2^R_Frac;               %Step1 in the Method2
S_fi0=S*2^S_Frac;
C_fi0=C*2^C_Frac;
Z_fi0=Z*2^Z_Frac;
R_fi=round(R_fi0);              %Step2 in the Method2
S_fi=round(S_fi0);
C_fi=round(C_fi0);
Z_fi=round(Z_fi0);
RCS_fi = R_fi*S_fi*C_fi;        %Performing the multiplication
RCS_fi1=RCS_fi*2^(-RCS_Frac);   %Calculation of the real-value of RCS_fi1 by (1)
RCS = R*S*C;                    %The corresponding floating-point operation
RCS_Frac = Z_Frac;              %Equalize the Fractional Length of the two operands
RCS_fi2 = RCS_fi1*2^(RCS_Frac); %Step1 in the Method2
RCS_fi3 = round(RCS_fi2);       %Step2 in the Method2
%The two operands of the addition should have the same Fractional length.
%In general this condition is used to equalize the fractional length of
%the two operands. But in this code, in the previous lines this action is
%done with "RCS_Frac = Z_Frac;"
if (RCS_Frac<Z_Frac)
    RCS_fi4=RCS_fi3*2^(Z_Frac-RCS_Frac);
    Z_fi1=Z_fi;
else %(RCS_Frac>=Z_Frac)
    Z_fi1=Z_fi*2^(RCS_Frac-Z_Frac);
    RCS_fi4=RCS_fi3;
end
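The product path of the listing can be exercised on its own with a few numbers. The sketch below repeats the scaling, integer product, and requantization to Z_Frac fractional bits, then compares the result with the floating-point product. The input values of R, S, and C are hypothetical; the fractional lengths are the ones in the listing.

```matlab
% Hypothetical inputs; fractional lengths follow the listing above
R = 1.2345;  S = -3;  C = 0.4321;
R_Frac = 8;  S_Frac = 0;  C_Frac = 14;  Z_Frac = 12;

% Steps 1 and 2 on each operand, then the integer product
RCS_fi   = round(R*2^R_Frac) * round(S*2^S_Frac) * round(C*2^C_Frac);
RCS_Frac = R_Frac + S_Frac + C_Frac;              % fractional bits carried by the product

% Requantize the product to Z_Frac fractional bits so it can be added to Z
RCS_fi3 = round(RCS_fi * 2^(Z_Frac - RCS_Frac));

err = abs(RCS_fi3 * 2^(-Z_Frac) - R*S*C);         % distance from the floating-point product
fprintf('fixed = %.6f, float = %.6f, error = %.2e\n', RCS_fi3*2^(-Z_Frac), R*S*C, err);
```

The error combines the quantization of R and C with the final rounding to Z_Frac bits, so it shrinks when the fractional lengths are increased.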