Yushiang 1
SENSOR SYSTEMS
by
Yu-Shiang Lin
Doctoral Committee:
Associate Professor Dennis M. Sylvester, Chair
Professor David T. Blaauw
Associate Professor Michael P. Flynn
Professor Marios C. Papaefthymiou
© Yu-Shiang Lin 2008
All rights reserved.
To My Family
ACKNOWLEDGEMENTS
When I look back on the days spent pursuing the Ph.D., I see challenges and lots of
precious memories. Coming to a foreign country to live and study for the first time
in my life, I have been helped by a lot of people throughout the past five years.
I always knew that the VLSI program at the University of Michigan was my top
choice, especially Professor Sylvester's group. However, not until I joined the group
did I realize how fortunate I was to have chosen the right program. I was always given
all the resources I needed to implement my ideas under Professor Sylvester's guidance.
I want to thank Professor Blaauw for his advice on the research. Having two
advisors on research has always been a positive experience for me. Also I want to thank
much easier. It has been a pleasure to work with a group of brilliant people in our
lab. I find myself always learning new things from you.
Last but not least, I want to dedicate my Ph.D. to my family for their support.
My wife Yen-Ting takes care of most daily routines for me and always encourages
me. My son Andruw always gives me his big smile when I come back home. It is my
parents who made me who I am.
All the best to all of you.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
I. INTRODUCTION
1.1 Overview
1.1.1 Sensors
1.1.2 Microcontroller
1.1.3 Storage elements
1.1.4 Communication module
1.1.5 Timer
1.1.6 Power source
1.2 Low power sensor system
II. ULTRA LOW POWER TIMER DESIGN FOR SENSOR APPLICATIONS
2.1 Introduction
2.2 A Sub-pW gate leakage timer
2.2.1 Circuit Design for the timer
2.2.2 Measurement results
2.3 Self temperature compensation for low power timer
2.3.1 Oscillator with self temperature compensated current source
2.3.2 Power reduction by charge holding technique
2.3.3 Test chip and measurement results
2.4 Summary
III. AN ULTRA LOW POWER 1V, 220NW TEMPERATURE SENSOR FOR PASSIVE WIRELESS APPLICATIONS
3.1 Introduction
3.2 Low power temperature sensor design
3.3 Measurement results
3.4 Conclusion
3.5 Improving the voltage sensitivity
IV. SINGLE STAGE STATIC LEVEL SHIFTER DESIGN FOR SUBTHRESHOLD TO I/O VOLTAGE CONVERSION
4.1 Introduction
4.2 Conventional approach
4.3 Proposed approach
4.4 Simulation results
4.5 Conclusion
V. SENSOR DATA RETRIEVAL USING ALIGNMENT INDEPENDENT CAPACITIVE SIGNALING
5.1 Introduction
5.2 Geometry optimization
5.2.1 Sizing of the sensor pad
5.2.2 Single-ended vs. differential signaling
5.3 System architecture
5.3.1 Data retrieval circuits design
5.3.2 Sensor chip circuit design
5.4 Chip measurement
5.4.1 Test chip
5.4.2 Alignment detection
5.4.3 Measurement results
5.5 Conclusions
VI. NEAR FIELD INDUCTIVE COUPLING USING PLL PHASE-LOCKING AND PULSE SIGNALING
6.1 Introduction
6.2 System architecture
6.2.1 Integrated inductor
6.2.2 Transponder circuits
6.2.3 Reader circuits
6.3 Measurement results
6.4 Conclusion
VII. CONTRIBUTIONS AND FUTURE WORKS
7.1 Contributions
7.2 Future works
BIBLIOGRAPHY
LIST OF FIGURES
3.4 Block diagram and timing diagram of the sensor controller.
3.5 Die photo of the temperature sensor.
3.6 Power consumption of the temperature sensor.
3.7 Temperature inaccuracy of the temperature sensor with two-point
calibration at 20 °C and 80 °C.
3.8 Temperature inaccuracy over samples (top: 10 samples/s; bottom:
100 samples/s; solid line: actual temperature).
3.9 Modified temperature insensitive current source.
3.10 Voltage reference generator.
5.14 Procedures for alignment detection and pad reconfiguration.
5.15 Decoded data waveform showing pseudo random bit sequences up to
15 unrepeated cycles.
5.16 Operating frequency versus transmitting amplitude and carrier
frequency with estimated working distance shown on the second x-axis.
5.17 Energy consumption versus transmitting amplitude and carrier
frequency.
5.18 (a) Tw versus BER, (b) Clock modulation circuit that defines Tw.
5.19 Data rate versus BER with 10 random position testing.
LIST OF TABLES
CHAPTER I
INTRODUCTION
1.1 Overview
Sensor systems are ubiquitous in modern daily life. Applications ranging from chemical
sensing and biomedical monitoring to industrial and automotive systems
have all made large strides. Nowadays, these systems are increasingly cost effective
and highly integrated, thanks to highly developed silicon
technology [1, 2, 3, 4]. Basically, a sensor system uses transducers to translate
the nonelectrical world into something that electrical engineers are more familiar
with, for instance, digital or analog signals. By interfacing nonelectrical
properties with signals that can be processed by electronic devices, the sensor
system can perform more functions in addition to just sensing. For example, the
sensors can be used to build devices such as beams or diaphragms with their mechanical
properties [5]. Such devices, when controlled by externally applied voltages, can
store the recorded data in the storage elements. An example is a watchdog system
that monitors the condition of perishables [6]. The deterioration process of the food
can be represented by chemical reactions that are closely related to a function of the
activation energy and the temperature integrated over time. The reaction process
can be simulated with a simple CMOS circuit with digital outputs. In this way, the
condition of the perishables can be monitored throughout their lifetime. Another example
is commonly implemented in modern VLSI microprocessors, where a thermal
tributed to collect environment-related data such as the temperature and humidity
inside an ecosystem [8]. The lifetime of the wireless sensor network is limited by the
energy consumption of the individual sensor nodes. Because the power sources have
limited capacity, most components in the sensor node need to be turned off when not
in use. A wakeup receiver that consumes less than 100 µW was proposed to provide
low standby power and to activate the main receiver upon request [9]. When the RF
input power is reduced or when the wakeup sequence is shortened, the mean time
between false alarms also decreases. This results in more frequent wakeup
cycles than the sensor system needs. It was shown that by increasing the
time between false alarms to more than 1018 seconds, -50dBm of sensitivity can be
achieved with a 7-bit code. Compared to the -100dBm sensitivity of the main receiver,
the saving in power consumption comes at the expense of a shorter range. Further
discussion of wireless sensor networks is beyond the scope of this work; we will mostly
focus on a single sensor system throughout the chapter.
The advancement of CMOS technology is the driving force for high performance
computing systems. A sensor system can also benefit
from the higher density and smaller parasitic capacitances that come with device
scaling, even though its throughput requirement is usually low. Fig. 1.1 illustrates a
sensor monitoring system that integrates various functions into the same package. The
components include the sensor/transducer, power source, controller, storage elements,
timer and communication
module. In the following section, we will discuss the role of the components in the
[Figure 1.1: Sensor monitoring system. Block labels: battery, sensor, storage element, controller, timer, communication.]
1.1.1 Sensors
that selectively etches the silicon wafer or adds additional layers to form the
mechanical or electrical devices. Pressure sensors formed by creating a thin diaphragm
are an early and successful example of MEMS sensors. Nowadays, sensors are not limited
to mechanical devices only. Instead, thermal, optical, magnetic and chemical sensors
have all seen promising developments [10, 11, 12, 13]. Typically, interface circuits
example, a microfabricated capacitive servo pressure sensor was demonstrated with
integrated circuits [14]. The cavity between the diaphragm and the electrode is
evacuated so that the pressure sensor detects absolute pressure. The idea is to use a
capacitance-to-voltage converter to form a closed loop with the servo sensor and to
balance the position of the silicon electrode. The amplified voltage corresponding to
the absolute pressure is then generated automatically by the sensor.
Temperature sensing is another example of broad interest for all sorts of
applications. Unlike other environmental parameters, temperature has a direct and
thermal sensor for a VLSI microprocessor may require less than 1ms of conversion
time for effective thermal throttling.
1.1.2 Microcontroller
After the nonelectrical property is quantized into digital signals that can be
understood by the logic circuits, a microcontroller takes over the control of the system.
The microcontroller can be as simple as a finite state machine that performs certain
routines, such as serializing the data to the communication module, or as complex as a
full-fledged general purpose processor that can handle data manipulation and analysis.
For sensor applications, the system throughput is generally lower than 1bps. As a
result, the microcontroller spends most of its time idling and still consumes leakage
power due to the non-ideal nature of MOS switches. Technology scaling results in
smaller parasitics and thus less switching power. On the other hand, to sustain the
performance gain across technology generations, the threshold voltage needs to be
reduced as well, which leads to more
subthreshold leakage [15].
Fig. 1.2 shows the energy consumption per instruction of a processor versus its supply
voltage. First, the graph shows that there is a specific supply voltage (Vmin)
that produces the minimum energy per instruction for a fixed activity rate [16].
This is because when the voltage is high, the circuit can operate at a higher frequency
and the total power is dominated by the dynamic power. On the other hand, running
at a lower voltage means that the circuits start to spend more time leaking while the
dynamic power remains the same. Usually Vmin is below the threshold voltage, so the
operating frequency decreases rapidly as the supply voltage is reduced from Vmin. When
increases, the curve shifts upward, and the minimum energy voltage moves from
Vmin1 to a lower value of Vmin2. The analysis of Vmin has an underlying assumption
that the processor consumes zero power after the computation is completed, which
is not realistic for a sensor system. In practice, the strength of the power gating
impedance path between the supply rails. There is a tradeoff in deciding the size of
the footer transistor. Selecting a weak footer transistor helps reduce leakage
but hurts the performance and robustness of the system [17]. Process variation has
to be considered when choosing the optimal transistor size. To sum up, operation
in the subthreshold region and heavy power gating are both required in order to save
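The minimum energy point described above can be illustrated with a small numerical sketch. It is not the model used in [16]; it assumes a simplified subthreshold model in which the dynamic energy scales as activity times C times Vdd squared, and the leakage energy per instruction scales as C times Vdd squared times exp(-Vdd/(n vT)), because the time per instruction grows exponentially as the supply drops. The names `energy_per_instruction`, `find_vmin`, the slope factor, and the leakage weight are all hypothetical choices made for illustration.

```python
import math

THERMAL_VOLTAGE = 0.026   # kT/q near room temperature, in volts
SLOPE_FACTOR = 1.5        # assumed subthreshold slope factor n (illustrative)


def energy_per_instruction(vdd, activity, c_eff=1.0, leak_weight=100.0):
    """Normalized energy per instruction in a simplified subthreshold model.

    dynamic = activity * c_eff * vdd^2
    leakage = leak_weight * c_eff * vdd^2 * exp(-vdd / (n * vT))
    The exponential term stands for the longer time per instruction, and
    therefore the longer time spent leaking, at a lower supply voltage.
    """
    s = SLOPE_FACTOR * THERMAL_VOLTAGE
    dynamic = activity * c_eff * vdd ** 2
    leakage = leak_weight * c_eff * vdd ** 2 * math.exp(-vdd / s)
    return dynamic + leakage


def find_vmin(activity, v_low=0.15, v_high=1.0, steps=851):
    """Grid search (1 mV steps by default) for the lowest-energy supply voltage."""
    grid = [v_low + i * (v_high - v_low) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: energy_per_instruction(v, activity))


vmin_high_activity = find_vmin(activity=0.2)
vmin_low_activity = find_vmin(activity=0.05)
print(f"Vmin at activity 0.20: {vmin_high_activity:.3f} V")
print(f"Vmin at activity 0.05: {vmin_low_activity:.3f} V")
```

In this toy model both minima fall in the subthreshold range, and the higher activity factor gives the lower minimum energy voltage, since the dynamic term then outweighs leakage over a wider voltage range. The model ignores the sleep-mode power that the text identifies as important for sensor systems.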
[Figure 1.2 plot: axes Energy/Inst and Vdd, with Vmin1 and Vmin2 marked.]
Figure 1.2: The relationship between supply voltage and energy consumption per instruction.
1.1.3 Storage elements
Storage elements serve two purposes: recording the measurement data and providing
the instruction routines. SRAM is commonly used as the storage element since it
provides a good compromise between high density and low latency. While the same
minimum energy analysis can be applied to SRAM devices, SRAM cells that are
designed for nominal voltage operation do not work reliably below 700mV without
modifications [18]. The fundamental problem is the loss of Ion/Ioff ratio when
operating in the subthreshold region. The other problem is process variation,
which reduces the static noise margin (SNM) of the cells. An early effort
replaced the SRAM with a multiplexer based memory that works successfully
at 180mV [19]. To optimize the read and write margins at the same time, a virtual
rail can be used to selectively weaken the latch transistors during the write operation
[20]. Another solution is to increase the number of transistors in the SRAM cell so
that the cell can be sized individually for the read and the
write operations [21, 22]. Using these techniques, the supply voltage can be lowered
to less than 200mV. An ultra low standby power SRAM cell consuming 10.9fW was reported
using stack-forcing and gate length biasing techniques at the cost of bitcell area
[17].
An alternative for the storage elements is nonvolatile memory such as ROM,
EPROM and FLASH. Such devices do not rely on power sources to retain the data.
Therefore, they are very energy efficient for infrequent operations. ROM is suitable for
storing instruction routines because of its high density. However, writable memories
are required for recording data. FLASH is widely used for mass storage in consumer
electronics such as digital cameras and mobile phones. Generally, it requires a special
floating gate process during fabrication and relies on a high programming voltage to
provide the necessary electric field for accessing the floating node [23]. A CMOS
compatible FLASH was proposed with a 5V programming voltage and 1.2V for the read
operation [24].
1.1.4 Communication module
There are two types of communication schemes that can be applied to the system:
one in which each node individually sends data to and receives data from other nodes,
and one that relies on a base station. The former is the concept of a wireless sensor
network, which requires a reliable power source. The latter, on the other hand, can
be remotely powered. Passive wireless nodes have been demonstrated with low data
rates [25, 26, 27]. In these systems, the passive nodes harvest energy from the radio
frequency (RF) input. In general, power and data are sent simultaneously. In radio
frequency identification (RFID) terminology, the base station is called an interrogator
or reader, and the device that responds to the request is called a transponder. Typical
ranges for passive RFID devices are from less than 1cm to a few meters. Long range
batteryless wireless telemetry has been reported with up to 18 meters of distance
[28, 29]. To harvest enough energy for signal transmission, large capacitors are used
to store the charge.
saving is a strong function of the responsiveness of the sensor node which is highly
application dependent. As an alternative, the aforementioned wakeup receiver that
operates at lower power consumption when it is inactive can be used. With -50dBm
so that the sensors only turn on their low power wakeup detector every t seconds
(where t is a design parameter). This scheme further saves power consumption at the
1.1.5 Timer
Time keeping is essential to some sensor systems since the content can be highly time
dependent. For example, the doctor may apply proper treatment by knowing the
temperature variation of the patient in the past 48 hours. In another example, the
For medical applications, the temperature variation is small, whereas it can be
dramatically different in automotive systems. Generally, power consumption
is the biggest challenge for the timer since it is the only active device while the other
parts of the system are strongly power gated. The power consumption of the timer
should not dominate the sleep power of the system.
1.1.6 Power source
Energy scavenging and batteries are two potential power sources for sensor systems.
Energy scavenging is the process by which energy is captured and stored. There are
a variety of scavenging sources such as solar power, thermal energy, vibration energy
or even human power [32, 33, 34, 35, 36]. Table 1.1 summarizes the power density
of various power generation sources [37]. For a lifespan of 10 years, the power
density of the energy scavenging sources outperforms that of lithium batteries. Among
them, vibration is potentially the most favorable mechanism because of its abundance.
One way of exploiting it is to use piezoelectric materials, which produce an electric
field when the material is deformed by external forces. Other methods such as magnetic
and electrical transducers can also be used to harvest vibration energy. Vibration
scavenging from the traffic on a bridge or from wind-induced motion, for example, is a
reasonable power source for structural health sensor nodes [38]. However, considering
the inconsistent nature of the mechanism, such a power source is not reliable enough
to guarantee the operation of the system.
Table 1.1: Summary of power generation sources.
Source                   Power density (µW/cm³)
Solar (outdoors)         15,000 (direct sun)
                         150 (cloudy day)
Solar (indoors)          6 (office desk)
Vibrations               200
Acoustic noise           0.003 @ 75dB
                         0.96 @ 100dB
Daily temp. variation    10
Temperature gradient     15 @ 10 °C gradient
Batteries are able to supply a constant current until the end of their lifetime. The
lifetime of a battery depends mostly on the form factor and the chemistry. Table 1.2
compares several commercial miniature batteries that are potential candidates for the
sensor system. 4A and CR-1025 batteries are commonly used in small electrical devices
such as watches and toys. Although their charge density is high, they are not compatible
with microfabrication processes. Power paper [39] (ink based technique) and Cymbet
[40] (thin film battery) are advantageous in terms of size because the thickness of the
battery can be less than 1mm. Taking Cymbet as an example, a 1mm by 1mm by 25µm
battery is able to provide a capacity of roughly 10µAh. The lifetime of the sensor
system can be calculated from its power consumption and the capacity of the
power source. A year of lifetime means that the whole system can only consume about
1nA of current when directly supplied by the aforementioned Cymbet battery.
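The arithmetic behind the 1nA budget can be checked with a short sketch. It assumes the 0.40mAh/mm3 charge density listed for Cymbet in Table 1.2, a 1mm by 1mm by 25 micrometer battery, and a 365-day year; the function names are hypothetical helpers introduced only for this illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def capacity_uah(volume_mm3: float, density_mah_per_mm3: float) -> float:
    """Battery capacity in microamp-hours from volume and charge density."""
    return volume_mm3 * density_mah_per_mm3 * 1000.0  # mAh -> uAh


def average_current_na(capacity_uah_value: float, lifetime_hours: float) -> float:
    """Average current in nA that drains the given capacity over the lifetime."""
    return capacity_uah_value / lifetime_hours * 1000.0  # uA -> nA


volume = 1.0 * 1.0 * 0.025          # 1 mm x 1 mm x 25 um, expressed in mm^3
cap = capacity_uah(volume, 0.40)    # 0.40 mAh/mm^3 taken from Table 1.2
budget = average_current_na(cap, HOURS_PER_YEAR)
print(f"capacity ~ {cap:.1f} uAh, one-year current budget ~ {budget:.2f} nA")
```

Under these assumptions the capacity comes out at about 10 microamp-hours and the one-year budget at a little over 1nA, which matches the figures quoted in the text. Self-discharge and regulator losses are ignored here.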
Table 1.2: Comparison between small batteries.
Product        Nominal voltage   Capacity   Size         Charge density
4A battery     1.5V              625mAh     2298.0mm³    0.27mAh/mm³
CR-1025        3.0V              30mAh      196.25mm³    0.15mAh/mm³
Power paper    1.5V              30mAh      1064.7mm³    0.03mAh/mm³
Cymbet         3.6V              N/A        N/A          0.40mAh/mm³
1.2 Low power sensor system
In this dissertation, we target a sensor system that is mostly limited by its form
factor. Applications such as implantable or non-intrusive systems may find a 1mm³
form factor attractive. One good example is an intraocular pressure monitoring
system, shown in Fig. 1.3. Eye pressure is closely related to eye diseases
such as glaucoma. When the intraocular pressure (IOP) increases, it can cause
malfunction of the eye's drainage structure. It will eventually damage the optic nerve
and result in permanent vision loss if left untreated. The rise of pressure inside the
eye is due to the imbalance between drainage and production of fluid [41]. Fluids
continuously enter the eye but cannot be drained due to improper func-
flatten a constant region of the cornea. It is considered the most accurate way of
measuring eye pressure. Other tonometry methods, such as the air puff test or
transpalpebral tonometry, do not require direct contact with the cornea and are less
accurate than applanation tonometers. The disadvantage of those measurements is that the
risk factor of open-angle glaucoma. High IOP on awakening is reported in many
publications [45, 46, 47]. Therefore, a measurement that is not taken in the
morning very likely misses the peak of eye pressure. The average IOP is similar
at different times of the day, but the peak IOP is about 5mmHg higher on awakening
than at other times. Goldmann tonometry is performed in the sitting position, and it
is reported that IOP is higher in the supine position [48]. This suggests that the use
of portable tonometry or self-tonometry is advantageous over the traditional methods.
[Figure 1.3: Intraocular pressure monitoring system. Labels: sensor, processor, integrated battery (back side of the chip), wireless module, readout system (RX/TX).]
uses the information to determine the absolute pressure from the capacitance value
[50]. The other way to implement a passive sensor is by designing a device so that the
stored into the memory. The pressure sensor can also be implemented with surface
micromachining [51]. In [52], the authors summarize the development of nontelemetry
intraocular sensors to date, where the implant sizes range from 1.1mm to 11.5mm
in diameter.
In [53], radio-frequency (RF) transmission is used to send the signal from the
processor in real time. A full system demonstration of an intraocular sensor was
reported with an on-chip micromechanical pressure sensor, a microcontroller, the
readout circuits and an RF transponder in [54]. Another readout method for intraocular
applications is to use a coil in parallel with the capacitive sensor [55]. This LC
resonant circuit converts the pressure into a shift of the resonance frequency. A VCO
is then used to excite the sensor over a frequency range and to detect the resonant
frequency of the internal sensor.
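The pressure-to-frequency conversion of an LC readout can be sketched with the standard resonance formula for an ideal LC tank, f0 = 1/(2*pi*sqrt(L*C)). The component values below are hypothetical and are not taken from [55]; the sketch also assumes that a higher pressure increases the sensor capacitance.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical values, chosen only for illustration.
L_COIL = 100e-9        # 100 nH coil
C_REST = 5.0e-12       # 5 pF sensor capacitance at the reference pressure
C_PRESSED = 5.5e-12    # assumed 10% larger capacitance at a higher pressure

f_rest = resonant_frequency_hz(L_COIL, C_REST)
f_pressed = resonant_frequency_hz(L_COIL, C_PRESSED)
shift_hz = f_rest - f_pressed
print(f"f_rest = {f_rest / 1e6:.1f} MHz, f_pressed = {f_pressed / 1e6:.1f} MHz")
print(f"shift = {shift_hz / 1e6:.1f} MHz")
```

Because the frequency depends on the square root of the capacitance, a 10% capacitance change moves the resonance by roughly 5%. An external reader sweeping its excitation frequency would look for this shift.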
For a monitoring system on the order of mm³, the power source, whether
energy scavenging or a microfabricated battery, is the limiting factor. As discussed
in the previous section, less than 1nA of average current consumption is required
for a year of lifetime based on the capacity of the battery. To operate within such a
tight power budget, the system has to adopt aggressive power gating while it is not
actively monitoring the objects. Fig. 1.4 shows such a monitoring system, designed with
the goal of minimizing the total power. The system relies on a battery to
supply power to all the components except for the wireless module. The wireless
module should harvest AC power directly from the RF input. The voltage regulator
downconverts the voltage from a typical battery output to the minimum energy
voltage level. In this way, the dynamic power is reduced quadratically with the
supply voltage. With a switched-capacitor DC-DC converter, the current efficiency can
the digitized data from the sensor to the data memory. The storage units are partially
retentive in order not to lose the data while in the sleep mode. After the computation
is completed, the CPU sends a request to the power controller before entering the
sleep mode. Then the power controller takes over and switches off the footer/header
transistors of the non-data-retentive blocks. At the same time, a START signal is
sent to the timer to start counting the time spent in the sleep mode. During
the sleep mode, a retention memory is used to keep the recorded data. After a given
number of cycles, the timer sends an expire signal to the power controller, which
returns the control to the CPU.
[Figure 1.4: Low power monitoring system. Block labels: power source, regulator, power controller, timer, wireless module, temp. sensor, CPU, ROM; legend: power gated, not power gated.]
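The sleep and wake handshake described above can be sketched as a small behavioral model. This is an illustration of the sequence in the text only, not the dissertation's circuit; the class names, the `request_sleep` and `step` methods, and the cycle counts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Timer:
    """Always-on timer that counts cycles while the system sleeps."""
    expire_count: int
    count: int = 0
    running: bool = False

    def start(self) -> None:
        """START signal: reset the counter and begin counting."""
        self.count = 0
        self.running = True

    def tick(self) -> bool:
        """Advance one timer cycle; return True when the expire signal fires."""
        if not self.running:
            return False
        self.count += 1
        if self.count >= self.expire_count:
            self.running = False
            return True
        return False


@dataclass
class PowerController:
    timer: Timer
    gated_blocks_on: bool = True  # CPU and other non-retentive blocks
    log: list = field(default_factory=list)

    def request_sleep(self) -> None:
        """CPU request: gate the non-retentive blocks and start the timer."""
        self.gated_blocks_on = False
        self.timer.start()
        self.log.append("sleep")

    def step(self) -> None:
        """One timer cycle; re-enable the gated blocks when the timer expires."""
        if self.timer.tick():
            self.gated_blocks_on = True
            self.log.append("wake")


def run(cycles: int, sleep_cycles: int) -> list:
    controller = PowerController(Timer(expire_count=sleep_cycles))
    for _ in range(cycles):
        if controller.gated_blocks_on:
            # CPU is awake: the sensing work is omitted, then it asks to sleep.
            controller.request_sleep()
        else:
            controller.step()
    return controller.log


events = run(cycles=9, sleep_cycles=3)
print(events)
```

The model spends one cycle awake for every three asleep purely for readability; in a real system the sleep interval would be many orders of magnitude longer than the active interval.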
Recently, many research efforts have focused on ultra low power design for
digital logic and memories. Energy consumption in the single-digit pJ range was
reported for the processor as well as SRAM [57, 20]. On the other hand, the power
consumption of peripheral circuits such as the timer has received little attention. In
this dissertation, we will discuss the design issues of the peripheral circuits under
stringent power
rent controlled one-shot timer was proposed to provide steady output frequency with
circuit that combines a Schmitt trigger and a charge pump [59, 60]. For a given period
of time, the charge pump provides a fixed amount of charge to the load capacitor, and
the output eventually flips when the voltage level exceeds the transition
point of the Schmitt trigger. In this design, the current source has the most impact
on the output frequency variation. Implementing a steady low current source
is challenging given that 1nA is the total system current budget. A ring oscillator
based circuit can also be used to generate a clock signal with low hardware overhead
[61]. For MOS transistors, the drain current decreases at higher temperatures mainly
due to the degradation of electron mobility. On the other hand, the drain current
increases with temperature when operating in the subthreshold region, where the
variations.
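The dependence of the timer period on its current source can be made concrete with a first-order sketch: a constant current I charging a capacitor C reaches a trip voltage V after t = C*V/I. The values below (1pF, 0.15V, 1pA) are hypothetical and are not taken from the designs in this work; the step-by-step loop is included only as a cross-check of the closed-form expression.

```python
def charge_time_s(capacitance_f: float, trip_voltage_v: float, current_a: float) -> float:
    """Time for a constant current to charge C from 0 V to the trip voltage."""
    return capacitance_f * trip_voltage_v / current_a


def simulate_charge_time_s(capacitance_f: float, trip_voltage_v: float,
                           current_a: float, dt_s: float) -> float:
    """Integrate dV = I*dt/C step by step until the trip voltage is reached."""
    voltage = 0.0
    elapsed = 0.0
    while voltage < trip_voltage_v:
        voltage += current_a * dt_s / capacitance_f
        elapsed += dt_s
    return elapsed


# Hypothetical values, chosen only for illustration.
C_LOAD = 1e-12       # 1 pF load capacitor
V_TRIP = 0.15        # assumed Schmitt trigger transition point, in volts
I_NOMINAL = 1e-12    # 1 pA charging current

t_nominal = charge_time_s(C_LOAD, V_TRIP, I_NOMINAL)
t_simulated = simulate_charge_time_s(C_LOAD, V_TRIP, I_NOMINAL, dt_s=1e-3)
t_drifted = charge_time_s(C_LOAD, V_TRIP, 1.1 * I_NOMINAL)
drift = (t_drifted - t_nominal) / t_nominal
print(f"nominal {t_nominal:.3f} s, simulated {t_simulated:.3f} s, drift {drift:.1%}")
```

Since the period is inversely proportional to the current, a 10% increase in current shortens the period by about 9%, which is why the stability of the current source dominates the frequency variation.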
In Chap. II, two timer designs that are suitable for sub-nA operation will be
presented. The first one uses the gate leakage of a MOS transistor as the current
source for the timer. Gate leakage is relatively insensitive to temperature compared
to other current sources in CMOS technologies. In addition, it provides a large
time constant, which is ideal for reducing the switching activity with negligible area
cost. The second timer design generates a temperature insensitive current source by
forcing an identical voltage across a resistor. The same current flows into a reference
transistor such that it also becomes temperature insensitive. To further reduce the
power consumption, a program-and-hold scheme is implemented to store the bias
voltage on a capacitor while the biasing circuit is turned off. The current reduction
is achieved by mirroring the current to a transistor that is 200X smaller than the
reference one.
An ultra low power temperature sensor will be shown in Chap. III. The temperature
sensor accounts for a large portion of the total leakage when the system is remotely
powered. The transmitting distance of such a device is highly related to the power
proximity to the sensor node where a strong field is not needed for communication.
Capacitive coupling is suitable for such applications, where it has the advantage of
low hardware overhead. Capacitive coupling was first proposed to ease I/O
communication between chips and was implemented through a substrate trace [62].
A face-to-face communication scheme was proposed to allow more channels and thus
higher bandwidth [63, 64]. One obvious advantage is the power reduction and
performance boost due to the absence of the electrostatic discharge (ESD) protection
devices that are commonly used in wired communication. It is also suitable for passive
communication since the transmitting frequency is not limited by the resonant
frequency of the passive device. However, the major concern with the capacitive coupling
scheme is the misalignment of the chips. Since the coupling capacitance is inversely
proportional to the distance between the pads, the chip alignment has a strong impact
on the signal strength at the receiver. An alignment independent method is proposed
in this work. The transmitting pads are divided into smaller microplates, and each
microplate can be reconfigured to either transmit power or receive data depending on
its location. The solution is demonstrated with less than 15% of achievable data
rate by randomly dropping the sensor chip on the so-called data retrieval chip.
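The sensitivity of capacitive coupling to gap and misalignment can be illustrated with the ideal parallel-plate formula C = eps0 * eps_r * A / d. This sketch ignores fringing fields and assumes two equal square pads with hypothetical dimensions (100 micrometer side, 10 micrometer air gap); none of these values come from the test chip in Chap. V.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m


def overlap_area_m2(pad_side_m: float, offset_x_m: float, offset_y_m: float) -> float:
    """Overlap area of two equal square pads shifted by (offset_x, offset_y)."""
    overlap_x = max(0.0, pad_side_m - abs(offset_x_m))
    overlap_y = max(0.0, pad_side_m - abs(offset_y_m))
    return overlap_x * overlap_y


def coupling_capacitance_f(area_m2: float, gap_m: float,
                           relative_permittivity: float = 1.0) -> float:
    """Ideal parallel-plate capacitance, fringing fields ignored."""
    return EPSILON_0 * relative_permittivity * area_m2 / gap_m


# Hypothetical geometry, chosen only for illustration.
PAD_SIDE = 100e-6   # 100 um square pads
GAP = 10e-6         # 10 um air gap between the chips

c_aligned = coupling_capacitance_f(overlap_area_m2(PAD_SIDE, 0.0, 0.0), GAP)
c_shifted = coupling_capacitance_f(overlap_area_m2(PAD_SIDE, 50e-6, 0.0), GAP)
c_far = coupling_capacitance_f(overlap_area_m2(PAD_SIDE, 0.0, 0.0), 2 * GAP)
print(f"aligned {c_aligned * 1e15:.2f} fF, half-pad shift {c_shifted * 1e15:.2f} fF, "
      f"double gap {c_far * 1e15:.2f} fF")
```

In this idealized picture a shift of half a pad width or a doubling of the gap each halves the coupling capacitance, which is the kind of signal loss that motivates reconfigurable microplates.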
For sensor-type applications, the passive RFID technique is a potential solution when
the transmitting distance is less than 10m [65]. However, such a system usually requires
a bulky external coil so that enough energy can be harvested by the transponder. On
the other hand, there are applications like intraocular pressure sensing that require
only a few mm of distance between the reader and the transponder. Inductive
coupling is well suited for this type of application. In Chap. VI, a pulse signaling
based scheme is proposed to provide more robust transmission compared to the
traditional backscattering scheme. Pulse signaling is widely used in ultra wide band (UWB)
RF frequency which is excited at the resonant frequency of the reader. A short gap
between continuous waves is created so that the transponder can use it to send
pulses back to the reader at the frequency previously acquired by the PLL.
As a result, the signal-to-noise ratio (SNR) can be greatly improved, and filtering of
the strong interference from the reader's local resonant clock is not required. The
Table 1.3: Summary of the contributions of the works.
Chap.  Title                       Contributions                                   Ref.
II     Gate leakage based timer    Sub-pW power consumption at 300mV.              [68]
II     Program-and-hold timer      Lower temperature dependency and supply
                                   sensitivity at comparable power consumption
                                   compared to [68].
III    Low power temperature       Lowest published power consumption.             [69]
       sensor
IV     Static subthreshold to I/O  Single stage design with consistent 5FO4        [70]
       voltage level shifter       delay for 300mV to 2.5V level conversion.
V      Alignment independent       First alignment independent capacitive          [71]
       capacitive signaling        coupling test chip with simultaneous power
                                   and data transmission.
VI     Inductive coupling using    Proposes a pulse signaling and PLL
       pulse signaling             phase-locking scheme that automatically
                                   acquires the resonant frequency for near field
                                   inductive coupling data transmission.
Chap. VII concludes the contributions of this dissertation, which are also
summarized in Table 1.3. Future directions based on this work will also be discussed.
CHAPTER II
ULTRA LOW POWER TIMER DESIGN FOR SENSOR APPLICATIONS
2.1 Introduction
In this chapter, the designs of ultra-low power timers will be presented. To reduce
the power consumption and extend the lifetime for a sensor system, both the active
power and the idle power are crucial. For example, Fig. 2.1 illustrates the energy
consumption of a sensor system during its lifetime. The system is actively performing
tasks during only a short period of time. Previous work shows that the energy Emin of
a few pJ per instruction can be achieved by operating at a subthreshold voltage Vmin
[57]. Although the voltage Vmin optimizes the energy consumption during active mode,
the system spends most of the time idling. The idling energy can be greatly reduced
by power gating techniques. As shown in [17], however, strong power gating requires
weak footer or header transistors that also reduce the performance and robustness
of the circuits. Moreover, not every component can be turned off during the idle
time of the sensor system. A timer that keeps track of the time in the sleep mode is
one example. The power consumption of the timer should not dominate the power
consumption of the other circuits, otherwise the power reduction from power gating
can be largely degraded.
The crystal oscillator is widely used as the frequency reference due to its accurate
value and insensitivity to temperature and supply variations. Typically incurring
a bulky external component, it can also be implemented with a Colpitts oscillating
circuit on-chip [58]. However, the frequency of the crystal oscillator is
orders of magnitude higher than what we need and the resulting power consumption
is unacceptable. As an alternative, a current-controlled one-shot timer has
been proposed to provide a steady output frequency with a circuit that combines a
comparator and a charge pump [59, 60]. For a given time the charge pump provides
a fixed amount of charge to the load capacitance and the output will eventually flip
when the voltage level exceeds the transition point of the Schmitt trigger. The design
of the current source has a direct impact on the frequency sensitivity of the circuit.
While the supply voltage for the sensor system is preferred to be in the subthreshold
region to minimize the active energy, designing a current source that is insensitive to
temperature, voltage and also small in magnitude is not trivial. In a standard CMOS
process, several leakage sources are available options to provide reasonably small time
constant for the timer. The subthreshold leakage is well-studied but unfortunately has
exponential dependency on the temperature. The gate leakage is relatively insensitive
The remainder of the chapter will be organized as follows. We first show the
design of a timer using gate leakage as the current source in Sec. 2.2. In Sec. 2.3, a
timer with self temperature compensation current source will be presented. It relies
on a charge holding technique to save the power during the active mode. We will
compare the proposed timers and summarize our work in Sec. 2.4.
Fig. 2.2(a) shows the typical implementation of the previously mentioned one-shot
oscillator design [59, 60]. The output of the oscillator is decided by the voltage Vin .
The waveform shown in Fig. 2.2(b) illustrates the operation of the circuit. When
Vout is 0, Vin is charged toward the supply voltage by current source I1 . When Vin
surpasses Vb1 , the comparator will flip the output of the oscillator. Therefore, Vin
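starts to be discharged by I2 instead. Assuming that both I1 and I2 are equal to Ion,
each half-cycle ramps Vin across Vb1 − Vb2 in a time Ct(Vb1 − Vb2)/Ion, so the
frequency of the timer can be written as
$$f = \frac{I_{on}}{2\,C_t\,(V_{b1} - V_{b2})} \tag{2.1}$$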
compensated current sources, however, none are targeted for ultra low power applica-
tions [72, 73]. To reduce overall power consumption of the timer, the circuit needs to
be biased in the subthreshold region, further reducing the headroom for the current
source. Implementing the voltage references is another challenge since the voltages
Vb1 and Vb2 should also be independent of operating conditions such as temperature
and voltage. Although a bandgap reference provides a reference voltage with great
accuracy, it does not comply with the stringent power budget that will be enforced
on the timer.
Figure 2.2: The concept of a one-shot oscillator. (a) The circuit diagram. (b) The operation
waveform.
The goal of the timer is to awaken the processor just in time.
Therefore, operating at a frequency higher than a very low value, for example, in the
sub-Hz to 10Hz range, should be avoided as it only wastes energy. This means
that either the load capacitor Ct has to be very large or the current source should
only generate very little current in order to achieve a large RC time constant.
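The trade-off between Ct and the charging current can be sketched numerically. The snippet below is a minimal sketch that assumes an ideal comparator and equal, constant charge and discharge currents, so that each half-period is Ct(Vb1 − Vb2)/Ion. The function names and the example values (1 pF, 100 mV, 1 Hz) are illustrative assumptions, not design values from this work.

```python
def timer_frequency(i_on, c_t, v_b1, v_b2):
    """Ideal oscillation frequency in Hz.

    Each half-cycle ramps the capacitor across (v_b1 - v_b2) with a
    constant current i_on, so one full period is two such ramps.
    """
    half_period = c_t * (v_b1 - v_b2) / i_on
    return 1.0 / (2.0 * half_period)


def required_current(f_target, c_t, v_b1, v_b2):
    """Current in A that yields f_target under the same ideal model."""
    return 2.0 * f_target * c_t * (v_b1 - v_b2)


# Assumed example: 1 pF load, 100 mV between the thresholds, 1 Hz target.
i_on = required_current(1.0, 1e-12, 0.2, 0.1)
print(f"required current: {i_on * 1e12:.2f} pA")
```

Under these assumptions a 1 Hz output with a 1 pF capacitor needs a sub-pA current, which illustrates why a leakage-level current source is attractive for this timer.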
sources I1 and I2 shown in Fig. 2.2(a). As a CMOS technology scales, gate oxide
thickness will continue to shrink to maintain good channel control and drive current
tunneling currents such as the electron tunneling from the conduction band (ECB),
the electron tunneling from the valence band (EVB), and the hole tunneling from the
valence band (HVB). In general, gate current density has the following form [74]
$$J_g = A \cdot T_{oxratio} \cdot \frac{V_g V_{aux}}{t_{ox}^2} \exp\left[-B\,(\alpha - \beta|V_{ox}|)(1 + \gamma|V_{ox}|)\,t_{ox}\right] \tag{2.2}$$
where A = q²/(8πhφb), B = 8π√(2q·mox)·φb^(3/2)/(3h), mox is the effective carrier mass in the
oxide, φb the tunneling barrier height, tox the oxide thickness, and Vaux is a fitting
function of the tunneling carrier density and available states. Vaux is a weak function
of temperature and has the following form
$$V_{aux} = NIGC \cdot v_t \cdot \log\left[1 + \exp\left(\frac{V_{gs\_eff} - V_{th0}}{NIGC \cdot v_t}\right)\right] \tag{2.3}$$
Typical temperature sensitivity for gate leakage is about 10% per 10°C, which is much
lower compared to the subthreshold leakage or junction leakage. For the timer application,
using the gate leakage is also advantageous in its small magnitude compared
to a transistor's saturation current (e.g., in 0.13µm CMOS a typical gate leakage is on
the order of tens of pA/µm [75]). The benefit of having a small magnitude is two-fold.
First, the static current that is used to charge and discharge the load capacitance is
small. Second, a large time constant helps reduce the switching of the clock network on
Schmitt trigger. The hysteresis nature of a Schmitt trigger is often used to suppress
signal noise [76]. The low-to-high transition voltage VM+ and high-to-low transition
voltage VM− are defined as the two crossover points in the voltage transfer characteristic;
i.e., when the input voltage Vin equals the output voltage Vout. In this work, VM+ and
VM− are the equivalents of Vb1 and Vb2 of Fig. 2.2, respectively. Since VM+ and VM−
Schmitt trigger inverter contains transistors MS1 through MS6. When operating at
superthreshold voltages, VM+ can be determined when MS1, MS2 and MS5 are all in
saturation. Considering I_MS1 = I_MS2 + I_MS5, and assuming that the channel length
where Vx,tran is the voltage Vx in Fig. 2.3 when VM+ occurs. From simulation results,
we know that Vx,tran is nearly constant when temperature varies. Therefore, the
temperature impacts VM+ in the same way it affects Vth. The transition voltage for
the Schmitt trigger decreases as the temperature rises. VM− can be computed in the
same way. Our simulation results show that at 300mV, ΔVM = VM+ − VM− reduces by
0.1%/°C due to the lower on-off current ratio in the subthreshold region. This results
INV1 and INV2 are the inverters to provide sharper transition for the timer. To
reduce the leakage power, they are stack-forced and sized with long channel lengths.
The clock output is buffered again from the loading by a tri-state inverter TINV to
isolate any possible noise that is coming from the other part of the system. MC1 and
MC2 are thin oxide MOS transistors used to serve as the charging and discharging
devices. Both PMOS and NMOS transistors are used to provide comparable charg-
Figure 2.4: Power consumption vs. supply voltage at different temperature points [curves at 0°C, 20°C and 40°C]
gate oxide transistor, which are commonly available in modern CMOS processes, to
avoid unwanted gate leakage to ground. The corresponding waveform of Vin and Vout
is similar to the one illustrated with Fig. 2.2(b). When Vout is pulled up, Vinv is pulled
down to discharge Vin through both MC1 and MC2 until Vin is lower than VM− of
the Schmitt trigger, and vice versa. Vclk goes to a digital counter that is configurable
by the system. Based on the application, the number of timer ticks can be used to
decide the time between active modes.
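How the counter turns timer ticks into a wake-up interval can be sketched as follows. This is a minimal illustration, assuming the counter simply counts whole ticks and the system preloads the nearest integer count; `ticks_for_interval` and the numeric values are hypothetical and are not taken from the test chip.

```python
def ticks_for_interval(wake_interval_s, timer_period_s):
    """Nearest whole number of timer ticks for a desired sleep interval."""
    if timer_period_s <= 0:
        raise ValueError("timer period must be positive")
    # At least one tick is always counted before waking up.
    return max(1, round(wake_interval_s / timer_period_s))


# Assumed example: a 20 s timer period and a 10 minute wake-up interval.
ticks = ticks_for_interval(600.0, 20.0)
print(ticks)
```

Because the count is an integer, the achievable interval resolution is one timer period, which is one reason a slow timer tick is acceptable for long sleep intervals.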
The test chip was implemented in a commercial 0.13µm digital CMOS process. The
total circuit area is approximately 480 µm², where half of the area is allocated to the
load capacitor ML1. Fig. 2.4 plots the power consumption of the timer as a function of
both the supply voltage and temperature. At 300mV, the power consumption of the
timer is less than 1pW at 20°C and it consumes roughly 2nW at 600mV, as measured
by a Keithley 6217 electrometer.
[Figure 2.5: curves at 300mV, 400mV, 500mV and 600mV; x-axis: Temperature (°C)]
Fig. 2.5 shows the timer output period measured at different supply voltages and
temperatures. The timer is more temperature insensitive at higher supply voltages,
largely due to the fact that the impact of ΔVM is minimized in the superthreshold
region. Similarly, the variation due to supply voltage is also reduced at higher Vdd
for the same reason. The measured temperature sensitivity is 0.16%/°C at 600mV
and 0.6%/°C at 300mV; supply sensitivity is 0.15%/mV from 300mV to 500mV and
0.04%/mV at 600mV. The lower figure at 600mV is specifically due to operating in the
Figure 2.6: Output period scatter plot highlighting die-to-die and within-die variations. [x-axis: die number; series: Vdd=300mV and Vdd=600mV]
the output period in Fig. 2.6. To characterize die-to-die variation, we first compute
the mean for each die and obtain σ/µ across all 25 dies. Die-to-die variation is 28%
and 27% at supply voltages of 300mV and 600mV, respectively. Within-die variation is
obtained by taking the average of σ/µ within each individual die and is measured at 12.4%
and 9.2% for 300mV and 600mV. Key sources of variation include oxide thickness
variation and the voltage shift of the Schmitt trigger trip points VM+ and VM− due
to transistor mismatch. In general, the variation can be calibrated by adjusting the
aforementioned counter. The processor can easily configure the number of counts
between the readings by preloading digital values.
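The two variation metrics described above can be written out explicitly. The sketch below assumes that σ is the population standard deviation and that each die contributes a list of measured output periods; the function name and the sample data are made up for illustration and are not measurements from this work.

```python
from statistics import mean, pstdev


def variation_metrics(periods_by_die):
    """Return (die_to_die, within_die) variation as sigma/mu ratios.

    periods_by_die: one list of measured output periods per die.
    """
    die_means = [mean(p) for p in periods_by_die]
    # Die-to-die: spread of the per-die means across all dies.
    die_to_die = pstdev(die_means) / mean(die_means)
    # Within-die: sigma/mu computed inside each die, then averaged.
    within_die = mean([pstdev(p) / mean(p) for p in periods_by_die])
    return die_to_die, within_die


# Made-up data for two dies with two timers each (periods in seconds).
d2d, wid = variation_metrics([[9.0, 11.0], [19.0, 21.0]])
print(f"die-to-die: {d2d:.3f}, within-die: {wid:.3f}")
```

Whether a population or sample standard deviation is more appropriate depends on how the measurements are interpreted; the choice here is an assumption.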
We also tested the proposed timer by running it continuously for 20 hours to
measure the timing stability over an extended period. Measurement results over time
along with the resulting histogram are shown in Fig. 2.7.
[Figure 2.7: output period (s) vs. number of ticks, and histogram of count vs. output period (s)]
It takes approximately
ten minutes for the timer to reach steady state, after which the output frequency is
always within 1% throughout the remaining 20 hours of testing. The rms jitter for
low power timer is to find a reliable current source which defines the output period
accurately with low cost. The previous section discussed the design where gate leakage
of a thin oxide device is used to provide such current. In general, gate leakage
modeling is a complicated problem and accurately simulating the behavior is not feasible.
Also, porting the design to another technology is not trivial since the change in
gate leakage is typically huge. Therefore, we would like to explore other options that
Subthreshold leakage is still the most dominant contributor to the total leakage power
dissipation for advanced CMOS technologies [77]. Therefore, it is also the most
studied leakage source and a large body of measurement data can be used to refine the
circuit simulation model [78, 79]. Many recent publications on subthreshold
circuit design provide further evidence of this [57, 80, 19]. An advantage of using
the subthreshold leakage is that the value can be easily duplicated by a current
mirror, which is critical for lowering the power consumption as we will present in Sec.
2.3.2. On the other hand, subthreshold current does suffer from temperature and
process variation as shown by the equation
$$I_{sub} = \mu_{eff} C_{ox} \frac{W}{L} (m-1) \left(\frac{kT}{q}\right)^{2} e^{q(V_g - V_t)/mkT} \left(1 - e^{-qV_{ds}/kT}\right) \tag{2.5}$$
As temperature affects both the thermal voltage and the threshold voltage in a nonlinear
way, finding an inverse function that is able to compensate for the temperature
effect is not practical. In addition, process variation from doping and geometry
can cause a difference of up to several times the nominal value [81, 82].
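To give a feel for the temperature dependence in Eq. 2.5, the sketch below evaluates the expression at two temperatures. Every device parameter in it (nominal threshold voltage, its linear temperature coefficient, the slope factor m, the µeff·Cox product, W/L) is an assumed illustrative value rather than data from the process used in this work, and the temperature dependence of mobility is ignored.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q = 1.602176634e-19     # elementary charge, C


def i_sub(temp_k, v_g, v_ds, vt0=0.35, dvt_dt=-1e-3, m=1.4,
          mu_cox=2e-4, w_over_l=1.0, t_ref=300.0):
    """Subthreshold current following the form of Eq. 2.5.

    vt0, dvt_dt, m, mu_cox and w_over_l are assumed example values.
    """
    v_thermal = K_B * temp_k / Q                 # kT/q
    v_t = vt0 + dvt_dt * (temp_k - t_ref)        # assumed linear Vt shift
    return (mu_cox * w_over_l * (m - 1.0) * v_thermal ** 2
            * math.exp((v_g - v_t) / (m * v_thermal))
            * (1.0 - math.exp(-v_ds / v_thermal)))


ratio = i_sub(350.0, 0.1, 0.3) / i_sub(300.0, 0.1, 0.3)
print(f"current ratio for a 50 K increase: {ratio:.1f}x")
```

With these assumed parameters the current grows by roughly an order of magnitude over 50 K, which is the kind of sensitivity that motivates the compensation scheme discussed next.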
sistors Md1-Md6, which are equally sized, evenly divide the supply voltage and the
intermediate voltages are insensitive to temperature. Transistor M0 is biased with a gate
voltage lower than the subthreshold voltage and forms a negative feedback loop with
resistor R1 and amplifier A0.
Figure 2.8: The bias stage showing the voltage divider and a resistor based self-biasing loop.
The voltage on node nd3 is replicated to n1 through the
feedback loop. As a result, the current that flows through R1 and the drain current
of M0 are identical and can be given by
$$I_{R1} = I_{M0} = \frac{V_{dd} - V_{nd3}}{R_1} \tag{2.6}$$
the gate overdrive voltage bn is thus self-biased at different temperatures. When the
temperature is high, voltage bn reduces to compensate for the higher leakage, and
vice versa. A transistor biased in the subthreshold region provides very high output
resistance since the drain current barely changes with Vds when it is larger than 3
kT/q. The magnitude of the current-mirrored output can be easily adjusted for process
variation by dividing M0 into smaller parallel transistors with series switches that can
be selectively turned on to change the ratio. A reset signal is used to turn off the
biasing circuit in order to save power during the active mode. We will have more
discussion on this in a later section.
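As a small worked example of Eq. 2.6, the sketch below evaluates the bias current for the operating point quoted in this chapter (Vdd = 600mV, Vnd3 = 300mV, R1 = 15MΩ). It assumes an ideal amplifier with no input offset, so that node n1 exactly copies nd3; the function name is illustrative.

```python
def bias_current(v_dd, v_nd3, r1_ohm):
    """Current through R1 (and M0) when node n1 tracks nd3 exactly."""
    return (v_dd - v_nd3) / r1_ohm


i_bias = bias_current(0.6, 0.3, 15e6)
print(f"bias current: {i_bias * 1e9:.1f} nA")
```

Since the current is set by a voltage ratio of the supply and a resistor, its temperature behavior follows the resistor and the divider rather than the exponential subthreshold characteristic of M0.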
The oscillator that generates the output of the timer is shown in Fig. 2.9. It is a
one-shot oscillator that determines the oscillation period by the load capacitor CL and
current sources that are biased by bn and bp. Both bn and bp are originally generated
from the bias stage and replicated by the hold stage that will be presented later. When
out is logic low, the charge stored on load will be sunk to ground. On the other
hand, load is pulled up toward Vdd when out is logic high. The switching transistors
for pull-up and pull-down are long channel devices and stack-forced with N=4. This
is because biasing transistors Mn and Mp are biased in the deep subthreshold region.
unwanted leakage does not contend with the charging transistors during oscillation.
By comparing load voltage to reference voltages refh and refl, the output will flip
from the previous state once load surpasses refh or becomes lower than refl. In this
work, refh and refl are also generated from the bias stage (voltages nd2 and nd4). The
gain of the comparator is an important design parameter since it determines the delay
of signal ss and rs once load triggers the comparator. It is noted that to maintain
the comparator gain at different temperatures, the comparators are also biased with
From Eq. 2.6, the current consumption of the circuit is mainly determined by the
resistance value. When Vdd = 600mV, Vnd3 = 300mV and R1 equals 15MΩ, the
power consumption for the bias stage is 10nW assuming that the power consumption
of the amplifier at room temperature can be neglected. Further increasing R1 will
reduce power consumption at the expense of increased silicon area. Reducing the
voltage difference between Vdd and Vnd3 is another option. However, it magnifies the
voltage offset between Vnd3 and Vn1 and increases the temperature sensitivity. In
proposed to bring down the power by two orders of magnitude with little hardware
overhead.
The circuit diagram for the proposed method is shown in Fig. 2.10. The bias stage
and the oscillator stage have already been presented in the previous section, with an
addition of the hold stage for power saving. The idea is to store the voltage on a
capacitor after the bias stage is turned off through power gating. The time before
the bias stage is turned off is called the programming mode, and the timer enters the
active mode when the bias voltage is sustained only by the hold stage. By applying the
bias voltage to a much smaller transistor than the bias transistor, the bias current for
the oscillator and thus the total active power can be proportionally reduced. For
example, if the ratio between the transistor width of M0 and M1 is 200:1, the power
consumption can be reduced from 10nW during the programming mode to merely
50pW in the active mode.
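The scaling in this example is simple enough to check directly. The sketch below assumes that active power scales inversely with the mirror width ratio and that no other block contributes in the active mode; the function name is illustrative.

```python
def active_power(programming_power_w, mirror_ratio):
    """Active-mode power when the bias current is scaled by 1/mirror_ratio."""
    return programming_power_w / mirror_ratio


p_active = active_power(10e-9, 200)
print(f"active power: {p_active * 1e12:.0f} pW")
```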
Two types of charge holding circuits are shown in Fig. 2.11. For the type I circuit,
the bias voltage bn is written into bn1 and bn2 when c[1] is high. After c[1] goes
eliminates the subthreshold leakage through Ms2 . Ideally, the charge stored on CL
can be maintained for a long period of time before it needs to be replenished again.
However, the junction leakage of Ms2 becomes a dominant source that discharges CL
when the temperature increases. To address this issue, type II circuit is considered.
Instead of charging CL with pass transistors, gate leakage of a thin oxide transistor is
used. When the bias stage is ON, amplifier A1 controls the voltage of bn1 until bn2
reaches the same value as bn. After the bias stage turns off, node bn2 acts like a floating
node. In this scheme, node bn2 does not suffer from any leakage source other than
To further understand the operation of the hold circuit, the node voltages are
plotted in Fig. 2.12. Assume that at power-on every node has an initial voltage of 0.
Figure 2.11: Circuits for charge holding. (a) type I, (b) type II.
During phase P1, node bn will reach steady state first as
discussed in Sec. 2.3.1. bn2 will slowly converge to bn with time constant depending
on the ratio of gate leakage and the load capacitance. It is noted that there is a finite
voltage offset between bn1 and bn due to the input offset voltage and finite gain of A1.
In order to eliminate the gap between bn1 and bn, transistor Ms is turned on during
P2. At this point, ideally bn, bn1 and bn2 will all have the same voltage. In phases
P3 and P4, amplifier A2 is turned on to minimize the voltage difference between
bn1 and bn2 while the bias stage can be turned off by c[0] to save power. While P2 and
P3 can each take milliseconds or less, P1 will be the dominating period of the total
programming time. Since bn1 and bn2 follow each other closely during P4, bn1 is
chosen to drive the oscillator stage. In this way, the unnecessary coupling noise from
the switching of oscillator can be prevented from entering pseudo-floating node bn2.
The programming time of the timer is defined as the total period combining P1, P2
and P3.
The proposed timer design was fabricated in 0.13µm technology. The die photo of
the timer circuit is shown in Fig. 2.13. Total area of the timer is 0.019mm², where
the resistor occupies about half of that. Ambient temperature is controlled by
[Figure 2.12: waveforms of bn, bn1 and bn2 between 0 and VDD, with control signals c[0] to c[3], over phases P1 to P4]
ramped up. The output frequency of the timer is measured by a Tektronix TDS5104A
oscilloscope. In this setup, all the control signals are supplied externally through
plotted in Fig. 2.14. Since the frequency of the timer drifts over time due to the gate
leakage through the programming transistor, the output frequency in this figure
refers to the frequency right after the programming mode is completed. The
because of the reduced gain from the comparators of the oscillate stage. The variation
of the output frequency across temperature when the timer is operating at 600mV is
6% over the range from 0°C to 90°C. At different supply voltages, the curves closely
track each other with respect to temperature, suggesting that the characteristics
remain unchanged with the bias voltage. At room temperature, varying the supply
voltage by ±50mV results in +4/−2% of frequency variation. In this work, the timer
is the only active component in the sleep mode. Thus, the switching supply noise
can be neglected. It is reasonable to assume that the cycle time error due to supply
variation should be lower than 1%.
The expected timer behavior is, for example, to wake up the system every 10
minutes. The ideal situation is that the timer only requires to be reprogrammed
Figure 2.14: Normalized frequency vs. temperature and supply voltage. [curves: VDD = 550mV, 600mV and 650mV; x-axis: Temperature (°C)]
the timer varies depending on how often the timer needs to be refreshed. Fig. 2.15
shows the average timer frequency versus the time between refreshing. The results are
measured at 600mV with a programming time of 30 seconds. Each curve in
the figure represents a different temperature point at 0, 20, 50 and 90°C, respectively.
Higher temperature also means higher leakage and thus a larger slope is shown in the
figure. At 0°C, the frequency drift is about 0.8% per minute, while it is 1.7% per
minute at 90°C. The measurement results back up the statement that choosing the type
II circuit for the hold stage is advantageous for its low temperature coefficient. By
refreshing the timer every 4 minutes, 7% of frequency deviation is observed across the
temperatures, whereas reducing the refresh time to 2 minutes lowers the frequency
deviation to 5%.
So far, the programming time is set to 30 seconds to guarantee that the timer
is properly biased. However, the power saving by the program-and-hold method is
Figure 2.15: Average of normalized frequency drift over time. [x-axis: Refresh time (s)]
In Fig. 2.16, the timer is refreshed every four minutes with a programming time of 1.1
seconds. It shows that 3 to 4 programming cycles are required to bias the timer at the
target frequency. After that, the timer will operate at a steady frequency. In order to
achieve the steady state frequency in a single programming cycle, the programming
temperature sensitivity needs to be considered. Fig. 2.17 shows the output frequency
normalized to the programming time of 10 seconds with three different temperature
settings. When the programming time is further decreased below 1 second, the timer
can no longer be properly programmed and therefore no oscillation can be observed.
This is consistently true across the temperatures of interest. At room temperature,
the output frequency drops 8% when the programming time is reduced to less than 2
seconds. Since the programming time will be fixed across temperature, what matters
is the frequency deviation at the same programming time. According to the figure,
reducing the programming time will not introduce more than 2% error on top of the
frequency deviation shown in Fig. 2.14. Therefore, it is reasonable to always use the
minimum programming time that is available for this work.
Power consumption is measured in programming mode and in active mode, respectively.
Fig. 2.18 shows the power consumption at a supply voltage of 550mV. It
is also shown that the programming power has a clear floor at around 11nW. This is
mainly due to the fixed voltage drop across the polysilicon resistor in the bias stage.
At higher temperatures, the programming power grows faster than linearly due to the
exponentially increasing power of the amplifiers, which are biased in the subthreshold
region. The active power at room temperature is 55pW, which is consistent with
the 200:1 current mirror ratio. In active mode, the power consumption of the
Figure 2.17: Normalized frequency vs. programming time. [curves: 0°C, 27°C and 80°C; x-axis: Programming time (s)]
Figure 2.18: Power consumption at the programming mode and the active mode with respect to
different temperatures. [axes: Programming power (nW), Active power (pW), Temperature (°C)]
Figure 2.19: Power consumption and frequency deviation with different refreshing time. [x-axis: Refresh period (s)]
Fig. 2.19. The programming time is set to 1 second, which is also the smallest
period that can still bias the timer properly. As the refreshing time gets larger, the
average power consumption, which includes both the programming power and the active
power, is reduced. The frequency deviation is 5% without considering the shift
of frequency over time. When the refreshing time increases over two minutes, the
frequency deviation begins to rise as a result of leakage current difference between
low and high temperatures. To sum up, 150pW and 100pW of power consumption
can be achieved if the tolerable errors are 5% and 7%, respectively.
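The quoted averages can be roughly reproduced with a time-weighted estimate. The sketch below assumes that each refresh period consists of one programming phase followed by active operation for the remainder, and it reuses the programming power floor (11nW), the room-temperature active power (55pW) and the 1 second programming time quoted in this section; treating those values as applicable to the same operating point is an assumption, and the function name is illustrative.

```python
def average_power(p_prog_w, t_prog_s, p_active_w, refresh_period_s):
    """Time-weighted average power over one refresh period."""
    if refresh_period_s <= t_prog_s:
        raise ValueError("refresh period must exceed the programming time")
    t_active_s = refresh_period_s - t_prog_s
    energy_j = p_prog_w * t_prog_s + p_active_w * t_active_s
    return energy_j / refresh_period_s


p_2min = average_power(11e-9, 1.0, 55e-12, 120.0)
p_4min = average_power(11e-9, 1.0, 55e-12, 240.0)
print(f"2 min refresh: {p_2min * 1e12:.0f} pW")
print(f"4 min refresh: {p_4min * 1e12:.0f} pW")
```

Under these assumptions the estimate lands close to 150pW for a 2 minute refresh and close to 100pW for a 4 minute refresh, in line with the figures above. It also shows that the programming energy dominates at short refresh periods.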
2.4 Summary
In this chapter, two types of timers are proposed for ultra low power sensor platforms.
Table 2.1 summarizes the characteristics of the timers. The gate leakage
timer has the advantage of a smaller footprint and also consumes less power compared
to the program-and-hold timer when operating at 300mV. Considering applications,
the program-and-hold timer works only under the assumption that temperature
varies slowly compared to the refreshing period. The gate leakage timer, however, suffers
from larger variation when operating at different temperatures, especially when
the supply voltage becomes lower. In exchange for better temperature insensitivity
for the gate leakage based timer, the supply voltage has to be increased, resulting
in higher power consumption. In terms of low voltage operation, the gate leakage
timer is preferred since the operation does not rely on analog components as
in the program-and-hold timer. Control for the gate leakage timer is trivial as the
* Over a temperature range from 0°C to 80°C.
** Over a temperature range from 0°C to 90°C.
CHAPTER III
3.1 Introduction
Over the last decade, demand for smart temperature sensors has grown in VLSI,
automotive, and wireless sensing applications due to their low cost. Monitoring VLSI
chip temperature plays a key role in long-term system level reliability and performance.
Rapidly increasing transistor counts require embedded sensors with small
area and low power that can be spread over the chip for temperature management
[84]. Sensors with low power consumption not only help with power grid
integrity but also alleviate self-heating issues. Recently, growing interest in building
monitoring systems with wireless telemetry or RFID cards demands even more stringent
power consumption [85, 86]. The energy range, which is defined as the distance
between the transponder and the reader that is just enough to operate the transponder, can
be extended by cutting down the power dissipation [87]. In the work reported in [65],
the temperature sensor consumes 10µW compared to 2W by the reader for their passive
RFID transponder. This means that the power consumption of the temperature
sensor is highly related to the working distance of such wireless systems.
Smart temperature sensor ICs were first developed using bandgap references and
analog-to-digital converters (ADCs) [88, 89]. Such sensors typically are able to achieve
better than 1°C accuracy with calibration. Combining offset cancellation, dynamic
element matching and room-temperature calibration, an accuracy of 0.1°C with
247.5µW power consumption was reported [90]. A time-to-digital converter (TDC) was
also proposed to measure the temperature by tracking a pulsed signal along a delay
line [91]. In this work, our goal is to implement a temperature sensor with sub-µW
power dissipation and acceptable accuracy for ultra low power passive wireless sensor
applications. In Sec. 3.2, the architecture of our proposed circuits will be
discussed and the power consumption will be analyzed. Measurement results will
be shown in Sec. 3.3, followed by the conclusion in Sec. 3.4. A discussion on
Fig. 3.1 shows the block diagram of the temperature sensor. Temperature insen-
sitive current source Iref and proportional to absolute temperature (PTAT) current
source IPTAT are generated separately. Each current source is mirrored and fed into
the current-starved ring oscillator to translate the temperature information into fre-
quency. Afterwards, the clock signals are fed into an UP-counter that is triggered
by a start signal in order to produce a digitized output. The sensor controller de-
cides when the conversion should start and responds by a data valid signal when the
data is available. The key challenge of this work is to generate current sources Iref and
IPTAT with low power dissipation while still maintaining reasonable temperature
characteristics.
Generating IPTAT is a commonly used technique in bandgap reference design for
compensating the complementary to absolute temperature (CTAT) current sources.
Fig. 3.2 shows the schematic for such purpose that was originally implemented with
bipolar circuits [92]. CMOS transistors can be used in place of bipolar transistors
when operating in the subthreshold region. In this way, we can reduce the power
consumption of this block significantly, which accounts for roughly 30% of the total
power dissipation. When Vgs is less than Vth and Vds is larger than three VT, the drain
current of transistors M4 and M5 can be approximately written as:
$$I_{sub} = \mu C_{OX} \frac{W}{L} V_T^{2} \exp\left(\frac{V_{gs} - V_{th}}{n V_T}\right) \tag{3.1}$$
where VT is the thermal voltage kT/q and Vth is the threshold
voltage of the transistor. Through current mirror transistors M2 and M3,
the current through resistor RPTAT can be expressed as
$$I_{RPTAT} = \frac{n V_T}{R_{PTAT}} \ln\left(\frac{W_5 W_3 L_4 L_2}{W_4 W_2 L_5 L_3}\right) \tag{3.2}$$
assuming that Vth mismatch is ignored. By properly biasing the circuit, the output
current is proportional to VT. The sensitivity to the geometric variations can be
minimized by designing a large value in the log function. Large transistor sizes also
help to reduce the impact on threshold voltage due to random doping fluctuations.
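The proportionality to absolute temperature in Eq. 3.2 can be checked numerically. In the sketch below, the slope factor n = 1.5 and the combined size ratio of 8 inside the logarithm are assumed illustrative values; the 3.2MΩ resistance is the RPTAT value quoted later in this chapter. The function name is hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q = 1.602176634e-19     # elementary charge, C


def ptat_current(temp_k, r_ptat_ohm, size_ratio, n=1.5):
    """PTAT current following the form of Eq. 3.2.

    size_ratio stands for (W5*W3*L4*L2)/(W4*W2*L5*L3); n is assumed.
    """
    v_t = K_B * temp_k / Q                  # thermal voltage kT/q
    return n * v_t * math.log(size_ratio) / r_ptat_ohm


i_300 = ptat_current(300.0, 3.2e6, 8.0)
i_330 = ptat_current(330.0, 3.2e6, 8.0)
print(f"I(300 K) = {i_300 * 1e9:.1f} nA, I(330 K)/I(300 K) = {i_330 / i_300:.2f}")
```

With an ideal, temperature-independent resistor the current scales exactly with absolute temperature. In practice the temperature coefficient of RPTAT adds to this slope.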
transistors used to provide bias voltages that are proportional to the supply voltage.
The voltage of nb is replicated to node na through a negative feedback loop consisting
of transistor M6, resistor Rref and the amplifier. Therefore, the drain current of M6
can be defined by (Vdd − Vna − Vos)/Rref, where Vos is the input offset voltage of the
amplifier. The fractional temperature coefficient (TCF) of Id6 is
$$TC_F(I_{d6}) = \frac{1}{I_{d6}} \frac{dI_{d6}}{dT} \tag{3.3}$$
$$= -\frac{1}{V_{dd} - V_{na} - V_{os}} \frac{dV_{na}}{dT} - \frac{1}{R_{ref}} \frac{dR_{ref}}{dT} \tag{3.4}$$
To reduce the non-ideal temperature effect on the sensor we do the following: 1) the
Eq. 3.4.
It is noted that in this work, the voltage reference circuitry in Fig. 3.3 was implemented
as a voltage divider. Thus, Iref is inversely proportional to the supply voltage,
which leads to an output value that changes with power supply noise. To fix this issue in the
future, the voltage reference should be re-designed to have constant output regardless
starving voltage for the ring oscillator. Temperature information in Iref and IPTAT
is translated into frequency for the signals clk_i and clk_l. In Fig. 3.4, the sensor
controller is shown together with its timing diagram. clk_i and clk_l are used to clock
the q-counter and the d-counter, respectively. When start is 0, both counter outputs
are cleared. Triggered by input signal start, the controller asserts output data_valid
when the q-counter overflows 2^10 cycles later. data_valid immediately stops
both counters from changing their content until start goes to 0 again to reset the
states. The temperature sensor, including the Iref and IPTAT blocks, is implemented so
that it can be deactivated during the sleep state by asserting the reset signal. When a high
conversion rate is not required, the temperature sensor can be periodically deactivated
to save power.
[Block diagram: q-counter q[0]-q[10] clocked by clk_i, d-counter d[0]-d[10] clocked by clk_l, input start, outputs data and data_valid. Timing diagram: q counts 0, 1, 2, ..., 1024 and d counts 0, 1, ..., 1135 before data_valid is asserted.]
Figure 3.4: Block diagram and timing diagram of the sensor controller.
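The counting scheme of the sensor controller can be modeled in a few lines. The sketch below is a behavioral illustration only, not the controller's actual logic: the function name and the example clk_l frequency are hypothetical, ideal clocks are assumed, and the 2^10-cycle window and the 100kHz clk_i rate are taken from this chapter.

```python
Q_OVERFLOW = 2 ** 10  # q-counter cycles before data_valid is asserted

def convert(f_clk_i_hz, f_clk_l_hz):
    """Return (d_count, conversion_time_s) for one conversion.

    The q-counter closes the window after Q_OVERFLOW cycles of clk_i;
    the d-counter counts whole clk_l cycles inside that window.
    Frequencies are given as integers in Hz.
    """
    conversion_time_s = Q_OVERFLOW / f_clk_i_hz
    d_count = (Q_OVERFLOW * f_clk_l_hz) // f_clk_i_hz
    return d_count, conversion_time_s

# Hypothetical clk_l frequency chosen so that d ends at 1135.
d_count, t_conv = convert(100_000, 110_840)
print(d_count, t_conv)
```

With clk_i at 100kHz the window lasts 10.24ms, i.e. roughly 98 conversions per second, and the d-counter value is proportional to the ratio of the two clock frequencies.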
The total power of our proposed temperature sensor can be written as follows:
where n, m are the multiplication constants of current mirrors. Pctrl is the power
consumption of the sensor controller. For simplification, static power consumption is
neglected in this first order analysis. Therefore, Pctrl can be expressed as
$\alpha C_c V_{dd}^2 f_{clk}$ given the total capacitance Cc, effective activity
factor α and clock frequency fclk.
Considering fclk as a function of Iref, IPTAT and Vdd, Eq. 3.5 can be re-written as
$$P_{tot} = k_1 V_T \frac{V_{dd}}{R_{PTAT}} + k_2 \frac{V_{dd}^2}{R_{ref}} \qquad (3.6)$$
where k1 and k2 are geometry and process related constants. It is shown that 1) the
power consumption of the sensor is a linear function of temperature; and 2) power
Iref and IPTAT at room temperature. In this work, 6.2MΩ and 3.2MΩ P+ poly resistors
are chosen for Rref and RPTAT, respectively.
The chip was implemented in a 0.18μm 1P6M digital CMOS process. The total area
of the temperature sensor module is 0.05mm². The die photo is shown in Fig. 3.5.
In this test chip, 85% of the area is occupied by the resistors for biasing the current
sources.
[Figure 3.5: die photo; labels: sensor controller, decap, IPTAT module, Iref module; dimensions 0.3mm by 0.165mm.]
[Plot: power (nW) versus temperature (°C), 0 to 100 °C.]
Figure 3.6: Power consumption of the temperature sensor.
voltage for this technology is 1.8V. The power consumption increases from 200nW
to 310nW from 0 °C to 100 °C, which matches the expected trend from Eq. 3.6. The
consumption and area. While most of the area is occupied by the resistors, reducing the
resistance by half also reduces the total area by 43%. At the same time, the conversion rate
is doubled because of the boost in the ring oscillator's starving current. In this test
chip, clk_i runs at 100kHz for an equivalent of 100 samples/s. This is sufficiently
fast for most applications, and in fact we can lower the conversion rate to lower the
in Fig. 3.7. The temperature error ranges from -1.6 °C to +3 °C over the sweeping
range from 0 °C to 100 °C. With 11 bits output from the sensor controller, the

Table 3.1: Comparison of temperature sensors.
Sensor      Inaccuracy   Power         Technology   Area     Temperature   Conversion rate
            (°C)         consumption                (mm²)    range (°C)    (samples/s)
[88]        ±1           7μW           2μm          1.5      -40 to 120    50
[89]        ±1           1mW           0.6μm        3.32     -55 to 125    40k
[90]        ±0.1         247.5μW       0.7μm        4.5      -55 to 125    110
[91]        -0.7/+0.9    10μW          0.35μm       0.175    0 to 100      10k
[65]        -1.8/+2.2    10μW          N/A          N/A      0 to 100      2
[86]        ±1           0.9μW         0.18μm       0.2      27 to 47      N/A
This work   -1.6/+3      0.22μW        0.18μm       0.05     0 to 100      100

[Plot: error (°C) versus temperature (°C), 0 to 100 °C.]
Figure 3.7: Temperature inaccuracy of the temperature sensor with two-point calibration at 20 °C
and 80 °C.
key circuit parameters to this work. It can be seen that our proposed temperature
sensor adopts an approach that favors low power operation at the expense of
temperature accuracy. The total area of our test chip is comparable to or even
smaller than that of other works after accounting for the differences in technology.
Fig. 3.8 shows the long term characteristics of the sensor by setting up the chip in
the temperature chamber (top: 10 samples/s; bottom: 100 samples/s). After taking
1000 samples successively, the 3σ inaccuracy value over the samples is 2.5 °C. By
3.8.
[Plots: temperature offset (°C) versus sample number (0 to 1000), two panels.]
Figure 3.8: Temperature inaccuracy over samples (top: 10 samples/s; bottom: 100 samples/s; solid
line: actual temperature).
3.4 Conclusion
In this work, we implemented an ultra low power temperature sensor for passive
wireless applications. At room temperature, it consumes merely 220nW while
continuously running. By utilizing a temperature independent current source Iref and a PTAT
current source IPTAT, the temperature information can be synthesized and translated
into digital output at a conversion rate of 100 samples/s. Measured data shows that
[Figure: schematic; labeled devices and nodes: M0, M1, M2, Amp, Rref, Vref, nb, bp, bn.]
In order to minimize the supply voltage sensitivity, the circuit shown in Fig. 3.3
needs to be revisited. As the supply voltage deviates from its nominal value by ΔV,
the current varies by ΔI = ΔV/Rref. ΔI will result in a shift of the output value given
the same measured temperature. More importantly, it will also distort the current
to frequency conversion by the current starved ring oscillator since the relationship
between the input current and the output frequency is a non-linear function. One
solution is to investigate another way of translating the current to frequency that is
less impacted by the supply voltage. For example, a transmission gate based current
starved oscillator can be used.
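As a worked example of this sensitivity, take the 6.2MΩ value of Rref used in this chapter and a supply deviation of 0.1V (the 0.1V figure is an assumed, illustrative value, not a measurement):

```latex
\Delta I = \frac{\Delta V}{R_{ref}} = \frac{0.1\,\mathrm{V}}{6.2\,\mathrm{M\Omega}} \approx 16\,\mathrm{nA}
```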
Another solution is to consider the circuit shown in Fig. 3.9. Instead of using
a voltage divider as the voltage reference, a reference Vref that provides an absolute
voltage level needs to be generated. The other change is to place the resistor between
ground and na. Therefore, a supply voltage insensitive current source can be generated.
Vref can be provided by the circuit shown in Fig. 3.10. The details of the voltage
reference will be discussed in Sec. 6.2.2. According to Monte Carlo simulations, more than
40dB of power supply rejection ratio (PSRR) can be achieved using the circuit. The
temperature coefficient of Vref can be compensated by sizing transistor M1. At the
temperature of interest, the temperature coefficients of the thermal voltage and the
threshold voltage of M1 cancel each other. Again from 100 Monte Carlo simulations,
the worst-case temperature coefficient is 167ppm/°C.
[Figure: schematic; labeled devices and nodes: M1-M5, Vref.]
CHAPTER IV
4.1 Introduction
Operating in the subthreshold region helps to greatly reduce power dissipation for
applications that do not require high performance [80, 19]. Level conversion has al-
ways been an issue for systems that need to deal with two or more power domains.
This problem is more severe in subthreshold circuits. Since the drive strength of the
input devices are mostly limited to subthreshold operation and have a corresponding
extra wiring requirements are undesirable side effects of multi-stage level conversion.
In this paper we discuss the design issues associated with bridging the subthreshold
core logic and I/O voltage in a single stage design, and propose a robust circuit that
for voltage level conversion [93]. It has the advantage of low static power consumption
and small propagation delay due to the cross-coupled latch structure. The drawback
is that converting from ultra-low input voltages requires large transistors, which will
be discussed in Sec. 4.2. A modified DCVS was proposed to alleviate the contention
problem [94]; however, it still suffers from the same sizing issue as the DCVS level shifter
when the input is at subthreshold voltage. A dynamic level converter is a reliable
way to achieve level conversion at low voltages [95, 96]. The disadvantage is that it is
more power hungry compared to its static counterpart and requires extra clock routing
used for dual supply systems [97]. While not specifically designed for subthreshold
level conversion, these designs generally require the cascading of multiple stages.
In this chapter, we will first examine the sizing of conventional DCVS level shifters
at very low voltages and demonstrate its susceptibility to process/voltage/temperature
Fig. 4.1 shows the circuit diagram of a DCVS-type level shifter. The circuit operates
on the basis of contention between pull-up and pull-down devices. In order for the
output to switch, the NMOS drive strength has to be sufficiently greater than the
PMOS drive strength. When VDDL is at a subthreshold level and VDDH is the I/O
voltage, the difference in drive strength for the pull-up and pull-down transistors can
easily be greater than three orders of magnitude.
[Figure: schematic; labels: thick oxide I/O and thin oxide normal Vt devices, Mp1, Mp2, Mn1, Mn2, VDDH, VDDL, in, buf_in, out, n1, n2, n3, buffer, inverters.]
Figure 4.1: Conventional DCVS-type level shifter with cross-coupled pull-up transistors.
Fig. 4.2 shows the simulated gate delay of the DCVS level shifter converting a
periodic input to the I/O voltage in 0.13μm CMOS. Transistors Mp1 and Mp2 are
both sized at W/L of 0.36μm/5μm (0.36μm is the minimum width of an I/O device in this
technology) to provide decent fall delay with respect to the
rise delay while restricting the size of the pull-down transistors Mn1 and Mn2. When
VDDL is 0.35V, the operating frequency is primarily limited by the fall delay if the
width of Mn1 and Mn2 (represented by Wn) is greater than 15μm. This also indicates
that the results can be further optimized by up-sizing Mp1 and Mp2. However, at
lower VDDL, the rise delay is not able to catch up with the fall delay until Mn1 and
Mn2 are disproportionately large. Other than the area and corresponding leakage power
arising from such large pull-down transistor sizes, operating at low temperatures leads
to another problem for this type of circuit. The drive strength of Mp1/Mp2 is (at
most) quadratically sensitive to Vt shift while the drain current of Mn1/Mn2 has an
exponential dependency on Vt. As a result it is much more difficult to balance drive
[Plot: delay (# of FO4) versus Wn (μm), 0 to 100 μm, for VDDL = 0.25V, 0.30V, and 0.35V.]
Figure 4.2: Simulation results showing the operating frequency with respect to pull-down transistor
width Wn.
To overcome the dramatic difference in overdrive voltage for pull-up and pull-down
transistors, diode-connected PMOS transistors are used to replace the pull-up tran-
sistors. The proposed circuit is shown in Fig. 4.3. The pull-down of internal nodes
intn and intp is directly through normal Vt transistor Mn1 and zero Vt thick oxide
transistor Mn2. The use of Mn2 was previously proposed to reduce the potential
difference across the drain and source of Mn1 [98]. Therefore, the drive strength of
Mn1 can be increased by avoiding the use of thick oxide devices. We apply this stack
transistor technique to the conventional level shifter for fair comparisons in Sec. 4.4.
The purpose of Mp1 is to help eliminate part of the cross-bar current at the beginning
of the transition, although it also introduces roughly 10% delay penalty due to the
extra loading.
[Figure: schematic; labels: thick oxide I/O, thick oxide zero Vt, and thin oxide normal Vt devices, Md1-MdN (pull up), Mp1, Mn1, Mn2 (pull down), Mno, Mpo, VDDH, VDDL, in, buf_in, intn, intp, out.]
Figure 4.3: Proposed approach that uses input voltage independent diode-connected transistor stacks
for pull-up devices.
For pulling up the internal nodes, the Md1 through MdN transistor stack provides a
variable resistance path to the supply. At the beginning of input switching, most of
the VDDH voltage drop is across Md1-MdN. Assuming that each transistor in the
stack is still biased above threshold voltage and neglecting second order effects, the
effective resistance of the stack can be approximated as
$$R_{eff} = \frac{V_{DDH} L_p}{\mu C_{ox} W_p \left(\frac{V_{DDH}}{N} - V_t\right)^2} \qquad (4.1)$$
where N is the number of devices in the stack. A small N helps to achieve faster
falling delay by initially providing a smaller Ref f and reducing the time it stays in
subthreshold region before the state is switched. However, a smaller N also leads to
larger leakage when the input is at state 1. Nodes intn and intp are used to drive
output transistors Mno and Mpo, respectively. In this way, the good "0" property of
intn and the good "1" property of intp can both be used to reduce static power with little
[Figure: schematic; labels: reduced swing inverter, Mp1-Mp3, Mn1-Mn4, Md1-MdN, Mno, Mpo, nodes n0, n1, n2, intn, intp, in, buf_in, out, supplies VDDH and VDDL.]
Figure 4.4: Proposed level shifter with feedback path for leakage reduction.
the previous paragraph. The idea is to add a PMOS header Mp2 on top of the
pull-up transistor stack. When both the input and the output are high, n0 is low
and n1 is designed to sit 500mV below VDDH due to the reduced swing inverter. As a
consequence, n2 will be pulled high, strongly turning off Mp2 to save leakage. When
the input switches low, Mn3 needs to be strong enough with a gate voltage of VDDL
to pull down n2. Otherwise, the pull-up transistor stack will not be able to charge
intn and intp, causing functional failure. Assuming that node n2 can be pulled down
very quickly after in goes low, the rise delay is dictated by the size of transistors Md1
through MdN. In this way, we can choose circuit parameters (including the number
of transistors in the stack N, and transistor widths) to match the fall delay with the
[Figure: schematic; labeled devices and nodes: Mp1, Mp2, Mp3, Mn0, Mn1, in, out.]
all devices are thick oxide I/O devices in this circuit. When in is low, Mp2 and Mp3
can easily pull out to VDDH. When in goes high, it behaves like a reduced swing
driver, which is used to save switching energy for interconnect [99]. Instead of pulling
all the way to 0, out will remain slightly higher than (VDDH-Vtn) in this situation.
The use of the reduced swing inverter helps to match the gate overdrive voltage to the
subthreshold voltage input that is being converted. The inverted output is designed
at 2V for 0.3V to 2.5V voltage conversion. It provides a fast response time for leakage
reduction and still makes Mp3 weak enough compared to Mn3 when there is logic
contention at n2.
The simulated waveforms of our proposed circuit are shown in Fig. 4.6 where N is
5. As in transitions from high to low, n1 is pulled up to VDDH to ensure the
diode-connected transistor stack is able to sink current from VDDH. Therefore, both intp
and intn rise to within 10% of their corresponding steady state voltages in a few hundred ns
(FO4 delay at 0.3V in this technology is 18ns). Node intp has to be able to quickly
turn on the output pull-up transistor Mpo. In this design, the internal node between
Md2 and Md3 is chosen as intp. The tradeoff in the selection of this node is the
[Plot: simulated waveforms, voltage (V) versus t(s), for nodes in, out, intn, intp, and n1.]
leakage power since intp will never reach VDDH. When in goes high again, n1 drops
from VDDH to turn off Mp2 in order to save leakage power in this state. Nodes intp
as follows:
- Use minimal transistor dimensions for Mp1 to minimize the intrinsic loading on
  node intn.
- Determine the size of Mp2 based on leakage current constraints when the input
  is at state 1.
- Size Mn1 to meet the target leakage current at state 0 and the rise delay
  requirements.
- Choose N (the number of stacked PMOS devices) by calculating the fall delay
- Verify that the pull-down strength of Mn3 is always stronger than the pull-up
  strength of Mp3 at process corners.
[Plot: gate delay and power versus Wp (μm), 0.4 to 2 μm.]
Figure 4.7: Sizing of diode-connected stacked PMOS (Wp) versus gate delay and power dissipation.
Among these steps, the sizing of transistors Md1 through MdN is the most difficult
to determine analytically since the operating range is mostly between the subthreshold
operating frequency of the level shifter. N is chosen to be 5 and the width of transistor
Mn1 is 5μm. When Wp is small, gate delay is dominated by the fall delay. Increasing
Wp both decreases the fall delay by reducing the effective pull-up resistance and
increases the rise delay as the parasitic capacitance also grows. To illustrate a typical
case for subthreshold operation at 300mV, the circuit is running at 100 FO4 delay with
an activity factor of 0.1. The cross-bar current dominates other leakage sources in
this scenario. Therefore, as Wp increases the power consumption decreases due to the
faster rising transition of internal nodes intp and intn, despite the fact that parasitic
capacitances also rise. As expected, power dissipation saturates to a certain value as
Wp becomes large and finally will start rising as parasitic capacitances dominate.
[Plots versus temperature (°C), -20 to 100 °C, for the conventional and proposed level shifters: (a) gate delay, (b) power dissipation (nW).]
Figure 4.8: Comparison of level shifters. (a) Gate delay, (b) Power consumption.
In this section, we compare the performance of the proposed level shifter to the
conventional design in terms of delay, power, robustness, and area. The target output
voltage VDDH is 2.5V, converted from a subthreshold input voltage of
300mV. The simulations are conducted using a commercial 0.13μm CMOS logic
technology. Power values do not include the switching power that drives a physical package
pin. As was explained in Sec. 4.2, the W/L of the pull-up transistors is 0.36μm/5μm
for the conventional level shifter.
We first examine the gate delay and power dissipation of both circuits at different
temperatures. Worst-case corner (fast PMOS and slow NMOS) is applied for all
transistors. In Fig. 4.8a, gate delay is plotted with respect to temperature. Both
circuits can operate sufficiently fast (roughly 5 FO4 delays) above room temperature.
However, the conventional circuit runs much slower at low temperatures and even
fails to function completely below -10 °C. This is due to the pull-down NMOS devices
other hand, the gate delay of our proposed circuit remains almost constant in terms
of FO4 delays across temperature. The power dissipation of the level converters is
calculated with a period of 5000 FO4 inverter delays. Fig. 4.8b clearly shows that the
proposed circuit has lower power than the conventional design. The first reason is due
to the fact that the conventional circuit is slower, making it more susceptible to cross-
bar current. In addition, large NMOS sizes are required in the DCVS case due to
the difficulty in low temperature conversion and result in large parasitic capacitances.
variation in gate delay when converting from 0.3V to 2.5V (Fig. 4.9). Process related
parameters such as Vt and geometry are the sweeping factors in this setup. Both level
shifters are configured the same way as the experiment in the previous paragraph. The
μ and σ of the conventional level shifter are 3.1 and 2.77 in terms of FO4 inverter delay.
On the other hand, the μ and σ of our proposed level shifter are 2.63 and 1.13 FO4
delays, respectively. Although the proposed level shifter is smaller in total area and
has a more complicated circuit structure, it is still less affected by process variations.
The reason is that our proposed circuit with the diode-connected pull-up stack has
less contention at the beginning of state switching compared to the conventional one.
In other words, the speed penalty caused by contention has less impact on our circuit
when process parameters vary.
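One way to compare the two distributions on an equal footing is the coefficient of variation, i.e. the standard deviation divided by the mean. This metric is not used in the text; the sketch below computes it only as an illustration, taking the two values reported for each design as the mean and standard deviation of gate delay in FO4 delays. The function name is hypothetical.

```python
def coefficient_of_variation(mean, std):
    """Relative spread of a distribution: standard deviation over mean."""
    return std / mean

cv_conventional = coefficient_of_variation(3.1, 2.77)
cv_proposed = coefficient_of_variation(2.63, 1.13)
print(f"conventional: {cv_conventional:.2f}, proposed: {cv_proposed:.2f}")
```

Under that reading of the reported numbers, the relative spread of the proposed design is roughly half that of the conventional one.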
Table 4.1 summarizes the circuit parameters of the conventional and proposed
level shifters at worst-case process corner and room temperature. These values do
not include the input buffer to the level shifters to simplify the comparison. Therefore,
[Histogram: number of counts versus gate delay (# of FO4) for the proposed and conventional level shifters.]
Figure 4.9: Monte Carlo simulation results showing gate delay variation across process spread.
the power consumption of the conventional one is underestimated due to its larger
input capacitances. The conventional level shifter suffers from area (calculated as the
sum of transistor areas) and power penalties in order to increase the driving strength
of pull-down transistors especially at low temperatures. The routing cost and irregular
structure of the proposed circuit will reduce the area difference in physical design,
4.5 Conclusion
In this work, we proposed a subthreshold to I/O voltage level shifter that relies
on a pull-up transistor stack independent of the input voltage. Through a feedback
mechanism that reduces leakage when the input is high, it also improves the transition
speed of the circuit. The proposed level shifter was compared to the conventional
DCVS-type level shifter, and shows advantages in power dissipation, gate delay and
total area. The proposed level shifter is also capable of converting a 0.3V incoming
signal to 2.5V output robustly across process variation according to Monte Carlo
SPICE simulations.
CHAPTER V
5.1 Introduction
Miniature self-sustaining sensor nodes have become a viable option with silicon tech-
nology scaling. Such a system can be easily attached to, or implanted into, various
objects for applications such as periodic sensing and recording of temperature or bio-
chemical data. With energy minimization techniques [80, 100, 101] and aggressive
power gating, these systems can potentially operate using a micro-fabricated battery
with comparable form factor over an extended period of time [56]. To maintain the
form factor for such systems, data read-out can be challenging from two perspec-
tives. First, hardware overhead has to be kept low such that the size of the system is
not dominated by the communication components. Secondly, power consumption and
instantaneous power spikes during read-out will determine the size of battery and pas-
for the sensor, but this generally requires an external coil on a centimeter scale,
significantly limiting the application space [102]. Near-field pulse signaling through
inductive coupling has been reported to achieve high bandwidth using integrated in-
ductors while also being energy efficient [103, 104]. However, the power required for
sending data back from the sensor chip still needs to be supplied externally.
Capacitive coupling is another favored candidate for near field communication
due to its high bandwidth and low energy consumption capable of achieving less
than 0.1pJ/b [105, 106]. Simultaneous data and power transmission has also been
coupling is inversely proportional to the distance between the pads, which makes the
robustness of such a scheme very susceptible to misalignment. Pad alignment of about
3μm, achieved by markings on the edge of a scribe line, was reported [63]. Vernier
bar patterns were also proposed to electrically detect the alignment between chips so
that alignment error down to 1.4μm can be detected [64]. The accuracy of alignment
can be further improved by dividing each transmit plate into smaller microplates. By
driving the appropriate microplates with embedded switching circuits, the mechanical
ported in [109], the analog output by the alignment circuits is able to differentiate
alignment error down to 0.1μm.
In this work, we propose a capacitive coupling based method where the
communication module is fully integrated with the sensor node on a sub-mm scale. The goal
is to provide a convenient read-out mechanism without the aid of an optical
microscope and positioning by micromanipulator. We use the terminology sensor chip (SC)
and data retrieval (DR) chip to indicate corresponding concepts referred to as the
transponder and interrogator in RFID systems. By dividing the data retrieval pads
into microplates, individual microplates can be grouped together to establish power
and signal channels after alignment is known. A digital alignment circuit is used to
translate misalignment into digital output so that the configuration can be computed
externally. In Sec. 5.2, the geometric design issues associated with capacitive coupling
will be discussed. The proposed system architecture will be shown in Sec. 5.3 along
with circuit blocks. Several design aspects will be highlighted in Sec. 5.4 with silicon
measurement results. Sec. 5.5 concludes the work.
Since our goal is to achieve chip to chip communication without fine tuning the
alignment, the pad pattern is designed considering the electrical field in the worst
case due to misalignment. Thus, we first seek the worst case scenario when stacking
two chips face-to-face. There are two assumptions in the following analysis:
1. The data retrieval chip is composed of a large array of square pads so that the
sensor chip is completely covered by the pad array.
2. The distance between the coupling pads is not a function of location, i.e. the
thickness of passivation layer is fixed.
The first assumption requires that the data retrieval array is large enough such that
the sensor chip can be easily dropped on top of it while the entire sensor chip is still
within the boundary of the receiver array. By designing the receiver array two times
larger than the sensor chip, this assumption can be satisfied without fine positioning
by micromanipulator. The second assumption relies on the uniformity of the final
passivation, which can be affected by many issues such as dust on the surface of
the chip. In general, this is not a deterministic process from the circuit designer's
point of view. Therefore, it is reasonable to make this assumption at design
time.
Both the power and signal channels are required to be established during com-
munication. For the sensor chip, the power pads will be allocated with as much area
as possible to maximize the charge that can be harvested. On the other hand, sig-
nal pad sizing presents a tradeoff between capacitive loading and coupling factor. A
[Figure: geometry sketch; labels: WRX, WTX, Wsep, vertices A, B, C, and segments r1 through r4.]
Figure 5.1: The relative position of the receiver array and sensor signal pad when WRX <= 2WTX.
larger signal pad means greater energy consumption at each transition, while reduc-
ing the size decreases the sensible voltage seen by the data retrieval pad given fixed
parasitic capacitances. For the data retrieval array, the pads are placed as close as
possible so that the uncovered area can be minimized. The spacing between pads
is typically constrained by two DRC (Design Rule Check) rules in advanced VLSI
technologies: the metal density rule that is allowed in the process and the minimum
spacing between top metal layers. In the following analysis, the separation between
data retrieval pads is fixed at 5μm according to the CMOS process we use. Ideally,
dividing the pads into a smaller dimension is helpful for a finer configuration. In
reality, however, the minimum size of the pads will be decided by the area of the
functional blocks associated with each pad.
Fig. 5.1 illustrates the worst case condition given that all the signal pads are square
in this work. θ is defined as the offset angle between the two chips. Here WRX, WTX,
and Wsep are the width of the data retrieval pads, the width of the sensor signal pad,
and the separation between the data retrieval pads, respectively. Since the pads are all
squares, from symmetry θ is only considered over [0, π/4]. In this case WRX <= 2WTX,
and polygon ABCD represents the area of interest, which is used to calculate the coupling
capacitance of the pads. Line segments r1 through r4 are used to represent the length of
the sides. Coupling capacitance is the sum of the parallel plate capacitance and the fringing
capacitance
where Cpp is the parallel plate capacitance per unit area, Cfr,RX is the fringing
capacitance per unit length of the data retrieval pads and Cfr,TX is the fringing capacitance
per unit length of the sensor pads. Using trigonometric functions, r1 through r4 can
be written as:
$$r_1 = \frac{W_{TX}}{2}\sec\theta - \frac{W_{sep}}{2}(1 - \tan\theta) \qquad (5.2)$$
$$r_2 = \frac{W_{TX}}{2}(1 - \tan\theta) - \frac{W_{sep}}{2}\sec\theta \qquad (5.3)$$
$$r_3 = \frac{W_{TX}}{2}\sec\theta - \frac{W_{sep}}{2}(1 + \tan\theta) \qquad (5.4)$$
$$r_4 = \frac{W_{TX}}{2}(1 + \tan\theta) - \frac{W_{sep}}{2}\sec\theta \qquad (5.5)$$
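Note that the above expressions are physically meaningful only when vertex C is
still inside the data retrieval pad. In other words, they are valid when θ <= θt, where
θt is the angle when vertex B and vertex C overlap. Combining Eq. 5.2 through Eq.
5.5, the area of polygon ABCD can be obtained and simplified as
$$\begin{aligned}
\text{Area of polygon } ABCD &= \frac{1}{2}(r_1 r_3 + r_2 r_4) \\
&= \frac{1}{2}\left[\frac{W_{TX}}{2}\sec\theta - \frac{W_{sep}}{2}(1 - \tan\theta)\right]\left[\frac{W_{TX}}{2}\sec\theta - \frac{W_{sep}}{2}(1 + \tan\theta)\right] \\
&\quad + \frac{1}{2}\left[\frac{W_{TX}}{2}(1 - \tan\theta) - \frac{W_{sep}}{2}\sec\theta\right]\left[\frac{W_{TX}}{2}(1 + \tan\theta) - \frac{W_{sep}}{2}\sec\theta\right] \\
&= \frac{1}{4}W_{TX}^2 + \frac{1}{4}W_{sep}^2 - \frac{1}{2}W_{TX}W_{sep}\sec\theta \qquad (5.6)
\end{aligned}$$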
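The simplification in Eq. 5.6 can be checked numerically. The sketch below evaluates r1 through r4 from Eq. 5.2 through Eq. 5.5, assuming each expression is the difference of its WTX term and its Wsep term, and compares (r1 r3 + r2 r4)/2 with the closed form. WTX = 150μm and Wsep = 5μm are values used later in this chapter; the function names and the sample angles are illustrative.

```python
import math

def side_lengths(w_tx, w_sep, theta):
    """Return (r1, r2, r3, r4) following Eq. 5.2 through Eq. 5.5."""
    sec, tan = 1.0 / math.cos(theta), math.tan(theta)
    r1 = w_tx / 2 * sec - w_sep / 2 * (1 - tan)
    r2 = w_tx / 2 * (1 - tan) - w_sep / 2 * sec
    r3 = w_tx / 2 * sec - w_sep / 2 * (1 + tan)
    r4 = w_tx / 2 * (1 + tan) - w_sep / 2 * sec
    return r1, r2, r3, r4

def area_from_sides(w_tx, w_sep, theta):
    """Area of polygon ABCD as (r1*r3 + r2*r4) / 2."""
    r1, r2, r3, r4 = side_lengths(w_tx, w_sep, theta)
    return 0.5 * (r1 * r3 + r2 * r4)

def area_closed_form(w_tx, w_sep, theta):
    """Simplified area, last line of Eq. 5.6."""
    return w_tx**2 / 4 + w_sep**2 / 4 - 0.5 * w_tx * w_sep / math.cos(theta)

for theta_deg in (0, 10, 20):
    theta = math.radians(theta_deg)
    print(theta_deg,
          round(area_from_sides(150.0, 5.0, theta), 2),
          round(area_closed_form(150.0, 5.0, theta), 2))
```

The two expressions agree because sec²θ - tan²θ = 1; the check is purely algebraic and does not by itself establish the range of θ over which the geometry is valid.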
Cfr,TX because the electric field lines from sidewall DAB are mostly terminated at
the neighboring receiver pads instead of the sensor pad. Cc can then be rewritten as
$$C_c = C_{pp}\left(\frac{1}{4}W_{TX}^2 + \frac{1}{4}W_{sep}^2 - \frac{1}{2}W_{TX}W_{sep}\sec\theta\right) + C_{fr,TX}\left(W_{TX} - W_{sep}\sec\theta\right) \qquad (5.9)$$
Similar derivations can also be applied to other cases such as θ > θt or when WRX >
With the aid of 3D field solver tools [110], the relationship between WTX and
coupling capacitance can be found as in Fig. 5.2. In the technology used in this work,
the minimum size of the data retrieval pad is 50μm due to the active circuit area.
With WRX and Wsep being 50μm and 5μm, the simulation results with respect to
least one data retrieval pad no matter where it is located. Further simulation results
show that the difference between coupling capacitance at different orientations is
within 1%, suggesting that a consistently good coupling ratio can be achieved at
WTX = 150μm. To sum up, sensor pads are chosen to be about three times larger
than the receiver pad to maximize coupling in the worst case condition.
In the previous section, only a single pad was considered to transmit a signal from
the sensor. On the other hand, the signal strength can be doubled by implementing
differential signaling. Consider the diagram shown in Fig. 5.3, assuming that the
dimensions of the pads are the same as given in Sec. 5.2.1. In this scheme, both Pads
A and B are required to amplify the differential signal from the sensor pads. Since
the sensor chip can land in any orientation, 15 DR pads along with Pad B have to
be routed into Pad A to make sure that signals from both sensor pads are able to be
[Figure: sensor pads overlaid on the DR array.]
Figure 5.3: Differential signaling scheme. Pad A (square with slant lines) together with all the other
pads in light gray are used to recover the signal from the sensor.
$$C_{c,single} = \frac{V_{couple}}{V_{tran}} = \frac{C_{couple}}{C_{gnd} + C_{couple}} \qquad (5.10)$$
Vcouple and Vtran are the coupled voltage and transmitted amplitude, respectively. For
where Csw is the device loading of the switches that control the destination of the coupled
signal, N is the number of other pads the pad has to connect to, and Cwire denotes
the extra wire loading due to the differential signaling scheme. Assuming
Ccouple,1 ≈ Ccouple,2 = Ccouple and Cgnd,1 ≈ Cgnd,2 = Cgnd, the difference between
Cc,single and Cc,diff is
$$\begin{aligned}
C_{c,single} - C_{c,diff} &= \frac{C_{couple}}{C_{gnd} + C_{couple}} - \frac{2C_{couple}}{C_{gnd} + C_{couple} + N C_{sw} + C_{wire}} \\
&= \frac{C_{couple}}{C_{gnd} + C_{couple}} \cdot \frac{N C_{sw} + C_{wire} - C_{gnd} - C_{couple}}{C_{gnd} + C_{couple} + N C_{sw} + C_{wire}} \qquad (5.12)
\end{aligned}$$
In other words, the differential scheme is better than the single-ended scheme only
when the sum of N Csw and Cwire is smaller than the sum of Cgnd and Ccouple .
Cgnd and Ccouple can be estimated from the process and geometry or, more precisely,
with RC extraction tools. For a DR pad that is 50 µm on each side, Cgnd is
40 to 50 fF if the signal and power routing underneath it are restricted to metal 3 or
below. Csw can generally be ignored if, for example, a transmission gate four
times as large as the minimum-sized transistor is used. Cwire can be estimated from the
wire length. If 15 extra connections each require 150 µm of minimum-width metal wiring,
the total wire loading is 150 fF, assuming isolated wires. With these values, the
differential signaling scheme offers no advantage over its single-ended counterpart unless
Ccouple is more than two times larger than Cgnd. In addition, the complex
wiring in the differential signaling scheme forces wires to be routed on higher
metal levels and increases Cgnd as a result. Therefore, single-ended signaling is
implemented in this work. The dimensions of the pads used in the data retrieval chip and
sensor chip are summarized in Table 5.1. Due to fabrication constraints, the actual
footprints of the pads differ slightly from the designed values. For example, the
DR pad size is reduced from 50 µm to 48 µm on a side to comply with metal density
rules.
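The break-even condition of Eq. 5.12 can be checked numerically. The following is only an illustrative sketch: the function name and the sweep points are hypothetical, Cgnd is assumed to sit near the middle of the quoted range, Csw is neglected as suggested above, and Cwire takes the quoted 150 fF.

```python
def coupling_difference(c_couple, c_gnd, n_c_sw, c_wire):
    """Return Cc,single - Cc,diff as written in Eq. 5.12 (capacitances in fF)."""
    single_ended = c_couple / (c_gnd + c_couple)
    differential = 2.0 * c_couple / (c_gnd + c_couple + n_c_sw + c_wire)
    return single_ended - differential


# Assumed values: Cgnd near the middle of the quoted range, Csw neglected,
# and the 150 fF wire loading quoted in the text.
C_GND_FF = 45.0
N_C_SW_FF = 0.0
C_WIRE_FF = 150.0

for c_couple_ff in (20.0, 50.0, 100.0, 200.0):  # hypothetical sweep points
    delta = coupling_difference(c_couple_ff, C_GND_FF, N_C_SW_FF, C_WIRE_FF)
    favored = "differential" if delta < 0 else "single-ended"
    print(f"Ccouple = {c_couple_ff:5.1f} fF: difference = {delta:+.3f} ({favored})")
```

With these assumed numbers the sign flips once Ccouple exceeds Cwire minus Cgnd, that is 105 fF, which is a little over twice Cgnd.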
Table 5.1: Summary of pad dimensions.
                       Pad size (µm)         Pad spacing (µm)   Number of pads
Sensor chip            Power: 225 by 225     20                 Power: 2
                       Signal: 150 by 150                       Signal: 1
Data retrieval chip    48 by 48              5                  400
Fig. 5.4 shows the proposed system diagram for sensor data retrieval. The data
retrieval chip is responsible for sending power and recovering data from the sensor
chip at the same time. Since there is no common reference for both chips, two power
channels are required to send AC power differentially. An AC to DC converter at the
sensor chip side is used to harvest the supply voltage for the sensor. The clock signal
is modulated with the power signals and can be demodulated by the sensor chip,
so no additional channel is needed for synchronization. This also helps to precisely
control the sensing window of the receiver circuit for better noise rejection. A single
signal channel is used to transmit data back to the data retrieval chip as suggested
Figure 5.4: System architecture for the proposed data retrieval mechanism.
[Figure 5.5: Diagram of a DR cell (level shifter, pad, clock generator, modulator, alignment detector, differential amplifier) and the DR controller with its serial interface.]
While the sensor chip has three pads dedicated to individual channels, the data
retrieval chip contains an array of 20 by 20 cells, each of which can be assigned as the
signal channel or clustered into a power channel as needed (Fig. 5.5). Each cell
is tied to a corresponding DR pad, which serves as communication channels that are
2. Power transmission. The pad is driven by level converters with elevated amplitude
to strengthen the signal that reaches the sensor pads.
3. Signal recovery. The capacitively coupled signal is first amplified and then
decoded by the DR controller.
Figure 5.6: Alignment detector. (a) block diagram, (b) operation waveform.
After the sensor chip is dropped on top of the data retrieval chip, the alignment detector
shown in Fig. 5.6(a) is used to determine the best configuration. The alignment
detector is essentially a ring-oscillator-based capacitance-to-digital converter that
digitizes the capacitive loading of each DR pad. The ring oscillator converts the capacitance
into frequency information represented by RING_CLK. RING_CLK then
increments the synchronous counter during a fixed period when
ENABLE is high (defined by SYS_CLK). The operation waveform is shown in Fig.
5.6(b). To accommodate the different ring oscillator speeds across the DR array, a one-time
zero-calibration method needs to be implemented (Sec. 5.4.2). Although the output
has to be limited to 9 bits to physically fit underneath each cell, the circuits can be
operated in cyclic mode. This means that the alignment information is maintained
even when the counter overflows and the carry-out information is discarded. We will
revisit the alignment detection issue in Sec. 5.4.2 to explain how useful information
can be extracted efficiently for the whole data retrieval array.
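One way to see why the cyclic mode preserves the alignment information is to treat the 9-bit counter as modular arithmetic. The sketch below is an assumption about how two wrapped readings could be compared, not a description of the actual hardware or software; the function names are hypothetical, and the result is only valid while the true shift is smaller than half the counter range.

```python
COUNTER_BITS = 9
MODULUS = 1 << COUNTER_BITS  # 512 distinct counter values


def counter_reading(true_count):
    """Value left in the 9-bit counter after true_count increments (carry discarded)."""
    return true_count % MODULUS


def count_shift(calibration_reading, aligned_reading):
    """Signed difference between two wrapped readings.

    Assumes the true shift lies in [-256, 255]; larger shifts would alias.
    """
    diff = (aligned_reading - calibration_reading) % MODULUS
    return diff - MODULUS if diff >= MODULUS // 2 else diff


# Hypothetical example: both counts overflow the counter, yet the shift survives.
zero_cal = counter_reading(1030)     # wraps twice
with_sensor = counter_reading(1010)  # a heavier load gives fewer ring oscillator cycles
print(count_shift(zero_cal, with_sensor))
```

The subtraction is done modulo 512, so the discarded carry bits cancel as long as the two readings differ by less than half the counter range.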
For the power transmission drivers, traditional DCVS (differential cascode voltage
switch) type level converters are used. Such level converters can easily operate at
an output amplitude three times higher than the nominal supply voltage
within the carrier frequency range of interest, which is tens of MHz. The clock signal is globally
distributed to every cell and is locally inverted if an out-of-phase signal is required.
In an effort to reduce parasitic capacitance for the DR pads, we restrict the routing
layers to metal 3 and below. Uniform clock wire routing is achieved throughout
the DR array by driving the clock from one side of each row. This
provides a feasible routing scheme compared to an H-tree type clock network, at the
expense of larger clock skew. The problem of clock skew will be discussed in Sec. 5.4
as it limits the carrier frequency for power harvesting.
Fig. 5.7 shows the data retrieval mechanism. Two differential amplifiers are
used to detect both the rising and falling transitions. The input node (Vin) is
precharged high before the clock goes low to sensitize the amplifiers. Immediately
after the clock fires, either Vlh or Vhl will be pulled down depending on the direction
of the coupled signal. The high-to-low transition triggers the 400-to-1 AND tree gate
that simultaneously monitors all DR pads and results in an UP/DN signal for the
one-bit saturation counter that determines the data output. The difference between
Vdc and Vdc1/Vdc2 is designed to be 50 mV to mitigate input offset voltage and the
impact of noise. The timing diagram in Fig. 5.8 shows that the operation is synchronized
to ext_clk. The signal transition only happens after the negative edge of
ext_clk and is latched at the positive edge. In this scheme, the preset signal is used to
both precharge Vin and enable the decoder to detect switching events. In other words,
the impact of noise on the floating node Vin can be minimized by properly controlling
the pulse width of preset. The pulse width of preset and its delay from
ext_clk are both programmable through delay lines.
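The decoding behavior described above can be approximated with a small behavioral model. This is a hedged sketch, not the actual circuit: it assumes that a coupled step larger than the 50 mV margin in either direction trips the corresponding amplifier, that a rising step maps to UP and a falling step to DN, and that smaller disturbances leave the one-bit saturation counter unchanged. All names are hypothetical.

```python
THRESHOLD_V = 0.050  # margin between Vdc and Vdc1/Vdc2 quoted in the text


def decode(coupled_steps_v, initial_bit=0):
    """Return the data output after each clock cycle.

    coupled_steps_v holds the voltage step seen on Vin after each precharge.
    """
    bit = initial_bit
    data_out = []
    for step in coupled_steps_v:
        if step > THRESHOLD_V:
            bit = 1  # UP event: counter saturates high
        elif step < -THRESHOLD_V:
            bit = 0  # DN event: counter saturates low
        # Steps inside the margin are treated as noise and ignored.
        data_out.append(bit)
    return data_out


print(decode([0.08, 0.01, -0.09, -0.02, 0.12]))
```

Because only transitions are coupled across the pads, the counter holds its previous value whenever no transition larger than the margin is detected.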
The main building block of the sensor chip is the AC to DC conversion circuit shown
in Fig. 5.9. The AC coupled inputs Vin and Vinn are rectified into DC supply voltages
and Vn2 to Vdc1 . After each input transition at Vin and Vinn , Vdc2 is charged with
Figure 5.7: Data retrieval with capacitively coupled input and periodic precharge to sensitize the amplifier.
Figure 5.8: Timing diagram showing the operation of data retrieval circuits when switching happens.
[Figure 5.9: AC to DC conversion circuit built from cascaded voltage doubler stages, with VDD4 supplying the LFSR and VDD10 supplying the level converter.]
a potential equal to Vdc1 + Vin by the cross-coupled PMOS transistors md3 and md4, where
Vin is the coupled amplitude for the sensor chip. Although replacing md1 and md2
with cross-coupled NMOS transistors is advantageous in reducing the turn-on voltage of
the first few stages, it is not feasible for stages with higher input voltages. The reason
is that without a triple-well or deep NWELL process, the body effect eventually results
in a large NMOS threshold voltage. At the output of the 10th stage, a voltage limiter
is similar to the mode selector in [85]. In this work, a shunt transistor m10 is used
to discharge current from Vin (VDD10 in Fig. 5.9) to ground when Vin is above a
certain voltage level. To help explain how the voltage is set in hardware, the open-loop
voltage transfer curve in Fig. 5.10(b) is used. Node n2 will remain close to VSS
until Vin exceeds two diode turn-on voltages (set by the diode-connected
Figure 5.10: Voltage limiter. (a) circuit diagram, (b) open-loop voltage transfer curve.
transistors m5 and m6). When Vin increases beyond two turn-on voltages, the excess voltage drop
occurs mainly across R1, and thus the voltage on n2 begins to track the supply
voltage. On the other hand, the voltage on n1 is limited to two turn-on voltages once the supply voltage
is higher than this value. By comparing n1 and n2, the amplifier output n3 begins
to turn on m10 strongly when the supply voltage is greater than 1.6V. Since each
voltage doubler stage is identical, intermediate voltage levels VDD1 through VDD10
are inherently generated. In this work, we use VDD4 (0.65V) to supply
a 4-bit LFSR circuit to generate a data stream with low power consumption and then
outputs are zero. This is relatively easy for the LFSR circuit used in this work to
represent logic, since the situation can be avoided by using a NAND4 gate to force
advancing the state of LFSR if it starts at the deadlock state.
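The deadlock escape can be illustrated with a behavioral model of a 4-bit LFSR. This is a sketch under stated assumptions: the feedback polynomial x^4 + x^3 + 1 is assumed because the tap positions are not given here, and the all-zero check simply stands in for the gate-level escape logic. The function names are hypothetical.

```python
def lfsr_step(state):
    """Advance a 4-bit LFSR (assumed polynomial x^4 + x^3 + 1) by one clock."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    if state == 0:
        # All outputs zero: XOR feedback alone would stay stuck here,
        # so force a 1 into the register (stand-in for the gate-level escape).
        feedback = 1
    return ((state << 1) | feedback) & 0xF


def lfsr_bits(seed, count):
    """Return `count` serial output bits (MSB of the state) starting from `seed`."""
    bits = []
    state = seed
    for _ in range(count):
        bits.append((state >> 3) & 1)
        state = lfsr_step(state)
    return bits


print(lfsr_bits(0, 20))  # starts in the deadlock state and still produces a sequence
```

Once the register leaves the all-zero state it cycles through the 15 nonzero states, so the serial output repeats every 15 clock cycles.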
For clock synchronization, the system clock is amplitude modulated with carrier
frequency fc using the same power channels. An envelope detector is used to demodulate
the clock signal as shown in Fig. 5.11. The differential AC input signal is first
rectified and then filtered by an RC low-pass filter. Since the input amplitude varies
[Figure 5.11: Clock demodulator, consisting of an envelope detector followed by a comparator-based level converter.]
due to several factors such as the transmitting amplitude and the distance between
pads, a level converter is required so that the demodulated clock is able to drive
the logic blocks at 0.65V. For robust level conversion of subthreshold input voltages,
a single-stage comparator is implemented. In this circuit, VDD1 and VDD2 from
the voltage doubler stages are used as the reference voltage and bias voltage for the
comparator, respectively. In this way, as long as the rectified voltage is higher than
VDD1, the demodulator is able to work properly.
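The rectify, filter and compare chain can be mimicked with a short time-domain simulation. This is a rough numerical sketch, not a model of the fabricated circuit: the carrier amplitude is normalized to 1, the RC time constant, comparator threshold, carrier frequency and gap timing are all assumed values, and an ideal full-wave rectifier replaces the transistor-level rectifier.

```python
import math


def demodulate_clock(fc_hz, gap_start_s, gap_end_s, t_end_s,
                     tau_s=40e-9, threshold=0.3, dt_s=0.5e-9):
    """Return (time, clk_out) samples for a carrier that is gated off in a gap.

    The envelope is a first-order RC low-pass of the rectified carrier
    (forward Euler); clk_out is 1 while the envelope exceeds the threshold.
    """
    samples = []
    envelope = 0.0
    steps = int(round(t_end_s / dt_s))
    for i in range(steps):
        t = i * dt_s
        carrier_on = not (gap_start_s <= t < gap_end_s)
        rectified = abs(math.sin(2 * math.pi * fc_hz * t)) if carrier_on else 0.0
        envelope += (rectified - envelope) * dt_s / tau_s
        samples.append((t, 1 if envelope > threshold else 0))
    return samples


# Hypothetical case: 100 MHz carrier with a 400 ns gap starting at 600 ns.
samples = demodulate_clock(100e6, 600e-9, 1000e-9, 2000e-9)
falling = next(t for t, clk in samples if t > 500e-9 and clk == 0)
print(f"clk_out falls about {(falling - 600e-9) * 1e9:.0f} ns after the gap starts")
```

The delay between the start of the gap and the falling edge of the recovered clock is set by the filter time constant, which is one reason the demodulator response time matters for the window timing.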
A test chip was fabricated in 0.13 µm CMOS technology. The die photo is shown in
Fig. 5.12. The active die area consumed by the sensor chip is 0.014 mm². The size of
the data retrieval array is 1.1 mm × 1.1 mm, while the total size of the DR controller
and clock generator is 0.08 mm². During measurement, the data retrieval chip is
packaged and mounted on a PCB. The sensor chip is diced to 0.5 mm by 0.5 mm and
is manually dropped on top of the data retrieval array without precise positioning.
Once the two chips are stacked, we first perform alignment detection and scan out
[Figure 5.12: Die photo showing the 0.5 mm by 0.5 mm sensor chip (later diced), the 1.1 mm by 1.1 mm data retrieval array of 20 by 20 cells, and the DR controller.]
the information to be externally processed by a PC. The PC will match the data to
a known pattern and determine the channel that a particular pad should be assigned
to for the DR array. Alternatively, the computations can also be processed on chip if
an ALU (Arithmetic Logic Unit) is available. Data clock fdata is generated externally
by a function generator and sent along with the decoded data to a PC-based logic
analyzer to compute BER (Bit Error Rate).
We have seen that alignment information can be obtained using the ring oscillators to
extract different coupling capacitances seen by each DR pad. To reduce the conver-
sion time, we would like to run as many alignment detectors in parallel as possible.
However, activating all alignment detectors at the same time will yield results that do
not contain any alignment information. This can be explained by Fig. 5.13 showing
the parasitic components of the system when two chips are put in a stack. For DR
pads P1 through P5, the parasitic capacitors include coupling capacitors Cc1 through
Figure 5.13: Parasitic components for the system of two chips in a stack.
Cc5, ground capacitors Cg1 through Cg5, and capacitors Cp1 through Cp4 that exist
between pads. When all the pads oscillate simultaneously, the coupling
capacitances are blocked from the AC ground, and therefore the location of
the sensor chip does not have any impact on the alignment detectors. In addition,
since the impedance of Cp1 through Cp4 is low at high frequency, the whole system
oscillates at the same frequency. To solve this problem, at least one neighboring
pad should be grounded for any given oscillating pad. For example, P2 and P4 are
grounded when P1, P3 and P5 are running to provide a close AC return path to
ground.
From this analysis, we can develop the alignment detection algorithm in a systematic
way (Fig. 5.14). The DR array is first divided into four quadrants and only
one quadrant is activated at a time. By repeating the capacitance-to-digital conversion
four times, the results can be merged into a two-dimensional table. The table
represents a set of zero-calibration values for the specific data retrieval chip. The
same procedure needs to be repeated every time the sensor is dropped on top
[Figure 5.14: Alignment detection procedure: divide the DR array into 4 zones Zi (i = 1..4); run the alignment detector on one zone at a time while grounding neighboring cells; repeat the procedure after the sensor chip is dropped on top of the DR chip; map the digital outputs to a 2D contour plot; reconfigure the pads as power pads and signal pads.]
of the DR array to generate another 2D table that represents the actual alignment.
The 2D contour plot shown on the bottom right of Fig. 5.14 is obtained by simply
subtracting the values of one 2D table from the other. Each pixel of the plot indicates the
excess coupling capacitance due to the presence of the sensor chip. From the plot,
both the outline of the sensor chip and the positions of the power pads and signal pad
can be clearly seen. With the digitized alignment information, the clusters for the power
pads and signal pad can be computed by comparing the results with a known pattern
derived from the chip geometry. As a result, the channels for power transmission and
signal reception can be identified and reconfigured properly every time, regardless of
the position and orientation of the sensor chip.
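The zone schedule and the table subtraction can be sketched in a few lines. The code below is only an illustration under assumptions: zones are assigned by row and column parity so that no two cells of the same zone are adjacent, the sign convention of the difference and the detection threshold are arbitrary, and the small tables are made-up numbers rather than measured data. All function names are hypothetical.

```python
def zone_of(row, col):
    """Zone index 1..4 from row/column parity; cells of one zone never touch."""
    return 1 + (row % 2) * 2 + (col % 2)


def activation_schedule(size):
    """Map each zone to the list of cells that oscillate while the rest are grounded."""
    return {zone: [(r, c) for r in range(size) for c in range(size)
                   if zone_of(r, c) == zone]
            for zone in (1, 2, 3, 4)}


def excess_map(calibration, aligned):
    """Element-wise difference between the aligned table and the zero-calibration table."""
    return [[a - z for a, z in zip(row_a, row_z)]
            for row_a, row_z in zip(aligned, calibration)]


def covered_cells(excess, threshold):
    """Cells whose reading changed by at least `threshold` counts."""
    return [(r, c) for r, row in enumerate(excess)
            for c, value in enumerate(row) if abs(value) >= threshold]


# Made-up 2 by 2 example: only the bottom-right cell sees a large change.
zero_cal = [[100, 100], [100, 100]]
after_drop = [[100, 90], [100, 60]]
print(covered_cells(excess_map(zero_cal, after_drop), threshold=20))
```

Matching the resulting set of cells against the known pad geometry of the sensor chip would then identify which clusters act as power pads and which as the signal pad.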
Measured waveforms of the test chip are shown in Fig. 5.15. At a clock frequency of
1.1MHz, the decoded output shows a data sequence that repeats every 15 cycles. We
define the achievable operating frequency (or data rate, since there is only one serial
data bit) of this system as the highest frequency at which no errors occur in 10^9 cycles. Achievable data rate
Figure 5.15: Decoded data waveform showing pseudo random bit sequences up to 15 unrepeated cycles.
is measured with different transmitting amplitudes (Ain) and carrier frequencies fc. The
results are shown in Fig. 5.16. I/O devices are used for power transmission, so Ain can
be as high as 3.3V in this 0.13 µm technology. The system starts successfully receiving
a sequence of data with BER less than 10^-9 when Ain exceeds 1.8V. The estimated working
distance is also shown on the second x-axis of the fdata plot. Based on measurement
data, I/O devices would not be needed if the passivation thickness were reduced by 1/3
from its original value of 5.6 µm (e.g., by further polishing). Increasing Ain monotonically
increases the data rate as expected. At 3V, a data rate as high as 2.5MHz can be
achieved with fc of 216MHz. However, it is observed that raising fc above 150MHz
in fact reduces fdata. The reason is that at higher frequencies the clock skew between
different cells can cause a phase offset between signals in the same power cluster, eventually
resulting in a reduction of the electric field. Since targeted data working sets for sensor
nodes are on the order of kb [111, 112], the achievable data rate is sufficient for
Figure 5.16: Operating frequency versus transmitting amplitude and carrier frequency, with estimated working distance shown on the second x-axis.
slightly forward-biased after each transition for the rectifier circuit shown in Fig. 5.9.
Therefore, the charge that can be harvested begins to saturate and results in lower
rectifier efficiency. 2nJ/bit is the lowest energy achieved by the proposed system.
Fig. 5.18(a) shows BER with respect to the window size (Tw), which is related to
the modulated clock for power transmission. Tw is defined as the period during which the
output clkmod (Fig. 5.18(b)) remains at 0. It is required for clock synchronization,
as the sensor chip needs to demodulate the clock signal and send back the
data within this window. This sets the lower bound for Tw because of the
demodulator's response time. From Fig. 5.18(a), the bathtub shape of BER suggests
that there is also an upper bound for Tw. The reason is that the charge that can be
harvested by the sensor chip in a given period of time decreases as Tw increases. In
general, we need to fine-tune Tw within a range of tens of ns for a higher data rate. On
Figure 5.17: Energy consumption versus transmitting amplitude and carrier frequency.
Figure 5.18: (a) Tw versus BER, (b) Clock modulation circuit that defines Tw.
the other hand, since data rates close to MHz may be excessive for the application,
the design requirement for Tw can be relaxed by simply reducing the transmitting
data rate.
Fig. 5.19 shows the data rate vs. BER for 10 random locations at which the
sensor was dropped. The alignment 2D contour plots (8 out of 10 locations) are
Figure 5.19: Data rate versus BER with 10 random position testing.
also shown for the corresponding BER curves. Some regions yield a lower data rate,
mainly because the electric field between the pads is not as strong as in the others. The
results fall into two distinct regions of the plot; however, there is no clear
correlation between the position and the achievable data rate. A non-uniform passivation
layer surface may be one cause of the discrepancy. These results verify
that the proposed system adapts to different locations and orientations without the
5.5 Conclusions
In this work, we presented a near field data retrieval system using capacitive coupling.
To alleviate the problem of chip misalignment, an alignment detection and pad reconfiguration
method was proposed. The data retrieval pad is divided into an array of
micropads, and each micropad can be assigned to send power or receive data
depending on the alignment information. From the chip measurement results, it was
shown that data rates higher than 900kbps can be achieved across 10 random positioning
tests. For small form factor sensor systems, this work provides the advantage
of little hardware overhead and a flexible operating frequency that is not limited by
CHAPTER VI
6.1 Introduction
Radio frequency identification (RFID) is widely used in various areas including
personal identification, public transportation, and many more. For near field
RFID transponders, the range of operation can vary from a few meters to less than
10cm depending on the operating frequency [113, 114]. Because of the low cost,
near field applications usually adopt passive RFID tags that do not rely on any
internal power supply. While harvesting power from the reader (interrogator), the
transponder transmits data back through backscattering [102, 115, 116]. The concept
of backscattering is shown on the left of Fig. 6.1. The data is modulated by
changing the load impedance that is seen by the incoming AC signal of the transponder.
Changing Zm leads to phase modulation (PM) or amplitude modulation (AM)
away from the carrier signal. One of the subcarrier frequencies can be downconverted
at the reader to decode the data. In this scheme, there are two main limitations for
the reader. First, the inductors should be designed with a high quality factor (Q) to
maximize the energy range. However, higher Q damps the subcarrier frequency and
weakens the detectable amplitude. Second, the transmitter is continuously switching
Figure 6.1: Comparison of transponder data encoding with backscattering and pulse signaling.
in order to power the transponder, which causes a significant amount of noise from the
oscillator. Therefore, the subcarrier signal should be no more than 100dB lower than
vice, for example, an intraocular pressure sensor that helps glaucoma detection and
diagnosis. The required range of operation for such an application can be as short
as a few mm. At the same time, the form factor should also be small enough considering
the intrusiveness to the body. In this work, we propose a time-multiplexing
inductive coupling scheme in an effort to alleviate the design limitations of the traditional
backscattering method. The concept is shown on the right of Fig. 6.1. Instead
of sending power continuously to the transponder, a small gap is created so that the
transponder can send the uplink signals through pulse signaling using the same inductor.
During the same period of time, the transponder is also synchronized to the
reader by an envelope detector. The oscillator of the reader can be turned off so that
the noise floor of the receiver can be greatly reduced. Pulse signaling is widely used in
ultra wide band (UWB) communications [66] and has recently been used in proximity inductive
coupling to achieve high data rate and low energy operation [67, 103, 117]. By
sending the pulses at the resonant frequency, the amplitude that reaches the receiver
input is maximized under a given power constraint. A key to the scheme is to design
the transponder and the reader with an identical resonant frequency and the maximum
available Q. In Sec. 6.2, the system architecture along with the design of the circuits will
be shown. The test chip and silicon measurement results will be presented in Sec. 6.3.
Conclusions will be drawn in Sec. 6.4.
The proposed system is shown in Fig. 6.2. The reader sends a continuous power signal
that is modulated by the clock. At the transponder side, a power harvesting module
the frequency fVCO from the input frequency fin . The clock for synchronization is
demodulated from the notch of the continuous wave. A timing controller keeps a
small state machine to control the behavior of the transponder and ensures that the
timing is precisely followed.
Figure 6.2: System architecture for the proposed pulse signaling method.
Table 6.1: Summary of the integrated inductor.
Target distance       2 mm
Metal width           12 µm
Number of turns       13
Metal spacing         5 µm
Shielding             M1 patterned ground
Hollowness            0.79
DC inductance         163 nH
Natural frequency     294 MHz
Q @ 200 MHz           8.62
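As a rough sanity check on the values in Table 6.1, a lumped LC model gives the capacitance needed to resonate the inductor at 200 MHz and the series resistance implied by the quoted Q. This is a back-of-the-envelope sketch only: it assumes the DC inductance still applies at 200 MHz, that the natural frequency is set by a single parasitic capacitance in parallel with the inductor, and that Q equals ωL/R. The variable and function names are hypothetical.

```python
import math

L_H = 163e-9       # DC inductance from Table 6.1
F_OP_HZ = 200e6    # frequency at which Q is quoted
F_SELF_HZ = 294e6  # natural (self-resonant) frequency
Q_OP = 8.62


def resonant_capacitance(inductance_h, freq_hz):
    """Capacitance that resonates with the inductance at freq_hz (ideal LC tank)."""
    return 1.0 / ((2 * math.pi * freq_hz) ** 2 * inductance_h)


c_total = resonant_capacitance(L_H, F_OP_HZ)      # total C for 200 MHz resonance
c_parasitic = resonant_capacitance(L_H, F_SELF_HZ)  # C implied by self-resonance
c_added = c_total - c_parasitic                   # extra tuning capacitance
r_series = 2 * math.pi * F_OP_HZ * L_H / Q_OP     # series resistance implied by Q

print(f"total C:     {c_total * 1e12:.2f} pF")
print(f"parasitic C: {c_parasitic * 1e12:.2f} pF")
print(f"added C:     {c_added * 1e12:.2f} pF")
print(f"series R:    {r_series:.1f} ohm")
```

Under these assumptions the tank needs a few pF in total, of which roughly half is already contributed by the parasitic capacitance of the inductor itself.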
substrate. In this work, we restrict the size of the inductor to 1 mm by 1 mm for both
the reader and the transponder. It is reasonable to assume that in our pulse signaling
scheme, the limiting factor for the range of operation is the power that can be harvested
by the transponder. To maximize the transponder supply power, the geometry
of the inductor should be optimized for the target distance of 2 mm. The
power available to the transponder is a function of the self inductance, the coupling coefficient,
the resistive loading, and the operating frequency at a desirable Q. However,
the resistive loading is not a linear function of the input amplitude due to the nonlinear
transistors. Instead of trying to solve the problem analytically, an inductor simulation tool called
ASITIC is used to calculate the S-parameters and the coupling coefficient k of the inductor.
The geometries of the inductor are constrained by the process, and metal fills
are neglected during the simulation. The resulting S-parameters are transformed into
discrete R, L and C values that can be used in circuit simulators such as HSPICE.
The optimized parameters for the integrated inductor are shown in Table 6.1. The
6.2.2 Transponder circuits
In this section, the building blocks of the transponder will be presented. The goal of
the power harvesting module is to perform the following three tasks:
• AC to DC conversion.
• Signaling the controller when the rectified voltage is below a certain level.
• Voltage regulation.
The block diagram of the power harvesting module is shown in Fig. 6.3. AC to DC
conversion is accomplished with the same method as that used in Sec. 5.3. Instead of
using 10 stages of voltage doublers, 5 stages are used in this work: since the minimum
required input amplitude is higher, the power harvesting module would not
benefit from having more stages. The voltage limiter clamps the supply voltage at
2V, which is used to supply the voltage regulators and the output drivers. As
explained in Sec. 5.3, voltage Vn3 will suppress voltage Vn6 when the voltage limiter
approaches the designed voltage of 2V. When the supply voltage VDD5 drops below
2V, Vn3 decreases while Vn6 should remain unchanged until VDD5 is lower than one
diode drop. By implementing a Schmitt trigger with Vn6 as the supply voltage and Vn3 as
the input voltage, the state of pump_enable is flipped when Vn3 becomes lower
than 1.2V. The active-high pump_enable signal provides important information,
since it is asserted when the power harvesting module is unable to sustain the power
of the PLL. The controller can use this information to determine when the PLL
should be disabled to stop it from draining more power. Two voltage regulators are
used in this work. One supplies the voltage controlled oscillator (VCO) of the
PLL and the other supplies the rest of the chip. With a dedicated power supply
for the VCO, the supply noise from the digital controller can be largely reduced.
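The role of the Schmitt trigger can be illustrated with a small hysteresis model. This is a behavioral sketch under assumptions: only the 1.2V falling threshold comes from the text, while the 1.4V release threshold, the class name and the sample voltages are hypothetical.

```python
class PumpEnableMonitor:
    """Behavioral model of a Schmitt trigger that raises pump_enable when Vn3 sags."""

    def __init__(self, falling_threshold_v=1.2, rising_threshold_v=1.4):
        self.falling_threshold_v = falling_threshold_v  # quoted in the text
        self.rising_threshold_v = rising_threshold_v    # assumed release level
        self.pump_enable = False

    def update(self, vn3_v):
        """Update and return pump_enable for a new sample of Vn3."""
        if not self.pump_enable and vn3_v < self.falling_threshold_v:
            self.pump_enable = True   # harvested power can no longer sustain the PLL
        elif self.pump_enable and vn3_v > self.rising_threshold_v:
            self.pump_enable = False  # supply has recovered
        return self.pump_enable


monitor = PumpEnableMonitor()
for vn3 in (1.6, 1.3, 1.19, 1.3, 1.45):  # hypothetical samples of Vn3
    state = monitor.update(vn3)
    print(f"Vn3 = {vn3:.2f} V -> pump_enable = {int(state)}")
```

The hysteresis keeps the signal from toggling when Vn3 hovers near the threshold, so a controller could disable the PLL once and re-enable it only after the supply has clearly recovered.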
Figure 6.3: Power harvesting module with the schematic of the voltage limiter.
quadratic saving of the dynamic power. On the other hand, it still needs to meet
the requirement of the timing critical elements, in this case, the input buffer and the
phase frequency detector of the PLL. Based on our simulation results, the minimum
operating voltage for the PLL with 200MHz input is 670mV. Considering the margins
for process variations, the output voltage is designed at 770mV. The voltage regulator
circuit is shown in Fig. 6.4. It includes a voltage reference stage, a start-up stage and
where VT is the thermal voltage, equal to kT/q. If biased at least three thermal voltages
above ground potential, the Vds dependency of transistor M3 can be neglected.
[Figure 6.4: Voltage regulator circuit, consisting of a voltage reference, a startup circuit, and an output buffer.]
Vn0 becomes
$$
V_{n0} = V_{th} + V_T \ln\left[ \frac{W_4 L_5}{L_4 W_5} \left( \frac{I_{R1}}{\mu_{eff} C_{ox} \frac{W_3}{L_3} V_T^2} \right) \right] \qquad (6.2)
$$
where IR1 can be decided by the first term of the RHS of Eq. 6.1 and the resistance
M13) and (M15, M16) are made unbalanced so that the output voltage level can
be shifted to a higher voltage compared to Vref. In addition, the unbalanced
sizing also provides temperature compensation [92]. The bias current of the output
stage is also controlled by Vref. Transistors M18 and M19 are used to provide sufficient
current loading in order to stabilize the output. The start-up circuit assists the
transient response of the voltage reference stage before it reaches the steady state.
The current of transistors M4 and M5 needs to be large enough for the self-biasing
mechanism to take over. The voltage reference circuit has two operating points:
the desired state with Iref flowing and the undesired state where the current is near
0. As the supply voltage ramps up, voltage Vn1 will be pulled up to near Vdd while
Vn0 will remain close to 0. Vn0 will eventually reach the desired state because of
leakage. However, this can be a slow process that largely depends on the supply
voltage. The startup stage speeds up the process. When the current of transistor
M7 is low, transistor M6 is turned on to shunt nodes Vn1 and Vref. The charge is
redistributed between Vn1 and Vref so that Vref rises faster toward the desired voltage.
Simulation results show that Vref takes 1 µs to attain the steady state, which is at least 20
times faster than the circuit without the startup stage.
The block diagram of the PLL is shown in Fig. 6.5. A type II PLL is used for
locking onto a wider range of incoming frequencies [118]. To save power, each
block of the PLL can be individually turned off. The PLL operates in either a locking mode
or a pulse signaling mode, depending on whether the voltage controlled oscillator
(VCO) is the only active circuit. During the locking mode, the PLL operates in a
negative feedback loop and every block is activated. The pulsing mode is used when
the blocks associated with VDDR1 are turned off so that the VCO runs directly
off the voltage stored in the low-pass filter (LPF). A second-order loop filter
is implemented to reduce noise injection at every clock cycle while still maintaining a
stable loop [119]. As a rule of thumb, the loop bandwidth has to be at least 5
times smaller than the reference frequency to avoid instability [120]. With a reference
frequency of 200MHz in this work, a loop bandwidth of 20MHz is chosen to provide a
reasonable tradeoff between stability and transient response. With the polysilicon
resistor and the MOS capacitors, the total area of the integrated loop filter is 180 µm²
major concern for the phase frequency detector. We use true single phase clock
(TSPC) D flip-flops to replace the static flip-flops in the phase frequency detector
[121]. The circuit diagram of the TSPC D flip-flop is shown in Fig. 6.6(a). When
[Figure 6.5: Block diagram of the PLL (amplifier, phase frequency detector, charge pump, loop filter, and ring oscillator VCO), showing the blocks powered by VDDR1 and by VDDR2.]
reset is high, the output is precharged high as well. After reset goes low,
the output node becomes floating and is only sensitive to the positive edge of the input
signal. From the energy point of view, the TSPC D flip-flop is more efficient than
static logic because of its lower parasitic capacitance. The phase frequency detector is
shown in Fig. 6.6(b). Two TSPC D flip-flops are used to detect the rising edges of
fref and fVCO, respectively. The reset signal is asserted whenever both
flip-flop outputs are high. The output signals upn and dn are balanced in terms of
propagation delay to reduce the noise injected into the charge pump. The
locking mode during transmission. Thus noise sources such as charge injection
and charge sharing are irrelevant to the pulse signaling events. In addition, the timing
jitter of the VCO has little impact on the resonant signal since the quality factor of the
integrated inductors is not high. We can therefore trade off noise against circuit
complexity and power consumption for the sensitive components. The circuit
diagram of the charge pump is shown in Fig. 6.7(a). The bias current is generated by
transistors M1 and M3. The charge pump pulls current from the supply to the output
Figure 6.6: TSPC phase frequency detector. (a) TSPC D flip-flop, (b) circuit diagram of the phase frequency detector.
when the upn signal is low, and vice versa. In pulse signaling mode, the charge pump
can be turned off by opening the transistor M2. The output will become a floating
node but can still drift over time due to the subthreshold leakage from transistors
M4 to M7. To minimize the impact of leakage, careful sizing is needed for balancing
the pull up and pull down networks when they are off. Fig. 6.7(b) shows the current-
starved VCO. The VCO is composed of 5 inverter stages that are current-starved with
control voltage coming from the loop filter output. Since it is only current-starved
through NMOS transistors, the duty cycle of the oscillator is not 50%. To
compensate for the uneven rising and falling transitions between the high state and the low
state, another current-starved inverter is used to generate an output that is close to
50% duty cycle. A separate control signal vco_en_bar is used to stop the VCO from
detector will start reducing its amplitude once the incoming signal stops switching,
as shown in the waveform of Fig. 6.8. The figure shows the related control signals
when the transponder is in the pulse signaling mode. By detecting the transitions
Figure 6.7: Schematics of (a) the charge pump, (b) the VCO.
from demod_out, the active-low signal clk_bar is generated to indicate the period
when the transponder starts to take over control of the communication channel.
vco_en_bar becomes 0 right after clk_bar goes low to enable the VCO in a free-running
mode. The VCO is in fact activated a couple of hundred nanoseconds before the
pulse signaling happens. This is because the voltage regulator requires some
response time when switching from a very light load to the load of a free-running VCO. The
delay between the time when clk_bar goes low and the actual pulse signaling events
is controlled by the number of cycles of the VCO output. A series of pulses will
be fired by the drivers at the output frequency of the VCO. At the same time, the
signal demod_out may rise since the same communication channel is being excited
again. To prevent the controller from misinterpreting this as the end of the pulse signaling
mode, the controller needs to mask out demod_out and prevent the clk_bar signal
from rising. The VCO can be put to sleep right after the pulse signaling mode is
completed to save power.
The waveform when the system is in the PLL locking mode is shown in Fig. 6.9. At
the beginning of this mode, clk_bar goes high and activates both the VCO and the rest
of the PLL. After a number of cycles, the VCO output will reach the same frequency
as the incoming signal. Note that the power consumption is at its peak
Figure annotation (control-signal waveforms): PLL locking is stopped in either condition: (1) 128 cycles have passed; (2) VDD5 has dropped below 1.2V.
Signals shown: Vin, demod_out, clk_bar, vco_en_bar, vpulse, pump_enable, pll_en.
in this mode, where the PLL is the dominant source. The energy range for the reader
depends on the power consumption of the transponder. When the two chips are
far enough apart, the harvested power will not be enough to supply the
PLL. To extend the energy range, the harvested power should be allowed to be lower
than what the PLL is consuming. As mentioned before, the rectified voltage can be
prevented from dropping below a certain voltage by the pump_enable signal. The
PLL operation will be stopped upon the request of pump_enable. At close distances,
however, the PLL may keep running until the transponder enters the
pulse signaling mode again. When that happens, the reference clock of the PLL will
suddenly disappear while the PLL is still trying to track a frequency that does not
exist. To avoid this false locking attempt, a counter in the timing controller records
the number of cycles since the PLL entered the locking mode and stops it after 128
cycles if pump_enable has not been asserted. In general, the PLL locking mode only
accounts for a fraction of the time when the transponder is remotely powered. The rest
of the cycles will be used to replenish the other components that need to be charged
for pulse signaling. Part of the harvested energy will go to the supply capacitors that
will be discussed in the next paragraph.
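The stopping rule for the locking mode can be sketched behaviorally as follows. This is only a minimal illustration and not the implemented timing controller: the function name run_pll_lock and the per-cycle boolean input pump_enable_trace are assumptions made for the example; only the 128-cycle limit and the pump_enable request are taken from the text.

```python
MAX_LOCK_CYCLES = 128  # cycle limit stated in the text


def run_pll_lock(pump_enable_trace, max_cycles=MAX_LOCK_CYCLES):
    """Return (cycles_run, reason) for one PLL locking attempt.

    pump_enable_trace is a hypothetical input: an iterable of booleans,
    one per reference cycle, that is True when pump_enable requests
    that the PLL be stopped.
    """
    cycles = 0
    for pump_enable in pump_enable_trace:
        if pump_enable:
            # The rectified voltage is too low: stop the PLL immediately.
            return cycles, "pump_enable"
        cycles += 1
        if cycles >= max_cycles:
            # Stop before the reference clock can disappear.
            return cycles, "cycle_limit"
    return cycles, "trace_ended"
```

For example, a trace in which pump_enable is never asserted ends with the reason "cycle_limit" after 128 cycles, which corresponds to the close-distance case described above.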
During the pulse signaling mode, the open-loop VCO output pll_clk is used to
produce pulses at the resonant frequency. The scheme is shown in Fig. 6.10. To
provide enough current for exciting the reader inductor, charge is stored on
the supply capacitors during power harvesting. Supply voltages vddp1 to vddp4 are
replenished by the rectifier output VDD5 when pulse_en is low. When a signal 1
is sent, a series of pulses is fired. The output drivers inject the charge that
was previously stored on vddp1 to vddp4 successively into the inductor and sink the
current out of it at the other end. lcn is the input of the pull-down transistor Mn,
while lp1 to lp4 are the inputs for the pull-up transistors. Each capacitor for supply
voltages vddp1 to vddp4 is 10pF, which corresponds to 1890µm² of silicon area.
Compared to the transponder, the design of the reader is much simpler since no
other signals interfere with the readout data. The output driver is similar to
the transponder driver except that it can be driven strongly by the power supply.
The carrier clock is modulated with ext_clk, which defines the data rate. The circuit
that generates the control signals for the output driver is shown in Fig. 6.11. Two
pulse generators are composed of delay chains that define two critical periods for the
system.
T1 is the time when the reader stops sending power and instead waits for the
readout signals.
T1-T2 is the time when the reader is allowed to amplify the incoming signals.
Both T1 and T2 are referenced to the negative edge of ext_clk. In this work, T1
damping resonant clock of the reader itself, T2 is given by 200ns. Signal clk is
the carrier clock of the system, running at 200MHz. The signals produced by the pulse
generators are pulse_mod and pulse_pre, respectively. The drivers are controlled by
pulse_mod and clk, where charge is replenished into the inductor from opposite
directions every half cycle. In order to conveniently adjust the magnetic field, level
converters are implemented to drive the driver with a larger swing. Assuming that
the series resistance of the inductor dominates the current, the AC current will be
proportional to the raised switching amplitude. Therefore, the driver transistors
should be implemented with thick oxide devices in order to sustain the higher than
nominal voltages.
The carrier clock is generated on die as well. The clock generator supports
two modes of operation. The first mode generates the clock output at the
resonant frequency, which can be sent directly to the driver. The second mode
produces a clock at twice the resonant frequency, followed by a
divide-by-two circuit before feeding the output drivers. The reason for the second
mode is to produce a near 50% duty cycle signal from the clock generator. Although
the second mode consumes more power, the sinusoidal signal always gets replenished
at the right time and the efficiency of the driver can be improved compared to the first
mode.
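The duty-cycle argument for the second mode can be illustrated with a small behavioral sketch. It assumes an ideal divider that toggles on every rising edge of the input clock; the sample-based clock representation and the names divide_by_two and duty_cycle are introduced here for illustration only and do not describe the on-chip circuit.

```python
def divide_by_two(clock_samples):
    """Toggle the output on every rising edge of the sampled input clock."""
    out = []
    state = 0
    prev = 0
    for sample in clock_samples:
        if prev == 0 and sample == 1:
            state ^= 1  # rising edge: flip the divider output
        out.append(state)
        prev = sample
    return out


def duty_cycle(samples):
    """Fraction of samples during which the signal is high."""
    return sum(samples) / len(samples)


# Hypothetical 2x clock with a 30% duty cycle:
# 3 high samples and 7 low samples per period, 8 periods.
fast_clock = ([1] * 3 + [0] * 7) * 8
slow_clock = divide_by_two(fast_clock)
```

Because the divider output changes only on rising edges, which are spaced one full input period apart, slow_clock is high for exactly half of the time regardless of the duty cycle of fast_clock.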
Since data receiving is time-multiplexed with the power transmitting mode, the
receiver does not have to deal with strong interference. Fig. 6.12 shows the circuit
diagram of the data decoder and the timing diagram when a single bit of 1
is sent. First, a single-stage amplifier with decent gain is sufficient to amplify the
resonant signals to full rail. The input signal in is first AC coupled and properly
DC biased before being amplified to rec_in. The data can then be easily decoded by
digital logic gates. Signal pulse_pre is used here to reset the state of rec_data and
ensure that the transition of rec_data is unidirectional. Assuming that the amplitude
of rec_in is large enough to trigger the D flip-flop and produce a rising signal rec_data,
the output data_out will be latched at the positive edge of ext_clk. When a
0 is sent, data_out will remain the same while rec_data is precharged to 0.
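The decoding step can be sketched behaviorally as follows. This is a simplified model under stated assumptions rather than the implemented logic: each bit period is represented by a list of sampled rec_in values taken while the reader is listening, rec_data is cleared at the start of the period (the role of pulse_pre), and the value latched at the ext_clk edge is simply taken to be rec_data. The function names are hypothetical.

```python
def decode_bit(rec_in_samples):
    """Decode one bit period from sampled full-rail rec_in values (0 or 1)."""
    rec_data = 0  # cleared at the start of the period, as pulse_pre does
    prev = 0
    for sample in rec_in_samples:
        if prev == 0 and sample == 1:
            # The first rising edge sets rec_data; later edges cannot clear
            # it, so the transition of rec_data is unidirectional.
            rec_data = 1
        prev = sample
    return rec_data  # value taken at the positive edge of ext_clk


def decode_stream(windows):
    """Decode a sequence of bit periods, one list of samples per period."""
    return [decode_bit(window) for window in windows]
```

With this model, a period that contains resonant pulses on rec_in decodes to 1, while a quiet period decodes to 0.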
The test chip for inductive coupling was fabricated in a 0.13µm CMOS technology.
Fig. 6.13 shows the die photo of both the reader and the transponder on the same
die. The integrated inductors for both circuits are designed with the same dimensions,
as shown in Table 6.1. The active area of the transponder measures 0.084m2 , while
Figure 6.13: Die photo for the reader and transponder of the system.
Figure 6.14: (a) Test setup with the micromanipulator and the PC board, (b) close-up photo.
mounted on a PC board with interfaces to the oscilloscope and a laptop for control
y-axis, the resolution is 0.1mm. In order to maximize the energy range, the resonant
frequency should be measured first. As the switching frequency gets closer
to the resonant frequency, the loss of charge due to the resistive components
decreases. Therefore, the resonant frequency can be found at the local minimum of
power consumption by sweeping the switching frequency.
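The sweep procedure amounts to picking the frequency with the lowest measured power. A minimal sketch is given below; the function name find_resonant_frequency is an assumption for illustration, and any numbers passed to it in an example are placeholders rather than measured data.

```python
def find_resonant_frequency(freqs_hz, power_w):
    """Return the swept frequency at which the measured power is lowest.

    freqs_hz and power_w are parallel lists from a frequency sweep.
    The sweep is assumed to bracket a single minimum, so the lowest
    point in the swept range is taken as the resonant frequency.
    """
    if not freqs_hz or len(freqs_hz) != len(power_w):
        raise ValueError("sweep lists must be non-empty and of equal length")
    best = min(range(len(freqs_hz)), key=lambda i: power_w[i])
    return freqs_hz[best]
```

The resolution of the estimate is set by the frequency step of the sweep, so a coarse sweep followed by a finer one around the minimum may be preferable in practice.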
The received waveform is shown in Fig. 6.15. The data stream is encoded with a
4-bit LFSR that generates pseudo-random numbers and repeats every 15 cycles. The
worst case happens when a 1 is sent following another 1 from the previous cycle,
since it gives the transponder the shortest time to replenish the supply capacitors. It
Figure 6.15: Measured waveform from the oscilloscope showing the output data and the clock signal
(traces: data_out (V) and ext_clk (V) versus time (µs))
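A 4-bit LFSR with a 15-cycle period can be sketched as follows. The feedback polynomial is not stated in the text; the sketch assumes x^4 + x^3 + 1, which is one of the maximal-length choices for four bits, and the function name lfsr4_bits and the seed value are likewise assumptions.

```python
def lfsr4_bits(seed=0b0001, count=15):
    """Generate `count` output bits from a 4-bit Fibonacci LFSR.

    Assumed feedback polynomial: x^4 + x^3 + 1 (taps on bits 3 and 2).
    The seed must be non-zero, otherwise the register stays at zero.
    """
    if seed & 0xF == 0:
        raise ValueError("seed must be a non-zero 4-bit value")
    state = seed & 0xF
    bits = []
    for _ in range(count):
        bits.append((state >> 3) & 1)              # output the MSB
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF    # shift in the feedback bit
    return bits
```

Any non-zero seed walks through all 15 non-zero states before repeating, so the bit stream has a period of 15 and contains back-to-back 1s, which is the worst case noted in the text.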
AC current. At a given data rate of fdata, dmax monotonically increases with Vsw.
The achievable distance is 1.1mm with Vsw = 3V and fdata = 50kHz, while the power
consumption is 16mW. For a reduced distance of 0.9mm, fdata = 400kHz can
be achieved. At higher data rates, dmax starts decreasing because the harvested
energy is also reduced. On the other hand, reducing the data rate is not always
advantageous. This is because the frequency of pulse signaling relies on the ability
of the filter in the PLL to hold the bias voltage. However, the filter suffers from 20pA of
leakage even after the charge pump is turned off. The frequency deviates further
from the resonant frequency as the time between refreshes increases, resulting in
signals that are harder to sense. As a result, the absolute minimum data rate is 6kHz regardless
of the switching amplitude.
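The dependence on the refresh interval can be made concrete with a first-order estimate of the voltage droop on the loop filter, dV = I * T / C. The sketch below assumes a constant leakage current and one refresh per data period; the 20pA leakage comes from the text, whereas the loop filter capacitance is not given there and the 10pF value used here is purely a placeholder.

```python
LEAKAGE_A = 20e-12               # 20pA leakage, from the text
ASSUMED_FILTER_CAP_F = 10e-12    # hypothetical loop filter capacitance


def hold_voltage_droop(leakage_a, hold_time_s, filter_cap_f):
    """Voltage change on a capacitor drained by a constant leakage current."""
    if filter_cap_f <= 0:
        raise ValueError("filter capacitance must be positive")
    return leakage_a * hold_time_s / filter_cap_f


def droop_at_data_rate(data_rate_hz, filter_cap_f=ASSUMED_FILTER_CAP_F):
    """Droop over one data period, assuming one refresh per period."""
    return hold_voltage_droop(LEAKAGE_A, 1.0 / data_rate_hz, filter_cap_f)
```

The droop grows in inverse proportion to the data rate, which is consistent with the observation that the pulse frequency drifts further from resonance as the data rate is lowered.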
Fig. 6.17 shows the plot where the data can be successfully communicated with a
Figure 6.16: Measured communication distance with respect to the data rate and the switching
amplitude.
(axes: data rate (kHz) versus switching amplitude (V); contours: distance (mm))
combination of the horizontal misalignment and the vertical distance. At Vsw = 3V,
every 0.1mm of misalignment translates to a loss of roughly 0.1mm of communication
distance, while at Vsw = 2.3V the impact of misalignment is only half of that.
The power consumption when Vsw = 2.3V is about 8mW.
6.4 Conclusion
In this work, we present a pulse signaling based method for data readout from inductively
coupled coils at short range. The use of time-multiplexed pulse signaling allows
the optimization of the quality factor for both the reader and the transponder, as long
as the resonant frequency is the same. It also relaxes the constraint on the receiver's
sensitivity by eliminating the dominant noise source during data receiving. A PLL is
Figure 6.17: Measured achievable communication distance with misalignment in the x-axis or the
y-axis.
(axes: communication distance (mm) versus alignment offset (mm); curves: Vsw = 3.0V and Vsw = 2.3V)
the power. The replicated frequency is stored on the loop filter in the form of a voltage,
which can later be used to drive the VCO and generate pulses that effectively
excite the reader's inductor. The test chip was fabricated in 0.13µm technology
with 1mm×1mm integrated inductors on both the reader and the transponder.
The measurement results demonstrate successful reading at a distance of 1.1mm with
CHAPTER VII
7.1 Contributions
In this dissertation, several building blocks for a miniature sensor system were discussed.
In order to achieve a small form factor, low energy operation becomes the
key to such a system. Unlike microcontrollers or storage units such as SRAM,
peripheral circuits have not received much emphasis in terms of low power operation.
The timer, for example, is the only active component while the system is in the sleep
mode and often dominates the total power if it is not properly designed. Passive
communication is another important feature for the miniature system. At mm³
data remotely without actively powering the transponder. In order to work at close
distance and within a limited form factor, new techniques are proposed in this work.
Two ultra low power timers that oscillate in the sub-Hz to 10Hz range are proposed. To
effectively reduce the power consumption, the transistors are aggressively biased
The chip is measured with less than 0.1Hz of nominal frequency and sub-pW
the bias voltage so that the oscillation period remains temperature insensitive
after the bias stage is turned off. Although the footprint of the program-and-
hold timer is 40X larger than that of the gate leakage based design, it still
accounts for less than 2% of a 1mm² chip. The average power consumption is 150pW,
with 5% cycle time error from 0°C to 90°C when the timer is refreshed every 2
minutes.
A low power temperature sensor is proposed for remotely powered systems. The
a digitized output. In this work, the size of the temperature sensor is inversely
proportional to the power consumption. With a footprint of 0.05mm², the total
For communication between different power domains and for testing of subthreshold
circuits, level shifters are widely used. A single stage static subthreshold
to I/O voltage level shifter is implemented with the advantage of robustness
unchanged at temperatures lower than 0°C while the DCVS counterpart degrades
exponentially below room temperature.
test chip demonstrates that the achievable data rate varies by less than 15% over 10
experiments in which the sensor chip is randomly dropped on the data retrieval
chip. In this work, data, clock and power can all be capacitively transmitted at
the same time through different channels.
data signals, the noise floor of the receiver is greatly reduced. The test chips for
both the transponder and the reader are implemented with integrated inductors
of 1mm×1mm. The achievable communication distance is 1.1mm which can be
7.2 Future works
In the future, work on the sensor system can proceed in two directions.
Wireless sensor network. The low power circuits shown in this dissertation were
motivated by a single sensor system. It is also possible to apply the circuit
techniques to a wireless sensor network with the addition of RF modules. The
RF module, however, has unacceptable active power consumption for a system
that relies on a battery in mm³. As a result, strong power gating is required to
is to use the wakeup receiver that was discussed in Chap. I. The idle power
of the wakeup receiver should be further reduced so that it will not dominate
the system power during the sleep mode. Another solution is to develop a low
power timer that can be used on both ends of the transceiver and the chips
cent loading. However, the efficiency drops significantly when the load
decreases in the sleep mode. In order to maintain decent efficiency, the
devices that are responsible for static power consumption should be minimized.
Lowering the clock frequency intelligently based on the load is the key to
cutting down the power consumption. For example, the voltage regulator only
needs to supply 100pW for the timer in the sleep mode, but the power
demand increases to 1µW for the whole system during the active mode.
up with a scheme that utilizes the advantages of both power sources is not
trivial. Lifetime is not limited with energy scavenging; however, switching
power system.
3D stacking is an attractive option for the sensor system. The first reason
is that by stacking the chips, higher densities can be achieved. Another reason
is that heterogeneous technologies can be used to fabricate components
such as the FLASH memory. This allows us to explore new architectures for
the system as individual components can be optimized. For example, the
timers proposed in this work rely on the magnitude of gate leakage being
in a certain range, so technology scaling can become adverse. On the other
hand, the microcontroller typically favors smaller dimensions so that the
BIBLIOGRAPHY
[11] A. Malik, M. Aceves, and S. Alcantara, "Novel FTO/SRO/silicon optical sensors:
characterization and applications," in Proc. Sensors, vol. 1, 2002, pp.
116–120.
[21] I. J. Chang, J.-J. Kim, S. Park, and K. Roy, "A 32kb 10T Subthreshold SRAM
Array with Bit-Interleaving and Differential Read Scheme in 90nm CMOS," in
IEEE ISSCC Dig. Tech. Papers, Feb. 2008, pp. 388–622.
[24] S. Shukuri, K. Yanagisawa, and K. Ishibashi, "CMOS process compatible ie-Flash
(inverse gate electrode Flash) technology for system-on-a-chip," in Proc.
IEEE Custom Integrated Circuits Conf. (CICC), May 2001, pp. 179–182.
[25] Q. Huang and M. Oberle, "A 0.5-mW passive telemetry IC for biomedical
applications," IEEE J. Solid-State Circuits, vol. 33, no. 7, pp. 937–946, July 1998.
[27] S. Kaiser, "Passive Telemetric Readout System," IEEE Sensors J., vol. 6, no. 5,
pp. 1340–1345, Oct. 2006.
[28] D. Dudenbostel, K.-L. Krieger, C. Candler, and R. Laur, "A new passive CMOS
telemetry chip to receive power and transmit data for a wide range of sensor
applications," in Proc. Solid State Sensors and Actuators, vol. 2, 16-19 June
1997, pp. 995–998.
[29] F. Kocer and M. P. Flynn, "A new transponder architecture with on-chip ADC
for long-range telemetry applications," IEEE J. Solid-State Circuits, vol. 41,
no. 5, pp. 1142–1148, May 2006.
[30] W. Nosovic and T. Todd, "Scheduled rendezvous and RFID wakeup in embedded
wireless networks," in Proc. ICC, vol. 5, 2002, pp. 3325–3329.
[31] S. von der Mark and G. Boeck, "Ultra low power wakeup detector for sensor
networks," in Proc. IMOC, Oct. 2007, pp. 865–868.
[34] M. Renaud, T. Sterken, P. Fiorini, R. Puers, K. Baert, and C. van Hoof, "Scavenging
energy from human body: design of a piezoelectric transducer," in Proc.
Transducers, vol. 1, 5-9 June 2005, pp. 784–787.
[35] E. Reilly, E. Carleton, and P. Wright, "Thin Film Piezoelectric Energy Scavenging
Systems for Long Term Medical Monitoring," in International Workshop
on Wearable and Implantable Body Sensor Networks (BSN'06), Apr. 2006, pp.
38–41.
[37] S. J. Roundy, "Energy Scavenging for Wireless Sensor Nodes with a Focus
on Vibration to Electricity Conversion," Ph.D. dissertation, The University of
California, Berkeley, 2003.
[38] H. Li and P. Pillay, "A Linear Generator Powered from Bridge Vibrations for
Wireless Sensors," in Proc. IAS, 2007, pp. 523–529.
[46] D. Da Rin and B. Brown, "Diurnal variation of intraocular pressure and the
overriding effects of sleep," Am J Optom Physiol Opt, vol. 64, pp. 54–61, 1987.
[49] A. Banobre, T. Alvarez, R. Fechtner, R. Greene, G. Thomas, O. Levi, and
N. Ciampa, "Measurement of intraocular pressure in pig's eyes using a new
tonometer prototype," in Proc. NEBC, 2-3 April 2005, pp. 260–261.
[50] C. C. Collins, "Miniature passive pressure transensor for implanting in the eye,"
IEEE Trans. Biomed. Eng., vol. 14, pp. 74–83, 1967.
[51] M. Kandler and W. Mokwa, "Capacitive silicon pressure sensor for invasive
measurement of blood pressure," in Proc. Micromech. Euro. Tech. Dig, Nov
1990, pp. 203–208.
[56] Y.-S. Lin, S. Hanson, F. Albano, C. Tokunaga, R.-U. Haque, K. Wise, A. Sastry,
D. Blaauw, and D. Sylvester, "Low-voltage circuit design for widespread sensing
applications," in IEEE Int. Symp. on Circuits and Systems, May 2008, pp.
2558–2561.
[61] K. Sundaresan, K. Brouse, K. U-Yen, F. Ayazi, and P. Allen, "A 7-MHz process,
temperature and supply compensated clock oscillator in 0.25 µm CMOS," in
IEEE Int. Symp. on Circuits and Systems, vol. 1, 25-28 May 2003, pp. 693–696.
[68] Y.-S. Lin, D. Sylvester, and D. Blaauw, "A sub-pW timer using gate leakage for
ultra low-power sub-Hz monitoring systems," in Proc. IEEE Custom Integrated
Circuits Conf. (CICC), Sept. 2007, pp. 397–400.
[69] Y.-S. Lin, D. Sylvester, and D. Blaauw, "An ultra low power 1V, 220nW temperature
sensor for passive wireless applications," in Proc. IEEE Custom Integrated
Circuits Conf. (CICC), Sept. 2008, pp. 507–510.
[70] Y.-S. Lin and D. Sylvester, "Single stage static level shifter design for subthreshold
to I/O voltage conversion," in Proc. Int. Symp. Low Power Electronics and
Design, Aug. 2008, pp. 197–200.
[71] Y.-S. Lin, D. Sylvester, and D. Blaauw, "Sensor data retrieval using alignment
independent capacitive signaling," in Symp. VLSI Circuits Dig. Tech. Papers,
June 2008, pp. 66–67.
[73] J. Georgiou and C. Toumazou, "A resistorless low current reference circuit for
implantable devices," in IEEE Int. Symp. on Circuits and Systems, vol. 3, May
2002, pp. 193–196.
[74] K. M. Cao, W.-C. Lee, W. Liu, X. Jin, P. Su, S. Fung, J. An, B. Yu, and C. Hu,
"BSIM4 gate leakage model including source-drain partition," in Proc. IEDM,
10-13 Dec. 2000, pp. 815–818.
[75] C.-H. Choi, K.-Y. Nam, Z. Yu, and R. Dutton, "Impact of gate direct tunneling
current on circuit performance: a simulation study," IEEE Trans. Electron
Devices, vol. 48, no. 12, pp. 2823–2829, Dec. 2001.
[76] M. J. S. Smith and J. D. Meindl, "Exact analysis of the Schmitt trigger
oscillator," IEEE J. Solid-State Circuits, vol. 19, no. 6, pp. 1043–1046, Dec 1984.
[77] S. Borkar, "Design challenges of technology scaling," IEEE Micro, vol. 19, no. 4,
pp. 23–29, Jul-Aug 1999.
[78] Y. Liu, R. Dick, L. Shang, and H. Yang, "Accurate Temperature-Dependent
Integrated Circuit Leakage Power Estimation is Easy," in Proc. Design
Automation Test Eur., Apr. 2007, pp. 1–6.
[79] H. Su, F. Liu, A. Devgan, E. Acar, and S. Nassif, "Full chip leakage-estimation
considering power supply and temperature variations," in Proc. Int. Symp. Low
Power Electronics and Design, Aug. 2003, pp. 78–83.
[80] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson,
L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw, "Performance and
Variability Optimization Strategies in a Sub-200mV, 3.5pJ/inst, 11nW
Subthreshold Processor," in Symp. VLSI Circuits Dig. Tech. Papers, June 2007,
pp. 152–153.
[81] S. Narendra, V. De, S. Borkar, D. Antoniadis, and A. Chandrakasan, "Full-Chip
Subthreshold Leakage Power Prediction and Reduction Techniques for
Sub-0.18-µm CMOS," IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 501–510,
Mar. 2004.
[82] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, "Statistical Analysis of
Subthreshold Leakage Current for VLSI Circuits," IEEE Trans. VLSI Syst.,
vol. 12, no. 2, pp. 131–139, Feb. 2004.
[83] H.-M. Chuang, K.-B. Thei, S.-F. Tsai, and W.-C. Liu, "Temperature-dependent
characteristics of polysilicon and diffused resistors," IEEE Trans. Electron
Devices, vol. 50, no. 5, pp. 1413–1415, May 2003.
[84] D. Duarte, G. Geannopoulos, U. Mughal, K. Wong, and G. Taylor, "Temperature
Sensor Design in a High Volume Manufacturing 65nm CMOS Digital
Process," in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Sept. 2007,
pp. 221–224.
[85] F. Kocer and M. Flynn, "An RF-powered, wireless CMOS temperature sensor,"
IEEE Sensors J., vol. 6, no. 3, pp. 557–564, 2006.
[86] S. Zhou and N. Wu, "A novel ultra low power temperature sensor for UHF
RFID tag chip," in Proc. ASSCC, Nov 2007, pp. 464–467.
[88] A. Bakker and J. Huijsing, "Micropower CMOS temperature sensor with digital
output," IEEE J. Solid-State Circuits, vol. 31, no. 7, pp. 933–937, July 1996.
[91] P. Chen, C.-C. Chen, C.-C. Tsai, and W.-F. Lu, "A time-to-digital-converter-based
CMOS smart temperature sensor," IEEE J. Solid-State Circuits, vol. 40,
no. 8, pp. 1642–1648, Aug 2005.
[92] K. Kimura, "Low voltage techniques for bias circuits," IEEE Trans. Circuits
Syst. I, vol. 44, no. 5, pp. 459–465, May 1997.
[94] C.-C. Yu, W.-P. Wang, and B.-D. Liu, "A new level converter for low-power
applications," in IEEE Int. Symp. on Circuits and Systems, vol. 1, May 2001,
pp. 113–116.
[95] F. Ishihara, F. Sheikh, and B. Nikolic, "Level Conversion for Dual-Supply
Systems," IEEE Trans. VLSI Syst., vol. 12, no. 2, pp. 185–195, 2004.
[96] I. J. Chang, J.-J. Kim, and K. Roy, "Robust Level Converter Design for
Subthreshold Logic," in Proc. Int. Symp. Low Power Electronics and Design, 2006,
pp. 14–19.
[98] W.-T. Wang, M.-D. Ker, M.-C. Chiang, and C.-H. Chen, "Level shifters for
high-speed 1 V to 3.3 V interfaces in a 0.13 µm Cu-interconnection/low-k CMOS
technology," in Proc. VLSI TSA, 2001, pp. 307–310.
[99] H. Zhang, V. George, and J. Rabaey, "Low-swing on-chip signaling techniques:
effectiveness and robustness," IEEE Trans. VLSI Syst., vol. 8, no. 3, pp.
264–272, June 2000.
[100] Y. Ramadass and A. Chandrakasan, "Minimum Energy Tracking Loop with
Embedded DC-DC Converter Delivering Voltages down to 250mV in 65nm
CMOS," in IEEE ISSCC Dig. Tech. Papers, Feb. 2007, pp. 64–587.
[101] M.-E. Hwang, A. Raychowdhury, K. Kim, and K. Roy, "A 85mV 40nW
Process-Tolerant Subthreshold 8x8 FIR Filter in 130nm Technology," in Symp. VLSI
Circuits Dig. Tech. Papers, June 2007, pp. 154–155.
[102] U. Karthaus and M. Fischer, "Fully integrated passive UHF RFID transponder
IC with 16.7-µW minimum RF input power," IEEE J. Solid-State Circuits,
vol. 38, no. 10, pp. 1602–1608, Oct. 2003.
[103] N. Miura, D. Mizoguchi, M. Inoue, T. Sakurai, and T. Kuroda, "A 195-Gb/s
1.2-W inductive inter-chip wireless superconnect with transmit power control
scheme for 3-D-stacked system in a package," IEEE J. Solid-State Circuits,
vol. 41, no. 1, pp. 23–34, Jan. 2006.
[104] T. Kuroda, "Wireless Proximity Communications for 3D System Integration,"
in IEEE Workshop on RFIT, Dec. 2007, pp. 21–25.
[105] R. Drost, R. Hopkins, and I. Sutherland, "Proximity communication," in Proc.
IEEE Custom Integrated Circuits Conf. (CICC), Sept. 2003, pp. 469–472.
[106] A. Fazzi, R. Canegallo, L. Ciccarelli, L. Magagni, F. Natali, E. Jung,
P. Rolandi, and R. Guerrieri, "3D Capacitive Interconnections with Mono- and
Bi-Directional Capabilities," in IEEE ISSCC Dig. Tech. Papers, Feb. 2007, pp.
356–608.
[107] E. Culurciello and A. G. Andreou, "Capacitive Inter-Chip Data and Power
Transfer for 3-D VLSI," IEEE Trans. Circuits Syst. II, vol. 53, no. 12, pp.
1348–1352, 2006.
[108] R. Drost, R. Ho, D. Hopkins, and I. Sutherland, "Electronic alignment for
proximity communication," in IEEE ISSCC Dig. Tech. Papers, 15-19 Feb. 2004,
pp. 144–518.
[109] R. Canegallo, M. Mirandola, A. Fazzi, L. Magagni, R. Guerrieri, and
K. Kaschlun, "Electrical measurement of alignment for 3D stacked chips," in
Proc. ESSCIRC, 12-16 Sept. 2005, pp. 347–350.
[110] Raphael, Synopsys Inc., Mountain View, California, 2005.
[111] L. Nazhandali, B. Zhai, A. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant,
T. Austin, and D. Blaauw, "Energy optimization of subthreshold-voltage sensor
network processors," in Proc. of the International Symposium on Computer
Architecture (ISCA), June 2005, pp. 197–207.
[112] M. Seok, S. Hanson, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and
D. Blaauw, "The Phoenix Processor: A 30pW platform for sensor applications,"
in Symp. VLSI Circuits Dig. Tech. Papers, June 2008, pp. 188–189.
[114] K. W. Min, S. B. Chai, and S. Kim, "An Analog Front-End Circuit for ISO/IEC
14443-compatible RFID Interrogators," Jour. ETRI, vol. 26, no. 6, pp. 560–564,
2004.
[115] J.-P. Curty, N. Joehl, C. Dehollain, and M. Declercq, "Remotely powered
addressable UHF RFID integrated system," IEEE J. Solid-State Circuits, vol. 40,
no. 11, pp. 2193–2202, Nov. 2005.
[121] W.-H. Lee, J.-D. Cho, and S.-D. Lee, "A high speed and low power
phase-frequency detector and charge-pump," in Proc. Asia and South Pacific Design
Automation Conf., vol. 1, 1999, pp. 269–272.