# **CMOS Digital Integrated Circuits**



# Lec 10 Semiconductor Memories




#### **Semiconductor Memory Types**





# **Semiconductor Memory Types (Cont.)**

#### Design Issues

- Area Efficiency of Memory Array: # of stored data bits per unit area
- **Memory Access Time:** the time required to store and/or retrieve a particular data bit.
- Static and Dynamic Power Consumption
- RAM: the stored data is volatile
  - DRAM
    - » A capacitor to store data, and a transistor to access the capacitor
    - » Need refresh operation
    - » **Low cost**, and **high density**  $\Rightarrow$  it is used for main memory
  - SRAM
    - » Consists of a latch
    - » Does not need a refresh operation
    - » High speed and low power consumption ⇒ it is mainly used for cache memory and for memory in hand-held devices



#### **Semiconductor Memory Types (Cont.)**

#### **ROM**

1. Nonvolatile memory
2. Data can only be read; it cannot be modified
3. **Lower cost:** used for permanent memory in printers, fax machines, game machines, and ID cards

- *Mask ROM*: data are written **during** chip fabrication by a **photo mask**
- **PROM:** data are written electrically **after** the chip is fabricated.
  - » *Fuse ROM*: data cannot be erased and modified.
  - » EPROM and EEPROM: data can be rewritten, but the number of re-write operations is limited to $10^4$-$10^5$.
    - **EPROM uses ultraviolet rays**, which penetrate the crystal glass window on the package, to erase all data simultaneously.
    - **EEPROM uses a high electrical voltage** to erase data in 8-bit units.
- *Flash Memory*: similar to EEPROM
- *FRAM*: utilizes the **hysteresis** characteristics of a capacitor to overcome the slow write operation of EEPROMs.



#### **Random-Access Memory Array Organization**



### Nonvolatile Memory 4Bit × 4Bit NOR-based ROM Array



| <b>R</b> <sub>1</sub> | <b>R</b> <sub>2</sub> | <b>R</b> <sub>3</sub> | <b>R</b> <sub>4</sub> | <b>C</b> <sub>1</sub> | <b>C</b> <sub>2</sub> | <b>C</b> <sub>3</sub> | <b>C</b> <sub>4</sub> |
|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| 1                     | 0                     | 0                     | 0                     | 0                     | 1                     | 0                     | 1                     |
| 0                     | 1                     | 0                     | 0                     | 0                     | 0                     | 1                     | 1                     |
| 0                     | 0                     | 1                     | 0                     | 1                     | 0                     | 0                     | 1                     |
| 0                     | 0                     | 0                     | 1                     | 0                     | 1                     | 1                     | 0                     |

- One word line "*R<sub>i</sub>*" is activated by raising its voltage to *V<sub>DD</sub>*
- Logic "1" is stored: Absent transistor
- Logic "0" is stored: Present transistor
- To reduce static power consumption, the pMOS can be driven by a periodic pre-charge signal.
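The read behaviour of the NOR array can be checked with a short Python sketch. It is a purely logical model, not a circuit simulation. It assumes the convention stated above (a present transistor stores "0", an absent one stores "1") and derives the transistor placement from the truth table. The names `STORED`, `PRESENT`, and `read_nor_rom` are illustrative and do not come from the lecture.

```python
# Behavioural sketch of the 4x4 NOR ROM read.
# STORED[i] is the data word (C1..C4) of word line R(i+1), from the table.
STORED = [
    [0, 1, 0, 1],  # R1
    [0, 0, 1, 1],  # R2
    [1, 0, 0, 1],  # R3
    [0, 1, 1, 0],  # R4
]

# Assumed convention: a stored "0" means a transistor is present.
PRESENT = [[bit == 0 for bit in row] for row in STORED]


def read_nor_rom(word_lines):
    """Return the column outputs C1..C4 for word-line levels R1..R4."""
    outputs = []
    for col in range(4):
        # A column is pulled low if any activated row has a transistor on it.
        pulled_low = any(
            word_lines[row] and PRESENT[row][col] for row in range(4)
        )
        outputs.append(0 if pulled_low else 1)
    return outputs


for selected in range(4):
    rows = [1 if i == selected else 0 for i in range(4)]
    print(rows, "->", read_nor_rom(rows))
```

Activating one word line at a time reproduces the four rows of the table.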

### **Layout of Contact-Mask Programmable NOR ROM**



- "0" bit: the drain is connected to the metal line via a metal-to-diffusion contact
- "1" bit: the contact between the drain and the metal line is omitted
- **To save silicon area:** the transistors on every two adjacent rows share a common ground line, which is also routed in n-type diffusion





• In reality, the metal lines are **laid out directly on top** of diffusion columns to reduce the horizontal dimension.



# Implant-Mask Programmable NOR ROM Array



- The threshold voltage *V*<sub>T0</sub> is raised by implant to program a "1" bit:
  - Let *V*<sub>T0</sub> > *V*<sub>DD</sub> ⇒ the transistor is permanently turned off
  - $\Rightarrow$ equivalent to disconnecting the contact





Each diffusion-to-metal contact is shared by two adjacent transistors
 ⇒ need smaller area than contact-mask ROM layout



#### **4Bit × 4Bit NAND-based ROM Array**



| <b>R</b> <sub>1</sub> | <b>R</b> <sub>2</sub> | <b>R</b> <sub>3</sub> | <b>R</b> <sub>4</sub> | <b>C</b> <sub>1</sub> | <b>C</b> <sub>2</sub> | <b>C</b> <sub>3</sub> | <b>C</b> <sub>4</sub> |
|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| 0                     | 1                     | 1                     | 1                     | 0                     | 1                     | 0                     | 1                     |
| 1                     | 0                     | 1                     | 1                     | 0                     | 0                     | 1                     | 1                     |
| 1                     | 1                     | 0                     | 1                     | 1                     | 0                     | 0                     | 1                     |
| 1                     | 1                     | 1                     | 0                     | 0                     | 1                     | 1                     | 0                     |

- All word lines are kept at the logic "1" level, except the selected line, which is pulled down to the "0" level.
- Logic "0" is stored: Absent transistor
- Logic "1" is stored: Present transistor
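A matching logical sketch for the NAND array is given here in Python. It assumes each column is a series chain of nMOS transistors, that an "absent" transistor behaves as a permanent short, and that a present transistor stores "1" as stated above. The names `STORED`, `PRESENT`, and `read_nand_rom` are illustrative only.

```python
# Behavioural sketch of the 4x4 NAND ROM read.
# STORED[i] is the data word (C1..C4) selected by pulling word line R(i+1) low.
STORED = [
    [0, 1, 0, 1],  # R1 low
    [0, 0, 1, 1],  # R2 low
    [1, 0, 0, 1],  # R3 low
    [0, 1, 1, 0],  # R4 low
]

# Assumed convention: a stored "1" means a transistor is present.
PRESENT = [[bit == 1 for bit in row] for row in STORED]


def read_nand_rom(word_lines):
    """Return the column outputs C1..C4 for word-line levels R1..R4."""
    outputs = []
    for col in range(4):
        # The series chain conducts only if every position conducts:
        # either there is no transistor (short) or its word line is high.
        chain_on = all(
            (not PRESENT[row][col]) or word_lines[row] == 1
            for row in range(4)
        )
        outputs.append(0 if chain_on else 1)
    return outputs


for selected in range(4):
    rows = [0 if i == selected else 1 for i in range(4)]
    print(rows, "->", read_nand_rom(rows))
```

Pulling one word line low at a time reproduces the four rows of the table.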



# Layout of Implant-Mask Programmable 4Bit × 4Bit NAND ROM



- No contact in the array **⇒ More compact than NOR ROM array**
- Series-connected nMOS transistors exist in each column
  - $\Rightarrow$  The access time is slower than NOR ROM



#### **Design of Row and Column Decoders**

• Row and Column Decoders: To select **a particular memory location** in the array.






#### **Implementation of Row Decoder and ROM**

• Can be implemented as *two adjacent* NOR planes







### **Implementation of Row Decoder and ROM (Cont.)**

• Can also be implemented as *two adjacent* NAND planes



| <b>A</b> <sub>1</sub> | <b>A</b> <sub>2</sub> | <b>R</b> <sub>1</sub> | <b>R</b> <sub>2</sub> | <b>R</b> <sub>3</sub> | <b>R</b> <sub>4</sub> |
|-----------------------|-------|-----------------------|-----------------------|------------|------------|
| 0                     | 0     | 0                     | 1                     | 1          | 1          |
| 0                     | 1     | 1                     | 0                     | 1          | 1          |
| 1                     | 0     | 1                     | 1                     | 0          | 1          |
| 1                     | 1     | 1                     | 1                     | 1          | 0          |

4×4 NAND ROM Array



### Column Decoder (1) NOR Address Decoder and Pass Transistors

- **Column Decoder:** To select one out of **2**<sup>*M*</sup> bit lines of the ROM array, and to route the data of the selected bit line to the data output
- NOR-based column address decoder and pass transistors:
  - » Only one nMOS pass transistor is turned on at a time
  - » # of transistors required:  $2^{M}(M+1)$  ($2^{M}$ pass transistors, $M \cdot 2^{M}$ decoder transistors)



#### **Column Decoder (2) Binary Tree Decoder**

- **Binary Tree Decoder:** A binary selection tree with consecutive stages
  - » The pass transistor network is used to select one out of every two bit lines at each stage. The NOR address decoder is not needed.
  - » Advantage: Reduced transistor count ($2^{M+1} - 2 + 2M$)
  - » Disadvantage: Large number of series-connected nMOS pass transistors ⇒ long data access time
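To get a feel for the saving, the two transistor-count formulas quoted above, $2^{M}(M+1)$ for the NOR decoder with pass transistors and $2^{M+1}-2+2M$ for the binary tree, can be tabulated with a small Python sketch. The function names are illustrative, and the range of $M$ is an arbitrary choice.

```python
def nor_decoder_transistors(m):
    """NOR address decoder plus pass transistors: 2^M * (M + 1)."""
    return (2 ** m) * (m + 1)


def tree_decoder_transistors(m):
    """Binary tree decoder: 2^(M+1) - 2 + 2M."""
    return 2 ** (m + 1) - 2 + 2 * m


print(" M   NOR+pass   tree")
for m in range(2, 9):
    print(f"{m:2d} {nor_decoder_transistors(m):9d} {tree_decoder_transistors(m):6d}")
```

For $M = 8$ column address bits the counts are 2304 versus 526, at the price of eight pass transistors in series on the data path.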





#### **An Example of NOR ROM Array**

- Consider the design of a 32-kbit NOR ROM array and the design issues related to *access time analysis*
  - » # of address bits: 15 ($2^{15} = 32{,}768$ bits in total)
  - » 7 row address bits ($2^7 = 128$ rows)
  - » 8 column address bits ($2^8 = 256$ columns)
  - » Layout: implant-mask
  - » $W = 2\ \mu\text{m}$, $L = 1.5\ \mu\text{m}$
  - » $\mu_n C_{ox} = 20\ \mu\text{A/V}^2$
  - »  $C_{ox} = 0.347\ \mu\text{F/cm}^2$ (equivalently $3.47\ \text{fF}/\mu\text{m}^2$)
  - »  $\mathbf{R}_{sheet-poly} = 20 \ \Omega/square$



- *R*<sub>*row*</sub>, and *C*<sub>*row*</sub> / unit memory cell
  - »  $C_{row} = C_{ox} \cdot W \cdot L = 10.4 \text{ fF/bit}$
  - »  $R_{row} = (\# \text{ of squares}) \times R_{sheet-poly} = 3 \times 20 = 60 \Omega$



## An Example of NOR ROM Array (Cont.)

• The poly word line can be modeled as an RC transmission line with up to 256 transistors



• The row access time *t*<sub>row</sub>: delay associated with selecting and activating 1 of 128 word lines in ROM array. It can be approximated as





## An Example of NOR ROM Array (Cont.)

- A more accurate RC delay value: *Elmore time constant* for RC ladder circuits  $t_{row} = \sum_{k=1}^{256} R_{jk} C_k = 20.52 \text{ ns}$
- The column access time  $t_{column}$ : worst case delay  $\tau_{PHL}$  associated with discharging the precharged bit line when a row is activated.
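The Elmore figure for the word line can be reproduced with a few lines of Python. This is a minimal sketch, assuming a uniform ladder of 256 identical segments with $R_{row} = 60\ \Omega$ and $C_{row} = 10.4$ fF per cell, an ideal driver (zero source resistance), and the delay evaluated at the far end of the line, so that capacitor $k$ sees $k$ series resistors. The function name is illustrative.

```python
R_CELL = 60.0       # ohms per memory cell (3 squares x 20 ohm/square)
C_CELL = 10.4e-15   # farads per memory cell
N_CELLS = 256       # cells along one word line


def elmore_delay(r_cell, c_cell, n):
    """Elmore time constant at the far end of a uniform n-segment RC ladder."""
    total = 0.0
    for k in range(1, n + 1):
        # Capacitor k is charged through the k resistors between it and the driver.
        total += (k * r_cell) * c_cell
    return total


t_row = elmore_delay(R_CELL, C_CELL, N_CELLS)
print(f"t_row = {t_row * 1e9:.2f} ns")
```

The result is about 20.5 ns, in line with the value quoted above.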





# An Example of NOR ROM Array (Cont.)

- $C_{column} = 128 \times (C_{gd,driver} + C_{db,driver}) \approx 1.5 \text{pF}$ where  $C_{gd,driver} + C_{db,driver} = 0.0118 \text{ pF/word line}$
- Since only one word line is activated at a time, the above circuit can be reduced to an inverter circuit

*Remark:* $\tau_{PLH}$ is not considered because the bit line is precharged high before each row access operation.

(Figure: equivalent inverter circuit between $V_{DD}$ and ground, with input $R_1$, driver $W/L = 2/1.5$, and load $C_{column}$.)

$$t_{column} = \tau_{PHL} = \frac{C_{load}}{k_n (V_{OH} - V_{T0,n})} \left[ \frac{2V_{T0,n}}{V_{OH} - V_{T0,n}} + \ln \left( \frac{4(V_{OH} - V_{T0,n})}{V_{OH} + V_{OL}} - 1 \right) \right] = 18\ \text{ns}$$

$$t_{access} = t_{row} + t_{column} = 20.52 + 18 = 38.52\ \text{ns}$$
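The $\tau_{PHL}$ expression above can be evaluated numerically as a sanity check. The sketch below takes $C_{load} = C_{column} = 1.5$ pF and $k_n = \mu_n C_{ox} (W/L)$ with $W/L = 2/1.5$ from the example. The voltage levels are not stated for this example, so $V_{OH} = 5$ V, $V_{OL} = 0$ V, and $V_{T0,n} = 1$ V are assumptions chosen here for illustration.

```python
import math


def tau_phl(c_load, k_n, v_oh, v_ol, v_t0):
    """High-to-low propagation delay of the equivalent inverter."""
    overdrive = v_oh - v_t0
    bracket = (2 * v_t0 / overdrive
               + math.log(4 * overdrive / (v_oh + v_ol) - 1))
    return c_load / (k_n * overdrive) * bracket


MU_N_COX = 20e-6            # A/V^2, from the example
K_N = MU_N_COX * 2.0 / 1.5  # driver transconductance, W/L = 2/1.5
C_COLUMN = 1.5e-12          # F, from the example

# Assumed levels (not given in the example itself).
V_OH, V_OL, V_T0 = 5.0, 0.0, 1.0

t_column = tau_phl(C_COLUMN, K_N, V_OH, V_OL, V_T0)
print(f"t_column = {t_column * 1e9:.1f} ns")
```

With these assumed levels the delay comes out close to the 18 ns quoted above.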



# **Static Random Access Memory (SRAM)**

• **SRAM:** The stored data can be retained indefinitely, without any need for a periodic refresh operation.



• The **complementary column** arrangement is used to achieve a more reliable SRAM operation

#### **Resistive-Load SRAM Cell**





#### **Full CMOS and Depletion-Load SRAM Cell**






### **SRAM Operation Principles**



- **RS=0:** The word line is not selected.  $M_3$  and  $M_4$  are OFF
- One data-bit is held: The latch preserves one of its two stable states.
- ► If RS=0 for all rows: $C_C$ and $C_{\overline{C}}$ are charged up to near $V_{DD}$ by the pull-up transistors $M_{P1}$ and $M_{P2}$ (both in saturation)

$$V_{\overline{C}} = V_{C} = V_{DD} - \left(V_{T0} + \gamma \left(\sqrt{|2\phi_{F}| + V_{C}} - \sqrt{|2\phi_{F}|}\right)\right)$$

Ex:  $V_C = V_{\overline{C}} = 3.5\ \text{V}$ for  $V_{DD} = 5\ \text{V}$,  $V_{T0} = 1\ \text{V}$,  $|2\phi_F| = 0.6\ \text{V}$,  $\gamma = 0.4\ \text{V}^{1/2}$
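Because $V_C$ appears on both sides of the precharge equation, the 3.5 V figure has to be found iteratively. The Python sketch below uses plain fixed-point iteration and assumes the usual body-effect form $V_T = V_{T0} + \gamma(\sqrt{|2\phi_F| + V_{SB}} - \sqrt{|2\phi_F|})$ with $V_{SB} = V_C$. The function name, tolerance, and iteration limit are illustrative choices.

```python
import math


def precharge_level(v_dd, v_t0, gamma, two_phi_f, tol=1e-9, max_iter=100):
    """Solve V_C = V_DD - V_T(V_C) by fixed-point iteration."""
    v_c = v_dd - v_t0  # initial guess: ignore the body effect
    for _ in range(max_iter):
        v_t = v_t0 + gamma * (math.sqrt(two_phi_f + v_c) - math.sqrt(two_phi_f))
        v_next = v_dd - v_t
        if abs(v_next - v_c) < tol:
            return v_next
        v_c = v_next
    return v_c


v_c = precharge_level(v_dd=5.0, v_t0=1.0, gamma=0.4, two_phi_f=0.6)
print(f"V_C = {v_c:.3f} V")
```

The iteration settles at approximately 3.5 V, matching the example.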



- *RS*=1: The word line is now selected. *M*<sub>3</sub> and *M*<sub>4</sub> are ON. Four operations are possible:

**1. Write "1" Operation** ($V_1 = V_{OL}$, $V_2 = V_{OH}$ at $t = 0^-$):

$V_{\overline{C}} \Rightarrow V_{OL}$ by the *data-write circuitry*. Therefore $V_2 \Rightarrow V_{OL}$ and $M_1$ turns *off*, so $V_1 \Rightarrow V_{OH}$ and $M_2$ turns on, pulling $V_2$ down to $V_{OL}$.





**2. Read "1" Operation** ($V_1 = V_{OH}$, $V_2 = V_{OL}$ at $t = 0^-$):

$V_C$ retains its pre-charge level, while $V_{\overline{C}} \Rightarrow V_{OL}$ because $M_2$ is **ON**. The *data-read circuitry* detects the small voltage difference $V_C - V_{\overline{C}} > 0$ and amplifies it as a "**1**" data output.





**3. Write "0" Operation** ($V_1 = V_{OH}$, $V_2 = V_{OL}$ at $t = 0^-$):

$V_C \Rightarrow V_{OL}$ by the *data-write circuitry*. Since $V_1 \Rightarrow V_{OL}$, $M_2$ turns off, and therefore $V_2 \Rightarrow V_{OH}$.





**4. Read "0" Operation** ($V_1 = V_{OL}$, $V_2 = V_{OH}$ at $t = 0^-$):

$V_{\overline{C}}$ retains its pre-charge level, while $V_C \Rightarrow V_{OL}$ because $M_1$ is **ON**. The *data-read circuitry* detects the small voltage difference $V_C - V_{\overline{C}} < 0$ and amplifies it as a "**0**" data output.



Pull-up transistor (one per column)




#### **Static or "Standby" Power Consumption**



• Assume a "1" bit is stored in the cell $\Rightarrow$ $M_1$ OFF, $M_2$ ON $\Rightarrow$ $V_1 = V_{OH}$, $V_2 = V_{OL}$, i.e. one load resistor is always conducting a non-zero current:

$$P_{standby} = \frac{(V_{DD} - V_{OL})^2}{R}$$

With $R = 100\ \text{M}\Omega$ (undoped poly), $P_{standby} \approx 0.25\ \mu\text{W}$ per cell for $V_{DD} = 5\ \text{V}$.
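The per-cell figure looks small, but it scales with array size. The Python sketch below evaluates the standby-power expression with $V_{OL} \approx 0$ V (an assumption) and then multiplies by a hypothetical 1-Mbit array. The array size is not taken from the lecture and is chosen only to illustrate the scaling.

```python
def standby_power(v_dd, v_ol, r_load):
    """Static power dissipated in the conducting load resistor of one cell."""
    return (v_dd - v_ol) ** 2 / r_load


# V_OL is taken as 0 V here (assumption); R = 100 Mohm, V_DD = 5 V as above.
p_cell = standby_power(v_dd=5.0, v_ol=0.0, r_load=100e6)

N_CELLS = 2 ** 20  # hypothetical 1-Mbit array, for illustration only
p_array = N_CELLS * p_cell

print(f"per cell : {p_cell * 1e6:.2f} uW")
print(f"per array: {p_array:.3f} W")
```

Under these assumptions the array dissipates roughly a quarter of a watt while idle.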





- Advantages
  - Very **low standby power** consumption
  - Larger noise margins than *R*-load SRAMs
  - **Operate at lower supply voltages** than *R*-load SRAMs
- **Disadvantages**
  - Larger die area: needed to accommodate the n-well for the pMOS transistors and the polysilicon contacts. The area has been reduced by using multi-layer polysilicon and multi-layer metal processes
  - More complex CMOS process

#### **6T-SRAM** — Layout



Source: Digital Integrated Circuits 2<sup>nd</sup>




#### **CMOS SRAM Cell Design Strategy**

Two basic requirements dictate the *W/L* ratios:

- 1. The data-read operation should **not destroy data** in the cell
- **2.** The data-write operation should allow modification of the stored data



- Read "0" operation
  - » at $t = 0^-$: $V_1 = 0\ \text{V}$, $V_2 = V_{DD}$; $M_3$, $M_4$ OFF; $M_2$, $M_5$ OFF; $M_1$, $M_6$ Linear
  - » at $t = 0$: $RS = V_{DD}$; $M_3$ Saturation, $M_4$ Linear; $M_2$, $M_5$ OFF; $M_1$, $M_6$ Linear
    - Slow discharge of the large $C_C$: requires $V_1 < V_{T,2}$ $\Rightarrow$ limits the $M_3$ W/L with respect to the $M_1$ W/L





• **Design Constraint:**  $V_{1,max} < V_{T,2} = V_{T,n}$  to keep  $M_2$  **OFF** 


»  $M_3$  saturation,  $M_1$  linear  $\Rightarrow$ 

 $k_{n,3}(V_{DD}-V_1-V_{T,n})^2/2 = k_{n,1}(2(V_{DD}-V_{T,n})V_1-V_1^2)/2$ 

» Therefore,

$$\frac{k_{n,3}}{k_{n,1}} = \frac{\left(\frac{W}{L}\right)_{3}}{\left(\frac{W}{L}\right)_{1}} < \frac{2(V_{DD} - 1.5V_{T,n})V_{T,n}}{(V_{DD} - 2V_{T,n})^{2}}$$
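The read constraint can be checked numerically. The Python sketch below evaluates the bound on $k_{n,3}/k_{n,1}$ and, for a given ratio, solves the current balance $k_{n,3}(V_{DD}-V_1-V_{T,n})^2 = k_{n,1}(2(V_{DD}-V_{T,n})V_1 - V_1^2)$ for $V_1$ in closed form. The values $V_{DD} = 5$ V and $V_{T,n} = 1$ V are assumed for illustration, and the function names are not from the lecture.

```python
import math


def max_read_ratio(v_dd, v_tn):
    """Upper bound on (W/L)_3 / (W/L)_1 that keeps V_1 below V_Tn."""
    return 2 * (v_dd - 1.5 * v_tn) * v_tn / (v_dd - 2 * v_tn) ** 2


def v1_during_read(ratio, v_dd, v_tn):
    """Node voltage V_1 for a given k_n3/k_n1 (M3 saturated, M1 linear).

    With a = V_DD - V_Tn the balance reduces to (1 + ratio) * (a - V_1)^2 = a^2.
    """
    a = v_dd - v_tn
    return a * (1 - 1 / math.sqrt(1 + ratio))


V_DD, V_TN = 5.0, 1.0  # assumed example values
bound = max_read_ratio(V_DD, V_TN)
print(f"ratio bound = {bound:.3f}")
for ratio in (0.25, 0.5, bound, 1.0):
    print(f"ratio {ratio:.3f} -> V_1 = {v1_during_read(ratio, V_DD, V_TN):.3f} V")
```

At the bound $V_1$ equals $V_{T,n}$ exactly; smaller ratios keep $V_1$ below it, while larger ones would let $M_2$ turn on during the read.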

#### Symmetry:

# **CMOS SRAM Cell Design Strategy (Cont.)**

• Write "0" operation with "1" stored in cell:



- $V_C$ is set to "0" by the *data-write circuit* ("1" stored)
- at $t = 0^-$: $V_1 = V_{DD}$, $V_2 = 0\ \text{V}$; $M_3$, $M_4$ OFF; $M_2$, $M_5$ Linear; $M_1$, $M_6$ OFF
- at $t = 0$: $V_C = 0\ \text{V}$, $V_{\overline{C}} = V_{DD}$; $M_3$, $M_4$ saturation; $M_2$, $M_5$ Linear; $M_1$, $M_6$ OFF
- write "0" $\Rightarrow$ $V_1$: $V_{DD} \rightarrow 0$ ($< V_{T,n}$) and $V_2$: $0 \rightarrow V_{DD}$ ($M_2 \rightarrow$ OFF)



### **CMOS SRAM Cell Design Strategy (Cont.)**

• **Design constraint:**  $V_{1,max} < V_{T,2} = V_{T,n}$  to keep  $M_2$  **OFF** 

- » When  $V_1 = V_{T,n}$ :  $M_3$  Linear and  $M_5$  saturation  $\Rightarrow$ 
  - $k_{p,5}(0-V_{DD}-V_{T,p})^2/2 = k_{n,3}(2(V_{DD}-V_{T,n})V_{T,n}-V_{T,n}^2)/2$
- »  $V_1 \leq V_{T,n}$ , i.e.  $M_2(M_1)$  forced OFF
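One way to turn the current balance above into a sizing rule is to require that $M_3$ can sink more current than $M_5$ sources when $V_1 = V_{T,n}$, which gives $k_{p,5}/k_{n,3} < 2(V_{DD} - 1.5V_{T,n})V_{T,n} / (V_{DD} + V_{T,p})^2$. This rearrangement is a sketch derived here from the stated equality, not a formula quoted from the slide. The Python snippet evaluates it for assumed values $V_{DD} = 5$ V, $V_{T,n} = 1$ V, $V_{T,p} = -1$ V.

```python
def max_write_ratio(v_dd, v_tn, v_tp):
    """Upper bound on k_p5 / k_n3 so that V_1 can be pulled below V_Tn.

    v_tp is the pMOS threshold voltage and is negative.
    """
    return 2 * (v_dd - 1.5 * v_tn) * v_tn / (v_dd + v_tp) ** 2


# Assumed example values, chosen for illustration.
bound = max_write_ratio(v_dd=5.0, v_tn=1.0, v_tp=-1.0)
print(f"k_p5 / k_n3 < {bound:.4f}")
```

A weaker pMOS load (smaller $k_{p,5}$) makes the write easier, which is the opposite pressure from the read constraint on the access transistor.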





| W | DATA | WB | $\overline{WB}$ | <b>Operation (M3 on)</b>                                                  |
|---|------|----|----|---------------------------------------------------------------------------|
| 0 | 1    | 0  | 1  | $M_1$ off, $M_2$ on $\Rightarrow V_{\overline{C}} \rightarrow \text{low}$ |
| 0 | 0    | 1  | 0  | $M_1$ on, $M_2$ off $\Rightarrow$ $V_C \rightarrow$ low                   |
| 1 | X    | 0  | 0  | $M_1$ off, $M_2$ off $\Rightarrow$ $V_C$ , $V_{\overline{C}}$ -no change  |

### **SRAM Read Circuit**

- Source-coupled differential amplifier (nodes $V_{DD}$, $V_{o1}$, $V_{o2}$, and $V_X$ as labeled in the figure):

$$I_{D1} = \frac{k_n}{2} (V_C - V_X - V_{T1,n})^2$$

$$I_{D2} = \frac{k_n}{2} (V_{\overline{C}} - V_X - V_{T2,n})^2$$

$$A_{sense} = \frac{\partial (V_{o1} - V_{o2})}{\partial (V_C - V_{\overline{C}})} = -g_m R$$

$$g_m = \frac{\partial I_D}{\partial V_{GS}} = \sqrt{2k_n I_D}$$

- Increase $R$ $\rightarrow$ use active load, use cascade



## **Sense Amp Operation**





Source: Digital Integrated Circuits 2<sup>nd</sup>

## **Fast Sense Amplifier**



- $V_C < V_{\overline{C}}: M_1 \Rightarrow OFF, V_o$  decreases,  $V_{ON} \Rightarrow High$
- $V_C > V_{\overline{C}}$ :  $M_2 \Rightarrow OFF$ ,  $V_o$  remains high,  $V_{ON} = Low$

 $A_{sense} = -g_{m2}(r_{o2}||r_{o5})$ 



## **Two-Stage differential Current-Mirror Amplifier Sense Circuit**






## **Typical Dynamic Response for One and Two Stage Sense Amplifier Circuits**






## **Cross-Coupled nMOS Sense Amplifier**



- Assume: $M_3$ OFF, $V_C$ and $V_{\overline{C}}$ are initially precharged to $V_{DD}$
- Access: $V_C$ drops slightly below $V_{\overline{C}}$
- $M_3 \Rightarrow$ ON and $V_C < V_{\overline{C}}$: $M_1$ turns ON first, pulling $V_C$ lower; $M_2$ turns OFF; $C_C$ discharges via $M_1$ and $M_3$
- Enhances the differential voltage $V_C - V_{\overline{C}}$
- Does not generate an output logic level



## **Dynamic Read-Write Memory (DRAM) Circuits**

- **SRAM:** 4~6 transistors per bit
  - 4~5 lines connecting to each cell
- **DRAM:** Data bit is stored as charge on a capacitor
  - Reduced die area
  - Requires periodic refresh



#### **Four-Transistor DRAM Cell**



## **DRAM Circuits (Cont.)**



#### **Three-Transistor DRAM Cell**

- No constraints on device ratios
- Reads are non-destructive
- Value stored at node X when writing a "1" = $V_{WWL} - V_{Tn}$



## **3T-DRAM** — Layout




Source: Digital Integrated Circuits 2<sup>nd</sup>

## **One-Transistor DRAM Cell**




- **Industry standard** for high-density DRAM arrays
- **Smallest** component count and silicon area per bit
- Separate or "**explicit**" capacitor (dual poly) per cell





## **Operation of Three-Transistor DRAM Cell**

- The binary information is stored as the charge in  $C_1$
- **Storage transistor**  $M_2$  is on or off depending on the charge in  $C_1$
- **Pass transistors** *M*<sub>1</sub> **and** *M*<sub>3</sub>**:** access switches
- Two separate bit lines for "data read" and "data write"



- The operation is based on a **two-phase non-overlapping clock scheme** 
  - » The precharge events are driven by  $\phi_1$ , and the "read" and "write" operations are driven by  $\phi_2$ .
  - » Every "read" and "write" operation is preceded by a precharge cycle, which is initiated with *PC* going high.







- **Read "1" OP**: *DATA* = 0, *WS* = 0; *RS* = 1
  - »  $M_2$ ,  $M_3$   $ON \Rightarrow C_3$ ,  $C_1$  discharges through  $M_2$  and  $M_3$ , and the falling column voltage is interpreted bt the "data read" circuitry as a stored logic "1".





- **Write "0" OP**: *DATA* = 1, *WS* = 1; *RS* = 0
  - »  $M_2$,  $M_3$ ON $\Rightarrow$ $C_2$  and  $C_1$  discharge to 0 through  $M_1$  and the *data\_in* nMOS.






- **Read "0" OP**: *DATA* = 1, *WS* = 0; *RS* = 1
  - » C<sub>3</sub> does not discharge due to M<sub>2</sub> OFF, and the logic-high level on the Data\_out column is interpreted by the data read circuitry as a stored "0" bit.



## **Operation of One-Transistor DRAM Cell**



- Write "1" OP: BL = 1, WL = 1 ( $M_1$  ON) $\Rightarrow$  $C_1$  charges to "1"
- Write "0" OP: BL = 0, WL = 1 ( $M_1$  ON) $\Rightarrow$  $C_1$  discharges to "0"
- **Read OP:** destroys the stored charge on  $C_1$ (destructive read) $\Rightarrow$ a refresh is needed after every data read operation



## Appendix

Derivation of 
$$\frac{k_{n,3}}{k_{n,1}} = \frac{\left(\frac{W}{L}\right)_{3}}{\left(\frac{W}{L}\right)_{1}} < \frac{2(V_{DD} - 1.5V_{T,n})V_{T,n}}{\left(V_{DD} - 2V_{T,n}\right)^{2}}$$

 $k_{n,3}(V_{DD}-V_{1}-V_{T,n})^{2}/2 = k_{n,1}(2(V_{DD}-V_{T,n})V_{1}-V_{1}^{2})/2$ 

Therefore,

$$\frac{k_{n,3}}{k_{n,1}} = \frac{\left(\frac{W}{L}\right)_{3}}{\left(\frac{W}{L}\right)_{1}} = -1 + \frac{\left(V_{DD} - V_{T,n}\right)^{2}}{\left(V_{DD} - V_{1} - V_{T,n}\right)^{2}} < -1 + \frac{\left(V_{DD} - V_{T,n}\right)^{2}}{\left(V_{DD} - 2V_{T,n}\right)^{2}} = \frac{2\left(V_{DD} - 1.5V_{T,n}\right)V_{T,n}}{\left(V_{DD} - 2V_{T,n}\right)^{2}}$$

