


TMS320C64x DSP Two-Level Internal Memory Reference Guide

Literature Number: SPRU610B

August 2004


IMPORTANT NOTICE

Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete. All products are sold subject to TI's terms and conditions of sale supplied at the time of order acknowledgment.

TI warrants performance of its hardware products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are used to the extent TI deems necessary to support this warranty. Except where mandated by government requirements, testing of all parameters of each product is not necessarily performed.

TI assumes no liability for applications assistance or customer product design. Customers are responsible for their products and applications using TI components. To minimize the risks associated with customer products and applications, customers should provide adequate design and operating safeguards.

TI does not warrant or represent that any license, either express or implied, is granted under any TI patent right, copyright, mask work right, or other TI intellectual property right relating to any combination, machine, or process in which TI products or services are used. Information published by TI regarding third-party products or services does not constitute a license from TI to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from TI under the patents or other intellectual property of TI.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations, and notices. Reproduction of this information with alteration is an unfair and deceptive business practice. TI is not responsible or liable for such altered documentation.

Resale of TI products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service and is an unfair and deceptive business practice. TI is not responsible or liable for any such statements.

Information on other Texas Instruments products and application solutions is available from www.ti.com. Product and application categories include:

Products: Amplifiers, Data Converters, DSP, Interface, Logic, Power Mgmt

Applications: Audio, Automotive, Broadband, Digital Control, Military, Optical Networking, Security, Telephony, Video & Imaging, Wireless

Mailing Address: Texas Instruments
Post Office Box 655303, Dallas, Texas 75265

Copyright © 2004, Texas Instruments Incorporated


Preface

Read This First

About This Manual

The TMS320C64x digital signal processors (DSPs) of the

TMS320C6000 DSP family have a two-level memory architecture for

program and data. The first-level program cache is designated L1P, and the

first-level data cache is designated L1D. Both the program and data memory

share the second-level memory, designated L2. L2 is configurable, allowing

for various amounts of cache and SRAM. This document discusses the C64x

two-level internal memory.

Notational Conventions

This document uses the following conventions.

- Hexadecimal numbers are shown with the suffix h. For example, the following number is 40 hexadecimal (decimal 64): 40h.

- Registers in this document are shown in figures and described in tables.
  - Each register figure shows a rectangle divided into fields that represent the fields of the register. Each field is labeled with its bit name, its beginning and ending bit numbers above, and its read/write properties below. A legend explains the notation used for the properties.
  - Reserved bits in a register figure designate a bit that is used for future device expansion.

Related Documentation From Texas Instruments

The following documents describe the C6000 devices and related support

tools. Copies of these documents are available on the Internet at www.ti.com.

Tip: Enter the literature number in the search box provided at www.ti.com.

TMS320C6000 CPU and Instruction Set Reference Guide (literature

number SPRU189) describes the TMS320C6000 CPU architecture,

instruction set, pipeline, and interrupts for these digital signal processors.

TMS320C6000 DSP Peripherals Overview Reference Guide (literature

number SPRU190) describes the peripherals available on the

TMS320C6000 DSPs.




TMS320C64x Technical Overview (SPRU395) The TMS320C64x technical

overview gives an introduction to the TMS320C64x digital signal

processor, and discusses the application areas that are enhanced by the

TMS320C64x VelociTI.

TMS320C6000 DSP Cache User’s Guide (literature number SPRU656)

explains the fundamentals of memory caches and describes how to

efficiently utilize the two-level cache-based memory architecture in the

digital signal processors (DSPs) of the TMS320C6000 DSP family. It

shows how to maintain coherence with external memory, how to use

DMA to reduce memory latencies, and how to optimize your code to

improve cache efficiency.

TMS320C6000 Programmer’s Guide (literature number SPRU198)

describes ways to optimize C and assembly code for the

TMS320C6000 DSPs and includes application program examples.

TMS320C6000 Code Composer Studio Tutorial (literature number

SPRU301) introduces the Code Composer Studio integrated development environment and software tools.

Code Composer Studio Application Programming Interface Reference

Guide (literature number SPRU321) describes the Code Composer

Studio application programming interface (API), which allows you to

program custom plug-ins for Code Composer.

TMS320C6x Peripheral Support Library Programmer’s Reference

(literature number SPRU273) describes the contents of the

TMS320C6000 peripheral support library of functions and macros. It

lists functions and macros both by header file and alphabetically,

provides a complete description of each, and gives code examples to

show how they are used.

TMS320C6000 Chip Support Library API Reference Guide (literature

number SPRU401) describes a set of application programming interfaces

(APIs) used to configure and control the on-chip peripherals.

Trademarks

Code Composer Studio, C6000, C62x, C64x, C67x, TMS320C6000,

TMS320C62x, TMS320C64x, TMS320C67x, and VelociTI are trademarks of

Texas Instruments.



Contents

1   Memory Hierarchy Overview
2   Cache Terms and Definitions
3   Level 1 Data Cache (L1D)
    3.1   L1D Parameters
    3.2   L1D Performance
          3.2.1   L1D Memory Banking
          3.2.2   L1D Miss Penalty
          3.2.3   L1D Write Buffer
          3.2.4   L1D Miss Pipelining
4   Level 1 Program Cache (L1P)
    4.1   L1P Parameters
    4.2   L1P Performance
          4.2.1   L1P Miss Penalty
          4.2.2   L1P Miss Pipelining
5   Level 2 Unified Memory (L2)
    5.1   L2 Cache and L2 SRAM
    5.2   L2 Operation
    5.3   L2 Bank Structure
    5.4   L2 Interfaces
          5.4.1   L1D/L1P-to-L2 Request Servicing
          5.4.2   EDMA-to-L2 Request Servicing
          5.4.3   L2 Request Servicing Using EDMA
          5.4.4   EDMA Access to Cache Controls
          5.4.5   HPI and PCI Access to Memory Subsystem
6   Registers
    6.1   Cache Configuration Register (CCFG)
    6.2   L2 EDMA Access Control Register (EDMAWEIGHT)
    6.3   L2 Allocation Registers (L2ALLOC0-L2ALLOC3)
    6.4   L2 Writeback Base Address Register (L2WBAR)
    6.5   L2 Writeback Word Count Register (L2WWC)
    6.6   L2 Writeback Invalidate Base Address Register (L2WIBAR)
    6.7   L2 Writeback Invalidate Word Count Register (L2WIWC)
    6.8   L2 Invalidate Base Address Register (L2IBAR)
    6.9   L2 Invalidate Word Count Register (L2IWC)
    6.10  L1P Invalidate Base Address Register (L1PIBAR)
    6.11  L1P Invalidate Word Count Register (L1PIWC)
    6.12  L1D Writeback Invalidate Base Address Register (L1DWIBAR)
    6.13  L1D Writeback Invalidate Word Count Register (L1DWIWC)
    6.14  L1D Invalidate Base Address Register (L1DIBAR)
    6.15  L1D Invalidate Word Count Register (L1DIWC)
    6.16  L2 Writeback All Register (L2WB)
    6.17  L2 Writeback Invalidate All Register (L2WBINV)
    6.18  L2 Memory Attribute Registers (MAR0-MAR255)
7   Memory System Control
    7.1   Cache Mode Selection
          7.1.1   L1D Mode Selection Using DCC Field in CSR
          7.1.2   L1P Mode Selection Using PCC Field in CSR
          7.1.3   L2 Mode Selection Using L2MODE Field in CCFG
    7.2   Cacheability Controls
    7.3   Program-Initiated Cache Operations
          7.3.1   Global Cache Operations
          7.3.2   Block Cache Operations
          7.3.3   Effect of L2 Commands on L1 Caches
    7.4   L2-to-EDMA Request Control
    7.5   EDMA Access Into L2 Control
8   Memory System Policies
    8.1   Memory System Coherence
    8.2   EDMA Coherence in L2 SRAM Example
    8.3   Memory Access Ordering
          8.3.1   Program Order of Memory Accesses
          8.3.2   Strong and Relaxed Memory Ordering
Revision History

Figures

1   TMS320C64x DSP Block Diagram
2   TMS320C64x Two-Level Internal Memory Block Diagram
3   L1D Address Allocation
4   Address to Bank Number Mapping
5   Potentially Conflicting Memory Accesses
6   L1P Address Allocation
7   L2 Address Allocation, 256K Cache (L2MODE = 111b)
8   L2 Address Allocation, 128K Cache (L2MODE = 011b)
9   L2 Address Allocation, 64K Cache (L2MODE = 010b)
10  L2 Address Allocation, 32K Cache (L2MODE = 001b)
11  Cache Configuration Register (CCFG)
12  L2 EDMA Access Control Register (EDMAWEIGHT)
13  L2 Allocation Registers (L2ALLOC0-L2ALLOC3)
14  L2 Writeback Base Address Register (L2WBAR)
15  L2 Writeback Word Count Register (L2WWC)
16  L2 Writeback Invalidate Base Address Register (L2WIBAR)
17  L2 Writeback Invalidate Word Count Register (L2WIWC)
18  L2 Invalidate Base Address Register (L2IBAR)
19  L2 Invalidate Word Count Register (L2IWC)
20  L1P Invalidate Base Address Register (L1PIBAR)
21  L1P Invalidate Word Count Register (L1PIWC)
22  L1D Writeback Invalidate Base Address Register (L1DWIBAR)
23  L1D Writeback Invalidate Word Count Register (L1DWIWC)
24  L1D Invalidate Base Address Register (L1DIBAR)
25  L1D Invalidate Word Count Register (L1DIWC)
26  L2 Writeback All Register (L2WB)
27  L2 Writeback-Invalidate All Register (L2WBINV)
28  L2 Memory Attribute Register (MAR)
29  CPU Control and Status Register (CSR)
30  Block Cache Operation Base Address Register (BAR)
31  Block Cache Operation Word Count Register (WC)
32  Streaming Data Pseudo-Code
33  Double Buffering Pseudo-Code
34  Double-Buffering Time Sequence
35  Double Buffering as a Pipelined Process

Tables

1   TMS320C621x/C671x/C64x Internal Memory Comparison
2   Terms and Definitions
3   Cycles Per Miss for Different Numbers of L1D Misses That Hit L2 Cache
4   Cycles Per Miss for Different Numbers of L1D Misses That Hit L2 SRAM
5   Average Miss Penalties for Large Numbers of Sequential Execute Packets
6   Internal Memory Control Registers
7   Cache Configuration Register (CCFG) Field Descriptions
8   L2 EDMA Access Control Register (EDMAWEIGHT) Field Descriptions
9   L2 Allocation Registers (L2ALLOC0-L2ALLOC3) Field Descriptions
10  L2 Writeback Base Address Register (L2WBAR) Field Descriptions
11  L2 Writeback Word Count Register (L2WWC) Field Descriptions
12  L2 Writeback Invalidate Base Address Register (L2WIBAR) Field Descriptions
13  L2 Writeback Invalidate Word Count Register (L2WIWC) Field Descriptions
14  L2 Invalidate Base Address Register (L2IBAR) Field Descriptions
15  L2 Invalidate Word Count Register (L2IWC) Field Descriptions
16  L1P Invalidate Base Address Register (L1PIBAR) Field Descriptions
17  L1P Invalidate Word Count Register (L1PIWC) Field Descriptions
18  L1D Writeback Invalidate Base Address Register (L1DWIBAR) Field Descriptions
19  L1D Writeback Invalidate Word Count Register (L1DWIWC) Field Descriptions
20  L1D Invalidate Base Address Register (L1DIBAR) Field Descriptions
21  L1D Invalidate Word Count Register (L1DIWC) Field Descriptions
22  L2 Writeback All Register (L2WB) Field Descriptions
23  L2 Writeback Invalidate All Register (L2WBINV) Field Descriptions
24  Memory Attribute Register (MAR) Field Descriptions
25  L1D Mode Setting Using DCC Field
26  L1P Mode Setting Using PCC Field
27  L2 Mode Switch Procedure
28  Memory Attribute Registers
29  Summary of Program-Initiated Cache Operations
30  L2ALLOC Default Queue Allocations
31  Coherence Assurances in the Two-Level Memory System
32  Program Order for Memory Operations Issued From a Single Execute Packet
33  Document Revision History


TMS320C64x Two-Level Internal Memory

The TMS320C621x, TMS320C671x, and TMS320C64x digital signal

processors (DSPs) of the TMS320C6000 DSP family have a two-level

memory architecture for program and data. The first-level program cache is

designated L1P, and the first-level data cache is designated L1D. Both the

program and data memory share the second-level memory, designated L2. L2

is configurable, allowing for various amounts of cache and SRAM. This

document discusses the C64x two-level internal memory. For a discussion

of the C621x/C671x two-level internal memory, see TMS320C621x/C671x DSP

Two-Level Internal Memory Reference Guide (SPRU609).

1 Memory Hierarchy Overview

Figure 1 shows the block diagram of the C64x DSP. Table 1 summarizes the

differences between the C621x/C671x and C64x internal memory. Figure 2

illustrates the bus connections between the CPU, internal memories, and the

enhanced DMA (EDMA) of the C6000 DSP.

Figure 1. TMS320C64x DSP Block Diagram

Note: EMIFB is available only on certain C64x devices. Refer to the device-specific data sheet for the available peripheral set.



Table 1. TMS320C621x/C671x/C64x Internal Memory Comparison

Internal memory structure: Two level (both)
L1P size: 4 Kbytes (C621x/C671x); 16 Kbytes (C64x)
L1P organization: Direct mapped (both)
L1P CPU access time: 1 cycle (both)
L1P line size: 64 bytes (C621x/C671x); 32 bytes (C64x)
L1P read miss action: 1 line allocated in L1P (both)
L1P read hit action: Data read from L1P (both)
L1P write miss action: L1P writes not supported (both)
L1P write hit action: L1P writes not supported (both)
L1P → L2 request size: 2 fetches/L1P line (C621x/C671x); 1 fetch/L1P line (C64x)
L1P protocol: Read Allocate (C621x/C671x); Read Allocate, Pipelined Misses (C64x)
L1P memory: Single-cycle RAM (both)
L1P → L2 single request stall: 5 cycles for L2 hit (C621x/C671x); 8 cycles for L2 hit (C64x)
L1P → L2 minimum cycles between pipelined misses: Pipelined misses not supported (C621x/C671x); 1 cycle (C64x)
L1D size: 4 Kbytes (C621x/C671x); 16 Kbytes (C64x)
L1D organization: 2-way set associative (both)
L1D CPU access time: 1 cycle (both)
L1D line size: 32 bytes (C621x/C671x); 64 bytes (C64x)
L1D replacement strategy: 2-way Least Recently Used (both)
L1D banking: 64-bit-wide dual-ported RAM (C621x/C671x); 8 × 32 bit banks (C64x)
L1D read miss action: 1 line allocated in L1D (both)
L1D read hit action: Data read from L1D (both)
L1D write miss action: No allocation in L1D, data sent to L2 (both)
L1D write hit action: Data updated in L1D; line marked dirty (both)
L1D protocol: Read Allocate (C621x/C671x); Read Allocate, Pipelined Misses (C64x)
L1D → L2 request size: 2 fetches/L1D line (C64x)
L1D → L2 single request stall: 4 cycles for L2 hit (C621x/C671x); 6 cycles/L2 SRAM hit, 8 cycles/L2 cache hit (C64x)
L1D → L2 minimum cycles between pipelined misses: Pipelined misses not supported (C621x/C671x); 2 cycles (C64x)
L2 total size: Varies by part number; refer to the datasheet for the specific device (both)
L2 SRAM size: Varies by part number; refer to the datasheet for the specific device (both)
L2 cache size: 0/16/32/48/64 Kbytes (C621x/C671x); 0/32/64/128/256 Kbytes (C64x)
L2 organization: 1/2/3/4-way set associative (C621x/C671x); 4-way set associative cache (C64x)
L2 line size: 128 bytes (both)
L2 replacement strategy: 1/2/3/4-way Least Recently Used (C621x/C671x); 4-way Least Recently Used (C64x)
L2 banking: 4 × 64 bit banks (C621x/C671x); 8 × 64 bit banks (C64x)
L2-L1P protocol: Coherency invalidates (both)
L2-L1D protocol: Coherency snoop-invalidates (C621x/C671x); Coherency snoops and snoop-invalidates (C64x)
L2 protocol: Read and Write Allocate (both)
L2 read miss action: Data is read via EDMA into a newly allocated line in L2; requested data is passed to the requesting L1 (both)
L2 read hit action: Data read from L2 (both)
L2 write miss action: Data is read via EDMA into a newly allocated line in L2; write data is then written to the newly allocated line (both)
L2 write hit action: Data is written into the hit L2 location (both)
L2 → L1P read path width: 256 bit (both)
L2 → L1D read path width: 128 bit (C621x/C671x); 256 bit (C64x)
L1D → L2 write path width: 32 bit (C621x/C671x); 64 bit (C64x)
L1D → L2 victim path width: 128 bit (C621x/C671x); 256 bit (C64x)
L2 → EDMA read path width: 64 bit (both)
L2 → EDMA write path width: 64 bit (both)

Note: Some C64x devices may not support the 256K cache mode. Refer to the device-specific datasheet.




Figure 2. TMS320C64x Two-Level Internal Memory Block Diagram




2 Cache Terms and Definitions

Table 2 lists the terms used throughout this document that relate to the

operation of the C64x two-level memory hierarchy.

Table 2. Terms and Definitions

Allocation: The process of finding a location in the cache to store newly cached data. This

process can include evicting data that is presently in the cache to make room for the

new data.

Associativity: The number of line frames in each set. This is specified as the number of ways in the

cache.

Capacity miss: A cache miss that occurs because the cache does not have sufficient room to hold the

entire working set for a program. Compare with compulsory miss and conflict miss.

Clean: A cache line that is valid and that has not been written to by upper levels of memory

or the CPU. The opposite state for a valid cache line is dirty.

Coherence: Informally, a memory system is coherent if any read of a data item returns the most

recently written value of that data item. This includes accesses by the CPU and theEDMA. Cache coherence is covered in more detail in section 8.1.

Compulsory miss: Sometimes referred to as a first-reference miss. A compulsory miss is a cache miss

that must occur because the data has had no prior opportunity to be allocated in the

cache. Typically, compulsory misses for particular pieces of data occur on the first

access of that data. However, some cases can be considered compulsory even if

they are not the first reference to the data. Such cases include repeated write misses

on the same location in a cache that does not write allocate, and cache misses to

noncacheable locations. Compare with capacity miss and conflict miss.

Conflict miss: A cache miss that occurs due to the limited associativity of a cache, rather than due

to capacity constraints. A fully-associative cache is able to allocate a newly cached

line of data anywhere in the cache. Most caches have much more limited

associativity (see set-associative cache), and so are restricted in where they may

place data. This results in additional cache misses that a more flexible cache would

not experience.

Direct-mapped cache: A direct-mapped cache maps each address in the lower-level memory to a single

location in the cache. Multiple locations may map to the same location in the cache.

This is in contrast to a multi-way set-associative cache, which selects a place for the

data from a set of locations in the cache. A direct-mapped cache can be considered

a single-way set-associative cache.

Dirty: In a writeback cache, writes that reach a given level in the memory hierarchy may

update that level, but not the levels below it. Thus, when a cache line is valid and

contains updates that have not been sent to the next lower level, that line is said to

be dirty. The opposite state for a valid cache line is clean.


DMA: Direct Memory Access. Typically, a DMA operation copies a block of memory from

one range of addresses to another, or transfers data between a peripheral and

memory. On the C64x DSP, DMA transfers are performed by the enhanced DMA

(EDMA) engine. These DMA transfers occur in parallel to program execution. From a

cache coherence standpoint, EDMA accesses can be considered accesses by a

parallel processor.

Eviction: The process of removing a line from the cache to make room for newly cached data.

Eviction can also occur under user control by requesting a writeback-invalidate for an

address or range of addresses from the cache. The evicted line is referred to as the

victim. When a victim line is dirty (that is, it contains updated data), the data must be

written out to the next level memory to maintain coherency.

Execute packet: A block of instructions that begin execution in parallel in a single cycle. An execute

packet may contain between 1 and 8 instructions.

Fetch packet: A block of 8 instructions that are fetched in a single cycle. One fetch packet may

contain multiple execute packets, and thus may be consumed over multiple cycles.

First-reference miss: A cache miss that occurs on the first reference to a piece of data. First-reference

misses are a form of compulsory miss.

Fully-associative cache: A cache that allows any memory address to be stored at any location within the

cache. Such caches are very flexible, but usually not practical to build in hardware.

They contrast sharply with direct-mapped caches and set-associative caches, both of

which have much more restrictive allocation policies. Conceptually, fully-associative

caches are useful for distinguishing between conflict misses and capacity misses

when analyzing the performance of a direct-mapped or set-associative cache. In

terms of set-associative caches, a fully-associative cache is equivalent to a

set-associative cache that has as many ways as it does line frames, and that has

only one set.

Higher-level memory: In a hierarchical memory system, higher-level memories are memories that are

closer to the CPU. The highest level in the memory hierarchy is usually the Level 1

caches. The memories at this level exist directly next to the CPU. Higher-level

memories typically act as caches for data from lower-level memory.

Hit: A cache hit occurs when the data for a requested memory location is present in the

cache. The opposite of a hit is a miss. A cache hit minimizes stalling, since the data

can be fetched from the cache much faster than from the source memory. Thedetermination of hit versus miss is made on each level of the memory hierarchy

separately—a miss in one level may hit in a lower level.


Invalidate: The process of marking valid cache lines as invalid in a particular cache. Alone, this

action discards the contents of the affected cache lines, and does not write back any

updated data. When combined with a writeback, this effectively updates the next

lower level of memory that holds the data, while completely removing the cached

data from the given level of memory. Invalidates combined with writebacks are

referred to as writeback-invalidates, and are commonly used for retaining coherence

between caches.

Least Recently Used (LRU) allocation: For set-associative and fully-associative caches, least-recently used allocation refers

to the method used to choose among line frames in a set when allocating space in

the cache. When all of the line frames in the set that the address maps to contain

valid data, the line frame in the set that was read or written the least recently (furthest

back in time) is selected to hold the newly cached data. The selected line frame is

then evicted to make room for the new data.

Line: A cache line is the smallest block of data that the cache operates on. The cache line

is typically much larger than the size of data accesses from the CPU or the next

higher level of memory. For instance, although the CPU may request single bytes

from memory, on a read miss the cache reads an entire line’s worth of data to satisfy

the request.

Line frame: A location in a cache that holds cached data (one line), an associated tag address,

and status information for the line. The status information can include whether the

line is valid, dirty, and the current state of that line’s LRU.

Line size: The size of a single cache line, in bytes.

Load through: When a CPU request misses both the first-level and second-level caches, the data is

fetched from the external memory and stored to both the first-level and second-level

cache simultaneously. A cache that stores data and sends that data to the

upper-level cache at the same time is a load-through cache. Using a load-through

cache reduces the stall time compared to a cache that first stores the data in a lower

level and then sends it to the higher-level cache as a second step.

Long-distance access: Accesses made by the CPU to a noncacheable memory. Long-distance accesses

are used when accessing external memory that is not marked as cacheable.

Lower-level memory: In a hierarchical memory system, lower-level memories are memories that are further

from the CPU. In a C64x system, the lowest level in the hierarchy includes the

system memory below L2 and any memory-mapped peripherals.

LRU: Least Recently Used. See least recently used allocation for a description of the LRU

replacement policy. When used alone, LRU usually refers to the status information

that the cache maintains for identifying the least-recently used line in a set. For

example, consider the phrase "accessing a cache line updates the LRU for that line."


Memory ordering: Defines what order the effects of memory operations are made visible in memory.

(This is sometimes referred to as consistency.) Strong memory ordering at a given

level in the memory hierarchy indicates it is not possible to observe the effects of

memory accesses in that level of memory in an order different than program order.

Relaxed memory ordering allows the memory hierarchy to make the effects of

memory operations visible in a different order. Note that strong ordering does not

require that the memory system execute memory operations in program order, only

that it makes their effects visible to other requestors in an order consistent withprogram order. Section 8.3 covers the memory ordering assurances that the C64x

memory hierarchy provides.

Miss: A cache miss occurs when the data for a requested memory location is not in the

cache. A miss may stall the requestor while the line frame is allocated and data is

fetched from the next lower level of memory. In some cases, such as a CPU write

miss from L1D, it is not strictly necessary to stall the CPU. Cache misses are often

divided into three categories: compulsory misses, conflict misses, and capacity

misses.

Miss pipelining: The process of servicing a single cache miss is pipelined over several cycles. By

pipelining the miss, it is possible to overlap the processing of several misses, should

many occur back-to-back. The net result is that much of the overhead for the

subsequent misses is hidden, and the incremental stall penalty for the additional

misses is much smaller than that for a single miss taken in isolation.

Read allocate: A read-allocate cache only allocates space in the cache on a read miss. A write miss

does not cause an allocation to occur unless the cache is also a write-allocate cache.

For caches that do not write allocate, the write data would be passed on to the next

lower-level cache.

Set: A collection of line frames in a cache in which a single address can potentially reside. A

direct-mapped cache contains one line frame per set, and an N-way set-associative

cache contains N line frames per set. A fully-associative cache has only one set that

contains all of the line frames in the cache.

Set-associative cache: A set-associative cache contains multiple line frames that each lower-level memory

location can be held in. When allocating room for a new line of data, the selection is

made based on the allocation policy for the cache. The C64x devices employ a least

recently used allocation policy for its set-associative caches.

Snoop: A method by which a lower-level memory queries a higher-level memory to

determine if the higher-level memory contains data for a given address. The primary

purpose of snoops is to retain coherency, by allowing a lower-level memory to

request updates from a higher-level memory. A snoop operation may trigger a

writeback, or more commonly, a writeback-invalidate. Snoops that trigger

writeback-invalidates are sometimes called snoop-invalidates.


Tag: A storage element containing the most-significant bits of the address stored in a

particular line. Tag addresses are stored in special tag memories that are not directly

visible to the CPU. The cache queries the tag memories on each access to

determine if the access is a hit or a miss.

Thrash: An algorithm is said to thrash the cache when its access pattern causes the

performance of the cache to suffer dramatically. Thrashing can occur for multiple

reasons. One possible situation is that the algorithm is accessing too much data or

program code in a short time frame with little or no reuse. That is, its working set is

too large, and thus the algorithm is causing a significant number of capacity misses.

Another situation is that the algorithm is repeatedly accessing a small group of

different addresses that all map to the same set in the cache, thus causing an

artificially high number of conflict misses.

Touch: A memory operation on a given address is said to touch that address. Touch can also

refer to reading array elements or other ranges of memory addresses for the sole

purpose of allocating them in a particular level of the cache. A CPU-centric loop used

for touching a range of memory in order to allocate it into the cache is often referred

to as a touch loop; a sketch of such a loop appears after this table. Touching an array is a form of software-controlled prefetch for data.

Valid: When a cache line holds data that has been fetched from the next level memory, that

line frame is valid. The invalid state occurs when the line frame holds no data, either

because nothing has been cached yet, or because previously cached data has been

invalidated for whatever reason (coherence protocol, program request, etc.). The

valid state makes no implications as to whether the data has been modified since it

was fetched from the lower-level memory; rather, this is indicated by the dirty or

clean state of the line.

Victim: When space is allocated in a set for a new line, and all of the line frames in the set

that the address maps to contain valid data, the cache controller must select one of

the valid lines to evict in order to make room for the new data. Typically, the

least-recently used (LRU) line is selected. The line that is evicted is known as the

victim line. If the victim line is dirty, its contents are written to the next lower level of

memory using a victim writeback.

Victim buffer: A special buffer that holds victims until they are written back. Victim lines are moved

to the victim buffer to make room in the cache for incoming data.

Victim writeback: When a dirty line is evicted (that is, a line with updated data is evicted), the updated

data is written to the lower levels of memory. This process is referred to as a victim

writeback.

Way: In a set-associative cache, each set in the cache contains multiple line frames. The

number of line frames in each set is referred to as the number of ways in the cache.

The collection of corresponding line frames across all sets in the cache is called a

way in the cache. For instance, a 4-way set-associative cache has 4 ways, and each

set in the cache has 4 line frames associated with it, one associated with each of the4 ways. As a result, any given cacheable address in the memory map has 4 possible

locations it can map to in a 4-way set-associative cache.


Working set: The working set for a program or algorithm is the total set of data and program code

that is referenced within a particular period of time. It is often useful to consider theworking set on an algorithm-by-algorithm basis when analyzing upper levels of

memory, and on a whole-program basis when analyzing lower levels of memory.

Write allocate: A write-allocate cache allocates space in the cache when a write miss occurs. Space

is allocated according to the cache’s allocation policy (LRU, for example), and the

data for the line is read into the cache from the next lower level of memory. Once the

data is present in the cache, the write is processed. For a writeback cache, only the

current level of memory is updated—the write data is not immediately passed to the

next level of memory.

Writeback: The process of writing updated data from a valid but dirty cache line to a lower-level

memory. After the writeback occurs, the cache line is considered clean. Unless

paired with an invalidate (as in writeback-invalidate), the line remains valid after a

writeback.

Writeback cache: A writeback cache will only modify its own data on a write hit. It will not immediately

send the update to the next lower-level of memory. The data will be written back at

some future point, such as when the cache line is evicted, or when the lower-level

memory snoops the address from the higher-level memory. It is also possible to

directly initiate a writeback for a range of addresses using cache control registers. A

write hit to a writeback cache causes the corresponding line to be marked as

dirty—that is, the line contains updates that have yet to be sent to the lower levels of

memory.

Writeback-invalidate: A writeback operation followed by an invalidation. See writeback and invalidate. On

the C64x devices, a writeback-invalidate on a group of cache lines only writes out

data for dirty cache lines, but invalidates the contents of all of the affected cache lines.

Write merging: Write merging combines multiple independent writes into a single, larger write. This

improves the performance of the memory system by reducing the number of

individual memory accesses it needs to process. For instance, on the C64x device,

the L1D write buffer can merge multiple writes under some circumstances if they are

to the same double-word address. In this example, the result is a larger effective

write-buffer capacity and a lower bandwidth impact on L2.

Write-through cache: A write-through cache passes all writes to the lower-level memory. It never contains

updated data that it has not passed on to the lower-level memory. As a result, cache

lines can never be dirty in a write-through cache. The C64x devices do not utilize

write-through caches.
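As noted in the Touch entry above, a touch loop reads a range of memory for the sole purpose of allocating it into a particular level of the cache. The following C fragment is an illustrative sketch only, not code from this guide; it assumes the 64-byte C64x L1D line size and reads one byte per line:

    /* Illustrative touch loop (not TI library code). Reading one byte per
     * 64-byte L1D line allocates every line of buf into the cache. The
     * volatile qualifier keeps the compiler from removing the reads. */
    #define L1D_LINE_SIZE 64

    static void touch(const volatile unsigned char *buf, int size)
    {
        int i;
        for (i = 0; i < size; i += L1D_LINE_SIZE) {
            (void)buf[i];   /* read miss allocates the containing line in L1D */
        }
    }

A loop of this kind is typically run over an input buffer, for example after an EDMA transfer completes, so that the subsequent processing loop sees L1D hits rather than compulsory misses.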


3 Level 1 Data Cache (L1D)

The level 1 data cache (L1D) services data accesses from the CPU. The

following sections describe the parameters and operation of the L1D. The operation of L1D is controlled by various registers, as described in section 7, Memory System Control.

3.1L1D Parameters

The L1D is a 16K-byte cache. It is a two-way set associative cache with a

64-byte line size and 128 sets. It also features a 64-bit by 4-entry write buffer

between L1D and the L2 memory.

Physical addresses map onto the cache in a straightforward manner. The

physical address divides into three fields as shown in Figure 3. Bits 5-0 of the

address specify an offset within the line. Bits 12-6 of the address select one

of the 128 sets within the cache. Bits 31-13 of the address serve as the tag

for the line.

Figure 3. L1D Address Allocation

   31                          13 12                6 5              0
  +------------------------------+-------------------+---------------+
  |             Tag              |     Set Index     |    Offset     |
  +------------------------------+-------------------+---------------+
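As an informal illustration of Figure 3 (not code from this guide), the following C fragment extracts the three fields from a 32-bit address; the masks follow directly from the 64-byte line size and the 128 sets:

    #include <stdint.h>
    #include <stdio.h>

    #define L1D_LINE_SIZE  64u     /* 64-byte lines -> 6 offset bits (5-0)  */
    #define L1D_SETS       128u    /* 128 sets      -> 7 set bits    (12-6) */

    int main(void)
    {
        uint32_t addr   = 0x80001234u;                    /* example address */
        uint32_t offset = addr & (L1D_LINE_SIZE - 1u);    /* bits 5-0        */
        uint32_t set    = (addr >> 6) & (L1D_SETS - 1u);  /* bits 12-6       */
        uint32_t tag    = addr >> 13;                     /* bits 31-13      */

        printf("addr 0x%08X -> tag 0x%X, set %u, offset %u\n",
               (unsigned)addr, (unsigned)tag, (unsigned)set, (unsigned)offset);
        return 0;
    }

Two addresses that differ only in bits 31-13 select the same set, which is why repeated accesses to such addresses can evict one of the two resident lines in that set.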

Because L1D is a two-way cache, each set contains two cache lines, one for

each way. On each access, the L1D compares the tag portion of the address

for the access to the tag information for both lines in the appropriate set. If the

tag matches one of the lines and that line is marked valid, the access is a hit.

If these conditions are not met, the access is a miss. Miss penalties are

discussed in detail under section 3.2.

The L1D is a read-allocate-only cache. This means that new lines are allocated

in L1D for read misses, but not for write misses. For this reason, a 4-entry write

buffer exists between the L1D and L2 caches that captures data from write

misses. The write buffer is enhanced in comparison to the write buffer on the

C621x/C671x devices. The write buffer is described in section 3.2.3.

The L1D implements a least-recently used (LRU) line allocation policy. This

means that on an L1D read miss, the L1D evicts the least-recently read or

written line within a set in order to make room for the incoming data. Note that

invalid lines are always considered least-recently used.

If the selected line is dirty, that is, its contents are updated, then the victim line’s

data is prepared for writeback to L2 as a victim writeback. The actual victim

writeback occurs after the new data is fetched, and then only if the newly

fetched data is considered cacheable. If the newly fetched data is

noncacheable, the victim writeback is cancelled and the victim line remains in

the L1D cache.
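The allocation and eviction rules above can be summarized in a small behavioral model. The following C sketch is purely illustrative; it models the documented policy rather than the hardware implementation, and all names in it are invented for the example:

    #include <stdint.h>
    #include <stdbool.h>

    /* One L1D set: two ways, each with tag, valid, dirty, and LRU state. */
    typedef struct {
        uint32_t tag;
        bool     valid;
        bool     dirty;
        bool     recently_used;   /* 1-bit LRU is sufficient for two ways */
    } line_t;

    typedef struct { line_t way[2]; } set_t;

    static set_t l1d[128];

    /* Model of a read access: returns true on a hit. On a miss, *victim
     * points at the line frame the new data would be allocated into. */
    static bool l1d_read(uint32_t addr, line_t **victim)
    {
        set_t   *s   = &l1d[(addr >> 6) & 0x7F];
        uint32_t tag = addr >> 13;
        int      w;

        for (w = 0; w < 2; w++) {
            if (s->way[w].valid && s->way[w].tag == tag) {
                s->way[w].recently_used     = true;   /* now most recent    */
                s->way[w ^ 1].recently_used = false;
                return true;                          /* hit: data from L1D */
            }
        }

        /* Miss: an invalid way is always treated as least-recently used;
         * otherwise the way that was not most recently accessed is evicted. */
        if      (!s->way[0].valid)          *victim = &s->way[0];
        else if (!s->way[1].valid)          *victim = &s->way[1];
        else if (!s->way[0].recently_used)  *victim = &s->way[0];
        else                                *victim = &s->way[1];

        /* If the chosen victim is valid and dirty, its data is queued for a
         * victim writeback to L2; the writeback is performed only after the
         * new line arrives and is found to be cacheable, as described above. */
        return false;
    }

Write hits and write misses are not modeled here; writes that miss L1D are captured by the write buffer described in section 3.2.3.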

