Contents

PART I THEORY OF PARALLELISM

Chapter 1 Parallel Computer Models
    1.1 The State of Computing
        1.1.1 Computer Development Milestones
        1.1.2 Elements of Modern Computers
        1.1.3 Evolution of Computer Architecture
        1.1.4 System Attributes to Performance
    1.2 Multiprocessors and Multicomputers
        1.2.1 Shared-Memory Multiprocessors
        1.2.2 Distributed-Memory Multicomputers
        1.2.3 A Taxonomy of MIMD Computers
    1.3 Multivector and SIMD Computers
        1.3.1 Vector Supercomputers
        1.3.2 SIMD Supercomputers
    1.4 PRAM and VLSI Models
        1.4.1 Parallel Random-Access Machines
        1.4.2 VLSI Complexity Model
    1.5 Architectural Development Tracks
        1.5.1 Multiple-Processor Tracks
        1.5.2 Multivector and SIMD Tracks
        1.5.3 Multithreaded and Dataflow Tracks
    1.6 Bibliographic Notes and Exercises

Chapter 2 Program and Network Properties
    2.1 Conditions of Parallelism
        2.1.1 Data and Resource Dependences 
        2.1.2 Hardware and Software Parallelism
        2.1.3 The Role of Compilers
    2.2 Program Partitioning and Scheduling
        2.2.1 Grain Sizes and Latency
        2.2.2 Grain Packing and Scheduling
        2.2.3 Static Multiprocessor Scheduling
    2.3 Program Flow Mechanisms
        2.3.1 Control Flow Versus Data Flow
        2.3.2 Demand-Driven Mechanisms
        2.3.3 Comparison of Flow Mechanisms
    2.4 System Interconnect Architectures
        2.4.1 Network Properties and Routing
        2.4.2 Static Connection Networks
        2.4.3 Dynamic Connection Networks
    2.5 Bibliographic Notes and Exercises

Chapter 3 Principles of Scalable Performance
    3.1 Performance Metrics and Measures
        3.1.1 Parallelism Profile in Programs
        3.1.2 Harmonic Mean Performance
        3.1.3 Efficiency, Utilization, and Quality
        3.1.4 Standard Performance Measures
    3.2 Parallel Processing Applications
        3.2.1 Massive Parallelism for Grand Challenges
        3.2.2 Application Models of Parallel Computers
        3.2.3 Scalability of Parallel Algorithms
    3.3 Speedup Performance Laws
        3.3.1 Amdahl's Law for a Fixed Workload
        3.3.2 Gustafson's Law for Scaled Problems
        3.3.3 Memory-Bounded Speedup Model
    3.4 Scalability Analysis and Approaches
        3.4.1 Scalability Metrics and Goals
        3.4.2 Evolution of Scalable Computers
        3.4.3 Research Issues and Solutions
    3.5 Bibliographic Notes and Exercises

PART II HARDWARE TECHNOLOGIES

Chapter 4 Processors and Memory Hierarchy 
    4.1 Advanced Processor Technology
        4.1.1 Design Space of Processors 
        4.1.2 Instruction-Set Architectures
        4.1.3 CISC Scalar Processors
        4.1.4 RISC Scalar Processors
    4.2 Superscalar and Vector Processors
        4.2.1 Superscalar Processors
        4.2.2 The VLIW Architecture
        4.2.3 Vector and Symbolic Processors
    4.3 Memory Hierarchy Technology
        4.3.1 Hierarchical Memory Technology
        4.3.2 Inclusion, Coherence, and Locality 
        4.3.3 Memory Capacity Planning 
    4.4 Virtual Memory Technology
        4.4.1 Virtual Memory Models
        4.4.2 TLB, Paging, and Segmentation
        4.4.3 Memory Replacement Policies
    4.5 Bibliographic Notes and Exercises

Chapter 5 Bus, Cache, and Shared Memory 
    5.1 Backplane Bus Systems 
        5.1.1 Backplane Bus Specification
        5.1.2 Addressing and Timing Protocols
        5.1.3 Arbitration, Transaction, and Interrupt
        5.1.4 The IEEE Futurebus+ Standards
    5.2 Cache Memory Organizations
        5.2.1 Cache Addressing Models
        5.2.2 Direct Mapping and Associative Caches
        5.2.3 Set Associative and Sector Caches
        5.2.4 Cache Performance Issues
    5.3 Shared-Memory Organizations
        5.3.1 Interleaved Memory Organization
        5.3.2 Bandwidth and Fault Tolerance 
        5.3.3 Memory Allocation Schemes
    5.4 Sequential and Weak Consistency Models
        5.4.1 Atomicity and Event Ordering
        5.4.2 Sequential Consistency Model
        5.4.3 Weak Consistency Models
    5.5 Bibliographic Notes and Exercises

Chapter 6 Pipelining and Superscalar Techniques
    6.1 Linear Pipeline Processors
        6.1.1 Asynchronous and Synchronous Models
        6.1.2 Clocking and Timing Control
        6.1.3 Speedup, Efficiency, and Throughput
    6.2 Nonlinear Pipeline Processors
        6.2.1 Reservation and Latency Analysis
        6.2.2 Collision-Free Scheduling
        6.2.3 Pipeline Schedule Optimization
    6.3 Instruction Pipeline Design 
        6.3.1 Instruction Execution Phases
        6.3.2 Mechanisms for Instruction Pipelining 
        6.3.3 Dynamic Instruction Scheduling
        6.3.4 Branch Handling Techniques
    6.4 Arithmetic Pipeline Design
        6.4.1 Computer Arithmetic Principles
        6.4.2 Static Arithmetic Pipelines
        6.4.3 Multifunctional Arithmetic Pipelines
    6.5 Superscalar and Superpipeline Design
        6.5.1 Superscalar Pipeline Design
        6.5.2 Superpipelined Design
        6.5.3 Supersymmetry and Design Tradeoffs
    6.6 Bibliographic Notes and Exercises

PART III PARALLEL AND SCALABLE ARCHITECTURES

Chapter 7 Multiprocessors and Multicomputers
    7.1 Multiprocessor System Interconnects
        7.1.1 Hierarchical Bus Systems
        7.1.2 Crossbar Switch and Multiport Memory
        7.1.3 Multistage and Combining Networks
    7.2 Cache Coherence and Synchronization Mechanisms
        7.2.1 The Cache Coherence Problem
        7.2.2 Snoopy Bus Protocols
        7.2.3 Directory-Based Protocols
        7.2.4 Hardware Synchronization Mechanisms
    7.3 Three Generations of Multicomputers
        7.3.1 Design Choices in the Past
        7.3.2 Present and Future Development
        7.3.3 The Intel Paragon System
    7.4 Message-Passing Mechanisms
        7.4.1 Message-Routing Schemes
        7.4.2 Deadlock and Virtual Channels
        7.4.3 Flow Control Strategies
        7.4.4 Multicast Routing Algorithms
    7.5 Bibliographic Notes and Exercises
 
Chapter 8 Multivector and SIMD Computers
    8.1 Vector Processing Principles
        8.1.1 Vector Instruction Types
        8.1.2 Vector-Access Memory Schemes
        8.1.3 Past and Present Supercomputers
    8.2 Multivector Multiprocessors
        8.2.1 Performance-Directed Design Rules
        8.2.2 Cray Y-MP, C-90, and MPP 
        8.2.3 Fujitsu VP2000 and VPP500
        8.2.4 Mainframes and Minisupercomputers
    8.3 Compound Vector Processing
        8.3.1 Compound Vector Operations
        8.3.2 Vector Loops and Chaining
        8.3.3 Multipipeline Networking
    8.4 SIMD Computer Organizations
        8.4.1 Implementation Models
        8.4.2 The CM-2 Architecture
        8.4.3 The MasPar MP-1 Architecture
    8.5 The Connection Machine CM-5
        8.5.1 A Synchronized MIMD Machine
        8.5.2 The CM-5 Network Architecture
        8.5.3 Control Processors and Processing Nodes
        8.5.4 Interprocessor Communications
    8.6 Bibliographic Notes and Exercises

Chapter 9 Scalable, Multithreaded, and Dataflow Architectures
    9.1 Latency-Hiding Techniques
        9.1.1 Shared Virtual Memory
        9.1.2 Prefetching Techniques
        9.1.3 Distributed Coherent Caches
        9.1.4 Scalable Coherence Interface
        9.1.5 Relaxed Memory Consistency
    9.2 Principles of Multithreading
        9.2.1 Multithreading Issues and Solutions
        9.2.2 Multiple-Context Processors
        9.2.3 Multidimensional Architectures
    9.3 Fine-Grain Multicomputers
        9.3.1 Fine-Grain Parallelism
        9.3.2 The MIT J-Machine
        9.3.3 The Caltech Mosaic C
    9.4 Scalable and Multithreaded Architectures
        9.4.1 The Stanford Dash Multiprocessor
        9.4.2 The Kendall Square Research KSR-1 
        9.4.3 The Tera Multiprocessor System
    9.5 Dataflow and Hybrid Architectures
        9.5.1 The Evolution of Dataflow Computers
        9.5.2 The ETL/EM-4 in Japan
        9.5.3 The MIT/Motorola *T Prototype
    9.6 Bibliographic Notes and Exercises

PART IV SOFTWARE FOR PARALLEL PROGRAMMING

Chapter 10 Parallel Models, Languages, and Compilers
    10.1 Parallel Programming Models
        10.1.1 Shared-Variable Model 
        10.1.2 Message-Passing Model
        10.1.3 Data-Parallel Model
        10.1.4 Object-Oriented Model
        10.1.5 Functional and Logic Models
    10.2 Parallel Languages and Compilers 
        10.2.1 Language Features for Parallelism
        10.2.2 Parallel Language Constructs
        10.2.3 Optimizing Compilers for Parallelism
    10.3 Dependence Analysis of Data Arrays
        10.3.1 Iteration Space and Dependence Analysis 
        10.3.2 Subscript Separability and Partitioning 
        10.3.3 Categorized Dependence Tests
    10.4 Code Optimization and Scheduling
        10.4.1 Scalar Optimization Within Basic Blocks
        10.4.2 Local and Global Optimizations
        10.4.3 Vectorization and Parallelization Methods
        10.4.4 Code Generation and Scheduling
        10.4.5 Trace Scheduling Compilation
    10.5 Loop Parallelization and Pipelining
        10.5.1 Loop Transformation Theory
        10.5.2 Parallelization and Wavefronting 
        10.5.3 Tiling and Localization
        10.5.4 Software Pipelining 
    10.6 Bibliographic Notes and Exercises

Chapter 11 Parallel Program Development and Environments
    11.1 Parallel Programming Environments
        11.1.1 Software Tools and Environments
        11.1.2 Y-MP, Paragon, and CM-5 Environments
        11.1.3 Visualization and Performance Tuning
    11.2 Synchronization and Multiprocessing Modes
        11.2.1 Principles of Synchronization
        11.2.2 Multiprocessor Execution Modes 
        11.2.3 Multitasking on Cray Multiprocessors
    11.3 Shared-Variable Program Structures 
        11.3.1 Locks for Protected Access
        11.3.2 Semaphores and Applications
        11.3.3 Monitors and Applications
    11.4 Message-Passing Program Development
        11.4.1 Distributing the Computation
        11.4.2 Synchronous Message Passing
        11.4.3 Asynchronous Message Passing
    11.5 Mapping Programs onto Multicomputers
        11.5.1 Domain Decomposition Techniques
        11.5.2 Control Decomposition Techniques
        11.5.3 Heterogeneous Processing
    11.6 Bibliographic Notes and Exercises

Chapter 12 UNIX, Mach, and OSF/1 for Parallel Computers
    12.1 Multiprocessor UNIX Design Goals
        12.1.1 Conventional UNIX Limitations
        12.1.2 Compatibility and Portability
        12.1.3 Address Space and Load Balancing
        12.1.4 Parallel I/O and Network Services
    12.2 Master-Slave and Multithreaded UNIX 
        12.2.1 Master-Slave Kernels
        12.2.2 Floating-Executive Kernels
        12.2.3 Multithreaded UNIX Kernel
    12.3 Multicomputer UNIX Extensions 
        12.3.1 Message-Passing OS Models
        12.3.2 Cosmic Environment and Reactive Kernel
        12.3.3 Intel NX/2 Kernel and Extensions
    12.4 Mach/OS Kernel Architecture
        12.4.1 Mach/OS Kernel Functions
        12.4.2 Multithreaded Multitasking
        12.4.3 Message-Based Communications
        12.4.4 Virtual Memory Management
    12.5 OSF/1 Architecture and Applications
        12.5.1 The OSF/1 Architecture
        12.5.2 The OSF/1 Programming Environment 
        12.5.3 Improving Performance with Threads
    12.6 Bibliographic Notes and Exercises

Bibliography
Index
Answers to Selected Problems