Introduction to MIMD Architectures:
Multiple instruction stream, multiple data stream (MIMD) machines have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches. MIMD machines can be of either shared memory or distributed memory categories. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.
MIMD
A type of multiprocessor architecture in which several instruction cycles may be active at any given time, each independently fetching instructions and operands into multiple processing units and operating on them in a concurrent fashion. Acronym for multiple-instruction-stream, multiple-data-stream.
(Multiple Instruction stream Multiple Data stream) A type of computer that can process two or more independent sets of instructions simultaneously on two or more pieces of data. Computers with multiple CPUs, or single CPUs with dual cores, are examples of MIMD architecture. Hyperthreading also results in a certain degree of MIMD performance. Compare with SIMD.
In computing, MIMD (Multiple Instruction stream, Multiple Data stream) is a technique employed to achieve parallelism.
Multiple Instruction - Multiple Data
MIMD architectures have multiple processors that each execute an independent stream (sequence) of machine instructions. The processors execute these instructions by using any accessible data rather than being forced to operate upon a single, shared data stream. Hence, at any given time, an MIMD system can be using as many different instruction streams and data streams as there are processors.
Although software processes executing on MIMD architectures can be synchronized by passing data among processors through an interconnection network, or by having processors examine data in a shared memory, the processors' autonomous execution makes MIMD architectures asynchronous machines.
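As a rough illustration of asynchronous execution with synchronization through shared memory, the following Python sketch runs two different instruction streams (a summation and a maximum search) on two different data lists in separate threads, then uses a barrier before one stream reads the other's result. The function names and the barrier are illustrative assumptions. Because CPython's global interpreter lock serializes bytecode, this models the MIMD programming style rather than real parallel hardware.

```python
import threading

# Shared memory visible to both "processors" (threads).
results = {}
lock = threading.Lock()
# Both instruction streams must publish their result before combining.
barrier = threading.Barrier(2)


def sum_stream(data):
    total = sum(data)                 # instruction stream 1: summation
    with lock:
        results["sum"] = total
    barrier.wait()                    # synchronize via shared memory
    with lock:                        # safe: the other stream has published
        results["combined"] = results["sum"] + results["max"]


def max_stream(data):
    biggest = max(data)               # instruction stream 2: maximum search
    with lock:
        results["max"] = biggest
    barrier.wait()


def run_mimd(data_a, data_b):
    """Run two independent instruction streams on two independent data streams."""
    results.clear()
    workers = [
        threading.Thread(target=sum_stream, args=(data_a,)),
        threading.Thread(target=max_stream, args=(data_b,)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return dict(results)


if __name__ == "__main__":
    print(run_mimd([1, 2, 3, 4], [7, 3, 9]))
```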
Shared Memory: Bus-based
MIMD machines with shared memory have processors which share a common, central memory. In the simplest form, all processors are attached to a bus which connects them to memory. This setup is called bus-based shared memory. Bus-based machines may have another bus that enables the processors to communicate directly with one another. This additional bus is used for synchronization among the processors. Bus-based shared memory MIMD machines can support only a small number of processors, because the processors contend with one another for access to the shared memory. These machines may be incrementally expanded up to the point where there is too much contention on the bus.
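The bus-contention limit can be made concrete with a deliberately simple model, which is an assumption rather than something taken from the text: each of n processors requests the bus with probability p in every cycle, the bus serves at most one request per cycle, and unserved requests are counted as lost. The expected number of served requests per cycle is then 1 - (1 - p)^n, which can never exceed one.

```python
def bus_throughput(n_procs: int, p_request: float) -> float:
    """Expected requests served per bus cycle under the simple model.

    Assumes each processor requests independently with probability
    p_request and the single bus serves at most one request per cycle.
    """
    if not 0.0 <= p_request <= 1.0:
        raise ValueError("p_request must be a probability")
    # The bus does useful work in a cycle iff at least one processor asks.
    return 1.0 - (1.0 - p_request) ** n_procs


def served_fraction(n_procs: int, p_request: float) -> float:
    """Fraction of issued requests that the bus can actually serve."""
    demanded = n_procs * p_request
    if demanded == 0:
        return 1.0
    return bus_throughput(n_procs, p_request) / demanded


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(bus_throughput(n, 0.2), 3), round(served_fraction(n, 0.2), 3))
```

With p = 0.2, the served fraction falls as processors are added while throughput approaches one request per cycle, which is the saturation point the paragraph describes.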
Shared Memory: Extended
MIMD machines with extended shared memory attempt to avoid or reduce the contention among processors for shared memory by subdividing the memory into a number of independent memory units. These memory units are connected to the processors by an interconnection network. The memory units are treated as a unified central memory. One type of interconnection network for this type of architecture is a crossbar switching network. In this scheme, N processors are linked to M memory units, which requires N times M switches. This is not an economically feasible setup for connecting a large number of processors.
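The N times M cost of a crossbar, and the way it lets processors reach different memory units simultaneously, can be sketched as follows. The fixed-priority arbitration (lowest processor id wins a contested unit) and the function names are hypothetical choices for illustration only.

```python
def crossbar_switch_count(n_procs: int, n_mems: int) -> int:
    """A full crossbar has one crosspoint switch per (processor, memory unit) pair."""
    return n_procs * n_mems


def crossbar_cycle(requests: dict) -> dict:
    """Grant one cycle of requests (processor id -> memory unit id).

    Processors targeting different memory units are connected simultaneously.
    If several target the same unit, the lowest processor id wins; this
    fixed-priority arbitration is a hypothetical choice.
    """
    granted = {}
    busy_units = set()
    for proc in sorted(requests):
        unit = requests[proc]
        if unit not in busy_units:
            busy_units.add(unit)
            granted[proc] = unit
    return granted


if __name__ == "__main__":
    for n in (16, 256, 1024):
        print(n, "processors x", n, "memory units:", crossbar_switch_count(n, n), "switches")
    print(crossbar_cycle({0: 0, 1: 1, 2: 1, 3: 2}))
```

The quadratic growth of the switch count (over a million switches for 1024 processors and 1024 memory units) is what makes the crossbar impractical for large machines.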
Shared Memory: Hierarchical
MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory. Processors on different boards may communicate through internodal buses. Buses support communication between boards. With this type of architecture, the machine may support over a thousand processors.
In computing, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Depending on context, programs may run on a single processor or on multiple separate processors. Using memory for communication inside a single program, for example among its multiple threads, is generally not referred to as shared memory.
IN HARDWARE
In computer hardware, shared memory refers to a (typically) large block of random access memory (RAM) that can be accessed by several different central processing units (CPUs) in a multiple-processor computer system.
A shared memory system is relatively easy to program since all processors share a single view of data, and the communication between processors can be as fast as memory accesses to the same location.
The issue with shared memory systems is that many CPUs need fast access to memory and will likely cache memory, which has two complications:
- CPU-to-memory connection becomes a bottleneck. Shared memory computers cannot scale very well. Most of them have ten or fewer processors.
- Cache coherence: Whenever one cache is updated with information that may be used by other processors, the change needs to be reflected to the other processors; otherwise the different processors will be working with incoherent data (see cache coherence and memory coherence). Such coherence protocols can, when they work well, provide extremely high-performance access to shared information between multiple processors. On the other hand, they can sometimes become overloaded and become a bottleneck to performance.
The alternatives to shared memory are distributed memory and distributed shared memory, each having a similar set of issues. See also Non-Uniform Memory Access.
IN SOFTWARE:
In software applications, shared memory is either
- A method of inter-process communication (IPC), i.e. a way of exchanging data between programs running at the same time. One process will create an area in RAM which other processes can access, or
- A method of conserving memory space by directing accesses to what would ordinarily be copies of a piece of data to a single instance instead, by using virtual memory mappings or with explicit support of the program in question. This is most often used for shared libraries and for Execute in Place.
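For the inter-process communication case, Python's standard library offers `multiprocessing.shared_memory` (Python 3.8 and later). The sketch below creates a named block, writes into it, and attaches to it again by name. In real use the attaching side would be a different process; the helper name `demo_shared_block` is hypothetical.

```python
from multiprocessing import shared_memory


def demo_shared_block(message: bytes) -> bytes:
    """Write `message` into a named shared block and read it back via a second handle."""
    if not message:
        raise ValueError("message must be non-empty")
    # Creator: allocate a named shared memory block and write into it.
    creator = shared_memory.SharedMemory(create=True, size=len(message))
    try:
        creator.buf[:len(message)] = message
        # Consumer: attach to the same block by name (normally another process).
        consumer = shared_memory.SharedMemory(name=creator.name)
        try:
            # The platform may round the block size up, so slice to the message length.
            return bytes(consumer.buf[:len(message)])
        finally:
            consumer.close()
    finally:
        creator.close()
        creator.unlink()    # release the block once no process needs it


if __name__ == "__main__":
    print(demo_shared_block(b"hello"))
```

Every handle calls `close()`, but only the creator calls `unlink()`, so the block is destroyed exactly once.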
Shared Memory MIMD Architectures:
The distinguishing feature of shared memory systems is that, no matter how many memory blocks are used in them and how these memory blocks are connected to the processors, the address spaces of these memory blocks are unified into a global address space which is completely visible to all processors of the shared memory system. Issuing a certain memory address by any processor will access the same memory block location. However, according to the physical organization of the logically shared memory, two main types of shared memory system can be distinguished:
- Physically shared memory systems
- Virtual (or distributed) shared memory systems
In physically shared memory systems all memory blocks can be accessed uniformly by all processors. In distributed shared memory systems the memory blocks are physically distributed among the processors as local memory units.
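In a distributed shared memory system, a single global address must still identify exactly one location. A minimal sketch, assuming a simple blocked layout in which node k owns a contiguous range of `local_size` addresses (real machines may interleave addresses instead), shows how a global address could be translated into a node number and a local offset. The names `locate` and `is_local` are hypothetical.

```python
from typing import NamedTuple


class Location(NamedTuple):
    node: int      # processing node whose local memory holds the word
    offset: int    # offset inside that node's local memory


def locate(global_addr: int, n_nodes: int, local_size: int) -> Location:
    """Map a global address to (node, local offset).

    Assumes a blocked layout: node k owns addresses
    [k * local_size, (k + 1) * local_size).
    """
    if not 0 <= global_addr < n_nodes * local_size:
        raise ValueError("address outside the global address space")
    node, offset = divmod(global_addr, local_size)
    return Location(node, offset)


def is_local(global_addr: int, my_node: int, n_nodes: int, local_size: int) -> bool:
    """True if the address lives in `my_node`'s local memory unit."""
    return locate(global_addr, n_nodes, local_size).node == my_node


if __name__ == "__main__":
    print(locate(1500, n_nodes=4, local_size=1024))
```

Every processor computes the same mapping, which is what keeps the address space global even though the memory is physically distributed.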
The three main design issues in increasing the scalability of shared memory systems are:
- Organization of memory
- Design of interconnection networks
- Design of cache coherence protocols
Cache Coherence:
Cache memories are introduced into computers in order to bring data closer to the processor and hence to reduce memory latency. Caches are widely accepted and employed in uniprocessor systems. However, in multiprocessor machines several processors may require a copy of the same memory block.
The maintenance of consistency among these copies raises the so-called cache coherence problem, which has three causes:
- Sharing of writable data
- Process migration
- I/O activity
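The first cause, sharing of writable data, can be reproduced with a toy model of two private caches that have no coherence mechanism at all. The class, the helper function and the address 0x10 are hypothetical; the point is only that a write kept in one cache leaves a stale copy in the other.

```python
class PrivateCache:
    """A private cache with deliberately NO coherence mechanism."""

    def __init__(self, memory: dict):
        self.memory = memory        # shared main memory: addr -> value
        self.lines = {}             # this cache's copies: addr -> value

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch a copy from main memory
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value    # update only this cache's copy


def stale_read_demo():
    """Return what P0 and P1 read after P0 writes a block both have cached."""
    memory = {0x10: 1}
    p0_cache = PrivateCache(memory)
    p1_cache = PrivateCache(memory)
    p0_cache.read(0x10)             # both processors cache the same block
    p1_cache.read(0x10)
    p0_cache.write(0x10, 2)         # P0 modifies its copy
    return p0_cache.read(0x10), p1_cache.read(0x10)   # P1 keeps the stale 1


if __name__ == "__main__":
    print(stale_read_demo())
```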
From the point of view of cache coherence, data structures can be divided into three classes:
- Read-only data structures, which never cause any cache coherence problem. They can be replicated and placed in any number of cache memory blocks without any problem.
- Shared writable data structures are the main source of cache coherence problems.
- Private writable data structures pose cache coherence problems only in the case of process migration.
There are several techniques to maintain cache coherence for the critical case, that is, shared writable data structures. The applied methods can be divided into two classes:
- hardware-based protocols
- software-based protocols
Software-based techniques usually impose some restrictions on the cachability of data in order to prevent cache coherence problems.
Hardware-based Protocols:
Hardware-based protocols provide general solutions to the problems of cache coherence without any restrictions on the cachability of data. The price of this approach is that shared memory systems must be extended with sophisticated hardware mechanisms to support cache coherence. Hardware-based protocols can be classified according to their memory update policy, cache coherence policy, and interconnection scheme. Two types of memory update policy are applied in multiprocessors: write-through and write-back. Cache coherence policy is divided into write-update policy and write-invalidate policy.
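The difference between the two memory update policies shows up in how many writes reach main memory. The sketch below assumes, for simplicity, a single cache large enough to hold every address, with each dirty line written back once at the end; the trace format of `(operation, address)` pairs is hypothetical.

```python
def memory_writes(trace, policy: str) -> int:
    """Count writes that reach main memory for a trace of ("R" | "W", addr) pairs.

    Simplified single-cache model (an assumption): no evictions during the
    trace, and every dirty line is written back exactly once at the end.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    dirty = set()
    mem_writes = 0
    for op, addr in trace:
        if op == "W":
            if policy == "write-through":
                mem_writes += 1      # every write goes to memory immediately
            else:
                dirty.add(addr)      # memory is updated only later
    if policy == "write-back":
        mem_writes += len(dirty)     # one write-back per dirty line
    return mem_writes


if __name__ == "__main__":
    trace = [("W", 0), ("W", 0), ("R", 0), ("W", 0), ("W", 4)]
    for policy in ("write-through", "write-back"):
        print(policy, memory_writes(trace, policy))
```

Repeated writes to the same line cost one memory write each under write-through but are merged into a single write-back under write-back, at the price of memory being temporarily out of date.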
Hardware-based protocols can be further classified into three basic classes depending on the nature of the interconnection network applied in the shared memory system. If the network efficiently supports broadcasting, the so-called snoopy cache protocol can be advantageously exploited. This scheme is typically used in single bus-based shared memory systems, where consistency commands (invalidate or update commands) are broadcast via the bus and each cache 'snoops' on the bus for incoming consistency commands.
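A minimal sketch of a snoopy, write-invalidate scheme on a single bus follows. It assumes a write-through cache with only Valid and Invalid states (a line present in `lines` is Valid); the class names are hypothetical, and practical snoopy protocols such as MSI or MESI track more states.

```python
class SnoopyBus:
    """A single shared bus connecting the caches to main memory."""

    def __init__(self):
        self.memory = {}            # main memory: addr -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, addr):
        # Every cache sees the broadcast; all except the sender drop their copy.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class SnoopingCache:
    """Write-through, write-invalidate cache: a present line is Valid."""

    def __init__(self, bus: SnoopyBus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:                  # read miss
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value               # write-through to memory
        self.bus.broadcast_invalidate(self, addr)   # invalidate other copies

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)                  # Valid -> Invalid


if __name__ == "__main__":
    bus = SnoopyBus()
    bus.memory[0x10] = 1
    c0 = SnoopingCache(bus)
    c1 = SnoopingCache(bus)
    c0.read(0x10)
    c1.read(0x10)
    c0.write(0x10, 2)
    print(c1.read(0x10))    # misses after the invalidation, fetches the new value
```

After the write, the second cache's copy has been invalidated, so its next read misses and fetches the updated value from memory.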
Large interconnection networks like multistage networks cannot support broadcasting efficiently, and therefore a mechanism is needed that can directly forward consistency commands to those caches that contain a copy of the updated data structure. For this purpose a directory must be maintained for each block of the shared memory to record the actual location of blocks in the possible caches. This approach is called the directory scheme.
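The directory idea can be sketched with a full-map directory that keeps one presence bit per cache for each memory block (modelled here as a set of cache ids). On a write, invalidations are sent point to point only to the caches that actually hold a copy. The class and method names are hypothetical.

```python
class FullMapDirectory:
    """One presence bit per cache for every memory block (kept as a set of ids)."""

    def __init__(self, n_caches: int):
        self.n_caches = n_caches
        self.presence = {}      # block -> set of cache ids holding a copy
        self.sent = []          # point-to-point consistency commands issued

    def _check(self, cache_id: int):
        if not 0 <= cache_id < self.n_caches:
            raise ValueError("unknown cache id")

    def read_miss(self, block, cache_id: int):
        self._check(cache_id)
        self.presence.setdefault(block, set()).add(cache_id)

    def write(self, block, writer_id: int):
        self._check(writer_id)
        sharers = self.presence.get(block, set())
        # Only caches recorded in the directory receive an invalidation;
        # nothing is broadcast to the remaining caches.
        for cache_id in sorted(sharers - {writer_id}):
            self.sent.append(("invalidate", block, cache_id))
        self.presence[block] = {writer_id}   # the writer now holds the only copy


if __name__ == "__main__":
    directory = FullMapDirectory(n_caches=8)
    for cache_id in (1, 3, 6):
        directory.read_miss(block=5, cache_id=cache_id)
    directory.write(block=5, writer_id=3)
    print(directory.sent)
```

With eight caches and three sharers, the write generates two invalidation messages instead of a broadcast to all seven other caches.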
The third approach tries to avoid the costly directory scheme while still providing high scalability. It proposes multiple-bus networks with hierarchical cache coherence protocols that are generalized or extended versions of the single bus-based snoopy cache protocol.
In describing a cache coherence protocol the following definitions must be given:
- Definition of possible states of blocks in caches, memories and directories.
- Definition of commands to be performed at various read/write hit/miss actions.
- Definition of state transitions in caches, memories and directories according to the commands.
- Definition of transmission routes of commands among processors, caches, memories and directories.
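As an example of the definitions above, the states, commands and state transitions of the widely used MSI (Modified, Shared, Invalid) write-back, write-invalidate protocol can be written as a table. MSI is not described in the text and is used here only as an illustration; the transmission route of every command is assumed to be a single shared bus, and the event names (PrRd, PrWr, BusRd, BusRdX, Flush) follow common textbook usage.

```python
# (state, event) -> (next_state, bus_action or None)
# "Pr" events come from the local processor; "Bus" events are snooped
# from other caches' transactions on the shared bus.
MSI = {
    ("I", "PrRd"):   ("S", "BusRd"),    # read miss: fetch a shared copy
    ("I", "PrWr"):   ("M", "BusRdX"),   # write miss: fetch an exclusive copy
    ("I", "BusRd"):  ("I", None),
    ("I", "BusRdX"): ("I", None),
    ("S", "PrRd"):   ("S", None),       # read hit
    ("S", "PrWr"):   ("M", "BusRdX"),   # upgrade: invalidate other copies
    ("S", "BusRd"):  ("S", None),
    ("S", "BusRdX"): ("I", None),       # another cache is about to write
    ("M", "PrRd"):   ("M", None),
    ("M", "PrWr"):   ("M", None),
    ("M", "BusRd"):  ("S", "Flush"),    # supply dirty data, keep a shared copy
    ("M", "BusRdX"): ("I", "Flush"),    # supply dirty data, then invalidate
}


def run_block(events, state="I"):
    """Apply a sequence of events to one cache block; return final state and bus actions."""
    actions = []
    for event in events:
        state, action = MSI[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions


if __name__ == "__main__":
    print(run_block(["PrRd", "PrWr", "BusRd"]))
```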
Software-based Protocols:
Although hardware-based protocols offer the fastest mechanism for maintaining cache consistency, they introduce significant extra hardware complexity, particularly in scalable multiprocessors. Software-based approaches represent a good and competitive compromise, since they require nearly negligible hardware support and can lead to the same small number of invalidation misses as the hardware-based protocols. All the software-based protocols rely on compiler assistance.
The compiler analyses the program and classifies the variables into four classes:
- Read-only
- Read-only for any number of processes and read-write for one process
- Read-write for one process
- Read-write for any number of processes.
Read-only variables can be cached without restrictions. Type 2 variables can be cached only for the processor where the read-write process runs. Since only one process uses type 3 variables, it is sufficient to cache them only for that process. Type 4 variables must not be cached in software-based schemes. Variables exhibit different behavior in different program sections, and hence the program is usually divided into sections by the compiler and the variables are categorized independently in each section. Moreover, the compiler generates instructions that control the cache or access the cache explicitly, based on the classification of variables and the code segmentation. Typically, at the end of each program section the caches must be invalidated to ensure that the variables are in a consistent state before a new section starts.
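The compiler's classification for one program section can be sketched from the set of processes that read a variable and the set that write it. This is a simplified illustration, assuming the compiler already knows these sets and that each process runs on its own processor; the function names are hypothetical.

```python
def classify(readers: set, writers: set) -> int:
    """Return the variable type (1-4) for one program section."""
    if not writers:
        return 1                    # read-only
    if len(writers) > 1:
        return 4                    # read-write for several processes
    (writer,) = writers
    if readers - {writer}:
        return 2                    # one writer, other processes only read
    return 3                        # used by a single process only


def may_cache(var_type: int, writers: set, process) -> bool:
    """Whether `process` may keep the variable in its cache in this section."""
    if var_type == 1:
        return True                 # read-only data is always safe to cache
    if var_type in (2, 3):
        return process in writers   # only where the read-write process runs
    return False                    # type 4 is never cached


if __name__ == "__main__":
    readers, writers = {1, 2}, {1}
    kind = classify(readers, writers)
    print(kind, may_cache(kind, writers, 1), may_cache(kind, writers, 2))
```

Because the sets are recomputed per section, the same variable may be cacheable in one section and uncacheable in the next, which is why caches are invalidated at section boundaries.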
Shared memory systems can be divided into four main classes:
Uniform Memory Access (UMA) Machines:
Contemporary uniform memory access machines are small-size single bus multiprocessors. Large UMA machines with hundreds of processors and a switching network were typical in the early design of scalable shared memory systems. Famous representatives of that class of multiprocessors are the Denelcor HEP and the NYU Ultracomputer. They introduced many innovative features in their design, some of which even today represent a significant milestone in parallel computer architectures. However, these early systems contain neither cache memory nor local main memory, which turned out to be necessary to achieve high performance in scalable shared memory systems.
Non-Uniform Memory Access (NUMA) Machines:
Non-uniform memory access (NUMA) machines were designed to avoid the memory access bottleneck of UMA machines. The logically shared memory is physically distributed among the processing nodes of NUMA machines, leading to distributed shared memory architectures. On one hand these parallel computers became highly scalable, but on the other hand they are very sensitive to data allocation in local memories. Accessing a local memory segment of a node is much faster than accessing a remote memory segment. Not by chance, the structure and design of these machines resemble in many ways those of distributed memory multicomputers. The main difference is in the organization of the address space. In multiprocessors, a global address space is applied that is uniformly visible from each processor; that is, all processors can transparently access all memory locations. In multicomputers, the address space is replicated in the local memories of the processing elements. This difference in the address space of the memory is also reflected at the software level: distributed memory multicomputers are programmed on the basis of the message-passing paradigm, while NUMA machines are programmed on the basis of the global address space (shared memory) principle.
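The sensitivity of NUMA machines to data allocation can be quantified with the usual weighted average of access times, where f is the fraction of accesses that hit local memory: average time = f * t_local + (1 - f) * t_remote. The latency values in the usage example are hypothetical placeholders, not measurements of any real machine.

```python
def numa_avg_latency(local_fraction: float, t_local: float, t_remote: float) -> float:
    """Average access time: f * t_local + (1 - f) * t_remote."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote


if __name__ == "__main__":
    # Hypothetical latencies in nanoseconds, chosen only for illustration.
    for f in (1.0, 0.9, 0.5):
        print(f, numa_avg_latency(f, t_local=100.0, t_remote=400.0))
```

With these placeholder numbers, moving from 90% to 50% local accesses nearly doubles the average access time (130 ns versus 250 ns), which is why data allocation matters so much on NUMA machines.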
The problem of cache coherency does not appear in distributed memory multicomputers, since the message-passing paradigm explicitly handles different copies of the same data structure in the form of independent messages. In the shared memory paradigm, multiple accesses to the same global data structure are possible and can be accelerated if local copies of the global data structure are maintained in local caches. However, hardware-supported cache consistency schemes are not introduced into NUMA machines. These systems can cache read-only code and data, as well as local data, but not shared modifiable data. This is the distinguishing feature between NUMA and CC-NUMA multiprocessors. Accordingly, NUMA machines are closer to multicomputers than to other shared memory multiprocessors, while CC-NUMA machines look like real shared memory systems.
In NUMA machines, as in multicomputers, the main design issues are the organization of processor nodes, the interconnection network, and the possible techniques to reduce remote memory accesses. Two examples of NUMA machines are the Hector and the Cray T3D multiprocessor.