Message Passing Interface (MPI) is a specification for an application programming interface that allows many computers to communicate with one another. It is used in computer clusters.
Overview
MPI is a language-independent communications protocol used to program parallel computers. It supports both point-to-point and collective communication. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation."[1] Its goals are high performance, scalability, and portability, and it remains the dominant model used in high-performance computing today.[2]
MPI is not sanctioned by any major standards body; nevertheless, it has become the de facto standard for communication among processes that model a parallel program running on a distributed memory system. Distributed memory supercomputers such as computer clusters often run these programs. The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs are regularly run on shared memory computers. Designing programs around the MPI model (as opposed to explicit shared memory models) has advantages on NUMA architectures, because programming for MPI encourages memory locality.
Although MPI belongs in layers 5 and higher of the OSI Reference Model, implementations may cover most layers of the reference model, with sockets and TCP used in the transport layer.
Most MPI implementations consist of a specific set of routines (an API) callable from Fortran, C, or C++ and from any language capable of interfacing with such routine libraries. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).
MPI is a specification, not an implementation. MPI has Language Independent Specifications (LIS) for the function calls and language bindings. The first MPI standard specified ANSI C and Fortran-77 language bindings together with the LIS. The draft of this standard was presented at Supercomputing 1993 (November 1993) and finalized in 1994. About 128 functions comprise the MPI-1.2 standard as it is now defined.
There are two versions of the standard that are currently popular: version 1.2 (MPI-1 for short), which emphasizes message passing and has a static runtime environment, and MPI-2.1 (MPI-2), which includes new features such as parallel I/O, dynamic process management, and remote memory operations.[3] MPI-2's LIS specifies over 500 functions and provides language bindings for ANSI C, ANSI Fortran (Fortran90), and ANSI C++. Interoperability of objects defined in MPI was also added to allow for easier mixed-language message passing programming. A side effect of MPI-2 standardization (completed in 1997) was clarification of the MPI-1 standard, creating the MPI-1.2 level. MPI-2 is mostly a superset of MPI-1, although some functions have been deprecated, so MPI-1.2 programs still work under MPI implementations compliant with the MPI-2 standard.
MPI is often compared with PVM, a popular distributed environment and message passing system developed in 1989, which was one of the systems that motivated the need for a standard parallel message passing system. Threaded shared memory programming models (such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be considered complementary programming approaches.
Functionality
MPI is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to nodes, servers, or computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few features that are language specific. MPI programs always work with processes, although people commonly talk about processors. For maximum performance, one process per CPU is typically selected as part of the mapping activity; this mapping happens at runtime, through the agent that starts the MPI program, normally called mpirun or mpiexec.
Such functions include, but are not limited to, point-to-point rendezvous-type send/receive operations, choosing between a Cartesian or graph-like logical process topology, exchanging data between process pairs (send/receive operations), combining partial results of computations (gathering and reduction operations), synchronizing nodes (barrier operation), and obtaining network-related information such as the number of processes in the computing session, the current processor identity that a process is mapped to, and the neighboring processes accessible in a logical topology. Point-to-point operations come in synchronous, asynchronous, buffered, and ready forms, to allow both relatively stronger and weaker semantics for the synchronization aspects of a rendezvous-send. In most implementations, many operations can be outstanding at once in asynchronous mode.
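As an illustration of what a Cartesian logical topology means, the following is a minimal sketch in plain C that needs no MPI library. It assumes row-major numbering of processes on the grid (the last dimension varies fastest), which is the ordering the MPI standard describes for Cartesian topologies. The function names rank_to_coords and coords_to_rank are hypothetical helpers invented for this sketch; in a real MPI program the corresponding queries are made with MPI_Cart_coords and MPI_Cart_rank on a communicator created by MPI_Cart_create.

```c
#include <assert.h>

/* Hypothetical helper: convert a rank into coordinates on an
   ndims-dimensional process grid, assuming row-major numbering
   (the last dimension varies fastest). */
void rank_to_coords(int rank, int ndims, const int dims[], int coords[])
{
    assert(ndims > 0 && rank >= 0);
    for (int d = ndims - 1; d >= 0; d--) {
        coords[d] = rank % dims[d];   /* position along dimension d */
        rank /= dims[d];              /* drop that dimension */
    }
}

/* Hypothetical helper: the inverse mapping, coordinates back to a rank. */
int coords_to_rank(int ndims, const int dims[], const int coords[])
{
    int rank = 0;
    for (int d = 0; d < ndims; d++) {
        assert(coords[d] >= 0 && coords[d] < dims[d]);
        rank = rank * dims[d] + coords[d];
    }
    return rank;
}
```

For example, on a 2 x 3 grid, rank 4 maps to coordinates (1, 1), and its neighbor to the right, (1, 2), is rank 5. The sketch ignores periodic (wrap-around) dimensions, which MPI also supports.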
MPI-1 and MPI-2 both enable implementations that overlap communication and computation well, but practice and theory differ. MPI also specifies thread-safe interfaces, which have cohesion and coupling strategies that help avoid the manipulation of unsafe hidden state within the interface. It is relatively easy to write multithreaded point-to-point MPI code, and some implementations support such code. Multithreaded collective communication is best accomplished by using multiple copies of communicators, as described below.
Concepts
Although MPI has many functions, a few concepts are especially important; taken a few at a time, they help people learn MPI quickly and decide what functionality to use in their application programs. There are nine basic concepts of MPI, five of which are only applicable to MPI-2.
Communicator
Communicators are objects connecting groups of processes in the MPI session. Within each communicator, each contained process has an independent identifier, and the contained processes are arranged in an ordered topology. MPI also has explicit groups, but these are mainly good for organizing and reorganizing subsets of processes before another communicator is made. MPI understands single-group intracommunicator operations and bipartite (two-group) intercommunicator communication. In MPI-1, single-group operations are most prevalent, with bipartite operations finding their biggest role in MPI-2, where their usability is expanded to include collective communication and dynamic process management. Communicators can be partitioned using several MPI commands, including a graph-coloring-type algorithm called MPI_COMM_SPLIT, which is commonly used to derive topological and other logical subgroupings in an efficient way.
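The partitioning rule behind MPI_COMM_SPLIT can be modeled without an MPI library. The sketch below is plain C and only imitates the documented behavior: every process supplies a color and a key, processes with the same color land in the same new communicator, and ranks in the new communicator are ordered by key, with ties broken by the rank in the old communicator. The function names split_new_rank and split_group_size are hypothetical, and the special MPI_UNDEFINED color is deliberately left out.

```c
#include <assert.h>

/* Hypothetical model of the rank a process would receive in its new
   communicator: the number of same-color processes that sort before it
   (smaller key, or equal key and smaller old rank). */
int split_new_rank(int nprocs, const int color[], const int key[], int old_rank)
{
    assert(old_rank >= 0 && old_rank < nprocs);
    int new_rank = 0;
    for (int r = 0; r < nprocs; r++) {
        if (color[r] != color[old_rank])
            continue;                       /* different subgroup */
        if (key[r] < key[old_rank] ||
            (key[r] == key[old_rank] && r < old_rank))
            new_rank++;                     /* r sorts ahead of old_rank */
    }
    return new_rank;
}

/* Hypothetical model of the size of the new communicator that
   old_rank ends up in. */
int split_group_size(int nprocs, const int color[], int old_rank)
{
    assert(old_rank >= 0 && old_rank < nprocs);
    int size = 0;
    for (int r = 0; r < nprocs; r++)
        if (color[r] == color[old_rank])
            size++;
    return size;
}
```

With six processes and color = rank % 2 (and all keys equal), the even ranks 0, 2, 4 form one group with new ranks 0, 1, 2, and the odd ranks form the other. In an actual program each process makes a single call to MPI_Comm_split with its own color and key; the loops above exist only because this model sees all processes at once.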
Point-to-point basics
A number of important functions in the MPI API involve communication between two specific processes. A much-used example is the MPI_Send interface, which allows one specified process to send a message to a second specified process. Point-to-point operations, as these are called, are particularly useful in master-slave program architectures, where a master node might be responsible for managing the data flow of a collection of slave nodes. Typically, the master node will send specific batches of instructions or data to each slave node, and possibly merge results upon completion.
Collective basics
Collective functions in the MPI API involve communication between all processes in a process group (which can mean the entire process pool or a program-defined subset). A typical function is the MPI_Bcast call (short for "broadcast"). This function takes data from one specially identified node and sends that message to all processes in the process group. A reverse operation is the MPI_Reduce call, which takes data from all processes in a group, performs a user-chosen operation (such as summing), and stores the result on one individual node. This type of call is also useful in master-slave architectures, where the master node may want to sum results from all slaves to arrive at a final result, for instance.
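To make the idea of a reduction concrete, here is a minimal sketch in plain C that simulates a sum reduction over one value per process, with slot r of an array standing in for the value held by rank r. It uses a binomial-tree pattern, which is only one possible strategy and is an assumption of this sketch: the MPI standard specifies the result of MPI_Reduce, not the algorithm, and implementations choose their own. The function name tree_reduce_sum is hypothetical.

```c
#include <assert.h>

/* Hypothetical simulation of a sum reduction onto rank 0.
   values[r] is the contribution of rank r. In each round, the process
   at position r adds in the partial sum held at position r + step,
   so the number of rounds grows with log2(nprocs).
   The array is overwritten; the total ends up in values[0].
   Returns the number of communication rounds used. */
int tree_reduce_sum(int values[], int nprocs)
{
    assert(nprocs > 0);
    int rounds = 0;
    for (int step = 1; step < nprocs; step *= 2) {
        for (int r = 0; r + step < nprocs; r += 2 * step)
            values[r] += values[r + step];  /* "receive" and combine */
        rounds++;
    }
    return rounds;
}
```

For five processes holding 1, 2, 3, 4, 5, the simulation finishes in three rounds and leaves 15 in slot 0. In a real program every process would instead call MPI_Reduce with MPI_SUM and a root of 0, and only the root would receive the result.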
One-sided communication (MPI-2)
This section needs to be developed.
Collective extensions (MPI-2)
This section needs to be developed.
Dynamic process management (MPI-2)
The key aspect of this MPI-2 feature is "the ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately."[4]
MPI I/O (MPI-2)
The parallel I/O feature introduced with MPI-2 is sometimes called MPI-IO for short.[5]
Miscellaneous improvements of MPI-2
This section needs to be developed.
Guidelines for writing multithreaded MPI-1 and MPI-2 programs
This section needs to be developed.
Implementations
'Classical' cluster and supercomputer implementations
The implementation language for MPI is in general different from the language or languages it seeks to support at runtime. Most MPI implementations are written in a combination of C, C++, and assembly language, and target C, C++, and Fortran programmers. However, the implementation language and the end-user language are in principle always decoupled.
The initial implementation of the MPI 1.x standard was MPICH, from Argonne National Laboratory and Mississippi State University (the name is pronounced MPI-C-H, not as a single syllable). IBM was also an early implementor of the MPI standard, and most supercomputer companies of the early 1990s either commercialized MPICH or built their own implementation of the MPI 1.x standard. LAM/MPI from the Ohio Supercomputing Center was another early open implementation. Argonne National Laboratory has continued developing MPICH for over a decade, and now offers MPICH 2, which is an implementation of the MPI-2.1 standard. LAM/MPI and a number of other MPI efforts recently merged to form a combined project, Open MPI. There are many other efforts that are derivatives of MPICH, LAM, and other works. Recently, Microsoft added an MPI effort to their Cluster Computing Kit (2005), based on MPICH 2.
Besides the mainstream of MPI programming for high performance, MPI has been used widely with Python, Perl, and Java, and these communities are growing. MATLAB-based MPI implementations appear in many forms, but no consensus on a single way of using MPI with MATLAB yet exists. The next sections detail some of these efforts.
Python
There are at least five implementations of MPI for Python: mpi4py, PyPar, PyMPI, MYMPI, and the MPI submodule in ScientificPython. PyMPI is notable because it is a variant Python interpreter, making the multi-node application the interpreter itself rather than the code the interpreter runs. PyMPI implements most of the MPI specification and automatically works with compiled code that needs to make MPI calls. PyPar, MYMPI, and ScientificPython's module are all designed to work like a typical module, used with nothing but an import statement. They make it the coder's job to decide when and where the call to MPI_Init belongs.
OCaml
The OCamlMPI module implements a large subset of MPI functions and is in active use in scientific computing. As an indication of its maturity, it was reported on caml-list that an eleven-thousand-line OCaml program was "MPI-ified" using the module, with an additional 500 lines of code and slight restructuring, and has run with excellent results on up to 170 nodes in a supercomputer.
Java
Although Java does not have an official MPI binding, there have been several attempts to bridge Java and MPI, with different degrees of success and compatibility. One of the first attempts was Bryan Carpenter's mpiJava, essentially a collection of JNI wrappers to a local C MPI library, resulting in a hybrid implementation with limited portability, which also has to be recompiled against the specific MPI library being used.
However, this original project also defined the mpiJava API (a de facto MPI API for Java that closely follows the equivalent C++ bindings), which subsequent Java MPI projects adopted. An alternative, less-used API is the MPJ API, designed to be more object-oriented and closer to Sun Microsystems' coding conventions. Beyond the choice of API, Java MPI libraries can either depend on a local MPI library or implement the message passing functions in Java, while some, like P2P-MPI, also provide peer-to-peer functionality and allow mixed-platform operation.
Some of the most challenging parts of any MPI implementation for Java arise from the language's own limitations and peculiarities, such as the lack of explicit pointers and of a linear memory address space for its objects, which make transferring multi-dimensional arrays and complex objects inefficient. The usual workarounds involve transferring one line at a time and/or performing explicit de-serialization and casting at both the sending and receiving ends, simulating C or Fortran-like arrays with a one-dimensional array, and simulating pointers to primitive types with single-element arrays, resulting in programming styles quite far from Java's conventions.
One major improvement is MPJ Express by Aamir Shafi, a project supervised by Bryan Carpenter and Mark Baker. On commodity platforms like Fast Ethernet, advances in JVM technology now enable networking programs written in Java to rival their C counterparts. On the other hand, improvements in specialized networking hardware have continued, cutting communication costs down to a couple of microseconds. Keeping both in mind, the key issue at present is not to debate the JNI approach versus the pure Java approach, but to provide a flexible mechanism for programs to swap communication protocols. The aim of this project is to provide a reference Java messaging system based on the MPI standard. The implementation follows a layered architecture based on the idea of device drivers, analogous to UNIX device drivers.
Microsoft Windows
Windows Compute Cluster Server uses the Microsoft Message Passing Interface v2 (MS-MPI) to communicate between the processing nodes on the cluster network. The application programming interface consists of over 160 functions. MS-MPI was designed, with some exceptions because of security considerations, to cover the complete set of MPI-2 functionality as implemented in MPICH2. Dynamic process spawn and publishing are planned for the future.
There is also a completely managed .NET implementation of MPI, Pure Mpi.NET. Its object-oriented API has been developed on .NET technologies including Windows Communication Foundation (WCF), which allows the binding and endpoint configuration to be specified declaratively to suit the environment and performance needs. The interfaces remain recognizably MPI-like while taking advantage of .NET features, including generics, delegates, asynchronous results, exception handling, and extensibility points.
Hardware implementations
There has been research over time into implementing MPI directly in the hardware of the system, for example by means of processor-in-memory, where the MPI operations are built into the microcircuitry of the RAM chips in each node. By implication, this type of implementation would be independent of the language, OS, or CPU on the system, but could not be readily updated or unloaded.
Another approach has been to add hardware acceleration to one or more parts of the operation. This may include hardware processing of the MPI queues or the use of RDMA to transfer data directly between memory and the network interface without CPU or kernel intervention.
Example program
Here is a "Hello World" program in MPI written in C. In this example, we send a "hello" message to each processor, manipulate it trivially, send the results back to the main process, and print the messages out.

    /* "Hello World" Type MPI Test Program */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define BUFSIZE 128
    #define TAG 0

    int main(int argc, char *argv[])
    {
        char idstr[32];
        char buff[BUFSIZE];
        int numprocs;
        int myid;
        int i;
        MPI_Status stat;

        /* all MPI programs start with MPI_Init; all 'N' processes exist thereafter */
        MPI_Init(&argc, &argv);
        /* find out how big the SPMD world is */
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        /* and what this process's rank is */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        /* At this point, all the programs are running equivalently; the rank is
           used to distinguish the roles of the programs in the SPMD model, with
           rank 0 often used specially... */
        if (myid == 0) {
            printf("%d: We have %d processors\n", myid, numprocs);
            for (i = 1; i < numprocs; i++) {
                sprintf(buff, "Hello %d! ", i);
                MPI_Send(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD);
            }
            for (i = 1; i < numprocs; i++) {
                MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
                printf("%d: %s\n", myid, buff);
            }
        } else {
            /* receive from rank 0: */
            MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
            sprintf(idstr, "Processor %d ", myid);
            strcat(buff, idstr);
            strcat(buff, "reporting for duty\n");
            /* send to rank 0: */
            MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
        }

        /* MPI programs end with MPI_Finalize; this is a weak synchronization point */
        MPI_Finalize();
        return 0;
    }

Note that the runtime environment for the MPI implementation used (often called mpirun or mpiexec) spawns multiple copies of the program, with the total number of copies determining the number of process ranks in MPI_COMM_WORLD, which is an opaque descriptor for communication between the set of processes.
A Single-Program-Multiple-Data (SPMD) programming model is thereby facilitated, but not required; many MPI implementations allow multiple, different executables to be started in the same MPI job. Each process has its own rank, knows the total number of processes in the world, and can communicate with the others either with point-to-point (send/receive) communication or by collective communication among the group. It is enough for MPI to provide an SPMD-style program with MPI_COMM_WORLD, its own rank, and the size of the world to allow algorithms to decide what they do based on their rank. In more robust examples, I/O should be managed more carefully than in this example. MPI does not guarantee how POSIX I/O will actually work on a given system, but it commonly does work, at least from rank 0.
MPI uses the notion of a process, not a processor. The copies of this program are mapped to processors by the MPI runtime environment. In that sense, the parallel machine can map to one physical processor, to N processors where N is the total number available, or to something in between. For maximal potential parallel speedup, more physical processors are used. This example also adjusts its behavior to the size of the world N, so it scales to the size given at runtime. There is no separate compilation for each level of concurrency, although different decisions might be taken internally depending on the absolute amount of concurrency provided to the program.
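One common way for a program to adapt to the world size given at runtime is a block decomposition: each process derives its share of the work from nothing more than its rank and the size of the world. The following is a minimal sketch in plain C that needs no MPI library; the function name block_range and its parameters are hypothetical, and the even-split policy (the first few ranks take one extra item) is just one reasonable choice.

```c
#include <assert.h>

/* Hypothetical helper: split n_items as evenly as possible over `size`
   processes. The first (n_items % size) ranks receive one extra item.
   On return, *start is the index of the first item owned by `rank`
   and *count is how many consecutive items it owns. */
void block_range(int n_items, int size, int rank, int *start, int *count)
{
    assert(size > 0 && rank >= 0 && rank < size && n_items >= 0);
    int base = n_items / size;    /* items every rank gets */
    int extra = n_items % size;   /* leftover items, one per low rank */
    *count = base + (rank < extra ? 1 : 0);
    *start = rank * base + (rank < extra ? rank : extra);
}
```

With 10 items and 4 processes, ranks 0 and 1 each own three items (starting at 0 and 3) and ranks 2 and 3 each own two (starting at 6 and 8). In an MPI program the rank and size arguments would come from MPI_Comm_rank and MPI_Comm_size, as in the example program, so the same executable works unchanged for any number of processes.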
Adoption of MPI-2
While the adoption of MPI-1.2 has been universal, including on almost all cluster computing, the acceptance of MPI-2.1 has been more limited. Here are some of the reasons.
- While MPI-1.2 emphasizes message passing and a minimal, static runtime environment, full MPI-2 implementations include I/O and dynamic process management, and the size of the middleware implementation is substantially larger. Furthermore, most sites that use batch scheduling systems cannot support dynamic process management. Parallel I/O is well accepted as a key value of MPI-2.
- Many legacy MPI-1.2 programs had already been developed by the time MPI-2 came out, and they work fine. The threat of potentially lost portability from using MPI-2 functions kept people from using the enhanced standard for many years, though this concern has been lessening since the mid-2000s, with wider support for MPI-2.
- Many MPI-1.2 applications use only a subset of that standard (16-25 functions). This minimalism of use contrasts with the huge availability of functionality now afforded in MPI-2.
Other inhibiting factors can be cited too, although these may amount more to perceptions and belief than fact. MPI-2 has been well supported in free and commercial implementations since at least the early 2000s, with some implementations coming earlier than that.
The future of MPI
Some aspects of MPI's future appear solid; others less so. The MPI Forum reconvened in 2007 to clarify some MPI-2 issues and explore developments for a possible MPI-3. Irrespective of what the MPI Forum decides for MPI-3, MPI as a legacy interface will exist at the MPI-1.2 and MPI-2.1 levels for many years to come. Like Fortran, it is ubiquitous in technical computing, and it is taught and used widely. The body of free and commercial products that require MPI, combined with new ports of the existing free and commercial implementations to new target platforms, helps ensure that MPI will go on indefinitely.
Architectures are changing, with greater internal concurrency (multi-core), better fine-grain concurrency control (threading, affinity), and more levels of memory hierarchy. Multithreaded programs can take advantage of these developments more easily than single-threaded applications. This has already yielded a separate, complementary standard for symmetric multiprocessing, namely OpenMP. Although the MPI-2 standard defines how standard-conforming implementations should deal with multithreaded issues, it does not require that implementations be multithreaded, or even thread safe. While multithreaded-capable MPI implementations do exist, multithreaded message passing applications are few. The drive to achieve multi-level concurrency all within MPI is both a challenge and an opportunity for the standard in future.
Much of the discussion within the MPI Forum centres on fault tolerance. Improved fault tolerance within MPI would have clear benefits in the context of grid computing, a growing trend in large-scale computing.
See also
- MPICH
- LAM/MPI
- Open MPI
- OpenMP
- Unified Parallel C (UPC)
- Occam
- Linda
- Parallel Virtual Machine (PVM)
- Calculus of Communicating Systems (CCS)
- Calculus of Broadcasting Systems (CBS)
- Actor model
- Bulk Synchronous Parallel
Notes
- [1] Gropp et al 96, p.3
- [2] http://portal.acm.org/citation.cfm?id=1188565
- [3] Gropp et al 1999-advanced, pp.4-5
- [4] Gropp et al 1999-advanced, p.7
- [5] Gropp et al 1999-advanced, pp.5-6
References
- This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
- Aoyama, Yukiya; Nakano, Jun (1999) RS/6000 SP: Practical MPI Programming, ITSO
- Foster, Ian (1995) Designing and Building Parallel Programs (Online) Addison-Wesley ISBN 0201575949, chapter 8 Message Passing Interface
- Using MPI series:
- Gropp, William; Lusk, Ewing; Skjellum, Anthony. (1994) Using MPI: portable parallel programming with the message-passing interface. MIT Press In Scientific And Engineering Computation Series, Cambridge, MA, USA. 307 pp. ISBN 0-262-57104-8
- Gropp, William; Lusk, Ewing; Skjellum, Anthony. (1999) Using MPI, 2nd Edition: portable Parallel Programming with the Message Passing Interface. MIT Press In Scientific And Engineering Computation Series, Cambridge, MA, USA. 395 pp. ISBN 978-0-262-57132-6
- Gropp, William; R Thakur, E Lusk (1999) Using MPI-2: Advanced Features of the Message Passing Interface - MIT Press Cambridge, MA, USA ISBN 0-262-57133-1
- MPI—The Complete Reference series:
- Snir, Marc; Otto, Steve; Huss-Lederman, Steven; Walker, David; Dongarra, Jack (1995) MPI: The Complete Reference. MIT Press Cambridge, MA, USA. ISBN 0-262-69215-5
- M Snir, SW Otto, S Huss-Lederman, DW Walker, J (1998) MPI—The Complete Reference: Volume 1, The MPI Core. MIT Press, Cambridge, MA. ISBN 0-262-69215-5
- Gropp, William; Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir (1998) MPI—The Complete Reference: Volume 2, The MPI-2 Extensions. MIT Press, Cambridge, MA ISBN 9780262571234
- Vanneschi, Marco (1999) Parallel paradigms for scientific computing In Proc. of the European School on Computational Chemistry (1999, Perugia, Italy), number 75 in Lecture Notes in Chemistry, pages 170–183. Springer, 2000.