Entry | Automatic parallelization |
Definition |
Automatic parallelization, also called auto parallelization or autoparallelization (or simply parallelization, when the context implies automation), refers to converting sequential code into multi-threaded code, vectorized code, or both, in order to use multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. The goal of automatic parallelization is to relieve programmers from the tedious and error-prone manual parallelization process.[1] Though the quality of automatic parallelization has improved over the past several decades, fully automatic parallelization of sequential programs by compilers remains a grand challenge, because it requires complex program analysis and because the best approach may depend on factors (such as the input data range) that are unknown at compilation time.[2]

The programming control structures on which autoparallelization focuses most are loops, because, in general, most of a program's execution time takes place inside some form of loop. There are two main approaches to parallelizing loops: pipelined multi-threading and cyclic multi-threading.[3] For example, consider a loop that on each iteration applies a hundred operations and runs for a thousand iterations. This can be thought of as a grid of 100 columns by 1000 rows, a total of 100,000 operations. Cyclic multi-threading assigns each row to a different thread; pipelined multi-threading assigns each column to a different thread.

Automatic parallelization technique

Parse
This is the first stage, in which the scanner reads the input source files to identify all static and extern usages. Each line in the file is checked against pre-defined patterns and segregated into tokens. These tokens are stored in a file that is used later by the grammar engine. The grammar engine checks patterns of tokens that match pre-defined rules to identify variables, loops, control statements, functions, etc. in the code.

Analyze
The analyzer identifies sections of code that can be executed concurrently, using the static information provided by the scanner-parser. It first finds all the functions that are totally independent of each other and marks them as individual tasks; it then determines which tasks have dependencies.

Schedule
The scheduler lists all the tasks and their dependencies on each other in terms of execution and start times. It produces a schedule that is optimal in terms of the number of processors to be used or the total execution time of the application.

Code generation
The scheduler generates a list of all the tasks and the details of the cores on which they will execute, along with the times for which they will execute. The code generator inserts special constructs in the code that are read during execution by the scheduler. These constructs instruct the scheduler on which core a particular task will execute, along with its start and end times.

Cyclic multi-threading
A cyclic multi-threading parallelizing compiler tries to split up a loop so that each iteration can be executed on a separate processor concurrently.
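As a rough, hand-written illustration of the cyclic approach (a sketch of the kind of code such a compiler might emit, not the output of any actual tool), the C program below distributes the iterations of an independent loop over a fixed number of threads in round-robin fashion; the loop body z[i] = x[i] + y[i], the array names, and the thread count NTHREADS are assumptions made for the example.

    #include <pthread.h>
    #include <stdio.h>

    #define N        1000   /* number of iterations ("rows" in the grid analogy) */
    #define NTHREADS 4      /* assumed number of processors */

    static double x[N], y[N], z[N];

    /* Thread t executes iterations t, t + NTHREADS, t + 2*NTHREADS, ...,
       i.e. the rows are dealt out to the threads in round-robin fashion.
       This is only valid because the iterations are independent. */
    static void *worker(void *arg)
    {
        long t = (long)arg;
        for (long i = t; i < N; i += NTHREADS)
            z[i] = x[i] + y[i];    /* loop body with no cross-iteration dependence */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (long i = 0; i < N; i++) { x[i] = (double)i; y[i] = 2.0 * i; }

        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("z[%d] = %g\n", N - 1, z[N - 1]);
        return 0;
    }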
Compiler parallelization analysis
The compiler usually conducts two passes of analysis before actual parallelization in order to determine whether it is safe to parallelize the loop and whether it is worthwhile to do so.

The first pass performs a data dependence analysis of the loop to determine whether each iteration can be executed independently of the others. Data dependence can sometimes be dealt with, but it may incur additional overhead in the form of message passing, synchronization of shared memory, or some other method of processor communication. The second pass attempts to justify the parallelization effort by comparing the theoretical execution time of the code after parallelization to the code's sequential execution time. Somewhat counterintuitively, code does not always benefit from parallel execution: the extra overhead associated with using multiple processors can eat into the potential speedup of parallelized code.

Example
A loop is called DOALL if all of its iterations, in any given invocation, can be executed concurrently. A Fortran loop such as the following is DOALL and can be auto-parallelized by a compiler, because each iteration is independent of the others and the final result of the array (here z) will be correct regardless of the order in which the iterations execute:

    do i = 1, n
      z(i) = x(i) + y(i)
    enddo

There are many pleasingly parallel problems that have such DOALL loops. For example, when rendering a ray-traced movie, each frame of the movie can be rendered independently, and each pixel of a single frame may be rendered independently. On the other hand, a loop such as the following cannot be auto-parallelized as written, because the value computed in each iteration depends on the result of the previous one:

    do i = 2, n
      z(i) = z(i - 1) * 2
    enddo

This does not mean that the code cannot be parallelized. Indeed, the recurrence can be rewritten in a closed form whose iterations are independent, for example z(i) = z(1) * 2**(i - 1). However, current parallelizing compilers are not usually capable of bringing out these parallelisms automatically, and it is questionable whether this code would benefit from parallelization in the first place.

Pipelined multi-threading
Main article: software pipelining
A pipelined multi-threading parallelizing compiler tries to break up the sequence of operations inside a loop into a series of code blocks, such that each code block can be executed on separate processors concurrently. There are many pleasingly parallel problems that have such relatively independent code blocks, in particular systems using pipes and filters.
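The following C sketch illustrates the idea by hand: a loop body is split into two stages, f and g, executed by two threads, with a small bounded queue standing in for the processor-to-processor forwarding that a pipelined multi-threading compiler would insert automatically. The stage functions, queue size, and all names are illustrative assumptions, not taken from any particular compiler.

    #include <pthread.h>
    #include <stdio.h>

    /* A minimal two-stage pipeline: stage 1 computes f(i) for each
       "iteration" i, stage 2 consumes those results and applies g.
       The bounded queue below plays the role of the link between the
       two processors. */

    #define N     16
    #define QSIZE 4

    static int queue[QSIZE];
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    static void push(int v)                   /* forward a value to the next stage */
    {
        pthread_mutex_lock(&lock);
        while (count == QSIZE) pthread_cond_wait(&not_full, &lock);
        queue[tail] = v; tail = (tail + 1) % QSIZE; count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    static int pop(void)                      /* receive a value from the previous stage */
    {
        pthread_mutex_lock(&lock);
        while (count == 0) pthread_cond_wait(&not_empty, &lock);
        int v = queue[head]; head = (head + 1) % QSIZE; count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        return v;
    }

    static void *stage1(void *arg)            /* first code block of the loop body */
    {
        (void)arg;
        for (int i = 0; i < N; i++)
            push(i * i);                      /* f(i): some work on iteration i */
        return NULL;
    }

    static void *stage2(void *arg)            /* second code block, runs concurrently */
    {
        (void)arg;
        for (int i = 0; i < N; i++)
            printf("g(f(%d)) = %d\n", i, pop() + 1);   /* g(x) = x + 1 */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, stage1, NULL);
        pthread_create(&t2, NULL, stage2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }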
For example, when producing live broadcast television, a fixed sequence of tasks must be performed many times a second: roughly, reading a frame of raw pixel data from the image sensor, performing motion compensation on the raw data, entropy-compressing the result, breaking the compressed data into packets, adding error correction and modulating the packets into broadcast signals, and finally sending the signals out through the antenna. A pipelined multi-threading parallelizing compiler could assign each of these operations to a different processor, perhaps arranged in a systolic array, inserting the appropriate code to forward the output of one processor to the next.

Recent research focuses on using the power of GPUs[4] and multicore systems[5] to compute such independent code blocks (or simply independent iterations of a loop) at runtime. The memory accessed (whether directly or indirectly) can be marked for the different iterations of a loop and compared for dependence detection. Using this information, the iterations are grouped into levels such that iterations belonging to the same level are independent of each other and can be executed in parallel.
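A minimal sketch of this level-grouping idea is given below, assuming the read and write locations of each iteration have already been recorded (here in the illustrative arrays r and w); iteration j is placed one level above the deepest earlier iteration it conflicts with, so iterations sharing a level are mutually independent. The dependence test and data are invented for the example and are not taken from the cited papers.

    #include <stdio.h>

    #define N 8

    /* Hypothetical per-iteration access records for a loop "a[w[i]] += a[r[i]]":
       iteration i reads a[r[i]] and writes a[w[i]]. */
    static const int r[N] = {0, 1, 2, 0, 4, 3, 6, 5};
    static const int w[N] = {1, 2, 3, 4, 5, 6, 7, 0};

    /* Iterations i and j conflict if one writes a location the other
       reads or writes (covers read-after-write, write-after-read,
       and write-after-write dependences). */
    static int depends(int i, int j)
    {
        return w[i] == w[j] || w[i] == r[j] || r[i] == w[j];
    }

    int main(void)
    {
        int level[N];

        /* level(j) = 1 + max level of any earlier iteration it depends on;
           all iterations assigned the same level are mutually independent
           and could be launched together as one parallel batch. */
        for (int j = 0; j < N; j++) {
            level[j] = 0;
            for (int i = 0; i < j; i++)
                if (depends(i, j) && level[i] + 1 > level[j])
                    level[j] = level[i] + 1;
        }

        for (int j = 0; j < N; j++)
            printf("iteration %d -> level %d\n", j, level[j]);
        return 0;
    }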
Difficulties
Automatic parallelization by compilers or tools is very difficult, because it requires accurate dependence analysis (which is hard in the presence of features such as pointers and indirect addressing) and because factors such as the input data range and the number of loop iterations are unknown at compile time.[6] Irregular algorithms that use input-dependent indirection further interfere with compile-time analysis and optimization.[7]
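To make the dependence-analysis difficulty concrete, consider the small, invented C example below: whether the loop's iterations are independent depends on the run-time contents of the index array idx, so a compiler that cannot prove the indices distinct must conservatively leave the loop sequential.

    #include <stdio.h>

    /* Illustrative only: if all idx[i] are distinct, the iterations are
       independent; if two iterations share an index, they carry a
       dependence through a[].  The compiler generally cannot know which
       case holds at compile time. */
    void scatter_add(double *a, const double *b, const int *idx, int n)
    {
        for (int i = 0; i < n; i++)
            a[idx[i]] += b[i];     /* indirect write: possible cross-iteration conflict */
    }

    int main(void)
    {
        double a[4] = {0};
        double b[5] = {1, 2, 3, 4, 5};
        int idx[5]  = {0, 1, 2, 1, 3};   /* iterations 1 and 3 both update a[1] */

        scatter_add(a, b, idx, 5);
        for (int i = 0; i < 4; i++) printf("a[%d] = %g\n", i, a[i]);
        return 0;
    }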
Workaround
Due to the inherent difficulties of fully automatic parallelization, several easier approaches exist for producing a high-quality parallel program, such as parallelizing only the most time-consuming sections (hot spots) by hand, or having the programmer guide the compiler with directives or hints so that parallelization becomes semi-automatic rather than fully automatic.
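For instance, with a directive-based model such as OpenMP the programmer, rather than the compiler, asserts that a loop's iterations are independent, and the compiler and runtime handle the threading; a minimal sketch (the loop body and array names are arbitrary) might look as follows.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void)
    {
        static double x[N], y[N], z[N];

        for (int i = 0; i < N; i++) { x[i] = (double)i; y[i] = 2.0 * i; }

        /* The programmer asserts that the iterations are independent; the
           OpenMP runtime then distributes them over the available cores.
           Compile with OpenMP enabled, e.g. "cc -fopenmp". */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            z[i] = x[i] + y[i];

        printf("computed with up to %d threads, z[%d] = %g\n",
               omp_get_max_threads(), N - 1, z[N - 1]);
        return 0;
    }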
Parallelizing compilers and tools
See also: Automatic parallelization tool
Most research compilers for automatic parallelization consider Fortran programs, because Fortran makes stronger guarantees about aliasing than languages such as C. Typical examples include the Polaris and SUIF research compilers.

References
1. Yehezkael, Rafael (2000). "Experiments in Separating Computational Algorithm from Program Distribution and Communication". Lecture Notes in Computer Science. 1947: 268-278.
2. Fox, Geoffrey; Williams, Roy; Messina, Paul (1994). Parallel Computing Works!. Morgan Kaufmann. pp. 575, 593. ISBN 978-1-55860-253-3.
3. Campanoni, Simone; Jones, Timothy; Holloway, Glenn; Wei, Gu-Yeon; Brooks, David (2012). "The HELIX Project: Overview and Directions".
4. Anantpur, J.; Govindarajan, R. "Runtime dependence computation and execution of loops on heterogeneous systems". http://hpc.serc.iisc.ernet.in/~jayvant/papers/CGO-2013.pdf (archived 2015-10-06 at https://web.archive.org/web/20151006123251/http://hpc.serc.iisc.ernet.in/~jayvant/papers/CGO-2013.pdf, retrieved 2015-10-05).
5. Zhuang, X.; Eichenberger, A. E.; Luo, Y.; O'Brien, Kevin; O'Brien, Kathryn. "Exploiting Parallelism with Dependence-Aware Scheduling".
6. "Automatic parallelism and data dependency". http://blitzprog.org/posts/automatic-parallelism-and-data-dependency (archived 2014-07-14 at https://web.archive.org/web/20140714111836/http://blitzprog.org/posts/automatic-parallelism-and-data-dependency).
7. Rünger, Gudula (2006). "Parallel Programming Models for Irregular Algorithms". Parallel Algorithms and Cluster Computing. 52: 3-23. doi:10.1007/3-540-33541-2_1.