Entry | Streaming SIMD Extensions |
Definition |
In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series of central processing units (CPUs), shortly after the appearance of Advanced Micro Devices' (AMD's) 3DNow!. SSE contains 70 new instructions, most of which work on single-precision floating point data. SIMD instructions can greatly increase performance when exactly the same operation is to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.
Intel's first IA-32 SIMD effort was the MMX instruction set. MMX had two main problems: it re-used the existing x87 floating point registers, making the CPU unable to work on floating point and SIMD data at the same time, and it only worked on integers. SSE floating point instructions operate on a new, independent register set, the XMM registers, and SSE adds a few integer instructions that work on MMX registers.
SSE was subsequently expanded by Intel to SSE2, SSE3, SSSE3, and SSE4. Because it supports floating point math, it had wider applications than MMX and became more popular. The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations by using MMX in parallel with SSE operations.
SSE was originally called Katmai New Instructions (KNI), Katmai being the code name for the first Pentium III core revision. During the Katmai project Intel sought to distinguish it from its earlier product line, particularly its flagship Pentium II. It was later renamed Internet Streaming SIMD Extensions (ISSE[1]), then SSE. AMD eventually added support for SSE instructions, starting with its Athlon XP and Duron (Morgan core) processors.

Registers
SSE originally added eight new 128-bit registers known as XMM0 through XMM7. SSE used only a single data type for XMM registers: four 32-bit single-precision floating point numbers.
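SSE2 would later expand the usage of the XMM registers to include two 64-bit double-precision floating point numbers, two 64-bit integers, four 32-bit integers, eight 16-bit short integers, or sixteen 8-bit bytes or characters.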
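As an illustration of the data layouts an XMM register can hold, the following C sketch models one 128-bit register as a union. The type name `xmm_view` and its field names are hypothetical and exist only for this example; real code would normally use the compiler's `__m128`, `__m128d` and `__m128i` types instead.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical model of one 128-bit XMM register. Every member
   occupies the same 16 bytes; only the interpretation differs. */
typedef union {
    float   f32[4];   /* SSE:  four single-precision floats */
    double  f64[2];   /* SSE2: two double-precision floats  */
    int64_t i64[2];   /* SSE2: two 64-bit integers          */
    int32_t i32[4];   /* SSE2: four 32-bit integers         */
    int16_t i16[8];   /* SSE2: eight 16-bit integers        */
    int8_t  i8[16];   /* SSE2: sixteen 8-bit integers       */
} xmm_view;

/* All views must add up to exactly 128 bits. */
static_assert(sizeof(xmm_view) == 16, "an XMM register is 128 bits wide");
```

The union only shows how the same 128 bits are partitioned; it does not by itself cause the compiler to emit SSE instructions.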
Because these 128-bit registers are additional machine state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, the extended pair of instructions that save and restore the x87 and SSE register state at once.
The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the floating point unit (FPU).[1] While a compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating point operations to be mixed without the performance hit from explicit MMX/floating point mode switching.

SSE instructions
SSE introduced both scalar and packed floating point instructions.
Floating point instructions
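To make the scalar/packed distinction concrete, here is a minimal sketch using the C intrinsics from `<xmmintrin.h>`, which GCC, Clang and MSVC provide for SSE. The function names `packed_add4` and `scalar_add_low` are made up for this example. The sketch assumes the compiler defines `__SSE__` when SSE code generation is enabled (GCC and Clang do so on x86-64); otherwise it falls back to plain C with the same results.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, _mm_add_ss */
#endif

/* Packed add: all four lanes are summed (ADDPS when SSE is available). */
void packed_add4(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* out[i] = a[i] + b[i] */
#else
    for (size_t i = 0; i < 4; ++i)          /* portable fallback */
        out[i] = a[i] + b[i];
#endif
}

/* Scalar add: only lane 0 is summed; lanes 1..3 are copied from a
   (ADDSS when SSE is available). */
void scalar_add_low(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
#else
    out[0] = a[0] + b[0];                   /* portable fallback */
    for (size_t i = 1; i < 4; ++i)
        out[i] = a[i];
#endif
}
```

The unaligned load and store intrinsics are used so that callers need not align their arrays to 16 bytes; the aligned variants (`_mm_load_ps`, `_mm_store_ps`) correspond to MOVAPS and require that alignment.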
Integer instructions
Other instructions
Example
The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. Adding two single-precision, four-component vectors using scalar x86 code requires four floating-point addition instructions.
    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;
This corresponds to four x86 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128-bit 'packed-add' instruction can replace the four scalar addition instructions.
    movaps xmm0, [v1]       ; xmm0 = v1.w | v1.z | v1.y | v1.x
    addps  xmm0, [v2]       ; xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps [vec_res], xmm0  ; vec_res = xmm0

Later versions

Software and hardware issues
With all x86 instruction set extensions, it is up to the BIOS, operating system and application programmer to test and detect their existence and proper operation.
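As a sketch of how an application might perform such a test, the following C code reads the CPUID feature bits for SSE, SSE2 and SSE3. It assumes GCC or Clang on an x86 target, where `<cpuid.h>` provides `__get_cpuid`; on any other compiler or architecture it simply reports no support. The type `sse_support` and the functions `detect_sse` and `meets_baseline` are hypothetical names chosen for this example. The check covers only what the CPU advertises; it does not verify that the operating system has enabled the XMM register state.

```c
#include <assert.h>
#include <stdbool.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>            /* GCC/Clang wrapper around the CPUID instruction */
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical result type; the names are illustrative only. */
typedef struct {
    bool sse;
    bool sse2;
    bool sse3;
} sse_support;

/* Reads the CPUID leaf 1 feature bits. Reports everything as unsupported
   when CPUID is unavailable (non-x86 target or unknown compiler). */
sse_support detect_sse(void)
{
    sse_support s = { false, false, false };
#ifdef HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        s.sse  = ((edx >> 25) & 1u) != 0;   /* EDX bit 25: SSE  */
        s.sse2 = ((edx >> 26) & 1u) != 0;   /* EDX bit 26: SSE2 */
        s.sse3 = ((ecx >> 0) & 1u) != 0;    /* ECX bit 0:  SSE3 */
    }
#endif
    return s;
}

/* Returns true if the CPU offers at least the required level
   (1 = SSE, 2 = SSE2, 3 = SSE3). */
bool meets_baseline(sse_support s, int required_level)
{
    assert(required_level >= 1 && required_level <= 3);
    if (required_level == 3) return s.sse3;
    if (required_level == 2) return s.sse2;
    return s.sse;
}
```

A program could call `detect_sse()` once at startup and either refuse to run when `meets_baseline` fails or select between an SSE code path and a plain scalar one.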
User application uptake of the x86 extensions has been slow, with even bare-minimum baseline MMX and SSE support in some cases still absent from applications some 10 years after these extensions became commonly available. Distributed computing has accelerated the use of these extensions in the scientific community, and many scientific applications refuse to run unless the CPU supports SSE2 or SSE3. Shipping multiple builds of an application, one for each set of extensions, is the simplest way around the x86 extension optimization problem. Software libraries and some applications have begun to support multiple extension types, hinting that full use of available x86 instructions may finally become common some 5 to 15 years after the instructions were initially introduced.

Identifying
Processor ID applications

References
1. Diefendorff, Keith (March 8, 1999). "Pentium III = Pentium II + SSE: Internet SSE Architecture Boosts Multimedia Performance". Microprocessor Report, Volume 13, Number 3. http://docencia.ac.upc.edu/ETSETB/SEGPAR/microprocessors/pentium3%20(mpr).pdf. Retrieved September 1, 2017.
2. Vance, Ashlee (August 3, 2007). "AMD plots single thread boost with x86 extensions". The Register. https://www.theregister.co.uk/2007/08/30/amd_sse5/. Retrieved August 24, 2017.
3. "AMD64 Technology: 128-Bit SSE5 Instruction Set". AMD. August 2007. http://developer.amd.com/wordpress/media/2012/10/AMD64_128_Bit_SSE5_Instrs.pdf. Retrieved August 24, 2017.
4. "AMD64 Technology AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions". AMD. November 2009. https://support.amd.com/TechDocs/43479.pdf. Retrieved August 24, 2017.
5. Girkar, Milind (October 1, 2013). "Intel® Advanced Vector Extensions (Intel® AVX)". Intel. https://software.intel.com/en-us/isa-extensions/intel-avx. Retrieved August 24, 2017.
6. "Download the Intel® Processor Identification Utility". Intel. July 24, 2017. https://www.intel.com/content/www/us/en/support/processors/000005651.html. Retrieved August 24, 2017.
Categories (2): SIMD computing | X86 instructions |