Martijn + Bram

Sorting with multicore
SIMD processors
Martijn van den Heuvel – 0547028
Bram Kersten – 0537059
December 2007
Presentation Contents:
•Introduction
•Sorting basics
•Parallel sorting using SIMD
•AA-Sort algorithm
•Overview
•In-core algorithm
•Out-of-core algorithm
•Experimental results
•Paper discussion
•Questions
Sorting with multicore SIMD processors
Martijn van den Heuvel & Bram Kersten
December 2007
Introduction:
AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors (IEEE, September 2007)
•Hiroshi Inoue
•Takao Moriyama
•Hideaki Komatsu
•Toshio Nakatani
IBM Tokyo Research Laboratory
AA-Sort: Aligned-Access sort
Sorting basics:
Sorting is a basic building block of many applications, such as searching and database management systems.
Algorithms:
•Bubblesort
•CombSort
•MergeSort
•QuickSort
Sorting basics:
Bottlenecks in sorting algorithms:
•Branch mispredictions
•Most popular algorithms are not suitable for exploiting SIMD instructions
Parallel sorting using SIMD:
•Divide the data into smaller blocks that fit into a processor's cache.
•Use aligned memory access.
•Use SIMD instructions:
  •Vector compare
  •Vector select
  •Vector permutation
•Merge all blocks.
•No branch mispredictions.
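To illustrate why vector compare and vector select avoid branches, here is a minimal Python sketch that emulates a four-lane SIMD register with a plain list. The helper names `vector_compare`, `vector_select` and `vector_minmax` are hypothetical, chosen for this illustration only. On real SIMD hardware each helper would correspond to roughly one instruction with no conditional jump.

```python
def vector_compare(a, b):
    """Lane-wise a > b; returns a mask with one flag per lane."""
    return [x > y for x, y in zip(a, b)]


def vector_select(mask, if_true, if_false):
    """Lane-wise choice driven by the mask.

    This emulates a hardware select, which picks values per lane
    without any conditional jump.
    """
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def vector_minmax(a, b):
    """Return (lane-wise minimum, lane-wise maximum) of two vectors."""
    mask = vector_compare(a, b)
    lo = vector_select(mask, b, a)  # where a > b, the smaller value is b
    hi = vector_select(mask, a, b)  # where a > b, the larger value is a
    return lo, hi


lo, hi = vector_minmax([5, 2, 7, 4], [3, 1, 0, 6])
print(lo)  # [3, 1, 0, 4]
print(hi)  # [5, 2, 7, 6]
```

The same sequence of operations runs regardless of the data, so there is nothing for a branch predictor to guess.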
AA-Sort overview:
AA-Sort executes three phases:
•Divide all of the data to be sorted into blocks that fit in the cache or the local memory of the processor.
•Sort each block with the in-core sorting algorithm, in parallel across multiple threads, where each thread processes an independent block.
•Merge the sorted blocks with the out-of-core sorting algorithm, again using multiple threads.
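The three phases can be sketched structurally in Python. This is only an outline under stated assumptions: the built-in `sorted` stands in for the in-core algorithm, `heapq.merge` stands in for the out-of-core merge, and `three_phase_sort`, `block_size` and `workers` are hypothetical names (a real block size would be derived from the cache or local memory capacity). Python threads also do not give the parallel speedup the slides are about; the sketch shows the data flow only.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def three_phase_sort(data, block_size=4, workers=2):
    # Phase 1: divide the input into blocks of at most block_size elements.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

    # Phase 2: sort every block independently, one block per task.
    # `sorted` is a stand-in for the in-core SIMD algorithm.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_blocks = list(pool.map(sorted, blocks))

    # Phase 3: merge the sorted blocks into one sorted sequence.
    # `heapq.merge` is a stand-in for the out-of-core merge.
    return list(merge(*sorted_blocks))


print(three_phase_sort([5, 2, 7, 4, 3, 1, 0, 6, 9]))
# [0, 1, 2, 3, 4, 5, 6, 7, 9]
```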
AA-Sort overview:
•The paper assumes processors with 128-bit SIMD registers.
•The data to be sorted are 32-bit integers, so each register holds four values.
AA-Sort in-core algorithm:
•In-vector sorting using SIMD instructions.
•Vector compare
•Vector select
Before: 5 2 7 4 | 3 1 0 6 | 9 …
After:  2 4 5 7 | 0 1 3 6 | 9 …
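The slides do not spell out how the values inside one vector are ordered, only that vector compare and vector select are used. One plausible way to picture it, offered here as an assumption rather than the paper's method, is a fixed five-step sorting network of compare-exchange operations on the four lanes. The names `compare_exchange`, `NETWORK_4` and `sort_in_vector` are hypothetical.

```python
def compare_exchange(v, i, j):
    """Put the smaller of v[i], v[j] in lane i and the larger in lane j."""
    v[i], v[j] = min(v[i], v[j]), max(v[i], v[j])


# A standard 5-comparator sorting network for four elements.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort_in_vector(vec):
    """Return a sorted copy of a four-element vector."""
    v = list(vec)
    for i, j in NETWORK_4:
        compare_exchange(v, i, j)
    return v


data = [5, 2, 7, 4, 3, 1, 0, 6]
vectors = [data[i:i + 4] for i in range(0, len(data), 4)]
print([sort_in_vector(v) for v in vectors])
# [[2, 4, 5, 7], [0, 1, 3, 6]]
```

The output matches the first two vectors in the example above.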
AA-Sort in-core algorithm:
•Transpose the registers.
Before (one register per row):
3 6 10 14
2 4 8 11
0 1 6 13
5 9 12 15
After transpose:
3 2 0 5
6 4 1 9
10 8 6 12
14 11 13 15
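The transposition step can be checked with a few lines of Python, treating each register as a row of a 4×4 matrix. The function name `transpose` is ours; on SIMD hardware this would presumably be built from vector permutation instructions rather than `zip`.

```python
def transpose(vectors):
    """Swap rows and columns of a list of equally long vectors."""
    return [list(column) for column in zip(*vectors)]


registers = [
    [3, 6, 10, 14],
    [2, 4, 8, 11],
    [0, 1, 6, 13],
    [5, 9, 12, 15],
]
print(transpose(registers))
# [[3, 2, 0, 5], [6, 4, 1, 9], [10, 8, 6, 12], [14, 11, 13, 15]]
```

Applying `transpose` twice returns the original layout, which is what the final step of the in-core algorithm relies on.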
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 3, Vector_cmpswap
Before:
3 2 0 5
6 4 1 9
10 8 6 12
14 11 13 15
After:
3 2 0 5
6 4 1 9
10 8 6 12
14 11 13 15
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 3, Vector_cmpswap_skew *3
Before:
3 2 0 5
6 4 1 9
10 8 6 12
14 11 13 15
After:
3 6 4 1
2 10 8 6
0 14 11 13
5 9 12 15
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 2, Vector_cmpswap
Before:
3 6 4 1
2 10 8 6
0 14 11 13
5 9 12 15
After:
0 6 4 1
2 9 8 6
3 14 11 13
5 10 12 15
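The Gap = 2 Vector_cmpswap step can be reproduced with a short Python sketch. We read `Vector_cmpswap` as a lane-wise compare-and-swap between register i and register i + gap: the smaller value of each lane stays in the earlier register, the larger one moves to the later register. The function names `vector_cmpswap` and `cmpswap_pass` are hypothetical, and lists stand in for SIMD registers.

```python
def vector_cmpswap(a, b):
    """Lane-wise: smaller values go to the first result, larger to the second."""
    lo = [min(x, y) for x, y in zip(a, b)]
    hi = [max(x, y) for x, y in zip(a, b)]
    return lo, hi


def cmpswap_pass(vectors, gap):
    """Apply vector_cmpswap to every pair (i, i + gap) that stays in range."""
    v = [list(row) for row in vectors]
    for i in range(len(v) - gap):
        v[i], v[i + gap] = vector_cmpswap(v[i], v[i + gap])
    return v


before = [
    [3, 6, 4, 1],
    [2, 10, 8, 6],
    [0, 14, 11, 13],
    [5, 9, 12, 15],
]
print(cmpswap_pass(before, 2))
# [[0, 6, 4, 1], [2, 9, 8, 6], [3, 14, 11, 13], [5, 10, 12, 15]]
```

The printed registers equal the state shown for the Gap = 2 Vector_cmpswap step.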
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 2, Vector_cmpswap_skew
Before:
0 6 4 1
2 9 8 6
3 14 11 13
5 10 12 15
After:
0 6 14 11
2 9 10 12
3 4 1 13
5 8 6 15
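The skewed variant handles the comparisons that wrap around the end of the register array. In the transposed order, the element that follows lane k of the last register is lane k + 1 of the first register, so a wrapped comparison has to shift by one lane. The sketch below encodes that reading; it is our interpretation of the Gap = 2 Vector_cmpswap_skew step, and the names `vector_cmpswap_skew` and `skew_pass` are hypothetical.

```python
def vector_cmpswap_skew(a, b):
    """Compare a[k] with b[k + 1]; smaller goes to a, larger to b.

    The last lane of a and the first lane of b have no partner
    and are left unchanged.
    """
    a, b = list(a), list(b)
    for k in range(len(a) - 1):
        a[k], b[k + 1] = min(a[k], b[k + 1]), max(a[k], b[k + 1])
    return a, b


def skew_pass(vectors, gap):
    """Apply the skewed compare-swap to the pairs that wrap around."""
    v = [list(row) for row in vectors]
    n = len(v)
    for i in range(n - gap, n):
        j = i + gap - n  # index of the wrapped-around partner register
        v[i], v[j] = vector_cmpswap_skew(v[i], v[j])
    return v


before = [
    [0, 6, 4, 1],
    [2, 9, 8, 6],
    [3, 14, 11, 13],
    [5, 10, 12, 15],
]
print(skew_pass(before, 2))
# [[0, 6, 14, 11], [2, 9, 10, 12], [3, 4, 1, 13], [5, 8, 6, 15]]
```

The printed registers equal the state shown for the Gap = 2 Vector_cmpswap_skew step.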
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 1, Vector_cmpswap
Before:
0 6 14 11
2 9 10 12
3 4 1 13
5 8 6 15
After:
0 6 10 11
2 4 1 12
3 8 6 13
5 9 14 15
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 1, Vector_cmpswap_skew
Before:
0 6 10 11
2 4 1 12
3 8 6 13
5 9 14 15
After:
0 6 10 14
2 4 1 12
3 8 6 13
5 9 11 15
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 1, Vector_cmpswap
Before:
0 6 10 14
2 4 1 12
3 8 6 13
5 9 11 15
After:
0 4 1 12
2 6 6 13
3 8 10 14
5 9 11 15
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 1, Vector_cmpswap_skew
Before:
0 4 1 12
2 6 6 13
3 8 10 14
5 9 11 15
After:
0 5 9 12
2 6 6 13
3 8 10 14
4 1 11 15
AA-Sort in-core algorithm:
•Apply a modified version of combSort to the transposed registers
Gap = 1, Vector_cmpswap*
Before:
0 5 9 12
2 6 6 13
3 8 10 14
4 1 11 15
After:
0 4 8 12
1 5 9 13
2 6 10 14
3 6 11 15
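Putting the steps together, the whole comb sort over the transposed registers can be sketched as follows. This is a scalar emulation under assumptions: the function name `combsort_transposed` is hypothetical, the shrink factor 1.3 is the value commonly used for comb sort rather than one taken from the slides, and the `if` inside the loop stands in for what SIMD code would do with compare and select. Intermediate states need not match every frame above; only the final state is compared.

```python
def combsort_transposed(vectors, shrink=1.3):
    """Sort registers into transposed order (column by column)."""
    v = [list(row) for row in vectors]
    n, lanes = len(v), len(v[0])

    def one_pass(gap):
        swapped = False
        for i in range(n):
            j, shift = i + gap, 0
            if j >= n:
                # Wrapped around: the partner sits one lane further (skew).
                j, shift = j - n, 1
            for k in range(lanes - shift):
                if v[i][k] > v[j][k + shift]:
                    v[i][k], v[j][k + shift] = v[j][k + shift], v[i][k]
                    swapped = True
        return swapped

    gap = int(n / shrink)
    while gap > 1:
        one_pass(gap)
        gap = int(gap / shrink)
    # Finish with gap = 1 passes until a pass performs no swap.
    while one_pass(1):
        pass
    return v


start = [
    [3, 2, 0, 5],
    [6, 4, 1, 9],
    [10, 8, 6, 12],
    [14, 11, 13, 15],
]
for row in combsort_transposed(start):
    print(row)
# [0, 4, 8, 12]
# [1, 5, 9, 13]
# [2, 6, 10, 14]
# [3, 6, 11, 15]
```

For four registers the gap sequence is 3, 2, 1, the same gaps that appear in the walkthrough. Reading the result column by column gives the sorted sequence.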
AA-Sort in-core algorithm:
•Transpose back to original order
Before:
0 4 8 12
1 5 9 13
2 6 10 14
3 6 11 15
After transpose:
0 1 2 3
4 5 6 6
8 9 10 11
12 13 14 15
AA-Sort out-of-core algorithm:
Odd-even merge is implemented with SIMD instructions
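The slide only names the technique, so the following is a scalar sketch of Batcher's odd-even merge network, not the SIMD implementation from the paper. It merges two sorted runs of equal length whose combined length is a power of two. The function names `oddeven_merge_pairs` and `oddeven_merge` are hypothetical.

```python
def oddeven_merge_pairs(lo, hi, r=1):
    """Yield the comparator index pairs of Batcher's odd-even merge.

    The pairs merge the two sorted halves of items[lo..hi] (hi inclusive).
    """
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge_pairs(lo, hi, step)      # even subsequence
        yield from oddeven_merge_pairs(lo + r, hi, step)  # odd subsequence
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge(left, right):
    """Merge two sorted runs of equal power-of-two length."""
    items = list(left) + list(right)
    size = len(items)
    if len(left) != len(right) or size < 2 or size & (size - 1):
        raise ValueError("runs must have equal, power-of-two length")
    for i, j in oddeven_merge_pairs(0, size - 1):
        # Compare-exchange: smaller value to the lower index.
        if items[i] > items[j]:
            items[i], items[j] = items[j], items[i]
    return items


print(oddeven_merge([2, 4, 5, 7], [0, 1, 3, 6]))
# [0, 1, 2, 3, 4, 5, 6, 7]
```

The list of comparator pairs depends only on the input length, never on the values, which is what makes the network a natural fit for compare and select style SIMD code.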
AA-Sort experimental results:
Strong points:
•Good scalability
•Use of data locality
•Data independent
•Tested on up-to-date hardware
•Convincing and clear paper
•Clear use of pseudocode
Weak points:
•GPUTeraSort is optimized for GPUs, so a comparison against it running on a GPU would be nice.
•We don’t expect GPUTeraSort to outperform AA-Sort
•In-core results depend on heuristics (the shrink factor)
Applicability:
•Searching
•Database management systems
•Scientific Applications
•Depth buffer
The future:
•Integration in compilers
•Scalability with even more processor cores
Sources:
•AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors. Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu and Toshio Nakatani. IBM Tokyo Research Laboratory (Sep 2007)
•Using SIMD Registers and Instructions to Enable Instruction-Level Parallelism in Sorting Algorithms. Timothy Furtak, José Nelson Amaral and Robert Niewiadomski. Department of Computing Science, University of Alberta (Jun 2007)
•GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management. Naga K. Govindaraju, Jim Gray, Ritesh Kumar and Dinesh Manocha (Jun 2006)
•Odd-Even mergesort, www.iti.fhflensburg.de/lang/algorithmen/sortieren/networks/oemen.htm (Dec 2007)
Questions:
???