Exploratie van de ontwerpruimte 2. De Hardware/software-grens Prof. dr. ir. Dirk Stroobandt Academiejaar 2004-2005 Inhoud (deel 1) Inleiding over Ingebedde systemen, System-on-Chip en Platformgebaseerd ontwerp Systeemspecificatietechnieken Exploratie van de ontwerpruimte – Prestatiematen – De hardware/software-grens • • • • Raamwerk voor architectuurexploratie Hoogniveautransformaties Hardware/software-partitionering Exploratietools – Prototypes, emulatie en simulatie 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -2- Ontwerptraject Architectuurexploratie Verificatie Systeemspecificatie Platformontwerp Analoog ontwerp HW en Hardware/software-partitionering SW Hardware-ontwerp Communicatie Softwarecompilatie Logisch ontwerp Componentselectie Simulatie Hoogniveausynthese Interfacesynthese Fysisch ontwerp Testing 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -3- Inhoud (deel 1) Inleiding over Ingebedde systemen, System-on-Chip en Platformgebaseerd ontwerp Systeemspecificatietechnieken Exploratie van de ontwerpruimte – Prestatiematen – De hardware/software-grens • • • • Raamwerk voor architectuurexploratie Hoogniveautransformaties Hardware/software-partitionering Exploratietools – Prototypes, emulatie en simulatie 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -4- Raamwerk voor architectuurexploratie • Gegeven: initiële specificatie – Beschrijving van het systeemgedrag op algoritmeniveau – Een set van vereisten (snelheid, oppervlakte, vermogen, …) 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -5- Raamwerk voor architectuurexploratie • Gevraagd: systeemarchitectuurspecificatie – Definitie van architecturale componenten en interfaces – Eerste partitionering van de algoritmische beschrijving in segmenten die elk door een andere architecturale component geïmplementeerd zullen worden – Een grondplan van de systeemarchitectuur – Budgetten voor tijd, oppervlakte, vermogen voor de architecturale componenten 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -6- Raamwerk voor architectuurexploratie 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -7- Iteratieve aanpak • Geleid door prestatieanalyse 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -8- Hoe vind je heuristisch goede oplossingen? • Types van hardware hulpbronnen – Hulpbronnen inherent aan het algoritme • Functionele eenheden (vermenigvuldigers, optellers, …) • Redelijk goed te schatten op systeemniveau – Hulpbronnen voor implementatie-overhead • Al de rest (controlelogica, bussen, MUXen, registers, draden) • Niet (nauwelijks) te schatten op systeemniveau (zeker niet voor dedicated hardware) • Oplossing: taxonomie op basis van potentieel om specifieke eigenschappen te gebruiken die een minimale implementatie-overhead genereren 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -9- Voorbeelden van eigenschappen met weinig implementatie-overhead • Lokaliteit van berekeningen – Hoge lokaliteit = berekeningen zijn geïsoleerd, hebben een sterk geïnterconnecteerde deelgraaf – Veel meer data tussen knopen binnen de cluster dan naar buiten • Regelmatigheid – Herhaalde berekening van bepaalde functiepatronen – Hergebruik van berekeningselementen • Korte tijdsafhankelijkheden – Liefst lokaliteit in de tijd: vermijden van globale bussen 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -10- Inhoud (deel 1) Inleiding over Ingebedde systemen, System-on-Chip en Platformgebaseerd ontwerp Systeemspecificatietechnieken Exploratie van de ontwerpruimte – Prestatiematen – De hardware/software-grens • • • • Raamwerk voor architectuurexploratie Hoogniveautransformaties Hardware/software-partitionering Exploratietools – Prototypes, emulatie en simulatie 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -11- Voorbereidende taken • Task level concurrency management Which tasks in the final system? • High level transformations Transformations that are outside the scope of traditional compilers 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -12- Task-level concurrency management • Granularity: size of tasks (e.g. in instructions) • Readable specifications and efficient implementations can possibly require different task structures. Granularity changes 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -13- Merging of tasks • Reduced overhead of context switches, • More global optimization of machine code, • Reduced overhead for inter-process/task communication. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -14- Splitting of tasks • No blocking of resources while waiting for input, • More flexibility for scheduling, possibly improved result. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -15- Merging and splitting of tasks • The most appropriate task graph granularity depends upon the context merging and splitting may be required. • Merging and splitting of tasks should be done automatically, depending upon the context. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -16- High-level optimizations • To improve (potentially) the efficiency of embedded software and hardware 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -17- Floating-point to fixed point conversion • Pros – – – – – Lower cost Faster Lower power consumption Sufficient SNR, if properly scaled Suitable for portable applications • Cons – Decreased dynamic range – Finite word-length effect, unless properly scaled • Overflow and excessive quantization noise – Extra programming effort © Ki-Il Kum, et al. (Seoul National University): A Floating-point To Fixed-point C Converter For Fixed-point Digital Signal Processors, 2nd SUIF Workshop, 1996 -18- Fixed-Point Data Format • Floating-Point vs. FixedPoint – exponent, mantissa – Floating-Point • automatic computation and update of each exponent at run-time • Integer vs. Fixed-Point S 1 0 0 . . . 0 0 0 0 1 0 (a) Integer IWL=3 FWL – Fixed-Point • implicit exponent • determined off-line S 1 0 0 . . . 0 0 0 0 1 0 hypothetical binary point (b) Fixed-Point © Ki-Il Kum, et al -19- Assignment and Addition/Subtraction • Assume y = x, with - x (IWL=2) and - y (IWL=3): x x>>1 Let result = x + y: equalizing each IWL x s x>>1 s s s + y y s s result s © Ki-Il Kum, et al -20- Development Procedure Floating-Point C Program Range Estimator FloatingPoint to Fixed-Point C Program Converter Fixed-Point C Program Range Estimation C Program Execution Manual specification IWL information -21-et al © Ki-Il Kum, Performance Comparison - Machine Cycles Cycles 4000 Fourth Order IIR Filter 2980 3000 2000 1000 215 0 Fixed-Point (16b) Floating-Point © Ki-Il Kum, et al -22- Performance Comparison - Machine Cycles Cycles ADPCM 140000 125249 120000 100000 80000 61401 60000 40000 26718 20000 0 Fixed-Point (16b) Fixed-Point (32b) Floating-Point © Ki-Il Kum, et al -23- Performance Comparison - SNR Fixed-Point (16b) Fixed-Point (32b) Floating-Point ADPCM SNR (dB) 25 20 15 10 5 0 A B C D © Ki-Il Kum, et al -24- Fridge • RWTH Aachen, commercialized by Synopsys as part of the CoCentric tool suite. • Used type definition features of C++ to define types Fixed and fixed. • Using types in declarations: fixed a, *b, c[8] • Defining types in assignments: a= fixed(5,4,wt,*b) truncation Word-length Wrap-around Fractional wordlength 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -25- Other work on the topic • Fridge (RWTH Aachen), commercialized by Synopsys • Some support in Simulink (MATLAB toolbox) • .. hundreds of papers on the topic. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -26- Simple loop transformations: Loop permutation • Row major order • Array p[j][k] • Column major order … j=0 j=1 j=2 k=0 j=0 j=1 … k=1 j=0 j=1 … k=2 j=0 j=1 … … 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -27- Simple loop transformations: Loop permutation Two loops, assuming row major order (C): for (k=0; k<=m; k++) for (j=0; j<=n; j++) for (j=0; j<=n; j++) ) for (k=0; k<=m; k++) p[j][k] = ... p[j][k] = ... Poor cache behavior Good cache behavior • For row major order 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -28- Loop fusion, loop fission for(j=0; j<=n; j++) p[j]= ... ; for (j=0; j<=n; j++) , p[j]= p[j] + ... for (j=0; j<=n; j++) {p[j]= ... ; p[j]= p[j] + ...} Loops small enough to allow zero overhead loops Better locality for access to p. Better chances for parallel execution. Which of the two versions is best? 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -29- Loop unrolling for (j=0; j<=n; j++) p[j]= ... ; for (j=0; j<=n; j+=2) {p[j]= ... ; p[j+1]= ...} % factor = 2 Less branches per execution of the loop. More opportunities for optimizations. Tradeoff between code size and improvement. Extreme case: completely unrolled loop (no branch). 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -30- Loop tiling/loop blocking Original version for (i=1; i<=N; i++) for(k=1; k<=N; k++){ r=X[i,k]; /* to be allocated to a register*/ for (j=1; j<=N; j++) Z[i,j] += r* Y[k,j] } % Never reusing information in the cache for Y and Z if N is large or cache is small (2 N³ references for Z and Y + N² references for X). i++ k++ k++ j++ j++ i++ j++ k++ i++ 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -31- Loop tiling/loop blocking tiled version for (kk=1; kk<= N; kk+=B) Reuse factor of B for Z and Y, for (jj=1; jj<= N; jj+=B) O(N³/B) accesses for (i=1; i<= N; i++) to main memory for (k=kk; k<= min(kk+B-1,N); k++){ r=X[i][k]; /* to be allocated to a register*/ for (j=jj; j<= min(jj+B-1, N); j++) Z[i][j] += r* Y[k][j] Same elements for next iteration of i } kk jj k++ j++ k++ jj i++ i++ kk kk i++ j++ jj k++, j++ 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -32- High-level transformations Example: Separation of margin handling many ifstatements for margin-checking 2004-2005 no checking, efficient + only few margin elements to be processed Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -33- Loop nest splitting MPEG-4 full search motion estimation for (z=0; z<20; z++) for (x=0; x<36; x++) {x1=4*x; for (y=0; y<49; y++) {y1=4*y; for (k=0; k<9; k++) {x2=x1+k-4; for (l=0; l<9; ) {y2=y1+l-4; for (i=0; i<4; i++) {x3=x1+i; x4=x2+i; for (j=0; j<4;j++) {y3=y1+j; y4=y2+j; if (x3<0 || 35<x3||y3<0||48<y3) then_block_1; else else_block_1; if (x4<0|| 35<x4||y4<0||48<y4) then_block_2; else else_block_2; }}}}}} if (x>=10||y>=14) for (y=0; y<49; y++) for (k=0; k<9; k++) for (l=0; l<9;l++ ) for (i=0; i<4; i++) for (j=0; j<4;j++) { then_block_1; then_block_2} else {y1=4*y; for (k=0; k<9; k++) {x2=x1+k-4; for (l=0; l<9; ) {y2=y1+l-4; for (i=0; i<4; i++) {x3=x1+i; x4=x2+i; for (j=0; j<4;j++) {y3=y1+j; y4=y2+j; if (0 || 35<x3 ||0 || 48<y3) analysis of polyhedral domains, then-block-1; else else-block-1; selection with genetic algorithm if (x4<0|| 35<x4||y4<0||48<y4) for (z=0; z<20; z++) then_block_2; else else_block_2; for (x=0; x<36; x++) {x1=4*x; }}}}}} for (y=0; y<49; y++) [H. Falk et al., Inf 12, UniDo, 2002] 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -34- Results for loop nest splitting - Execution times 110% Cavity Motion Estimation QSDPCM 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 2004-2005 7 M AR 7 M AR Av er ag e ar m b th m 6x TI C Tr iM ed ia Po w er PC DE C Al ph a IP S M HP Pe nt iu m Su n 0% [H. Falk et al., Inf 12, UniDo, 2002] Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -35- Results for loop nest splitting - Code sizes 200% Cavity Motion Estimation QSDPCM 180% 160% 140% 120% 100% 80% 60% 40% 20% 2004-2005 7 M AR 7 M AR Av er ag e ar m b th m 6x TI C Tr iM ed ia Po w er PC DE C Al ph a IP S M HP Pe nt iu m Su n 0% Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen [Falk, 2002] -36- Array folding • Initial arrays 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -37- Array folding • Unfolded arrays 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -38- Intra-array folding Inter-array folding 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -39- Function inlining: advantages and limitations Advantage: low calling overhead Limitations: • Not all functions are candidates. • Code size explosion. • Requires manual identification using ‘inline’ qualifier. push PC; push b; BRA sq; pull R1; Function sq(c:integer) mul R1,R1,R1; pull R2; return:integer; push R1; begin BRA (R2)+1; Goal: return c*c pull R1; • Controlled code size end; ST R1,a; .... • Automatic identification of a=sq(b); .... suitable functions. .... LD R1,b; MUL R1,R1,R1; ST R1,a .... Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen 2004-2005 -40- # relative to no inlining . Results for GSM speech and channel encoder: #calls, #cycles (TI ‘C62xx) 100 90 80 70 60 50 40 30 20 10 0 100 105 110 115 120 125 130 135 140 145 150 calls L [%] cycles 33% speedup for 25% increase in code size. # of cycles not a monotonically decreasing function of the code size! 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -41- Inline vectors computed by B&B algorithm size limit (%) 100 105 110 115 120 125 130 135 140 145 150 inline vector (functions 1-26) 00000000000000000000000000 00100000001100001110111111 10111001011100001111111111 10110000000001001000111001 10110100101000100110111101 10110000001010000100111101 00110000000010100100111000 10110010001110101110111101 10111011111110101111111111 10110110101010100110111101 10110110000010110110111101 Major changes for each new size limit. Difficult to generate manually. References: • J. Teich, E. Zitzler, S.S. Bhattacharyya. 3D Exploration of Software Schedules for DSP Algorithms, CODES’99 • R. Leupers, P.Marwedel: Function Inlining under Code Size Constraints for Embedded Processors ICCAD, 1999 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -42- Inhoud (deel 1) Inleiding over Ingebedde systemen, System-on-Chip en Platformgebaseerd ontwerp Systeemspecificatietechnieken Exploratie van de ontwerpruimte – Prestatiematen – De hardware/software-grens • • • • Raamwerk voor architectuurexploratie Hoogniveautransformaties Hardware/software-partitionering Exploratietools – Prototypes, emulatie en simulatie 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -43- Hardware/software partitioning No need to consider special purpose hardware in the long run? Maybe correct for fixed functionality, wrong in general, since “By the time MPEG-n can be implemented in software, MPEGn+1 has been invented” [de Man] Functionality to be implemented in software or in hardware? 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -44- Functionality to be implemented in software or in hardware? Decision based on hardware/ software partitioning, a special case of hardware/ software codesign. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -45- Codesign Tool (COOL) as an example of HW/SW partitioning Inputs to COOL: 1. Target technology • Available HW platforms • Multiprocessors OK but all of same type • ASIC: synthesizable (technology library) 2. Design constraints • Throughput, latency, memory size, area 3. Required behavior • Hierarchical task graphs • Behaviour specified in VHDL • Communication edges and timing edges 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -46- Hardware/software codesign: approach Specification Mapping Processor P1 Processor P2 Hardware [Niemann, Hardware/Software Co-Design for Data Flow Dominated Embedded Systems, Kluwer Academic Publishers, 1998 (Comprehensive mathematical model)] 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -47- Steps of the COOL partitioning algorithm (1) 1. Translation of the behavior into an internal graph model 2. Translation of the behavior of each node from VHDL into C 3. Compilation • • • All C programs compiled for the target processor, Computation of the resulting program size, estimation of the resulting execution time (simulation input data might be required) 4. Synthesis of hardware components: leaf node, application-specific hardware is synthesized. High-level synthesis must be sufficiently fast! 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -48- Steps of the COOL partitioning algorithm (2) 5. Flattening of the hierarchy: Granularity used by the designer is maintained. Cost and performance information added to the nodes. Precise information required for partitioning is precomputed 6. Generating and solving a mathematical model of the optimization problem: Integer programming IP model for optimization. Optimal with respect to the cost function (approximates communication time) 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -49- Steps of the COOL partitioning algorithm (3) 7. Iterative improvements: Adjacent nodes mapped to the same hardware component are now merged. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -50- Steps of the COOL partitioning algorithm (4) 8. Interface synthesis: After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -51- Integer programming models Ingredients: 1. Cost function 2. Constraints Cost function Involving linear expressions of integer variables from a set X C a x x i X Constraints:j J : b i, j i i with ai R, x i ℕ N (1) xi c j with bi , j , c j ℝ R (2) x i X Def.: The problem of minimizing (1) subject to the constraints (2) is called an integer programming (IP) problem. If all xi are constrained to be either 0 or 1, the IP problem said to be a 0/1 integer programming problem. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -52- Example C 5 x1 6 x2 4 x3 x1 x 2 x3 2 x1, x 2 , x3 {0,1} C Optimal 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -53- Remarks on integer programming Maximizing the cost function can be done by setting C‘=-C Integer programming is NP-complete. In practice, running times can increase exponentially with the size of the problem, but problems of some thousands of variables can still be solved with commercial solvers, depending on the size and structure of the problem. IP models can be a good starting point for modeling, even if in the end heuristics have to be used solve them. Stroobandt: Ontwerpmethodologie van Complexe Systemen 2004-2005to Dirk -54- An IP model for HW/SW partitioning Notation: Index set I denotes task graph nodes. Index set L denotes task graph node types e.g. square root, DCT or FFT Index set KH denotes hardware component types. e.g. hardware components for the DCT or the FFT. Index set J of hardware component instances Index set KP denotes processors. All processors are assumed to be of the same type 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -55- An IP model for HW/SW partitioning Xi,k: =1 if node vi is mapped to hardware component type k KH and 0 otherwise. Yi,k: =1 if node vi is mapped to processor k KP and 0 otherwise. NYl,k =1 if at least one node of type l is mapped to processor k KP and 0 otherwise. T is a mapping from task graph nodes to their types: T: I L The cost function accumulates the cost of hardware units: C = cost(processors) + cost(memories) + cost(application specific hardware) 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -56- Constraints Operation assignment constraints i I : X kKH i ,k Y i ,k 1 kKP All task graph nodes have to be mapped either in software or in hardware. Variables are assumed to be integers. Additional constraints to guarantee they are either 0 or 1: i I : k KH : X i ,k 1 i I : k KP : Yi ,k 1 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -57- Operation assignment constraints (2) lL, i:T(vi)=cl, k KP: NYl,k Yi,k For all types l of operations and for all nodes i of this type: if i is mapped to some processor k, then that processor must implement the functionality of l. Decision variables must also be 0/1 variables: lL, k KP: NYl,k 1. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -58- Resource & design constraints • k KH, the cost (area) used for components of that type is calculated as the sum of the costs of the components of that type. This cost should not exceed its maximum. • k KP, the cost for associated data storage area should not exceed its maximum. • k KP the cost for storing instructions should not exceed its maximum. • The total cost (k KH) of HW components should not exceed its maximum • The total cost of data memories (k KP) should not exceed its maximum • The total cost instruction memories (k KP) should not exceed its maximum 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -59- Scheduling v1 v2 v3 v4 Processor e3 v5 v6 e4 v7 FIR1 p1 FIR2 ASIC h1 v8 Communication channel c1 v9 v10 v11 2004-2005 FIR2 on h1 p1 c1 ... v ... v 3 4 ... v ... v 7 8 ...e ...e 3 4 or ... v ... v 4 3 or ... v ... v 8 7 or ...e ...e 4 3 t t t Scheduling / precedence constraints • For all nodes vi1 and vi2 that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable bi1,i2 with bi1,i2=1 if vi1 is executed before vi2 and = 0 otherwise. Define constraints of the type (end-time of vi1) (start time of vi2) if bi1,i2=1 and (end-time of vi2) (start time of vi1) if bi1,i2=0 • Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -61- Other constraints Timing constraints These constraints can be used to guarantee that certain time constraints are met. Some less important constraints omitted .. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -62- Example HW types H1, H2 and H3 with costs of 20, 25, and 30. Processors of type P. Tasks T1 to T5. Execution times: T 1 2 3 4 5 2004-2005 H1 20 H2 H3 20 12 12 20 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen P 100 100 10 10 100 -63- Operation assignment constraints (1) T 1 2 3 4 5 H1 20 H2 H3 20 12 12 20 P 100 100 10 10 100 i I : X i ,k kKH Y i ,k 1 kKP X1,1+Y1,1=1 (task 1 mapped to H1 or to P) X2,2+Y2,1=1 X3,3+Y3,1=1 X4,3+Y4,1=1 X5,1+Y5,1=1 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -64- Operation assignment constraints (2) Assume types of tasks are l=1, 2, 3, 3, and 1. lL, i:T(vi)=cl, k KP: NYl,k Yi,k Functionality 3 to be implemented on processor if node 4 is mapped to it. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -65- Other equations Time constraints leading to: Application specific hardware required for time constraints under 100 time units. T 1 2 3 4 5 H1 20 H2 H3 20 12 12 20 P 100 100 10 10 100 Cost function: C=20 #(H1) + 25 #(H2) + 30 # (H3) + cost(processor) + cost(memory) 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -66- Result For a time constraint of 100 time units and cost(P)<cost(H3): T 1 2 3 4 5 H1 20 H2 H3 20 12 12 20 P 100 100 10 10 100 Solution (educated guessing) : T1 H1 T2 H2 T3 P T4 P T5 H1 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -67- Separation of scheduling and partitioning Combined scheduling/partitioning very complex; Heuristic: 1. Compute estimated schedule 2. Perform partitioning for estimated schedule 3. Perform final scheduling 4. If final schedule does not meet time constraint, go to 1 using a reduced overall timing constraint. t t Actual execution time specification approx. execution time Actual execution time New specification approx. execution time 1. Iteration 2004-2005 2. Iteration Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -68- Application example Audio lab (mixer, fader, echo, equalizer, balance units) • slow SPARC processor • 1µ ASIC library • Allowable delay of 22.675 µs (~ 44.1 kHz) SPARC processor ASIC (Compass, 1 µ) External memory Outdated technology; just a proof of concept. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -69- Design space for audio lab Everything in software: Everything in hardware: Lowest cost for given sample rate: 2004-2005 72.9 µs, 0 2 3.06 µs, 457.9x106 2 18.6 µs, 78.4x106 2, Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -70- Final remarks COOL approach: shows that formal model of hardware/SW codesign is beneficial; IP modeling can lead to useful implementation even if optimal result is available only for small designs. Other approaches for HW/SW partitioning: starting with everything mapped to hardware; gradually moving to software as long as timing constraint is met. starting with everything mapped to software; gradually moving to hardware until timing constraint is met. Binary search. 2004-2005 Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen -71-