Exploration of the design space
2. The hardware/software boundary
Prof. dr. ir. Dirk Stroobandt
Academic year 2004-2005
Contents (part 1)
Introduction to embedded systems, System-on-Chip and platform-based design
System specification techniques
Exploration of the design space
– Performance metrics
– The hardware/software boundary
• Framework for architecture exploration
• High-level transformations
• Hardware/software partitioning
• Exploration tools
– Prototypes, emulation and simulation
2004-2005
Dirk Stroobandt: Ontwerpmethodologie van Complexe Systemen
-2-
Design flow
[Figure: design flow diagram with the blocks system specification, architecture exploration, platform design, hardware/software partitioning (HW and SW), analog design, hardware design, communication, software compilation, high-level synthesis, interface synthesis, component selection, logic design, physical design, simulation, testing and verification]
Framework for architecture exploration
• Given: initial specification
– Description of the system behavior at the algorithmic level
– A set of requirements (speed, area, power, …)
Framework for architecture exploration
• Required: system architecture specification
– Definition of architectural components and interfaces
– Initial partitioning of the algorithmic description into segments, each to be implemented by a different architectural component
– A floor plan of the system architecture
– Budgets for time, area and power for the architectural components
Iterative approach
• Guided by performance analysis
How do you find good solutions heuristically?
• Types of hardware resources
– Resources inherent to the algorithm
• Functional units (multipliers, adders, …)
• Can be estimated reasonably well at system level
– Resources for implementation overhead
• Everything else (control logic, buses, MUXes, registers, wires)
• Cannot, or can hardly, be estimated at system level (certainly not for dedicated hardware)
• Solution: a taxonomy based on the potential to exploit specific properties that generate minimal implementation overhead
Examples of properties with little implementation overhead
• Locality of computations
– High locality = computations are isolated and form a strongly interconnected subgraph
– Much more data flows between nodes within the cluster than to the outside
• Regularity
– Repeated computation of certain function patterns
– Reuse of computation elements
• Short temporal dependencies
– Preferably locality in time: avoid global buses
Preparatory tasks
• Task-level concurrency management
Which tasks should the final system contain?
• High-level transformations
Transformations that are outside the scope of traditional compilers
Task-level concurrency management
• Granularity: size of tasks (e.g. in instructions)
• Readable specifications and efficient implementations may require different task structures.
⇒ Granularity changes
Merging of tasks
• Reduced overhead of context switches,
• More global optimization of machine code,
• Reduced overhead for inter-process/task
communication.
Splitting of tasks
• No blocking of resources while waiting for input,
• More flexibility for scheduling, possibly improved
result.
Merging and splitting of tasks
• The most appropriate task graph granularity depends upon the context ⇒ merging and splitting may be required.
• Merging and splitting of tasks should be done automatically, depending upon the context.
High-level optimizations
• To improve (potentially) the efficiency of
embedded software and hardware
Floating-point to fixed-point conversion
• Pros
– Lower cost
– Faster
– Lower power consumption
– Sufficient SNR, if properly scaled
– Suitable for portable applications
• Cons
– Decreased dynamic range
– Finite word-length effects, unless properly scaled
• Overflow and excessive quantization noise
– Extra programming effort
© Ki-Il Kum, et al. (Seoul National University): A Floating-point To Fixed-point C Converter For Fixed-point Digital Signal Processors, 2nd SUIF Workshop, 1996
Fixed-Point Data Format
• Floating-point vs. fixed-point
– exponent, mantissa
– Floating-point
• automatic computation and update of each exponent at run-time
– Fixed-point
• implicit exponent
• determined off-line
• Integer vs. fixed-point
[Figure: (a) integer word: sign bit S followed by the bits 1 0 0 … 0 0 0 0 1 0; (b) fixed-point word with the same bits and a hypothetical binary point after the IWL=3 integer bits, followed by the FWL fractional bits]
© Ki-Il Kum, et al.
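To make the implicit exponent concrete, here is a minimal sketch in C of a 16-bit fixed-point word with one sign bit, IWL integer bits and FWL fractional bits. The helper names `to_fixed` and `from_fixed`, the 16-bit word size and the saturating behavior are assumptions made for illustration; they are not taken from the converter described on the slides.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 16  /* assumed word size: 1 sign bit + IWL + FWL */

/* Convert a double to a 16-bit fixed-point word with the given integer
   word length (IWL). Truncates toward zero and saturates on overflow. */
int16_t to_fixed(double v, int iwl)
{
    int fwl = WORD_BITS - 1 - iwl;            /* fractional word length */
    assert(iwl >= 0 && fwl >= 0);
    double scaled = v * (double)(1L << fwl);  /* apply implicit exponent */
    if (scaled >= (double)INT16_MAX) return INT16_MAX;
    if (scaled <= (double)INT16_MIN) return INT16_MIN;
    return (int16_t)scaled;
}

/* Interpret a fixed-point word with the given IWL as a real value. */
double from_fixed(int16_t q, int iwl)
{
    int fwl = WORD_BITS - 1 - iwl;
    assert(iwl >= 0 && fwl >= 0);
    return (double)q / (double)(1L << fwl);
}
```

With IWL=3 the word has 12 fractional bits, so 2.5 is stored as the integer 2.5 · 2^12. The exponent never appears in the data; it exists only in the `iwl` argument, which would be fixed off-line.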
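Assignment and Addition/Subtraction
• Assignment: assume y = x, with x (IWL=2) and y (IWL=3): x is shifted right by one bit (x>>1) before it is stored in y.
• Addition: let result = x + y: the IWLs are equalized first, so x>>1 is added to y.
[Figure: bit layouts, each with sign bit s, of x, x>>1, y and result]
© Ki-Il Kum, et al.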
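The IWL equalization can be sketched in C as follows. The function name `fixed_add` and its signature are hypothetical. The sketch assumes 16-bit words, an arithmetic right shift for negative values (implementation-defined in C, but the usual behavior) and a result IWL large enough to hold the sum.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 16-bit fixed-point values with different integer word lengths.
   The operand with the smaller IWL is shifted right so that both binary
   points line up; *iwl_out receives the IWL of the result. */
int16_t fixed_add(int16_t x, int iwl_x, int16_t y, int iwl_y, int *iwl_out)
{
    assert(iwl_x >= 0 && iwl_x <= 15 && iwl_y >= 0 && iwl_y <= 15);
    int iwl = iwl_x > iwl_y ? iwl_x : iwl_y;
    int32_t xa = (int32_t)x >> (iwl - iwl_x);  /* equalize IWL of x */
    int32_t ya = (int32_t)y >> (iwl - iwl_y);  /* equalize IWL of y */
    int32_t sum = xa + ya;
    /* Assumption: range estimation chose the IWL so no overflow occurs. */
    assert(sum >= INT16_MIN && sum <= INT16_MAX);
    *iwl_out = iwl;
    return (int16_t)sum;
}
```

For example, x = 1.5 with IWL=2 is stored as 12288 and y = 2.25 with IWL=3 as 9216; after shifting x right by one bit the sum is 15360, which represents 3.75 at IWL=3.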
Development Procedure
[Figure: flow diagram. The range estimator turns the floating-point C program into a range estimation C program; executing that program, together with manual specification, yields the IWL information; the floating-point to fixed-point C program converter uses it to produce the fixed-point C program.]
© Ki-Il Kum, et al.
Performance Comparison: Machine Cycles
[Figure: bar chart of machine cycles (axis 0 to 4000) for a fourth-order IIR filter, comparing Fixed-Point (16b) and Floating-Point; bar values 2980 and 215]
© Ki-Il Kum, et al.
Performance Comparison: Machine Cycles
[Figure: bar chart of machine cycles (axis 0 to 140000) for ADPCM, comparing Fixed-Point (16b), Fixed-Point (32b) and Floating-Point; bar values 125249, 61401 and 26718]
© Ki-Il Kum, et al.
Performance Comparison: SNR
[Figure: bar chart of the SNR (dB, axis 0 to 25) for ADPCM on inputs A, B, C and D, comparing Fixed-Point (16b), Fixed-Point (32b) and Floating-Point]
© Ki-Il Kum, et al.
Fridge
• RWTH Aachen, commercialized by Synopsys as part of the CoCentric tool suite.
• Uses the type definition features of C++ to define the types Fixed and fixed.
• Using types in declarations: fixed a, *b, c[8]
• Defining types in assignments: a = fixed(5,4,wt,*b), where 5 is the word length, 4 the fractional word length, w selects wrap-around and t truncation
Other work on the topic
• Fridge (RWTH Aachen), commercialized by
Synopsys
• Some support in Simulink (MATLAB toolbox)
• … and hundreds of papers on the topic.
Simple loop transformations: Loop permutation
• Array p[j][k]
[Figure: memory layout of p. Row-major order stores the array row by row (j=0, j=1, j=2, …). Column-major order stores it column by column (k=0: j=0, j=1, …; k=1: j=0, j=1, …; k=2: j=0, j=1, …).]
Simple loop transformations: Loop permutation
Two loops, assuming row-major order (C):

for (k=0; k<=m; k++)
  for (j=0; j<=n; j++)
    p[j][k] = ...

Poor cache behavior for row-major order.

for (j=0; j<=n; j++)
  for (k=0; k<=m; k++)
    p[j][k] = ...

Good cache behavior for row-major order.
Loop fusion, loop fission

for (j=0; j<=n; j++)
  p[j] = ... ;
for (j=0; j<=n; j++)
  p[j] = p[j] + ...

Loops small enough to allow zero-overhead loops.

for (j=0; j<=n; j++) {
  p[j] = ... ;
  p[j] = p[j] + ...
}

Better locality for access to p. Better chances for parallel execution.

Which of the two versions is best?
Loop unrolling

for (j=0; j<=n; j++)
  p[j] = ... ;

for (j=0; j<=n; j+=2) {
  p[j] = ... ;
  p[j+1] = ...
} /* unrolling factor = 2; assumes that the iteration count n+1 is even */

• Fewer branches per execution of the loop.
• More opportunities for optimizations.
• Tradeoff between code size and improvement.
• Extreme case: completely unrolled loop (no branch).
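Unrolling by a factor of 2 is only correct as written when the iteration count is even. A minimal sketch of one common way to handle an odd count follows; the function name `fill_unrolled` and the loop body `2*j` are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fill p[0..count-1] with the placeholder value 2*j, using a loop
   unrolled by a factor of 2 plus a clean-up step for an odd count. */
void fill_unrolled(int *p, size_t count)
{
    assert(p != NULL || count == 0);
    size_t j = 0;
    for (; j + 1 < count; j += 2) {   /* one branch per two elements */
        p[j] = (int)(2 * j);
        p[j + 1] = (int)(2 * (j + 1));
    }
    if (j < count)                    /* leftover element, odd count */
        p[j] = (int)(2 * j);
}
```

The clean-up step adds a little code, which is part of the tradeoff between code size and improvement mentioned above.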
Loop tiling/loop blocking
Original version

for (i=1; i<=N; i++)
  for (k=1; k<=N; k++) {
    r = X[i][k]; /* to be allocated to a register */
    for (j=1; j<=N; j++)
      Z[i][j] += r * Y[k][j];
  }

Information in the cache for Y and Z is never reused if N is large or the cache is small (2 N³ references for Z and Y + N² references for X).
[Figure: access patterns of the i, k and j loops over the arrays X, Y and Z]
Loop tiling/loop blocking
Tiled version

for (kk=1; kk<=N; kk+=B)
  for (jj=1; jj<=N; jj+=B)
    for (i=1; i<=N; i++)
      for (k=kk; k<=min(kk+B-1,N); k++) {
        r = X[i][k]; /* to be allocated to a register */
        for (j=jj; j<=min(jj+B-1,N); j++)
          Z[i][j] += r * Y[k][j];
      }

Reuse factor of B for Z and Y: O(N³/B) accesses to main memory. The same elements are accessed again in the next iteration of i.
[Figure: tiles starting at kk and jj within the arrays, traversed by the i, k and j loops]
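The tiled loop nest can be checked against the original one with a small self-contained sketch. The sizes `N` and `TILE`, the 0-based indexing and the function names are assumptions made for illustration; real tile sizes would be chosen to fit the cache.

```c
#include <assert.h>

#define N    8   /* assumed matrix size, small for illustration */
#define TILE 4   /* assumed tile size, called B on the slide */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Original loop nest: Z += X * Y. */
void matmul_plain(int X[N][N], int Y[N][N], int Z[N][N])
{
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            int r = X[i][k];
            for (int j = 0; j < N; j++)
                Z[i][j] += r * Y[k][j];
        }
}

/* Tiled loop nest: same result, but each TILE x TILE block of Y is
   reused for every i before the next block is touched. */
void matmul_tiled(int X[N][N], int Y[N][N], int Z[N][N])
{
    assert(TILE > 0);
    for (int kk = 0; kk < N; kk += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = 0; i < N; i++)
                for (int k = kk; k < min_int(kk + TILE, N); k++) {
                    int r = X[i][k];
                    for (int j = jj; j < min_int(jj + TILE, N); j++)
                        Z[i][j] += r * Y[k][j];
                }
}

/* Returns 1 if both matrices hold the same values. */
int matrices_equal(int A[N][N], int C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (A[i][j] != C[i][j]) return 0;
    return 1;
}
```

Both versions perform the same additions per element of Z in the same order of k, so the results are identical; only the memory access order differs.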
High-level transformations
Example: Separation of margin handling
[Figure: before the transformation, many if-statements for margin checking; after it, a core region processed without checking (efficient) plus only a few margin elements to be processed]
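A minimal sketch of margin separation, assuming a one-dimensional 3-point averaging filter that repeats the border element at the margins. The filter and the function names `smooth_checked` and `smooth_split` are illustrative choices, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Version with margin checks inside every iteration. */
void smooth_checked(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int left  = (i == 0)     ? in[i] : in[i - 1];
        int right = (i == n - 1) ? in[i] : in[i + 1];
        out[i] = (left + in[i] + right) / 3;
    }
}

/* Margins handled separately: the core loop contains no checks and
   only the two border elements need special treatment. */
void smooth_split(const int *in, int *out, size_t n)
{
    assert(n >= 2);
    out[0] = (in[0] + in[0] + in[1]) / 3;              /* left margin */
    for (size_t i = 1; i + 1 < n; i++)                 /* core, no ifs */
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3;
    out[n - 1] = (in[n - 2] + in[n - 1] + in[n - 1]) / 3; /* right margin */
}
```

Both functions compute the same output; the second one removes two conditionals from each of the n-2 core iterations.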
Loop nest splitting
MPEG-4 full search motion estimation

Original:

for (z=0; z<20; z++)
 for (x=0; x<36; x++) {x1=4*x;
  for (y=0; y<49; y++) {y1=4*y;
   for (k=0; k<9; k++) {x2=x1+k-4;
    for (l=0; l<9; l++) {y2=y1+l-4;
     for (i=0; i<4; i++) {x3=x1+i; x4=x2+i;
      for (j=0; j<4; j++) {y3=y1+j; y4=y2+j;
       if (x3<0 || 35<x3 || y3<0 || 48<y3)
        then_block_1; else else_block_1;
       if (x4<0 || 35<x4 || y4<0 || 48<y4)
        then_block_2; else else_block_2;
}}}}}}

After loop nest splitting (analysis of polyhedral domains, selection with genetic algorithm):

for (z=0; z<20; z++)
 for (x=0; x<36; x++) {x1=4*x;
  for (y=0; y<49; y++)
   if (x>=10 || y>=14)
    for (; y<49; y++)
     for (k=0; k<9; k++)
      for (l=0; l<9; l++)
       for (i=0; i<4; i++)
        for (j=0; j<4; j++) {
         then_block_1; then_block_2}
   else {y1=4*y;
    for (k=0; k<9; k++) {x2=x1+k-4;
     for (l=0; l<9; l++) {y2=y1+l-4;
      for (i=0; i<4; i++) {x3=x1+i; x4=x2+i;
       for (j=0; j<4; j++) {y3=y1+j; y4=y2+j;
        if (0 || 35<x3 || 0 || 48<y3)
         then_block_1; else else_block_1;
        if (x4<0 || 35<x4 || y4<0 || 48<y4)
         then_block_2; else else_block_2;
}}}}}}

[H. Falk et al., Inf 12, UniDo, 2002]
Results for loop nest splitting: Execution times
[Figure: bar chart of relative execution times (axis 0% to 110%) for the Cavity, Motion Estimation and QSDPCM benchmarks on Sun, Pentium, HP, MIPS, DEC Alpha, PowerPC, TriMedia, TI C6x, ARM7 thumb, ARM7 arm, and on average]
[H. Falk et al., Inf 12, UniDo, 2002]
Results for loop nest splitting: Code sizes
[Figure: bar chart of relative code sizes (axis 0% to 200%) for the Cavity, Motion Estimation and QSDPCM benchmarks on Sun, Pentium, HP, MIPS, DEC Alpha, PowerPC, TriMedia, TI C6x, ARM7 thumb, ARM7 arm, and on average]
[Falk, 2002]
Array folding
[Figures: initial arrays; unfolded arrays; intra-array folding; inter-array folding]
Function inlining: advantages and limitations
Advantage: low calling overhead
Limitations:
• Not all functions are candidates.
• Code size explosion.
• Requires manual identification using the ‘inline’ qualifier.
Goal:
• Controlled code size
• Automatic identification of suitable functions.

Source code:

Function sq(c:integer) return:integer;
begin
  return c*c
end;
....
a=sq(b);
....

Without inlining, caller:

push PC;
push b;
BRA sq;
pull R1;
ST R1,a;

Without inlining, function sq:

pull R1;
mul R1,R1,R1;
pull R2;
push R1;
BRA (R2)+1;

With inlining:

LD R1,b;
MUL R1,R1,R1;
ST R1,a
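In C, the manual identification mentioned above is typically written with the `inline` qualifier. The sketch below mirrors the `sq` example; the wrapper name `square_of` is hypothetical. Note that `inline` is only a hint, so whether the call is actually expanded depends on the compiler and its settings.

```c
#include <assert.h>
#include <limits.h>

/* Candidate for inlining: small body, called from a hot path. */
static inline int sq(int c)
{
    return c * c;
}

/* Corresponds to "a = sq(b)" on the slide. A compiler that inlines sq
   can replace the call by a single multiplication, avoiding the
   push/branch/pull sequence of a real call. */
int square_of(int b)
{
    /* Assumption: b is small enough that b*b does not overflow. */
    assert(b > -46341 && b < 46341 && INT_MAX >= 2147483647);
    int a = sq(b);
    return a;
}
```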
Results for GSM speech and channel encoder: #calls, #cycles (TI ‘C62xx)
[Figure: number of calls and number of cycles relative to no inlining (axis 0 to 100) versus the code size limit L [%] (100 to 150 in steps of 5)]
33% speedup for 25% increase in code size.
The number of cycles is not a monotonically decreasing function of the code size!
Inline vectors computed by B&B algorithm

size limit (%)   inline vector (functions 1-26)
100              00000000000000000000000000
105              00100000001100001110111111
110              10111001011100001111111111
115              10110000000001001000111001
120              10110100101000100110111101
125              10110000001010000100111101
130              00110000000010100100111000
135              10110010001110101110111101
140              10111011111110101111111111
145              10110110101010100110111101
150              10110110000010110110111101

Major changes for each new size limit. Difficult to generate manually.
References:
• J. Teich, E. Zitzler, S.S. Bhattacharyya: 3D Exploration of Software Schedules for DSP Algorithms, CODES’99
• R. Leupers, P. Marwedel: Function Inlining under Code Size Constraints for Embedded Processors, ICCAD, 1999
Hardware/software partitioning
No need to consider special-purpose hardware in the long run?
Maybe correct for fixed functionality, wrong in general, since
“By the time MPEG-n can be implemented in software, MPEG-n+1 has been invented” [de Man]
Functionality to be implemented in software or in hardware?
Functionality to be implemented in software or in hardware?
Decision based on hardware/software partitioning, a special case of hardware/software codesign.
Codesign Tool (COOL)
as an example of HW/SW partitioning
Inputs to COOL:
1. Target technology
• Available HW platforms
• Multiprocessors OK but all of same type
• ASIC: synthesizable (technology library)
2. Design constraints
• Throughput, latency, memory size, area
3. Required behavior
• Hierarchical task graphs
• Behaviour specified in VHDL
• Communication edges and timing edges
Hardware/software codesign: approach
[Figure: the specification is mapped onto processor P1, processor P2 and hardware]
[Niemann, Hardware/Software Co-Design for Data Flow Dominated Embedded Systems, Kluwer Academic Publishers, 1998 (Comprehensive mathematical model)]
Steps of the COOL partitioning algorithm (1)
1. Translation of the behavior into an internal graph model
2. Translation of the behavior of each node from VHDL into C
3. Compilation
• All C programs compiled for the target processor,
• Computation of the resulting program size,
• Estimation of the resulting execution time (simulation input data might be required)
4. Synthesis of hardware components:
For each leaf node, application-specific hardware is synthesized.
High-level synthesis must be sufficiently fast!
Steps of the COOL partitioning algorithm (2)
5. Flattening of the hierarchy:
Granularity used by the designer is maintained.
Cost and performance information is added to the nodes.
Precise information required for partitioning is precomputed.
6. Generating and solving a mathematical model of the optimization problem:
Integer programming (IP) model for optimization.
Optimal with respect to the cost function (approximates communication time).
Steps of the COOL partitioning algorithm (3)
7. Iterative improvements:
Adjacent nodes mapped to the same hardware component are now merged.
Steps of the COOL partitioning algorithm (4)
8. Interface synthesis:
After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.
Integer programming models
Ingredients:
1. Cost function
2. Constraints
Cost function, involving linear expressions of integer variables from a set X:
$C = \sum_{x_i \in X} a_i x_i$ with $a_i \in \mathbb{R}$, $x_i \in \mathbb{N}$ (1)
Constraints:
$\forall j \in J: \sum_{x_i \in X} b_{i,j} x_i \ge c_j$ with $b_{i,j}, c_j \in \mathbb{R}$ (2)
Def.: The problem of minimizing (1) subject to the constraints (2) is called an integer programming (IP) problem.
If all $x_i$ are constrained to be either 0 or 1, the IP problem is said to be a 0/1 integer programming problem.
Example
$C = 5 x_1 + 6 x_2 + 4 x_3$
$x_1 + x_2 + x_3 \ge 2$
$x_1, x_2, x_3 \in \{0,1\}$

Feasible assignments:
x1  x2  x3   C
0   1   1    10
1   0   1    9   (optimal)
1   1   0    11
1   1   1    15
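For a problem this small, the optimum can be confirmed by enumerating all eight 0/1 assignments. The sketch below does exactly that; the function name `solve_example` and the bit-mask encoding are illustrative choices, and exhaustive enumeration is not how commercial IP solvers work.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Minimize C = 5*x1 + 6*x2 + 4*x3 subject to x1 + x2 + x3 >= 2 with
   xi in {0,1}. Returns the minimal cost; bit i of *best_mask holds the
   value of x(i+1) in the optimal assignment. */
int solve_example(unsigned *best_mask)
{
    static const int a[3] = {5, 6, 4};   /* cost coefficients */
    int best = INT_MAX;
    assert(best_mask != NULL);
    for (unsigned mask = 0; mask < 8u; mask++) {
        int cost = 0, ones = 0;
        for (int i = 0; i < 3; i++)
            if (mask & (1u << i)) { cost += a[i]; ones++; }
        if (ones >= 2 && cost < best) {  /* feasible and cheaper */
            best = cost;
            *best_mask = mask;
        }
    }
    return best;
}
```

The enumeration visits 2^n assignments for n variables, which illustrates why running times can grow exponentially with problem size.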
Remarks on integer programming
Maximizing the cost function can be done by setting C' = -C.
Integer programming is NP-complete.
In practice, running times can increase exponentially with the size of the problem, but problems of some thousands of variables can still be solved with commercial solvers, depending on the size and structure of the problem.
IP models can be a good starting point for modeling, even if in the end heuristics have to be used to solve them.
An IP model for HW/SW partitioning
Notation:
• Index set I denotes task graph nodes.
• Index set L denotes task graph node types, e.g. square root, DCT or FFT.
• Index set KH denotes hardware component types, e.g. hardware components for the DCT or the FFT.
• Index set J denotes hardware component instances.
• Index set KP denotes processors. All processors are assumed to be of the same type.
An IP model for HW/SW partitioning
• $X_{i,k}$: = 1 if node $v_i$ is mapped to hardware component type $k \in KH$ and 0 otherwise.
• $Y_{i,k}$: = 1 if node $v_i$ is mapped to processor $k \in KP$ and 0 otherwise.
• $NY_{l,k}$: = 1 if at least one node of type $l$ is mapped to processor $k \in KP$ and 0 otherwise.
• $T$ is a mapping from task graph nodes to their types: $T: I \to L$
• The cost function accumulates the cost of hardware units:
C = cost(processors) + cost(memories) + cost(application-specific hardware)
Constraints
• Operation assignment constraints
$\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$
All task graph nodes have to be mapped either in software or in hardware.
Variables are assumed to be integers.
Additional constraints to guarantee they are either 0 or 1:
$\forall i \in I: \forall k \in KH: X_{i,k} \le 1$
$\forall i \in I: \forall k \in KP: Y_{i,k} \le 1$
Operation assignment constraints (2)
$\forall l \in L, \forall i: T(v_i) = c_l, \forall k \in KP: NY_{l,k} \ge Y_{i,k}$
For all types l of operations and for all nodes i of this type: if i is mapped to some processor k, then that processor must implement the functionality of l.
Decision variables must also be 0/1 variables:
$\forall l \in L, \forall k \in KP: NY_{l,k} \le 1$
Resource & design constraints
• $\forall k \in KH$, the cost (area) used for components of that type is calculated as the sum of the costs of the components of that type. This cost should not exceed its maximum.
• $\forall k \in KP$, the cost for associated data storage area should not exceed its maximum.
• $\forall k \in KP$, the cost for storing instructions should not exceed its maximum.
• The total cost of HW components (summed over $k \in KH$) should not exceed its maximum.
• The total cost of data memories (summed over $k \in KP$) should not exceed its maximum.
• The total cost of instruction memories (summed over $k \in KP$) should not exceed its maximum.
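Scheduling
[Figure: task graph with nodes v1 to v11 and edges e3 and e4. The FIR1 nodes v3 and v4 are mapped to processor p1, the FIR2 nodes v7 and v8 to the FIR2 unit on ASIC h1, and the edges e3 and e4 to communication channel c1. Timelines show the alternatives on each resource: on p1, v3 before v4 or v4 before v3; on h1, v7 before v8 or v8 before v7; on c1, e3 before e4 or e4 before e3.]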
Scheduling / precedence constraints
• For all nodes $v_{i_1}$ and $v_{i_2}$ that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable $b_{i_1,i_2}$ with
$b_{i_1,i_2} = 1$ if $v_{i_1}$ is executed before $v_{i_2}$ and $b_{i_1,i_2} = 0$ otherwise.
Define constraints of the type
(end time of $v_{i_1}$) $\le$ (start time of $v_{i_2}$) if $b_{i_1,i_2} = 1$ and
(end time of $v_{i_2}$) $\le$ (start time of $v_{i_1}$) if $b_{i_1,i_2} = 0$
• Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph.
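The two conditional constraints are not linear as written. One standard way to express such either/or constraints in an IP model is the big-M technique, shown here as an illustration; the slides do not state that COOL uses exactly this form. The symbols $s_i$ and $e_i$ (start and end time of $v_i$) and the constant $M$ (any value at least as large as the longest possible schedule) are assumptions introduced for this sketch.

```latex
\begin{align}
e_{i_1} &\le s_{i_2} + M \, (1 - b_{i_1,i_2}) \\
e_{i_2} &\le s_{i_1} + M \, b_{i_1,i_2}
\end{align}
```

For $b_{i_1,i_2} = 1$ the first inequality reduces to $e_{i_1} \le s_{i_2}$ while the second is trivially satisfied; for $b_{i_1,i_2} = 0$ the roles are swapped.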
Other constraints
Timing constraints
These constraints can be used to guarantee that
certain time constraints are met.
Some less important constraints are omitted.
Example
HW types H1, H2 and H3 with costs of 20, 25, and 30.
Processors of type P.
Tasks T1 to T5.
Execution times:

T    H1   H2   H3   P
1    20             100
2         20        100
3              12   10
4              12   10
5    20             100
Operation assignment constraints (1)
$\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$
$X_{1,1} + Y_{1,1} = 1$ (task 1 mapped to H1 or to P)
$X_{2,2} + Y_{2,1} = 1$
$X_{3,3} + Y_{3,1} = 1$
$X_{4,3} + Y_{4,1} = 1$
$X_{5,1} + Y_{5,1} = 1$
Operation assignment constraints (2)
Assume types of tasks are l = 1, 2, 3, 3, and 1.
$\forall l \in L, \forall i: T(v_i) = c_l, \forall k \in KP: NY_{l,k} \ge Y_{i,k}$
Functionality 3 is to be implemented on the processor if node 4 is mapped to it.
Other equations
Time constraints leading to: application-specific hardware required for time constraints under 100 time units.
Cost function:
C = 20 #(H1) + 25 #(H2) + 30 #(H3) + cost(processor) + cost(memory)
Result
For a time constraint of 100 time units and cost(P) < cost(H3):
Solution (educated guessing):
T1 → H1
T2 → H2
T3 → P
T4 → P
T5 → H1
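The educated guess can be cross-checked by enumerating all 32 mappings under a deliberately simplified model. The sketch assumes that the tasks are independent, that each resource executes its tasks one after another, that one instance per hardware type suffices, and that memory cost is ignored; none of these assumptions is stated on the slides. The function name `best_partition` and the processor cost passed in by the caller are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NUM_TASKS 5
#define NUM_HW    3

/* Data from the example table (index 0 is T1, type 0 is H1). */
static const int hw_type[NUM_TASKS] = {0, 1, 2, 2, 0};
static const int hw_time[NUM_TASKS] = {20, 20, 12, 12, 20};
static const int sw_time[NUM_TASKS] = {100, 100, 10, 10, 100};
static const int hw_cost[NUM_HW]    = {20, 25, 30};

/* Returns the cheapest total cost that meets the deadline. Bit i of
   *best_mask is 1 if task T(i+1) runs on the processor. */
int best_partition(int cost_p, int deadline, unsigned *best_mask)
{
    int best = INT_MAX;
    assert(best_mask != NULL && cost_p >= 0 && deadline > 0);
    for (unsigned mask = 0; mask < (1u << NUM_TASKS); mask++) {
        int busy_hw[NUM_HW] = {0, 0, 0};
        int busy_p = 0, cost = 0, ok;
        for (int i = 0; i < NUM_TASKS; i++) {
            if (mask & (1u << i)) busy_p += sw_time[i];
            else busy_hw[hw_type[i]] += hw_time[i];
        }
        ok = busy_p <= deadline;
        if (busy_p > 0) cost += cost_p;       /* processor is needed */
        for (int k = 0; k < NUM_HW; k++) {
            if (busy_hw[k] > deadline) ok = 0;
            if (busy_hw[k] > 0) cost += hw_cost[k];
        }
        if (ok && cost < best) { best = cost; *best_mask = mask; }
    }
    return best;
}
```

With a hypothetical processor cost of 28, which is below the cost of H3, this model selects T3 and T4 for the processor and the remaining tasks for H1 and H2, in line with the solution above. With a processor cost above 30, the all-hardware mapping becomes cheaper.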
Separation of scheduling and partitioning
Combined scheduling/partitioning is very complex ⇒ heuristic:
1. Compute estimated schedule
2. Perform partitioning for estimated schedule
3. Perform final scheduling
4. If the final schedule does not meet the time constraint, go to 1 using a reduced overall timing constraint.
[Figure: two timelines. 1st iteration: specification, approximate execution time and actual execution time. 2nd iteration: new specification, approximate execution time and actual execution time.]
Application example
Audio lab (mixer, fader, echo, equalizer, balance units)
• slow SPARC processor
• 1 µ ASIC library
• Allowable delay of 22.675 µs (~ 44.1 kHz)
[Figure: SPARC processor, ASIC (Compass, 1 µ) and external memory]
Outdated technology; just a proof of concept.
Design space for audio lab
Everything in software: 72.9 µs, 0 λ²
Everything in hardware: 3.06 µs, 457.9×10⁶ λ²
Lowest cost for given sample rate: 18.6 µs, 78.4×10⁶ λ²
Final remarks
COOL approach:
• Shows that a formal model of hardware/software codesign is beneficial; IP modeling can lead to useful implementations even if optimal results are available only for small designs.
Other approaches for HW/SW partitioning:
• Starting with everything mapped to hardware; gradually moving to software as long as the timing constraint is met.
• Starting with everything mapped to software; gradually moving to hardware until the timing constraint is met.
• Binary search.
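The software-first approach can be sketched as a greedy loop. The model is an assumption made for illustration: tasks run sequentially, so the total time is the sum of the task times, and each task has its own hardware cost. The `Task` type, the function name `migrate_to_hw` and the selection rule (largest time gain per unit of hardware cost) are hypothetical choices, not a description of a specific published tool.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int sw_time;  /* execution time in software */
    int hw_time;  /* execution time in hardware */
    int hw_cost;  /* hardware cost if moved, assumed > 0 */
    int in_hw;    /* 0 = software, 1 = hardware (set by the function) */
} Task;

/* Start with everything in software and move tasks to hardware until
   the deadline is met. Returns the hardware cost spent, or -1 if the
   deadline cannot be met. */
int migrate_to_hw(Task *t, size_t n, int deadline)
{
    int total = 0, cost = 0;
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) { t[i].in_hw = 0; total += t[i].sw_time; }
    while (total > deadline) {
        size_t pick = n;                       /* n means "none found" */
        for (size_t i = 0; i < n; i++) {
            int gain = t[i].sw_time - t[i].hw_time;
            if (t[i].in_hw || gain <= 0) continue;
            /* compare gain/cost ratios by cross-multiplication */
            if (pick == n ||
                (long)gain * t[pick].hw_cost >
                (long)(t[pick].sw_time - t[pick].hw_time) * t[i].hw_cost)
                pick = i;
        }
        if (pick == n) return -1;              /* nothing left to move */
        t[pick].in_hw = 1;
        total -= t[pick].sw_time - t[pick].hw_time;
        cost += t[pick].hw_cost;
    }
    return cost;
}
```

Unlike the IP model, such a greedy heuristic gives no optimality guarantee, but it scales to designs for which exact solutions are out of reach.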