Predictable Design of Embedded Systems using Networked Architectures
Henk Corporaal
www.ics.ele.tue.nl/~heco
ASCI Winterschool on Embedded Systems
Rockanje, March 2006
Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
• Proposed design flow
• Open issues
Note: this lecture is not about a solved problem
ASCI Winterschool 2006
Henk Corporaal
(2)
Outline
• Trends and design problems
  • Embedded systems everywhere
  • Design practice
  • Design complexity
  • Memory wall
• Unpredictability
• Platforms
• Predictable design
• Design flow
• Open issues
Embedded systems everywhere
• Convergence of the 3 Cs: computers, communications and consumer electronics
• The computer enters its 3rd phase: computing power - networking - intelligent processing
• The world is one network: all information and communication available wherever, whenever
We get a smart environment
Design practice: informal system specification
[Figure: system people split the system into tasks and write a paper spec; hardware people (VHDL, Verilog) and software people (C, ASM) work from it, followed by integration.]
Design practice
[Figure: Y-Chart (Gajski-Kuhn) with the axes behavioral specification, structure description and physical realization, and the levels system, algorithm, R/T, logic and circuit.]
• The design flow is a path in the Y-chart
• Down to RT level the flow is largely manual
Design complexity problem
[Figure: complexity (log scale, 10^1 to 10^3) versus year (4 to 16). Process technology grows +58%, HW design productivity +21% and SW productivity +8%; the differences between the curves are the HW gap and the SW gap.]
Hitting the memory wall
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2005). CPU (µProc): 55%/year ("Moore's Law"). DRAM: 7%/year. The processor-memory performance gap grows 50%/year. Source: [Patterson]]
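The growth rates on the memory wall slide can be checked with a few lines of arithmetic. The sketch below assumes both curves start at the same performance; the function name `gap_after` is illustrative only. The two rates imply a gap growth of roughly 45% per year, in the same range as the 50% quoted on the slide.

```python
# Growth rates read from the slide: CPU +55%/year, DRAM +7%/year.
CPU_GROWTH = 1.55
DRAM_GROWTH = 1.07


def gap_after(years: int) -> float:
    """Ratio of CPU to DRAM performance after `years`, both starting at 1."""
    return (CPU_GROWTH / DRAM_GROWTH) ** years


# Yearly growth of the gap implied by the two rates.
yearly_gap_growth = CPU_GROWTH / DRAM_GROWTH - 1.0

if __name__ == "__main__":
    print(f"gap grows {yearly_gap_growth:.1%} per year")
    for years in (5, 10, 20):
        print(f"after {years:2d} years the gap is {gap_after(years):8.1f}x")
```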
Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
• Proposed design flow
• Open issues
Unpredictability at all levels
applications
architectures
DSM VLSI design
Uncertainty increases at all levels
Application: Two forms of unpredictability
[Figure: video processing task graph (video inputs In1 and In2, Txt, NR, HSRC, VSRC, gen, mix, 100Hz, Peak, Matrix and mem blocks) and its resource usage over time.]
• Applications can be data dependent
• Applications may have different scenarios
In addition: a dynamically changing set of applications
Multi-standard modem operation [Philips EVP]
• Several applications have to be activated simultaneously
• Too many combinations for an analysis at design time (non-deterministic events)
[Figure: compute load (25 to 125) over time for a multi-standard modem. Phases shown include initial acquisition, UMTS connected, UMTS connected / WLAN acquisition, inter-system handover, and WLAN connected / UMTS monitoring. The load is built from SCH search, CPICH search, RAKE chip-rate processing, RAKE sym-rate processing, WLAN acquisition and WLAN receiver.]
Architecture unpredictability
Local schedulers (e.g. RR, TDMA, FCFS, LRU, EDLF, FIFO, priority, …):
• OS: task switching, interrupts
• interconnect: busses, bridges, networks
• memory controllers: external memory
• cache strategy
[Figure: CPUs with caches ($), IP blocks, interconnects, a memory arbiter and external memory, with a "cache pollution" annotation.]
What is the global behavior (end-to-end), composed of interacting local solutions?
DSM VLSI Unpredictability
• Global wiring delay becomes dominant over gate delay (timing closure)
[Figure: gate delay vs. wire delay. Gate delay (ps) and wire delay (ps/mm) on a 0 to 400 ps axis, plotted against technology (0.5 down to 0.1 micron).]
DSM VLSI Unpredictability
[Figure: length of the isosynchronous zone as a function of frequency.]
Other DSM problems:
• Clock distribution, skew
• VDD and VSS voltage drop
• Signal integrity, cross-talk
• Variance in process parameters increases
Unpredictability: Design Closure problems
Design closure = a realization meets all requirements, including functionality, speed, power, area, yield, etc., without design iterations
[Figure: application, mapping & scheduling, architecture, placement & routing, FPGA realization / VLSI realization.]
Closure problem at all levels
Unpredictability: Design Closure problems
[Figure: computational requirements (0% to 1200%) versus time, varying by orders of magnitude.]
Mapping with performance guarantees looks impossible!
Solution ingredients:
• Higher abstraction levels
• SW and HW IP reuse / PnP principle
• Standards
• Avoid large design iterations
• Design correct by synthesis
• Avoid worst case resource requirements
How do we achieve all of this?
Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
• Design flow
• Open issues
What is a platform?
Definition:
A platform is a generic, but domain specific
information processing (sub-)system
• Generic means that it is flexible, containing programmable
component(s).
• Platforms are meant to quickly realize your next system
(in a certain domain).
• Single chip?
Platforms, why?
- Reuse
- Short Time-to-Market
- High Quality
• Flexible and programmable
• Large software component
• Standardization
• Optimized for specific domain
• and you do not have to solve this design closure problem!
Platforms separate the design communities !
• SDT: system design technology
• PDT: platform design technology
[Figure: layers Applications, Platform and Enabling technologies; design technology is split into SDT and PDT.]
Platform examples: Digital camera
Sanyo [Okada99]
TI OMAP
Up to 192Mbyte off-chip memory
192Kbyte shared SRAM
8Kb data cache (2-way,
512 lines of 16 bytes)
Write buffer (17 elements)
16Kb (2-way)
16Kb (2-way)
8Kb mem (2x 4K)
64Kb dual port (8x 4K x 16b)
96Kb single port (12x 4k x 16b)
32Kb ROM
SpaceCake (Philips research)
• Homogeneous: set of equal tiles
• Per tile e.g.:
  • n * MIPS
  • m * TriMedia
  • Accelerators
  • k * L2 cache bank
  • Shared memory
  • Cache coherency
  • Big interconnect switch
• Inter tile:
  • Router
  • Message passing
  • Working on inter-tile cache coherence
[Figure: single tile, with switch and L2 cache memory banks.]
IMAGINE Stream Processor (Stanford)
IMAGINE = SIMD of VLIWs
It is controlled by a host processor, which sends it stream instructions (load, store, receive, send, VLIW op, load microcode).
Hybrid FPGAs: Xilinx Virtex 4-Pro
[Figure: Virtex II Pro floorplan (courtesy of Xilinx): GHz IO with up to 16 serial transceivers, PowerPCs, memory blocks & multipliers, reconfigurable logic blocks.]
Fundamental platform design decisions
• Homogeneous versus heterogeneous?
• Bus versus network?
• Shared memory versus message passing?
• QoS support, guarantees built in?
• Generic versus application specific?
• What types of parallelism to support?
  • ILP, DLP, TLP
• Focus on performance, power or cost?
• Memory organisation?
• HW or SW reconfigurable?
And further:
• OS support, middleware?
• Mapping support?
Homogeneous or Heterogeneous
• Homogeneous:
  • replication effect
  • memory dominated anyway
  • solve realization issues once and for all
  • less flexible
Homogeneous or Heterogeneous
• Heterogeneous:
  • more flexible
  • better fit to application domain
  • smaller increments
  • no tile reuse
Homogeneous or Heterogeneous
• Middle-of-the-road approach
  • Flexible tiles
  • Fixed tile structure at top level
[Figure: tiles connected through routers.]
HW or SW reconfigurable?
[Figure: reconfiguration time (1 cycle, loop buffer, context, reset) versus data path granularity (fine to coarse). Labels in the plot: FPGA, spatial mapping, temporal mapping, subword parallelism, VLIW.]
Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
  • Current practice
  • Predictability
  • Architecture consequences
  • Design consequences
• Design flow
• Open issues
How should we design?
• Trajectory, from idea to realization
• Decisions based on models
  • Abstract from implementation details (not all known yet)
  • Relatively cheap to create, validate and simulate
[Figure: during design time, the design problem (idea, concepts, requirements) is turned into a realization by a loop that "steers" the design: generate ideas, construct models, evaluate properties, make design decisions.]
Current practice
Mapping, easy, but...
• Given
  • reference C code for the application, e.g. MPEG-4 Motion Estimation
  • platform: SUPERDUPER-LX50
• Task
  • map application on architecture
• But … wait a moment
[Figure: the idea as a C fragment: a=b*5+d; for (...) {..}]

me@work> CC -o2 mpeg4_me mpeg4_me.c
Thank you for running SUPERDUPER-LX50 compiler.
Your program uses 257321886 bytes memory, 78 Watt, 428798765291 clock cycles
Current design process
[Figure: application and constraints go into a mapping step, followed by an "OK?" check with a "yes" exit and a "no" branch that iterates.]
• Post analysis: check constraints after mapping
  • Simulation based
  • Does it still work for other data?
  • Does it still work when other applications are active?
• Too many iterations
• Easy to program, hard to tune
• Can this be improved?
  • e.g. constraints = input
Predictable design
What is it?
• Being able to reason at a high level about a design (in terms of functional and non-functional properties), and
• Being able to realize this design without time-consuming iterations in the design flow (design closure)
How:
• Predictable architecture
  • Making resources predictable
  • Proper modeling of less predictable elements
• Predictable design flow
  • Compositionality
  • Composability
  • Design time analysis ⇔ Run time analysis
Making architectures predictable
• Getting rid of all unpredictable elements
• Caches?
  • No problem, but the WCET estimate may be big and unacceptable!
  • Software controlled:
    • locked cache lines
    • non-cacheable memory
    • controlled replacement
• Shared memory
• Communication
Making architectures predictable: NoC
Philips AETHEREAL
• The router provides both guaranteed throughput (GT) and best effort (BE) services to communicate with IPs.
• The combination of GT and BE leads to efficient use of bandwidth and a simple programming model.
[Figure: router network (R) connecting IPs through network interfaces.]
Making the NoC predictable:
how to support GT traffic?
Time wheel concept ⇒ control injection traffic at the network interface
[Figure: time wheel with slots 1 to 8, rotating over time.]
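A minimal sketch of what a time wheel buys for GT traffic, assuming a wheel of 8 slots as in the figure. The slot assignment, the connection names "A" and "B" and both helper functions are hypothetical and not taken from the AETHEREAL design. Once slots are reserved, the bandwidth share and the longest distance between two injection opportunities follow from the table alone.

```python
from typing import List, Optional

# Hypothetical slot table of one network interface: 8 slots on the wheel,
# each either reserved for a GT connection (by name) or None (free for BE).
SLOT_TABLE: List[Optional[str]] = ["A", None, "B", None, "A", None, None, "B"]


def guaranteed_share(table, conn):
    """Fraction of the slots (and thus of the bandwidth) reserved for `conn`."""
    return table.count(conn) / len(table)


def worst_case_wait(table, conn):
    """Largest distance, in slots, between two consecutive slots of `conn`.

    The wheel wraps around, so a connection with a single slot has to wait
    a full revolution.
    """
    owned = [i for i, c in enumerate(table) if c == conn]
    if not owned:
        raise ValueError(f"{conn!r} has no slot on the wheel")
    n = len(table)
    gaps = [(owned[(k + 1) % len(owned)] - s - 1) % n + 1
            for k, s in enumerate(owned)]
    return max(gaps)


if __name__ == "__main__":
    for conn in ("A", "B"):
        print(conn, guaranteed_share(SLOT_TABLE, conn),
              worst_case_wait(SLOT_TABLE, conn))
```

Both numbers depend only on the slot table, not on what other connections send, which is what makes the bound usable at design time.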
Making the design flow predictable :
Compositionality
[Figure: the same design with elements a, b, x, y and z, shown at two levels.]
• High level design: P(x,y) if [P(a,b),...] !
• Low level design: P(x,y) if [P(a,b),...] ?
Making the design flow predictable
• Design time
  • Determine upper bounds on time and resources: Pareto curves
  • Scenario discovery: separate your application into parts for which the upper bounds are not too far from the worst case
[Figure: frequency versus load, partitioned into scenarios Sc1, Sc2 and Sc3.]
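The effect of scenario discovery can be illustrated with a small sketch. The frame loads and scenario labels below are invented, and the observed maximum stands in for a real upper bound, which would come from analysis rather than measurement. The point is only that a bound per scenario reserves far less than one global worst case.

```python
from collections import defaultdict

# Hypothetical per-frame loads (in kCycles), each tagged with the scenario
# that a scenario detector assigned to the frame.
FRAMES = [("Sc1", 12), ("Sc1", 15), ("Sc2", 40), ("Sc2", 44),
          ("Sc3", 90), ("Sc1", 14), ("Sc2", 41), ("Sc3", 95)]


def scenario_bounds(frames):
    """Bound on the load per scenario (here simply the observed maximum)."""
    bounds = defaultdict(int)
    for scenario, load in frames:
        bounds[scenario] = max(bounds[scenario], load)
    return dict(bounds)


def reserved_budget(frames, bounds):
    """Total budget reserved when every frame gets its scenario's bound."""
    return sum(bounds[scenario] for scenario, _ in frames)


bounds = scenario_bounds(FRAMES)
global_worst_case = max(bounds.values())
single_bound_budget = global_worst_case * len(FRAMES)
per_scenario_budget = reserved_budget(FRAMES, bounds)

if __name__ == "__main__":
    print("bounds per scenario:", bounds)
    print("one global bound reserves:", single_bound_budget)
    print("per-scenario bounds reserve:", per_scenario_budget)
```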
What do we want ? Design time analysis
Single application
• Reasoning about end-to-end timing constraints (for given resources and quality) = predictability
• Which local arbitration mechanisms are needed?
• How to translate this to the global level?
Example:
• Given: computational resources, bandwidth, buffer size
• Throughput → Pareto curve
[Figure: task graph A1 to A5 mapped on processors P1 to P4, and a Pareto curve of 1/throughput versus cost (resources) with a point (q1,c1).]
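As an illustration of how such a Pareto curve is obtained from a set of evaluated mappings, the sketch below filters out dominated points. The configuration values are made up, and `pareto_front` is an illustrative helper, not part of any tool mentioned in the lecture.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, inverse throughput); lower is better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point, sorted by cost."""
    front: List[Point] = []
    best_inv_tp = float("inf")
    for cost, inv_tp in sorted(set(points)):
        # Sorted by cost, so a point survives only if it improves throughput.
        if inv_tp < best_inv_tp:
            front.append((cost, inv_tp))
            best_inv_tp = inv_tp
    return front


# Hypothetical mappings of one application: (resources, 1/throughput).
CONFIGS = [(2, 9.0), (3, 6.0), (3, 7.5), (4, 6.0), (5, 4.0), (6, 4.5)]

if __name__ == "__main__":
    print(pareto_front(CONFIGS))
```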
Scenarios: MP3
What do we want? Composability
• Multiple applications
• If app. 1 and app. 2 each fit individually, what can be said about the combination?
• Concept of virtual platform
[Figure: tasks A1 to A4 mapped on processors Proc1 and Proc2.]
Predictability: Composability
Can we add Pareto points?
[Figure: quality Q versus cost (resources) for application 1, with point (q1,c1), and for application 2, with point (q2,c2). Is their sum (q1+q2, c1+c2)?]
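If the platform were fully composable, adding Pareto points would be legitimate. The sketch below shows what that addition looks like under exactly that assumption: quality and cost of the two applications simply add up and the applications do not disturb each other. The point values are hypothetical.

```python
from itertools import product


def dominates(p, q):
    """p dominates q if p has at least q's quality at no more cost."""
    return p != q and p[0] >= q[0] and p[1] <= q[1]


def minimize(points):
    """Keep only the non-dominated (quality, cost) points."""
    unique = set(points)
    return sorted(p for p in unique
                  if not any(dominates(q, p) for q in unique))


def combine(app1, app2):
    """Pairwise sum of two Pareto sets, followed by minimization."""
    sums = [(q1 + q2, c1 + c2) for (q1, c1), (q2, c2) in product(app1, app2)]
    return minimize(sums)


# Hypothetical (quality, cost) Pareto points of two applications.
APP1 = [(1, 2), (3, 5)]
APP2 = [(2, 1), (4, 6)]

if __name__ == "__main__":
    print(combine(APP1, APP2))
```

On a platform with shared, unpredictably arbitrated resources the sum underestimates the real cost, which is why the slide ends with a question mark.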
Problem: Predictable Resource utilization?
[Figure: two applications A and B, each a graph of three actors with an execution time of 50, go through mapping & scheduling onto processors P1, P2 and P3.]
Problem – Predictable Resource utilization?
[Figure: applications A and B (three actors each, execution time 50) on processors P1, P2 and P3. Adding ordering dependences (edges) gives a schedule over t0 to t3 with a scheduling conflict and only 50% processor utilization.]
Where is the problem?
• Different orders of actors give different throughput
• The number of possible orderings in the overall graph increases exponentially with the number of actors and individual graphs
• A complete analysis to obtain an optimal order is very difficult
• It is hard to model and analyze different arbitration strategies realistically
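A rough count shows how quickly the number of orderings grows. The sketch assumes that the actors sharing a processor are executed in one fixed order and ignores dependences between actors, so it gives an upper bound rather than an exact figure. The function name is illustrative.

```python
from math import factorial


def static_orders(actors_per_processor):
    """Number of ways to fix an order of the actors on each processor.

    `actors_per_processor` lists how many actors share each processor.
    Dependences between actors are ignored, so this is an upper bound on
    the number of orders an analysis would have to consider.
    """
    total = 1
    for n in actors_per_processor:
        total *= factorial(n)
    return total


if __name__ == "__main__":
    # Two, three and ten applications, each with one actor on each of
    # three processors.
    for apps in (2, 3, 10):
        print(apps, "applications:", static_orders([apps] * 3), "orders")
```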
Problem – Too many possibilities!
[Figure: three applications A, B and C, with actor execution times between 1 and 5.]
So, what is Composability?
• The degree to which we can analyze the applications in isolation:
  • Throughput, latency, resource utilization, deadlock, switching / reconfiguration overhead, etc.
• Design time analysis for the complete system is too expensive and often infeasible
• Each job should be executed as if it had access to its own dedicated resources: virtualization
• Consider applications separately and then reason about the behavior of the overall system
Providing a Bound for Resources
• The arbitration strategy plays an important role in determining the resource requirement
  • A naive strategy leads to over-estimation of resources
  • A worst-case estimate is not always possible
• Need a predictable arbitration mechanism
  • More 'realistic' worst case bounds
  • Handle dynamism in the system
• An overall quality versus resources Pareto curve is needed
Making the design flow predictable: Run-time aspects
• Scalable applications
• QoS management
[Figure: each application (application n, or application n / scenario m) has a local manager; the local managers talk through a QoS protocol to a global manager on the platform.]
Match quality with resources
[Figure: quality⁻¹ versus computational requirements.]
Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
• Design flow
• Open issues
Design flow
[Figure: from the idea (C, requirements spec) via models and spec (Reactive Process Network, POOSL/SystemC, Kahn Process Network (YAPI), BDF, SDF) to the platform, correct by synthesis.]
RPN (Reactive Process Networks):
events and streaming
Event part (Event_in, Event_out):
• Processing of events
• Finite State Machine
• Controlling host-CPU (e.g. ARM)
• RTOS; hard real-time
• 'classical' SW complexity
Streaming part (Stream_in, Stream_out):
• Soft real-time
• Compute intensive
• Special hardware
[Figure: the two parts are connected by "mode" and "status" signals.]
POOSL Modeling Language
• Mathematically defined semantics
  • Allows formal analysis of model properties
• Can formally describe:
  • concurrency
  • synchronous communication
  • timing (delay statements)
  • functionality
[Figure: two communicating processes P1 and P2, with the statement "delay 1;".]
POOSL: Phases of Model Execution
[Figure: along model time, execution alternates between two phases: asynchronous execution of actions within the state space, and synchronous passage of time.]
From Model to Realization
Model:
a()();
sel
  delay d1; b()();
or
  c()(); delay d2;
les;
[Figure: state graph S1 to S6 with transitions a, delay d1, b, c and delay d2.]
Possible (timed) execution traces of the model:
• (S1, t1), (S2, t1), (S3, t1+d1), (S5, t1+d1)
• (S1, t1), (S2, t1), (S4, t1+d2), (S6, t1+d2)
The same traces with worst-case execution times of the actions:
• (S1, t1), (S2, t1+wcet(a)), (S3, t1+d1), (S5, t1+d1+wcet(b))
• (S1, t1), (S2, t1+wcet(a)), (S4, t1+wcet(a)+wcet(c)), (S6, t1+d2)
ε-Hypothesis: property preservation
• If the time deviation between two timed execution traces is less than ε, then, if one trace satisfies a real-time property, that property, weakened up to ε, is preserved in the second one as well.
[Figure: actions a and b on the model time axis (t1, t2, delay d1) and on the physical time axis (t'1 + ε1, t'2 + ε2), with ε1, ε2 < ε.]
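The hypothesis can be made concrete with a small check on two timed traces. This is only a sketch: the traces, the value of ε and the deadline property are invented for illustration, and a trace is simplified to a list of (action, time stamp) pairs.

```python
def time_deviation(model_trace, physical_trace):
    """Largest time stamp difference between two traces of the same actions."""
    if [a for a, _ in model_trace] != [a for a, _ in physical_trace]:
        raise ValueError("traces must contain the same actions in order")
    return max((abs(tm - tp) for (_, tm), (_, tp)
                in zip(model_trace, physical_trace)), default=0.0)


def deadline_holds(trace, action, deadline):
    """Real-time property: `action` occurs no later than `deadline`."""
    return all(t <= deadline for a, t in trace if a == action)


EPSILON = 0.5  # hypothetical bound on the deviation

# Hypothetical timed traces: (action, time stamp).
MODEL = [("a", 1.0), ("b", 3.0)]
PHYSICAL = [("a", 1.2), ("b", 3.4)]

if __name__ == "__main__":
    print("deviation below epsilon:",
          time_deviation(MODEL, PHYSICAL) < EPSILON)
    print("model meets deadline 3.0:", deadline_holds(MODEL, "b", 3.0))
    print("realization meets deadline 3.0 + epsilon:",
          deadline_holds(PHYSICAL, "b", 3.0 + EPSILON))
```

In this example the realization misses the original deadline of 3.0 but meets the deadline weakened by ε, which is the preservation the slide describes.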
Extending SDF
SADF: Scenario Aware Data Flow
• Can deal with dynamism
• Still possible to reason about
  • deadlock,
  • resource utilization,
  • latency and throughput
• Currently implemented in POOSL
SADF example: MPEG-2 Decoder
• Pipelined MPEG-2 decoder for I and P frames
  • VLD and IDCT fire per macro-block
  • MC and RC fire per frame
  • FD (frame detector) models the control part of VLD that determines the frame type
• Image size = 176x144
  • I-frame: 99 macro-blocks, no motion vectors
  • Px-frame: x macro-blocks, motion vectors from VLD to MC, previous frame from RC to MC
  • P0-frame (still video): copy previous frame
• FD model based on occurrence probability of frame types
• Execution time distributions of kernels determined with profiling tool
[Figure: SADF graph with processes FD, VLD, IDCT, MC and RC, connected by channels with rates a, b, c, d and e.]

Rate | I  | P0 | Px
a    | 0  | 0  | 1
b    | 0  | 0  | x
c    | 99 | 1  | x
d    | 1  | 0  | 1
e    | 99 | 0  | x
x = {30, 40, 50, 60, 70, 80, 99}
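The figure of 99 macro-blocks for an I-frame follows from the image size. A short check, assuming the usual MPEG-2 macro-block of 16x16 luminance pixels (the helper name is illustrative):

```python
MACROBLOCK = 16  # MPEG-2 macro-blocks cover 16x16 luminance pixels


def macroblocks_per_frame(width, height, block=MACROBLOCK):
    """Number of macro-blocks in a frame, rounding partial blocks up."""
    cols = -(-width // block)   # ceiling division
    rows = -(-height // block)
    return cols * rows


if __name__ == "__main__":
    print(macroblocks_per_frame(176, 144))
```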
Results for MPEG-2 Decoder
Time unit = 1 kCycle. Accuracy results are based on confidence levels of 0.95; the bound on the relative error is given in parentheses.

Process | Throughput
VLD     | 0.063 (≤ 0.036%)
IDCT    | 0.063 (≤ 0.036%)
MC      | 0.00106 (≤ 0.190%)
RC      | 0.00106 (≤ 0.191%)

Latency between successive firings:
Process | Max. latency | Average latency   | Variance in latency
VLD     | 710          | 15.99 (≤ 0.031%)  | 75.38 (≤ 0.18%)
IDCT    | 698          | 15.99 (≤ 0.031%)  | 56.45 (≤ 4.99%)
MC      | 3305         | 940.3 (≤ 0.017%)  | 2.4·10^5 (≤ 3.46%)
RC      | 2216         | 940.3 (≤ 0.017%)  | 1.5·10^5 (≤ 4.99%)

Channel memory between processes:
Channel      | Maximum occupancy | Time-average occupancy | Time-variance in occupancy
VLD and IDCT | 9                 | 1.910 (≤ 0.064%)       | 0.528 (≤ 1.99%)
IDCT and RC  | 154               | 60.19 (≤ 0.178%)       | 671.8 (≤ 4.55%)
VLD and MC   | 133               | 34.73 (≤ 0.517%)       | 698.4 (≤ 4.39%)
MC and RC    | 1                 | 0.577 (≤ 0.561%)       | 0.244 (≤ 3.27%)
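The throughput and latency tables can be cross-checked against each other: in the long run, throughput should be the inverse of the average latency between successive firings. The sketch below uses the values from the tables; the function names are illustrative, and the 1% tolerance only accounts for the rounding of the reported numbers.

```python
# Values copied from the results tables (time unit = 1 kCycle).
AVERAGE_LATENCY = {"VLD": 15.99, "IDCT": 15.99, "MC": 940.3, "RC": 940.3}
REPORTED_THROUGHPUT = {"VLD": 0.063, "IDCT": 0.063,
                       "MC": 0.00106, "RC": 0.00106}


def throughput_from_latency(avg_latency):
    """Long-run firings per time unit, given the mean time between firings."""
    return 1.0 / avg_latency


def relative_difference(process):
    """Relative gap between derived and reported throughput of a process."""
    derived = throughput_from_latency(AVERAGE_LATENCY[process])
    reported = REPORTED_THROUGHPUT[process]
    return abs(derived - reported) / reported


if __name__ == "__main__":
    for name in AVERAGE_LATENCY:
        print(f"{name}: differs by {relative_difference(name):.2%}")
```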
Design flow
• Run-time
  • Combine Pareto points
    • exploit Pareto algebra
    • QoS management / scalable application
Mapping multiple jobs
Multiple jobs can be active simultaneously.
• When can a second job start?
• Are the requested resources available?
• If not, can the quality level be lowered?
• If not, can other jobs go for a lower quality?
• If yes, independent from other jobs?
• How to give guarantees?
[Figure: resources (up to 100%) versus time for jobs starting at T0, T1 and T2, with reconfiguration in between.]
Combining Pareto points
[Figure: cost versus cycle budget curves for applications 1, 2 and 3, with a total budget of 100 cycles; a budget of 80 is marked.]
• A new thread frame is coming
• A budget of 20 cycles is available
Combining Pareto points
[Figure: the same cost versus cycle budget curves. Giving application 3 the remaining budget of 20 is feasible, but is it optimal?]
Combining Pareto points
[Figure: the same cost versus cycle budget curves. Raising the budget of application 3 from 20 to 40 decreases its cost by Δ2, while the application that gives up budget sees a cost increase of Δ1. Since Δ2 > Δ1, this is a better solution.]
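The trade-off in the three "Combining Pareto points" slides can be replayed with a small exhaustive search. The Pareto points below are invented so that they reproduce the situation on the slides (total budget 100, a feasible split 80 + 20 and a better split 60 + 40). `best_allocation` is an illustrative helper, not the run-time manager of the lecture.

```python
from itertools import product

# Hypothetical Pareto points per application: (cycle budget, cost).
# More budget means lower cost, as on the slides.
APPS = {
    "app1": [(60, 9), (80, 5), (100, 4)],
    "app3": [(20, 12), (40, 6)],
}


def best_allocation(apps, total_budget):
    """Pick one point per application, minimizing total cost within budget.

    Exhaustive search over all combinations: fine for a handful of
    applications, too slow for many. Returns None if nothing fits.
    """
    names = list(apps)
    best = None
    for choice in product(*(apps[n] for n in names)):
        budget = sum(b for b, _ in choice)
        cost = sum(c for _, c in choice)
        if budget <= total_budget and (best is None or cost < best[0]):
            best = (cost, dict(zip(names, choice)))
    return best


if __name__ == "__main__":
    print(best_allocation(APPS, 100))
```

With these numbers, keeping application 1 at 80 cycles and giving application 3 the remaining 20 costs 17 in total. Moving 20 cycles to application 3 raises the cost of application 1 by 4 and lowers that of application 3 by 6, for a total of 15.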
Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
• Design flow
• Open issues
Open issues
• Gap between specification and architecture modeling
• High level modeling
  • use of modeling pattern library
• Incorporate multiple Pareto solutions into DSE
  • Pareto algebra
• Get synthesis correct for
  • control applications including compute intensive tasks
  • mapping to multi-processor
• Managing QoS
  • Scenario detection, merging, prediction and exploitation
  • Runtime resource manager optimizing overall quality
  • Measuring overall quality
Open issues (cont'd)
• Architecture modeling
  • how to deal with local memory (scratch pad / cache)
  • Modeling scheduling and arbitration
  • make things composable!
  • Definition of NAL (run-time services)
• Automatic partitioning
  • e.g., SPRINT tool of IMEC is a good start (C to SystemC)
• VLSI tiling
• … and many more …
e.g. see: Ogras et al., "Key research problems in NoC design: a holistic perspective", CODES+ISSS 2005