Timing Verification of Real-Time Systems: A Window is Closing
Reinhard Wilhelm, Universität des Saarlandes
DS Adaptive Isolation for Predictability and Security

My Message
• Hard real-time embedded systems will stay with us, in increasing numbers, e.g. in autonomous driving.
• Timing verification for high-performance platforms has been possible and has been practiced from roughly 2001 until today.
• New architectural developments are making it impossible.
• The problem remains; the potential to solve it disappears.
• What is the alternative?

Structure of the Talk
• The problem: determining bounds on execution times
• Increasing complexity through architectural developments
• Predictability research: where is our impact?
• The PROMPT vision

Deriving Run-Time Guarantees for Hard Real-Time Systems
The simplest problem statement. Given
1. an uninterrupted, terminating piece of software that produces a reaction,
2. a (single-core) hardware platform on which to execute the software,
3. a required reaction time,
derive a guarantee for timeliness.
Complexity is increased by preemptive scheduling and by more complex architectures, e.g. multi-core platforms.
Goal: efficiently and precisely predictable, good worst-case performance.

Timing Analysis
• Sound methods determine upper bounds for all execution times.
• This can be seen as the search for a longest path
  – through different types of graphs,
  – through a huge space of paths.
1. I will show how this huge state space originates.
2. I will show how, and how far, we can cope with this huge state space.
Decidability is not the problem. It's complexity!

Timing Analysis – the Search Space (historic)
• All control-flow paths (through the binary executable), depending on the possible inputs.
• Feasible as a search for a longest path if
  – iteration and recursion are bounded,
  – execution times of instructions are (positive) constants.
• Timing schema (Shaw ’91) for induction over the structure of the program
[Figure: Input, Software, Architecture (constant execution times)]

High-Performance Microprocessors
• increase (average-case) performance by using caches, pipelines, branch prediction, and speculation.
• These features make timing analysis difficult: execution times of instructions vary widely.
  – Best case: everything goes smoothly; no cache miss, operands ready, resources free, branch correctly predicted.
  – Worst case: everything goes wrong; all loads miss the cache, resources are occupied, operands are not ready.
  – The span may be several hundred cycles.

Variability of Execution Times
  x = a + b;
    LOAD r2, _a
    LOAD r1, _b
    ADD  r3,r2,r1
[Figure: execution time of this sequence on the PPC 755 in clock cycles (axis 0 to 350), best case vs. worst case]
In most cases, execution will be fast. So assuming the worst case is safe, but very pessimistic!

State-dependent Execution Times
• The execution time of an instruction is a function of the execution state, so timing schemata are no longer applicable.
• The execution state results from the execution history.
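The contrast between a timing schema and state-dependent execution times can be sketched with a toy model. The Python below is illustrative only. The cycle counts `HIT`, `MISS` and `ALU`, the two-entry LRU cache, and the node types `Load`, `Add`, `Seq` and `Loop` are hypothetical and do not model the PPC 755. `schema_bound` applies a timing schema by induction over the program structure and charges every load the miss latency. A conditional, not modeled here, would take the maximum of its branches. `run` executes the same program against an explicit cache state.

```python
"""Toy contrast: timing schema vs. state-dependent execution times.

All cycle counts and the cache model are hypothetical illustration values.
"""
from dataclasses import dataclass
from typing import List, Union

HIT, MISS, ALU = 1, 40, 1  # hypothetical latencies in cycles


@dataclass
class Load:
    addr: str  # symbolic memory block read by the load


@dataclass
class Add:
    pass


@dataclass
class Seq:
    parts: list  # sub-programs executed one after another


@dataclass
class Loop:
    bound: int    # iteration bound; needed for a finite longest path
    body: object  # loop body (any node)


Node = Union[Load, Add, Seq, Loop]


def schema_bound(n: Node) -> int:
    """Timing schema: induction over the program structure with constant
    costs. Every load is charged MISS, which is safe but pessimistic."""
    if isinstance(n, Load):
        return MISS
    if isinstance(n, Add):
        return ALU
    if isinstance(n, Seq):
        return sum(schema_bound(p) for p in n.parts)
    return n.bound * schema_bound(n.body)


def run(n: Node, cache: List[str], ways: int = 2) -> int:
    """Concrete execution time with a tiny fully associative LRU cache.
    `cache` lists blocks from most to least recently used; it is the
    execution state and is updated in place as the program runs."""
    if isinstance(n, Load):
        hit = n.addr in cache
        if hit:
            cache.remove(n.addr)
        cache.insert(0, n.addr)  # accessed block becomes most recently used
        del cache[ways:]         # evict blocks beyond the associativity
        return HIT if hit else MISS
    if isinstance(n, Add):
        return ALU
    if isinstance(n, Seq):
        return sum(run(p, cache, ways) for p in n.parts)
    return sum(run(n.body, cache, ways) for _ in range(n.bound))


# x = a + b, executed 10 times
prog = Loop(10, Seq([Load("a"), Load("b"), Add()]))
print(schema_bound(prog))     # 810
print(run(prog, []))          # 108: cold cache, only the first two loads miss
print(run(prog, ["a", "b"]))  # 30: warm cache, every load hits
```

In the toy, the schema bound of 810 cycles is safe for every initial state. The concrete runs take 108 cycles from an empty cache and 30 cycles from a warm one. Each load's cost is a function of the cache state, and the cache state is determined by the execution history.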
[Figure: state semantics, where the state is the values of variables, vs. execution state, which is the occupancy of resources]

Timing Analysis – the Search Space with State-dependent Execution Times
• all control-flow paths, depending on the possible inputs
• all paths through the architecture for potential initial states
[Figure: execution states for paths reaching the program point of mul rD, rA, rB, branching on instruction in / not in I-cache, bus occupied / not occupied, small / large operands; annotated costs 1, 4 and ≥ 40]

Timing Analysis – the Search Space with Out-of-Order Execution
• all control-flow paths, depending on the possible inputs
• all paths through the architecture for potential initial states
• including different schedules for instruction sequences

Timing Analysis – the Search Space with Multi-Threading
• all control-flow paths, depending on the possible inputs
• all paths through the architecture for potential initial states
• including different schedules for instruction sequences
• including different interleavings of accesses to shared resources

Timing Accidents and Penalties
Timing accident: a cause for an increase of the execution time of an instruction.
Timing penalty: the associated increase.
• Types of timing accidents
  – cache misses
  – pipeline stalls
  – branch mispredictions
  – bus collisions
  – memory refresh of DRAM
  – TLB misses

Our Approach
• Static analysis of programs for their behavior on the execution platform
• computes invariants about the set of all potential execution states at all program points;
• the execution states result from the execution history;
• static analysis explores all execution histories.

Deriving Run-Time Guarantees
• Our method and tool derive safety properties from these invariants: certain timing accidents will never
  happen. Example: at program point p, instruction fetch will never cause a cache miss.
• The more accidents are excluded, the lower the upper bound.
[Figure: variance of execution times from fastest to slowest; "Murphy’s invariant"]

Architectural Complexity implies Analysis Complexity
Every hardware component whose state has an influence on the timing behavior
• must be conservatively modeled,
• contributes to the size of the search space, most of the time exponentially in some architectural parameters.
• Exception: caches
  – some have good abstractions providing for highly precise analyses (LRU), cf. the dissertation of J. Reineke;
  – some have abstractions with compact representations, but not so precise analyses.

Recipes for Success
• Abstraction: identify abstract domains that are
  – precise and
  – efficient.
• Decomposition: separate different aspects of the semantics and use precomputation.

Abstraction and Decomposition
Components with domains of states $C_1, C_2, \dots, C_k$.
The analysis has to track the domain $C_1 \times C_2 \times \dots \times C_k$.
Start with the powerset domain $2^{C_1 \times C_2 \times \dots \times C_k}$.
• Find an abstract domain $C_1^\#$ and transform into $C_1^\# \times 2^{C_2 \times \dots \times C_k}$. This has worked for caches and cache-like devices.
• Find abstractions $C_{11}^\#$ and $C_{12}^\#$, factor out $C_{11}^\#$, and transform the rest into $2^{C_{12}^\# \times C_2 \times \dots \times C_k}$. This has worked for the arithmetic of the pipeline.
[Figure: program → value analysis ($C_{11}^\#$) → program with annotations → microarchitectural analysis ($2^{C_{12}^\# \times \dots \times C_k}$)]

Analyzability
• M. Lv, N. Guan, J. Reineke, R. Wilhelm, W. Yi: A Survey on Static Cache Analysis for Real-Time Systems. LITES 3(1): 05:1-05:48 (2016) explains several different abstract domains for cache analysis.
• S. Hahn, J. Reineke, R. Wilhelm: Toward Compact Abstractions for Processor Pipelines.
  Correct System Design 2015: 205-220 shows how to obtain a compact domain for pipeline analysis and how to get rid of timing anomalies.

State Space Explosion in Timing Analysis
[Figure: years (~1995, ~2000, 2010+) vs. methods (timing schemata, static analysis, ???) for growing sources of complexity: constant execution times, state-dependent execution times, out-of-order execution, preemptive scheduling, concurrency + shared resources]

ARM Cortex R5F: an Architecture for Real-Time?
The ARM Cortex R5F processor “provides a high-performance solution for real-time applications” and provides “simplified certification effort with the optional Safety Documentation Package for standards such as ISO 26262 and IEC 61508, and enable higher levels of certification to be obtained”, according to ARM.

Cortex-R5F Predictability Issues
Features of the design that limit the predictability of the R5F-based TMS570 in real-time systems:
• Random replacement caches + L2 memories
  – average performance better than the cache-less TCMs on earlier Cortex-R4F-based TMS570 variants
  – 0-cycle hit vs.
  high cache miss latency leads to run-time variability
  – static predictability is reduced from that of a 4-way to that of a 1-way associative cache (the replaced way is random)
  – locking one or multiple ways is not supported, so critical code or data regions cannot be guaranteed to hit in the cache
• Branch prediction
  – complex branch outcome and loop prediction
  – can be switched to static prediction
• Decoupled writes
  – 2-entry Store Queue (SQ), 4-entry Store Buffer (STB)
  – the STB can delay a single write by 64 up to 128 cycles
  – the STB can merge multiple writes into a single access
  – the L2 memory interface can merge multiple accesses from the STB into a burst
  – the SQ buffers writes if the STB is full
• L2 AXI master port
  – up to 7 outstanding reads, up to 4 outstanding writes
  – out-of-order access handling
  – runs at a slower bus clock speed, which is synchronous to the CPU core clock

What is the alternative to sound static timing analysis?
• Measurement-based methods
  – have soundness problems,
  – don't get the necessary trace data off the platform.

Taking Constructive Influence: the PROMPT Approach
• Multi-core implementations of many embedded systems require mapping applications to cores; this is one point of attack.
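The soundness problem of measurement-based methods can be illustrated in a few lines. In the hypothetical toy program below, the execution time depends on the input `x` and on the initial execution state, namely whether memory blocks `a` and `b` are cached. The latencies, the input range, and the test campaign are made-up values. A measurement campaign observes only the executions it happens to trigger. The largest observed time is therefore a lower bound on the worst case, not an upper bound.

```python
"""Toy illustration: maximum observed time vs. worst-case execution time.

The program, its inputs, and all latencies are hypothetical.
"""
from itertools import product

HIT, MISS, ALU = 1, 40, 1  # hypothetical latencies in cycles


def exec_time(x: int, a_cached: bool, b_cached: bool) -> int:
    """Execution time as a function of the input x and of the initial
    execution state (whether blocks a and b are in the cache)."""
    t = HIT if a_cached else MISS          # load a
    if x < 0:                              # rare input: additional load of b
        t += HIT if b_cached else MISS
    return t + ALU                         # final arithmetic instruction


INPUTS = range(-8, 8)
STATES = list(product([False, True], repeat=2))  # (a_cached, b_cached)

# Measurement campaign: typical (non-negative) inputs, caches already warm.
observed = max(exec_time(x, True, True) for x in INPUTS if x >= 0)

# Exhaustive exploration of all inputs and initial states (feasible only in a toy).
true_wcet = max(exec_time(x, a, b) for x in INPUTS for a, b in STATES)

print(observed, true_wcet)  # 2 81
```

The campaign never exercises the rare negative input together with a cold cache, so it reports 2 cycles against a true worst case of 81. Exhaustive enumeration works here only because the toy state space is tiny. For real programs and processors, this is the search-space problem described earlier, which static analysis addresses with abstraction.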
Traditional System Design Process
[Figure: selection of the execution platform; one application as a set of tasks → software development → timing analysis → schedulability analysis (yes/no)]

System Design Process with Integration of Applications
[Figure: design of a distributed execution platform; several applications as sets of sets of tasks, each with its own software development and timing analysis → integration: mapping and schedulability (yes/no)]

Application Domains I
• Architectures for safety- and time-critical avionics and automotive systems
• System characteristics:
  – combination of control loops and finite-state control
  – each control loop fully contained in one application
  – little shared code
  – global (finite) state partly shared between applications
  – state transitions influence control parameters; control loops trigger state transitions
  – reading from and writing to shared state happens only at the beginning and at the end of task activations
  – some applications require high performance, but share little with the control applications

Application Domains II
• Similar integration trends, IMA and AUTOSAR: integrating applications on powerful platforms instead of one application per platform/ECU
• More complex development process
  – mapping a set of applications to the nodes of a platform
• The goal is composability: the timing behavior of one task is independent of that of the other tasks integrated on the same platform.
  – IMA: incremental qualification, i.e. modifying one application integrated with a set of other applications only requires recertification of the modified component.

“Total” Task Isolation
• IMA attempts to realize total task isolation by
  – spatial partitioning: one task does not access a memory area or device assigned to another task,
  – temporal partitioning: the execution of one task must not have an effect on the timing behavior of another task.
• These brick walls
  – are too thick, i.e.
  entail too much performance loss,
  – have holes, i.e. cannot be realized on complex processor architectures.

Dealing with Shared Resources
Alternatives:
• avoiding them,
• bounding their effects on timing variability.

The PROMPT Principle: Architecture Follows Application
Starting with a generic multi-node architecture, the PROMPT architecture,
• parametric in the ISAs, the hierarchy of “nodes”, the memory hierarchies, the interconnect, etc.;
• nodes may be
  – atomic processing units with their private resources, or
  – if performance requires it, units with shared resources;
• nodes on each hierarchy level should be predictable;
• we start with predictable cores, i.e., fully compositional architectures.

The PROMPT Design Process
The generic PROMPT architecture is instantiated for a given set of applications with their resource requirements. The design process works in multiple phases:
1. hierarchical privatization
2. sharing of lonely resources
3. controlled socialization

Principles for the PROMPT Architecture and Design Process
• No shared resources where they are not needed for performance.
• Harmonious integration of applications: not introducing interferences on shared resources that do not exist in the applications.

The PROMPT System Design Process
[Figure: the generic PROMPT architecture and sets of applications as sets of sets of tasks enter an instantiation comprising core design, software development, implementation, analysis of applications, timing analysis, multi-core design, and derivation of timing guarantees]

Steps of the Design Process
1. Hierarchical privatization
  – decomposition of the set of applications according to the sharing relation on the global state
  – allocation of private resources for non-shared code and state
  – allocation of the shared global state to non-cached memory, e.g. scratchpad
  – sound (and precise) determination of delays for accesses to the shared global state
2. Sharing of lonely resources
  – seldom accessed resources, e.g.
  I/O devices
3. Controlled socialization
  • introduction of sharing to reduce costs
  • controlling the loss of predictability

Sharing of Lonely Resources
• Costly lonely resources will be shared.
• The access rate is low compared to CPU and memory bandwidth.
• The access delay contributes little to the overall execution time because accesses happen infrequently.

PROMPT Design Principles for Predictable Systems
• Reduce interference on shared resources in architecture design.
• Avoid introducing interferences when mapping applications to the target architecture.
Applied to predictable multi-core systems:
• private resources for non-shared components of applications,
• a deterministic regime for the access to shared resources.

Some Relevant Publications from my Group
• C. Ferdinand et al.: Cache Behavior Prediction by Abstract Interpretation. Science of Computer Programming 35(2): 163-189 (1999)
• C. Ferdinand et al.: Reliable and Precise WCET Determination of a Real-Life Processor. EMSOFT 2001
• R. Heckmann et al.: The Influence of Processor Architecture on the Design and the Results of WCET Tools. IEEE Proc. on Real-Time Systems, July 2003
• St. Thesing et al.: An Abstract Interpretation-based Timing Validation of Hard Real-Time Avionics Software. IPDS 2003
• L. Thiele, R. Wilhelm: Design for Timing Predictability. Real-Time Systems, Dec. 2004
• R. Wilhelm: Determination of Execution Time Bounds. Embedded Systems Handbook, CRC Press, 2005
• St. Thesing: Modeling a System Controller for Timing Analysis. EMSOFT 2006
• J. Reineke et al.: Predictability of Cache Replacement Policies. Real-Time Systems, 2007
• R. Wilhelm et al.: The Determination of Worst-Case Execution Times - Overview of the Methods and Survey of Tools. ACM Transactions on Embedded Computing Systems (TECS) 7(3), 2008
• R. Wilhelm et al.: Memory Hierarchies, Pipelines, and Buses for Future Architectures in Time-critical Embedded Systems. IEEE TCAD, July 2009
• M. Lv, N. Guan, J. Reineke, R. Wilhelm, W.
  Yi: A Survey on Static Cache Analysis for Real-Time Systems. LITES 3(1): 05:1-05:48 (2016)
• S. Hahn, J. Reineke, R. Wilhelm: Toward Compact Abstractions for Processor Pipelines. Correct System Design 2015: 205-220