dc.description.abstract |
Modern architectures become more vulnerable to soft errors as technology scales. Enabling fault tolerance on every cache structure in a system is inefficient in terms of performance and power consumption. In this study, we propose an enhanced protection mechanism for reliability-critical code segments that uses asymmetrically reliable cores under performance and power constraints. The proposed system contains at least one high-reliability core, which has an ECC-protected L1 cache, and several low-reliability cores, which have no protection mechanisms. Our framework protects only the reliability-critical code regions of each application, which are identified from critical data usage, user annotations, or static analysis. In our first approach, the framework dynamically assigns the software threads executing critical code fragments to the protected core(s) using the First Come First Served (FCFS) algorithm. Our experimental evaluation on a set of applications shows that, by protecting only critical code regions, the proposed approach achieves performance and reliability comparable to those of fully protected systems at lower power consumption and cost. However, FCFS-based scheduling may degrade system performance and unfairly slow down some applications in certain workloads. Therefore, we propose a set of scheduling algorithms that improve both system performance and fairness. This thesis presents several static priority techniques, which require prior information about the applications, and several dynamic priority techniques, which aim to equalize the total time each application spends on the protected core(s). Extensive evaluations with multi-application workloads show that the proposed scheduling techniques significantly improve system performance and fairness over the FCFS algorithm. |
|