Funding: Projects (60970036, 60873016, 61170045) supported by the National Natural Science Foundation of China; Projects (2009AA01Z102, 2009AA01Z124) supported by the National High Technology Development Program of China
Abstract: Integrated with an improved architectural vulnerability factor (AVF) computing model, a new architectural-level soft error reliability analysis framework, SS-SERA (soft error reliability analysis based on SimpleScalar), was developed. SS-SERA was used to estimate the AVFs of various on-chip structures accurately. Experimental results show that the AVFs of the issue queue (IQ), register update unit (RUU), load store queue (LSQ) and functional unit (FU) are 38.11%, 22.17%, 23.05% and 24.43%, respectively. For the address-based structures, i.e., the level-1 data cache (L1D), DTLB, level-2 unified cache (L2U), level-1 instruction cache (L1I) and ITLB, the AVFs of the data arrays are 22.86%, 27.57%, 14.80%, 8.25% and 12.58%, lower than the AVFs of the corresponding tag arrays, which are 30.01%, 28.89%, 17.69%, 10.26% and 13.84%, respectively. Furthermore, using the AVF values obtained with SS-SERA, a qualitative and quantitative analysis of AVF variation and predictability was performed for the structures studied. Experimental results show that the AVF varies significantly across structures and workloads, and is influenced by multiple microarchitectural metrics and their interactions. In addition, the AVFs of SPEC2K floating-point programs are more predictable than those of SPEC2K integer programs.
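For readers unfamiliar with the metric, the sketch below shows the standard textbook definition of AVF: the fraction of a structure's bit-cycles that hold state required for architecturally correct execution (ACE). It is not the improved computing model used in SS-SERA, which the abstract does not describe; the function names and the `(bits, start_cycle, end_cycle)` interval format are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def avf(ace_bit_cycles: int, total_bits: int, total_cycles: int) -> float:
    """Standard AVF: ACE bit-cycles divided by all bit-cycles of the structure."""
    if total_bits <= 0 or total_cycles <= 0:
        raise ValueError("total_bits and total_cycles must be positive")
    return ace_bit_cycles / (total_bits * total_cycles)


def avf_from_residency(
    ace_intervals: Iterable[Tuple[int, int, int]],
    total_bits: int,
    total_cycles: int,
) -> float:
    """Accumulate ACE bit-cycles from hypothetical (bits, start, end) records.

    Each record says that `bits` bits held ACE state from cycle `start`
    (inclusive) to cycle `end` (exclusive).
    """
    ace = sum(bits * (end - start) for bits, start, end in ace_intervals)
    return avf(ace, total_bits, total_cycles)


if __name__ == "__main__":
    # An 8-bit structure observed for 10 cycles (made-up numbers).
    intervals = [(4, 0, 5), (2, 5, 10)]
    print(avf_from_residency(intervals, total_bits=8, total_cycles=10))
```

A simulator-based framework would derive the intervals from instruction residency in each structure; here they are simply supplied by hand.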
Abstract: This paper presents a new method for soft error detection using software redundancy (SEDSR) that is able to detect transient faults. Soft errors damage the control flow and data of programs, and designers usually use hardware-based solutions to handle them. Software-based techniques for soft error detection impose less cost and delay on systems and do not change their configuration, so they are appropriate alternatives to hardware-based techniques. SEDSR has two separate parts, one for detecting data errors and one for detecting control flow errors. Fault injection is used to compare SEDSR with previous methods on the basis of a new parameter, the “Evaluation Factor”, which takes into account fault coverage, memory overhead and performance overhead. These parameters are important in real-time safety-critical applications. Experimental results on SPEC2000 and some traditional benchmarks of this field show that SEDSR performs much better than previous methods: its evaluation factor is about 50% better. These results show that it successfully balances the tradeoff between fault coverage, performance overhead and memory overhead.
Abstract: High-energy particles in space can easily cause soft errors in the register file (RF). As a critical structure in a processor, the RF often stores data for long periods of time and is read frequently, resulting in a higher probability of spreading corrupted data to other parts of the processor. Triple modular redundancy (TMR) is a common and effective fault tolerance method that enables multi-bit error correction. Designing full TMR for all the registers could cause excessive area and power overheads. However, some registers in the RF have less impact on processor reliability, so there is no need to design TMR for them. This paper designs an efficient strategy that rates the registers in the RF based on their vulnerability. Based on the proposed strategy, a new RF fault tolerance method named Partial-TMR is formulated, which selectively protects the more vulnerable registers against multi-bit errors and improves fault tolerance efficiency. For the integer RF, Partial-TMR improves the soft error correction capability by 24.5% relative to the baseline system and by 3% relative to ParShield, while for the floating-point RF the improvements are 5.17% and 0.58%, respectively. The soft error correction capability of Partial-TMR is 1% to 3% lower than that of full TMR, but Partial-TMR significantly cuts the area and power overheads: compared with full TMR, it decreases them by 71.6% and 64.9%, respectively. It also has little impact on performance. Partial-TMR is a more cost-effective fault tolerance method than ParShield and full TMR.
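The selective-protection idea can be illustrated with a minimal sketch of bitwise TMR majority voting, where only registers flagged as vulnerable keep three copies. The `TmrRegister` class, the `protect` flag and the scrub-on-read behaviour are assumptions for illustration; the abstract does not specify how Partial-TMR rates registers or implements its voter.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies: each bit takes the value held by >= 2 copies."""
    return (a & b) | (b & c) | (a & c)


class TmrRegister:
    """A register that is triplicated only when `protect` is True (hypothetical model)."""

    def __init__(self, value: int, width: int = 32, protect: bool = True):
        self.mask = (1 << width) - 1
        copies = 3 if protect else 1
        self.copies = [value & self.mask] * copies

    def flip(self, copy: int, bit_mask: int) -> None:
        """Model a soft error by flipping the bits in `bit_mask` in one copy."""
        self.copies[copy] ^= bit_mask & self.mask

    def read(self) -> int:
        if len(self.copies) == 1:
            return self.copies[0]          # unprotected: errors pass through
        voted = tmr_vote(*self.copies)
        self.copies = [voted] * 3          # scrub: rewrite all copies with the voted value
        return voted


if __name__ == "__main__":
    protected = TmrRegister(0xDEADBEEF, protect=True)
    protected.flip(1, 0b1011)              # multi-bit upset confined to one copy
    print(hex(protected.read()))

    unprotected = TmrRegister(0xDEADBEEF, protect=False)
    unprotected.flip(0, 0b1011)
    print(hex(unprotected.read()))
```

The voter corrects any number of flipped bits as long as, for each bit position, at most one of the three copies is wrong, which is why TMR tolerates a multi-bit upset confined to a single copy.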
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11175138), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20100201110018), the Key Program of the National Natural Science Foundation of China (Grant No. 11235008), and the State Key Laboratory Program (Grant No. 20140134)
Abstract: As device sizes decrease, soft errors induced by various particles become a serious problem for advanced CMOS technologies. In this paper, we review the evolution of two main aspects of soft errors, SEU and SET, including new mechanisms that induce SEUs, advances in MCUs and some newly observed phenomena of SETs. The mechanisms of these issues and their trends with downscaling are briefly discussed. We also review the hardening strategies for different types of soft errors from different perspectives, and present the challenges in testing, modeling and hardening assurance that will have to be addressed in the future.
Funding: Supported by the National Natural Science Foundation of China (61272110)
Abstract: Following the problems of performance and energy overhead, the system reliability problem caused by soft errors has become a growing concern. Since the register file (RF) is the hottest component in a processor, soft errors occurring in it will greatly harm system reliability if it is not well protected. To reduce the soft error occurrence rate of the register file, this paper presents a method that reallocates registers based on the fact that different live variables contribute differently to the register file vulnerability (RFV). Our experimental results on benchmarks from the MiBench suite indicate that the method can significantly enhance reliability.
Funding: Supported by the National Natural Science Foundation of China (No. 60703074) and the National High-Tech Research and Development Program of China (No. 2009AA01Z124)
Abstract: We first study the impact of soft errors on various types of CAM for different feature sizes. After presenting a soft-error-immune CAM cell, SSB-RCAM, we propose two kinds of reliable CAM, DCF-RCAM and DCK-RCAM. In addition, we present an ignore mechanism to protect dual cell redundancy CAMs against soft errors. Experimental results indicate that the 11T-NOR CAM cell has an advantage in soft error immunity. Based on 11T-NOR, the proposed reliable CAMs reduce the SER by about 81% on average with acceptable overheads. The SER of dual cell redundancy CAMs can also be decreased using the ignore mechanism in specific applications.
Funding: Supported by the National Science and Technology Major Project under Grant Nos. 2009ZX01028-002-003, 2009ZX01029-001-003; the National Natural Science Foundation of China under Grant Nos. 61221062, 61100163, 61133004, 61232009, 61222204, 61221062, 61303158; the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403; and the Ten Thousand Talent Program of China
Abstract: Due to decreasing threshold voltages, shrinking feature sizes and the exponential growth of on-chip transistors, modern processors are increasingly vulnerable to soft errors. However, traditional soft error mitigation mechanisms act only after soft errors have been detected. Instead of such passive responses, this paper proposes a novel mechanism that proactively prevents the occurrence of soft errors via architecture elasticity. Guided by a predictive model, we adapt the processor architecture holistically and dynamically. By leveraging an artificial neural network, the predictive model can quickly and accurately predict the simulation target across different program execution phases on any architecture configuration. Experimental results on SPEC CPU 2000 benchmarks show that our method inherently reduces the soft error rate by 33.2% and improves the energy efficiency by 18.3% compared with a statically configured processor.
Funding: Supported by the National Key Technological Program of China (No. 2008ZX01035-001), the National Natural Science Foundation of China (No. 60870001), and the TNList Cross-discipline Foundation, China
Abstract: Reliability is expected to become a major concern in future deep sub-micron integrated circuit design. The soft error rate (SER) of combinational logic is considered to be a serious reliability problem. Previous SER analyses and models indicated that glitch width has a great impact on the electrical masking and latch window masking effects, but they did not provide sufficient insight. In this paper, an analytical glitch generation model is proposed. The model shows that, beyond an inflexion point, the collected charge has an exponential relationship with glitch duration, and the model introduces an estimation error of only 2.5% on average.
Funding: Supported by the National Natural Science Foundation of China (Nos. 11079045, 11179003 and 11305233) and the Important Direction Project of the CAS Knowledge Innovation Program (No. KJCX2-YWN27)
Abstract: Single event effects (SEEs) induced by radiation have become a significant challenge to the reliability of modern electronic systems. To evaluate the SEE susceptibility of microelectronic devices and integrated circuits (ICs), a flexible and robust SEE testing system was developed at the Heavy Ion Research Facility in Lanzhou (HIRFL). The system is compatible with various types of microelectronic devices and ICs, and supports many complex and high-speed test schemes and plans for the irradiated devices under test (DUTs). Thanks to the combination of meticulous circuit design and hardened logic design, the system has additional capabilities to avoid overheating and irradiation by stray radiation. The system has been tested and verified in device irradiation experiments at HIRFL.
Funding: This project is supported by the Foundation of the State Key Lab. of Mechanical Transmission
Abstract: To address the difficult problems of belt cross vibration and the effect of belt tension on machine spindle precision in abrasive belt grinding, a new soft grinding wheel is put forward. It retains the advantages of belt grinding and can be installed directly on the grinding machine spindle in place of a common grinding wheel. The new soft grinding wheel does not need any ancillary facilities or dressing devices in banding. Through wheel error analysis and grinding experiments, the high-efficiency grinding characteristics for hard-brittle materials have been obtained.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 2032165 and 62004158), the National Key Scientific Instrument and Equipment Development Project of China (Grant No. 52127817), the State Key Laboratory of Particle Detection and Electronics (Grant Nos. SKLPDE-ZZ-201801 and SKLPDE-ZZ-202008), and the Special Funds for Science and Technology Innovation Strategy of Guangdong Province, China (Grant No. 2018A0303130030)
Abstract: To predict the soft error rate for applications, it is essential to study the energy dependence of the single-event-upset (SEU) cross-section. In this work, we present a direct measurement of the SEU cross-section with the Back-n white neutron source at the China Spallation Neutron Source. The measured cross-section is consistent with the soft error data from the manufacturer, and the result suggests that the threshold energy of the SEU is about 0.5 MeV, which confirms the statement in Iwashita’s report that the threshold energy for neutron soft errors is much below that of the (n, α) cross-section of silicon. In addition, an index of the effective neutron energy is suggested to characterize the similarity between a spallation neutron beam and the standard atmospheric neutron environment.
Funding: The Open Project Program of the Shanxi Key Laboratory of Advanced Semiconductor Optoelectronic Devices and Integrated Systems (2023SZKF17); the University Synergy Innovation Program of Anhui Province (GXXT-2022-080)
Abstract: With the development of semiconductor technology, the size of transistors continues to shrink. In complex radiation environments in aerospace and other fields, small-sized circuits are more prone to soft errors (SE). Currently, single-node upsets (SNU), double-node upsets (DNU) and triple-node upsets (TNU) caused by SE are relatively common, and solutions for TNU are not yet fully mature. A novel, low-cost TNU self-recoverable latch (named NLCTNURL) that is resistant to harsh radiation effects was designed. When analyzing circuit resiliency, a double-exponential current source is used to simulate the flipping of a node’s stored value when an error occurs. Simulation results show that the latch achieves full TNU self-recovery. A comparative analysis was conducted on seven TNU-related latches. In addition, a comprehensive index combining delay, power, area and self-recovery, the DPAN index, was proposed, and all eight latches were analyzed and compared in terms of delay, power, area and DPAN index. The simulation results show that, compared with the latches LCTNURL and TNURL, which are also TNU self-recoverable, NLCTNURL reduces delay by 68.23% and 57.46%, power by 72.84% and 74.19%, area by about 28.57% and 53.13%, and the DPAN index by about 93.12% and 97.31%, respectively. The simulation results also show that the delay and power of the circuit are very stable across different temperatures and operating voltages.
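The double-exponential current source mentioned in the abstract is commonly written as I(t) = Q / (τ_f − τ_r) · (e^(−t/τ_f) − e^(−t/τ_r)), where Q is the collected charge, τ_f the collection (fall) time constant and τ_r the track-establishment (rise) time constant. The sketch below evaluates this widely used form and checks numerically that the pulse delivers the charge Q. The parameter values are illustrative assumptions, not figures from the paper.

```python
import math


def double_exp_current(t: float, q_coll: float, tau_fall: float, tau_rise: float) -> float:
    """Double-exponential strike current in amperes at time t (seconds, t >= 0)."""
    if t < 0:
        return 0.0
    scale = q_coll / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def injected_charge(q_coll: float, tau_fall: float, tau_rise: float,
                    t_end: float, steps: int = 20_000) -> float:
    """Integrate the pulse from 0 to t_end with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    prev = double_exp_current(0.0, q_coll, tau_fall, tau_rise)
    for i in range(1, steps + 1):
        cur = double_exp_current(i * dt, q_coll, tau_fall, tau_rise)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


if __name__ == "__main__":
    # Assumed example values: 50 fC collected, 200 ps fall and 50 ps rise constants.
    q, tau_f, tau_r = 50e-15, 200e-12, 50e-12
    peak = max(double_exp_current(n * 1e-12, q, tau_f, tau_r) for n in range(2000))
    print(f"peak current ~ {peak:.3e} A")
    print(f"integrated charge ~ {injected_charge(q, tau_f, tau_r, 5e-9):.3e} C")
```

In a circuit simulator the same waveform would be attached as a current source to the struck node; sweeping Q then shows the charge at which the node's stored value flips.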
Abstract: A new interferometer for extreme ultraviolet (EUV) radiation with a laser-produced plasma (LPP) laboratory source is under construction. The LPP source is operated with a solid Sn rod target on which a pulsed YAG laser is focused to produce a high-temperature plasma emitting EUV radiation. The source is equipped with a newly designed debris stopper that protects a condenser multilayer mirror from the particle debris of the target. The condenser mirror focuses the light onto an EUV beam-splitter to form transmitted and reflected paths for producing interference fringes of a sharing type. The optical configuration is a common-path design based on a triangular path with focusing at the beam-splitter, which makes it possible to produce fringes from low-coherence radiation with a beam-splitter of standard optical quality. The fringes are recorded by an imaging plate with pixels as small as 25 μm. The linear dynamic range for detection of the EUV light was found to be more than 10^4, with a sensitivity of 10^4 photons/pixel, which is enough for interferogram recording, possibly with a single laser shot.