The importance of health monitoring

With an ever-growing number of systems processing and managing valuable data, it has never been more critical to foresee impending issues before they occur or reach critical levels. System downtime and data loss can have a huge cost impact. Health monitoring is a vital part of managing and maintaining modern data storage systems, especially in industrial environments, where the Total Cost of Ownership (TCO) is far higher than in consumer applications.

Initially implemented for HDDs and later adopted for NAND flash SSDs, ATA SMART commands address the growing demand for reliable data storage management. The theoretical lifetime of a drive depends on many factors, such as the program/erase (P/E) endurance of the flash, the write amplification factor (WAF) of the use case and environmental conditions, to name a few. A maintenance schedule can be extrapolated from these factors, but this is not the best approach, especially when an unexpected system failure is very costly and results in data loss or downtime affecting a much larger part of the infrastructure. To prevent this scenario, Self-Monitoring, Analysis and Reporting Technology (SMART) can be used. With SMART, the flash controller gathers the necessary health information and reports vital statistics to alert the user of an imminent failure of the memory device. The parameters can be set to suit the use case, and warnings can be reported with different levels of margin. This gives the user more time to replace the storage device, preventing an unexpected system failure.
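As a rough illustration of how P/E endurance and WAF combine, the following sketch uses a common first-order endurance estimate: the NAND can absorb roughly capacity × P/E cycles of physical writes, and the host sees only 1/WAF of that. It is a simplified model under stated assumptions, not Hyperstone's method: it assumes ideal wear leveling, counts user capacity only and ignores environmental conditions such as temperature and data retention. The `DriveProfile` name and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DriveProfile:
    # Hypothetical input parameters for a simplified endurance model.
    capacity_gb: float             # user capacity of the drive in GB
    pe_cycles: int                 # rated P/E endurance of the NAND flash
    waf: float                     # write amplification factor of the use case
    host_writes_gb_per_day: float  # average data written by the host per day


def total_host_writes_gb(p: DriveProfile) -> float:
    """Host data the drive can absorb before reaching its rated P/E cycles.

    Assumes ideal wear leveling, so every block reaches the P/E limit at the
    same time; each host GB costs `waf` GB of physical NAND writes.
    """
    return p.capacity_gb * p.pe_cycles / p.waf


def estimated_lifetime_years(p: DriveProfile) -> float:
    """Theoretical lifetime at a constant daily write load (no environmental effects)."""
    return total_host_writes_gb(p) / p.host_writes_gb_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical example: 32 GB drive, 3,000 P/E cycles, WAF of 4, 20 GB/day.
    profile = DriveProfile(capacity_gb=32, pe_cycles=3000, waf=4.0, host_writes_gb_per_day=20)
    print(f"{total_host_writes_gb(profile):.0f} GB")         # 24000 GB
    print(f"{estimated_lifetime_years(profile):.1f} years")  # 3.3 years
```

In this model, halving the WAF doubles the estimated lifetime, which shows why the use case matters as much as the flash itself. Because real workloads and conditions drift from any such estimate, continuous SMART monitoring remains the more reliable approach.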

Hyperstone developed its own hySMART™ tool, a utility with a graphical interface for accessing and decoding ATA-standard and Hyperstone vendor-specific SMART data and lifetime information. hySMART™ reads the data from a connected Hyperstone device and displays it both as sector values and as decoded, readable information and statistics. This includes spare block, erase block and ECC error information, among many other items. For example, the spare block information displays the number of remaining spare blocks and turns yellow when a certain threshold of free spare blocks (10% by default) is reached. The erase block information gives the user the estimated percentage of remaining card life, based on the flash supplier's guaranteed erase count. Finally, the ECC error information shows the number of uncorrectable errors that occurred during operation, which the tool highlights. hySMART™ is invaluable in applications that rely heavily on the uptime of their NAND flash storage systems. During qualification, this information is very helpful, as it allows one to monitor the lifetime parameters and evaluate how the storage media behave in the target system. Once the system is deployed in the field, its status can be monitored remotely, or the health monitoring utility can be set up to issue warnings when the media approach the end of their useful life. hySMART™ is delivered free of charge with Hyperstone's controllers as part of our commitment to quality, reliability and safety.
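The three checks described above can be sketched as simple threshold rules. This is a hypothetical illustration, not hySMART™'s implementation: the `HealthData` fields, the use of an average erase count, the 10% remaining-life margin and the escalation to a critical status are all assumptions; only the 10% spare block default comes from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"    # e.g. displayed in yellow
    CRITICAL = "critical"  # e.g. highlighted as an error


@dataclass
class HealthData:
    # Hypothetical decoded health values; field names are illustrative only.
    spare_blocks_initial: int
    spare_blocks_remaining: int
    avg_erase_count: int           # assumption: average erase count per block
    guaranteed_erase_count: int    # flash supplier's guaranteed erase count
    uncorrectable_ecc_errors: int


SPARE_WARN_FRACTION = 0.10  # 10% default spare block threshold from the text
LIFE_WARN_PERCENT = 10.0    # hypothetical margin for the remaining-life warning


def remaining_life_percent(d: HealthData) -> float:
    """Estimated remaining life, relative to the guaranteed erase count."""
    used = d.avg_erase_count / d.guaranteed_erase_count
    return max(0.0, 100.0 * (1.0 - used))


def evaluate(d: HealthData) -> dict[str, Status]:
    status: dict[str, Status] = {}

    # Spare blocks: warn once the remaining pool falls to the threshold.
    spare_fraction = d.spare_blocks_remaining / d.spare_blocks_initial
    if d.spare_blocks_remaining == 0:
        status["spare_blocks"] = Status.CRITICAL
    elif spare_fraction <= SPARE_WARN_FRACTION:
        status["spare_blocks"] = Status.WARNING
    else:
        status["spare_blocks"] = Status.OK

    # Remaining life: warn when the erase count nears the supplier guarantee.
    life = remaining_life_percent(d)
    if life == 0.0:
        status["remaining_life"] = Status.CRITICAL
    elif life <= LIFE_WARN_PERCENT:
        status["remaining_life"] = Status.WARNING
    else:
        status["remaining_life"] = Status.OK

    # ECC: any uncorrectable error is flagged.
    status["ecc"] = Status.CRITICAL if d.uncorrectable_ecc_errors > 0 else Status.OK
    return status
```

Separating the thresholds from the decoding step keeps the margins configurable per use case, in line with the idea of reporting warnings with different levels of margin.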
