Meltdown & Spectre in Software Testing
I drew my Walther PPK from my sock as I sipped on my Vodka Martini (which was obviously shaken and not stirred). Something was definitely amiss as I began investigating the recent Meltdown and Spectre bugs. Evil masterminds had been exploiting technology fails and gaining unauthorised access to systems for years - it was time to do something!
It sounds like something from a James Bond movie, and the recent security exploits all over the tech news headlines do read like the plot of a bad spy film. I felt it was time to dive into what this means, so be ready for some deep tech review.
Two major bugs that have been lying dormant on the hardware/processor level have recently been brought to light and are the subject of today’s review.
Given the severity of the issues, let’s dive into the inner workings!
How do Meltdown and Spectre work?
Meltdown and Spectre exploit a processor feature called speculative execution. Rather than stalling while a slow operation finishes (a memory read or a branch decision, for example), the processor guesses the outcome and runs the following instructions ahead of time, often out of order. If the guess was right, the results are already waiting and no time was lost. If it was wrong, the processor throws the speculative work away and carries on as if it had never happened.
Meltdown and Spectre work by tricking the processor into speculatively running code that touches data it should not have access to, using the check that should stop it as the bottleneck moment (a permission check for Meltdown, a mispredicted branch for Spectre). In Meltdown's case the data is protected kernel memory; in Spectre's, it is typically memory belonging to another program or another part of the same program. When the processor determines that the access is not allowed, it discards the speculative results. However, this is still too late: the speculative access has already left traces in the processor's cache, and the attacker can recover the data by timing which memory locations have become fast to read.
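To make that sequence concrete, here is a toy model in Python. It is only a sketch and not exploit code: the `ToyCPU` class, its `cached_lines` set and the `recover_secret` function are all made up for illustration, and a set lookup stands in for the cache timing measurement a real attack would need. The point it shows is that the value read speculatively is thrown away, while the cache line it touched stays behind.

```python
class ToyCPU:
    """Hypothetical toy model: a speculative load touches the cache
    before the permission check completes."""

    def __init__(self, secret_byte):
        self._kernel_memory = [secret_byte]  # data the user process must not read
        self.cached_lines = set()            # probe-array lines currently in cache

    def speculative_read(self, address):
        # Step 1: the load runs ahead of the (slow) permission check.
        value = self._kernel_memory[address]
        # Step 2: a dependent access pulls one probe-array line into the
        # cache, and which line it is depends on the secret value.
        self.cached_lines.add(value)
        # Step 3: the permission check finishes and fails. The result is
        # discarded, but the cache is not rolled back.
        return None


def recover_secret(cpu):
    """Attacker side: find which probe line is already cached."""
    cpu.cached_lines.clear()          # stands in for flushing the probe array
    result = cpu.speculative_read(0)  # the attacker never gets the value directly
    assert result is None
    for candidate in range(256):
        if candidate in cpu.cached_lines:  # stands in for timing each probe line
            return candidate
    return None


cpu = ToyCPU(secret_byte=0x2A)
print(recover_secret(cpu))  # prints 42
```

The attacker never sees a return value from the forbidden read, yet still ends up with the secret byte, which is why blocking the read itself was not enough.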
Meltdown is easier to account for with software and firmware updates, but Spectre is more difficult to patch without actually replacing hardware. These bugs are not limited to one specific processor: Spectre in particular affects a wide range of processors across the industry, since speculative execution is an old, commonly used trick. The setup needed to use the exploit is complicated, and companies are taking action to prevent it. This means the risk of attacks using these bugs is low. Even so, the potential impact they can have, the amount of time they went undiscovered, and the way they were brought to light make them highly relevant to the software testing industry.
Unexpected results
When software developers write an application, they generally expect the processor will follow their instructions as written. Features like speculative execution typically make no difference on the developer's end of things, so they go unnoticed other than the fact that the processor is nice and fast. This means that software can be written and released with unexpected side effects the developers never noticed.
Speculative execution is just one feature that went largely unnoticed and unchecked for entire careers in hardware development. The issues with it were hiding in plain sight for around 20 years and could theoretically have been discovered at any time, but they only surfaced publicly recently. That means other long-standing issues could easily be hiding in the same way.
Software and hardware providers have scrambled to push out patches to deal with the Meltdown and Spectre issues as best they can. This catch-up game is not where they want to be, and it would have been preferable to be prepared for the issue before releasing their products. These sorts of hurried patches are a necessity and can help to prevent damage, but they can be unstable and introduce new issues in place of the one they fix (hello regression testing). While some users can apply patches without interruptions, others, such as those running hospital machines or airline control systems, cannot afford to stop production to apply them. Board members will be scrutinizing new software more harshly in hopes of avoiding these sorts of problems in the future.
The importance of exploratory testing
Automated testing works well for detecting issues the developers know can happen or would expect to see. However, Meltdown and Spectre were issues that were present for years and were never discovered or detected by automated testing.
The teams who brought these issues to light found them using exploratory testing.
While the code they used was written specifically to demonstrate the issue as a proof of concept, it took a trial-and-error approach to actually see the bug they were looking for.
While exploratory testing does not necessarily identify and isolate these hidden issues, it can reveal consistent unexpected effects they have on programs. Once you have repeatable steps to see an issue like this, it can give your developers a roadmap of what needs fixing and what to alter about their program, even if their code did not technically allow for the issue. You may be able to form an automated test from the results, but you can still use the steps as a testing script even if that is not possible. Adding this to your workflow can help you to stay ahead the next time one of these issues surfaces, even if you did not technically identify what flaw there was in the chip or operating system.
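As a rough illustration of turning repeatable steps into an automated check, here is a minimal Python sketch. Everything in it is hypothetical: the `Step` record, the `replay` helper and the shopping-list example are placeholders for whatever steps your own exploratory session produced.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One repeatable step noted during an exploratory session (hypothetical)."""
    description: str
    action: Callable[[dict], None]  # what the tester did
    check: Callable[[dict], bool]   # what the tester expected to observe


def replay(steps: List[Step], state: Optional[dict] = None) -> List[str]:
    """Run the recorded steps in order and return the descriptions that failed."""
    state = {} if state is None else state
    failures = []
    for step in steps:
        step.action(state)
        if not step.check(state):
            failures.append(step.description)
    return failures


# Placeholder steps; swap in the real actions and observations.
steps = [
    Step("add two items",
         lambda s: s.setdefault("items", []).extend(["a", "b"]),
         lambda s: len(s["items"]) == 2),
    Step("remove one item",
         lambda s: s["items"].pop(),
         lambda s: len(s["items"]) == 1),
]

print(replay(steps))  # prints [] when every step behaves as recorded
```

Even if the actions cannot be scripted, the same list of descriptions and expected observations still works as a manual testing script.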
Improving exploratory testing
If you are interested in improving the exploratory testing ability of your QA department without increasing your headcount, we recommend looking through our Enterprise Exploratory Testing Guide. It will also give you further insight into the limitations of automated testing and the role QA plays in maintaining your customer loyalty.