The CrowdStrike Outage and the CRA: A Measured Look

The recent CrowdStrike outage sent shockwaves through the cybersecurity landscape, sparking discussions about its impact on companies with robust security practices. While the outage itself may not have directly affected businesses adhering to the new EU Cyber Resilience Act (CRA), it reignited a simmering debate about the potential unintended consequences of such regulations.

Proponents of the Cyber Resilience Act hail it as a much-needed step forward in fortifying the often-porous security posture of Internet of Things (IoT) products. By mandating stricter security measures throughout the development lifecycle and requiring longer periods of software updates, the CRA aims to create a more secure and resilient IoT ecosystem. However, some experts worry that the standardized approach mandated by the Cyber Resilience Act might have unforeseen drawbacks.

In the following article, we’ll delve into these potential downsides and explore how the Cyber Resilience Act, while well-intentioned, might inadvertently create negative impacts on the overall resilience of some IoT products.

Corrective action may be riskier than the original problem

“If it ain’t broke, don’t fix it.” Behind this saying lies long experience: a fix often does more harm than good, and the improvement we are after may not be worth the risk that the system stops working, or works less effectively, afterwards. There is a good reason for this. Products are normally tested extensively before their initial release, but that extensive testing is usually not repeated before each new version ships. Furthermore, the Cyber Resilience Act does not require a software update to be retested as if it were a new product; it accepts testing that verifies that the change has no negative impact on the product.

As a result, the modified product has passed fewer tests than the original product. This approach is very reasonable, since the alternative would dramatically increase product cost, but the risk remains, and it grows with every update we perform.
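To make the accumulation concrete, here is a back-of-the-envelope sketch. It assumes that every update independently carries the same small probability of breaking the product; both the independence and the numbers are illustrative assumptions, not figures taken from the CRA or from any field data.

```python
def cumulative_failure_risk(p_per_update: float, n_updates: int) -> float:
    """Probability that at least one of n independent updates breaks the product.

    p_per_update is a hypothetical per-update failure probability.
    """
    if not 0.0 <= p_per_update <= 1.0:
        raise ValueError("p_per_update must be between 0 and 1")
    if n_updates < 0:
        raise ValueError("n_updates must not be negative")
    # The product survives only if every single update succeeds.
    return 1.0 - (1.0 - p_per_update) ** n_updates


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(cumulative_failure_risk(0.01, n), 3))
```

Under these assumptions, a 1% chance of failure per update grows to roughly 39% over 50 updates, which is why the number of updates matters as much as the quality of each one.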

Patching Time Window

The CRA’s 24-hour window for fixing vulnerabilities is a point of contention. While 24 hours may be sufficient for patching third-party software, it might be tight for complex updates to a manufacturer’s own software. A risk-based approach might be more appropriate, balancing urgency with thorough testing to avoid introducing new problems. However, the current regulatory framework does not leave any room for such an approach; the existing time window is simply too short for “wait and see”.

Furthermore, the Cyber Resilience Act takes a cascading approach. A security update to any software element listed in the SBOM (software bill of materials) of another element requires a security update to every piece of software that uses that element, which in turn requires an update to any software that uses those, and so on. As a result, the 24-hour time window leads to a large number of small, related software updates rather than a single update that covers all the impacted software. This phenomenon increases the risk of failure during the software update and reduces the time available for product testing prior to a release.
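The cascade can be sketched as a walk over the dependency graph described by the SBOMs. The example below is a minimal illustration, assuming the SBOM data has already been reduced to a simple "component depends on components" mapping; the component names and the function name `affected_by` are hypothetical.

```python
from collections import deque


def affected_by(vulnerable: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or indirectly uses `vulnerable`."""
    # Invert the mapping: for each component, who uses it?
    used_by: dict[str, set[str]] = {}
    for component, dependencies in depends_on.items():
        for dependency in dependencies:
            used_by.setdefault(dependency, set()).add(component)

    # Breadth-first walk upwards from the vulnerable component.
    affected: set[str] = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical SBOM data for a small signage device.
sbom = {
    "libtls": [],
    "libcodec": [],
    "http-client": ["libtls"],
    "update-agent": ["http-client"],
    "player-app": ["http-client", "libcodec"],
}

if __name__ == "__main__":
    print(sorted(affected_by("libtls", sbom)))
```

In this made-up example, one fix in `libtls` forces updates to three other components. If each of them is released separately under time pressure, the device receives several small updates instead of one tested bundle.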

Minimal Attack Surface

According to the Cyber Resilience Act, the development of a device shall follow the principle of minimal attack surface, which applies to both hardware and software. This means that while a desktop computer needs a USB port (or a wireless connection) in order to connect a keyboard, an IoT device such as a billboard controller should not have such a port. This requirement is critical: a device cannot be certified under the Cyber Resilience Act if unused ports that can be used for a cyber attack are exposed, and this very specifically targets keyboard and screen connections.

The technical fix for the CrowdStrike event is to restart the device in safe mode and then apply a software update. However, this cannot be done on devices that do not have a keyboard and screen. This means that if the device has been certified under the Cyber Resilience Act, the solution offered by Microsoft and CrowdStrike will not work and the only way to fix the device is to replace it.

The CRA offers an alternative for this problem: a reset function that returns the device to its original state. This would resolve the CrowdStrike issue, but unfortunately it is not supported under Windows.

It is difficult to point to a single solution for this problem. On one hand, the minimal attack surface principle is one of the more important principles of the Cyber Resilience Act, and access to a keyboard and screen on an IoT device specifically is a huge security risk. On the other hand, without a method to change the boot sequence, most failures that occur during boot will require either a replacement of the device or a factory reset option, and many widely used operating systems cannot support a factory reset without a keyboard.

This is a real deadlock. Using a general-purpose OS may require replacing the device if the boot sequence fails, while alternatives such as a system without an operating system may be more expensive and may not be easy to update when a security issue is detected in one of the components listed in the device’s SBOM.

Security vs. Maintainability

There is a clear tradeoff between good security and easy maintenance. Maintenance may require technician access to the IoT device, but this access is also a potential attack vector. In practice, a very significant share of attacks on IoT devices are carried out via interfaces designed for maintenance and software updates.

The Cyber Resilience Act has clearly chosen security over maintenance. A device should not have maintenance interfaces permanently open, and a device with such an interface (for example, a device with an open SSH port) cannot get CRA certification. Maintenance interfaces are allowed only “on demand”, meaning that they are opened by the device owner (and not by the manufacturer) for a specific need and only for that period of time.
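One way to picture an “on demand” maintenance interface is as a gate that only the owner can open and that closes by itself. The sketch below is purely illustrative: the class name `MaintenanceAccess`, the `"owner"` role string and the one-hour cap are assumptions made for this example, not wording from the CRA, and a real device would tie this logic to its actual service (for example, starting and stopping an SSH daemon).

```python
class MaintenanceAccess:
    """Time-limited maintenance gate that only the device owner may open."""

    def __init__(self, max_duration_s: float = 3600.0) -> None:
        # Upper bound on how long a single window may stay open (assumed value).
        self.max_duration_s = max_duration_s
        self._expires_at: float | None = None  # closed by default

    def open(self, requested_by: str, now: float, duration_s: float) -> None:
        """Open the interface for a limited time, on the owner's request only."""
        if requested_by != "owner":
            raise PermissionError("only the device owner may open maintenance access")
        self._expires_at = now + min(duration_s, self.max_duration_s)

    def close(self) -> None:
        """Close the interface immediately."""
        self._expires_at = None

    def is_open(self, now: float) -> bool:
        """True only while an owner-opened window has not yet expired."""
        return self._expires_at is not None and now < self._expires_at
```

Timestamps are passed in explicitly (in seconds) so the behaviour is easy to test; the interface is closed by default, a request from the manufacturer is refused, and an overly long request is cut down to the configured maximum.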

While this decision makes IoT devices more secure, it also makes them more complex to maintain and will require more frequent replacement than is the case today.

Update of boot sequence 

The Cyber Resilience Act does not put any specific requirements on the boot sequence, with the exception that the boot must be secured and must not allow execution of alternative boot sequences. However, as we all know, the boot sequence is a very sensitive part of the system, and an error during this phase may leave the device unable to function. It is good practice (but not a requirement of the CRA) to double- and triple-check any software change that touches the boot sequence or programs executed as part of it.

 

Windows as an operating system for IoT devices 

While a precise overall market share for Windows in the IoT market is elusive, it’s clear that Windows holds a significant position in certain IoT niches. For instance:

  • Industrial IoT (IIoT): Windows has a strong presence in industrial automation and control systems, leveraging its familiarity and robustness.
  • Retail and Point of Sale (POS): Many POS systems and digital signage solutions rely on Windows-based platforms.
  • Video Players and DOOH: Windows is also used in video players and digital out-of-home devices, including billboards and small advertising devices.

While the Windows operating system can definitely meet the requirements of the CRA, deploying it on an IoT device is far from optimal. It requires making sure that the “Start” menu is not accessible (possible, but not always perfect) and, as explained before, it may not be easy to implement without interfaces that are needed for maintenance but not for the device’s normal operation.

As a manufacturer, you need to check very carefully whether a device running Windows can be certified under the CRA, especially if the device is designed for the DOOH (Digital Out Of Home) market.

Is it possible to have a similar problem in devices running Linux?

In short, yes. While Linux is not as centralized as Windows, a few distributions control most of the market. CentOS, Debian and Ubuntu are the most common Linux distributions used for server deployments. Any fault in the software distribution mechanism of one of those distributions would create a huge global server-side failure. In addition, any third-party tool that has root privileges may cause a global failure. The number of tools with such access is small; the most critical ones are MySQL, PostgreSQL, Apache and Nginx. A failure in any of these tools would have a global impact, probably much bigger than the CrowdStrike failure.

From the end user’s perspective, not much can be done to avoid this kind of failure. The only commonly used solution is to delay the deployment of changes to this critical software for some time, in order to let other users detect potential failures first. However, again, this contradicts the CRA requirements.
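The delay strategy described above can be expressed as a very small rule. The sketch below is a hypothetical illustration: the function name `ready_to_deploy`, the seven-day waiting period and the `failure_reported` flag are assumptions chosen for the example and do not come from any specific tool or from the CRA.

```python
from datetime import datetime, timedelta


def ready_to_deploy(
    released_at: datetime,
    now: datetime,
    waiting_period: timedelta,
    failure_reported: bool,
) -> bool:
    """Deploy only after the update has been public for the waiting period
    and nobody has reported that it causes failures."""
    if failure_reported:
        return False
    return now - released_at >= waiting_period


if __name__ == "__main__":
    released = datetime(2024, 7, 1, 12, 0)
    week = timedelta(days=7)
    # Two days after release: too early, even with no known problems.
    print(ready_to_deploy(released, datetime(2024, 7, 3, 12, 0), week, False))
    # Eight days after release with no reports: considered safe to roll out.
    print(ready_to_deploy(released, datetime(2024, 7, 9, 12, 0), week, False))
```

The conflict with a short patching deadline is visible in the rule itself: any waiting period longer than the deadline means the update cannot be both "observed in the field" and "deployed in time".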

 

Conclusion

The CrowdStrike outage highlights many weaknesses in the current CRA legislation. The current legislation seems to have difficulty covering the complexity and diversity of the IoT market. As a result, many IoT device manufacturers will face real technical challenges in meeting the CRA requirements on one hand while providing a reasonable user experience in terms of maintainability and availability on the other.

It also shows that existing IoT devices, especially devices that were not designed with high-end security requirements, will not be able to get CRA certification and will require a significant redesign, starting from the hardware itself, which may not fit the CRA requirements, and extending to the software and the software maintenance process, which will very likely require significant modification.

Finally, several software products, such as Windows, the key Linux distributions, MySQL, PostgreSQL, Apache and Nginx, are globally critical. Any problem with one of them may create a global disaster. Maybe it is time to think of a global solution that tries to make sure this kind of outage does not happen to those tools.
