Monday, March 15, 2021

Risk Template: Availability & Performance (Part 3)



Ma Bell (the Bell Telephone Company and legacy AT&T), with her always-available dial tone, is still the role model for availability. The proverbial five 9's (99.999%) goal is more challenging today given the complexity and risks of modern computer systems.

As part of the risk assessment, it is critical to understand your organization's availability goal. Before committing to five 9's, consider all the service providers that must be available for your organization to achieve that goal.
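To make the five 9's target concrete, the short Python sketch below converts an availability percentage into the maximum downtime it allows per year. It assumes a 365.25-day year, and the function name `downtime_minutes_per_year` is only illustrative. At 99.999%, the budget works out to roughly 5.3 minutes of downtime per year, and that budget has to cover your organization and all of its service providers combined.

```python
# Convert an availability target into an allowed-downtime budget.
# Assumption: a year averages 365.25 days (to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the maximum minutes of downtime per year allowed by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} minutes/year")
```

Comparing these budgets with the uptime each provider commits to is a quick way to sanity-check the goal before committing to it.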


This is the third in a series of articles covering risk areas (e.g., compliance, breach, availability). If you have read the previous articles, the Goal and Process sections are identical. The Template section has been updated for the availability and performance risk.

Goal

Although there are many great resources that can help you complete a risk analysis (such as ISO 31000 and NIST), there are few examples of how to put together your risk report. The template below provides a starting point, with verbiage you can edit for your organization's requirements.

Process

The template below suggests an order for documenting your findings. (Please reorder it to work best for your situation.) However, the order of the analysis steps differs from the order shown in the template. Check the template for more detailed directions for each step below.
  1. Determine scope - What data and which data owners are handled by your organization? Please note that even if you do not retain the data, you are still subject to the regulations. See the sections labelled "Scope: Business Functions" and "Scope: Solution Providers".
  2. Review each sub-risk to uncover your vulnerabilities, considering the various threat actors (see the Threat Actors section). See the Sub-Risks section for Sub-Risk 1 through Sub-Risk 11.
  3. Determine the risk ranking. See the Risk Ranking section.
  4. Complete the template.
Once you have completed your annual security risk assessment, distribute it to the members of the Board of Directors and to executives. Resolve the risks found, starting with the highest. Keep leadership apprised of progress with regular updates throughout the year.

Finally, remember to complete risk assessments throughout the year whenever a significant change is planned for the technical, physical, or administrative environment.

REPORT TEMPLATE WITH SAMPLE VERBIAGE

Risk Description

Here is some verbiage that you may edit for your risk report. 

 

The organization is unable to generate revenue or perform other critical business functions (list critical functions) due to an outage or degradation of systems operated by the organization or its third-party service providers. These issues may adversely impact customers and business partners.

Context

The purpose of this section is to provide compelling examples showing that an investment in reducing the organization's availability and performance risks is warranted.
  • Highlight the most egregious risks found during your analysis
  • Provide statistics and illustrations of challenges your organization faced throughout the year. For example, ransomware has increased x% in our sector.
  • Add case studies from other organizations in your space that experienced a performance issue that caused significant impact. For example: Company Z was compromised and a bitcoin mining operation stole resources needed for peak loads. The root cause was xxxxx, which is a security capability that is not mature at our organization.
  • The annual Verizon Data Breach Investigations Report (DBIR) produced by the Verizon Threat Research Advisory Center (VTRAC) is an excellent resource for compelling statistics to add to your report.
  • An internet search by Sub-Risk and your industry will also uncover interesting case studies.

Risk Ranking

Once your analysis is complete (see the Sub-Risks section), describe the range of risks and settle on a consolidated risk ranking.
  • Likelihood - Low, Medium, High. Select one.
    • The likelihood of performance and availability issues typically varies depending on the organization's vulnerabilities and the activities of threat actors. Provide examples of different likelihood events.

  • Impact - Low, Medium, High. Select one.
    • The impact will vary depending on the line of business and business activity. The business stakeholders should provide a cost per minute, hour, or day for an outage. Your corporate risk management function may have already defined criteria for losses and their ranking. For example, a $1 million loss is low impact while a $50 million loss is high impact.
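One way to turn the stakeholders' cost figures and the two rankings into a consolidated risk is sketched below in Python. The $1 million and $50 million thresholds come from the example above; treating everything in between as medium, the sample cost figures, and the likelihood-times-impact scoring matrix are assumptions for illustration, so substitute your own corporate risk management criteria.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Thresholds from the article's example ($1M = low, $50M = high).
# Assumption: losses between the two are rated medium.
LOW_MAX = 1_000_000
HIGH_MIN = 50_000_000


def outage_loss(cost_per_hour: float, outage_hours: float) -> float:
    """Estimated loss for one outage, using the stakeholders' cost-per-hour figure."""
    return cost_per_hour * outage_hours


def impact_level(loss: float) -> Level:
    """Map an estimated loss onto the Low/Medium/High impact scale."""
    if loss >= HIGH_MIN:
        return Level.HIGH
    if loss > LOW_MAX:
        return Level.MEDIUM
    return Level.LOW


def consolidated_risk(likelihood: Level, impact: Level) -> Level:
    """Combine likelihood and impact with a simple 3x3 matrix (one common convention, assumed here)."""
    score = likelihood * impact  # ranges from 1 to 9
    if score >= 6:
        return Level.HIGH
    if score >= 3:
        return Level.MEDIUM
    return Level.LOW


# Hypothetical figures: $250,000 per hour for an 8-hour outage, medium likelihood.
loss = outage_loss(cost_per_hour=250_000, outage_hours=8)
impact = impact_level(loss)
print(f"Loss ${loss:,.0f}: impact {impact.name}, "
      f"risk {consolidated_risk(Level.MEDIUM, impact).name}")
```

With this matrix, a medium likelihood paired with a high impact rolls up to a high consolidated risk, while a low likelihood paired with a medium impact stays low.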


Threat Actors

The threat actors to consider are listed below.

T1. Unaware insider, including BPO (business process outsourcing) staff

T2. Malicious insider, including BPO staff

T3. Cybercriminal

T4. State actor

T5. Physical hazard

T6. Activist

Sub-Risks

The list below describes several ways availability and performance issues may occur. The examples (vulnerabilities) are just a starting point for your analysis. Also, remember to consider your service providers.

Sub-Risk 1: Ransomware locks critical resources or data. Ransomware can be considered a BCP-type (Business Continuity Plan) event.

 

Examples:

·        Poor malware defenses lead to an attack in which many resources are locked. Recovering from ransomware can be more challenging in a remote work environment.


Sub-Risk 2: Insufficient resources during high-volume usage lead to degradation or outages.

 

Examples:

·        Poor resource planning and inadequate stress testing fail to uncover weaknesses in the architecture.

·       Software performs poorly (e.g., does not release resources or consumes excess resources).

·       Monitoring does not detect changes in traffic patterns
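To illustrate the monitoring gap, here is a minimal Python sketch of a baseline check that flags traffic far outside its recent norm. It assumes a simple z-score test with a hypothetical threshold of three standard deviations and made-up request counts; production monitoring tools typically also account for seasonality and trends, so treat this as an illustration of the idea rather than a recommended detector.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(baseline: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Flag a reading more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two readings
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical hourly request counts for the same hour on previous weekdays.
baseline = [1200, 1150, 1230, 1180, 1210, 1190, 1220]
print(is_anomalous(baseline, 1205))  # within the normal range
print(is_anomalous(baseline, 2600))  # sudden surge worth investigating
```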

 

Sub-Risk 3: Critical components are not redundant (i.e., single points of failure). Technical, facility, and personnel resources need to be reviewed.

 

Examples:

·        A third-party runtime or other service is unavailable

·        3rd party service provider (cloud, ISP, telco, etc.) is unavailable 

·        A single hardware device is used in a critical process and fails

·        Multiple aging hardware devices fail: redundancy existed, but it relied on obsolete devices.

·        Critical network expert quits.
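Redundancy only helps as much as the redundant copies fail independently. The Python sketch below shows the usual parallel-availability calculation, 1 - (1 - a)^n, under that independence assumption; the 99.9% device figure is hypothetical. Two 99.9% devices yield about 99.9999%, but shared causes such as the aging hardware example above can erase most of that gain.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability when the function stays up as long as any one of `copies` components is up.

    Assumes the copies fail independently; shared causes (same aging
    hardware, same vendor defect, same facility) violate this assumption.
    """
    return 1 - (1 - component_availability) ** copies


single = 0.999  # hypothetical availability of one device
print(f"One device:  {parallel_availability(single, 1):.6%}")
print(f"Two devices: {parallel_availability(single, 2):.6%}")
```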

 Sub-Risk 4: Changes to the system cause availability or performance issues.   

 

Examples:

  • A change accidentally breaks a component: a cable is left unplugged, a new device malfunctions, or configuration settings are incorrect.
  • A configuration change considered too simple to test causes an issue (e.g., a firewall rule change, an accidentally dropped database index, an application timeout change).
  • A change has a wider scope than realized and affects unexpected components. For example, a service password is updated but downstream access points are not.
  • A change has compatibility issues with existing components. For example, an upgraded OS requires more resources.
  • A software change introduces errors.

Sub-Risk 5: An authorization issue causes availability issues. 

 

Examples:


  • Password expires
  • Token expires
  • Service account changes and application services are not updated
  • Firewall rule conflict
  • SSL incompatibility: the requestor still uses an outdated protocol that the host will not accept
  • A timeout setting that does not balance user needs and security causes more reconnections, degrading performance
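Several of these authorization issues come down to a credential quietly reaching its expiry date. A minimal Python sketch of a periodic check follows; the inventory entries, dates, and 30-day window are hypothetical, and in practice the data would come from your own identity or secrets management records.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Hypothetical inventory of (credential name, expiry date) pairs.
CREDENTIALS = [
    ("payroll service account password", date(2021, 4, 1)),
    ("partner API token", date(2021, 6, 30)),
    ("reporting service account password", date(2021, 3, 20)),
]


def expiring_soon(inventory: Iterable[Tuple[str, date]], today: date,
                  window_days: int = 30) -> List[str]:
    """Return names of credentials that expire within `window_days` of `today` (or already have)."""
    cutoff = today + timedelta(days=window_days)
    return [name for name, expires in inventory if expires <= cutoff]


for name in expiring_soon(CREDENTIALS, today=date(2021, 3, 15)):
    print("Renew before it causes an outage:", name)
```

Running a check like this on a schedule, and updating dependent applications when a credential is rotated, addresses both the expiry and the service account examples in the list.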

 Sub-Risk 6: Natural or man-made disaster that impacts the organization’s facilities, systems, data or personnel.  

 

Examples:

  • Earthquake, hurricane, fire, flood, global pandemic, etc.
  • Power outages, rolling blackouts

 Sub-Risk 7: A Distributed Denial of Service (DDoS) attack is launched.

 

Examples:

·        DDoS attacks are used to shut down internet-based services.

·        DDoS attacks are also used as a smokescreen to distract an organization from a data breach attack.

 Sub-Risk 8: Computer resources are hijacked and used for the hacker's purposes, degrading performance for the organization. These hackers will try to use limited bandwidth to remain undetected. Peak loads may uncover this illicit usage.

 

Examples:

·        Cryptojacking or VoIP hijacking

·        Employees use network bandwidth for personal use

 

 Sub-Risk 9: The organization’s website is defaced

 

Examples:

·        DNS for website may be hijacked

·        Content is removed or replaced

 

 Sub-Risk 10: The organization is categorized as malicious by third-party gatekeepers.



Examples:

·        An email marketing campaign is categorized as spam.

·        The organization's system access to a third-party service behaves like a DDoS attack and is blocked.

 

 Sub-Risk 11: Use of third-party tools and services does not satisfy license agreements

 

Examples:

·        Open source or other third parties require the organization to stop using their components until the license or terms of service are satisfied.

 

Scope: Business Functions

Below is a list of typical business functions that your organization may consider critical and deserving of availability and performance goals. If you have already completed your BIA (Business Impact Analysis) as part of your organization's Business Continuity Plan, there should be significant overlap.

  • Sales (by line of business; may have different systems/processes for physical and online sales)
  • Payroll
  • Ordering and Inventory (if applicable)
  • Accounts Receivable / Payable
  • Financial reporting (e.g., SEC reporting)

Scope: Solution Providers

After you have determined the business processes for which you want to establish availability and performance goals, inventory your service providers, such as your ISP, cloud providers, third-party application services, and third-party resource providers. Application services, including open source components, are often overlooked during availability planning. Some application teams link in real time to externally hosted open source code, which adds an unnecessary risk to availability.

During the inventory process, also document each provider's contractual uptime commitment. You should assume that each service provider will have a distinct (non-overlapping) outage. So if you have just five service providers that each commit to 99.999%, you should plan on achieving no more than 99.994%. (Don't forget to add in your organization's own potential downtime. This calculation assumes a 0.001% outage for the organization itself, i.e., the organization also achieves 99.999%.)
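The arithmetic behind the 99.994% figure can be checked with the Python sketch below, which treats the organization itself as a sixth dependency at 99.999%. The first function follows the assumption that each provider has a distinct outage, so downtime simply adds up; the second instead assumes independent failures and multiplies the availabilities. Both round to 99.994% here. The function names are illustrative only.

```python
from math import prod
from typing import Sequence


def serial_availability_no_overlap(availabilities: Sequence[float]) -> float:
    """Worst case: every dependency's outage happens at a different time, so downtime adds up."""
    return 1 - sum(1 - a for a in availabilities)


def serial_availability_independent(availabilities: Sequence[float]) -> float:
    """If outages are independent, end-to-end availability is the product of the figures."""
    return prod(availabilities)


# Five service providers plus the organization itself, each at 99.999%.
deps = [0.99999] * 6
print(f"No overlap:  {serial_availability_no_overlap(deps):.4%}")
print(f"Independent: {serial_availability_independent(deps):.4%}")
```

Adding a provider that commits to only 99.9% to this list drops the result below 99.9%, which is why weak commitments from key providers deserve attention.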

Of course, your organization may be lucky and achieve higher availability. Risk planning, though, is based on likelihood, not luck. If key service providers commit to much lower availability, or do not commit to any uptime goal, you should negotiate better terms, find new service providers, or make other contingency plans.

The risk exercise should be completed for each major business function. For example, if you are a public company, the unavailability of systems used to report quarterly earnings is also a significant issue.

