# Digital Resilience and High Availability

BlueT / Matthew Lien - 練喆明
National Institute of Cyber Security (國家資通安全研究院), Chief Consultant for Resilience Architecture

---

Friends, are you ready?

---

!(political correctness)

---

**Hello World**

Hello World vs go Deeper

Digital Resilience and High Availability

and... (lots of keywords)

---

## Introduction to Digital Resilience

- Definition of digital resilience.
- Why digital resilience is important, especially in the context of external shocks like financial crises and pandemics [ref](https://aisel.aisnet.org/ecis2021_rip/44/).

## Components of Digital Resilience

- Digital resilience encompasses cybersecurity, high availability, accessibility, maintainability, etc.
- Other key components: service, software, data, system (online, backup), network, infrastructure, personnel.

## CIA Model (Cybersecurity)

- Definition and explanation of the CIA model (Confidentiality, Integrity, Availability) for cybersecurity.
- How the model supports digital resilience.
- Verification
    - Discuss what should be verified, what can be verified, and what has been verified.
    - Discuss the role of verification in improving digital resilience, with specific examples for services (user input/output) and software (SAST/DAST).

## PDCA

- Explain the PDCA model (Plan, Do, Check, Act).
- Discuss how PDCA can be applied to improve digital resilience.
- Discuss the importance of requirements management and change management in the PDCA cycle.

## Models for Digital Resilience

- Models for Digital Resilience (many related models exist, but none is specific to digital resilience)
- Briefly explain the SRAAEE model.
    - Security
        - Implement comprehensive security measures to protect data, software, networks, infrastructure, and services from threats. This includes technical, physical, and administrative measures.
    - Redundancy and Recovery:
        - Maintain backup systems, data, and processes to avoid a single point of failure that could bring down the entire system. In the event of a security incident or other disruption, have efficient disaster recovery plans to restore services.
    - Automation:
        - Incorporate technology to perform tasks with minimal human intervention. This can increase efficiency, reduce errors, and improve response times in areas such as security incident response, data backup, and system monitoring.
    - Agility and Adaptation:
        - Cultivate a flexible and adaptable approach to system design and processes. This includes the ability to quickly adapt to changes and incorporate lessons learned from incidents, audits, and other experiences.
    - Evaluation and Examination:
        - Regularly assess systems, processes, and personnel to identify potential vulnerabilities and areas for improvement. This also includes ongoing evaluation of the ever-changing threat environment, and regular testing of systems and disaster recovery plans.
    - Education:
        - Regularly train and raise awareness among all personnel to ensure they understand their responsibilities in maintaining system security and resilience.
- How the model supports digital resilience.
    - service, software, data, system (online, backup), network, infrastructure, personnel
- Other components
    - Vendor Management:
        - If your organization relies on third-party providers for certain services, it's important to manage these relationships effectively. This includes ensuring vendors meet your security requirements, have robust resilience measures in place, and align with your business continuity plans.
    - Regulatory Compliance:
        - Depending on the nature of your organization and the data you handle, there may be legal and regulatory requirements for data protection and system security that you need to comply with.
    - Incident Response Planning:
        - While the SRAAEE model does encompass "Redundancy and Recovery," explicit attention to incident response planning could be beneficial. This involves having a clear process in place for how to respond to security incidents or system failures, including communication plans, roles and responsibilities, and steps for investigating and resolving issues.
    - Business Continuity and Disaster Recovery Planning:
        - This goes hand-in-hand with redundancy and recovery. Business continuity focuses on maintaining operations during a disruption, while disaster recovery focuses on restoring normal operations after the fact.

## Risk Prevention and Control in Digital Resilience

- Discuss the importance of risk prevention and control in digital resilience.
- Explain how to apply risk prevention and control in practical scenarios.
- Discuss the concept of not "putting all eggs in one basket" and the role of risk control in achieving high availability (e.g., Kubernetes rolling updates).

## Technology and Solution Selection

- Discuss factors to consider when choosing technology solutions for digital resilience (e.g., capabilities needed, importance and frequency of use, cost, maintainability).
- Give specific examples of technology solutions and discuss their pros and cons.

## Maintainability and Modularity

- Explain the concept of maintainability and its importance in digital resilience.
- Discuss the benefits of modularity, comparing monolithic vs microservice architectures.

## Verifiability

- Explain the importance of verifiability in digital resilience.
- Discuss the role of SBOM (Software Bill of Materials) in verifying the security and stability of software and supply chains.

## Standard Selection

- Discuss the importance of choosing the right standards, with a comparison between SPDX and CycloneDX.

## Conclusion

- Recap of the main points covered in the training.
- The importance of being solid, steady, calm, flexible, introspective, and constantly seeking progress.
- Suggestions for how to assess a system: What we already know, what we don't know yet, what might be missing, and how to improve it.
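
The "not putting all eggs in one basket" point from the risk control section can be made concrete with a small simulation of a rolling update. This is only a minimal sketch, not how Kubernetes implements it: the `Instance` class, the `rolling_update` function, and the `health_check` callback are hypothetical names introduced for illustration, and `max_unavailable` is loosely modeled on the `maxUnavailable` setting of a Kubernetes Deployment. The idea it shows is that only one batch is exposed to a bad release at a time, and only that batch is rolled back.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical service instance; not a real orchestrator object."""
    name: str
    version: str


def rolling_update(instances, new_version, max_unavailable, health_check):
    """Update instances in batches of at most `max_unavailable`.

    If any instance in a batch fails `health_check`, only that batch is
    rolled back and the update stops, so untouched instances keep serving
    the old version. Returns True when every instance runs `new_version`.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    for start in range(0, len(instances), max_unavailable):
        batch = instances[start:start + max_unavailable]
        previous = [inst.version for inst in batch]
        for inst in batch:
            inst.version = new_version
        if not all(health_check(inst) for inst in batch):
            # Roll back only the batch that was at risk.
            for inst, old in zip(batch, previous):
                inst.version = old
            return False
    return True


# Four instances, at most half of them at risk at any time.
fleet = [Instance(f"web-{i}", "v1") for i in range(4)]
ok = rolling_update(fleet, "v2", max_unavailable=2,
                    health_check=lambda inst: True)

# A release that breaks on "web-2": the second batch is rolled back.
broken = [Instance(f"web-{i}", "v1") for i in range(4)]
failed = rolling_update(broken, "v2", max_unavailable=2,
                        health_check=lambda inst: inst.name != "web-2")
```

With `max_unavailable=2` on a fleet of four, at most half of the capacity carries the risk of the new version at any moment. A real rollout would also need to decide what to do with batches that were already updated successfully before the failure; this sketch leaves them on the new version.
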

---

Cybersecurity

CIA model

Service, software, data, system (online, backup), network, infrastructure, personnel

PDCA

Requirements management and change management

What can be verified, what should be verified, what has been verified

- Service: user input/output
- Software: SAST / DAST (NCTU CRAX)

Risk prevention, risk control

Security/Exam

data, infra, power, etc

Isolation (network and environment: run in an environment planned this way during normal operation so that an incident does not spread; a kill switch is also needed) / Backup plan

Scalability/Availability

Redundancy, Responsive (Automation), Recovery, Monitoring

Dynamic: adjust resources according to demand and start new instances

Using a product does not mean that the relevant features are enabled, that the system is configured properly, or that the surrounding components cooperate and interoperate correctly. (e.g., buying a WAF but never enabling its features; buying an LB but not configuring the service pool properly, or having backend servers that cannot start services dynamically at all; using the cloud but only launching a single VM)

Technology and solution selection:

What capabilities do we need, how important is this role, how often is this role used

Cost ($, maintenance staffing $, availability of maintainers, availability of maintainers at the required skill level / community size and documentation completeness) (nginx/sws/lighttpd)

Risk control: don't put all eggs in one basket. Expose only half to risk at a time. Taking two different flights, k8s rolling update.

Maintainability

Modularity

A lump of spaghetti vs clear division of responsibilities

Monolithic vs Microservice

Verifiability: for example, software must have an SBOM (CycloneDX) before the security issues and stability of the software and its supply chain can be verified.

Choosing a "standard": examine the characteristics of each "standard", for example the differences between SPDX and CycloneDX.

Hard to knock down, cannot be knocked down, not afraid even when knocked down

Solid, steady; calm, flexible; introspective, seeking progress.

Design must be solid and stable, and implementation thorough; contingency plans must be sound, and responses flexible; verification must be continuous, and so must progress.

CIA + Readiness (qualified, configured, and well-prepared)

Replace Availability with Accessibility: it must not only exist, it must also be reachable

If the island stands, the people stand; if the island falls, the people fall

Automation, Redundancy, Response, Recovery, Guidelines, Hardening

Proactive, Reactive

Process: all of it should be resilient

WAF is installed but its features are off, because it cannot handle the load once enabled, or it is bypassed during a DDoS

Scale up / scale out

HA not enabled; a single device goes down

What are the requirements? Usage volume, audience, cost. Then weigh together how far we should go and how far we can go.

Backup plan / fallback and rollback plan (what to do when the URL database fails)

Understand the characteristics of a technology, the problem it solves, and its limitations

Digital Resilience

- Triad
    - Security, High Availability, Disaster Recovery
- circles: service, application, data, ....
- Governance, ....

### Password

Password Requirements – GDPR, ISO 27001/27002, PCI DSS, NIST 800-53 [ref](https://davintechgroup.com/toolkit/password-requirements-gdpr-iso-27001-27002-pci-dss-nist-800-53/)

NIST Special Publication 800-63B https://pages.nist.gov/800-63-3/sp800-63b.html

- 5 Authenticator and Verifier Requirements
    - 5.1 Requirements by Authenticator Type
        - 5.1.1.2 Memorized Secret Verifiers

            > When processing requests to establish and change memorized secrets, verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised. For example, the list MAY include, but is not limited to:
            > - Passwords obtained from previous breach corpuses.
            > - Dictionary words.
            > - Repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).
            > - Context-specific words, such as the name of the service, the username, and derivatives thereof.
            >
            > Verifiers SHOULD NOT impose other composition rules (e.g., requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets. Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically).
- Appendix A - Strength of Memorized Secrets
    - A.1

        > However, analyses of breached password databases reveal that the benefit of such rules is not nearly as significant as initially thought [Policies], although the impact on usability and memorability is severe.
    - A.2

        > Password length has been found to be a primary factor in characterizing password strength

PCI DSS

- Requirement 8: Assign a unique ID to each person with computer access
    - 8.1 Define and implement policies and procedures to ensure proper user identification management for non-consumer users and administrators on all system components as follows
        - 8.1.4 Remove/disable inactive user accounts within 90 days.
    - 8.2 In addition to assigning a unique ID, ensure proper user-authentication management for non-consumer users and administrators on all system components by employing at least one of the following methods to authenticate all users:
        - 8.2.4 Change user passwords/passphrases at least once every 90 days.

Regulatory compliance. Outdated compliance requirements.

## Methods While Cruising

Examining a system:

- What do we know already (about the system)
- What we don't know yet (about the system)
- What might be missing (of the system)
- How to improve it. (Add or change)

| What do we know already | What we don't know yet | What might be missing |
| ----------------------- | ---------------------- | --------------------- |
| .                       | .                      | .                     |
| How to improve it.      |                        |                       |

Case study:

- Physical or VM or container
- Isolation
- Fixed size or docker or docker swarm or k8s or ASG
- Does "Using AWS" satisfy resilience? (need multi AZ, ASG, etc)
- NIST password rule recommendations (complexity vs length, rotation frequency)
- Full backup? Differential backup? Incremental backup?
- RTO/RPO
- Rapid deployment (docker/container)
- Data center tier / 9s
- GitHub DDoS

LSA content

#nics #training
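
The NIST SP 800-63B guidance quoted in the Password section (compare against a list of known-bad values, avoid composition rules, rely on length) can be sketched as a small verifier. This is an illustrative sketch only: the function names, the tiny `BLOCKLIST`, and the rejection messages are assumptions made for the example, and the default minimum of 8 characters is taken from SP 800-63B section 5.1.1 rather than from the excerpt above. The sequence check only catches secrets that are a single repeated character or one straight run, so it is far weaker than a real breach-corpus lookup.

```python
def is_repetitive_or_sequential(secret: str) -> bool:
    """True if the whole secret is one repeated character or one straight run."""
    if len(secret) < 2:
        return False
    steps = {ord(b) - ord(a) for a, b in zip(secret, secret[1:])}
    return steps in ({0}, {1}, {-1})


def check_memorized_secret(candidate, username, service_name,
                           blocklist, min_length=8):
    """Return the reasons to reject `candidate`; an empty list means accept.

    There are deliberately no character-class rules and no expiry check,
    following the SHOULD NOT clauses quoted above.
    """
    reasons = []
    lowered = candidate.lower()
    if len(candidate) < min_length:
        reasons.append("too short")
    if lowered in blocklist:
        reasons.append("commonly used or compromised")
    if is_repetitive_or_sequential(lowered):
        reasons.append("repetitive or sequential")
    context_words = [w.lower() for w in (username, service_name) if w]
    if any(word in lowered for word in context_words):
        reasons.append("context-specific word")
    return reasons


# Hypothetical stand-in for a real list of breached and common passwords.
BLOCKLIST = {"password", "12345678"}

accepted = check_memorized_secret(
    "correct horse battery staple", "alice", "example", BLOCKLIST)
rejected = check_memorized_secret("12345678", "alice", "example", BLOCKLIST)
```

In this sketch a long passphrase passes without any digits or symbols, while a short or well-known value is rejected, which is the trade-off between complexity and length that the case study list refers to.
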