## DevOps lifecycle
PLAN -> CODE -> BUILD -> TEST -> || RELEASE -> DEPLOY -> OPERATE -> MONITOR -> PLAN -> ...
## What is configuration management
DEPLOY <= That's where configuration management comes in
PUPPET, CHEF, ANSIBLE
>3 minutes
### Issues:
Mass deployment:
Deploying on 5 machines vs 500 machines
we need automation
why? scalability, time
Migration:
test -> production
Mitigate differences between test environments and production environments = consistency
Rollback <- manual rollback takes time, time = $, companies can't afford downtime
>3 minutes, 6 total
### Infrastructure as code
Environment/infrastructure defined by code,
code can be easily versioned and controlled
Easily available system history
quick provisioning by executing that code
Previous issues:
Mass deployment <- solved: execute the code on each machine, deterministic, same effect everywhere
Rollback <- solved: git checkout the previous version (or similar)
Migration <- solved, use same code/config for prod and test
>3 minutes, 9 total
## Shell script vs CM tool
Shell:
- write automation from scratch
- write workflow from scratch
CM:
- most of the work is already done
- workflow is already defined
- gui
> 1 minute, 10 total
# CM tools introduction
Setup:
Ansible is agent-free, uses SSH
Puppet: master-agent, with certificate signing
Chef has an additional component, the workstation
Availability:
Master can be replaced on failure
Management:
Ansible - push; the rest are pull
> 5 minutes, 15 total
## Ansible
### Additional text for freestyle (all hail chatgpt) <3
Ansible is an open-source IT automation platform that helps organizations automate various tasks such as configuration management, application deployment, and provisioning. It is a simple, yet powerful, tool that makes it easy for organizations to manage multiple systems from a centralized location. Ansible works by connecting to nodes, or hosts, and executing tasks on those systems, eliminating the need for agents or additional software to be installed on the nodes.
Ansible uses YAML syntax to define tasks and workflows, making it easily readable by humans. It operates on a client-server architecture, with the Ansible control node executing tasks on remote nodes. The control node requires Python 2.7 or later, while remote nodes only require an SSH connection and Python 2.4 or later. Ansible playbooks are collections of tasks and workflows written in YAML, and can be used to define configuration management policies, application deployments, and other IT automation tasks.
Ansible includes over 500 modules, providing a wide range of functionality for IT automation tasks. Ansible also includes a number of advanced features, such as task delegation, rolling updates, error handling, and debugging and tracing.
Ansible’s simplicity and ease of use, combined with its versatility and scalability, make it a good fit for organizations looking to streamline their IT operations. It is highly customizable and integrates with other tools and systems, such as Docker containers, cloud platforms, and monitoring and reporting tools, which makes it valuable for teams adopting a DevOps culture in which development and operations collaborate to deliver high-quality applications and services faster.
Ansible supports multiple operating systems, including Windows, Linux, and macOS, making it usable across diverse environments, and it communicates securely with remote systems over SSH. Its open-source nature allows organizations to customize the tool to meet their specific needs, and its large community and comprehensive library of modules cover a wide range of automation tasks, so IT professionals with varying levels of technical expertise can use it.
In summary, Ansible is an open-source IT automation platform that provides a simple yet powerful way to automate a wide range of IT tasks, and its versatility, scalability, and ease of use make it a valuable tool for organizations implementing a DevOps culture.
https://hackmd.io/dmag4uxZTrOBTAfG8AC8qQ?both#main-ansible-edureka-video-content
#### ansible playbook example
```yaml=
- hosts: webserver
  tasks:
    - name: Installs nginx web server
      apt: pkg=nginx state=present update_cache=true
      notify:
        - start nginx
  handlers:
    - name: start nginx
      service: name=nginx state=started
```
To run a playbook: `ansible-playbook file.yaml`
#### ansible google trends graph
Ansible is getting more popular because it is easy to set up and maintain
#### ends with ansible setup and demo
## Architecture of Ansible
Ansible's architecture is based on a client-server model, where the Ansible control node serves as the client and the managed nodes serve as the servers. The control node is the machine where the Ansible playbook is executed and from where tasks are dispatched to the managed nodes. The managed nodes are the systems that are being managed by Ansible, and they can be running any operating system that supports SSH connectivity.
The basic components of the Ansible architecture include:
- Ansible Control Node: the machine where Ansible is installed and where playbooks are executed. The control node communicates with the managed nodes over SSH to execute tasks.
- Managed Nodes: the systems that are being managed by Ansible. Managed nodes can be physical servers, virtual machines, or cloud instances. They require only a minimal installation of Python and an SSH server.
- Playbooks: collections of tasks and workflows written in YAML that define the automation tasks to be performed on the managed nodes. Playbooks are executed on the control node and are used to configure, deploy, and manage applications on the managed nodes.
- Modules: pre-written scripts that are used to perform specific tasks, such as installing software or configuring services. Ansible provides over 500 modules for a wide range of IT automation tasks.
- Inventory: a file that defines the nodes that Ansible will manage. The inventory can be specified as a simple text file or in a more structured format such as INI or YAML. The inventory can also be dynamically generated using external scripts or inventory plugins.
In summary, the Ansible architecture is designed to be simple, yet flexible and scalable, allowing organizations to automate a wide range of IT tasks across a diverse set of environments. The client-server model of the Ansible architecture enables organizations to manage multiple systems from a centralized location, making it easy to implement IT automation across large, complex environments.
### Ansible push model
While the push architecture of Ansible has many advantages, it also has some disadvantages:
- Scalability: as the number of managed nodes increases, the amount of data transmitted from the control node to the managed nodes can become quite large, potentially leading to scalability issues.
- Network latency: if the managed nodes are located in different geographical locations, network latency can become a bottleneck, affecting the performance of the configuration management process.
- Dependency on network connectivity: if the network connection between the control node and the managed nodes is disrupted, the configuration management run fails, which can leave managed nodes in inconsistent states.
- Security: the push architecture requires that the control node have access to the managed nodes over SSH or WinRM, which can introduce security vulnerabilities if not properly secured.
- Resource utilization: the push architecture can place a high load on the control node, especially if it is managing a large number of nodes, affecting the control node's performance.
In summary, while the push architecture of Ansible has many advantages, it is important to understand its limitations and to plan and design the configuration management process accordingly.
### Ansible's requirements
- Control machine: the machine that runs Ansible must have Python 2.7 or later installed. The control machine can be a Linux or macOS system (on Windows, typically via WSL).
- Managed nodes: the target servers or devices that Ansible will manage. Ansible requires a secure connection to the managed nodes, which can be established using SSH, Paramiko, or WinRM.
- Inventory: a list of managed nodes that Ansible will manage. The inventory can be a simple text file or an organized, dynamic inventory managed by an external tool.
- Modules: the units of work in Ansible, used to perform specific tasks on the managed nodes. Ansible comes with a large number of pre-built modules for various purposes, including managing files, software packages, and services.
- Playbooks: collections of tasks, defined using YAML syntax, that can be executed on the managed nodes. Playbooks provide a way to automate complex operations and are the main building blocks of an Ansible infrastructure.
- Network connectivity: managed nodes must be accessible over the network and have the required network ports open. Ansible uses the default SSH port (22), but this can be changed if required (a minimal config sketch follows below).
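A minimal control-machine setup can be sketched in an `ansible.cfg` file; the inventory path and user below are hypothetical examples, not fixed defaults:
```ini=
# ansible.cfg next to your playbooks; values here are examples
[defaults]
inventory   = ./hosts.ini      # hypothetical inventory path
remote_user = ubuntu           # default SSH user on managed nodes
# Ansible connects over SSH port 22 by default; a per-host
# ansible_port variable in the inventory overrides it.
```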
### Ansible Inventories
Ansible inventories are lists of managed nodes that Ansible uses to execute playbooks and perform configuration management tasks. An inventory can be a simple text file or an organized database that describes the state of the managed nodes, such as their IP addresses, hostnames, operating systems, and other relevant information.
Inventories are essential for Ansible to know which nodes to manage and what tasks to perform on them. The inventory file is specified as an argument when running an Ansible playbook, and it defines the target nodes for the playbook to execute on.
Ansible inventories can be organized in a variety of ways, including by group, region, environment, or any other custom organization that is relevant to the project. For example, you can create groups of nodes for each environment, such as production, staging, and development, and then specify the relevant group in the playbook to manage the nodes in that environment.
Ansible also supports dynamic inventories, which allow the inventory information to be retrieved from external sources, such as cloud providers, LDAP, or CMDBs. This makes it possible to manage nodes that are dynamically created, such as those in a cloud environment, without the need to manually update the inventory file.
In conclusion, Ansible inventories are a crucial part of the configuration management process and provide a way to organize and manage the nodes that Ansible will perform tasks on.
Ex1
```ini=
[webservers]
node1.example.com
node2.example.com

[databases]
db1.example.com
db2.example.com

[all:vars]
ansible_user=ubuntu
ansible_python_interpreter=/usr/bin/python3
```
Ex2
```ini=
[webservers:children]
prod_web
stage_web

[prod_web]
web1.prod.example.com ansible_user=root
web2.prod.example.com ansible_user=root

[stage_web]
web1.stage.example.com ansible_user=ubuntu
web2.stage.example.com ansible_user=ubuntu

[databases:children]
prod_db
stage_db

[prod_db]
db1.prod.example.com ansible_user=root
db2.prod.example.com ansible_user=root

[stage_db]
db1.stage.example.com ansible_user=ubuntu
db2.stage.example.com ansible_user=ubuntu

[prod:children]
prod_web
prod_db

[stage:children]
stage_web
stage_db

[all:vars]
ansible_python_interpreter=/usr/bin/python3

[prod:vars]
nginx_version=1.18.0

[stage:vars]
nginx_version=1.16.0
```
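To sanity-check an inventory like the ones above, one can ping a group ad hoc (assuming the inventory was saved as `inventory.ini`):
```
ansible webservers -i inventory.ini -m ping
```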
### Ansible Playbooks
Ansible playbooks are files that define a series of tasks to be executed by Ansible on a set of managed nodes. Playbooks are written in YAML format and are used to automate a wide variety of IT tasks, such as configuring servers, deploying applications, and managing network devices.
A playbook is made up of one or more plays, each of which maps a set of tasks to a specific set of managed nodes. The tasks are defined using Ansible modules, which are pre-written scripts that perform specific actions, such as installing packages, copying files, or managing services.
Ex1.
```yaml=
---
- name: Install Apache
  hosts: webservers
  become: yes
  tasks:
    - name: Install Apache package
      package:
        name: httpd
        state: present
    - name: Start Apache service
      service:
        name: httpd
        state: started
        enabled: yes
```
Ex2.
```yaml=
---
- name: Install and configure Nginx
  hosts: webservers
  become: yes
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
    - name: Start Nginx service
      service:
        name: nginx
        state: started
    - name: Check whether Nginx is running
      shell: systemctl is-active nginx
      register: nginx_status
      ignore_errors: yes
    - name: Copy Nginx configuration file
      copy:
        src: nginx.conf
        dest: /etc/nginx/nginx.conf
      register: nginx_conf
    - name: Restart Nginx if the configuration changed or it was not running
      service:
        name: nginx
        state: restarted
      when: nginx_conf.changed or nginx_status.rc != 0
```
## Ansible modules
### Builtin
Ansible modules are pre-written code snippets that automate tasks on managed nodes. They are used to perform specific functions such as installing packages, creating users, copying files, etc. Modules are executed on the managed nodes by the Ansible control machine, which sends module arguments over a secure connection and returns the results to the control machine.
There are over 450 modules available in Ansible, and they can be used in playbooks to perform various tasks. Some of the commonly used modules include:
- apt: used to manage packages on Debian-based systems
- yum: used to manage packages on Red Hat-based systems
- copy: used to copy files to the managed nodes
- file: used to manage files and directories on the managed nodes
- service: used to manage services on the managed nodes
- user: used to manage user accounts on the managed nodes
Modules can be executed using the ansible command-line tool or in playbooks, where tasks are executed in a specific order to accomplish a desired outcome. Modules can also accept parameters and return results, which can be used in later tasks or playbooks.
- command: executes commands on the managed nodes
- shell: executes shell commands on the managed nodes
- raw: executes low-level shell commands on the managed nodes
- systemd: manages services managed by the systemd service manager
- reboot: reboots the managed nodes
- cron: manages cron jobs on the managed nodes
- firewalld: manages firewalls on systems that use the firewalld firewall service
- git: manages Git repositories on the managed nodes
- apt_key: manages apt signing keys on Debian-based systems
- yum_repository: manages Yum repositories on Red Hat-based systems
- assemble: used to assemble files from fragments
- unarchive: used to unpack archive files
- template: used to template files with Jinja2
These are just a few examples of the many modules available in Ansible. It's worth noting that you can also write custom modules in any programming language to extend the functionality of Ansible to meet your specific needs.
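As a sketch of how a few of the modules above compose inside one play (the user account, repository URL, and cron job are hypothetical):
```yaml=
- hosts: webservers
  become: yes
  tasks:
    - name: Ensure a deploy user exists
      user:
        name: deploy                          # hypothetical account
        state: present
    - name: Install git
      apt:
        name: git
        state: present
    - name: Clone the application repository
      git:
        repo: https://example.com/app.git     # hypothetical repo
        dest: /opt/app
    - name: Schedule a nightly cleanup job
      cron:
        name: nightly cleanup
        hour: "3"
        minute: "0"
        job: /opt/app/cleanup.sh              # hypothetical script
```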
### Custom
Ansible custom modules are scripts that are created by users to perform specific tasks that are not provided by the pre-existing modules that come with Ansible. Custom modules can be written in any language that is compatible with the target system and can be executed using Ansible. Custom modules provide organizations with the flexibility to automate tasks that are unique to their environment and can be used to extend the functionality of Ansible.
Custom modules can be used to perform a wide range of tasks, such as:
- Automating tasks that are not covered by pre-existing modules
- Integrating Ansible with custom or third-party applications
- Accessing APIs or other data sources that are not accessible through pre-existing modules
- Modifying the behavior of existing modules to better meet the needs of the organization
To create a custom module, you will need to have a good understanding of the task you want to automate and the target system, as well as a basic understanding of programming and the language you will use to write the module. Custom modules are typically stored in a specific directory on the Ansible control node and are executed by referencing them in a playbook.
In summary, custom modules are an important feature of Ansible that provide organizations with the flexibility to automate tasks that are unique to their environment. Custom modules can be used to extend the functionality of Ansible and to integrate it with custom or third-party applications, making it a versatile tool for IT automation.
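Custom modules are most often written in Python with the `AnsibleModule` helper class; a minimal hedged sketch (the module name and its `pretty` option are made up for illustration):
```python=
#!/usr/bin/python
# my_uptime.py - hypothetical custom module; place it in a ./library/
# directory next to the playbook and call it like a built-in module.
from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(
        argument_spec=dict(
            pretty=dict(type='bool', default=False),  # made-up option
        )
    )
    cmd = ['uptime', '-p'] if module.params['pretty'] else ['uptime']
    rc, out, err = module.run_command(cmd)
    if rc != 0:
        module.fail_json(msg='uptime failed', stderr=err)
    # Report-only module: it never changes the managed node.
    module.exit_json(changed=False, uptime=out.strip())

if __name__ == '__main__':
    main()
```
A playbook task would then call it as `my_uptime: pretty=yes` and can use the returned `uptime` value in later tasks.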
# Availability and redundancy
Ansible provides several features that help to ensure availability:
- Idempotence: Ansible ensures that tasks are executed only when necessary, which helps to reduce downtime and improve reliability. When a task is executed, Ansible checks the state of the system before and after the task is performed to ensure that the desired state has been achieved. If the desired state has already been achieved, the task is not executed, which helps to keep the system in a stable state.
- Rolling Updates: Ansible can spread work across batches of systems, improving performance and availability. This is done with the built-in "serial" and "max_fail_percentage" options in a playbook, which let administrators specify how many systems are updated at one time and the maximum share of systems that can fail before the play is aborted (see the sketch after this list).
- Error Handling: Ansible provides robust error handling capabilities that help to minimize downtime and improve reliability. Error handling is built into Ansible modules and playbooks, allowing administrators to specify the actions that should be taken if a task fails.
- Task Monitoring: Ansible reports task results in real time as a playbook runs, allowing administrators to quickly identify and resolve issues that may impact availability. Within a playbook, the "notify" keyword triggers handlers when a task reports a change, and callback plugins can forward results to external systems.
- High Availability: Ansible can be configured for high availability by using multiple control nodes, ensuring that tasks can be executed even if one or more systems are unavailable. Task delegation ("delegate_to") and fact caching can also help spread work across systems and speed up runs.
In summary, Ansible provides several features that help to ensure availability, including idempotence, load balancing, error handling, task monitoring, and high availability. These features work together to help organizations automate IT tasks in a reliable and efficient manner, improving the overall availability of their systems.
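For the rolling-update options mentioned above, a hedged sketch (the group and package names are hypothetical):
```yaml=
- hosts: webservers
  serial: 2                  # update two hosts at a time
  max_fail_percentage: 25    # abort if more than 25% of a batch fails
  become: yes
  tasks:
    - name: Upgrade the application package
      apt:
        name: myapp          # hypothetical package
        state: latest
```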
### Ansible Summary
- Declarative, yaml
- Easy to use (try it yourself)
- Agentless (very easy setup)
## A different tool: Puppet
Puppet and Ansible are both configuration management tools that automate the management of infrastructure, but there are several key differences between the two:
- Architecture: Puppet uses a client-server architecture, where the Puppet agent runs on each managed node and communicates with the Puppet master to receive configuration information. In contrast, Ansible uses a push architecture, where the Ansible control node pushes configuration changes to the managed nodes.
- Language: Puppet uses its own declarative language, the Puppet DSL (Domain Specific Language), to describe the desired state of the infrastructure. In contrast, Ansible uses YAML, a more human-readable and accessible language, to describe playbooks and tasks.
- Flexibility: Ansible is designed to be more flexible and easier to use, with a lower learning curve compared to Puppet. It can be used to manage a wide range of systems and is not limited to a specific operating system. In contrast, Puppet is designed for enterprise-scale infrastructure and may have a steeper learning curve, but it provides a more comprehensive solution for large-scale environments.
- Scale: Puppet is designed to handle large-scale environments, with features such as node classification and a centralized certificate authority. Ansible is also scalable, but it does not have the same level of built-in support for large environments.
- Agentless: Ansible does not require an agent to be installed on the managed nodes, making it easier to deploy and manage. In contrast, Puppet requires the installation of an agent on each managed node.
In conclusion, both Puppet and Ansible have their own strengths and weaknesses, and the choice between the two will depend on the specific requirements and constraints of your infrastructure. While Ansible is easier to use and more flexible, Puppet is better suited for large-scale environments.
## Architecture - pull based
Puppet uses a pull-based architecture, where the Puppet agents on the managed nodes periodically pull configuration information from the Puppet master. The Puppet master acts as a centralized source of truth, and the Puppet agents use the information from the master to ensure that the state of the managed nodes is in line with the desired configuration.
Here's how the process works:
1. The Puppet agent on each managed node runs at a specified interval (30 minutes by default).
2. The Puppet agent sends a request to the Puppet master for the latest configuration information.
3. The Puppet master responds with the current configuration information, including any changes that have been made since the last request.
4. The Puppet agent compares the received information to the current state of the managed node and applies any changes necessary to bring the node into compliance with the desired configuration.
5. The Puppet agent logs the changes that were made and sends a report back to the Puppet master.
By using a pull-based architecture, Puppet ensures that the managed nodes are always up to date with the latest configuration information, even if changes are made to the Puppet master. This architecture also allows for greater scalability, as the Puppet master can handle a large number of Puppet agents.
It's worth noting that Puppet also supports push-based architectures, where the Puppet master pushes changes to the Puppet agents, but the pull-based architecture is the more commonly used approach.
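The pull interval is configured on the agent side in puppet.conf; a minimal sketch (the server hostname is a hypothetical example):
```ini=
# /etc/puppetlabs/puppet/puppet.conf on an agent node
[agent]
server      = puppet.example.com   # hypothetical primary server
runinterval = 30m                  # how often the agent pulls (default)
```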
### DSL instead of YAML
```puppet=
# Install Apache HTTP server
package { 'httpd':
  ensure => installed,
}

# Start the Apache HTTP server service
service { 'httpd':
  ensure => running,
  enable => true,
}

# Configure the default Apache config file from an ERB template
# shipped inside the module (module_name/templates/httpd.conf.erb)
file { '/etc/httpd/conf/httpd.conf':
  ensure  => file,
  content => template('module_name/httpd.conf.erb'),
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
}
```
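A manifest like this can be tested locally with `puppet apply` (assuming it was saved as httpd.pp); the `--noop` flag shows what would change without changing anything:
```
puppet apply --noop httpd.pp
```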
## Additional mentions: what to work with
## Kubernetes
Ansible and Kubernetes can complement each other in various ways to automate and manage infrastructure, especially in the context of deploying and managing applications. Here are a few ways they can work together:
- Deploying and configuring the Kubernetes cluster: Ansible can be used to automate the deployment of a Kubernetes cluster, including setting up the necessary components such as etcd, the API server, the controller manager, and worker nodes. Ansible can also be used to configure and customize the cluster components.
- Managing and deploying applications: Ansible can automate the deployment and management of applications within the Kubernetes cluster. For example, playbooks can manage and configure the state of resources in the cluster, such as pods, services, and deployment configurations (see the sketch below).
- Updating and scaling the cluster: Ansible can automate the process of updating the cluster components, including scaling up the worker nodes as needed.
- Integrating with CI/CD pipelines: Ansible and Kubernetes can be integrated into CI/CD pipelines to automate the deployment of applications from development to production. Ansible playbooks deploy applications to the Kubernetes cluster, and Kubernetes manages and orchestrates the deployment.
By combining the power of Ansible and Kubernetes, organizations can simplify the deployment and management of their infrastructure, ensuring that it remains consistent, up-to-date, and scalable.
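As a sketch of the application-management point, Ansible's `kubernetes.core.k8s` module (assuming the kubernetes.core collection is installed) can declare cluster resources; the Deployment name and image here are hypothetical:
```yaml=
- hosts: localhost
  tasks:
    - name: Ensure the app Deployment exists
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: myapp                # hypothetical
            namespace: default
          spec:
            replicas: 3
            selector:
              matchLabels: {app: myapp}
            template:
              metadata:
                labels: {app: myapp}
              spec:
                containers:
                  - name: myapp
                    image: nginx:1.25
```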
## Terraform
Terraform and Ansible can be paired to automate the provisioning and configuration of infrastructure. Terraform can be used to provision the infrastructure, while Ansible can be used to configure and manage the software components running on that infrastructure.
Here's an example of how they can be used together:
1. Terraform is used to provision the infrastructure, such as virtual machines, networks, and load balancers, in a cloud environment.
2. Once the infrastructure is provisioned, Terraform can be used to generate an inventory file that Ansible can use to manage the software components running on the infrastructure.
3. Ansible is then used to install and configure software on the virtual machines, such as web servers, databases, and other applications.
4. Terraform is used to manage changes to the infrastructure over time, such as scaling it up or down, and Ansible is used to manage changes to the software components.
By pairing Terraform and Ansible, organizations can benefit from the strengths of both tools. Terraform provides a consistent and repeatable way to provision and manage infrastructure, while Ansible provides a flexible and powerful way to manage and configure the software components running on that infrastructure.
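A hedged sketch of the hand-off: Terraform output is rendered into an inventory file, which Ansible then consumes (the addresses are documentation examples):
```
# inventory.ini, generated from Terraform outputs
[webservers]
203.0.113.10 ansible_user=ubuntu
203.0.113.11 ansible_user=ubuntu
```
followed by `ansible-playbook -i inventory.ini site.yml` to configure the freshly provisioned machines.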
## Configuration management vs provisioning
## Configuration management - what is it?
Configuration Management is the process of maintaining systems, such as computer hardware and software, in a desired state. Configuration Management (CM) is also a method of ensuring that systems perform as expected over time.
In short, these are tools that automate and simplify the process of deploying and scaling applications, and that allow rolling back to the previous stable version of an application in case of an error, to minimize downtime.
## Ansible short (edureka video)
Ansible - IT automation, configuration management and provisioning tool
uses *playbooks* to deploy, manage, build, test and configure full server environments
### features
- ansible is agentless / only installed on the control machine
- built on top of Python, can utilise pip to download packages
- uses ssh for connections
- push based architecture
- easy/fast to setup, minimal requirements
### (explanation of push/pull architecture)
advantages of push based system:
- full control
- everything synchronised
- in case of a mistake, you can immediately correct it
disadvantages:
- unable to achieve full automation
pull based system for better scalability/bigger infrastructure
push based system for quick changes in config
#### ansible agentless architecture
basically "Architecture of Ansible" - text above + plugins
Plugins: executed on the main control machine for stuff like logging, email etc.
allows executing an Ansible task as a job build step
connection plugin allows you to utilize other connection modules besides ssh
example: docker connection plugin to connect to docker containers
(talks about host inventory, modules like apt or service module)
#### ansible ad-hoc commands
quick commands for CLI-like purposes (check uptime, release, date etc.)
ex:
```
ansible all -b -m shell -a 'uptime'
```
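A few more ad-hoc one-liners in the same spirit (the groups come from whatever inventory is in use):
```
ansible all -m ping
ansible webservers -b -m apt -a "name=nginx state=present"
ansible all -m setup -a "filter=ansible_distribution*"
```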
## Chef
Chef (Progress Chef) - open-source tool developed by Chef Software (formerly Opscode)
the company merged with Progress in 2020
Chef is a declarative configuration management and automation platform that lets you use infrastructure as code
Utilizes a pull architecture: nodes run *chef-client*, which pulls configuration from the Chef server
### chef server
Center of the operations, stores, manages and provides configuration data to other chef components
chef server also keeps a record of the state of all nodes at the time of the last *chef-client* run
Workstation communicates with the server using Knife and Chef CLI while nodes communicate with chef using chef-client
Any changes made to your infrastructure code must pass through the Chef server to be applied to nodes
Prior to accepting or pushing changes, the Chef server authenticates all communication via its REST API using public key encryption
Each server has:
- nginx front-end load balancer to route all requests to the API
- PostgreSQL database
- Apache Solr instance (wrapped by chef-solr) for indexing and search
**Chef server’s minimum system requirements are a Linode with 8 GB of RAM and four CPU cores. For more specifications, see the Chef System Requirements documentation page.**
Chef Server uses a Bookshelf to store its cookbooks (and related files and templates)
The Bookshelf is a versioned repository (generally located at /var/opt/opscode/bookshelf; full root access is necessary for access)
When a cookbook is uploaded to the Chef server, the new version is compared to the one already stored
If there are changes, the new version is stored
The Chef server only stores one copy of a file or template, meaning if resources are shared between cookbooks and cookbook versions, they will not be stored multiple times
### chef workstation
A Chef workstation is where a user creates, tests, and maintains cookbooks and policies pushed to the Chef server and pulled by the Chef nodes
The workstation functionality is available by downloading the Chef Workstation package, which provides the chef and knife command-line tools, the testing tools (Test Kitchen, ChefSpec, Cookstyle, and Foodcritic), and InSpec
Inspec is a tool for writing automated tests for compliance, security, and policy requirements, additionally, Berkshelf, the dependency manager for Chef cookbooks, is installed.
The Chef workstation can be installed on virtual servers or personal computers
A Chef workstation will interact with a single Chef server, and most work is done in the chef-repo directory located on the workstation.
Cookbooks created on a Chef workstation can be used privately by one organization or uploaded to the Chef Supermarket for others to use
Chef workstations can download cookbooks found in the Supermarket.
A Chef workstation’s chef-repo directory is where cookbooks are authored and maintained. Any supporting resources (such as roles, data bags, and environments) are also stored there
The chef-repo should be version-controlled with a remote version control system (such as Git). Chef can communicate with the server from the chef-repo and push any changes via the use of knife commands.
You can generate a Chef repository using the following command: `chef generate repo repo-name`
#### knife tool
The Knife command-line tool is the primary way that a workstation communicates the contents of its chef-repo directory with a Chef server. It also provides an interface to manage nodes, cookbooks, roles, environments, and databags.
A Knife command executed from the workstation uses the following format:
```knife subcommand [ARGUMENT] (options)```
For example, to view the details of a Chef user, execute the following command:
```knife user show USER_NAME```
The Knife command-line tool is configured with the knife.rb file:
```
File: ~/chef-repo/.chef/knife.rb
log_level :info
log_location STDOUT
node_name 'username'
client_key '~/chef-repo/.chef/username.pem'
validation_client_name 'shortname-validator'
validation_key '~/chef-repo/.chef/shortname.pem'
chef_server_url 'https://123.45.67.89/organizations/shortname'
syntax_check_cache_path '~/chef-repo/.chef/syntax_check_cache'
cookbook_path [ '~/chef-repo/cookbooks' ]
```
The default knife.rb file is defined with the following properties:
- log_level: The amount of logging to be stored in the log file. The default value, :info, notes that any informational messages will be logged. Other values include :debug, :warn, :error, and :fatal.
- log_location: The location of the log file. The default value, STDOUT, is for standard output logging. If set to another value, standard output logging is still performed.
- node_name: The username of the person using the workstation. This user requires a valid authorization key located on the workstation.
- client_key: The location of the user’s authorization key.
- validation_client_name: The name for the server validation key determining whether a node is registered with the Chef server. These values must match during a chef-client run.
- validation_key: The path to your organization’s validation key.
- chef_server_url: The URL of the Chef server, with shortname being the defined shortname of your organization (this can also be an IP address). /organizations/shortname must be included in the URL.
- syntax_check_cache_path: The location in which knife stores information about files checked for appropriate Ruby syntax.
- cookbook_path: The path to the cookbook directory.
#### test kitchen
Test Kitchen provides a development environment on a workstation to create, test, and iterate on cookbooks before distributing its contents to production nodes
The Kitchen command-line tool can be used to run integration tests against different platforms allowing testing against the variety of nodes running in a production environment
### chef nodes
A Chef node is any machine managed by a Chef server. Chef can manage virtual servers, containers, network devices, and storage devices as nodes. Each node must have a corresponding Chef client installed to execute the steps needed to bring it to the required state defined by a cookbook.
Nodes are validated through the validator.pem and client.pem certificates created on the node when it is bootstrapped. All nodes must be bootstrapped over SSH as either the root user or a user with elevated privileges.
Nodes are kept up-to-date through chef-client, which runs a convergence between the node and the Chef server. What cookbooks and roles the node takes on depends on the run list and environment set for the node in question.
#### chef client
On a node, chef-client checks the node’s current configuration against the recipes and policies stored on the Chef server and brings the node up to date. The process begins with the chef-client checking the node’s run list, loading the required cookbooks, then checking and syncing the cookbooks with the current configuration of the node.
The chef-client must be run with elevated privileges to configure the node correctly. It should run periodically to ensure that the server is always up to date (usually with a cron job or by setting up the chef-client to run as a service).
#### run lists
Run lists define which recipes a Chef node will use
The run list is an ordered list of all roles and recipes chef-client needs to pull from the Chef server
Roles define patterns and attributes across nodes
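A node's run list is stored as part of the node object on the Chef server; a hedged sketch of its JSON form (the role and cookbook names are hypothetical):
```json=
{
  "run_list": [
    "role[webserver]",
    "recipe[nginx::default]"
  ]
}
```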
#### Ohai
Ohai collects system configuration data to be used in cookbooks and is required to be present on every Chef node. It is installed as part of the bootstrap process.
Ohai gathers data about network and memory usage, CPU data, kernel data, hostnames, FQDNs, and other automatic attributes. This data helps the Chef client determine the state of the node prior to applying that node’s run list.
### Environments
Chef environments mimic a real-life workflow, allowing nodes to be organized into different groups that define the role the node plays in the fleet.
This allows for users to combine environments and versioned cookbooks to have different attributes for different nodes.
For example, if testing a shopping cart, you may not want to test any changes on the live website, but with a development set of nodes.
Environments can be saved either as Ruby or JSON files
```ruby=
name "environmentname"
description "environment_description"
cookbook_versions "cookbook" => "cookbook_version"
default_attributes "node" => { "attribute" => [ "value", "value", "etc." ] }
override_attributes "node" => { "attribute" => [ "value", "value", "etc." ] }
```
```json=
{
  "name": "environmentname",
  "description": "a description of the environment",
  "cookbook_versions": {
  },
  "json_class": "Chef::Environment",
  "chef_type": "environment",
  "default_attributes": {
  },
  "override_attributes": {
  }
}
```
### Cookbooks
Cookbooks are the basis for managing the configurations on any node
Cookbooks contain values and information about the desired state of a node
Using the cookbook, the Chef server and Chef client ensure the defined state is achieved
Cookbooks are comprised of recipes, metadata, attributes, resources, templates, libraries, and anything else used to create a functioning system (attributes and recipes being the two core parts)
Components of a cookbook should be modular, with recipes that are small and related.
### Recipes
Recipes are written in Ruby and contain information about everything needing to be run, changed, or created on a node
Recipes work as a collection of resources determining the configuration or policy of a node (with resources being a configuration element of the recipe)
For a node to run a recipe, it must be on that node’s run list
The example recipe below is part of Chef’s Vim cookbook. It dictates the required Vim package based on a node’s Linux distribution:
```ruby=
vim_base_pkgs = value_for_platform_family(
%w(debian arch) => ['vim'],
%w(rhel fedora) => ['vim-minimal', 'vim-enhanced'],
'default' => ['vim']
)
package vim_base_pkgs
package node['vim']['extra_packages'] unless node['vim']['extra_packages'].empty?
```
## Puppet
### what is puppet
Puppet is a tool that helps you manage and automate the configuration of servers
When you use Puppet, you define the desired state of the systems in your infrastructure that you want to manage
You do this by writing infrastructure code in Puppet's Domain-Specific Language (DSL) — Puppet code — which you can use with a wide array of devices and operating systems
Puppet code is declarative
The Puppet primary server is the server that stores the code that defines your desired state.
The Puppet agent translates your code into commands and then executes it on the systems you specify, in what is called a Puppet run.
### puppet platform
Puppet is made up of several packages. Together these are called the Puppet platform, which is what you use to manage, store and run your Puppet code. These packages include puppetserver, puppetdb, and puppet-agent — which includes Facter and Hiera.
Puppet is configured in an agent-server architecture, in which a primary node (system) controls configuration information for one or more managed agent nodes
Servers and agents communicate by HTTPS using SSL certificates
Puppet includes a built-in certificate authority for managing certificates
Puppet Server performs the role of the primary node and also runs an agent to configure itself
You keep nearly all of your Puppet code, such as manifests, in *modules*
Each module manages a specific task in your infrastructure, such as installing and configuring a piece of software
Modules contain both code and data
The data is what allows you to customize your configuration
#### hiera
Using a tool called *Hiera*, you can separate the data from the code and place it in a centralized location
This allows you to specify guardrails and define known parameters and variations, so that your code is fully testable and you can validate all the edge cases of your parameters
#### puppetdb
All of the data generated by Puppet (for example facts, catalogs, reports) is stored in the Puppet database (PuppetDB)
Storing data in PuppetDB allows Puppet to work faster and provides an API for other applications to access Puppet's collected data
Once PuppetDB is full of your data, it becomes a great tool for infrastructure discovery, compliance reporting, vulnerability assessment, and more
You perform all of these tasks with PuppetDB queries
#### facter - client tool
Puppet’s inventory tool
gathers facts about an agent node such as its hostname, IP address, and operating system
The agent sends these facts to the primary server in the form of a special Puppet code file called a manifest
This is the information the primary server uses to compile a catalog — a JSON document describing the desired state of a specific agent node
Each agent requests and receives its own individual catalog and then enforces that desired state on the node it's running on
In this way, Puppet applies changes all across your infrastructure, ensuring that each node matches the state you defined with your Puppet code
The agent sends a report back to the primary server
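On the primary server, the mapping from nodes to code is typically written in the site manifest; a minimal hedged sketch (the hostname and the nginx module are hypothetical):
```puppet=
# site.pp on the primary server
node 'web01.example.com' {
  include nginx        # assumes an nginx module is available
}

node default {
  # applied to any node without a more specific match
}
```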
### why use puppet
- consistency - configuration management keeps every node in the same defined state, which saves time
- scalability - automation coupled with the pull architecture makes setting up new devices a little more time-consuming, but they can be managed easily afterwards