# 04.4 Security and DevOps, Lesson 4: Splunk

###### tags: `Udacity`

# 01 Splunk

{%youtube EQg0xJswATI%}

**What is Splunk?**

[Splunk](https://www.splunk.com/en_us/download.html) is a search engine for IT data, created to analyze machine-generated data. We encourage you to read the [official documentation](https://docs.splunk.com/Documentation) to get a clear picture of Splunk and its capabilities.

**Machine-generated data**

Machine-generated data refers to data produced by application logs, server logs, network devices, sensors, IoT devices, cloud services, or mobile services. The data can be structured or unstructured.

- Structured data has clearly defined data types and a particular pattern, e.g., relational data. It resides in a database, against which we can run queries.
- Unstructured data has no specific pattern, e.g., textual log messages or reviews on an e-commerce platform.

**Challenges without Splunk**

Machine-generated data is challenging to analyze because:

- Its volume is high.
- It is often unstructured, and thus requires pre-processing.

**Benefits**

Splunk extracts information from machine-generated data and provides insights quickly. It scales to very large volumes of input data, and it provides machine learning capabilities for building automated alerting systems. Splunk helps the user leverage machine-generated data for gauging system performance, investigating failure conditions, developing business metrics, visualizing data, and more.

{%youtube sGuMwIiI7lI%}

**Why should we use Splunk?**

{%youtube T4n5hi_pCWA%}

# 02 Indexing in Splunk

{%youtube 0mctvNViWoE%}

**Why do we need indexing?**

The index reduces search and analysis time. When Splunk indexes raw data, it transforms the data into searchable events (flat files plus metadata). Indexes reside in flat files called "tsidx files", and these tsidx files point back to the raw data.
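Because tsidx files already contain indexed fields, some questions can be answered from the index alone, without reading the raw data. For example, the `tstats` command operates directly on indexed data (the index name `main` here is just an assumption for illustration):

```spl
| tstats count WHERE index=main BY sourcetype
```

This returns an event count per sourcetype straight from the tsidx files, which is why it is typically much faster than the equivalent raw-data search `index=main | stats count by sourcetype`.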
**How does indexing work in Splunk?**

Indexes are collections of searchable events. Events are records of activity that reside in log files and metadata. During indexing, Splunk Enterprise performs event processing: it configures character set encoding, extracts and stores timestamps, extracts standard fields, segments events, and adds metadata.

**Types of indexes**

There are two types of indexes:

- Events indexes - impose minimal structure and can accommodate any type of data, including metrics data. Events indexes are the default index type.
- Metrics indexes - use a highly structured format to handle the higher-volume, lower-latency demands associated with metrics data.

# 03 Deployment Models

**Deployment models**

Before we learn about deployment models, it is necessary to understand the Splunk components, which vary with the deployment model.

**Splunk Components**

Components are the different types of [Splunk Enterprise instances](https://docs.splunk.com/Splexicon:Instance). Each instance (component) has its own specific role. Broadly, there are two categories: "Processing" components and "Managing" components.

- The forwarders, indexers, and search heads are Processing components.
- The deployment server and monitoring console are examples of Managing components.

At this point, it is essential to introduce the concept of the "pipeline" in Splunk. Raw data enters the pipeline and gets processed by the Processing components. The pipeline is segmented into smaller parts, and each part is mapped to one or more Processing components. The segments of the pipeline are data input, parsing, indexing, and searching.
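As a concrete example of the first hop in this pipeline, a forwarder is typically pointed at an indexer through its `outputs.conf`. The group name, hostname, and port below are illustrative assumptions (9997 is the conventional Splunk receiving port):

```ini
# outputs.conf on the forwarder (illustrative values)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer01.example.com:9997
```

With this in place, the forwarder handles the "data input" segment and ships raw data to the indexer, which performs parsing and indexing.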
The following is the mapping between pipeline segments and processing components:

![](https://i.imgur.com/W8YkVMv.png)

> Notes compiled by 璇璇曉

- Forwarder: collects raw data from applications.
  - There are three types of forwarders:
    - Universal: forwards only the necessary data.
    - Heavy: a full Splunk Enterprise instance (hence "heavy") that can index, search, and change data as well as forward it; some features are disabled to reduce system resource usage.
    - Light: also a full instance with some features disabled. It was deprecated as of version 6, because the universal forwarder can accomplish the same purpose and is a better tool for sending data to indexers. Reference: https://docs.splunk.com/Documentation/Splunk/8.0.1/Forwarding/Typesofforwarders
- Indexer: converts the forwarder's raw data into events, indexes it into a bucket, and tags it so that the search head can easily find it with queries.
- Search head: responds to search requests; indexed data is pulled by the search head when queried from the various graphical user interfaces.

**Recommended read**

Read more about [Components of a Splunk Enterprise deployment](https://docs.splunk.com/Documentation/Splunk/8.0.1/Capacity/ComponentsofaSplunkEnterprisedeployment). In addition, we recommend reading more about [Forwarders](https://docs.splunk.com/Documentation/Splunk/8.0.1/Forwarding/Aboutforwardingandreceivingdata) and [Indexers](https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Aboutindexesandindexers).

**Deployment models**

{%youtube 4k0eTUaNh10%}

**Deployment Topology**

There can be different deployment topologies based on the requirements of the project. The following are examples:

- Single host - independent search heads manage searches for a group of independent indexers.
- High availability - a group of indexers replicate data among themselves to ensure high data availability.
- Multiple data store clustering - a group of search heads share search management responsibilities.
- Multiple datastore peering - common in larger deployments.
It is similar to the high-availability pattern, except that the search management function is handled by a search head cluster instead of individual search heads.

You can find the Splunk deployment guide [here](https://www.splunk.com/themes/splunk_com/img/assets/pdfs/education/SplunkDeploymentGuide2_1.pdf).

# 04 Installation

**Installation**

In this section, we'll go over how to install Splunk on your local system, on AWS, and on Microsoft Azure.

**Licensing**

{%youtube FjlyhX5Wb4w%}

**Local install**

{%youtube yqo-fZZkwY0%}

**Running Splunk on AWS**

{%youtube bfGFTV5Vzeo%}

**Running Splunk on Microsoft Azure**

{%youtube ezUQvkGwUfk%}

**Installing Splunk on Microsoft Azure**

You can find the guide to deploying Splunk on Microsoft Azure [here](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/splunk.splunk-enterprise), and [this](https://www.splunk.com/pdfs/technical-briefs/deploying-splunk-enterprise-on-microsoft-azure.pdf) document outlines the necessary components. In general, the steps to install Splunk on Microsoft Azure are:

1. Go to https://azure.microsoft.com/en-us/
2. Create a resource, or go to https://portal.azure.com/
3. Search for `splunk enterprise`
4. Complete the subscription
5. Create a new resource → VS professional password
6. Set up your vNet
7. Do a single-node deployment
8. Set up the DNS
9. Click the resource group
10. Standalone VM → DNS name
11. Copy the DNS name into the browser, prefixed with https://

# 05 Adding Data to Splunk

**Adding Data to Splunk**

{%youtube CLVMFCn8t68%}

The dataset used in the above video can be obtained from the TransStats site [here](https://www.transtats.bts.gov/DatabaseInfo.asp?DB_ID=505&Link=0). (US Department of Transportation, Bureau of Transportation Statistics, 1995 American Travel Survey. URL: https://nhts.ornl.gov.)
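Besides uploading a file through Splunk Web as shown in the video, a file can also be ingested with a monitor input in `inputs.conf`. The path, index, and sourcetype below are assumptions for illustration only:

```ini
# inputs.conf (illustrative values)
[monitor:///var/data/transtats/flights.csv]
index = main
sourcetype = csv
disabled = false
```

Splunk will monitor the path continuously and index new data as it appears, which is the usual choice for files that keep growing, such as logs.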
You can download the exact dataset by selecting the same options as the Instructor does in the video, or, if you prefer, you can experiment with different options to get a variation on the data. This is just for purposes of demonstration and experimentation, so getting the exact data that the Instructor uses in the video is not essential.

{%youtube Q7ks1vA6FqU%}

**Further reading**

- [Further reading on how to add data to Splunk](https://docs.splunk.com/Documentation/Splunk/7.3.0/Data/Howdoyouwanttoadddata)
- [Splunk docs on Metadata](https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Metadata)

# 06 SPL Commands

**SPL Commands**

Splunk uses the Search Processing Language, or SPL. Let's have a look at some useful SPL commands.

{%youtube SceItRuqsr8%}

You can also refer to the Splunk [Quick-Reference Guide](https://www.splunk.com/pdfs/solution-guides/splunk-quick-reference-guide.pdf).

## Note

> A clear explanation of all the commands can be found in [Part 4: SPL, the Search Processing Language](https://docsplayer.com/23164722-%E6%8E%A2-%E7%B4%A2-splunk-%E6%90%9C-%E5%B0%8B-%E8%99%95-%E7%90%86-%E8%AA%9E-%E8%A8%80-spl-%E5%85%A5-%E9%96%80-%E5%92%8C-%E6%93%8D-%E4%BD%9C-%E6%89%8B-%E5%86%8A-splunk-%E9%A6%96-%E5%B8%AD-%E7%AD%96-%E5%8A%83-david-carasso-%E8%91%97.html).

- `dedup` usage: removes events with duplicate values in the specified field. In the example below, source_A has five events and source_B has four; to see at most three events per source value instead, use `| dedup 3 source`. ![](https://i.imgur.com/ytH2yBC.png)
- `stats` usage: the stats command computes statistics over the search results and writes them into fields you name. For example, to add one field counting GET requests and another counting POST requests per host, use `stats count(eval(method="GET")) as GET_count, count(eval(method="POST")) as POST_count by host`. It is commonly combined with functions such as max, avg, and min. ![](https://i.imgur.com/4IoOk2U.png)
- ![](https://i.imgur.com/2cFw8zx.png)

# 07 Forwarders

**Forwarders**

Splunk forwarders consume data and send it to an indexer.

{%youtube cdAc0l47yzI%}

**Types of Forwarders**

Universal, Light, and Heavy are the three types of forwarders.
These forwarders forward data from one Splunk Enterprise instance (data input) to another Splunk Enterprise instance (an indexer or another forwarder), or even to a non-Splunk system. The three types differ in their internal workings, such as event filtering, event routing, and footprint (memory and CPU load).

- Universal forwarders provide reliable, secure data collection from remote sources and forward that data into Splunk software for indexing and consolidation. They can scale to tens of thousands of remote systems, collecting terabytes of data.
- Light forwarders are deprecated. They consume little CPU and memory, but have much more limited functionality.
- Heavy forwarders consume the most CPU and memory of the three, though still less than an indexer. They offer most Splunk Enterprise features, except that they cannot perform distributed searches.

Refer to [Types of forwarders](https://docs.splunk.com/Documentation/Splunk/8.0.1/Forwarding/Typesofforwarders) for a comparative study of forwarders.

# 08 Introduction to Visualization

**Introduction to Visualization and More**

**Part 1**

{%youtube 7dNl6rxzeO8%}

**Part 2**

{%youtube nRb08QoQhro%}

**Visualization practice**

If you haven't already, we encourage you to follow along with the examples shown above and try Splunk out for yourself. Below are the general tasks you should try in Splunk at least once. In addition to the above walkthroughs, you may also find it helpful to check out the [Splunk visualization reference](https://docs.splunk.com/Documentation/Splunk/7.3.0/Viz/Visualizationreference).

- [ ] Generate a table of your choice
- [ ] Generate a chart of your choice
- [ ] Create a custom visualization of your choice

**Conclusion**

We have learned the following in this lesson:

- Install Splunk
- Use Splunk to add raw data and run SPL commands
- Use Splunk to generate dashboards and gain insights with the visualizer