# Security - Collecting and storing logs

Building the five layers of a modern logging pipeline. Seasoned security teams understand the importance of good logging practices when investigating incidents. If you don’t have the logs, responding to attacks will be difficult.

The five core components of a modern logging pipeline:

+ The *collection layer* produces logs from applications, systems, network equipment, and third parties and forwards those logs to a central location
+ The *streaming layer* centralizes logs into a single pipeline where routing can be handled. It is typically implemented as a message broker, like RabbitMQ or Apache Kafka
+ The *analysis layer* processes logs. It is composed of small programs designed to consume log messages and perform specific work on them
+ The *storage layer* stores log events, keeping recent ones readily available and archiving older ones
+ The *access layer* gives operators an interface to analyze logs from various angles

![](https://i.imgur.com/IoDem0t.png)

## Collecting logs from systems and applications

Most software emits some logs, whether it’s running inside a specialized network device, serving a website on top of a Linux server, or running inside the Linux kernel itself. Finding and collecting those logs is the first challenge we must overcome in building a pipeline. We’ll look at four sources:

+ The systems of a service typically run Linux, and web servers like Apache and NGINX generate a lot of information
+ Collecting logs from applications is a complex and important aspect of building web services
+ Infrastructures also produce logs that carry a lot of interesting security information
+ Finally, we’ll look at ways to capture logs from third-party services, like GitHub

![](https://i.imgur.com/M7YCm9j.png)

### Collecting logs from systems

There are two broad categories of logs you may want to collect from your systems. The first, and most common to Unix-based systems, is *syslog*. The second, more modern and useful for security investigations, is *system call audit logs*.

**syslog `(/var/log)`**: syslog is the standard for Unix system logging implemented by most server software. An application can send messages to a syslog daemon over UDP on port 514 (some syslog daemons support TCP as well). For example, in Go:

```go
package main

import (
	"log"
	"log/syslog"
)

func main() {
	// Connect to the local syslog daemon over UDP, tagging messages
	// with the LOCAL5 facility and the INFO severity level.
	slog, err := syslog.Dial(
		"udp", "localhost:514",
		syslog.LOG_LOCAL5|syslog.LOG_INFO,
		"SecuringDevOpsSyslog")
	if err != nil {
		log.Fatal("error:", err)
	}
	defer slog.Close()

	slog.Alert("This is an alert log")
	slog.Info("This is just an info log")
}
```

On a standard Ubuntu system, which runs the *rsyslog* daemon, the preceding code produces two log messages that are written to `/var/log/syslog` on the local machine.

The syslog format supports two classification parameters, a facility and a severity level:

+ The facility designates the type of application that publishes the log
+ The severity level indicates the importance of the event being logged

On an Ubuntu system, *rsyslog* collects logs into `/var/log`. The following configuration in `/etc/rsyslog.conf` enables collection on UDP port 514:

```
# provides UDP syslog reception
module(load="imudp")
input(type="imudp" port="514")
```

Once logs are captured by the syslog daemon running on each system, forwarding logs to a central location is only a matter of configuration, as the sketch below shows. But the question of which logs should be forwarded remains.
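For instance, a single additional rule in `/etc/rsyslog.conf` forwards every captured event to a central collector. This is a minimal sketch: the hostname `logcollector.example.net` is a placeholder for your own streaming-layer endpoint, and in practice you would restrict the rule to the categories discussed next.

```
# forward all captured logs to a central collector over UDP
# (equivalent to the classic "*.* @logcollector.example.net:514" syntax)
action(type="omfwd"
       target="logcollector.example.net"
       port="514"
       protocol="udp")
```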
The easy answer is to forward all logs to the logging pipeline at first, and gradually filter out logs that aren’t interesting or are too voluminous to be useful. The following categories of logs have proven to be important to investigative efforts:

+ Sessions opened on the system, either via SSH or through a direct console
+ Logs that relate to the main functionality of the system: access logs from Apache or NGINX for web servers, daemon logs from Postfix or Dovecot for mail servers, and so on
+ If the system is running a firewall, such as nftables, make sure to log security-sensitive events and collect those in the pipeline. For example, you could generate logs when connections are dropped by the firewall

### System-call auditing on Linux

On Linux, there’s a way to capture extremely detailed information about a system’s activity: *syscall auditing*. Syscalls are the programmatic interface between the kernel of an operating system and the programs that perform tasks for users. Syscalls are used every time an application or a user interacts with the kernel to open a network connection, execute a command, read a file, and so on. Linux keeps track of which syscalls are executed by which programs and allows auditing tools to retrieve that information. One such auditing tool is called *auditd*.

![](https://i.imgur.com/C2r0dMn.png)

An example of a log event generated by *syscall auditing* and sent to *auditd*:

![](https://i.imgur.com/W7wUJn2.png)

### Application logs

System logs are often limited to what the developers of the systems have considered worth logging, and operators have little room to tailor logs to their own concerns. In the previous section, we discussed how system daemons commonly send their logs to syslog for storage and centralization and hinted at the limitations of syslog. It’s still common for applications to support sending their logs to a syslog destination over UDP, but modern applications increasingly prefer to ignore syslog entirely and write their logs to the standard output channel.

Applications that run inside a Docker container, or that are launched by *systemd*, will have their standard output and standard error automatically captured and written to log journals. *Systemd* gives access to those logs via the `journalctl` command.

The first rule of modern application logging is this: write logs to standard output, and let operators worry about routing them to the right destination.

General rules that facilitate the processing of logs:

+ Publish logs in a structured format: JSON, XML, CSV, and so on
+ Standardize the timestamp format
+ Identify the origin of events by defining mandatory fields: application name, hostname, PID, client public IP, and so on
+ Allow applications to add their own arbitrary data

Developers, operators, and security engineers should work together to define a sensible standard for their organization. The image below shows an example event in the *mozlog* format:

![](https://i.imgur.com/mkBA0lZ.png)

The OWASP organization provides useful resources to decide which application events should be logged for security, such as [The OWASP Logging Cheat Sheet](http://mng.bz/15D3).
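To make those rules concrete, here is a minimal sketch, in Go, of an application writing a structured event to standard output. The field names and sample values (`appname`, `invoicer`, the client IP) are illustrative choices loosely inspired by *mozlog*, not a prescribed schema:

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// logEvent carries the mandatory origin fields plus free-form application data.
// The field names are illustrative, not a standard.
type logEvent struct {
	Timestamp string                 `json:"timestamp"` // standardized as RFC 3339, in UTC
	AppName   string                 `json:"appname"`
	Hostname  string                 `json:"hostname"`
	PID       int                    `json:"pid"`
	Fields    map[string]interface{} `json:"fields"` // arbitrary application data
}

func main() {
	hostname, _ := os.Hostname()
	ev := logEvent{
		Timestamp: time.Now().UTC().Format(time.RFC3339),
		AppName:   "invoicer",
		Hostname:  hostname,
		PID:       os.Getpid(),
		Fields: map[string]interface{}{
			"msg":      "user login succeeded",
			"clientip": "203.0.113.7",
		},
	}
	// Write the event to standard output and let operators route it
	// to the collection layer.
	json.NewEncoder(os.Stdout).Encode(ev)
}
```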
## Infrastructure logging

Capturing log events from systems and applications only works as long as the underlying infrastructure remains secure, so the infrastructure itself must be logged as well. We’ll discuss how to do so with AWS CloudTrail and NetFlow.

### AWS CloudTrail

AWS is a mature platform that provides detailed audit logs on all components of the infrastructure via the CloudTrail service. When enabled, CloudTrail keeps a full history of the account and provides invaluable information to investigate security incidents. The image below shows an example of a CloudTrail log provided by AWS:

![](https://i.imgur.com/N4qCaCU.png)

CloudTrail should be enabled on all AWS accounts. The service can write its audit logs into an S3 bucket, where operators can pick them up and forward them into the logging pipeline. These logs suffer from one limitation: they aren’t real-time. AWS writes CloudTrail logs every 10–15 minutes.

### Network logging with NetFlow

NetFlow is a format used by routers and network devices to log network connections. The amount of information carried by a NetFlow log is limited and only captures the most basic information about a connection:

+ Start time
+ Duration
+ Protocol
+ Source and destination IP addresses and ports
+ Total number of packets
+ Total number of bytes

It’s not a very detailed format, but it provides enough information to find out the origin, destination, and size of a connection.

In AWS, NetFlow logs are collected inside a given VPC. You can configure an entire VPC, a given subnet, or a single network interface to generate NetFlow events, and collect them into the logging pipeline.

> A word of warning though: NetFlow logs get large quickly, and it’s often impractical to turn them on everywhere at once.

## Streaming log events through message brokers

In this section, we’ll focus on the streaming layer of the pipeline and discuss how message brokers can be used to process a large volume of logging information without overwhelming any single component. A message broker is an application that receives messages from publishers and routes them to consumers.

![](https://i.imgur.com/Fvkfh9J.png)

Message brokers are useful for streaming information between logical components and provide a standard interface between layers that may not know about each other. The collection layer only needs to know one thing: where to send the logs. On the other end of the message broker, we have an analysis layer composed of multiple programs that read log events and perform tasks on them.

![](https://i.imgur.com/KD7h3Jd.png)

Different message brokers provide different reliability guarantees:

+ RabbitMQ can guarantee that messages are duplicated on more than one member of the message-broker cluster before acknowledging acceptance
+ Apache Kafka goes further and not only replicates messages but also keeps a history log of messages across cluster nodes for a configurable period
+ AWS Kinesis provides similar capabilities and is entirely operated by AWS

Message brokers also differ in how they route messages to consumers, but most support fan-out and round-robin modes:

+ Round-robin mode sends a copy of a given message to a single consumer within a group
+ Fan-out mode sends a copy of a given message to all consumers subscribed to the topic

## Processing events in log consumers

On the consumer side of the message broker live the log consumers, which compose the third layer of our logging pipeline: the analysis layer.

![](https://i.imgur.com/3wrsXJT.png)

The most basic component of the analysis layer is one that consumes raw events and writes them into a database in the storage layer. A logging pipeline should always retain raw logs for some period of time.
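Here is a minimal sketch, in Go, of such a raw-storage consumer. A buffered channel stands in for the message-broker subscription and a local file stands in for the storage-layer database; a real consumer would use its broker’s client library and a database driver instead:

```go
package main

import (
	"bufio"
	"log"
	"os"
)

// rawStorageConsumer reads raw log events from the broker subscription
// (modeled here as a channel) and appends them, unmodified, to storage
// (modeled here as a local file).
func rawStorageConsumer(events <-chan string, storagePath string) error {
	f, err := os.OpenFile(storagePath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	defer w.Flush()

	for ev := range events {
		// Raw events are stored as-is; other consumers compute metrics
		// or detect anomalies from the same stream.
		if _, err := w.WriteString(ev + "\n"); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Two sample events stand in for messages delivered by the broker.
	events := make(chan string, 2)
	events <- `{"timestamp":"2024-01-01T00:00:00Z","appname":"invoicer","msg":"user login succeeded"}`
	events <- `{"timestamp":"2024-01-01T00:00:05Z","appname":"invoicer","msg":"user login failed"}`
	close(events)

	if err := rawStorageConsumer(events, "raw-events.log"); err != nil {
		log.Fatal(err)
	}
}
```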
Consumers in a logging pipeline are primarily focused on three types of tasks:

+ Log transformation and storage
+ Metrics and statistics computed to give the DevOps team visibility into the health of their services
+ Anomaly detection, which includes detecting attacks and fraud

## Storing and archiving logs

A logging pipeline’s primary function is to collect and store logs from systems, so the storage layer is obviously an important piece of the entire architecture. Truth be told, you should never delete logs unless you absolutely must: storing 10 TB of logs on Amazon Glacier costs less than a hundred dollars a month, an insignificant fraction of any infrastructure budget.

Achieving cheap, efficient, and reliable log storage requires mixing technologies at different times in the life of a log event. For example, the image below shows logs being first written into a database, which is generally considered an expensive storage type, and then exported into an archive.

![](https://i.imgur.com/jslaKgs.png)

## Accessing logs

The final layer of our logging pipeline is the access layer, designed to give operators, developers, and anyone who requires it access to log data.

![](https://i.imgur.com/odiaUOG.png)

###### tags: `security-devops`