Project 1: Algorithms

You are working at a company called ‘OurCompany.’ Your supervisor has come to you because he heard you took a class on Reverse Engineering.

Apparently, they found some anomalous outbound traffic on the network late at night. They ran Wireshark to capture the traffic and found TCP connections that contained binary blobs being sent to an IP address in an eastern European country. When they examined the machine where the traffic was coming from, they found a binary running. They have provided you with:

  • A copy of the binary
  • The payloads of a few of the binary blobs they saw
    • By payload, I mean the contents of the socket connection with all the TCP/IP header information stripped (in other words, the data passed to socket APIs such as send() and recv()).
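
To make the idea of a payload concrete, here is a minimal sketch in Python. It is not taken from the binary under analysis: the helper name roundtrip is hypothetical, and a local socket pair stands in for the real TCP connection. The point is only that the bytes handed to send() are exactly the bytes that come back from recv(), with no headers visible at this level.

```python
import socket


def roundtrip(payload: bytes) -> bytes:
    """Send payload over a local socket pair and return what recv() sees.

    Hypothetical helper for illustration; intended for small payloads only,
    since the sender and receiver run in the same thread.
    """
    left, right = socket.socketpair()
    try:
        left.sendall(payload)
        left.shutdown(socket.SHUT_WR)  # signal that no more data will be sent
        chunks = []
        while chunk := right.recv(4096):  # recv() returns b"" once the peer is done
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        left.close()
        right.close()
```

The files pulled from Wireshark hold this same application-level data, so they can be read and analyzed as plain byte strings.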

“TheBoss” wants to know what is going on. What was this program doing? What information is in these binary blobs that appear to be random data with no patterns/signatures?
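
One common way to quantify "appears to be random data" is to measure the Shannon entropy of a blob. The sketch below is only an illustration, not a required method: the function names and the 7.0 bits-per-byte threshold are assumptions, and a blob of N bytes can never exceed log2(N) bits per byte, so very short captures will score low even when encrypted.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: high entropy hints at compressed or encrypted content.

    The threshold is an assumed value, not one given by the project.
    """
    return shannon_entropy(data) >= threshold
```

A high score does not say which algorithm produced the data; it only suggests that some transformation such as compression or encryption is worth looking for in the binary.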

Resources

  • Full Description of the project
  • Download the Project 1 binaries
    • You have been provided with the files ‘binary’, ‘bin1’, ‘bin2’, and ‘bin3.’ The ‘binary’ file is the executable. The other bin files were pulled from Wireshark and are the payloads of the communication that was viewed.
    • Obtain the binaries on the class VM server
      $ mkdir project1
      $ cd project1
      $ cp /class/project1/* ./
      
  • FAQ
    • Q1. Please provide both the library's file name and its project name. For example, for libc.so.6, provide libc.so.6 and Standard C libraries (from https://man7.org/linux/man-pages/man7/libc.7.html).
    • Q2. An Internet address can take two forms: a domain name or an IP address. A domain name is a string (e.g., google.com), while an IP address is a sequence of integers separated by . characters. For example, the domain name localhost refers to the current machine (itself), and its corresponding IP address is 127.0.0.1.
      • Please provide all the addresses that you found, along with port numbers.
    • Q3. What are the inputs used in the identified algorithm?
      • In this sub-question, the inputs essentially mean the configuration of the algorithm (or the arguments to the algorithm's functions).
    • Q5. Are there any signatures you can look for to detect this on other hosts on your network?
      • Assume that there is a malicious activity detector. It can monitor (1) API calls (or system calls) of all the programs and (2) file operations (e.g., creation/deletion of files with target file paths). This is a typical system monitoring agent that most anti-virus services may have.
      • What information would you give to such a system so that it can detect whether this program is actively executing?
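
The monitoring agent described in Q5 can be pictured as matching observed events against a list of indicators. The sketch below assumes a made-up event format (kind, name, target) and uses placeholder indicators; none of these values are findings from the project binary, and a real agent would have its own event schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "api" for an API/system call, "file" for a file operation
    name: str    # e.g. "connect" or "create"
    target: str  # e.g. "192.0.2.10:4444" or a file path


# Hypothetical indicators: placeholders only, not findings from the binary.
# 192.0.2.0/24 is an address range reserved for documentation.
INDICATORS = {
    ("api", "connect", "192.0.2.10:4444"),
    ("file", "create", "/tmp/example.lock"),
}


def matches(events):
    """Return the events that match a known indicator, in the order observed."""
    return [e for e in events if (e.kind, e.name, e.target) in INDICATORS]
```

An answer to Q5 then amounts to filling in such a list with the specific calls, addresses, and file paths you actually observe the program using.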

Project 2: Binary Formats

Your boss has come to you with a new problem. He says there was a very old program that was used to track emails for the help desk. The emails were stored in some kind of password-protected database that was written by an intern many years ago.

The problem is they need to retrieve data from the database, and the original source code is gone. To make matters worse, the passwords used to access the database are gone as well. All that is known is that some kind of JSON-based input is used to fill the database.

“TheBoss” provides you with:

  • A copy of the binary (program)
  • Their existing database (bin.db)

Turn in

  • A written report detailing all findings – Be as complete as possible.
    • Please use screenshots to describe important code sections.
      The code should have variables and functions properly renamed and labeled.
  • A copy of your annotated Ghidra database.
  • A dump of the provided database, with some of the emails extracted.

Resources