# Malicious Attachments Detection
# Dataset [Need to get More]
* [Malicious MS Office documents dataset](https://zenodo.org/record/4559436#.Y_1Dy3ZByUl)
# Feature Extraction
> Using the tools below, we can extract many features from each document file and collect them into our own CSV file.
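>
> As a rough sketch of the CSV-building step, the snippet below writes one row per document. The feature names (`has_macros`, `has_external_rels`, `label`) and the helper `write_feature_csv` are made up for illustration; the real columns depend on what the tools report.

```python
import csv
import io


def write_feature_csv(rows, out):
    """Write a list of per-document feature dicts as CSV to a file-like object."""
    # Use the union of all keys so documents with different features still fit;
    # sort for a stable column order and keep "filename" first.
    columns = sorted({key for row in rows for key in row} - {"filename"})
    writer = csv.DictWriter(out, fieldnames=["filename"] + columns, restval=0)
    writer.writeheader()
    writer.writerows(rows)  # features missing from a row are written as 0


# Hypothetical feature values, for illustration only.
rows = [
    {"filename": "a.doc", "has_macros": 1, "label": 1},
    {"filename": "b.docx", "has_macros": 0, "has_external_rels": 1, "label": 0},
]
buffer = io.StringIO()
write_feature_csv(rows, buffer)
print(buffer.getvalue())
```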
## Word, Excel, PowerPoint
>>* **oleid** is a script that analyzes OLE files such as MS Office documents
> (e.g. Word, Excel) to detect specific characteristics that could
> indicate that the file is suspicious or malicious (e.g. malware).
> For example, it can detect VBA macros, embedded Flash objects, and
> fragmentation. The results are displayed as an ASCII table (other formats
> such as CSV, XML, or JSON may be supported in the future) with the columns
> **Indicator**, **Value**, **Risk**, and **Description**.
>
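>
> oleid can also be called from Python, which is more convenient than parsing the ASCII table. The sketch below assumes oletools is installed and that each indicator returned by `OleID.check()` exposes `id` and `value` attributes; the helper names and the stand-in indicator values are illustrative only.

```python
from types import SimpleNamespace


def indicators_to_features(indicators):
    """Flatten oleid indicators into a {indicator_id: value} dict."""
    return {ind.id: ind.value for ind in indicators}


def oleid_features(path):
    """Run oleid on a file (requires `pip install oletools`)."""
    # Imported lazily so the helper above can be used without oletools.
    from oletools.oleid import OleID

    return indicators_to_features(OleID(path).check())


# Stand-in objects that mimic oleid indicators (made-up ids and values).
fake_indicators = [
    SimpleNamespace(id="vba", value="Yes"),
    SimpleNamespace(id="encrypted", value=False),
]
print(indicators_to_features(fake_indicators))
```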
>>* **olevba** is a script that parses OLE and OpenXML files such as MS Office documents
> (e.g. Word, Excel) to extract VBA macro code in clear text, and to deobfuscate
> and analyze malicious macros.
> **You can pass different arguments depending on what you need:**
> **-a** -> display only the analysis results, without the macro source code
> 
> **-c** -> display only the VBA source code, without analyzing it
>
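>
> The same two steps (extracting the code and analyzing it) are available through the `VBA_Parser` class in `oletools.olevba`. This is a minimal sketch assuming oletools is installed; `macro_report` and `summarize_analysis` are hypothetical helper names, and the sample findings are invented.

```python
from collections import Counter


def summarize_analysis(results):
    """Count olevba findings per type (e.g. AutoExec, Suspicious, IOC)."""
    # Each result is a (type, keyword, description) tuple.
    return Counter(kw_type for kw_type, _keyword, _description in results)


def macro_report(path):
    """Extract and analyze VBA macros from one file (requires oletools)."""
    from oletools.olevba import VBA_Parser  # lazy import

    parser = VBA_Parser(path)
    try:
        if not parser.detect_vba_macros():
            return {"has_macros": False, "code": "", "findings": Counter()}
        # extract_macros() yields (filename, stream_path, vba_filename, vba_code).
        code = "\n".join(vba for _f, _s, _v, vba in parser.extract_macros())
        findings = summarize_analysis(parser.analyze_macros())
        return {"has_macros": True, "code": code, "findings": findings}
    finally:
        parser.close()


# Invented findings, shaped like the tuples olevba returns.
sample = [
    ("AutoExec", "AutoOpen", "Runs when the document is opened"),
    ("Suspicious", "Shell", "May run an executable file"),
    ("Suspicious", "Chr", "May attempt to obfuscate strings"),
]
print(summarize_analysis(sample))
```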
>>* oledump
>
>>* **oleobj** is a script to extract embedded objects from OLE files.
> 
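>
> For OpenXML files (.docx, .xlsx, .pptx), links to outside resources are declared in the package's `.rels` parts with `TargetMode="External"`. The sketch below is a standard-library illustration of that check, not how oleobj is implemented; `external_relationships` is a hypothetical helper, and the sample archive is built in memory with a made-up URL.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def external_relationships(path_or_file):
    """Return (part, type, target) for every external relationship in a package."""
    found = []
    with zipfile.ZipFile(path_or_file) as package:
        for name in package.namelist():
            if not name.endswith(".rels"):
                continue
            # Note: for untrusted files a hardened XML parser is safer.
            root = ET.fromstring(package.read(name))
            for rel in root.iter(REL_TAG):
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Type"), rel.get("Target")))
    return found


# Build a tiny fake package in memory (illustrative content only).
rels_xml = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="t/attachedTemplate" '
    'Target="http://example.com/t.dotm" TargetMode="External"/>'
    '<Relationship Id="rId2" Type="t/styles" Target="styles.xml"/>'
    "</Relationships>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("word/_rels/document.xml.rels", rels_xml)
sample.seek(0)
print(external_relationships(sample))
```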
## PDF [under development]
# Data you can extract
>* **oleid** can indicate whether a document is likely malicious (we can't rely on this result alone)
>* You can extract the **macro code** stored in the document
# Keep In Mind
> What if the VBA code is obfuscated?
> What if the tools show no suspicious messages?
> What if external relationships are found in the document?
>> We need to use **oleobj**.
>**Tools** can easily return false results.
> Can we use a CMD watcher?
> What if the document is in RTF (Rich Text Format)?
>> We can use **rtfobj**.
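>
> Because file extensions can be wrong, one way to decide which tool to run is to look at the first bytes of the file. The sketch below uses well-known file signatures; the function name `sniff_format` and the returned labels are assumptions made for this example.

```python
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"  # OpenXML (.docx/.xlsx/.pptx) files are ZIP archives


def sniff_format(header):
    """Guess the container format from the first bytes of a file."""
    if header.startswith(OLE_MAGIC):
        return "ole"  # candidates: oleid, olevba, oleobj
    if header.startswith(ZIP_MAGIC):
        return "zip"  # possibly OpenXML; candidates: olevba, oleobj
    if header.lstrip().startswith(b"{\\rt"):
        return "rtf"  # candidate: rtfobj
    if b"%PDF-" in header[:1024]:
        return "pdf"
    return "unknown"


def sniff_file(path):
    """Read the start of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(1024))


print(sniff_format(b"{\\rtf1\\ansi"))
```

> The RTF check looks for `{\rt` rather than the full `{\rtf1` because malformed headers are common in malicious samples; treat that as a heuristic, not a rule.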