# Malicious Attachments Detection

# Dataset [Need to get more]
* [Malicious MS Office documents dataset](https://zenodo.org/record/4559436#.Y_1Dy3ZByUl)

# Feature Extraction
> Using these tools, we can extract many features from a document file and build our own CSV file.

## Word, Excel, ppt
>>* **oleid** is a script to analyze OLE files such as MS Office documents
> (e.g. Word, Excel), to detect specific characteristics that could potentially
> indicate that the file is suspicious or malicious, in terms of security (e.g.
> malware). For example, it can detect VBA macros, embedded Flash objects, and
> fragmentation. The result is displayed as an ASCII table (but could be returned
> or printed in other formats like CSV, XML or JSON in future). (oleid project)
>
> Output columns: **Indicator | Value | Risk | Description**
>
> ![](https://i.imgur.com/ICptwmz.png)
>
>>* **olevba** is a script to parse OLE and OpenXML files such as MS Office documents
> (e.g. Word, Excel), to extract VBA macro code in clear text, and to deobfuscate
> and analyze malicious macros.
>
> **You can pass different arguments depending on what you need:**
>
> **-a** -> analyze the file
>
> ![](https://i.imgur.com/cIdB52c.png)
>
> **-c** -> extract the code from the file without analyzing it
>
>>* oledump
>
>>* **oleobj** is a script to extract embedded objects from OLE files.
>
> ![](https://i.imgur.com/h5eWqhH.png)

## pdf [under development]

# Data you can extract
>* **oleid** can indicate whether a document is malicious or not (we can't rely on this result alone)
>* You can extract the **macro code** stored in the document

# Keep In Mind
> What if the VBA code is obfuscated?

> What if the tools show no suspicious message?

> What if external relationships are found in the document?
>> We need to use **oleobj**.

>**Tools** can easily return false results.

> Can we use a CMD watcher?

> What if the document is in RTF (Rich Text Format)?
>> We can use **rtfobj**.
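
The feature-extraction idea above (turning tool output into our own CSV) could look roughly like the sketch below. It is only a minimal illustration under assumptions: the helper names (`macro_features`, `features_to_csv`, `extract_macro_code`), the column names, and the two short keyword lists are hypothetical choices made for this example, not olevba's full detection lists. The only oletools calls used are `VBA_Parser`, `detect_vba_macros()`, `extract_macros()` and `close()` from olevba's Python API, and they are imported lazily so the rest of the code runs without oletools installed.

```python
import csv
import io
import re

# Illustrative keyword subsets (assumptions for this sketch, not olevba's full lists).
AUTOEXEC_KEYWORDS = ("AutoOpen", "Document_Open", "Workbook_Open", "Auto_Open")
SUSPICIOUS_KEYWORDS = ("Shell", "CreateObject", "URLDownloadToFile", "Environ")


def count_keywords(code, keywords):
    """Count case-insensitive whole-word occurrences of the given keywords."""
    total = 0
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        total += len(re.findall(pattern, code, flags=re.IGNORECASE))
    return total


def macro_features(name, vba_code):
    """Turn one document's macro source into a flat feature row (hypothetical columns)."""
    return {
        "file": name,
        "has_macro": int(bool(vba_code.strip())),
        "code_length": len(vba_code),
        "autoexec_hits": count_keywords(vba_code, AUTOEXEC_KEYWORDS),
        "suspicious_hits": count_keywords(vba_code, SUSPICIOUS_KEYWORDS),
    }


def features_to_csv(rows):
    """Serialize a non-empty list of feature rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def extract_macro_code(path):
    """Collect all VBA source from a document via olevba (needs `pip install oletools`)."""
    from oletools.olevba import VBA_Parser  # lazy import: only needed for real files

    parser = VBA_Parser(path)
    try:
        if not parser.detect_vba_macros():
            return ""
        # extract_macros() yields (filename, stream_path, vba_filename, vba_code)
        return "\n".join(code for (_, _, _, code) in parser.extract_macros())
    finally:
        parser.close()


if __name__ == "__main__":
    # Inline sample macro so the sketch runs without a real document.
    sample = 'Sub AutoOpen()\n    Shell "cmd.exe /c whoami"\nEnd Sub\n'
    print(features_to_csv([macro_features("sample.doc", sample)]))
```

For real files, each row would come from `macro_features(path, extract_macro_code(path))`. Since the tools can return false results and obfuscated VBA may hide these keywords, such counts are best treated as a few features among many rather than as a verdict.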