# Malicious URL classification

# url representation learning

### unsupervised learning sequence representations

* Hsu, Wei-Ning, Yu Zhang, and James Glass. "Unsupervised learning of disentangled and interpretable representations from sequential data." Advances in neural information processing systems. 2017.
* Pei, Wenjie, and David MJ Tax. "Unsupervised Learning of Sequence Representations by Autoencoders." arXiv preprint arXiv:1804.00946 (2018).
* Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." European Conference on Computer Vision. Springer, Cham, 2016.
* Denton, Emily L. "Unsupervised learning of disentangled representations from video." Advances in neural information processing systems. 2017.
* Chung, Yu-An, et al. "Audio word2vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder." arXiv preprint arXiv:1603.00982 (2016).
* Lee, Hsin-Ying, et al. "Unsupervised representation learning by sorting sequences." Proceedings of the IEEE International Conference on Computer Vision. 2017.

### unsupervised learning sentence representations

* Pagliardini, Matteo, Prakhar Gupta, and Martin Jaggi. "Unsupervised learning of sentence embeddings using compositional n-gram features." arXiv preprint arXiv:1703.02507 (2017).
* Logeswaran, Lajanugen, and Honglak Lee. "An efficient framework for learning sentence representations." arXiv preprint arXiv:1803.02893 (2018).
* Hill, Felix, Kyunghyun Cho, and Anna Korhonen. "Learning distributed representations of sentences from unlabelled data." arXiv preprint arXiv:1602.03483 (2016).

# classification

### Unsupervised

* anomaly detection
  * Tang, Adrian, Simha Sethumadhavan, and Salvatore J. Stolfo. "Unsupervised anomaly-based malware detection using hardware features." International Workshop on Recent Advances in Intrusion Detection. Springer, Cham, 2014.
  * Zhang, Jiong, and Mohammad Zulkernine. "Anomaly based network intrusion detection with unsupervised outlier detection." 2006 IEEE International Conference on Communications. Vol. 5. IEEE, 2006.
* one-class classification
  * Amer, Mennatallah, Markus Goldstein, and Slim Abdennadher. "Enhancing one-class support vector machines for unsupervised anomaly detection." Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description. ACM, 2013.
* clustering
  * Leung, Kingsly, and Christopher Leckie. "Unsupervised anomaly detection in network intrusion detection using clusters." Proceedings of the Twenty-eighth Australasian conference on Computer Science-Volume 38. Australian Computer Society, Inc., 2005.
* unsupervised sequence classification
  * Tomović, Andrija, Predrag Janičić, and Vlado Kešelj. "n-Gram-based classification and unsupervised hierarchical clustering of genome sequences." Computer methods and programs in biomedicine 81.2 (2006): 137-153.

### Semi-supervised

* Dai, Andrew M., and Quoc V. Le. "Semi-supervised sequence learning." Advances in neural information processing systems. 2015.

### supervised text classification task

* Le, Hung, et al. "URLNet: learning a URL representation with deep learning for malicious URL detection." arXiv preprint arXiv:1802.03162 (2018).
  * [github](https://github.com/Antimalweb/URLNet)
* Cer, Daniel, et al. "Universal sentence encoder." arXiv preprint arXiv:1803.11175 (2018).
  * [github](https://github.com/tensorflow/tfjs-models/tree/master/universal-sentence-encoder)
* Yu, Adams Wei, et al. "Qanet: Combining local convolution with global self-attention for reading comprehension." arXiv preprint arXiv:1804.09541 (2018).
  * [github](https://github.com/BangLiu/QANet-PyTorch)
* transformer (BERT, XLNet)
  * [github](https://github.com/huggingface/pytorch-transformers)
* Graph Convolutional Networks for Text Classification

# information gathering

* lexical
* whois
* HTML view
* web page content
* Host based
* other

# public dataset

### [ISCX-URL-2016](https://www.unb.ca/cic/datasets/url-2016.html)

### Kaggle

1. https://www.kaggle.com/antonyj453/urldataset
2. https://www.kaggle.com/aktank/url-detection
3. https://www.kaggle.com/deepak730/finding-malicious-url-through-url-features

### Phishing URLs

1. PhishTank - https://www.phishtank.com/developer_info.php
2. OpenPhish - https://openphish.com/

### Spam URLs

1. JWSPAMSPY - http://www.joewein.de/sw/blacklist.htm

### Malware URLs

1. DNS-BH - http://www.malwaredomains.com/wordpress/?page_id=66
2. https://www.malwarepatrol.net/my-account/
3. http://www.malwaredomainlist.com/

### Benign URLs

1. Majestic - https://majestic.com/reports/majestic-million

### Another Source

1. https://zeltser.com/malicious-ip-blocklists/
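
# lexical feature sketch

The `lexical` item under information gathering refers to features computed from the URL string alone, with no network lookups. As a minimal sketch, assuming Python and only the standard library, a few such features might be computed as below. The function names `shannon_entropy` and `lexical_features` and the particular feature set are illustrative assumptions, not taken from any of the papers or datasets listed in this note.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(url: str) -> dict:
    """Hypothetical feature set derived only from the URL string."""
    raw = url
    # urlsplit only recognises the host when a scheme is present,
    # so prepend one for bare strings such as "example.com/login".
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(raw),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in raw),
        "num_query_params": len([p for p in parts.query.split("&") if p]),
        # Dotted-quad hosts are a common (assumed) signal in URL datasets.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in raw,
        "host_entropy": shannon_entropy(host),
    }


if __name__ == "__main__":
    for name, value in lexical_features("http://192.168.0.1/login.php?id=1&x=2").items():
        print(f"{name}: {value}")
```

The resulting dictionary can be turned into a feature vector for any of the classifiers referenced above. WHOIS, HTML, and host-based features would need external lookups and are not covered by this sketch.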
