Information Security Report

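---
title: Information Security Report
tags: Information Security, CSE480, NSYSU
---

# Information Security

:::info
Group members:
B083040008 黃宸洋
B083040012 陳柏翰
B084020005 蔡昀燁
:::

# Written Report

## Abstract

For most people, information security brings to mind cryptography: encryption and decryption, typically built on symmetric and asymmetric algorithms. AES (Advanced Encryption Standard) is the representative symmetric cipher, and the RSA public/private-key system is the classic asymmetric example. Information security also covers related techniques that are mentioned less often, such as the focus of this report: steganography.

Steganography is the art and science of information hiding, where hiding means that no one other than the intended recipient learns that a message was sent or what it contains. A steganographic message usually looks like something else: a supermarket shopping list, a short article, a picture, or some other cover message. The principle is to split the data into many fragments according to some rule and bury those fragments in a large amount of meaningless data. Typically the message is first encrypted by conventional means, and a covertext is then modified so that it contains the encrypted message, producing the stegotext. Only someone who knows how the meaningful and meaningless data are arranged can filter out the payload and then decrypt it.

Traditional steganography takes several forms: acrostic poems that hide a message in the first or last character of each line, ink that is invisible at room temperature and appears only when heated, and letters whose content is punched as holes and can be read only against a light source. Modern digital steganography includes digital watermarking and computer fonts designed around steganographic principles.

The main difference between steganography and ordinary encryption is where the emphasis lies. Steganography concentrates on concealing the secret file, so once the hiding is detected, the secret is very easy to extract. Ordinary encryption makes no attempt to conceal the file and instead strengthens resistance to attacks on the algorithm. In addition, steganography needs a large amount of noise data to hide the payload, so the useful data a stegotext can actually carry is small, and a lot of traffic is needed to transmit the intended message in full. This report discusses several steganographic methods in turn and ends with a scheme that combines a convolutional neural network with elliptic curve cryptography (ECC) for stronger data protection.

## Applications of Steganography

Today steganography is mainly used to place additional information inside the original data. A copyright notice, for example, hides copyright information in a piece of multimedia; if the data is used illegally, the embedded information identifies the owner. Such information cannot be removed without damaging the original image, which gives creators strong copyright protection. Another common application is the hospital image database: to prevent patient data from leaking, the patient record for each image is hidden inside the image itself, which both protects personal data and links the multimedia to its text record. Steganography also appears in everyday settings, such as product catalogs that carry QR codes or barcodes invisible to the human eye, so that customers can retrieve extra product information with a phone camera.

## Traditional Steganography Methods

### Hiding in the Image Format

Before turning to steganography proper, we discuss two approaches that are easy to implement. The first uses the end-of-file (EOF) marker of the image format and appends the hidden data directly after the normal end of the image. The second hides the data in the photo's Exchangeable Image File Format (EXIF) fields, a format designed for digital cameras that attaches extra information to a photo, such as camera details and shooting parameters.

Neither approach changes how the carrier image is displayed, which is a real advantage, but both perform poorly in terms of data size and ease of detection. On size: if the hidden data is too large, an analysis of storage usage easily reveals either technique. On detection: an attacker can inspect these locations with a text editor or with software that opens the photo. Both approaches are therefore unsuitable for steganography.

### Spatial-Domain Methods

In image steganography, a common approach is to place the hidden information in the spatial domain, using the 8-bit binary value of each pixel. Bits in a chosen range of the original data are replaced with bits of the hidden data. This can be understood as a form of lossy compression: the original data is compressed to free space for the hidden data. Because the essence of steganography is to avoid revealing that anything is hidden, making this lossy compression nearly "lossless" is the main research direction, and the main bottleneck, of spatial-domain methods and of traditional steganography in general.

The figure below shows an example. To hide the red image inside the yellow image, a common approach replaces the low-order bits of the yellow image with the high-order bits of the red image. To extract the red image, the low-order bits of the yellow image are read out and placed into another image, which restores the red image. Both images inevitably lose bits in this process, but whether the human eye notices depends on the significance of the bits that were replaced. In other words, if high-order bits of the yellow image are replaced, or if the affected pixels lie on edges or in other regions where the eye is more sensitive, the resulting stego-image may be noticed.

![](https://i.imgur.com/4ouFixp.png =500x)

### Frequency-Domain Methods

Frequency-domain methods carry over the substitution idea from the spatial domain. The original data is first converted to the frequency domain with a transform such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or the discrete wavelet transform (DWT), and the secret data is then substituted into the transform result. The result is less visible than with direct spatial-domain manipulation. Frequency-domain data contains both high-frequency and low-frequency components, and replacing the high-frequency part does not seriously affect the overall appearance of the original; only some detail is sacrificed. In this respect frequency-domain methods are better than spatial-domain methods.

One example is JSteg, which builds on the JPEG compression pipeline. Standard JPEG compression divides the image into 8×8 blocks, applies the DCT to each block, compresses the DCT result using a quantization table, and finally compresses the stored representation with Huffman coding. The result is an image that closely resembles the original but uses less storage. JSteg builds on this pipeline and hides data in the original mainly by using a quantization table different from the standard one and by substituting values in the DCT output.

![](https://i.imgur.com/UIBkyss.png)

Methods of this kind work well against human visual inspection: it is hard to tell by eye that an image carries hidden data. They are, however, easy to detect by statistical means. JSteg, for example, can be exposed with a simple chi-square test, so it is not a perfect method either.

## CNN-Based Steganography Methods

The previous sections pointed out two bottlenecks of traditional steganography. First, if the hidden data is to remain unnoticed, the amount that can be hidden is limited by the size of the original data; spatial-domain methods, for instance, are limited by the number of pixels and by the 8 bits per pixel. The second and more important bottleneck is concealment. A stego-image normally cannot be detected by eye, but numerical analysis, such as comparing the pixel-value distributions of the stego-image and the original carrier, readily reveals the hidden data.

With recent advances in computer vision and deep learning, a growing body of literature applies convolutional neural networks (CNNs) to image steganography and reports that, compared with traditional methods, they can hide more content in the same carrier image while being harder to detect. A CNN can hide more content because it transforms conventional pixel-level data into multi-dimensional spatial features, embeds the hidden data in those features, and then transforms the features back into pixels to complete the hiding. The hiding space is therefore no longer limited to a fixed number of pixels and bits, a limit that would otherwise affect how well the data is concealed.

In *StegNet: Mega Image Steganography Capacity with Deep Convolutional Network*, the authors adopt an encode-decode architecture and train two networks with identical architecture at the same time. The encoding network takes the carrier and the hidden data as input and outputs the stego-image; the decoding network takes the stego-image as input and outputs the hidden data. As noted above, steganography can be seen as lossy compression that aims to be as lossless as possible, so the network in effect performs a sequence of compression steps while trying to preserve the data. This idea is reflected in the training objective: the stego-image must preserve the original carrier, and the decoded image must restore the original secret as closely as possible. The authors therefore combine the carrier's and the hidden data's differences from their originals into the feedback signal for the whole network. The figure below is the simplified architecture diagram from the paper.

![](https://i.imgur.com/ptldh9G.png =600x)

## Combining CNN-Based Steganography with Cryptography

A similar idea appears in *A New High Capacity Image Steganography Method Combined With Image Elliptic Curve Cryptography and Deep Neural Network*, which uses a hiding network and a revealed network for encoding and decoding respectively, with both networks pre-trained and trained simultaneously. What sets this paper apart from other architectures is that it first encrypts the image to be hidden (the secret image) with ECC (Elliptic Curve Cryptography). This has two benefits: (1) it prevents a person from reading the secret image by eye, which is checked with a human visual system (HVS) test; (2) if the hiding network is broken, the ECC encryption still keeps the content from being exposed.

Under this framework the process has three stages: a preprocessing stage, a steganographic phase, and an extraction phase. The preprocessing stage standardizes and normalizes the images to simplify later steps. In the steganographic phase, the hiding network combines the ECC-encrypted image with the host image that will carry the hidden data to produce the container image. In the extraction phase, the revealed network recovers the encrypted image from the container image, and ECC decryption, using the sender's public key, yields the original image. The figure below shows the overall framework:

![](https://i.imgur.com/ArNHdcd.png)

In the experimental results, hiding quality improves with the number of training epochs. In the figure below, the rows from top to bottom are the host images, container images, secret images, and revealed images. At the start (first column from the left) the results are very poor, and the original host and secret images are barely recognizable. After 50 epochs of training (second column from the left) the effect is pronounced: host and container images, and secret and revealed images, are essentially indistinguishable. This indicates that CNN training does improve both the capacity and the quality of hiding.

![](https://i.imgur.com/1tzzOpU.png)

Hiding must also account for distortion between the images before and after embedding. In the RGB spectrum analysis shown below, the cover image (left) and the secret image (right) show essentially no significant difference, so the distortion is very small.

![](https://i.imgur.com/vh8E0qV.png)

Besides color images, the method was also tried on grayscale images. As the figure below shows, the hiding network that combines a CNN with ECC performs well on grayscale photos too.

![](https://i.imgur.com/VF2spxh.png)

## Conclusion

As a branch of digital data protection, steganography has many applications, including copyright control, embedding confidential information, and hidden product information. Its methods have moved from direct hiding, to substitution in the spatial and frequency domains, to the recent dominance of CNN-based hiding. The development has been gradual and tied to the technology of the time. The CNN-based methods discussed last use the properties of neural networks to hide the secret data in a higher-dimensional space while also increasing the payload the carrier can hold, which achieves several goals at once. The results of the papers discussed here show that applying steganography and cryptography together gives the hidden content both invisibility and strong resistance to cracking, so the data is more fully protected.

# Slides

## Categories of the Digital Data Security

![](https://i.imgur.com/LlwsKoJ.png =500x)

## Application

* copyright control
* enhancing robustness of image search engines
* video-audio synchronization
* analyzing network traffic via an ID embedded in TCP/IP packets
* keeping patients' information confidential by embedding it in medical images
* encoding data into a printed picture that can be decoded by a phone camera (food wrapper, magazine)

## Method

### Exploiting the Image Format

* append after the EOF of the image
* hide in the image's extended file information ( EXIF )
* easy to implement, easy to discover

### In Image Spatial Domain (LSB Method)

* modifies the secret data and cover medium in the **spatial domain**
* ex: replacing the cover image's LSBs with the hidden image's MSBs
    * generates the *stego-image*
* encoding at the level of the LSBs
    * may generate some *distortion* in the cover image when encoding in the **higher-level LSBs**
* another example of the LSB method

![](https://i.imgur.com/SOIcjLT.png =500x)

* different hiding methods for different data formats
* only two bits have to be modified in this grid

![](https://i.imgur.com/4ouFixp.png =500x)

### In Image Frequency Domain

* Some Transformations
    * Discrete Fourier Transform ( DFT )
    * Fast Fourier Transform ( FFT )
    * Discrete Cosine Transform ( DCT )
        * [JPEG compression](http://www.robertstocker.co.uk/jpeg/jpeg_new_1.htm)
    * Discrete Wavelet Transform ( DWT )
* hide the secret in the frequency domain -> high invisibility
* Example : [JSteg](https://github.com/lukechampine/jsteg)
    * ![](https://i.imgur.com/UIBkyss.png)
    * detectable ( $\chi^2$-test ... )

### CNN-Based Steganography - StegNet

* based on [StegNet: Mega Image Steganography Capacity with Deep Convolutional Network](https://arxiv.org/abs/1806.06357)
* unlike the methods above, where the area available for hiding data is highly restricted by the payload capacity (and the rich-content area)
    * CNNs make *multi-level high-order* transformations possible for image steganography
    * higher-level kernels absorb the lower-level features
* what the CNN does
    * learns to compress both cover and secret into an *embedding of high-level features* and convert them into an image
    * most importantly, in the least "lossy" way
* encode & decode process
    * encode
        * transform low-level information (pixels) into high-level feature-wise information (understanding of the figure)
    * decode
        * extract the cover and hidden features, then rebuild the hidden data
* example
    * even after 5 to 10 times magnification, the residual image is still similar to the cover image

![](https://i.imgur.com/0mffbwK.png =500x)

#### Architecture

* [github source code](https://github.com/adamcavendish/Deep-Image-Steganography)

![](https://i.imgur.com/ptldh9G.png =600x)

![](https://i.imgur.com/1jxS22C.png)

* pipeline processing: identical architecture for decoding and encoding, trained at the same time
* skip connections avoid vanishing gradients
* batch normalization and exponential linear units for quicker convergence
* [separable convolution](https://yinguobing.com/separable-convolution/) (cross-channel and spatial correlations are decoupled)
* $loss = \frac{1}{4}(L_{CE}+L_{HD}+Var(L_{CE})+Var(L_{HD}))$

:::info
#### Outline Discussion & Reference

### Image

* [Digital image steganography: Survey and analysis of current methods](https://drive.google.com/file/d/1akzk0lVkL9H1Vatg74tVrZnRKf4b4x4X/view?usp=sharing)
    * survey paper about image steganography
    * we can focus on
        1. Steganography applications
        2. Steganography methods
* [Image Steganography: A Review of the Recent Advances](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9335027)
    * new methods for image steganography, including deep learning
* [StegNet: Mega Image Steganography Capacity with Deep Convolutional Network](https://arxiv.org/abs/1806.06357)
* [A New High Capacity Image Steganography Method Combined With Image Elliptic Curve Cryptography and Deep Neural Network](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8981989)

### Audio

* [Comparative study of digital audio steganography techniques](https://asmp-eurasipjournals.springeropen.com/track/pdf/10.1186/1687-4722-2012-25.pdf)
    * survey paper about audio steganography
    * we can focus on
        1. Review of Audio Steganography Methods - briefly introduces the methods of audio steganography
        2. Applications and trends - how steganography is applied in the real world
* [A Genetic-Algorithm-Based Approach for Audio Steganography](http://irep.iium.edu.my/1143/1/A_Genetic-Algorithm-Based_Approach_for_Audio_Steganography.pdf)
    * cool stuff, bonus
:::

### A New High Capacity Image Steganography Method Combined With Image Elliptic Curve Cryptography and Deep Neural Network

* In an information hiding system, there are three main indicators: security, capacity and imperceptibility. The three constrain one another. For example, as the capacity increases, security and imperceptibility decrease, and vice versa.
* In this framework, the hiding network and the revealed network are trained simultaneously, and both are pre-trained. The proposed image steganography framework has three main stages: preprocessing, the steganographic phase and the extraction phase.
* Compared with RSA (Rivest-Shamir-Adleman), the advantage of ECC is that it can use a shorter key to achieve comparable or even higher security. In other words, the Elliptic Curve Cryptography key is short but strong.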
* Encrypted images are pseudo-random and look like noise. This property preserves the visual quality of the secret image and makes it unrecognizable by the HVS. Therefore, it can be transmitted securely through a common channel.
* It is worth mentioning that the domain conversion of the DCT changes the structure of the secret image. This makes the algorithm more robust to steganalysis attacks.
* module scheme:
    * ![](https://i.imgur.com/ArNHdcd.png)
* differences:
    * ![](https://i.imgur.com/DVE0V3U.png)
* experiment result:
    * ![](https://i.imgur.com/HePRIVj.png)
* Image Encryption via ECC (P* represents the public key, n* represents the private key):
    * 1) Randomly add 1 or 2 to each pixel value of the image to be encrypted, and save the current channel number of the image.
    * 2) Group the pixels and convert each group into a single large integer value. The number of pixel values per group is obtained by grp = length [IntegerDigits [p, 258] −1].
    * 3) Pair up the results obtained in 2) and store them as Pm.
    * 4) Choose a random Key and calculate KeyG and KeyPb, where Pb is the receiver’s public key.
    * 5) Perform point addition of KeyPb with each Pm value and store the result as the ciphertext Pc.
    * 6) Convert the ciphertext list from 5) to values between 0 and 255.
    * 7) Left-pad with zeros every list from 6) that has fewer than grp + 1 elements, so that all lists are equal in length.
    * 8) Flatten the lists from 7), group them according to the recorded number of image channels, and then split them by the width of the plain image.
    * 9) Convert the values from 8) to an encrypted image.
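
Steps 4) and 5) above can be sketched on a toy curve. This is a minimal sketch under stated assumptions, not the paper's implementation: the curve y² = x³ + 2x + 2 over F₁₇ with generator (5, 1) of order 19 is a textbook toy and offers no security, all function names are hypothetical, and the plaintext Pm is assumed to already be a point on the curve (the pixel grouping of steps 1) to 3) is skipped). The decryption step, subtracting n·(KeyG) from Pc, is the usual EC-ElGamal counterpart and is an assumption here, since the slide lists only the encryption side.

```python
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 (illustration only, NOT secure).
P_MOD = 17
A = 2
G = (5, 1)      # generator of a group of prime order 19
ORDER = 19
INF = None      # point at infinity (group identity)


def ec_add(p, q):
    """Add two curve points with the chord-and-tangent rule."""
    if p is INF:
        return q
    if q is INF:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                      # p + (-p) = identity
    if p == q:                          # tangent slope (doubling)
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:                               # chord slope
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_neg(p):
    """Return the additive inverse of a point."""
    return INF if p is INF else (p[0], (-p[1]) % P_MOD)


def ec_mul(k, p):
    """Scalar multiplication k*p by double-and-add."""
    result, addend = INF, p
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result


def encrypt(pm, receiver_pub, key=None):
    """Steps 4-5: pick a random Key, return (Key*G, Pm + Key*Pb)."""
    if key is None:
        key = 1 + secrets.randbelow(ORDER - 1)   # Key in [1, ORDER - 1]
    return ec_mul(key, G), ec_add(pm, ec_mul(key, receiver_pub))


def decrypt(key_g, pc, receiver_priv):
    """Assumed inverse step: Pm = Pc - n*(Key*G)."""
    shared = ec_mul(receiver_priv, key_g)
    return ec_add(pc, ec_neg(shared))


if __name__ == "__main__":
    n = 7                       # receiver's private key (hypothetical value)
    pb = ec_mul(n, G)           # receiver's public key Pb = n*G
    pm = ec_mul(4, G)           # a plaintext point on the curve
    key_g, pc = encrypt(pm, pb)
    print(decrypt(key_g, pc, n) == pm)
```

The sender transmits both `key_g` and `pc`; without the private key `n`, recovering Pm from them requires solving the elliptic-curve discrete logarithm problem, which is what a real-sized curve makes infeasible.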