There is no commentSelect some text and then click Comment, or simply add a comment to this page from below to start a discussion.
Privacy-preserving & secure web-search
Imagine being able to do the following: You want to perform a search in your browser. But you don't want the browser itself to know anything about your query (essentially, what you're looking for). And you still want the browser to be able to return you an encrypted list of pages that only you can decrypt. To later visit without anyone knowing you even searched for them.
This is the core idea and motivation behind this post.
Image Not ShowingPossible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
This idea was given to me by Matan Prasma when discussing about FHE usecases. He told me that he explained that to his alumni sometimes. And I thought the proposal was really interesting. Hence, all the contents of this doc are simply an elaboration and extension of the main idea that Matan gave me/told me about.
Basic Web-Rank Algorithm: PageRank
PageRank, developed by Larry Page and Sergey Brin, ranks web pages by considering the link structure of the web. The idea is that a page is more important if it is linked to by other important pages. The rank (or score) of each web page depends on the ranks of other pages linking to it.
Steps in PageRank:
Initialization: Each page is assigned an initial rank, often equally distributed if there are pages, so each page starts with rank.
Iteration: For each page, the rank is updated based on the ranks of the pages linking to it. If page links to page , then page passes a portion of its rank to .
Damping Factor: To simulate random web surfing, a damping factor (usually around 0.85) is introduced. The idea is that a user has an 85% chance of following links and a 15% chance of jumping to any random page.
Convergence: This iterative process continues until the rank values stabilize across pages.
Mathematically, the rank of a page is given by: where:
is the damping factor,
represents pages that link to page ,
is the number of outgoing links on page ,
is the rank of page .
The PageRank values can be computed iteratively until the difference in rank values between iterations is minimal (convergence).
2. Arithmetizing PageRank
To represent this algorithm algebraically and in a form that a computer can efficiently solve, we can use linear algebra. The PageRank computation can be formulated as a matrix-vector multiplication problem.
Algebraic Formulation:
Matrix Representation: The web graph is represented as an adjacency matrix, where if page links to page , and otherwise.
Transition Matrix: Define a transition matrix incorporating the damping factor: where:
is a vector of ones (for uniform random surfing),
is the damping factor,
is the total number of pages.
Power Iteration: The PageRank vector is then an eigenvector of , satisfying: This equation can be solved using power iteration: starting with an initial guess for (e.g., all equal values), and iterating until convergence.
3. Diagrams for Intuition
Basic PageRank Flow: This diagram shows a simple directed graph where nodes represent web pages, and directed edges represent hyperlinks. This illustrates how PageRank propagates through links in the network.
Image Not ShowingPossible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
Transition Matrix Representation: This matrix visualizes the link structure, where each entry represents the probability of moving from one page to another (based on outlinks). The color intensity helps indicate higher values for better intuition.
Image Not ShowingPossible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
Power Iteration Process: This line chart shows the iterative updating of PageRank values for each page over successive iterations. It demonstrates how ranks stabilize, reaching a steady state, representing the algorithm’s convergence.
Image Not ShowingPossible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
These diagrams provide a foundational visualization of PageRank's mechanics and convergence process.
Privacy-preserving search and secure browsing.
The goal is to allow a browser to conduct a search without ever learning the search query itself, and then to retrieve a set of search results that remain encrypted such that only you, the user, can decrypt them. Here’s a list of the requirements and intuitions behind that.
Private Information Retrieval (PIR)
Private Information Retrieval (PIR) allows a user to query a database (or search index) without revealing to the server which item was requested. This can be combined with other cryptographic methods to return results to the user without revealing the query content to the server.
The browser can leverage PIR to submit the query to the search engine, allowing the server to compute search results based on the encrypted query. The browser itself, however, would not be able to interpret the content of the query, thanks to the encryption.
Encrypted Search Query Using Fully Homomorphic Encryption (FHE)
Query Submission: Using FHE, you could encrypt the search query before it’s sent to the server. FHE allows the search engine to perform computations on the encrypted query without decrypting it.
Notice that this is where we could execute the search against an encrypted KV storage, result of running the aformentioned PageRank algorithm or similar algorithms.
Processing on Encrypted Data: The search engine can compute matches for the query terms, rank the results, and package an encrypted response list. This allows the server to operate on your query and return relevant search results without ever seeing the unencrypted content.
Onion Routing for Visiting Pages Privately
To visit these pages without the browser or third parties knowing, you can use Onion Routing (e.g., the Tor network). Onion Routing hides the fact that you’re visiting specific URLs by passing your requests through multiple layers of encrypted, distributed relays.
This ensures that your access to the pages you searched for remains anonymous even if the query and result decryption were already secured.