Try   HackMD

Privacy-preserving & secure web-search

Imagine being able to do the following:
You want to perform a search in your browser. But you don't want the browser itself to know anything about your query (essentially, what you're looking for). And you still want the browser to be able to return you an encrypted list of pages that only you can decrypt. To later visit without anyone knowing you even searched for them.

This is the core idea and motivation behind this post.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

This idea was given to me by Matan Prasma when discussing about FHE usecases. He told me that he explained that to his alumni sometimes. And I thought the proposal was really interesting. Hence, all the contents of this doc are simply an elaboration and extension of the main idea that Matan gave me/told me about.

Basic Web-Rank Algorithm: PageRank

PageRank, developed by Larry Page and Sergey Brin, ranks web pages by considering the link structure of the web. The idea is that a page is more important if it is linked to by other important pages. The rank (or score) of each web page depends on the ranks of other pages linking to it.

Steps in PageRank:

  1. Initialization: Each page is assigned an initial rank, often equally distributed if there are
    N
    pages, so each page starts with
    1N
    rank.
  2. Iteration: For each page, the rank is updated based on the ranks of the pages linking to it. If page
    i
    links to page
    j
    , then page
    i
    passes a portion of its rank to
    j
    .
  3. Damping Factor: To simulate random web surfing, a damping factor
    d
    (usually around 0.85) is introduced. The idea is that a user has an 85% chance of following links and a 15% chance of jumping to any random page.
  4. Convergence: This iterative process continues until the rank values stabilize across pages.

Mathematically, the rank

Ri of a page
i
is given by:
Ri=(1d)+djinlinks(i)Rjoutlinks(j)

where:

  • d
    is the damping factor,
  • inlinks(i)
    represents pages that link to page
    i
    ,
  • outlinks(j)
    is the number of outgoing links on page
    j
    ,
  • Rj
    is the rank of page
    j
    .

The PageRank values can be computed iteratively until the difference in rank values between iterations is minimal (convergence).


2. Arithmetizing PageRank

To represent this algorithm algebraically and in a form that a computer can efficiently solve, we can use linear algebra. The PageRank computation can be formulated as a matrix-vector multiplication problem.

Algebraic Formulation:

  1. Matrix Representation: The web graph is represented as an adjacency matrix

    A, where
    Aij=1outlinks(j)
    if page
    j
    links to page
    i
    , and
    0
    otherwise.

  2. Transition Matrix: Define a transition matrix

    M incorporating the damping factor:
    M=dA+(1d)1N11T

    where:

    • 1
      is a vector of ones (for uniform random surfing),
    • d
      is the damping factor,
    • N
      is the total number of pages.
  3. Power Iteration: The PageRank vector

    R is then an eigenvector of
    M
    , satisfying:
    R=MR

    This equation can be solved using power iteration:
    Rk+1=MRk

    starting with an initial guess for
    R0
    (e.g., all equal values), and iterating until convergence.


3. Diagrams for Intuition

  • Basic PageRank Flow: This diagram shows a simple directed graph where nodes represent web pages, and directed edges represent hyperlinks. This illustrates how PageRank propagates through links in the network.

    Image Not Showing Possible Reasons
    • The image was uploaded to a note which you don't have access to
    • The note which the image was originally uploaded to has been deleted
    Learn More →

  • Transition Matrix Representation: This matrix visualizes the link structure, where each entry represents the probability of moving from one page to another (based on outlinks). The color intensity helps indicate higher values for better intuition.

    Image Not Showing Possible Reasons
    • The image was uploaded to a note which you don't have access to
    • The note which the image was originally uploaded to has been deleted
    Learn More →

  • Power Iteration Process: This line chart shows the iterative updating of PageRank values for each page over successive iterations. It demonstrates how ranks stabilize, reaching a steady state, representing the algorithm’s convergence.

    Image Not Showing Possible Reasons
    • The image was uploaded to a note which you don't have access to
    • The note which the image was originally uploaded to has been deleted
    Learn More →

These diagrams provide a foundational visualization of PageRank's mechanics and convergence process.

Privacy-preserving search and secure browsing.

The goal is to allow a browser to conduct a search without ever learning the search query itself, and then to retrieve a set of search results that remain encrypted such that only you, the user, can decrypt them. Here’s a list of the requirements and intuitions behind that.

Private Information Retrieval (PIR)

  • Private Information Retrieval (PIR) allows a user to query a database (or search index) without revealing to the server which item was requested. This can be combined with other cryptographic methods to return results to the user without revealing the query content to the server.
  • The browser can leverage PIR to submit the query to the search engine, allowing the server to compute search results based on the encrypted query. The browser itself, however, would not be able to interpret the content of the query, thanks to the encryption.

Encrypted Search Query Using Fully Homomorphic Encryption (FHE)

  • Query Submission: Using FHE, you could encrypt the search query before it’s sent to the server. FHE allows the search engine to perform computations on the encrypted query without decrypting it.

Notice that this is where we could execute the search against an encrypted KV storage, result of running the aformentioned PageRank algorithm or similar algorithms.

  • Processing on Encrypted Data: The search engine can compute matches for the query terms, rank the results, and package an encrypted response list. This allows the server to operate on your query and return relevant search results without ever seeing the unencrypted content.

Onion Routing for Visiting Pages Privately

  • To visit these pages without the browser or third parties knowing, you can use Onion Routing (e.g., the Tor network). Onion Routing hides the fact that you’re visiting specific URLs by passing your requests through multiple layers of encrypted, distributed relays.
  • This ensures that your access to the pages you searched for remains anonymous even if the query and result decryption were already secured.