$$ \newcommand{\mper}{\,.} \newcommand{\mcom}{\,,} \newcommand{\vec}[1]{{\bf{#1}}} \newcommand{\bvec}[1]{{\bar{\vec{#1}}}} \newcommand{\pvec}[1]{\vec{#1}'} \newcommand{\ppvec}[1]{\vec{#1}''} \newcommand{\tvec}[1]{{\tilde{\vec{#1}}}} \newcommand{\paren}[1]{\left(#1 \right )} \newcommand{\Paren}[1]{\left(#1 \right )} \newcommand{\brac}[1]{[#1 ]} \newcommand{\Brac}[1]{\left[#1\right]} \newcommand{\Set}[1]{\left\{#1\right\}} \renewcommand{\set}[1]{\left\{#1\right\}} \newcommand{\abs}[1]{\left\lvert#1\right\rvert} \newcommand{\Abs}[1]{\left\lvert#1\right\rvert} \newcommand{\ceil}[1]{\lceil #1 \rceil} \newcommand{\floor}[1]{\lfloor #1 \rfloor} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} \newcommand{\Norm}[1]{\left\lVert#1\right\rVert} \newcommand{\fnorm}[1]{\norm{#1}_\mathrm{F}} \newcommand{\opnorm}[1]{\norm{#1}_\mathrm{op}} \newcommand{\spnorm}[1]{\norm{#1}_\mathrm{2}} \newcommand{\nuclear}[1]{\norm{#1}_{*}} \newcommand{\nnz}[1]{\mathrm{nnz}(#1)} \newcommand{\defeq}{:=} \newcommand{\seteq}{\mathrel{\mathop:}=} \newcommand{\iseq}{\stackrel{\textup{?}}{=}} \newcommand{\isgeq}{\stackrel{\textup{?}}{\geq}} \newcommand{\isleq}{\stackrel{\textup{?}}{\leq}} \newcommand{\vbig}{\vphantom{\bigoplus}} \newcommand{\inprod}[1]{\left\langle #1\right\rangle} \newcommand{\Tr}[1]{\mathrm {Tr}\paren{#1}} \newcommand{\tr}{\mathrm {Tr}} \newcommand{\snorm}[1]{\norm{#1}^2} \newcommand{\dist}[1]{{\sf dist }\paren{#1}} \newcommand{\normt}[1]{\norm{#1}_{\scriptstyle 2}} \newcommand{\snormt}[1]{\norm{#1}^2_2} \newcommand{\projsymb}{{\sf Proj}} \newcommand{\proj}[2]{\projsymb_{#1}\paren{#2}} \newcommand{\projperp}[2]{\projsymb_{#1}^{\perp}\paren{#2}} \newcommand{\projpar}[2]{\projsymb_{#1}^{\parallel}\paren{#2}} \newcommand{\normo}[1]{\norm{#1}_{\scriptstyle 1}} \newcommand{\normi}[1]{\norm{#1}_{\scriptstyle \infty}} \newcommand{\normb}[1]{\norm{#1}_{\scriptstyle \square}} \newcommand{\Z}{{\mathbb Z}} \newcommand{\N}{{\mathbb Z}_{\geq 0}} \newcommand{\R}{\mathbb R} \newcommand{\Q}{\mathbb 
Q} \newcommand{\Rnn}{\R_+} \newcommand{\subjectto}{\text{subject to}} \newcommand{\suchthat}{~:~} \newcommand{\Esymb}{\mathbb{E}} \newcommand{\Psymb}{\mathbb{P}} \newcommand{\Vsymb}{\mathbb{V}} \newcommand{\Isymb}{\mathbb{I}} \DeclareMathOperator*{\E}{\Esymb} \DeclareMathOperator*{\Var}{{ Var}} \DeclareMathOperator*{\ProbOp}{\Psymb} \newcommand{\var}[1]{\Var \paren{#1}} \newcommand{\given}{\mathrel{}\middle|\mathrel{}} \newcommand{\Given}{\given} \newcommand{\prob}[1]{\ProbOp\Set{#1}} \newcommand{\ber}{\mathsf{Ber}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\probSub}[2]{\mathbb{P}_{#2}\Set{#1}} \newcommand{\indicator}[1]{\mathds{1}{\set{#1}}} \newcommand{\1}[1]{\mathds{1}_{\set{#1}}} \newcommand{\ex}[1]{\E\brac{#1}} \newcommand{\Ex}[1]{\E\Brac{#1}} \newcommand{\exSub}[2]{\E_{#2}\brac{#1}} \newcommand{\Pr}[1]{\ProbOp\Brac{#1}} \newcommand{\pr}[2]{\ProbOp_{#1}\Set{#2}} \newcommand{\ind}[2]{\Isymb_{#1}\brac{#2}} \newcommand{\Ind}[1]{\Isymb\Brac{#1}} \newcommand{\varex}[1]{\E\paren{#1}} \newcommand{\varEx}[1]{\E\Paren{#1}} \newcommand{\eset}{\emptyset} \newcommand{\e}{\epsilon} \newcommand{\ebf}{\mathbf{e}} \newcommand{\Abar}{\bar{A}} \newcommand{\Dbar}{\bar{D}} \newcommand{\zero}{\mathbf{0}} \newcommand{\bern}[1]{\ensuremath{\operatorname{Bern}\paren{ #1 } }} \newcommand{\unif}{\ensuremath{\operatorname{Unif}}} \newcommand{\super}[2]{#1^{\paren{#2}}} \newcommand{\bits}{\{0,1\}} \newcommand{\e}{\varepsilon} \newcommand{\cA}{\mathcal A} \newcommand{\cB}{\mathcal B} \newcommand{\cC}{\mathcal C} \newcommand{\cD}{\mathcal D} \newcommand{\cE}{\mathcal E} \newcommand{\cF}{\mathcal F} \newcommand{\cG}{\mathcal G} \newcommand{\cH}{\mathcal H} \newcommand{\cI}{\mathcal I} \newcommand{\cJ}{\mathcal J} \newcommand{\cK}{\mathcal K} \newcommand{\cL}{\mathcal L} \newcommand{\cM}{\mathcal M} \newcommand{\cN}{\mathcal N} \newcommand{\cO}{\mathcal O} \newcommand{\cP}{\mathcal P} \newcommand{\cQ}{\mathcal Q} \newcommand{\cR}{\mathcal R} \newcommand{\cS}{\mathcal S} 
\newcommand{\cT}{\mathcal T} \newcommand{\cU}{\mathcal U} \newcommand{\cV}{\mathcal V} \newcommand{\cW}{\mathcal W} \newcommand{\cX}{\mathcal X} \newcommand{\cY}{\mathcal Y} \newcommand{\cZ}{\mathcal Z} \newcommand{\hatN}{\hat{N}} \newcommand{\bbB}{\mathbb B} \newcommand{\bbS}{\mathbb S} \newcommand{\bbR}{\mathbb R} \newcommand{\bbZ}{\mathbb Z} \newcommand{\bbI}{\mathbb I} \newcommand{\bbQ}{\mathbb Q} \newcommand{\bbP}{\mathbb P} \newcommand{\bbE}{\mathbb E} \newcommand{\bbC}{\mathbb C} \newcommand{\sfE}{\mathsf E} \newcommand{\Erdos}{Erd\H{o}s\xspace} \newcommand{\Renyi}{R\'enyi\xspace} \newcommand{\Lovasz}{Lov\'asz\xspace} \newcommand{\cdeg}{\mathrm{cdeg}} \newcommand{\bigO}{O} \newcommand{\bigo}[1]{\bigO\!\left(#1\right)} \newcommand{\tbigO}{\tilde{\mathcal{O}}} \newcommand{\tbigo}[1]{\tbigO\!\left(#1\right)} \newcommand{\tensor}{\otimes} \newcommand{\eigvec}{{\sf v}} \newcommand{\poly}{{\sf poly}} \newcommand{\polylog}{{\sf polylog}} \newcommand{\supp}{{\sf supp}} \newcommand{\rank}{{\sf rank}} \newcommand{\tk}{t_{1/k}} % gaussian cap
\newcommand{\rd}{{\sf d}} \newcommand{\tildeN}{\widetilde{N}} \newcommand{\tildeA}{\widetilde{A}} \newcommand{\tX}{\widetilde{X}} \newcommand{\tA}{\widetilde{A}} \newcommand{\tB}{\widetilde{B}} \newcommand{\calI}{\mathcal{I}} \newcommand{\calS}{\mathcal{S}} \newcommand{\barA}{\bar{A}} \newcommand{\barB}{\bar{B}} \newcommand{\Bbd}{B_{\text{bd}}} \newcommand{\was}{Wasserstein} $$

## Introduction

Networks, or graphs, are typically represented by a matrix $A$ called the adjacency matrix, where $A_{ij} = 1$ if node $i$ is connected to node $j$ and $A_{ij} = 0$ otherwise. The eigenvalues of $A$ carry valuable structural information about the graph. For example, in a social network such as the Facebook friendship graph, one can use the eigenvalues to reason about typical community structure, check whether the network contains many troll accounts, and so on.
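As a concrete illustration, here is the textbook dense computation in Python with NumPy, on a made-up 4-node path graph: we build the adjacency matrix and read off its full eigenvalue spectrum. (The graph is purely hypothetical; any symmetric 0/1 matrix would do.)

```python
import numpy as np

# Adjacency matrix of a small hypothetical network: a path on 4 nodes,
# 0 -- 1 -- 2 -- 3.  A[i, j] = 1 iff nodes i and j are connected.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# A is symmetric, so its eigenvalues are real; eigvalsh returns them
# in ascending order.  This dense routine takes O(n^3) time.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)  # symmetric about 0; the largest is the golden ratio
```

Already at a few thousand nodes this dense approach becomes painfully slow, which motivates the sublinear-time methods discussed next.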
While computing the eigenvalues of a matrix is a well-studied problem, most methods require time that scales as $\bigo{n^3}$, where $n$ is the number of nodes in the graph. The Facebook graph, for example, contains upwards of $1.71$ billion nodes [source](https://engineering.fb.com/2016/10/19/core-infra/a-comparison-of-state-of-the-art-graph-processing-systems/#:~:text=The%20Facebook%20social%20graph%20alone,with%20over%201%20trillion%20edges.), so computing its eigenvalues exactly could take months. In general, a graph on $n$ nodes can have up to about $n^2/2$ edges. We consider the problem of approximating the eigenvalues of $A$ up to an accuracy parameter $\e$, a task called spectral density estimation, without even looking at all the nodes or all the edges of the graph. Previously, the best-known algorithms took $2^{\mathcal{O}(1/\epsilon)}$ time [CKSV18] or $\mathcal{O}(n/\epsilon^7)$ time [BKM22]. Our work [JMSS23] established a matching lower bound of $2^{\Omega(1/\epsilon)}$, which means that a wide class of algorithms cannot hope to improve on the $2^{\mathcal{O}(1/\epsilon)}$ running time. Another recent work of ours [JKMSS23] shows that by discarding "non-informative" edges, one can estimate the eigenvalues of $A$ in time $\mathcal{O}(n/\epsilon^3)$, improving upon [BKM22]. For $\e = 0.01$ ($1\%$ error) on huge graphs, this could mean turning an algorithm that takes $3$ years into one that outputs the same information in $10$ seconds on a typical laptop!

## Research Objectives

We aim to design efficient algorithms that compute the spectral density of a network, and to understand the fundamental limitations of such algorithms.

- **Faster Algorithms**: Can one improve upon the work of [JKMSS23] to design an even faster algorithm, or show a fundamental bottleneck in spectral density estimation? We believe that an $\bigo{n/\e^2}$ running time could be achievable.
- **Powerful Algorithms**: [JMSS23] showed that a broad class of algorithms cannot estimate the spectral density of a graph in time better than $2^{\mathcal{O}(1/\epsilon)}$ without visiting all of its nodes. We believe that even a much more powerful class of algorithms (which we call adaptive walks) cannot do better than $2^{\mathcal{O}(1/\epsilon)}$ time.
- **Succinct Representation**: A graph can have billions or even trillions of nodes, so storing it can be unwieldy. Motivated by the work of [PV18], we hypothesize that a huge graph can be succinctly represented by a much smaller graph while preserving its spectral density.

## Methodology

To answer these questions, we plan to design algorithms, borrowing ideas from the statistics community, that provably run faster than $\bigo{n/\e^3}$ time. This would involve building on [JKMSS23] and using advanced statistical techniques to improve their subroutine, from which we can potentially obtain an $\bigo{n/\e^2}$-time algorithm.

To show that a powerful class of algorithms cannot work, we plan to show that a highly structured graph can "fool" the algorithm into thinking that the graph is a random graph. This would involve investigating the properties of random graphs and designing structured graphs that look "random-like" to the powerful class of algorithms.

Finally, the work of [PV18] demonstrated that by cleverly combining multiple small random graphs, one can approximate specific properties of a huge graph. We plan to extend their idea of combining multiple random graphs to construct a small graph whose spectral density matches that of the original (potentially huge) graph.
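To make the moment-based flavor of such spectral density estimators concrete, here is a minimal sketch of one standard building block (this is *not* the algorithm of [JKMSS23] or [BKM22], and the function name and parameters are made up): estimating the normalized Chebyshev moments $\mathrm{Tr}(T_k(A))/n$ of a symmetric matrix with eigenvalues in $[-1,1]$ via Hutchinson's stochastic trace estimator. Only matrix–vector products with $A$ are needed, and an approximate spectral density can then be recovered from the moments.

```python
import numpy as np

def chebyshev_moments(A, num_moments=6, num_probes=100, seed=None):
    """Estimate tau_k = Tr(T_k(A)) / n for Chebyshev polynomials T_k,
    assuming all eigenvalues of the symmetric matrix A lie in [-1, 1].

    Hutchinson's estimator: for a random sign vector g, E[g^T M g] = Tr(M),
    so averaging g^T T_k(A) g over probes estimates the trace using only
    matrix-vector products with A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    moments = np.zeros(num_moments)
    for _ in range(num_probes):
        g = rng.choice([-1.0, 1.0], size=n)
        t_prev, t_curr = g, A @ g              # T_0(A) g and T_1(A) g
        moments[0] += g @ t_prev
        if num_moments > 1:
            moments[1] += g @ t_curr
        for k in range(2, num_moments):
            # Chebyshev recurrence: T_k(A) g = 2 A T_{k-1}(A) g - T_{k-2}(A) g
            t_prev, t_curr = t_curr, 2 * (A @ t_curr) - t_prev
            moments[k] += g @ t_curr
    return moments / (num_probes * n)

# Hypothetical example: a 50-node cycle, with the adjacency matrix scaled
# by 1/2 so that its eigenvalues cos(2*pi*j/50) lie in [-1, 1].
n = 50
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 0.5
moments = chebyshev_moments(A, num_moments=6, num_probes=100, seed=0)
print(moments)  # moments[0] is exactly 1; all entries lie in [-1, 1]
```

Each moment costs one matrix–vector product per probe, so the total work scales with the number of edges of the graph rather than with $n^3$; the fast algorithms above can be viewed as carefully controlling how many probes and moments are needed for a given accuracy $\e$.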