# Mutual information testing

13 February 2023

[Return to the main PPE albedo symmetry page](https://hackmd.io/@aidenrobert/rJ-CFJ5Eo)

## Mutual information

To summarize mutual information (MI) very quickly: this quantity, which comes from information theory, measures how much the uncertainty in $Y$ is reduced by knowing $X$ ([Shannon, 1948](https://ieeexplore.ieee.org/document/6773024); [Cover and Thomas, 1991](https://www.researchgate.net/profile/Imre-Csiszar/publication/220220640_Information_Theory_and_Statistics_A_Tutorial/links/5411827b0cf29e4a23296a21/Information-Theory-and-Statistics-A-Tutorial.pdf)). It is given as:

$$
MI(X;Y) = \sum_x \sum_y P_{XY}(x,y)\log\frac{P_{XY}(x,y)}{P_X(x)P_Y(y)},
$$

where $P_{XY}$ is the joint probability distribution of $X$ and $Y$, and $P_X$, $P_Y$ are the marginal probability distributions of $X$ and $Y$, respectively. It is essentially the expected log-ratio of the joint probabilities of the data to the probabilities that would hold if $X$ and $Y$ were completely independent. When MI is 0, knowing $X$ gives us no more information for predicting $Y$.

More important to us, however, is the MI relative to the MI that would result from random association between the variables. To quantify this, we create surrogate data sets, build a distribution of the MI that results from these false data, and normalize the MI by this distribution of spurious MI. Normalized MI (nMI) is given as:

$$
nMI = \frac{|MI - \bar D|}{\sigma(D)},
$$

where $\bar D$ and $\sigma(D)$ are the mean and standard deviation, respectively, of a distribution of MI calculated on surrogates of $X$ and $Y$. Since we are not working with time series data, these surrogates are simply permutations of the target data created by randomly shuffling them; this was done 10,000 times for each distribution.
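
The MI and nMI definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for these notes: the histogram (plug-in) estimator, the number of bins, the function names `mutual_information` and `normalized_mi`, and the synthetic data are all hypothetical choices, and fewer surrogates than 10,000 are drawn to keep the example fast.

```python
import numpy as np


def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of MI(X;Y) in nats.

    Assumption: probabilities are estimated by binning x and y;
    the binning choice is illustrative only.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()             # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, bins)
    nonzero = p_xy > 0                     # empty bins contribute nothing
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))


def normalized_mi(x, y, n_surrogates=1000, bins=10, seed=0):
    """nMI = |MI - mean(D)| / std(D), with D built from shuffled copies of y."""
    rng = np.random.default_rng(seed)
    mi = mutual_information(x, y, bins)
    surrogate_mi = np.array([
        mutual_information(x, rng.permutation(y), bins)
        for _ in range(n_surrogates)
    ])
    return abs(mi - surrogate_mi.mean()) / surrogate_mi.std()


# Synthetic stand-ins for a predictor and a target (not PPE data).
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y_related = x + 0.1 * rng.normal(size=500)  # strongly tied to x
y_unrelated = rng.normal(size=500)          # independent of x

print("nMI, related target:  ", normalized_mi(x, y_related, n_surrogates=500))
print("nMI, unrelated target:", normalized_mi(x, y_unrelated, n_surrogates=500))
```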
The nMI equation can be read as the difference between the MI with the real target data and the MI that would result from random covariance, measured in units of the spread of that spurious MI. The level of confidence in the significance of nMI may be adjusted by changing the sigma level in the denominator; instead, I used percentile thresholds to define bounds tied to a confidence level. This lets us choose e.g. a 95% confidence criterion, which is more easily interpreted than e.g. "2σ".

## Testing the importance of hemispheric asymmetries in cloud properties for the albedo symmetry

One way to use the PPE to explore which cloud properties are most important to the albedo symmetry is to test how much knowing the hemispheric asymmetry in a cloud property helps us predict the albedo asymmetry. Below are the nMI values, calculated at 95% confidence, for hemispheric differences in the following cloud properties: cloud fraction ($f$), in-cloud and grid cell average liquid water path ($\mathrm{LWP}$ and $\mathrm{LWP_{gc}}$, respectively), in-cloud and grid cell average ice water path ($\mathrm{IWP}$ and $\mathrm{IWP_{gc}}$, respectively), in-cloud ice fraction ($f_I = \frac{\mathrm{IWP}}{\mathrm{IWP}+\mathrm{LWP}}$), and vertically averaged cloud droplet concentration ($N_L$).

![](https://i.imgur.com/zANtNKI.png)

This tells us that hemispheric asymmetries in LWP (both monthly/grid cell average and in-cloud) are the most important to the cloud/albedo asymmetry across the PPE members, followed by cloud fraction, droplet number concentration, ice fraction, and grid cell average IWP. In-cloud IWP has a negligible impact on the albedo asymmetry.

### Notes on this for writing

* Perhaps we can use this when presenting hemispheric asymmetries in clouds that are important to the albedo symmetry, by stating which are most significant.
* The difference in cloud fraction is already well known, but it is neat to see the importance of in-cloud LWP and droplet number concentration.
  (Number concentration becomes relevant when considering that one of the most important parameters, at least according to the MI contained in the PPE, is sea salt emission; see the following section.) Explanations for the greater LWP in the SH are still missing, and this result gives greater importance to this aspect of cloud cover in the albedo symmetry.

## Testing the importance of individual parameters in cloud properties for the albedo symmetry

Now we do the same exercise with the parameter inputs to the PPE as our $X$. The nMI values of the parameters (calculated for 95% significance) are shown below; horizontal bars mark the bounds of the 95% and 50% confidence levels:

![](https://i.imgur.com/mYOV0Ep.png)

So at 95% significance, only 7 parameters are important: two that have to do with autoconversion, one that scales sea salt emissions, two that have to do with convective precipitation efficiency, one that smooths large-scale precipitation frequency, and one that has to do with horizontal turbulence dissipation. The 50% confidence bar is an arbitrary choice; it says that parameters below the bar have a > 50% chance of being associated with the albedo asymmetry in the PPE only by chance. Parameters below this bar carry very little information about the albedo asymmetry across the ensemble.

That the autoconversion rate dependence has a high predictive ability for the albedo asymmetry makes sense, as it would amplify any asymmetries in LWP present, which we saw have the strongest control on the albedo asymmetry. Another robust result is that sea salt emissions play an important role in the albedo symmetry in these models.
We can also relate this to the significance of cloud droplet number concentration above; emulations may tell us that increasing sea salt emissions probably increases number concentration and thus brightens clouds in a hemispherically asymmetric way (since conditions for sea salt aerosol emissions are better in the SH than in the NH midlatitudes, and this difference will be amplified). This is a neat result with potential to be explored further.

That horizontal turbulence dissipation (`clubb_c14`) has a robust control on the albedo asymmetry is also an interesting result. This was previously seen in the emulator results, but its significance and robustness first become really evident here.

### For asymmetries in SW cloud radiative effects

Doing the same as above, but for hemispheric differences in SW CRE, we get somewhat different results. There are 7 parameters that are significant to the SW CRE asymmetry with 95% confidence, and a few more (29 compared to 25) that are related with > 50% confidence.

![](https://i.imgur.com/bxpRicv.png)

The parameters that are robustly significant to the CRE asymmetries in the PPE are: two parameters for autoconversion rate, the same rainfall frequency smoothing parameter as for the albedo asymmetry, two parameters related to in-cloud vertical velocities, one parameter that controls the number of ice particles allowable during homogeneous freezing, and one parameter controlling the freezing activation of aerosols. Sea salt emissions are not significantly related to the CRE asymmetries. Here the roles of precipitation (`clubb_C2rt`, the two autoconversion parameters, `microp_aero_wsubi_min`, and `micro_mg_max_nicons`) and in-cloud vertical velocities (`clubb_C6rtb`, `clubb_C6thlb`) are evident.

### Notes on this for writing

* That sea salt emissions have a large predictive ability for the albedo asymmetry across the PPE is a significant result that can be followed up with theory.
  For instance, the hemispheric asymmetries in storminess can be related to hemispheric asymmetries in sea salt emissions in hypothetically symmetric planets (idealized simulations?). We also know that the SH midlatitudes are very poor in ice-nucleating particles (INP), which helps to explain the greater supercooled liquid water content of their clouds compared to the NH midlatitudes; holding INP emissions fixed and introducing more sea salt aerosols should also increase liquid droplet number concentrations, and thus albedo.

## Grouping parameters by which cloud property they impact the most

Next, we turn our attention to grouping each parameter by the cloud property over which it has the greatest control. To do this, I calculated nMI between each parameter and the hemispheric asymmetry in each cloud property. The five cloud properties that I grouped by are LWP, $f$, $N_L$, $f_I$, and IWP (I omitted grid cell averages, since cloud fraction is baked into them). I binned each of the 25 important parameters (from the albedo asymmetry nMI above) into the cloud property group that it most significantly impacts, i.e. the one with which it has the highest nMI.

This results in the following groupings. The nMI calculated with 95% confidence for each of the cloud properties is also shown in the chart, to help understand how a parameter may be impacting other properties; the 7 parameters most robustly significant to the albedo asymmetry are marked with an asterisk*:

![](https://i.imgur.com/A1Ptmau.png)

Here we have our final groups of parameters, sorted by the cloud property whose hemispheric asymmetry they impact the most. Vertical lines denote the nMI > 1 threshold for significance to the cloud property at 95% confidence. We can see that only one parameter significantly changes multiple cloud properties: `micro_mg_autocon_lwp_exp`, which impacts everything but the IWP.
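
The binning step described in this section can be sketched as below. This is a minimal illustration, not the analysis code: the parameter names (`param_a`, `param_b`, `param_c`), the nMI values, and the helper `group_by_strongest_property` are all hypothetical placeholders and do not reproduce any PPE result.

```python
import numpy as np

# The five cloud properties used for grouping.
properties = ["LWP", "f", "N_L", "f_I", "IWP"]

# Hypothetical parameter names and placeholder nMI values
# (rows: parameters, columns: cloud properties). Not PPE results.
parameters = ["param_a", "param_b", "param_c"]
nmi = np.array([
    [2.4, 1.3, 1.1, 1.6, 0.2],
    [0.3, 0.4, 1.8, 0.1, 0.5],
    [0.6, 0.2, 0.1, 0.4, 0.9],
])


def group_by_strongest_property(nmi, parameters, properties, threshold=1.0):
    """Bin each parameter under the property with which it has the highest nMI.

    Also records every property for which the parameter exceeds the
    nMI > threshold significance line.
    """
    groups = {prop: [] for prop in properties}
    significant = {}
    for name, row in zip(parameters, nmi):
        strongest = properties[int(np.argmax(row))]
        groups[strongest].append(name)
        significant[name] = [p for p, v in zip(properties, row) if v > threshold]
    return groups, significant


groups, significant = group_by_strongest_property(nmi, parameters, properties)
print(groups)
print(significant)
```

Keeping the full list of properties above the threshold alongside the single strongest one mirrors the chart, where a parameter is assigned to one group but its nMI to the other properties remains visible.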