---
tags: dutch-connection
---

# Dutch Connection - PMPO Meetings #

:::info
__Executive Summary__
:::

## Project Ideas ##

- newspaper: front pages only work on contemporary material; their function changes over time
- papers to add:
    - topic linkage
    - jump entropy
- project: historical newspapers, 1950-2000
    - possibly also #parliamentary data

**Research Questions**
- Does the historical flow of news differ between newspapers of differing ideological backgrounds?
- What is the impact of events on this flow?
- Can we characterize events?
- Can we predict events using the flow of news?

**Hypotheses**
- As part of the capitalist system, news has to sell; newness sells, so news sticks less and people forget more easily
- Progressive periods might be more future-oriented, while conservative periods might be inward-looking/past-oriented
- Restructuring of topic linkages signals larger ideological shifts

### Data

- Multiple Dutch newspapers
- time frame 1919-1990?
- How to deal with different periods and the changing layout of front pages?
- Focus on front pages, the first few pages, or snippets from entire newspapers; use multiple samples to show robustness
- How to deal with missing dates and varying OCR quality?
- Length differences between front pages?

### Methods

#### Entropy (as in newsfluxus)

- seasonality

#### Jump Entropy

Calculate the entropy between a window of dates and a similarly-sized window in the past or future, as defined by a negative or positive jump size. This allows us to compare how similar a period is to one in the past or future. The entropy function in newsfluxus uses a window that extends into the future and past from a given point, which might even out certain flows of the news. By comparing a window with one a week, two weeks, or a month earlier, we also capture this flow of the news.

***N.B.** How to deal with the relation between trends in the flow and the larger trend in entropy?*

```python
from itertools import chain

import numpy as np
import pandas as pd

# Assumed from the newsfluxus pipeline: `theta` (documents x topics matrix),
# `time` (the matching dates), `window` (half-window size), `meas` (a divergence
# measure such as KLD), and `weight` (fallback value for empty windows).
m = len(theta)

# Negative and positive jump sizes; the negative side originally stepped by 1,
# which looks like a typo, so both sides step by 5 here.
jump_ranges = chain(range(-1000, 0, 5), range(0, 1000, 5))

jumps = dict()
for j in jump_ranges:
    print(j)
    N_hat = np.zeros(m)
    N_sd = np.zeros(m)
    for i in range(m):
        # Window around i and the same-sized window shifted by jump size j.
        # Near the series boundaries the slices come out empty or truncated;
        # zip below silently truncates to the shorter of the two.
        submat = theta[(i - window):(i + window), ]
        submat_jump = theta[((i - window) + j):((i + window) + j), ]
        tmp = np.zeros(submat.shape[0])
        if submat.any():
            for ii, (xx, xx_j) in enumerate(zip(submat, submat_jump)):
                tmp[ii] = meas(xx, xx_j)
        else:
            tmp = np.zeros(window) + weight
        N_hat[i] = np.mean(tmp)
        N_sd[i] = np.std(tmp)
    jumps[j] = N_hat

df_kld = pd.DataFrame.from_dict(jumps, orient='columns')
df_kld['dates'] = time
```

![](https://i.imgur.com/i4txAiH.png)
![](https://i.imgur.com/x1Mi0sP.png)

How do we measure the characteristics of these event flows?

*Approach A*

Take the $N$ points with the lowest jump entropies and calculate the mean and variance of the entropy and of the jump sizes. This captures the width of the dip. Position events based on the mean and variance of their lowest jump entropies (see the sketch after the list below).

Possible interpretations:
- A low mean and low variance in jump entropy indicates an unbalanced, impactful event: the entropy starts high, goes down, and stays down, or vice versa.
- A low mean and high variance indicates a sudden, quick event: the KL quickly goes down and then back up, explaining the higher variance.
- A higher mean and lower variance indicates an event that is part of a larger narrative: the KL goes down and stays low for quite some time.
- A higher mean and high variance indicates a random day without a clear disruption of the historical flow.
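A minimal sketch of Approach A, assuming the `df_kld` frame built above (jump sizes as columns, one row per date); `characterize_event` and `n_lowest` are hypothetical names:

```python
import numpy as np

def characterize_event(df_kld, date, n_lowest=10):
    """Approach A: mean/variance of the n lowest jump entropies for one date."""
    row = df_kld[df_kld['dates'] == date].drop(columns='dates')
    jump_sizes = row.columns.to_numpy(dtype=float)
    entropies = row.to_numpy().ravel()
    # indices of the n lowest jump-entropy values for this date
    idx = np.argsort(entropies)[:n_lowest]
    return {
        'entropy_mean': entropies[idx].mean(),
        'entropy_var': entropies[idx].var(),
        'jump_mean': jump_sizes[idx].mean(),  # where the dip sits
        'jump_var': jump_sizes[idx].var(),    # width of the dip
    }
```

Events can then be positioned in the mean/variance plane returned by this function.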
![](https://i.imgur.com/4EM04Cw.png)
![](https://i.imgur.com/QXHAe1H.png)
![](https://i.imgur.com/YMg850v.png)

*Approach B*

Calculate the slopes on both sides of the dip, capturing _anticipation_ and _release_ (see the first sketch below).

![](https://i.imgur.com/aJRZghZ.png)
![](https://i.imgur.com/P5HcbRy.png)

*Approach C*

Measure the distances between the v-shapes in a window around a date/event across newspapers (see the second sketch below).

#to do: Kristoffer affiliations
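A minimal sketch of Approach B, fitting a line to each side of the dip with `np.polyfit`; treating the dip as the global minimum and the `side_width` parameter are assumptions:

```python
import numpy as np

def dip_slopes(jump_sizes, entropies, side_width=20):
    """Approach B: slopes on either side of the entropy dip.

    The left slope captures anticipation, the right slope release.
    """
    jump_sizes = np.asarray(jump_sizes, dtype=float)
    entropies = np.asarray(entropies, dtype=float)
    dip = int(np.argmin(entropies))  # assume the dip is the global minimum
    left = slice(max(dip - side_width, 0), dip + 1)
    right = slice(dip, dip + side_width + 1)
    # degree-1 polyfit returns (slope, intercept); keep the slope
    anticipation = np.polyfit(jump_sizes[left], entropies[left], 1)[0]
    release = np.polyfit(jump_sizes[right], entropies[right], 1)[0]
    return anticipation, release
```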
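A minimal sketch of Approach C, comparing the v-shaped jump-entropy curves of two newspapers around the same date with a Euclidean distance; `df_kld_a`/`df_kld_b` are hypothetical per-newspaper frames shaped like `df_kld` above, and the distance measure is one possible choice among several:

```python
import numpy as np

def v_shape_distance(df_kld_a, df_kld_b, date):
    """Approach C: distance between two newspapers' v-shapes around one date."""
    curve_a = df_kld_a[df_kld_a['dates'] == date].drop(columns='dates').to_numpy().ravel()
    curve_b = df_kld_b[df_kld_b['dates'] == date].drop(columns='dates').to_numpy().ravel()
    # Euclidean distance between the two jump-entropy curves;
    # assumes both frames share the same jump-size columns
    return float(np.linalg.norm(curve_a - curve_b))
```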