# HW1: Twitter API with Python
###### tags: `HW1`

## Introduction
We use the WT-WT dataset and extract the tweet IDs it contains. We then pass each ID to pytwitter, a Twitter API client library, to retrieve the tweet text.

## Code

```python
import json

import numpy as np
import pandas as pd
from pandas import json_normalize

with open('wtwt_ids.json') as jsonfile:
    data = json.load(jsonfile)

df = json_normalize(data)
print(df)
```

```
                 tweet_id    merger     stance
0      971761970117357568   CI_ESRX    support
1      950934259371520000   CI_ESRX  unrelated
2      973718376496357377   CI_ESRX    comment
3      996772902006599680   CI_ESRX    support
4      971712098253320193   CI_ESRX    support
...                   ...       ...        ...
51279  928683906731270144  FOXA_DIS    comment
51280  950566340926099456  FOXA_DIS  unrelated
51281  927233376427311104  FOXA_DIS  unrelated
51282  952235091010506752  FOXA_DIS    comment
51283  902139974732070912  FOXA_DIS  unrelated

[51284 rows x 3 columns]
```

```python
print(df.merger.unique())
```

```
['CI_ESRX' 'CVS_AET' 'AET_HUM' 'FOXA_DIS' 'ANTM_CI']
```

```python
# Map each merger name to an integer id
merger_id_dict = {}
for i, merger in enumerate(df.merger.unique()):
    merger_id_dict[merger] = i
print(merger_id_dict)
```

```
{'CI_ESRX': 0, 'CVS_AET': 1, 'AET_HUM': 2, 'FOXA_DIS': 3, 'ANTM_CI': 4}
```

```python
# Take the first 30 rows for each merger (150 samples in total)
take_some_samples = []
for merger in df.merger.unique():
    take_some_samples.append(df[df['merger'] == merger].head(30))

new_df = pd.concat(take_some_samples).reset_index(drop=True)  # drop the old index
print(new_df)
```

```
               tweet_id   merger     stance
0    971761970117357568  CI_ESRX    support
1    950934259371520000  CI_ESRX  unrelated
2    973718376496357377  CI_ESRX    comment
3    996772902006599680  CI_ESRX    support
4    971712098253320193  CI_ESRX    support
..                  ...      ...        ...
145  565759518685290496  ANTM_CI  unrelated
146  451009663832580096  ANTM_CI  unrelated
147  587752130888540160  ANTM_CI  unrelated
148  592615405748760576  ANTM_CI  unrelated
149  490629012751142913  ANTM_CI  unrelated

[150 rows x 3 columns]
```
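The per-merger loop above can also be written as a single `groupby(...).head(n)` call, which keeps the first `n` rows of each group in their original order. A minimal sketch on a toy frame (`n=2` here for brevity, standing in for `head(30)` on the real data):

```python
import pandas as pd

# Toy frame standing in for df (the real data has 51284 rows)
df = pd.DataFrame({
    'tweet_id': range(8),
    'merger': ['CI_ESRX'] * 5 + ['FOXA_DIS'] * 3,
})

# One call instead of loop + concat: first 2 rows per merger
sampled = df.groupby('merger', sort=False).head(2).reset_index(drop=True)
print(sampled)
```

The result matches the loop version row for row, since `head` preserves the original row order within each group.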
```python
# !pip install python-twitter-v2  # PyPI name of the `pytwitter` module; tested with Python 3.8 on Windows 10
```

```python
from pytwitter import Api

# Fill in your own credentials
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
bearer_token = ''

# App-only authentication with a bearer token; user-context
# authentication with the consumer/access keys works as well.
api = Api(bearer_token=bearer_token)
# api = Api(
#     consumer_key=consumer_key,
#     consumer_secret=consumer_secret,
#     access_token=access_token,
#     access_secret=access_token_secret,
# )
```

```python
def get_tweet(tweet_id):
    """Fetch the text of one tweet; return NaN if the lookup fails
    (e.g. the tweet was deleted or the account is protected)."""
    try:
        query_result = api.get_tweet(str(tweet_id), return_json=False)
        # print('[!] Preparing for {}. Success!'.format(tweet_id))
    except Exception:
        print('[!] Preparing for {}. Failed!!!'.format(tweet_id))
        return np.nan
    return query_result.data.text
```

```python
new_df['tweet_text'] = new_df.apply(lambda x: get_tweet(x['tweet_id']), axis=1)
```

```
[!] Preparing for 973604160364048384. Failed!!!
[!] Preparing for 973921707361697792. Failed!!!
[!] Preparing for 981915145961029633. Failed!!!
[!] Preparing for 942941435808092161. Failed!!!
[!] Preparing for 937866915338506240. Failed!!!
[!] Preparing for 858076149158744065. Failed!!!
[!] Preparing for 937482636792094720. Failed!!!
[!] Preparing for 938437186361462784. Failed!!!
[!] Preparing for 887456087498084352. Failed!!!
[!] Preparing for 604349497473376256. Failed!!!
[!] Preparing for 940758202756599808. Failed!!!
[!] Preparing for 941465395155820545. Failed!!!
[!] Preparing for 942816621982298113. Failed!!!
[!] Preparing for 557206874601562112. Failed!!!
[!] Preparing for 563069755918409729. Failed!!!
[!] Preparing for 567723090231050241. Failed!!!
[!] Preparing for 565759518685290496. Failed!!!
```
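Some of these failures may be transient (rate limits, timeouts) rather than deleted tweets, so retrying before giving up can recover a few rows. The sketch below is a generic retry-with-backoff helper, not part of pytwitter; `flaky_fetch` is a hypothetical stand-in for the real `api.get_tweet` call so the example runs without network access:

```python
import time

def fetch_with_retry(fetch, tweet_id, retries=3, backoff=1.0):
    """Call fetch(tweet_id), retrying with exponential backoff.
    Returns None once all attempts are exhausted."""
    delay = backoff
    for attempt in range(retries):
        try:
            return fetch(tweet_id)
        except Exception:
            if attempt == retries - 1:
                return None  # give up; caller can map this to NaN
            time.sleep(delay)
            delay *= 2  # exponential backoff

# Stand-in for api.get_tweet: fails twice, then succeeds
calls = {'n': 0}
def flaky_fetch(tweet_id):
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('rate limited')
    return 'tweet text for {}'.format(tweet_id)

result = fetch_with_retry(flaky_fetch, 971761970117357568, backoff=0.01)
print(result)
```

Dropping `fetch_with_retry(lambda t: api.get_tweet(str(t), return_json=False).data.text, ...)` into `get_tweet` would keep the rest of the pipeline unchanged.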
```python
new_df = new_df.dropna(axis=0)  # drop rows whose tweet lookup failed

new_json = new_df.to_json(orient="records")
parsed = json.loads(new_json)
with open('sample_data.json', 'w') as fp:
    json.dump(obj=parsed, fp=fp, indent=4)
```

## Result

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
    .dataframe tbody tr th {
        vertical-align: top;
    }
    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>tweet_id</th>
      <th>merger</th>
      <th>stance</th>
      <th>tweet_text</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>971761970117357568</td>
      <td>CI_ESRX</td>
      <td>support</td>
      <td>Cigna and ESI set to merge. Here we go...</td>
    </tr>
    <tr>
      <th>1</th>
      <td>950934259371520000</td>
      <td>CI_ESRX</td>
      <td>unrelated</td>
      <td>Express Scripts Closes Acquisition Of eviCore;...</td>
    </tr>
    <tr>
      <th>2</th>
      <td>973718376496357377</td>
      <td>CI_ESRX</td>
      <td>comment</td>
      <td>RT @_diginsurance: Cigna-Express Scripts deal ...</td>
    </tr>
    <tr>
      <th>3</th>
      <td>996772902006599680</td>
      <td>CI_ESRX</td>
      <td>support</td>
      <td>Here's the just-released 400+ page merger prox...</td>
    </tr>
    <tr>
      <th>4</th>
      <td>971712098253320193</td>
      <td>CI_ESRX</td>
      <td>support</td>
      <td>Cigna nears deal for Express Scripts https://t...</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>143</th>
      <td>601243177329143808</td>
      <td>ANTM_CI</td>
      <td>unrelated</td>
      <td>Lot of 5 Trace Magazines Foxy Brown Tyra Banks...</td>
    </tr>
    <tr>
      <th>146</th>
      <td>451009663832580096</td>
      <td>ANTM_CI</td>
      <td>unrelated</td>
      <td>#ForThePerfectView Anthem AV signs MSR Acousti...</td>
    </tr>
    <tr>
      <th>147</th>
      <td>587752130888540160</td>
      <td>ANTM_CI</td>
      <td>unrelated</td>
      <td>THE GASLIGHT ANTHEM monster poster will be up ...</td>
    </tr>
    <tr>
      <th>148</th>
      <td>592615405748760576</td>
      <td>ANTM_CI</td>
      <td>unrelated</td>
      <td>. @AusNextTopModel takeover tomorrow morning a...</td>
    </tr>
    <tr>
      <th>149</th>
      <td>490629012751142913</td>
      <td>ANTM_CI</td>
      <td>unrelated</td>
      <td>ANTM yet to Decide on the Acquisition Plan of ...</td>
    </tr>
  </tbody>
</table>
<p>133 rows × 4 columns</p>
</div>
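Because the file was written with `orient="records"`, it can be read straight back into a DataFrame with `pd.read_json` to confirm the export round-trips. A self-contained sketch, using a hypothetical two-record `sample_data_demo.json` standing in for the real 133-row file:

```python
import json
import pandas as pd

# Two records standing in for the real sample_data.json
records = [
    {"tweet_id": 971761970117357568, "merger": "CI_ESRX",
     "stance": "support", "tweet_text": "Cigna and ESI set to merge. Here we go..."},
    {"tweet_id": 950934259371520000, "merger": "CI_ESRX",
     "stance": "unrelated", "tweet_text": "Express Scripts Closes Acquisition Of eviCore;..."},
]
with open('sample_data_demo.json', 'w') as fp:
    json.dump(records, fp, indent=4)

# Round-trip: orient="records" JSON loads directly into a DataFrame
loaded = pd.read_json('sample_data_demo.json', orient='records')
print(loaded.shape)
```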