# HW1: Twitter API with Python
###### tags: `HW1`
## Introduction
We use the wtwt dataset, which provides tweet IDs, and pass those IDs to pytwitter, a Python client for the Twitter API, to retrieve the text of each tweet.
## Code
```python
import pandas as pd
import json
from pandas import json_normalize
import numpy as np
with open('wtwt_ids.json', newline='') as jsonfile:
    data = json.load(jsonfile)
df = json_normalize(data)
print(df)
```
```
                 tweet_id    merger     stance
0      971761970117357568   CI_ESRX    support
1      950934259371520000   CI_ESRX  unrelated
2      973718376496357377   CI_ESRX    comment
3      996772902006599680   CI_ESRX    support
4      971712098253320193   CI_ESRX    support
...                   ...       ...        ...
51279  928683906731270144  FOXA_DIS    comment
51280  950566340926099456  FOXA_DIS  unrelated
51281  927233376427311104  FOXA_DIS  unrelated
51282  952235091010506752  FOXA_DIS    comment
51283  902139974732070912  FOXA_DIS  unrelated
[51284 rows x 3 columns]
```
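Before sampling, it can help to check how the stance labels are distributed. A small sketch using `value_counts` (the miniature frame below is hypothetical and only mirrors the three columns shown above):

```python
import pandas as pd

# Hypothetical miniature of the wtwt frame, for illustration only
df = pd.DataFrame({
    'tweet_id': ['971761970117357568', '950934259371520000', '973718376496357377'],
    'merger':   ['CI_ESRX', 'CI_ESRX', 'CI_ESRX'],
    'stance':   ['support', 'unrelated', 'comment'],
})

# Count how many tweets carry each stance label
print(df['stance'].value_counts())
```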
```python
print(df.merger.unique())
```
```
['CI_ESRX' 'CVS_AET' 'AET_HUM' 'FOXA_DIS' 'ANTM_CI']
```
```python
merger_id_dict = {}
for i, merger in enumerate(df.merger.unique()):
    merger_id_dict[merger] = i
print(merger_id_dict)
```
```
{'CI_ESRX': 0, 'CVS_AET': 1, 'AET_HUM': 2, 'FOXA_DIS': 3, 'ANTM_CI': 4}
```
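The same mapping can be built in one line with a dict comprehension (the list below stands in for `df.merger.unique()`):

```python
# Equivalent to the loop above, as a dict comprehension
mergers = ['CI_ESRX', 'CVS_AET', 'AET_HUM', 'FOXA_DIS', 'ANTM_CI']  # df.merger.unique()
merger_id_dict = {merger: i for i, merger in enumerate(mergers)}
print(merger_id_dict)
# {'CI_ESRX': 0, 'CVS_AET': 1, 'AET_HUM': 2, 'FOXA_DIS': 3, 'ANTM_CI': 4}
```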
```python
take_some_samples = []
for merger in df.merger.unique():
    take_some_samples.append(df[df['merger'] == merger].head(30))
new_df = pd.concat(take_some_samples).reset_index(drop=True)  # drop the old index
print(new_df)
```
```
               tweet_id   merger     stance
0    971761970117357568  CI_ESRX    support
1    950934259371520000  CI_ESRX  unrelated
2    973718376496357377  CI_ESRX    comment
3    996772902006599680  CI_ESRX    support
4    971712098253320193  CI_ESRX    support
..                  ...      ...        ...
145  565759518685290496  ANTM_CI  unrelated
146  451009663832580096  ANTM_CI  unrelated
147  587752130888540160  ANTM_CI  unrelated
148  592615405748760576  ANTM_CI  unrelated
149  490629012751142913  ANTM_CI  unrelated
[150 rows x 3 columns]
```
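The per-merger loop can also be written with `groupby(...).head(n)`, which keeps the first n rows of each group in one call. A sketch on a toy frame (two mergers, two rows kept per group):

```python
import pandas as pd

# Toy frame standing in for the full wtwt DataFrame
df = pd.DataFrame({
    'tweet_id': ['1', '2', '3', '4', '5', '6'],
    'merger':   ['CI_ESRX', 'CI_ESRX', 'CI_ESRX', 'FOXA_DIS', 'FOXA_DIS', 'FOXA_DIS'],
})

# First 2 rows of each merger group, old index dropped
new_df = df.groupby('merger', sort=False).head(2).reset_index(drop=True)
print(new_df)
```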
```python
# ! pip install pytwitter
# Tested with Python 3.8 on Windows 10
```
```python
from pytwitter import Api
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
bearer_token = ''

# Either authentication method works; only one Api instance is needed.
# App-only (bearer token) auth:
api = Api(bearer_token=bearer_token)
# Or user-context (OAuth 1.0a) auth, which overrides the instance above:
api = Api(
    consumer_key=consumer_key,
    consumer_secret=consumer_secret,
    access_token=access_token,
    access_secret=access_token_secret,
)
```
```python
def get_tweet(tweet_id):
    try:
        query_result = api.get_tweet(str(tweet_id), return_json=False)
        # print('[!] Preparing for {}. Success!'.format(tweet_id))
    except Exception as e:
        print('[!] Preparing for {}. Failed!!!'.format(tweet_id))
        return np.nan
    return query_result.data.text
```
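Some of these lookups fail because the tweet was deleted, but the Twitter API also rate-limits requests, so a failure can be transient. A generic retry wrapper is one way to tell the two apart; this is a sketch, where `fetch` is any callable such as the `get_tweet` above and the delay values are arbitrary assumptions:

```python
import time

def with_retries(fetch, tweet_id, attempts=3, delay=1.0):
    """Call fetch(tweet_id), retrying on failure with a fixed delay.

    Returns None when every attempt fails (e.g. a deleted tweet).
    """
    for attempt in range(attempts):
        try:
            return fetch(tweet_id)
        except Exception as e:
            print('[!] Attempt {} for {} failed: {}'.format(attempt + 1, tweet_id, e))
            time.sleep(delay)
    return None
```

It can be dropped into the `apply` call as `with_retries(get_tweet, x['tweet_id'])`.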
```python
new_df['tweet_text'] = new_df.apply(lambda x: get_tweet(x['tweet_id']), axis=1)
```
```
[!] Preparing for 973604160364048384. Failed!!!
[!] Preparing for 973921707361697792. Failed!!!
[!] Preparing for 981915145961029633. Failed!!!
[!] Preparing for 942941435808092161. Failed!!!
[!] Preparing for 937866915338506240. Failed!!!
[!] Preparing for 858076149158744065. Failed!!!
[!] Preparing for 937482636792094720. Failed!!!
[!] Preparing for 938437186361462784. Failed!!!
[!] Preparing for 887456087498084352. Failed!!!
[!] Preparing for 604349497473376256. Failed!!!
[!] Preparing for 940758202756599808. Failed!!!
[!] Preparing for 941465395155820545. Failed!!!
[!] Preparing for 942816621982298113. Failed!!!
[!] Preparing for 557206874601562112. Failed!!!
[!] Preparing for 563069755918409729. Failed!!!
[!] Preparing for 567723090231050241. Failed!!!
[!] Preparing for 565759518685290496. Failed!!!
```
```python
new_df = new_df.dropna(axis=0)  # drop rows whose tweet_text is NaN (failed lookups)
new_json = new_df.to_json(orient="records")
parsed = json.loads(new_json)
with open('sample_data.json', 'w') as fp:
    json.dump(obj=parsed, fp=fp, indent=4)
```
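The saved file can be loaded straight back into a DataFrame to check that the round trip is lossless. A sketch with a hypothetical one-row frame mirroring the columns of `new_df`:

```python
import json
import pandas as pd

# Hypothetical frame mimicking the structure of new_df
sample = pd.DataFrame({
    'tweet_id': ['971761970117357568'],
    'merger': ['CI_ESRX'],
    'stance': ['support'],
    'tweet_text': ['Cigna and ESI set to merge. Here we go...'],
})

# Write records-oriented JSON, then read it back
with open('sample_data.json', 'w') as fp:
    json.dump(json.loads(sample.to_json(orient='records')), fp, indent=4)

with open('sample_data.json') as fp:
    restored = pd.DataFrame(json.load(fp))

print(restored.equals(sample))
```

Keeping `tweet_id` as a string avoids the usual pitfall of 64-bit tweet IDs losing precision when parsed as floating-point numbers.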
## Result
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>tweet_id</th>
<th>merger</th>
<th>stance</th>
<th>tweet_text</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>971761970117357568</td>
<td>CI_ESRX</td>
<td>support</td>
<td>Cigna and ESI set to merge. Here we go...</td>
</tr>
<tr>
<th>1</th>
<td>950934259371520000</td>
<td>CI_ESRX</td>
<td>unrelated</td>
<td>Express Scripts Closes Acquisition Of eviCore;...</td>
</tr>
<tr>
<th>2</th>
<td>973718376496357377</td>
<td>CI_ESRX</td>
<td>comment</td>
<td>RT @_diginsurance: Cigna-Express Scripts deal ...</td>
</tr>
<tr>
<th>3</th>
<td>996772902006599680</td>
<td>CI_ESRX</td>
<td>support</td>
<td>Here's the just-released 400+ page merger prox...</td>
</tr>
<tr>
<th>4</th>
<td>971712098253320193</td>
<td>CI_ESRX</td>
<td>support</td>
<td>Cigna nears deal for Express Scripts https://t...</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>143</th>
<td>601243177329143808</td>
<td>ANTM_CI</td>
<td>unrelated</td>
<td>Lot of 5 Trace Magazines Foxy Brown Tyra Banks...</td>
</tr>
<tr>
<th>146</th>
<td>451009663832580096</td>
<td>ANTM_CI</td>
<td>unrelated</td>
<td>#ForThePerfectView Anthem AV signs MSR Acousti...</td>
</tr>
<tr>
<th>147</th>
<td>587752130888540160</td>
<td>ANTM_CI</td>
<td>unrelated</td>
<td>THE GASLIGHT ANTHEM monster poster will be up ...</td>
</tr>
<tr>
<th>148</th>
<td>592615405748760576</td>
<td>ANTM_CI</td>
<td>unrelated</td>
<td>. @AusNextTopModel takeover tomorrow morning a...</td>
</tr>
<tr>
<th>149</th>
<td>490629012751142913</td>
<td>ANTM_CI</td>
<td>unrelated</td>
<td>ANTM yet to Decide on the Acquisition Plan of ...</td>
</tr>
</tbody>
</table>
<p>133 rows × 4 columns</p>
</div>