---
tags: reviews
---
# Review variables
All API documentation can be found here: https://developer.bog.nu/datatypes/index.html
The columns listed below are based on Telma's current CSV dataset.
* id
* unique for each row
* created_by
* an integer, with values [2, 880, 906, 960, 200, 197, 5, 424, 544]
* date
* when the review was published
* see under referenced titles for title publication dates
* headline
* review headline
* quote
* quote from the review (not the book)
* grade
* original grade
* if missing, it is set to 0-100
* grade normalized
* grade on the 0-100 scale
* url
* url to the review
* referenced_title_ids
* array<int>
* the title ID the review is about
* **referenced_titles** (dict)
* a *dictionary* about the title metadata
* see https://developer.bog.nu/datatypes/title.html
* potentially important:
* language: language_primary, language_translated_from, is_translated
* date and year: year_published_first, year_published_latest, date_published_first, date_published_latest
* author entities (again a dictionary)
* format: e.g. ebog, hardback
* keywords: e.g. historisk roman, erindringer, krimi, politik...
* reviewer_id:
* id of the reviewer, numeric
* 2321 unique values (89442 entries)
* reviewer (dict!):
* a dictionary: id, first_name, last_name, gender, media_id (primary media with which the reviewer is associated)
* *to be expanded into several columns* (with predicted gender for missing gender values)
* work-in-progress
* underheading (optional): subheading of the review
* grading_max (of the original grade scale)
* grading_min
* grading_precision
* Precision, i.e. the increment between each grade. If the precision is 1, grades can be 0, 1, 2, 3 etc., while a precision of 0.5 means that grades can be 0, 0.5, 1, 1.5 etc.
* grading_symbol
* media_importance:
* has values: 100, 200, 300, nan
* media_initials:
* 179 unique ids
* e.g. BE, D, GF
* media_name:
* 304 unique names
* e.g. Bech's Books, Din boganmelder, Gaffa (for the respective id above)
* so probably overlaps with initials?
* media_type_id:
* values: 1, 2, 3, 4, 5, 6, 9, nan (7 and 8 missing?)
* media_type_name:
* name of the media type
* unique types in the data: Blog, Onlinemedie, Ugeblad, Avis, Fagblad, Onlinemedie uden citat, Regional avis, nan
* media_url:
* url link to the media site
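
The `reviewer` dictionary described above is meant to be expanded into several columns. A minimal sketch of that step, assuming each row is a plain Python dict and that the flat columns get a `rev_` prefix (the helper name `flatten_reviewer` and the sample values are made up for illustration; gender prediction is not covered):

```python
def flatten_reviewer(row, prefix="rev_"):
    """Return a copy of `row` with the nested reviewer dict expanded into flat columns."""
    # Fields listed for the reviewer dictionary in the column overview
    fields = ("id", "first_name", "last_name", "gender", "media_id")
    reviewer = row.get("reviewer") or {}  # tolerate a missing or empty reviewer
    flat = {key: value for key, value in row.items() if key != "reviewer"}
    for field in fields:
        flat[prefix + field] = reviewer.get(field)  # None when the field is absent
    return flat


# Made-up example row, not taken from the dataset
row = {
    "id": 1,
    "reviewer_id": 42,
    "reviewer": {"id": 42, "first_name": "Anna", "last_name": "Jensen",
                 "gender": None, "media_id": 7},
}
flat_row = flatten_reviewer(row)
```

Missing values stay `None`, so a later step can fill in predicted gender without touching the rest of the row.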
### added now
* title_id
* NB! there is only one per row, and some of them are not included in the list of referenced_title_ids
* probably some works have several title ids, as the reviewer information still matches
* title_title: the name of the book
* title_category_id: the id of the book category
* all categories: Skønlitteratur, Faglitteratur, Biografier og erindringer, Børnebog, Ungdomsbog, Biografi, nan
* title_keywords:
* a list of all keywords associated with the title
* title_language_primary
* title_language_translated_from
* is_translated
* **To be investigated**: sometimes both language_primary and translated_from are the same, but is_translated is still True
* rev_id, rev_first_name, rev_last_name, rev_gender, rev_id
* adding reviewer info
* clean_grades_scale6:
* some review grades are recorded on the 1-100 scale even though the value (e.g. 83) indicates that the original grade was 5/6
* reverse-engineered so that 83 becomes 5/6
* clean_grading_max_scale6:
* If grade cleaned in clean_grades_scale6, max grade is updated to 6
* grades_transformed_6:
* Linear transformation to the 6-point scale, performed on all grades that are not already on the 6-point scale
* clean_grades_scale5:
* some review grades are recorded on the 1-100 scale even though the value (e.g. 80) indicates that the original grade was 4/5
* reverse-engineered so that 80 becomes 4/5
* clean_grading_max_scale5
* If grade cleaned in clean_grades_scale5, max grade is updated to 5
* grades_transformed_5
* Linear transformation to the 5-point scale, performed on all grades that are not already on the 5-point scale
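
The grade cleaning and the linear transformation described above could look roughly like this. It is a sketch under assumptions, not the code that produced the columns: the function names are hypothetical, the rounding tolerance of 0.5 is a guess, and the transformation is assumed to scale by the ratio to the maximum grade (consistent with 5/6 showing up as 83), ignoring `grading_min`.

```python
def clean_grade_scale(grade, grading_max, target_max):
    """If a grade on the 100-point scale looks like k/target_max, return (k, target_max).

    Otherwise the grade and its maximum are returned unchanged.
    """
    if grading_max != 100:
        return grade, grading_max
    for k in range(1, target_max + 1):
        # e.g. 5/6 -> 83.33, stored as 83; 0.5 is an assumed rounding tolerance
        if abs(grade - 100 * k / target_max) < 0.5:
            return k, target_max
    return grade, grading_max


def to_scale(grade, grading_max, target_max):
    """Linearly rescale a grade to a scale whose maximum is target_max (assumed formula)."""
    return grade * target_max / grading_max


cleaned_6 = clean_grade_scale(83, 100, 6)   # 83 is read as 5/6
cleaned_5 = clean_grade_scale(80, 100, 5)   # 80 is read as 4/5
untouched = clean_grade_scale(80, 100, 6)   # 80 is not a k/6 value, left as is
rescaled = to_scale(4, 5, 6)                # a 4/5 grade expressed on the 6-point scale
```

Note that some values are ambiguous: 100 is both 6/6 and 5/5, so the order in which the two cleaning steps are applied matters.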
### author data
Currently, all authors are listed in a json dictionary, authors.json
* key: title_id
* value: list of dictionaries (one dictionary per author)
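
A small sketch of how this structure can be read, assuming only the layout described above (title_id as key, a list of author dictionaries as value). The inline JSON string stands in for authors.json, and the `name` field is a hypothetical placeholder, since the fields of each author dictionary are not listed here.

```python
import json

# Stand-in for the contents of authors.json; ids and the "name" field are made up
authors_json = """
{
  "101": [{"name": "Author A"}, {"name": "Author B"}],
  "102": [{"name": "Author C"}]
}
"""

authors_by_title = json.loads(authors_json)

# JSON object keys are always strings, so title ids are cast back to int
authors_per_title = {
    int(title_id): len(authors) for title_id, authors in authors_by_title.items()
}

# Titles with more than one author
multi_author_titles = [t for t, n in authors_per_title.items() if n > 1]
```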
# Fiction overview
* total number of reviews: 65474
* total number of titles: 7671
* total number of authors: 17359

| Multiple titles | Multiple authors | nr |
| -------- | -------- | -------- |
| 0 | 0 | 60568 |
| 0 | 1 | 3975 |
| 1 | 0 | 782 |
| 1 | 1 | 146 |
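
One way a table like this could be computed is sketched below. The definitions are assumptions, since the notes do not say how the flags were derived: "multiple titles" is taken to mean that `referenced_title_ids` has more than one entry, and "multiple authors" that at least one referenced title has more than one author. The function name and the sample data are made up.

```python
from collections import Counter


def crosstab_flags(reviews, authors_by_title):
    """Count reviews per (multiple_titles, multiple_authors) combination."""
    counts = Counter()
    for review in reviews:
        title_ids = review["referenced_title_ids"]
        multiple_titles = int(len(title_ids) > 1)
        # Assumed definition: any referenced title with more than one author
        multiple_authors = int(
            any(len(authors_by_title.get(t, [])) > 1 for t in title_ids)
        )
        counts[(multiple_titles, multiple_authors)] += 1
    return counts


# Made-up data: title id -> list of authors
sample_authors = {1: ["A"], 2: ["B", "C"], 3: ["D"]}
sample_reviews = [
    {"referenced_title_ids": [1]},
    {"referenced_title_ids": [2]},
    {"referenced_title_ids": [1, 3]},
    {"referenced_title_ids": [1, 2]},
    {"referenced_title_ids": [3]},
]
flag_counts = crosstab_flags(sample_reviews, sample_authors)
```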
**Type and media importance**


## Translations
There are three columns related to language:
* "title_translated_from"
* "language_primary"
* "title_is_translated"
* is translated: 63523 / 65474
* is not translated: 1951 / 65474
* this does not match the table below (?)

| Original language | Book read in | nr |
| -------- | -------- | -------- |
| other | da | 35040 |
| other | other | 89 |
| da | da | 30328 |
| da | other | 17 |
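
To chase the mismatch between the translated flag and the table, the two can be computed from the same rows and compared. The sketch below assumes the three column names listed above, assumes that the original language is `title_translated_from` when present and `language_primary` otherwise, and buckets every language other than `da` as "other". The helper names and sample rows are made up.

```python
from collections import Counter


def bucket(lang):
    """Collapse languages into 'da' versus 'other'."""
    return "da" if lang == "da" else "other"


def original_language(row):
    # Assumption: fall back to the primary language when nothing is recorded
    return row.get("title_translated_from") or row["language_primary"]


def language_crosstab(rows):
    """Count rows per (original language, language read in) bucket."""
    counts = Counter()
    for row in rows:
        counts[(bucket(original_language(row)), bucket(row["language_primary"]))] += 1
    return counts


def inconsistent_translation_flags(rows):
    """Rows where title_is_translated disagrees with the two language columns."""
    return [
        row for row in rows
        if bool(row["title_is_translated"])
        != (original_language(row) != row["language_primary"])
    ]


# Made-up rows; the third reproduces the "same language but flagged as translated" case
sample_rows = [
    {"language_primary": "da", "title_translated_from": "en", "title_is_translated": True},
    {"language_primary": "da", "title_translated_from": None, "title_is_translated": False},
    {"language_primary": "da", "title_translated_from": "da", "title_is_translated": True},
    {"language_primary": "en", "title_translated_from": None, "title_is_translated": False},
]
lang_counts = language_crosstab(sample_rows)
suspicious = inconsistent_translation_flags(sample_rows)
```

Counting the length of `suspicious` on the real data would show how much of the gap between the flag and the table comes from rows like the third one.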
# Media overview
### What is online media? What is 'online media without quote'?
**Name of 'Online Media'**:
['Din boganmelder' 'Litteratursiden.dk' 'Bogvægten.dk'
'historie-online.dk' 'berlingske.dk' 'Bogblogger.dk' 'Kulturkapellet'
'Littuna.nu' 'Bogrummet.dk' 'Serieland' 'pov.international' 'ATLAS'
'jyllands-posten.dk' 'bogbotten.dk' 'De unges ord' 'Altinget'
'Babelfisken' 'Kulturen.nu' 'Nummer9' 'Den smalle bog' 'Kontrast'
'#Bogsnak' 'Kulturinformation' 'Modspor' 'Qland' 'Fiktion & Kultur'
'Kulturmagasinet Fine Spind' 'akademikerbladet.dk' 'Himmelskibet'
'naturguide.dk' 'politiken.dk' 'Vores bogliv' 'kristeligt-dagblad.dk'
'dagens.dk' 'nordjyske.dk' 'Planet Pulp' 'kommunen.dk' 'Netavisen Pio'
'InsideBusiness' 'LitteraturNu' 'krigeren.dk'
'Dansk Psykologisk Forening' 'Pædagogen.dk' 'information.dk'
'gastromand.dk' 'gaffa.dk' 'Om kunsten om kunsten og kunsten']
Hmm... But some newspapers' websites are listed here. Why are these not classified as newspapers?
* information.dk
* nordjyske.dk
* politiken.dk
* jyllands-posten.dk
* berlingske.dk
* kristeligt-dagblad.dk
**^THIS IS FIXED NOW!**
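
For reference, a check like the following would catch such cases automatically. It is only a sketch and not necessarily how the fix was made: it assumes that a newspaper website is named after the paper, with a `.dk` suffix and hyphens instead of spaces, and the function names are hypothetical.

```python
def normalise(name):
    """Lowercase, drop a trailing '.dk' and treat hyphens as spaces."""
    name = name.lower()
    if name.endswith(".dk"):
        name = name[:-3]
    return name.replace("-", " ")


def newspaper_sites(online_names, newspaper_names):
    """Online media whose normalised name matches a (regional) newspaper."""
    papers = {normalise(n) for n in newspaper_names}
    return [n for n in online_names if normalise(n) in papers]


# A few names taken from the lists in this document
online = ["politiken.dk", "gaffa.dk", "kristeligt-dagblad.dk", "Kulturkapellet",
          "jyllands-posten.dk", "nordjyske.dk"]
newspapers = ["Politiken", "Kristeligt Dagblad", "Jyllands-Posten", "Nordjyske"]

matches = newspaper_sites(online, newspapers)
```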
**Name of 'Online Media without quote'**:
['Bookeater' 'RomeoReviews' 'SilentGirl' 'Bachs bøger' 'Litteraturlinjer'
'FrobbieStories' 'Boginspiration' 'Den lille bogblog'
'Læsehest med fantasy - Bookeater' 'Reading raindrops' 'Frk. Litteratur'
'Emmas bogreol' 'The Secret Life of a Book Collector' 'Bøger på stribe'
'Fullybooked' 'The Small Wonders of Life' 'Moonlit Madness'
'Fed Lyrik & Smal Prosa' 'Sabrinas Blog' 'Cilles læsesal' 'Bookscape'
'Pipalukbooks' 'Isas Boghjørne' 'Literary Linings' 'Hyggemoster' 'Bogliv'
'AndrupsBookshelf' 'Lunas kaffekrog' 'Bogligt.dk' 'Readers Wall'
'Mit bogunivers' 'Bibliotekattens Bøger' 'Ajnat.dk' 'Litteraturlistig'
'Geek Culture' 'LibricSeculum' 'Trilogien - bøger & tech'
'Arktiske Anmeldelser' 'Carrotstick.dk' 'Mit superego'
'My Sparkling Thoughts' 'Ethniqa Magazine' "MissRaaskou's Bogblog"]
### Which newspapers do we have?
**Name of 'Newspaper'**:
['Jyllands-Posten' 'Berlingske' 'Politiken' 'Kristeligt Dagblad'
'Weekendavisen' 'Børsen' 'Information' 'The Financial Times'
'Ekstra Bladet' 'The Nation' 'The Sunday Times' 'B.T.' '24timer'
'JydskeVestkysten' 'Arbejderen' 'Vejle Amts Folkeblad' 'MetroXpress']
17 different newspapers
**Name of 'Regional Newspaper'**:
['Fyens Stiftstidende' 'Nordjyske' 'Horsens Posten' 'Flensborg Avis'
'Midtjyllands Avis' 'minby.dk' 'Fyns Amts Avis' 'JP Aarhus'
'Helsingør Dagblad' 'Københavneravisen']
10 different newspapers
### What is a blog?
There are 165 different blogs. I (IM) have tried to visit some of them, and they all describe themselves as blogs. Quite explicit!
However, there are also blogs in the category 'Online media without quote', such as Sabrinas Blog, Den lille bogblog and Bibliotekattens Bøger.
#Bogsnak and Vores bogliv, which are listed as online media, also describe themselves as blogs.