# Source of data for your research
## Primary data
Here you collect the data yourself from your participants. For this, you will need to describe how you will obtain the data from the participants, what information you will collect, and how you will approach them. You then seek ethical approval from the University (and other authorities as appropriate) and arrange to collect the data. Such data can be text, narratives or stories, video, audio, and so on. Your analytical tools will be text mining tools for qualitative research, and statistical tools such as R or SPSS for quantitative data analysis and modelling.
## Secondary data
Here you use data already collected by other researchers or by the government and given to you, for free or for a fee, so that you can use them for your own purposes: to test your hypotheses, or for descriptive work. If you decide to conduct a narrative review, a systematic review, or a meta-analysis, your sources of data are the original studies, with their detailed methods and results. You may also be able to obtain their original data, which is needed for individual-level meta-analyses. Otherwise, you can pool the results reported in those studies and conduct your own investigation.
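
To make "pooling results" concrete, here is a minimal sketch of one common approach, fixed-effect inverse-variance pooling, in which each study's estimate is weighted by the inverse of its squared standard error. The function name `pool_fixed_effect` and the three study values are hypothetical and only for illustration; a real meta-analysis would also need to consider heterogeneity and, often, a random-effects model.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Pool study estimates with inverse-variance (fixed-effect) weights.

    Returns the pooled estimate, its standard error, and an
    approximate 95% confidence interval.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("Need one standard error per estimate.")

    # Each study is weighted by 1 / SE^2, so precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Normal approximation for the 95% confidence interval.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical effect estimates (e.g. log odds ratios) from three studies.
estimates = [0.20, 0.35, 0.10]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se, ci = pool_fixed_effect(estimates, std_errors)
print(f"Pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"Approximate 95% CI: {ci[0]:.3f} to {ci[1]:.3f}")
```

In practice you would extract each estimate and its standard error from the published studies; dedicated meta-analysis packages in R or Python do this kind of calculation with many more safeguards.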
## Tertiary data
Tertiary data are similar to almanacs, dictionaries, thesauri, and other sources that are somewhat 'meta' in the sense that they compile other data, narratives, or lists. A good place to start looking is our own [University of Canterbury libguide pages](https://canterbury.libguides.com/startingresearch/data). I have listed some more sources below.
## List of available data sets
1. https://guides.lib.berkeley.edu/publichealth/healthstatistics/rawdata
2. https://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
3. https://libraryguides.missouri.edu/c.php?g=213300&p=1407295
4. https://www.google.com/publicdata/directory
5. https://www.health.govt.nz/nz-health-statistics/national-collections-and-surveys/surveys/new-zealand-health-survey
## Internet and social media as data sources for research
Beyond providing links to other databases, data sets, and repositories of raw data, the Internet and the texts written on it (including websites) can be important sources of data themselves. Such data are often, though not necessarily, analysed using text mining and qualitative data analysis tools. In one of the earliest papers on the topic, [Sandor Fekete (2002), a Hungarian psychologist, "mined"](https://www.researchgate.net/profile/Sandor_Fekete_dr/publication/233449600_The_Internet_-_A_New_Source_of_Data_on_Suicide_Depression_and_Anxiety_A_Preliminary_Study/links/546d15690cf2a7492c55b164/The-Internet-A-New-Source-of-Data-on-Suicide-Depression-and-Anxiety-A-Preliminary-Study.pdf) suicide and mental health newsgroups (which were popular back then, and you can still find such sources), and commented:
> Internet newsgroups advocating suicide can discourage individuals from seeking psychiatric help. Professional helpers need to know about Internet resources on suicide and to understand how suicide fatalities influence the behaviors of vulnerable people who express suicidal ideation in cyberspace ... The investigation of this new data set may provide a much broader perspective in understanding suicidal process. The greatest advantage is that it can make it possible to interact with these individuals on the Net in an indirect way, which is not threatening, and to reach people who do not contact the health care system in the traditional way. The author would like to draw the mental health professionals’ attention to this new phenomenon, and suggests that his finding presented in the ... study might be beneficial in mental healthcare.
As you can see, researchers can argue that the impact of such data goes well beyond what you would normally expect from observational studies alone.
A few years later, [Jones and Alony (2008)](https://ro.uow.edu.au/cgi/viewcontent.cgi?) wrote a position paper in which they stated that people blog for seven different reasons, as understood at the time. Each of these reasons could be a topic for your own research. They position blogs as a source of qualitative data and regard them as easy to access, information-rich, unbiased primary sources. They also identify some disadvantages, including 'impurity of data', by which they mean that many blogs are poorly written in terms of grammar and text, or contain deception and misinformation, so you need to be judicious. They stated that, as a researcher, you can do content and discourse analyses with blogs. They did not mention Twitter or other social media in their paper, as Twitter and Facebook were quite immature and not as content-rich as blogs back then, but the picture has changed since.
Two points to note here:
- You can analyse such data not only with qualitative data analysis software such as NVivo, but also with freely available text analysis packages for R such as [quanteda](https://quanteda.io) or [tidytext](https://juliasilge.github.io/tidytext/).
- Twitter, Facebook, and other social media tools are good sources of data to learn about the world. Social media analytics is a discipline in itself. To use Twitter and Facebook, for instance, as data sources, you will need to use API (application programming interface) data mining tools. [Wasim Ahmed wrote a blog post on the topic of using Twitter as a data source](http://eprints.lse.ac.uk/70829/1/blogs.lse.ac.uk-Challenges%20of%20using%20Twitter%20as%20a%20data%20source%20An%20overview%20of%20current%20resources.pdf); you may want to check it out.
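
To give a feel for what text analysis tools do with blog or social media text, here is a minimal sketch of a very common first step: splitting texts into words, dropping common "stop words", and counting what remains. It uses only Python's standard library rather than quanteda or tidytext (which are R packages), and the three short posts, the small stop word list, and the function names are all made up for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop word list; real tools ship
# with much longer, language-specific lists.
STOP_WORDS = {
    "a", "an", "and", "the", "to", "of", "in", "is", "it",
    "i", "my", "for", "on", "was", "with",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(documents, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in stop_words)
    return counts


# Made-up example posts standing in for collected blog or social media text.
posts = [
    "I started running again and my sleep is better.",
    "Sleep was poor this week, so running felt hard.",
    "Better sleep, better mood.",
]

freqs = term_frequencies(posts)
print(freqs.most_common(3))
```

Word counts like these are usually only a starting point; content or discourse analysis still depends on your reading and interpretation of the texts.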
[Gittelman et al. (2015)](https://www.jmir.org/2015/4/e98/pdf) wrote an influential paper on their study of the use of Facebook likes in predictive health modelling. They concluded:
> Whether this data ultimately comes from Facebook or not is of little importance. The online landscape may change and it may provide a different source of data that proves more viable in the future. So long as the source reflects people’s activities in daily life, the same relationships may hold. Even if Facebook does prove to endure as a social institution, however, there is still room for a great deal of improvement on the models presented here. With cooperation from the social media outlets themselves, we may be able to obtain better estimates in categories that align better with our needs. In the end, our data may not suffer because of the rising costs of research. Instead, exploring newly opened avenues of data collection online could lead to more reliable, timely, and cost-effective county-level data than that obtainable from traditional public health surveillance systems as well as serve as an adjunct to those systems.
## Questions for the class (write your thoughts next to the questions)
- What are the differences between primary and secondary research?
- What are some common ways to obtain data from participants, online and offline?
- What is Qualtrics, and how can you use it to obtain data?
- What are some of the web-based sources of data for different types of research?
When in doubt about searching, chat with Margaret Paterson:
margaret.paterson@canterbury.ac.nz