# PubMed Scraping
## Project Goal
Scrape Europe PMC (Europe PubMed Central) for author and article data through its REST search endpoint, starting from this URL:

https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=SRC:MED&cursorMark=*&format=json&pageSize=25&sort=FIRST_IDATE_D+desc&resultType=core

**Customizing URLs:**

- Article source: *[query=SRC:MED]* (refer to the source table below)
- Page size: *[&pageSize=25]* (number of results per page; max 1000 allowed)
- Next page: *[&cursorMark=]* (start with `*`, then take the "nextCursorMark" value from each response and send it as "cursorMark" on the next request; repeat until the last page)
- Query response format: *[&format=json]* ( json | xml )

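
The URL parameters and cursor-based paging described above can be sketched in Python as follows. This is a minimal sketch, not a definitive scraper: the helper names (`build_url`, `fetch_json`, `iter_results`) are hypothetical, and the response keys `resultList`, `result` and `nextCursorMark` are assumptions about the JSON layout that should be checked against a real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_url(source="MED", cursor_mark="*", page_size=25, fmt="json"):
    """Build a search URL from the customizable parameters."""
    if not 1 <= page_size <= 1000:
        raise ValueError("pageSize must be between 1 and 1000")
    params = {
        "query": f"SRC:{source}",
        "cursorMark": cursor_mark,
        "format": fmt,
        "pageSize": page_size,
        "sort": "FIRST_IDATE_D desc",  # urlencode turns the space into '+'
        "resultType": "core",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_json(url):
    """Download one page and decode it as JSON."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_results(source="MED", page_size=25, fetch=fetch_json, max_pages=None):
    """Yield result records page by page, following the cursor.

    Assumes each page looks like
    {"nextCursorMark": "...", "resultList": {"result": [...]}}.
    """
    cursor = "*"
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch(build_url(source, cursor, page_size))
        results = page.get("resultList", {}).get("result", [])
        yield from results
        pages += 1
        next_cursor = page.get("nextCursorMark")
        # Stop on an empty page or when the cursor no longer advances.
        if not results or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

Passing `fetch` as a parameter keeps the paging logic testable without network access. For a quick trial, `iter_results(max_pages=1)` limits the run to a single request.
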
Response Format for Authors:

The available sources and their record counts are:

| Source | Count |
| -------- | -------- |
| AGR | 762,487 |
| CBA | 142,377 |
| CTX | 3,699 |
| ETH | 53,091 |
| HIR | 2,868 |
| MED | 30,955,010 |
| PAT | 4,229,297 |
| PMC | 616,064 |
| PPR | 136,139 |
## Output
The output must be suitable for building a relational database similar to "SciLeads".
1. Author module
- **Fields**: AuthorID, FirstName, middleInitial(s), LastName, Email, Phone, externalID(ORCID), OrganizationID, DeptID
2. Organization module
An organization is an entity that we can easily delimit and refer to. The following entities are typical organizations: university, company, government agency, non-profit organization, research institute (not affiliated with a university). Note that a university is an organization, but a department or a lab within that university is not an "organization" per se. In PubMed, "Affiliation" corresponds to the concept of organization here.
- **Fields**: OrganizationID, organizationName, address, city, state (Province), zip, country
3. Department module
- **Fields**: DeptID, DeptName
4. Article module
- **Fields**: ArticleID, authorList, articleTitle, abstractText, pubYear, keywords, source, sourceurl, doi, pmid, pmcid, journalID, pageNumbers
5. Journal module
- **Fields**: JournalID, JournalTitle, Publisher, pISSN, eISSN
- List of journals can be obtained from https://europepmc.org/journalList?format=csv
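
The five modules above could map onto relational tables roughly as follows. This is a minimal sketch using SQLite, under several assumptions: the column types, the choice of integer primary keys, the foreign-key references, the column name `middleInitials` (for "middleInitial(s)") and storing `authorList` as plain text are illustrative choices, not requirements from this brief. A separate article-author link table may be a better fit for `authorList` in a real schema.

```python
import sqlite3

# Table and column names follow the field lists above; types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Organization (
    OrganizationID   INTEGER PRIMARY KEY,
    organizationName TEXT NOT NULL,
    address TEXT, city TEXT, state TEXT, zip TEXT, country TEXT
);
CREATE TABLE IF NOT EXISTS Department (
    DeptID   INTEGER PRIMARY KEY,
    DeptName TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS Author (
    AuthorID       INTEGER PRIMARY KEY,
    FirstName      TEXT,
    middleInitials TEXT,
    LastName       TEXT,
    Email          TEXT,
    Phone          TEXT,
    externalID     TEXT,  -- ORCID
    OrganizationID INTEGER REFERENCES Organization(OrganizationID),
    DeptID         INTEGER REFERENCES Department(DeptID)
);
CREATE TABLE IF NOT EXISTS Journal (
    JournalID    INTEGER PRIMARY KEY,
    JournalTitle TEXT,
    Publisher    TEXT,
    pISSN        TEXT,
    eISSN        TEXT
);
CREATE TABLE IF NOT EXISTS Article (
    ArticleID    INTEGER PRIMARY KEY,
    authorList   TEXT,  -- assumed: serialized list of AuthorIDs
    articleTitle TEXT,
    abstractText TEXT,
    pubYear      INTEGER,
    keywords     TEXT,
    source       TEXT,
    sourceurl    TEXT,
    doi          TEXT,
    pmid         TEXT,
    pmcid        TEXT,
    journalID    INTEGER REFERENCES Journal(JournalID),
    pageNumbers  TEXT
);
"""


def create_schema(conn):
    """Create all five module tables on an open SQLite connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    with sqlite3.connect("pubmed.db") as conn:  # hypothetical file name
        create_schema(conn)
```
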