# PubMed Scraping

## Project Goal

Scrape Europe PMC https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=SRC:MED&cursorMark=*&format=json&pageSize=25&sort=FIRST_IDATE_D+desc&resultType=core for author and article data.

**Customizing URLs:**

- Article Source: *[query=SRC:MED]* (refer to the source table below)
- Page Size: *[&pageSize=25]* (Max 1000 allowed)
- Next Page: *[&cursorMark=]* (Start with `*`, take the "nextCursorMark" value returned in each response and send it as "cursorMark" on the next request; repeat until the last page)
- Query Response Format: *[&format=json]* ( json | xml )

Response Format for Authors:

![](https://i.imgur.com/JDzwW2C.png)

These are the sources and their record counts:

| Source | Count |
| -------- | -------- |
| AGR | 762,487 |
| CBA | 142,377 |
| CTX | 3,699 |
| ETH | 53,091 |
| HIR | 2,868 |
| MED | 30,955,010 |
| PAT | 4,229,297 |
| PMC | 616,064 |
| PPR | 136,139 |

## Output

The output needs to be suited for building a relational database similar to "SciLeads".

1. Author module
    - **Fields**: AuthorID, FirstName, middleInitial(s), LastName, Email, Phone, externalID(ORCID), OrganizationID, DeptID
2. Organization module

    An organization is an entity that we can easily delimit and refer to. Typical organizations are: University, Company, Government agency, Non-profit organization, Research institute (not affiliated with a university). Note that a university is an organization, but a department or a lab within that university is not an "organization" per se. In PubMed, "Affiliation" corresponds to the concept of organization here.
    - **Fields**: OrganizationID, organizationName, address, city, state (Province), zip, country
3. Department module
    - **Fields**: DeptID, DeptName
4. Article module
    - **Fields**: ArticleID, authorList, articleTitle, abstractText, pubYear, keywords, source, sourceurl, doi, pmid, pmcid, journalID, pageNumbers
5. Journal module
    - **Fields**: JournalID, JournalTitle, Publisher, pISSN, eISSN
    - List of journals can be obtained from https://europepmc.org/journalList?format=csv
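
The cursor-based paging described under **Customizing URLs** could be sketched roughly as follows. This is a minimal illustration, not a finished scraper: the helper names (`build_search_url`, `fetch_json`, `iter_results`) are hypothetical, and it assumes the JSON response carries the records under `resultList.result` and the next cursor under `nextCursorMark`. Rate limiting, retries, and error handling are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(source="MED", cursor_mark="*", page_size=25):
    """Build one search URL from the parameters listed under Customizing URLs."""
    params = {
        "query": f"SRC:{source}",      # article source, see the source table
        "cursorMark": cursor_mark,     # "*" requests the first page
        "format": "json",
        "pageSize": page_size,         # max 1000 allowed
        "sort": "FIRST_IDATE_D desc",
        "resultType": "core",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_json(url):
    """Default fetcher: download one page and decode it (network call)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_results(source="MED", page_size=25, fetch=fetch_json, max_pages=None):
    """Yield result records page by page, following the cursor.

    Assumption: each response exposes `resultList.result` (list of records)
    and `nextCursorMark` (cursor to send with the next request).
    """
    cursor = "*"
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch(build_search_url(source, cursor, page_size))
        results = page.get("resultList", {}).get("result", [])
        yield from results
        pages += 1
        next_cursor = page.get("nextCursorMark")
        # Stop on an empty page, a missing cursor, or a cursor that no longer moves.
        if not results or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

For a quick trial, `for record in iter_results(page_size=25, max_pages=2): print(record.get("title"))` would request at most two pages. The `fetch` parameter is injectable so the paging logic can be exercised without touching the network.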