# [[CAT-2662](https://sourceability.atlassian.net/browse/CAT-2662)] Find out expected / required return values for any scraper
The data types given here are the ones we usually assign to each field in the scraper. The scraper itself does not enforce any data types.
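The sketch below gathers the data types stated in the field descriptions on this page into Python `TypedDict`s, which can serve as an overview. It is not an existing module in the scraper codebase. Fields whose type is not documented yet are typed as `Any`. `total=False` reflects that the scraper does not enforce any of these keys. Typing `priceTiers` as a list of `OfferPriceTier` is an assumption based on the names.

```python
from typing import Any, Dict, List, Optional, TypedDict


class OfferPriceTier(TypedDict, total=False):
    price: Any  # type not documented yet
    mpq: Any    # type not documented yet


class ProductOffer(TypedDict, total=False):
    dateCode: str   # display only
    quantity: str   # integer-like string; required to create an offer
    availability: Any
    leadTime: Any
    packagingCondition: Any
    mpq: Any
    packagingType: Any
    rohs: Any
    eccn: Any
    dateCodeWithin2years: Any
    region: Any
    country: Any
    location: Any
    countryOfOrigin: Any
    vendor: Any
    vendorType: Any
    priceTiers: List[OfferPriceTier]  # assumption: a list of tiers
    delivery: Any


class ScraperInfo(TypedDict):
    name: str            # spider attribute `name`
    type: Optional[str]  # spider attribute `scraper_type`: "aggregator" or "source"


class Product(TypedDict, total=False):
    identifier: Any      # PIM request ID, attached by IdentifierPassthroughMiddleware
    scraped_url: str
    scraped_at: Any      # current UTC time, set in loaders.py
    scraper: ScraperInfo
    manufacturer: str    # raw value from the page
    mpn: str             # raw value from the page
    image_urls: List[str]
    datasheet_urls: List[str]
    offers: List[ProductOffer]
    specifications: Dict[str, str]
    category: List[str]  # least specific to most specific
    description: str
    manufacturerLeadTime: Any  # must be understood by leadTimeParser in PIM
    lifecycleStatus: str
```

At runtime a `TypedDict` is a plain `dict`, so these classes only help static checkers and readers. They do not validate scraper output.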
## All fields
### Product
#### identifier
This field is used to link the scraper results back to a scraper request from PIM. It was introduced with [this ticket](https://sourceability.atlassian.net/browse/PIM-318). PIM sends a request ID with each scraper request, and the scraper attaches this to each result using `IdentifierPassthroughMiddleware` in `identifier.py`. In this way, PIM can track how many results are returned for each request, how long each request takes, etc.
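The passthrough idea can be sketched as a Scrapy spider middleware, using the standard `process_spider_output(response, result, spider)` hook. This is a minimal illustration, not the code in `identifier.py`. The class name is hypothetical. The meta key `"identifier"` that carries PIM's request ID is also an assumption.

```python
from collections.abc import MutableMapping
from typing import Any, Iterable, Iterator


class IdentifierPassthroughSketch:
    """Hypothetical stand-in for IdentifierPassthroughMiddleware."""

    META_KEY = "identifier"    # assumption: the real meta key may differ
    ITEM_FIELD = "identifier"  # field on the scraped Product

    def process_spider_output(
        self, response: Any, result: Iterable[Any], spider: Any
    ) -> Iterator[Any]:
        # Scrapy exposes the originating request's meta as response.meta
        request_id = response.meta.get(self.META_KEY)
        for output in result:
            # Only item-like outputs get the ID; anything else (e.g. follow-up
            # requests) is passed through unchanged in this sketch
            if request_id is not None and isinstance(output, MutableMapping):
                output.setdefault(self.ITEM_FIELD, request_id)
            yield output
```

A real implementation would also need to carry the ID on follow-up requests, for example from a search page to a PDP. Otherwise items scraped from those later responses would arrive without it.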
This field is required to update the `lastScrapeCompletedAt` field of `ProductScrapeSchedule` on the PIM side. Without it, offers and part information can still be created.
#### scraped_url
The URL that was scraped. It has several uses on the PIM side. First, it is shown on the Offer page in Catalog so that the offer's details can be checked against the source page. Additionally, it can be used to scrape the PDP (product detail page) again without searching for the part first. This second use is implemented only for bulk scraping, not for on-demand scraping.
For a couple of scrapers, this URL is of little use on the PIM side. For example, the WPG and CDM Electronics scrapers both hit an API endpoint that requires special settings and custom request building. As a result, their `scraped_url` cannot be opened from the Offer page in Catalog, and it cannot be used directly as a URL for scraping without those settings.
#### scraped_at
This field is a timestamp recording when the scrape occurred. It is set in `loaders.py` to the current UTC time, and it is used as `sourcedAt` when the `RawOfferScraped` objects are created in PIM.
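Here is a minimal sketch of producing a timezone-aware current UTC timestamp. The helper name and the ISO 8601 serialization are assumptions. This page does not say how `loaders.py` formats the value.

```python
from datetime import datetime, timezone


def current_scraped_at() -> str:
    """Hypothetical helper: current UTC time as an ISO 8601 string."""
    # datetime.now(timezone.utc) returns an aware datetime; the older
    # datetime.utcnow() returns a naive one and is deprecated in Python 3.12
    return datetime.now(timezone.utc).isoformat()
```

Keeping the offset in the string (e.g. `+00:00`) lets the PIM side parse `sourcedAt` without having to assume a timezone.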
#### scraper
This field contains two keys, `name` and `type`. `name` is the name of the scraper as set in the spider variable `name`. This name should match two things. It should match a [part information source name](https://catalog.sourcengine.com/sources) in PIM, so that part information can be created from the scraper result. It should also match an entry in `on_demand_offer_vendor_source_enum`, so that offers can be created from it. If these names do not match, we cannot create offers or part information from the result.
`type` is the type of scraper (`aggregator` or `source`) as defined in the spider variable `scraper_type`. If `type` is `NULL`, the raw offer will be filtered out in `OfferScrapedFactory->createFromRawOffer`. In `NonPositiveQuantityScrapedOfferPipeline` in the `NewOfferPipeline`, offers with 0 quantity from aggregators are filtered out. After the offers are saved, a source `type` of `aggregator` means that the offer will not be returned on our offer API endpoints because of the `FilterScrapedAggregatorOfferPipeline`.
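The PIM-side rules above decide what happens to an offer depending on `type` and quantity. The Python approximation below walks through those rules. It is not PIM code. The function names and outcome labels are hypothetical. Treating any quantity `<= 0` as filtered is inferred from the name `NonPositiveQuantityScrapedOfferPipeline`.

```python
from typing import Optional, TypedDict


class ScraperInfo(TypedDict):
    name: str            # spider attribute `name`; must match PIM source naming
    type: Optional[str]  # spider attribute `scraper_type`: "aggregator" or "source"


def build_scraper_field(spider_name: str, scraper_type: Optional[str]) -> ScraperInfo:
    return {"name": spider_name, "type": scraper_type}


def pim_outcome(scraper: ScraperInfo, quantity: int) -> str:
    """Approximate outcome of one offer under the rules described above."""
    if scraper["type"] is None:
        # Filtered out in OfferScrapedFactory->createFromRawOffer
        return "dropped"
    if scraper["type"] == "aggregator":
        if quantity <= 0:
            # NonPositiveQuantityScrapedOfferPipeline in the NewOfferPipeline
            return "dropped"
        # Saved, but FilterScrapedAggregatorOfferPipeline keeps it off the offer API
        return "saved_hidden"
    return "saved"
```

This covers only the rules listed in this section. Other pipelines, such as product matching, can still drop an offer that this sketch reports as saved.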
#### manufacturer
This is the raw name of the part manufacturer as found on the scraped page. For on-demand offers, if the manufacturer does not match the scraper request's manufacturer, the result is filtered out in `FilteredVendorSearchClient`. This is also checked later in `ScrapedOfferProductMatchingPipeline`. For all scraped offers, the manufacturer is used to match the offer to a product in the `NewOfferPipeline`.
#### mpn
This is the raw name of the manufacturer part number as found on the scraped page. For on-demand offers, if the mpn does not match the scraper request's mpn, the result is filtered out in `FilteredVendorSearchClient`. This is also checked later in `ScrapedOfferProductMatchingPipeline`. For all scraped offers, the mpn is used to match the offer to a product in the `NewOfferPipeline`.
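Both `manufacturer` and `mpn` must line up with the request for on-demand offers. The sketch below shows that kind of check, with a simple normalization. The exact comparison in `FilteredVendorSearchClient` is not documented here. Ignoring case and whitespace, and the helper names, are assumptions.

```python
from typing import Mapping, Optional


def _normalize(value: Optional[str]) -> str:
    # Assumption: case and whitespace are not significant for the comparison
    return "".join((value or "").split()).upper()


def matches_request(result: Mapping[str, Optional[str]], request: Mapping[str, str]) -> bool:
    """Hypothetical approximation of the on-demand manufacturer/mpn filter."""
    return (
        _normalize(result.get("manufacturer")) == _normalize(request["manufacturer"])
        and _normalize(result.get("mpn")) == _normalize(request["mpn"])
    )
```

Under this sketch, an abbreviation like `TI` does not match `Texas Instruments`. Any alias handling would have to happen on the PIM side, which is another reason to return the raw values exactly as found on the page.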
#### image_urls
This is a list of string URLs to images of the part. We want to filter out blank/default images on the scraper side. It is later saved in a `PartInformation` for the result's product and source.
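The sketch below shows one way to filter out blank or default images on the scraper side. The placeholder markers are hypothetical. Each source has its own "no image" asset, so the markers would need to be set per scraper.

```python
from typing import Iterable, List

# Hypothetical markers; replace with the source's actual placeholder asset names
PLACEHOLDER_MARKERS = ("no-image", "noimage", "no_image", "placeholder")


def filter_image_urls(urls: Iterable[str]) -> List[str]:
    """Drop empty, placeholder and duplicate image URLs, keeping the original order."""
    seen = set()
    kept = []
    for url in urls:
        url = (url or "").strip()
        if not url or url in seen:
            continue
        if any(marker in url.lower() for marker in PLACEHOLDER_MARKERS):
            continue
        seen.add(url)
        kept.append(url)
    return kept
```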
#### datasheet_urls
This is a list of string URLs to PDF datasheets for the part. It is later saved in a `PartInformation` for the result's product and source.
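Datasheet links on a page are often relative. The sketch below resolves them against the page URL with the standard library's `urljoin` and keeps only links whose path ends in `.pdf`. The function name is hypothetical. Relying on the `.pdf` suffix is an assumption: some sources serve PDFs from URLs without it, and this filter would miss those.

```python
from typing import Iterable, List
from urllib.parse import urljoin, urlparse


def datasheet_urls_from_links(page_url: str, hrefs: Iterable[str]) -> List[str]:
    """Resolve relative links and keep unique ones whose path ends in .pdf."""
    result: List[str] = []
    for href in hrefs:
        if not href:
            continue
        absolute = urljoin(page_url, href.strip())
        # Check the path only, so query strings like ?v=2 do not hide the suffix
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in result:
            result.append(absolute)
    return result
```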
#### offers
This is a list of offers, each matching the `ProductOffer` format described below.
#### specifications
Electrical, mechanical, and compliance details about the part, as a dictionary with string keys and string values. Any distributor- or packaging-specific details should be removed. This field should only contain characteristics of a single part. It is later saved in a `PartInformation` for the result's product and source.
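Here is a minimal sketch of producing the string-to-string dictionary and dropping distributor or packaging keys. The excluded key names are hypothetical examples. Each source labels these differently.

```python
from typing import Dict, Mapping

# Hypothetical keys that describe the listing, not the part itself
NON_PART_KEYS = {"packaging", "minimum order quantity", "stock", "distributor sku"}


def clean_specifications(raw: Mapping[str, object]) -> Dict[str, str]:
    """Return a str -> str dict of part characteristics only."""
    specs: Dict[str, str] = {}
    for key, value in raw.items():
        key_str = str(key).strip()
        if key_str.lower() in NON_PART_KEYS:
            continue  # distributor/packaging detail, not a part characteristic
        if value is None:
            continue
        value_str = str(value).strip()
        if value_str:
            specs[key_str] = value_str  # values are stringified, e.g. 2 -> "2"
    return specs
```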
#### category
A list of string category specifiers from least specific to most specific. It is later saved in a `PartInformation` for the result's product and source.
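Many pages expose the category as a breadcrumb, which is already ordered from least to most specific. The sketch below turns such a breadcrumb into the category list. It assumes the breadcrumb is root-first. It also assumes generic entries like "Home" should be dropped. Both the set of generic labels and the function name are hypothetical.

```python
from typing import Iterable, List

# Hypothetical navigation labels that are not real categories
GENERIC_CRUMBS = {"home", "products", "all products"}


def category_from_breadcrumb(crumbs: Iterable[str]) -> List[str]:
    """Keep the breadcrumb order (least to most specific), minus generic entries."""
    categories: List[str] = []
    for crumb in crumbs:
        label = (crumb or "").strip()
        if label and label.lower() not in GENERIC_CRUMBS:
            categories.append(label)
    return categories
```

If a source's breadcrumb ends with the part itself, that last entry would also need to be dropped.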
#### description
The description field, if available from the source, as a string. It is later saved in a `PartInformation` for the result's product and source.
#### manufacturerLeadTime
The lead time from the manufacturer to the distributor, if available. This is not the lead time from the distributor to Sourceability. It needs to be in a format that the `leadTimeParser` in PIM can understand. It is later saved in a `PartInformation` for the result's product and source.
#### lifecycleStatus
The lifecycle status of the part, such as `Active` or `EOL`, as a string. It is parsed into the `PartStatusEnum` in the `SetPartStatusPipe`. By default, it is parsed by the `SourceabilityPartStatus` mapper, but that mapping is quite rough. A better approach is to define a custom mapper for each new source that covers all of its cases. See `DigikeyPartStatus` for an example for the `digikey` scraper.
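The idea behind a per-source mapper is an explicit entry for every raw status string the source emits. The Python sketch below illustrates it. PIM's actual mappers, such as `DigikeyPartStatus`, live in PIM's own codebase. The enum members and raw strings here are hypothetical and may not match `PartStatusEnum`.

```python
from enum import Enum
from typing import Dict, Optional


class PartStatus(Enum):
    # Hypothetical members; the real PartStatusEnum in PIM may differ
    ACTIVE = "active"
    NRND = "nrnd"
    EOL = "eol"
    OBSOLETE = "obsolete"
    UNKNOWN = "unknown"


# Hypothetical mapper for one source, keyed by lower-cased raw status
EXAMPLE_SOURCE_STATUS: Dict[str, PartStatus] = {
    "active": PartStatus.ACTIVE,
    "in production": PartStatus.ACTIVE,
    "not recommended for new designs": PartStatus.NRND,
    "last time buy": PartStatus.EOL,
    "eol": PartStatus.EOL,
    "end of life": PartStatus.EOL,
    "obsolete": PartStatus.OBSOLETE,
    "discontinued": PartStatus.OBSOLETE,
}


def map_status(raw: Optional[str]) -> PartStatus:
    if not raw:
        return PartStatus.UNKNOWN
    # Collapse whitespace and ignore case before the lookup
    key = " ".join(raw.split()).lower()
    return EXAMPLE_SOURCE_STATUS.get(key, PartStatus.UNKNOWN)
```

Falling back to `UNKNOWN` instead of guessing keeps unmapped values visible. Logging them would show which cases a source's mapper still needs to cover.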
### ProductOffer
#### dateCode
This field indicates how recently the product was manufactured. Many buyers do not want to buy products that are more than two years old. It should be returned as a string, and it is parsed by the `StringParser` when the `RawOffer` is converted to an `Offer`. We only use this value for display. It is not used when setting the `DateCodeWithin2years` attribute.
#### quantity
This field is the number of units of the part that the vendor has available to sell. It should be an integer-like string. It is required to create an offer. If it is not set, the offer is ignored because of the check in `offerShouldBeIgnored` in the `ScraperResultFactory`.
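Stock figures on a page often carry separators or units. The sketch below shows one way to turn them into an integer-like string. The helper name is hypothetical. The assumptions are that commas are thousands separators and that the first run of digits is the stock figure. Formats like `1.5k` or European `1.234` would need source-specific handling.

```python
import re
from typing import Optional


def normalize_quantity(raw: object) -> Optional[str]:
    """Return an integer-like string, or None if no quantity can be read."""
    if raw is None:
        return None
    text = str(raw).replace(",", "")  # assumption: commas are thousands separators
    match = re.search(r"\d+", text)
    # int() round-trip strips leading zeros, e.g. "007" -> "7"
    return str(int(match.group(0))) if match else None
```

Returning `None` leaves the field unset, so `offerShouldBeIgnored` would skip the offer. Whether an "Out of stock" label should instead become `"0"` is a per-source decision.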
#### availability
#### leadTime
#### packagingCondition
#### mpq
#### packagingType
#### rohs
#### eccn
#### dateCodeWithin2years
#### region
#### country
#### location
#### countryOfOrigin
#### vendor
#### vendorType
#### priceTiers
#### delivery
### OfferPriceTier
#### price
#### mpq
## Fields that we should reconsider
vendor_type - This can and should be determined on the PIM side when matching the scraped vendor name to a vendor in PIM.
rohs - This should be on the `Product` level, not the `ProductOffer` level.
## What fields are required to create an offer?
## What fields are required to create a part information?