Timeout error on NextGeoss production server
=====
This is the full stack trace of the timeout error raised by the NextGeoss Vito CGS_S1_GRD_L1 harvester running on production.
When the timeout occurs, the harvester is trying to retrieve datasets from this URL:
http://www.vito-eodata.be/openSearch/findProducts.atom?collection=urn:eop:VITO:CGS_S1_GRD_L1&start=2018-01-01&end=2018-01-02&count=500
```
2020-01-23T12:36:14.813820958Z Traceback (most recent call last):
2020-01-23T12:36:14.813849063Z File "/usr/bin/paster", line 11, in <module>
2020-01-23T12:36:14.813854242Z sys.exit(run())
2020-01-23T12:36:14.813857820Z File "/usr/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
2020-01-23T12:36:14.813861558Z invoke(command, command_name, options, args[1:])
2020-01-23T12:36:14.813865051Z File "/usr/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
2020-01-23T12:36:14.814069473Z exit_code = runner.run(args)
2020-01-23T12:36:14.814089495Z File "/usr/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
2020-01-23T12:36:14.814093705Z result = self.command()
2020-01-23T12:36:14.814096700Z File "/srv/app/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 185, in command
2020-01-23T12:36:14.814100023Z gather_callback(consumer, method, header, body)
2020-01-23T12:36:14.814103257Z File "/srv/app/src/ckanext-harvest/ckanext/harvest/queue.py", line 298, in gather_callback
2020-01-23T12:36:14.814106475Z harvest_object_ids = gather_stage(harvester, job)
2020-01-23T12:36:14.814109385Z File "/srv/app/src/ckanext-harvest/ckanext/harvest/queue.py", line 352, in gather_stage
2020-01-23T12:36:14.814112416Z harvest_object_ids = harvester.gather_stage(job)
2020-01-23T12:36:14.814166224Z File "/usr/lib/python2.7/site-packages/ckanext/nextgeossharvest/harvesters/cgss1.py", line 227, in gather_stage
2020-01-23T12:36:14.814178063Z for harvest_object in self._gather_(harvest_url, timeout=timeout):
2020-01-23T12:36:14.814181330Z File "/usr/lib/python2.7/site-packages/ckanext/nextgeossharvest/harvesters/cgss1.py", line 448, in _gather_
2020-01-23T12:36:14.814383156Z open_search_url, auth=auth, timeout=timeout):
2020-01-23T12:36:14.814393316Z File "/usr/lib/python2.7/site-packages/ckanext/nextgeossharvest/harvesters/cgss1.py", line 492, in _open_search_pages_from
2020-01-23T12:36:14.814545410Z r = self._get_url(harvest_url, auth=auth, **kwargs)
2020-01-23T12:36:14.814561262Z File "/usr/lib/python2.7/site-packages/ckanext/nextgeossharvest/harvesters/cgss1.py", line 438, in _get_url
2020-01-23T12:36:14.814565494Z response = requests.get(url, **kwargs)
2020-01-23T12:36:14.814568554Z File "/usr/lib/python2.7/site-packages/requests/api.py", line 70, in get
2020-01-23T12:36:14.827439566Z return request('get', url, params=params, **kwargs)
2020-01-23T12:36:14.827463091Z File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
2020-01-23T12:36:14.827468087Z return session.request(method=method, url=url, **kwargs)
2020-01-23T12:36:14.827471614Z File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
2020-01-23T12:36:14.827601146Z resp = self.send(prep, **send_kwargs)
2020-01-23T12:36:14.827610255Z File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 628, in send
2020-01-23T12:36:14.827880923Z r.content
2020-01-23T12:36:14.827892676Z File "/usr/lib/python2.7/site-packages/requests/models.py", line 755, in content
2020-01-23T12:36:14.839723385Z self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
2020-01-23T12:36:14.839746781Z File "/usr/lib/python2.7/site-packages/requests/models.py", line 683, in generate
2020-01-23T12:36:14.839752207Z raise ConnectionError(e)
2020-01-23T12:36:14.839771179Z requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.vito-eodata.be', port=80): Read timed out.
```
Accessing the provider URL from the server (host machine) with `curl` returns the results in 200 ms:
`curl -v "http://www.vito-eodata.be/openSearch/findProducts.atom?collection=urn:eop:VITO:CGS_S1_GRD_L1&start=2018-01-01&end=2018-01-02&count=500"`
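To put numbers on the request from whichever environment it is run in, a small timing script may help. The following is only a minimal sketch using the Python standard library; `timed_fetch` and its `opener` parameter are hypothetical names for illustration and are not part of the harvester. It records how long the response headers take to arrive and how long the full body takes, since the traceback above fails while the body is being read.

```python
import time
from contextlib import closing

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7, as used in the traceback

# Same query the harvester issues when the timeout occurs.
URL = ("http://www.vito-eodata.be/openSearch/findProducts.atom"
       "?collection=urn:eop:VITO:CGS_S1_GRD_L1"
       "&start=2018-01-01&end=2018-01-02&count=500")


def timed_fetch(url, timeout=60, opener=urlopen):
    """Return (status, body_bytes, seconds_to_headers, seconds_total).

    `opener` is a hypothetical hook so the function can be exercised
    without network access; it defaults to the standard urlopen.
    """
    start = time.time()
    with closing(opener(url, timeout=timeout)) as resp:
        headers_at = time.time()  # status line and headers have arrived
        body = resp.read()        # read the whole body, like r.content
        status = resp.getcode()
    return status, len(body), headers_at - start, time.time() - start


# Example call (performs a real network request):
# status, size, t_headers, t_total = timed_fetch(URL)
# print("status=%s bytes=%d headers=%.2fs total=%.2fs"
#       % (status, size, t_headers, t_total))
```

If the time to headers is short but the total time is long, the delay is in transferring the body rather than in establishing the connection, which would match a read timeout rather than a connect timeout.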
However, the same request from inside the app's Docker container takes much longer and eventually times out:
