This document aims to record the exact set of authentication procedures needed to access data behind Earthdata Login, described at the network level. This means explicitly stating, for example, that HTTP Basic Auth is used, or that a particular S3-based flow needs to be performed.
https://docs.google.com/document/d/18GyoMZj0I2HKAXwqyeziO0ISbOwHxo1TN4eAlR4mH3U/view
https://github.com/nasa/cumulus/blob/master/packages/api/endpoints/s3credentials-readme/instructions/index.md
https://github.com/asfadmin/thin-egress-app/blob/master/docs/s3access.md
Both HTTPS:// and S3:// are available when you're working from AWS us-west-2.
HTTPS:// – Typically, when you make an HTTP request, you'll get redirected to URS for authentication, and then you'll be redirected (eventually) to a signed S3 HTTP link.
S3:// – To use S3 you'll need AWS access keys. Typically you can request S3 access keys from a DAAC endpoint via HTTP. Once you have the keys, you can use AWS-aware tools as normal. However, those keys expire after 1 hour, so you'll need to request new keys regularly.
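Since the keys expire after about an hour, a client has to re-request them before they lapse. Below is a minimal sketch of that refresh logic. The `fetch` callable is hypothetical (it stands in for whatever HTTP call a given DAAC's credentials endpoint needs), and the `expiration` key holding a timezone-aware `datetime` is an assumption for illustration, not a documented response format.

```python
from datetime import datetime, timedelta, timezone


class S3CredentialCache:
    """Cache temporary S3 credentials and re-request them shortly before expiry."""

    def __init__(self, fetch, margin=timedelta(minutes=5), clock=None):
        # fetch: hypothetical callable returning a dict that includes an
        # "expiration" key (assumed to be a timezone-aware datetime).
        self._fetch = fetch
        self._margin = margin
        # clock is injectable so the refresh logic can be tested offline.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._creds = None

    def get(self):
        """Return cached credentials, fetching new ones if close to expiry."""
        now = self._clock()
        if self._creds is None or self._creds["expiration"] - now <= self._margin:
            self._creds = self._fetch()
        return self._creds
```

Callers would ask the cache for credentials before each batch of S3 requests instead of holding on to one set of keys.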
Access from within us-west-2 does not count against NASA's egress cap, regardless of the protocol used.
Only HTTPS is allowed, not s3://.
Only HTTPS is allowed.
Automated OAuth2 is used!
See the URS FAQ at https://urs.earthdata.nasa.gov/documentation/faq, under "How do I encode a username and password for HTTP basic authentication?" Kinda hilarious that the examples for Earthdata basic access are in… Perl.
Another example, from GES DISC (https://disc.gsfc.nasa.gov/data-access), suggests we use standard HTTP Basic Authentication.
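For reference, standard HTTP Basic Authentication just base64-encodes `username:password` into an `Authorization` header. A minimal Python sketch of the encoding step (the function name is ours, not from any Earthdata documentation):

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic Authentication."""
    # Basic auth is base64("username:password"); this is encoding, not encryption.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because the credentials are only encoded, this header should only ever be sent over HTTPS.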
(Link to Luis' notebook here)
Redirect loop:
Original URL -> Earthdata OAuth -> Original URL -> CloudFront -> Final URL
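To see each hop of the chain above rather than only the final response, one option is to follow redirects manually and record every step. This is a rough sketch: `fetch` is a hypothetical callable that performs one request without following redirects and returns `(status, location)`. How it sends credentials or cookies is deliberately left out.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_hops=10):
    """Follow redirects one hop at a time and return [(status, url), ...]."""
    hops = []
    for _ in range(max_hops):
        # fetch is a hypothetical helper: one request, no automatic redirects.
        status, location = fetch(url)
        hops.append((status, url))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

The returned list shows which host answered at each step, which is the information needed to pin down where a client fails.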
Not everything is behind TEA! TEA (Thin Egress App) is a rewrite of the Fat Egress app, and it primarily handles egress here. It is also possible that we don't know how to do this in 'one go', as there are probably many different things going on.
The goal is to find a way to get
Clear action item:
Go through this redirect loop and identify exactly the step where curl breaks. (Using curl as the example is good because it is very widespread.) Find where curl breaks but wget doesn't, and figure out the fix for that. (This is a strategic approach too, since it does not require a new package.)
This helps the team that maintains TEA in the long run and makes fixes less piecemeal in the future. "So that end-users won't have to hear the term US-West2".
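One way to approach the action item above is to record the hops each client makes (for example from verbose curl and wget output) and find the first place the two traces differ. The sketch below assumes each trace has already been reduced by hand to a list of `(status, host)` tuples. The hosts in the usage example are made up.

```python
from itertools import zip_longest


def first_divergence(trace_a, trace_b):
    """Return (index, hop_a, hop_b) for the first differing hop, or None."""
    # zip_longest pads the shorter trace with None, so a missing hop
    # (one client stopping early) also counts as a divergence.
    for index, (hop_a, hop_b) in enumerate(zip_longest(trace_a, trace_b)):
        if hop_a != hop_b:
            return index, hop_a, hop_b
    return None


# Hypothetical traces: both clients agree until the last hop.
working = [(302, "data.example"), (302, "urs.example"), (200, "s3.example")]
failing = [(302, "data.example"), (302, "urs.example"), (404, "s3.example")]
print(first_divergence(working, failing))
```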
Additionally:
Next Steps (Ignoring EULA part for now; tackle that separately)
We went through and tried to document the URLs involved and why people have trouble accessing data from us-west-2 and get an unhelpful 404 error (not to mention that as a user you first need to know what us-west-2 means). This isn't a problem when you're on-prem, only in the cloud. This has a technical fix, not a social-political fix.
We have some understanding of why the above might be. We're coming from the idea that TEA is the problem, so we'll tackle this upstream (in TEA). It's not the end-all solution because there are performance problems, but it will fix the problem of access. We can punt the issue of what happens with large-scale datasets and performance until later, since that is a narrower use case (accessing data is the first hurdle to tackle).
We'll frame this in terms of curl, the most widespread HTTPS client.
^ this is a bug that should be fixed. Frame this as a small fix that this team can do and that contributes to a larger-scale fix. Yuvi helps identify a minimum technical fix.
Yuvi - writes a few GH issues
Joe: Some DAACs use TEA, others use Cumulus. Joe will try to inventory who uses what for which dataset - Brianna, Luis and I can do this internally; what is being used to serve data
If we conclude that we should fix TEA for HTTP, maybe there isn't a need for fsspec.
Joe: https://github.com/nasa/cumulus/blob/master/packages/api/endpoints/s3credentials-readme/instructions/index.md vs https://github.com/asfadmin/thin-egress-app/blob/master/docs/s3access.md
Alexey: This is the opposite problem from the one we've been talking about: https://gist.github.com/ashiklom/6d3cf6e12ea2582221e9e7446bc94f6a > the "failing" link works for Brianna
If we can specifically document [ ] this would be a win. This is where we got blocked, we tried it on Pangeo Forge
Issues: