# Error Messages Meeting May 4th, 2020
## Participants
- Bernard
- Ilan
- Georgia
- Nicole
- Paul
- Pradyun
- Sumana
- Tzu-Ping
## Agenda
### Discuss
- The (inadequate) messages users currently get when the new resolver makes an error
- How people respond to that error msg, ideally using data from the survey
- Our "should we report this or not" questions — in cases of dependency conflicts with already-installed packages, we want to understand user expectations of what good behavior would be, what they expect, what would be easy to reason about.
### By end of meeting
- Bernard, Georgia, and Nicole should have a clearer understanding of what they need to research, and we should all have an expectation of how/when they will feed the rest of the team their preliminary and subsequent analyses
- Tzu-Ping, Pradyun and Paul should have a Plan A and Plan B approach for instrumenting the resolver for error reporting
- Ilan should be prepared to write tests against the Plan A and Plan B approach
### Context
- the "Installed Packages causing conflicts (UX inputs would be great)" section and "pip output messages - how closely do we have to match current? (UX inputs would be great)" section of our notes from https://wiki.python.org/psf/PackagingWG/2020-04-09-pip
- https://github.com/pypa/pip/pull/8033
- the in-depth explanation Paul wrote on the 17th in this Zulip thread
https://python.zulipchat.com/#narrow/stream/218659-pip-development/topic/Error.20messages/near/194422368
## NOTES:
- The (inadequate) messages users currently get when the new resolver makes an error
- the in-depth explanation Paul wrote on the 17th in this Zulip thread
https://python.zulipchat.com/#narrow/stream/218659-pip-development/topic/Error.20messages/near/194422368
- g & b ASK: is there a detailed list of the errors that can happen from the resolver?
- Paul: not 100% sure. There are 2-3 exceptions that resolvelib can give back. The ResolutionImpossible one, mentioned in the thread. Also one - "nested too deep" -- what that means for the user, practically, Paul doesn't know. Would have to check the code for the 3rd one. Not sure how to make it happen.
- What scenarios cause that to happen?
- ResolutionImpossible: If the resolver cannot logically find an intersection from the requirements provided.
- A variant of this error occurs when a requested package does not exist.
- ResolutionTooDeep: The resolver tries too many combinations and decides it is taking too long. Implementation-wise the resolver simply keeps a counter of how many things it has tried, and quits with this error when the counter goes over the limit.
- TP: the "too deep" error msg usually happens when resolver is taking too much time to find a solution. Many combinations possible ... resolver may need to look at each one, and it's taking too much time. Not in terms of absolute time, but resolver is looking at too many things. It thinks it is at the point that it should give up. Not sure how we can explain this to user. Maybe .... 1 idea TP has had: record how many versions we looked at, for this user, and tell user "we looked at n versions, and it is failing, give us some more info to help"... not sure how that would work
- Pradyun: https://github.com/sarugaku/resolvelib/blob/master/src/resolvelib/resolvers.py#L130 (2 here) and https://github.com/sarugaku/resolvelib/blob/master/src/resolvelib/resolvers.py#L32 -- 3 exceptions. (nothing to add to that)
- Bernard: if there is not a list of defined error msgs, is there a list of failure cases where error msgs are in some way ... if they are displayed?
- TP: there are 3 - the 3 Paul mentioned - those are all the possible cases the resolver will fail.
- B: is there an easy way to find and list them? "here are the failure cases, here is the error that is displayed"
- TP: those are all tested. We have tests to make sure they happen when they should
- Sumana: it might be difficult to use the test suite as a specific UX test/research point.
- TP: Pradyun added the lines that throw the exceptions above. [Here’s where the message is printed to the user.](https://github.com/pypa/pip/blob/master/src/pip/_internal/resolution/resolvelib/resolver.py#L98-L110)
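The failure modes discussed above can be sketched with a toy resolver. Everything below is an illustrative stand-in (the `resolve()` function and its data shapes are hypothetical), not resolvelib's actual code; only the exception names are taken from the discussion:

```python
# Illustrative toy only -- NOT resolvelib's real implementation. The exception
# names mirror the ones discussed above; the resolve() body is a stand-in.

class ResolutionImpossible(Exception):
    """No version satisfies every constraint (e.g. A==1 and A==2 together)."""

class ResolutionTooDeep(Exception):
    """The resolver exceeded its budget of attempts (the counter TP describes)."""

def resolve(versions_by_name, constraints, max_rounds=100):
    """Pin, for each package, one version allowed by all of its constraints."""
    pinned = {}
    rounds = 0
    for name, allowed_sets in constraints.items():
        rounds += 1
        if rounds > max_rounds:
            # Real resolvers count candidate evaluations, not packages;
            # the principle (give up past a limit) is the same.
            raise ResolutionTooDeep(max_rounds)
        allowed = set(versions_by_name.get(name, [])).intersection(*allowed_sets)
        if not allowed:
            raise ResolutionImpossible(name)
        pinned[name] = max(allowed)
    return pinned
```

For example, `resolve({"A": ["1", "2"]}, {"A": [{"1"}, {"2"}]})` raises `ResolutionImpossible`, since no version of A is in both `{"1"}` and `{"2"}`; a tiny `max_rounds` triggers `ResolutionTooDeep` instead. The third exception Pradyun links below (`RequirementsConflicted`) is raised internally while a candidate's requirements are being evaluated.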
- How people respond to that error msg, ideally using data from the survey
[see below, "User Interview Findings So Far"]
The dependency conflict survey has questions like "share an error that you have seen before and tell us how you solved it" and we have 63 answers.
TP, Paul, and Bernard have been on a call understanding some of those responses. We did some of that a few weeks ago. Could be useful to do more of that. Or developers could look at the responses and highlight things they want to talk through.
Paul asks: if we do something like that, where would it be leading? collecting info is key .... we want to know what to put in the error message. how do we get to "there" from "here"
Sumana: We want to know about user mental models, and users' understanding (or lack thereof) of Python packaging, dependency resolution, etc. We might address this in larger docs, or in the error messages directly, etc. Paul -- did you feel that there was NOTHING actionable about the user's troubleshooting, or how to help with that?
Paul: There's improvements that someone could work out, but I don't know what they are -- Someone tell me what to do -- I can't progress this myself.
Bernard: I think it's important to say : the purpose of that survey wasn't error messages.
Sumana: we need more information from users -- about what needs to be done.
Georgia: More user interviews, potentially interactive, for discussing resolution / other error messaging. Understanding other tools they use (expectations from other tooling / mental models)
Sumana: user interviews vs user tests. interview = asking questions + discussion / test = trying things?
Georgia: testing -- there is some early version (pre-release?) to try and give feedback on. The other thing you're describing is a part of either of those -- an interview w/ specific/common errors -- a different prompt for an interview.
Sumana: word for "try what we have already"?
Georgia: interview -> more discovery, test -> try out.
- Our "should we report this or not" questions — in cases of dependency conflicts with already-installed packages, we want to understand user expectations of what good behavior would be, what they expect, what would be easy to reason about.
Bernard: testing software is scenario based -- you need a scenario to test against. What do we mean by "testing error messages"? Is there a specific area where we can start? I don't know that. That's the most important thing -- usability testing is about testing scenarios -- "error messaging" is too broad. Where to start? User interview -- qualitative, semi-structured, to get a richer understanding of what a user does in a broader area. My view right now isn't going to tell us anything specific -- we need a scenario to test to give inputs.
User interviews + survey specifically on how do you debug / how do you fix things, w.r.t. error messages -- my Point of view -- we need direction about what error messages (if we can't get that, that's okay [lost audio])
Georgia: the request from UX side: have some clearer.... we have talked about test cases & error conditions. Failure cases. If we can have those mapped out from an actions perspective: "based on these things, these conditions happen" then we can use those in interviews with people
- Nicole: could we use a gherkin syntax? Given, When, Then? to formalize different use cases? Behavior-Driven Development & this type of testing?
Paul: I have seen it and it sounds useful, but I don't personally know how to deliver that info with BDD
BDD -- https://github.com/behave/behave
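A dependency-conflict scenario in the Gherkin style Nicole mentions might look like the following. This is an illustrative sketch, not an existing pip test; the package name, versions, and wording are hypothetical:

```gherkin
Feature: Dependency conflict error messages

  Scenario: User requests two incompatible versions of the same package
    Given a package "A" with versions "1.0" and "2.0" on the index
    When the user runs "pip install A==1.0 A==2.0"
    Then pip exits with a non-zero status
    And the output explains that no version of "A" can satisfy both requirements
```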
Intro for Nicole:
- UX researcher, worked on warehouse, PyPI Rollout
- Brought in for providing additional UX support, been here for a week. onboard ATM.
- Glad that she brings in pre-existing knowledge of the ecosystem! :)
Bernard: Where would we find the tests?
- https://github.com/pypa/pip/blob/master/tests/functional/test_new_resolver.py
- `test_new_resolver_no_dist_message`
- `test_new_resolver_requires_python_error`
Paul Simple Example:
- When the user tries to install A version 2 and A version 1 in the same command, we report "Requirements are inconsistent".
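Paul's simple case reduces to an empty set intersection. A toy illustration (not pip's actual logic or its real output):

```python
# Toy illustration: pinning A==1 and A==2 in one command leaves no version
# of A that satisfies both requirements, hence the resolution error.
available = {"1", "2"}      # versions of A published on the index
allowed_by_first = {"1"}    # from "A==1"
allowed_by_second = {"2"}   # from "A==2"
satisfying = available & allowed_by_first & allowed_by_second
print(satisfying)  # set() -- empty, so no consistent install exists
```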
Bernard: it's hard to know what to design if we don't know how it exists and why it happens. If we don't know how or where it happens, we can only give very vague error message content. Taking a step back - re: research - it's hard to research anything useful if we don't know where we need to start. That's why I'm asking about test cases, or "this error msg is displayed when this code runs".
B ran into P's case yesterday, but with apt install, and the error message said "you're trying to install 2 versions of the same thing, go away, sort it out and come back to me"
Nicole example of gherkin/BDD: https://automationpanda.com/2017/01/27/bdd-101-gherkin-by-example/
Paul: in those simple cases where I can express the problem, I can usually express an error message to use. But in harder cases, I can neither generate them nor give an error message or solution, so who can help with either end of that? I'm too close, can't see the wood for the trees.
Bernard: let's start with simple things, then start ....
- Tzu-Ping, Pradyun and Paul should have a Plan A and Plan B approach for instrumenting the resolver for error reporting
- Ilan should be prepared to write tests against the Plan A and Plan B approach
Sumana: any way to choose approaches?
Pradyun: need more info. We need Ilan to find failure cases. Cascading nature of communication has caused frustration.
Sumana: Ilan finding and accumulating failure cases
Ilan: I haven't focused on that, have looked at some install scenarios and added those tests, and so far they are all working very well.... the more complicated examples were failing with the old resolver, but with the new one they give the correct result.
There are examples where the new resolver can't give a correct installation because there are conflicting things or ambiguity. I haven't looked
Paul: We're not looking for the new resolver giving a wrong result - we're looking for cases where the new resolver would be expected to fail because there is no solution.
Pradyun: this should fail — https://github.com/pypa/pip/blob/master/tests/yaml/install/conflicting_triangle.yml#L1
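A conflict of that shape can be checked by brute force. The metadata below is hypothetical (it illustrates one plausible triangle: A needs B and C, which pin different versions of D), not the literal contents of conflicting_triangle.yml:

```python
from itertools import product

# Hypothetical metadata for a "triangle" conflict (illustrative only):
# A needs both B and C, but B and C pin different versions of D.
depends = {
    ("A", "1.0"): {"B": {"1.0"}, "C": {"1.0"}},
    ("B", "1.0"): {"D": {"1.0"}},
    ("C", "1.0"): {"D": {"2.0"}},
    ("D", "1.0"): {},
    ("D", "2.0"): {},
}
versions = {"A": ["1.0"], "B": ["1.0"], "C": ["1.0"], "D": ["1.0", "2.0"]}

def consistent(pin):
    # Every pinned package's dependencies must allow the pinned version.
    for name, version in pin.items():
        for dep, allowed in depends[(name, version)].items():
            if pin[dep] not in allowed:
                return False
    return True

# Brute-force every combination: none is consistent, i.e. the resolver
# should report ResolutionImpossible for this scenario.
solutions = [
    dict(zip(versions, combo))
    for combo in product(*versions.values())
    if consistent(dict(zip(versions, combo)))
]
print(solutions)  # [] -- no valid combination exists
```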
Ilan:
TODO: <Someone> add text/narrative documentation around the test cases
- Bernard can do this with Pradyun and Nicole when we speak (5 May 2020 @ 2PM GMT+1)
- TODO: talk - schedule this. (done?)
TODO: Pradyun walk through the tests with UX team
- document tests with "plain language" examples(?)
TODO: Ilan to spend 2 hours today and see how many scenarios he can come up with and then turn them into YAML tests - pull request
Ilan: It's easiest to have these cases in YAML files in a PR, because otherwise they are hard
Pradyun: YAML tests
Sumana: let's spend most of our Thursday call on this as well
Nicole + Georgia won't be available
Please take very good notes
Pradyun: not sure what the assignments are.
[Sumana fleshes out earlier TODOs]
TP: We should add a test about conflicting requirements. There are no tests for the resolver's behavior when 2 things depend on different versions of the same dependency.
Pradyun: there is 1 YAML test, we need to expand it and make it better.
TODO: resolver team is expanding these tests.
TODO: Sumana: formatting and archiving these notes
Inspirational note: if this were easy it would be done already. [rueful smile]
---
## Reference Notes from UX team
### Guidelines for Error Messages
- Actionable
- Plain language
- Maintain User context (don’t jump out of the terminal, let the user choose to go to a web resource)
- References:
- [Nielsen's 10 Heuristics](https://www.nngroup.com/articles/ten-usability-heuristics/)
- #1: Visibility of system status
- #4: Consistency and standards
- #9: Help users recognize, diagnose, and recover from errors
- [Error message guidelines](https://www.nngroup.com/articles/error-message-guidelines/)
- Additional reading (in-progress):
- [Best Practices Building a CLI Tool for Your Service](https://zapier.com/engineering/how-to-cli/)
- [The Wizard of Oz Guide to User Testing the Command Line Interface](https://uxplanet.org/the-wizard-of-oz-guide-to-user-testing-the-command-line-interface-3847b4418d49)
- [Exploring CLI Best Practices](https://eng.localytics.com/exploring-cli-best-practices/)
- [How to Write Good Error Messages](https://uxplanet.org/how-to-write-good-error-messages-858e4551cd4)
- [Good error messages](https://www.cypress.io/blog/2017/07/26/good-error-messages/)
### User Interview Findings so far
- TODO Collect some more insights from: the maintainer interviews (audios and transcripts) & user interviews
- Some initial insights from interviews
- People struggle with the pip output when troubleshooting.
- If people cannot tell what to do from the error message, they most commonly turn to issue queues or community forums (stack exchange, pip GH repo) to share output from their terminal.
- From our conversations with the development team: when trying to understand these types of issue reports, the following context is useful in order to interpret terminal output:
- the context of someone’s setup
- what they do (e.g. scientist using Python for work vs professional Python developer vs web dev using a Python-based framework, etc)
- From the [dependency conflict research](https://app.crowdsignal.com/surveys/2552280/report/overview) ⬅️ see email for results access
### Questions/Ideas
- What are the errors we need to design for?
- What errors can happen? Is there a list that we can work through?
- Questions for additional user interviews:
- What other packaging tools (even outside of the python ecosystem) do you use?
- What was your first coding language?
- What other command line tools/languages do you use regularly?
- Work through some example errors/tests
### Troubleshooting Interventions & Ideas
- Tree views
- Interactive troubleshooting tools (see graphviz thread)
- Error Output that allows someone else to diagnose the issue better
- More community support possible
- Quicker to answer each other's questions
## Chat notes
PRADYUN GEDAM
Huh, my mic isn't working.
I'll rejoin.
I think it's the browser.
https://hackmd.io/el6WZNrTR-mK34szStCtrQ?both
GEORGIA
https://hackmd.io/el6WZNrTR-mK34szStCtrQ
GEORGIA
i think we're having some delays, so i'm turning off video in favor of hearing people.
BERNARD
Yeah I'm having issues too. I don't know if I'm being heard.
GEORGIA
you are, but there's a lag/delay
BERNARD
63
yep 2 weeks
BERNARD
@sumana: you've got a link to the conflicting deps survey results by email.
I had shared it with Pradyun, Paul and TZ.
BERNARD
@nicole You've got it too.
BERNARD
I think it's important to say : the purpose of that survey wasn't error messages.
So frustrating when the mic doesn't work! Sympathies
BERNARD
@nicole Yes that'd be useful. If we can see where the tests are we can then start.
BERNARD
Where would we find the tests?
TZU-PING CHUNG
On that
BERNARD
TZ: merci!
PAUL MOORE
Very simple case. When the user tries to install A version 2 and A version 1 in the same command, we report "Requirements are inconsistent".
Is that what you mean?
BERNARD
PM: yep.
NICOLE HARRIS
good example for gherkin:
https://automationpanda.com/2017/01/27/bdd-101-gherkin-by-example/
GEORGIA
https://github.com/pypa/pip/blob/master/tests/functional/test_new_resolver.py
BERNARD
im looking now
ok this is a good start
PRADYUN GEDAM
"BDD in pip" proposed in the past: https://github.com/pypa/pip/issues/4592
PAUL MOORE
:-)
GEORGIA
one note: comments in the testing doc would be useful to help understand what's expected from the tests. right now the most readable piece there is the test name.
BERNARD
Would someone have time to go thru those tests with me/nicole/georgia? We can then categorise these cases and possibly use them as a place to start?
NICOLE HARRIS
+1 that would be very useful
GEORGIA
i don't know if that's in-line with coding/comment standards or not, but just flagging/asking.
PAUL MOORE
We're not looking for the new resolver giving a wrong result - we're looking for cases where the new resolver would be expected to fail because there is no solution.
BERNARD
+1 to Georgia's request.
PRADYUN GEDAM
I'm happy to do that.
BERNARD
We can write those documentations.
@pradyun thank you. :heart:
PRADYUN GEDAM
One example of this should fail: https://github.com/pypa/pip/blob/master/tests/yaml/install/conflicting_triangle.yml#L1
PAUL MOORE
Just need to take a break for 2 mins
PRADYUN GEDAM
^ the example above @nicole @bernard
PAUL MOORE
Sorry - back now
BERNARD
@pradyun Yup. This one is easy. Paul and I spoke about this one and others. Can we speak about the tests/errors this week? When is good for you?
GEORGIA
i won't be there, because i'm supposed to be on vacation this week
BERNARD
@georgia: you're banned from meeting this week. ;)
*meetingS
PRADYUN GEDAM
@bernard Same time as this meeting, but tomorrow?
BERNARD
@pradyun Thats OK for me. Thank you.
NICOLE HARRIS
@pradyun @bernard - I'd like to join you and same time tomorrow works for me :)
BERNARD 10:00 AM
@nicole are you available tomorrow at 2?
@nicole Yes of course!
NICOLE HARRIS
great!
PRADYUN GEDAM
Awesome. I'll send the invite -- we'll use Uberconference then. :)
BERNARD
@TZ you mean like this: https://github.com/pypa/pip/blob/master/tests/yaml/install/conflicting_triangle.yml#L1 ?
PRADYUN GEDAM
https://github.com/pypa/pip/blob/master/tests/yaml/conflicting_triangle.yml
TZU-PING CHUNG
Yes, something like that.
GEORGIA
👋🏻
NICOLE HARRIS
thank you !