---
tags: conda
---
[context-class]: https://github.com/conda/conda/blob/b10fcfdd4dca4955b6d88f447fbfb6f44c08ab32/conda/base/context.py#L152
[configuration-class]: https://github.com/conda/conda/blob/b10fcfdd4dca4955b6d88f447fbfb6f44c08ab32/conda/common/configuration.py#L1280
[parameter-class]: https://github.com/conda/conda/blob/f356172ccbb38c7aef0a925de165c671af387657/conda/common/configuration.py#L895
[sequence-parameter-class]: https://github.com/conda/conda/blob/f356172ccbb38c7aef0a925de165c671af387657/conda/common/configuration.py#L1052
[pydantic-v2]: https://pydantic-docs.helpmanual.io/blog/pydantic-v2/
[pydantic-github]: https://github.com/pydantic
[pydantic-news-release-one]: https://www.linkedin.com/feed/update/urn:li:activity:6993552369410002944/
[conda-context-deep-dive]: https://docs.conda.io/projects/conda/en/latest/dev-guide/deep-dives/context.html
[channel-parameters-pull-request]: https://github.com/conda/conda/pull/12033
[wikipedia-leaky-abstraction]: https://en.wikipedia.org/wiki/Leaky_abstraction
[pydantic]: https://pydantic-docs.helpmanual.io/
# Giving conda Better Configuration Handling
## Outline
- Why do we want to update conda's configuration?
    - It doesn't handle errors well when things are misconfigured (usually blows up with a big unhelpful stacktrace)
    - It's a custom-built solution that's difficult to customize
    - Parameter definition is clunky, relying on class definitions instead of type hints
- How do we do this?
    - Switching to Pydantic would help simplify the context object
    - We rewrite `conda.common.configuration` to be a lot simpler, focusing primarily
      on the merging and precedence of configuration variable sources and not on
      type coercion
- Why should I care?
    - Pydantic v2 plans to rewrite the core of the library in Rust,
      [promising a 10x speed up][pydantic-news-release-one]; this will have ripple effects for conda's performance
    - Better error messages for our end users
    - Easier for developers to reason about how conda's configuration works
- What are the downsides?
    - Another conda dependency
    - Risks that core refactors carry (e.g. unknown side-effects and bugs)
    - Needs to be fully backwards compatible to avoid bugs
---
## Resources
Various helpful links:
- https://github.com/travishathaway/honda
    - :point_up: this repository contains a sort-of-working proof of concept
- [Pydantic Documentation][pydantic]
---
## Tasks
These are the ongoing tasks for implementing this feature.
- [ ] Implement `#! final` logic for configuration files
    - Exactly how this works: https://www.anaconda.com/blog/conda-configuration-engine-power-users
- [ ] Better error messages
### Better error messaging
Creating better error messaging will be an extremely important part of the
new configuration system. These new errors should be focused on providing
actionable information to our users. Here are a couple of ideas for how this
could be done:
- Examples of what a correctly defined variable looks like
- Specific details about exactly which parameter is incorrectly defined
- Links to applicable documentation for the configuration parameter
    - This may involve an overhaul of the current documentation to make it easier
      to link to the exact configuration variable.
#### Tasks
- [x] Add an error message formatter that pretty prints a pydantic validation error
- [ ] Make the error message about the valid data type not show Python data types.
  `channels` is a good example of this. It currently just says the value is not a tuple.
  Instead it should show YAML-specific values. Bonus points for designing this
  so that it could easily handle other file formats (e.g. JSON or TOML).
- [ ] Set up benchmarking so we can compare performance of not using lazy loading.
  This will also help to compare the versions of pydantic with and without Rust.
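One way the format-specific type names from the second task could be approached is a small lookup table per file format. This is only a sketch: `TYPE_NAMES` and `describe_type` are hypothetical names that do not exist in conda or Pydantic, and the set of covered types is an assumption.

```python
# Hypothetical lookup table: Python type -> the name that type goes by in each file format.
TYPE_NAMES = {
    "yaml": {str: "string", bool: "boolean", int: "integer", float: "float",
             list: "sequence", tuple: "sequence", dict: "mapping"},
    "json": {str: "string", bool: "boolean", int: "number", float: "number",
             list: "array", tuple: "array", dict: "object"},
    "toml": {str: "string", bool: "boolean", int: "integer", float: "float",
             list: "array", tuple: "array", dict: "table"},
}


def describe_type(python_type, file_format="yaml"):
    """Return a format-specific name for a Python type, falling back to its Python name."""
    return TYPE_NAMES[file_format].get(python_type, python_type.__name__)


print(describe_type(tuple))          # name used for YAML files
print(describe_type(tuple, "json"))  # name used for JSON files
```

Keeping the names in data rather than in the message template means supporting another file format would only require another table entry.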
---
## Article
### Abstract
Runtime configuration in conda is currently implemented by a series of "Configuration",
"Parameter" and "ParameterLoader" classes. While these classes have served their purpose
well over the years, there is still room for improvement: the code could be easier to
maintain, and our users could receive better error messages when configuration parsing
fails. In this article, I propose adding Pydantic as a dependency to enable conda to make
these improvements. I go over the benefits as well as the downsides of this approach and
provide code examples to show how the new configuration will be laid out.
### How configuration currently works
Configuration in conda is responsible for modifying its behavior at runtime. A couple of examples
of its use include telling conda which channels to search for packages, providing configuration
options to the solver, or deciding which solver to use. These configuration settings come from
several different places:
- Configuration files; also known as `condarc` files
- Environment variables; usually prefixed with `CONDA_*`
- Command line arguments and options
The diagram in Figure 1 shows the order of precedence
for all configuration sources. The further right a source is placed,
the higher its precedence:

<p style="text-align: center; margin-top: -30px; margin-bottom:30px">
<b>Figure 1: </b><i>configuration parse order</i>
</p>
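The precedence rule can be sketched with plain dictionaries. This is a minimal illustration, not conda's implementation: `merge_sources` and the sample values are hypothetical, and the sketch assumes the sources rank in the order they are listed above (configuration files lowest, then environment variables, then command line arguments). It also replaces whole values, whereas a real configuration system may need to merge sequences and mappings instead.

```python
from collections import ChainMap


def merge_sources(file_values, env_values, cli_values):
    """Return one dict in which higher-precedence sources win (hypothetical helper)."""
    # ChainMap looks keys up from left to right, so the most important source goes first.
    return dict(ChainMap(cli_values, env_values, file_values))


# Sample values invented for illustration
file_values = {"channels": ["defaults"], "solver": "classic"}
env_values = {"solver": "libmamba"}         # as if read from a CONDA_* variable
cli_values = {"channels": ["conda-forge"]}  # as if passed on the command line

merged = merge_sources(file_values, env_values, cli_values)
print(merged)
```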
In the code, this is all held together by the singleton [Context][context-class]
object which itself is a subclass of the [Configuration][configuration-class]
object. In addition to this, there are also several different types of
[Parameter][parameter-class] classes, which allow you to define
the various configuration parameters that the application uses.
The last piece of the puzzle is a [ParameterLoader][parameter-loader-class]
class that orchestrates the retrieval, parsing and merging of configuration
parameters. This is done lazily to help increase the speed of
context object creation.
A simplified version of the `Context` object is shown below to
illustrate how these different classes work together:
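```python
from conda.common.configuration import (
    Configuration,
    MapParameter,
    ParameterLoader,
    PrimitiveParameter,
    SequenceParameter,
)


class Context(Configuration):
    # A single string value with a default
    string_field = ParameterLoader(
        PrimitiveParameter("default", element_type=str)
    )
    # A sequence of integers; the inner parameter describes each element
    list_of_int_field = ParameterLoader(
        SequenceParameter(PrimitiveParameter(0, element_type=int), default=(1, 2, 3))
    )
    # A mapping whose values are floats
    map_of_float_values_field = ParameterLoader(
        MapParameter(PrimitiveParameter(0.0, element_type=float), default={"key": 1.0})
    )
```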
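The lazy behavior described above can be illustrated with a small descriptor. This is a sketch of the general technique only: `LazyParameter` and `MiniContext` are hypothetical names, and conda's `ParameterLoader` does considerably more (for example, merging values from several sources).

```python
class LazyParameter:
    """Descriptor that parses a raw value on first access and caches it (hypothetical)."""

    def __init__(self, default, element_type):
        self.default = default
        self.element_type = element_type

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__dict__
        if self.name not in cache:
            raw = instance.raw_values.get(self.name, self.default)
            # Parsing (and any parsing error) happens here, not when the context is created.
            cache[self.name] = self.element_type(raw)
        return cache[self.name]


class MiniContext:
    retries = LazyParameter(3, int)
    timeout = LazyParameter(9.15, float)

    def __init__(self, raw_values):
        self.raw_values = raw_values


# Creating the context succeeds even though "timeout" holds an invalid value.
context = MiniContext({"retries": "5", "timeout": "not-a-number"})
print(context.retries)
```

Accessing `context.timeout` would raise a `ValueError` only at that point, which is the trade-off of lazy loading: object creation is cheap, but invalid values surface late.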
For a more detailed overview of how this works and the other classes at play,
please check out the
[deep dive article on context and configuration available in the conda documentation][conda-context-deep-dive].
### Criticisms of the current approach
There are several problems with the current system of configuration:
1. Extending and modifying its behavior is not as easy as it should be.
2. Error reporting for an incorrectly defined configuration file
   is brittle and messages can be confusing.
3. Lazy loading our parameters means that configuration errors
   are not all caught at once. When multiple errors exist in configuration,
   our users must discover these one by one.
#### Extending and modifying
We start by going over why modifying the current configuration system's behavior
is not as easy as it could be. In a recent pull request, we tried to extend the
behavior of the [SequenceParameter][sequence-parameter-class] class
([more information here][channel-parameters-pull-request]).
The main goal was to enable the parameter parser to accept a mixed list of
data types in the configuration file. The current API did not support this
and forced me to go into the code itself and perform an extensive refactor.
Refactors like this not only consume developer time but also carry a risk of
breaking existing code. A future configuration system should be flexible
enough to anticipate a variety of use cases and not just the current ones.
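For comparison, here is a rough sketch of how a mixed list could be declared with type hints in Pydantic. The model name `ConfigSketch` and the exact shape of the field are assumptions made for illustration; they are not taken from the pull request or from conda's code.

```python
from typing import Dict, List, Union

from pydantic import BaseModel


class ConfigSketch(BaseModel):
    """Hypothetical model; the field shape is an assumption for illustration."""

    # Each item may be a plain string or a mapping of string keys to string values.
    channels: List[Union[str, Dict[str, str]]] = ["defaults"]


config = ConfigSketch(channels=["defaults", {"name": "conda-forge"}])
print(config.channels)
```

Here, accepting a new kind of element is a change to the type hint rather than to the parsing code.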
#### Error messages
The second problem with the current configuration system is its brittle
parsing behavior and its less than clear error messages, even when a parsing
error is caught.
Let's see this in action with an example. The `channels` parameter is defined
as a list of strings in our configuration files. Here is what that typically
looks like:
```yaml=
channels:
  - defaults
  - conda-forge
```
But what if someone were to incorrectly define this as a mapping?
```yaml=
channels:
  first: defaults
  second: conda-forge
```
The example is a bit contrived, but it does illustrate how brittle our current
parsing system is. When `conda info` is run to test things out, here is the
output we receive:
```
$ conda info
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
  File "/opt/conda-src/conda/exceptions.py", line 1118, in __call__
    return func(*args, **kwargs)
  File "/opt/conda-src/conda/cli/main.py", line 69, in main_subshell
    exit_code = do_call(args, p)
  File "/opt/conda-src/conda/cli/conda_argparse.py", line 91, in do_call
    return getattr(module, func_name)(args, parser)
  File "/opt/conda-src/conda/cli/main_info.py", line 320, in execute
    info_dict = get_info_dict(args.system)
  File "/opt/conda-src/conda/cli/main_info.py", line 137, in get_info_dict
    channels = list(all_channel_urls(context.channels))
  File "/opt/conda-src/conda/base/context.py", line 803, in channels
    return tuple(IndexedSet((*local_add, *self._channels)))
  File "/opt/conda-src/conda/common/configuration.py", line 1227, in __get__
    matches = [self.type.load(self.name, match) for match in raw_matches]
  File "/opt/conda-src/conda/common/configuration.py", line 1227, in <listcomp>
    matches = [self.type.load(self.name, match) for match in raw_matches]
  File "/opt/conda-src/conda/common/configuration.py", line 1095, in load
    loaded_child_value = self._element_type.load(name, child_value)
  File "/opt/conda-src/conda/common/configuration.py", line 994, in load
    match.value(self._element_type),
AttributeError: 'str' object has no attribute 'value'
```
It looks like conda ran into an uncaught exception and printed a big stacktrace.
This is not ideal. Our users will have no way of knowing exactly which configuration
option is improperly configured or if there even is a problem with the configuration
at all. These types of errors must be avoided at all costs to keep the user
experience for conda pleasant.
The above was the worst possible scenario, but what happens when the parsing errors
are actually caught and a message is presented to the user? Below is another example
of an invalid configuration. This time we define `channels` as a string:
```yaml=
channels: defaults
```
This example is a lot more likely to happen than the first invalid configuration
that was shown. Here is the error message that is returned when we again try to
run `conda info`:
```
$ conda info
InvalidTypeError: Parameter _channels = 'defaults' declared in /home/test_user/.condarc has type str.
Valid types:
- tuple
```
This message is already a lot better as it tells us exactly which file this
error is occurring in, but there is still room for improvement. The first
word that we come across is `InvalidTypeError`. Although valid, it is my
opinion that we should not leak application internals to users in this
way. It would be more informative to begin by saying that a
configuration parsing error was encountered, as this says more
about the nature of the problem and how we may possibly fix it.
The second problem we run across is the underscore placed in front of the
parameter name. Instead of `channels` we get `_channels`. Conda developers
would know exactly why this is showing up this way, but for a casual user this
could be a little confusing. Yes, they will probably eventually figure out
exactly which configuration parameter is causing the problem, but this is
extra work they should not have to do.
The last piece of criticism for this particular error message is
the `Valid types:` section. In it, we see `tuple` listed. The problem here
is that this is not relevant to the YAML format at all and is instead an
internal data type for the Python language. This might not be very helpful
for someone unfamiliar with Python. Finally, a `tuple` of what? From this
error message, a user would only know that the configuration parameter has to be a
sequence of some kind, but they would still be unsure exactly what belongs
in this sequence.
Ultimately, a user will most likely head to our documentation to see examples
of the correct configuration values to fix the problem. A future
configuration system should make it obvious what needs to be fixed to
prevent this additional trip. But if they would like to see the documentation
anyway, we should be nice and provide them with a link directly
in the error message itself.
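Pulling these criticisms together, a friendlier message might be assembled roughly as follows. This is a sketch under stated assumptions: `format_config_error` is a hypothetical helper, the wording of the message is only one possibility, and the documentation URL is a placeholder rather than a real conda documentation link.

```python
def format_config_error(parameter, source, expected, example, docs_url):
    """Build a user-facing message for one invalid parameter (hypothetical helper)."""
    # Hide the leading underscore that marks an internal attribute name.
    public_name = parameter.lstrip("_")
    return "\n".join([
        f"Configuration error in {source}",
        f"  The parameter '{public_name}' must be {expected}.",
        "  Example:",
        *(f"    {line}" for line in example.splitlines()),
        f"  Documentation: {docs_url}",
    ])


message = format_config_error(
    parameter="_channels",
    source="/home/test_user/.condarc",
    expected="a sequence of strings",
    example="channels:\n  - defaults\n  - conda-forge",
    docs_url="https://example.com/docs/channels",  # placeholder URL
)
print(message)
```

The message leads with the nature of the problem, uses the public parameter name, describes the expected value in YAML terms, and ends with an example and a link.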
#### Lazy loading
The way that lazy loading currently works means that errors only bubble up
one by one. So when multiple errors exist in configuration, the user must
run into these individually and then fix them individually. This is a suboptimal
user experience because we could have simply reported all known errors to the
user at the time of initial parsing. Doing so would save our users time and
frustration.
Initially, this compromise may have been made for performance reasons, and
any new configuration system will have to keep this in mind. But if it
is possible to get the same (or roughly the same) performance when all
configuration variables are parsed up front, it will be worth it to switch
away from a lazy load technique for configuration parameters.
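As a rough illustration of parsing everything up front, the sketch below validates a whole configuration in one pass with Pydantic, which gathers every failure into a single `ValidationError`. The model `EagerConfig`, its two fields, and the `collect_errors` helper are hypothetical and chosen only for illustration; the exact message text depends on the installed Pydantic version.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class EagerConfig(BaseModel):
    """Hypothetical model with two fields, for illustration only."""

    channels: List[str] = ["defaults"]
    auto_update_conda: bool = True


def collect_errors(raw_values):
    """Validate everything up front and return one message per invalid field."""
    try:
        EagerConfig(**raw_values)
    except ValidationError as exc:
        # exc.errors() lists every failure that was found, not just the first one.
        return [f"{'.'.join(map(str, err['loc']))}: {err['msg']}" for err in exc.errors()]
    return []


errors = collect_errors({"channels": "defaults", "auto_update_conda": "maybe"})
for line in errors:
    print(line)
```

Both invalid values are reported after a single run, so the user can fix them together instead of discovering them one at a time.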
### Proposal for a new software architecture
Two of the criticisms mentioned above must be dealt with when
designing a new architecture for our configuration system, namely extendability
and error reporting. Performance will also be a determining factor behind any
new solutions we develop. For example, something that is two times slower yet meets
all other criteria would not be an acceptable solution for us. The last key
requirement we will have to meet is full backwards compatibility. This solution
will essentially be a drop-in replacement for our existing `Context` object. We
may attempt some refactors across the codebase, but these should initially
be kept minimal to avoid the risk of unintentionally introducing more bugs
in the software.
#### Extendability
*tbd*
#### Error reporting
*tbd*
#### Performance
*tbd*
#### Backwards compatibility
*tbd*