conda.common.configuration to be a lot simpler, focusing primarily on the merging and precedence of configuration variable sources and not on type coercion.
These are the ongoing tasks for implementing this feature.
#!final logic for configuration files
Creating better error messaging will be an extremely important part of the new configuration system. These new errors should focus on providing actionable information to our users. Here are a couple of ideas of how this could be done:
channels is a good example of this. It currently just says the value is not a tuple. Instead, it should show YAML-specific values. Bonus points for designing this so that it could easily handle other file formats (e.g. JSON or TOML).
Runtime configuration in conda is currently implemented by a series of "Configuration", "Parameter" and "ParameterLoader" classes. While these classes have served their purpose well over the years, there are still improvements that would make the code easier to maintain and give our users better error messages when their configuration contains mistakes. In this article, I propose adding Pydantic as a dependency so that conda can make the aforementioned improvements. I go over the benefits as well as the downsides of this approach while providing clear code examples to show how the new configuration will be laid out.
Configuration in conda is responsible for modifying its behavior at runtime. A couple of examples of its use include telling conda which channels to search for packages, providing configuration options to the solver, or deciding which solver to use. These configuration settings come from several different places:
.condarc files
CONDA_* environment variables
The diagram in Figure 1 shows the order of precedence for all configuration sources. The further right a source is placed, the higher its precedence:
Figure 1: configuration parse order
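The precedence rule in Figure 1 can be sketched in a few lines of Python. This is a simplified illustration, not conda's implementation: env_source and merge_sources are hypothetical helpers, values from environment variables are left as raw strings, and a later source simply overwrites an earlier one. It assumes, as in conda, that CONDA_* environment variables take precedence over .condarc files. Conda's real merge is more nuanced; for example, sequence parameters such as channels are combined across files, and #!final markers can cut the merge short.

```python
from __future__ import annotations

from typing import Any, Mapping


def env_source(environ: Mapping[str, str], prefix: str = "CONDA_") -> dict[str, Any]:
    """Hypothetical helper: turn CONDA_* variables into lower-case parameter names.

    Values stay raw strings; converting them to each parameter's type is a
    separate step this sketch skips.
    """
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def merge_sources(*sources: Mapping[str, Any]) -> dict[str, Any]:
    """Merge sources ordered from lowest to highest precedence (later wins)."""
    merged: dict[str, Any] = {}
    for source in sources:
        merged.update(source)
    return merged


# Hypothetical example values standing in for a parsed .condarc and os.environ
condarc = {"channels": ["defaults"], "always_yes": False}
env = env_source({"CONDA_ALWAYS_YES": "true", "HOME": "/home/test_user"})
print(merge_sources(condarc, env))
# {'channels': ['defaults'], 'always_yes': 'true'}
```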
In the code, this is all held together by the singleton Context object, which is itself a subclass of the Configuration class. In addition to this, there are several different types of Parameter classes, which define the various configuration parameters that the application uses. The last piece of the puzzle is the ParameterLoader class, which orchestrates the retrieval, parsing and merging of configuration parameters. This is done lazily to speed up creation of the context object.
A simplified version of the Context object is shown below to illustrate how these different classes work together:
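from conda.common.configuration import (
    Configuration,
    MapParameter,
    ParameterLoader,
    PrimitiveParameter,
    SequenceParameter,
)


class Context(Configuration):
    # A single string value
    string_field = ParameterLoader(
        PrimitiveParameter("default", element_type=str)
    )
    # A sequence whose elements are each parsed as an int
    list_of_int_field = ParameterLoader(
        SequenceParameter(PrimitiveParameter(0, element_type=int), default=(1, 2, 3))
    )
    # A mapping whose values are each parsed as a float
    map_of_float_values_field = ParameterLoader(
        MapParameter(PrimitiveParameter(0.0, element_type=float), default={"key": 1.0})
    )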
For a more detailed overview of how this works and the other classes at play, please check out the deep dive article on context and configuration available in the conda documentation.
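To make the lazy behavior concrete, here is a minimal, stand-alone sketch of a lazily evaluated descriptor. LazyParameter and MiniContext are hypothetical names invented for this illustration; they are not conda's ParameterLoader or Context, which also deal with multiple sources, aliases and validation. The point is the timing: a raw value is parsed only when its attribute is first read, and the result is then cached on the instance.

```python
from __future__ import annotations

from typing import Any, Callable


class LazyParameter:
    """Hypothetical descriptor: parse a raw value on first access, then cache it."""

    def __init__(self, key: str, parse: Callable[[Any], Any], default: Any) -> None:
        self.key = key          # key looked up in the raw configuration data
        self.parse = parse      # converts (and implicitly validates) the raw value
        self.default = default  # used when no source sets the key

    def __set_name__(self, owner: type, attr: str) -> None:
        # Per-instance attribute that holds the parsed value once computed
        self.cache_attr = f"_cached_{attr}"

    def __get__(self, instance: Any, owner: type | None = None) -> Any:
        if instance is None:
            return self
        if not hasattr(instance, self.cache_attr):
            raw = instance.raw_data.get(self.key, self.default)
            # Parsing, and therefore any parsing error, happens here on first access
            setattr(instance, self.cache_attr, self.parse(raw))
        return getattr(instance, self.cache_attr)


class MiniContext:
    channels = LazyParameter("channels", tuple, default=("defaults",))
    verbosity = LazyParameter("verbosity", int, default=0)

    def __init__(self, raw_data: dict[str, Any]) -> None:
        # Raw values already merged from all configuration sources
        self.raw_data = raw_data


ctx = MiniContext({"verbosity": "not a number"})  # constructing succeeds
print(ctx.channels)  # ('defaults',)
# Reading ctx.verbosity would raise ValueError only now, on first access.
```

Because nothing is parsed until an attribute is read, creating the object stays cheap, but a bad value is only reported when the code path that needs it actually runs.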
There are several problems with the current system of configuration:
We start by going over why modifying the current configuration system's behavior is not as easy as it could be. In a recent pull request, we tried to extend the behavior of the SequenceParameter class. The main goal was to enable the parameter parser to accept a mixed list of data types in the configuration file. The current API did not support this and forced us to go into the code itself and perform an extensive refactor.
Refactors like this not only consume developer time but also carry a risk of breaking existing code. A future configuration system should be flexible enough to anticipate a variety of use cases and not just the current ones.
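As an illustration of the kind of flexibility we are after, here is a hedged sketch of a sequence parser that accepts a mixed list by trying a list of element parsers in turn. The function names are hypothetical and do not exist in conda. The idea it demonstrates is that supporting a new element type would mean passing in one more small parser rather than refactoring a parameter class.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence

Parser = Callable[[Any], Any]


def parse_str(value: Any) -> str:
    if not isinstance(value, str):
        raise TypeError(f"expected a string, got {type(value).__name__}")
    return value


def parse_mapping(value: Any) -> dict:
    if not isinstance(value, dict):
        raise TypeError(f"expected a mapping, got {type(value).__name__}")
    return dict(value)


def parse_mixed_sequence(value: Any, parsers: Sequence[Parser]) -> tuple:
    """Parse a list whose items may each match any one of ``parsers``."""
    if not isinstance(value, list):
        raise TypeError(f"expected a list, got {type(value).__name__}")
    items = []
    for index, item in enumerate(value):
        errors = []
        for parse in parsers:
            try:
                items.append(parse(item))
                break  # first parser that accepts the item wins
            except TypeError as exc:
                errors.append(str(exc))
        else:
            # No parser accepted this item: report every reason at once
            raise TypeError(f"item {index}: " + "; ".join(errors))
    return tuple(items)


print(parse_mixed_sequence(["conda-forge", {"name": "internal"}], [parse_str, parse_mapping]))
# ('conda-forge', {'name': 'internal'})
```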
The second problem with the current configuration system is its brittle parsing behavior and its unclear error messages when this parsing fails.
We see this in action with an example. The channels parameter is defined as a list of strings in our configuration files. Here is what that typically looks like:
channels:
- defaults
- conda-forge
But what if someone were to incorrectly define it as a mapping?
channels:
  first: defaults
  second: conda-forge
The example is a bit contrived, but it does illustrate how brittle our current parsing system is. When conda info is run to test things out, here is the output we receive:
$ conda info
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
  File "/opt/conda-src/conda/exceptions.py", line 1118, in __call__
    return func(*args, **kwargs)
  File "/opt/conda-src/conda/cli/main.py", line 69, in main_subshell
    exit_code = do_call(args, p)
  File "/opt/conda-src/conda/cli/conda_argparse.py", line 91, in do_call
    return getattr(module, func_name)(args, parser)
  File "/opt/conda-src/conda/cli/main_info.py", line 320, in execute
    info_dict = get_info_dict(args.system)
  File "/opt/conda-src/conda/cli/main_info.py", line 137, in get_info_dict
    channels = list(all_channel_urls(context.channels))
  File "/opt/conda-src/conda/base/context.py", line 803, in channels
    return tuple(IndexedSet((*local_add, *self._channels)))
  File "/opt/conda-src/conda/common/configuration.py", line 1227, in __get__
    matches = [self.type.load(self.name, match) for match in raw_matches]
  File "/opt/conda-src/conda/common/configuration.py", line 1227, in <listcomp>
    matches = [self.type.load(self.name, match) for match in raw_matches]
  File "/opt/conda-src/conda/common/configuration.py", line 1095, in load
    loaded_child_value = self._element_type.load(name, child_value)
  File "/opt/conda-src/conda/common/configuration.py", line 994, in load
    match.value(self._element_type),
AttributeError: 'str' object has no attribute 'value'
It looks like conda ran into an uncaught exception and printed a big stacktrace. This is not ideal. Our users will have no way of knowing exactly which configuration option is improperly configured or if there even is a problem with the configuration at all. These types of errors must be avoided at all costs to keep a pleasant user experience for conda.
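One way to avoid this kind of crash is to check the shape of a raw value before any deeper parsing code touches it, and to raise a dedicated exception that names the parameter and the file. The sketch below assumes values have already been loaded from YAML into plain Python objects; ConfigurationError, YAML_NAMES and load_string_list are hypothetical names, not existing conda APIs.

```python
from __future__ import annotations

from typing import Any

# Type names a YAML user would recognize, instead of Python type names
YAML_NAMES = {dict: "mapping", list: "list", str: "string", int: "integer",
              float: "number", bool: "boolean", type(None): "null"}


class ConfigurationError(Exception):
    """Hypothetical user-facing error for invalid configuration values."""


def yaml_name(value: Any) -> str:
    return YAML_NAMES.get(type(value), type(value).__name__)


def load_string_list(name: str, value: Any, source: str) -> tuple[str, ...]:
    """Check the shape of ``value`` before it reaches deeper parsing code."""
    if not isinstance(value, list):
        raise ConfigurationError(
            f"{source}: '{name}' should be a list of strings (found: {yaml_name(value)})"
        )
    for index, item in enumerate(value):
        if not isinstance(item, str):
            raise ConfigurationError(
                f"{source}: item {index} of '{name}' should be a string "
                f"(found: {yaml_name(item)})"
            )
    return tuple(value)


try:
    load_string_list("channels", {"first": "defaults", "second": "conda-forge"},
                     "/home/test_user/.condarc")
except ConfigurationError as exc:
    print(exc)
# /home/test_user/.condarc: 'channels' should be a list of strings (found: mapping)
```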
Above was the worst possible scenario, but what happens when the parsing errors are actually caught and a message is presented to the user? Below is another example of an invalid configuration. This time we define channels as a string:
channels: defaults
This example is a lot more likely to happen than the first invalid configuration that was shown. Here is the error message that is returned when we again try to run conda info:
$ conda info
InvalidTypeError: Parameter _channels = 'defaults' declared in /home/test_user/.condarc has type str.
Valid types:
- tuple
This message is already a lot better as it tells us exactly which file this error is occurring in, but there is still room for improvement. The first word that we come across is InvalidTypeError. Although accurate, it is my opinion that we should not leak application internals to users in this way. Instead, it would be more informative to begin by saying that a configuration parsing error was encountered, as this says more about the nature of the problem and how we might fix it.
The second problem we run across is the underscore placed in front of the parameter name. Instead of channels we get _channels. Conda developers would know exactly why this shows up this way, but for a casual user this could be a little confusing. Yes, they will probably eventually figure out which configuration parameter is causing the problem, but this is extra work they should not have to do.
The last piece of criticism for this particular error message is the Valid types: section. In it, we see tuple listed. The problem here is that this is not relevant to the YAML format at all and is instead an internal data type of the Python language. This might not be very helpful for someone unfamiliar with Python. Finally, a tuple of what? From this error message, a user would only know that the configuration parameter has to be a sequence of some kind, but they would still be unsure exactly what belongs in this sequence.
Ultimately, a user will most likely head to our documentation to see examples of the correct configuration values to fix the problem. A future configuration system should make it obvious what needs to be fixed to prevent this additional trip. But if they would like to see the documentation anyway, we should be nice and provide them with a link directly in the error message itself.
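Pulling these suggestions together, a message formatter could open with a plain description of the problem, use the public parameter name, describe types in YAML terms and optionally append a documentation link. This is a minimal sketch under stated assumptions: format_config_error and public_name are hypothetical, stripping the leading underscore stands in for looking up a parameter's declared aliases, and no real documentation URL is assumed (docs_url is left to the caller).

```python
from __future__ import annotations

from typing import Any, Optional

# Type names as a YAML user would know them, instead of Python type names
YAML_NAMES = {
    dict: "mapping",
    list: "list",
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    type(None): "null",
}


def public_name(name: str) -> str:
    """Hide the leading underscore of internal names such as _channels."""
    return name.lstrip("_")


def format_config_error(
    name: str,
    value: Any,
    source: str,
    expected: str,
    docs_url: Optional[str] = None,
) -> str:
    """Build a user-facing message that speaks in YAML terms."""
    key = public_name(name)
    found = YAML_NAMES.get(type(value), type(value).__name__)
    lines = [
        f"Configuration error in {source}",
        f"  '{key}' should be {expected} (found: {found})",
        f"    {key}: {value}",
    ]
    if docs_url is not None:
        lines.append(f"  See {docs_url} for examples of valid values.")
    return "\n".join(lines)


print(format_config_error("_channels", "defaults", "/home/test_user/.condarc",
                          "a list of strings"))
# Configuration error in /home/test_user/.condarc
#   'channels' should be a list of strings (found: string)
#     channels: defaults
```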
The way lazy loading currently works means that errors only bubble up one at a time. So when multiple errors exist in the configuration, the user must run into them individually and then fix them individually. This is a suboptimal user experience because we could simply report all known errors to the user at the time of initial parsing, which saves our users time and frustration.
Initially, this compromise may have been made for performance reasons, and any new configuration system will have to keep this in mind. But if it is possible to get the same (or roughly the same) performance when all configuration variables are parsed up front, it will be worth switching away from lazy loading for configuration parameters.
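Reporting every problem at once can be sketched by running all validators up front and collecting their results instead of raising on the first failure. The validator functions below are hypothetical stand-ins for real parameter types, and a full implementation would also need to record which file each value came from.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, List, Optional

# A validator returns None when the value is fine, or a short problem description
Validator = Callable[[Any], Optional[str]]


def expect_string_list(value: Any) -> Optional[str]:
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return None
    return "should be a list of strings"


def expect_boolean(value: Any) -> Optional[str]:
    return None if isinstance(value, bool) else "should be true or false"


def validate_all(raw: Dict[str, Any], validators: Dict[str, Validator]) -> List[str]:
    """Check every configured parameter up front and collect all problems."""
    problems = []
    for name, validate in validators.items():
        if name in raw:  # parameters not set anywhere fall back to defaults
            problem = validate(raw[name])
            if problem is not None:
                problems.append(f"{name}: {problem}")
    return problems


raw_config = {"channels": "defaults", "always_yes": "maybe"}
for problem in validate_all(raw_config, {"channels": expect_string_list,
                                         "always_yes": expect_boolean}):
    print(problem)
# channels: should be a list of strings
# always_yes: should be true or false
```

Whether this eager pass is fast enough is exactly the performance question raised above; the sketch only shows the reporting side.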
Two pieces of criticism that should be dealt with when designing a new architecture for our configuration system have already been mentioned, namely extensibility and error reporting. Performance will also be a determining factor behind any new solutions we develop. For example, something that is two times slower yet meets all other criteria would not be an acceptable solution for us. The last key requirement we will have to meet is full backwards compatibility. This solution will essentially be a drop-in replacement for our existing Context object. We may attempt some refactors across the codebase, but these should initially be kept minimal to avoid the risk of unintentionally introducing more bugs in the software.
TBD