Giving conda Better Configuration Handling

Outline

  • Why do we want to update conda's configuration?
    • It doesn't handle errors well when things are misconfigured (usually blows up with a big unhelpful stacktrace)
    • It's a custom built solution that's difficult to customize
    • Parameter definition is clunky, relying on class definitions instead of type hints
  • How do we do this?
    • Switching to Pydantic would help simplify the context object
    • We rewrite conda.common.configuration to be a lot simpler, focusing primarily on the merging and precedence of configuration variable sources and not on type coercion
  • Why should I care?
    • Pydantic v2 plans to rewrite the core of the library in Rust, promising a 10x speedup, which could have ripple effects for conda's performance
    • Better error messages for our end users
    • Easier for developers to reason about how conda's configuration works
  • What are the downsides?
    • Another conda dependency
    • Risks that core refactors carry (e.g. unknown side-effects and bugs)
    • Needs to be fully backwards compatible to avoid bugs

Resources

Various helpful links:


Tasks

These are the ongoing tasks for implementing this feature.

Better error messaging

Better error messaging will be an extremely important part of the new configuration system. These new errors should focus on providing actionable information to our users. Here are a couple of ideas for how this could be done:

  • Examples of what a correctly defined variable looks like
  • Specific details about exactly which parameter is incorrectly defined
  • Links to applicable documentation for the configuration parameter
    • This may involve an overhaul of the current documentation to make it easier to link to the exact configuration variable.

Tasks

  • Add an error message formatter that pretty prints a pydantic validation error
  • Make the error message about the valid data type not show Python data types. channels is a good example of this: it currently just says the value is not a tuple. Instead, it should show YAML-specific values. Bonus points for designing this so that it could easily handle other file formats (e.g. JSON or TOML).
  • Set up benchmarking so we can compare performance of not using lazy loading. This will also help to compare the versions of pydantic with and without Rust.
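A benchmarking harness for the last task could start from something as small as the sketch below. It uses only the standard library's timeit module. The two build functions are hypothetical stand-ins and are not conda code; in a real benchmark they would construct the context with and without lazy loading.

```python
import timeit


def best_time(func, repeat=5, number=1000):
    """Return the fastest observed time per call, in seconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


# Hypothetical stand-ins for "build the context lazily" and "build it eagerly".
def build_lazily():
    return {"channels": None}  # parsing is deferred


def build_eagerly():
    return {"channels": tuple(["defaults", "conda-forge"])}  # parsed up front


lazy_seconds = best_time(build_lazily)
eager_seconds = best_time(build_eagerly)
print(f"lazy:  {lazy_seconds:.2e} s per call")
print(f"eager: {eager_seconds:.2e} s per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the two variants are close in speed.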

Article

Abstract

Runtime configuration in conda is currently implemented by a series of "Configuration", "Parameter" and "ParameterLoader" classes. While these classes have served their purpose well over the years, there is still room for improvements that would make the code easier to maintain and give our users better error messages when configuration parsing fails. In this article, I propose adding Pydantic as a dependency to enable these improvements. I go over the benefits as well as the downsides of this approach while providing clear code examples to show how the new configuration will be laid out.

How configuration currently works

Configuration in conda is responsible for modifying its behavior at runtime. A couple of examples of its use include telling conda which channels to search for packages, providing configuration options to the solver or deciding which solver to use. These configuration settings come from several different places:

  • Configuration files; also known as condarc files
  • Environment variables; usually prefixed with CONDA_*
  • Command line arguments and options

The diagram in Figure 1 shows the order of precedence for all configuration sources. The further right a source is placed, the higher its precedence:

Figure 1: configuration parse order
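The idea of precedence can be sketched with the standard library. This is only an illustration and not how conda implements it. The ordering used here (configuration files, then environment variables, then command line arguments) and the setting values are assumptions made for the example. It also shows simple overriding only; the real merging rules for sequences and maps may be more involved.

```python
from collections import ChainMap

# Hypothetical values from each source; the values and the ordering are
# assumptions for illustration only.
condarc_values = {"channels": ["defaults"], "solver": "classic"}
env_values = {"solver": "libmamba"}
cli_values = {"channels": ["conda-forge"]}

# ChainMap looks keys up left to right, so the most important source goes first.
merged = ChainMap(cli_values, env_values, condarc_values)

print(merged["channels"])  # taken from the command line values
print(merged["solver"])    # taken from the environment values
```

A lookup falls through to the next source only when a more important source does not define the key.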

In the code, this is all held together by the singleton Context object, which is itself a subclass of the Configuration object. In addition, there are several different types of Parameter classes, which allow you to define the various configuration parameters that the application uses. The last piece of the puzzle is the ParameterLoader class, which orchestrates the retrieval, parsing and merging of configuration parameters. This is done lazily to help increase the speed of context object creation.

A simplified version of the Context object is shown below to illustrate how these different classes work together:

from conda.common.configuration import (
    Configuration,
    MapParameter,
    ParameterLoader,
    PrimitiveParameter,
    SequenceParameter,
)

class Context(Configuration):
    # Simplified: each ParameterLoader wraps a Parameter that declares
    # a default value and the expected type.
    string_field = ParameterLoader(
        PrimitiveParameter("default", str)
    )
    list_of_int_field = ParameterLoader(
        SequenceParameter([1, 2, 3], int)
    )
    map_of_float_values_field = ParameterLoader(
        MapParameter({"key": 1.0}, float)
    )

For a more detailed overview of how this works and the other classes at play, please check out the deep dive article on context and configuration available in the conda documentation.

Criticisms of the current approach

There are several problems with the current system of configuration:

  1. Extending and modifying its behavior is not as easy as it should be.
  2. Error reporting for an incorrectly defined configuration file is brittle and messages can be confusing.
  3. Lazy loading our parameters means that configuration errors are not all caught at once. When multiple errors exist in configuration, our users must discover these one by one.

Extending and modifying

We start by going over why modifying the current configuration system's behavior is not as easy as it could be. In a recent pull request, we tried to extend the behavior of the SequenceParameter class (more information here). The main goal was to enable the parameter parser to accept a mixed list of data types in the configuration file. The current API did not support this and forced me to go into the code itself and perform an extensive refactor.

Refactors like this not only consume developer time but also carry a risk of breaking existing code. A future configuration system should be flexible enough to anticipate a variety of use cases and not just the current ones.

Error messages

The second problem with the current configuration system is its brittle parsing behavior and less than clear error messages when this parsing fails.

We can see this in action with an example. The channels parameter is defined as a list of strings in our configuration files. Here is what that typically looks like:

channels:
  - defaults
  - conda-forge

But, what if someone were to incorrectly define this as a mapping:

channels:
  first: defaults
  seconds: conda-forge

The example is a bit contrived, but it does illustrate how brittle our current parsing system is. When conda info is run to test things out, here is the output we receive:

$ conda info

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/opt/conda-src/conda/exceptions.py", line 1118, in __call__
        return func(*args, **kwargs)
      File "/opt/conda-src/conda/cli/main.py", line 69, in main_subshell
        exit_code = do_call(args, p)
      File "/opt/conda-src/conda/cli/conda_argparse.py", line 91, in do_call
        return getattr(module, func_name)(args, parser)
      File "/opt/conda-src/conda/cli/main_info.py", line 320, in execute
        info_dict = get_info_dict(args.system)
      File "/opt/conda-src/conda/cli/main_info.py", line 137, in get_info_dict
        channels = list(all_channel_urls(context.channels))
      File "/opt/conda-src/conda/base/context.py", line 803, in channels
        return tuple(IndexedSet((*local_add, *self._channels)))
      File "/opt/conda-src/conda/common/configuration.py", line 1227, in __get__
        matches = [self.type.load(self.name, match) for match in raw_matches]
      File "/opt/conda-src/conda/common/configuration.py", line 1227, in <listcomp>
        matches = [self.type.load(self.name, match) for match in raw_matches]
      File "/opt/conda-src/conda/common/configuration.py", line 1095, in load
        loaded_child_value = self._element_type.load(name, child_value)
      File "/opt/conda-src/conda/common/configuration.py", line 994, in load
        match.value(self._element_type),
    AttributeError: 'str' object has no attribute 'value'

It looks like conda ran into an uncaught exception and printed a big stacktrace. This is not ideal. Our users will have no way of knowing exactly which configuration option is improperly configured or if there even is a problem with the configuration at all. These types of errors must be avoided at all costs to keep a pleasant user experience for conda.

The above was the worst possible scenario, but what happens when the parsing errors are actually caught and a message is presented to the user? Below is another example of an invalid configuration. This time we define channels as a string:

channels: defaults

This example is a lot more likely to happen than the first invalid configuration that was shown. Here is the error message that is returned when we again try to run conda info:

$ conda info

InvalidTypeError: Parameter _channels = 'defaults' declared in /home/test_user/.condarc has type str.
Valid types:
  - tuple

This message is already a lot better as it tells us exactly which file the error occurs in, but there is still room for improvement. The first word that we come across is InvalidTypeError. Although valid, it is my opinion that we should not leak application internals to users in this way. It would be more informative to begin by saying that a configuration parsing error was encountered, as this says more about the nature of the problem and how we might fix it.

The second problem we run across is the underscore placed in front of the parameter name. Instead of channels we get _channels. Conda developers would know exactly why this is showing up this way, but for a casual user this could be a little confusing. Yes, they will probably eventually figure out exactly which configuration parameter is causing the problem, but this is extra work they should not have to do.

The last piece of criticism for this particular error message is the Valid types: section. In it, we see tuple listed. The problem here is that this is not relevant to the YAML format at all and is instead an internal data type for the Python language. This might not be very helpful for someone unfamiliar with Python. Finally, a tuple of what? From this error message, a user would only know that the configuration parameter has to be a sequence of some kind, but they would still be unsure exactly what belongs in this sequence.

Ultimately, a user will most likely head to our documentation to see examples of the correct configuration values to fix the problem. A future configuration system should make it obvious what needs to be fixed to prevent this additional trip. But, if they would like to see the documentation anyway, we should be nice and provide them with a link directly in the error message itself.
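To make these criticisms concrete, here is a minimal sketch of what a friendlier message builder could look like. Everything in it is hypothetical: the function name, the type-name mapping, the wording of the message and the placeholder documentation URL are assumptions for illustration and not an existing conda or Pydantic API.

```python
# Hypothetical mapping from Python type names to wording a YAML user would know.
YAML_TYPE_NAMES = {
    "tuple": "a list",
    "list": "a list",
    "dict": "a mapping",
    "str": "a string",
    "bool": "true or false",
    "int": "a whole number",
}


def format_config_error(parameter, source, actual_type, expected_type, example, docs_url):
    """Build a user-facing message for one invalid configuration value."""
    # Strip the leading underscore used by internal attribute names.
    name = parameter.lstrip("_")
    expected = YAML_TYPE_NAMES.get(expected_type, expected_type)
    actual = YAML_TYPE_NAMES.get(actual_type, actual_type)
    return "\n".join(
        [
            f"Configuration error in {source}:",
            f"  '{name}' should be {expected}, but {actual} was found.",
            "  Example of a valid value:",
            *[f"    {line}" for line in example.splitlines()],
            f"  Documentation: {docs_url}",
        ]
    )


message = format_config_error(
    parameter="_channels",
    source="/home/test_user/.condarc",
    actual_type="str",
    expected_type="tuple",
    example="channels:\n  - defaults\n  - conda-forge",
    docs_url="https://example.com/configuration#channels",  # placeholder URL
)
print(message)
```

Keeping the type-name mapping separate from the formatter would make it possible to swap in wording for other file formats, such as JSON or TOML, without touching the message layout.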

Lazy loading

The way that lazy loading currently works means that errors only bubble up one by one. So when multiple errors exist in configuration, the user must run into these individually and then fix them individually. This is a sub-optimal user experience because we could have simply reported all known errors to the user at the time of initial parsing. Doing so would save our users time and frustration.

Initially, this compromise may have been made for performance reasons, and any new configuration system will have to keep this in mind. But, if it is possible to get the same (or roughly the same) performance when all configuration variables are parsed up front, it will be worth it to switch away from a lazy load technique for configuration parameters.
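The difference between the two approaches can be sketched in a few lines of plain Python. This is a toy model and not conda's ParameterLoader: the class names, the parameter names and the error wording are all assumptions made for illustration.

```python
class LazyParameter:
    """Descriptor that validates a raw value only when it is first read."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = instance.raw[self.name]
        if not isinstance(value, self.expected_type):
            raise TypeError(f"{self.name} must be {self.expected_type.__name__}")
        return value


class LazyConfig:
    channels = LazyParameter(list)
    verbosity = LazyParameter(int)

    def __init__(self, raw):
        self.raw = raw  # nothing is validated yet


def validate_eagerly(raw, schema):
    """Check every parameter up front and return all problems at once."""
    return [
        f"{name} must be {expected.__name__}"
        for name, expected in schema.items()
        if not isinstance(raw[name], expected)
    ]


raw = {"channels": "defaults", "verbosity": "high"}

lazy = LazyConfig(raw)  # succeeds even though both values are wrong
try:
    lazy.channels
except TypeError as error:
    first_lazy_error = str(error)  # only the first problem is visible

all_errors = validate_eagerly(raw, {"channels": list, "verbosity": int})
print(first_lazy_error)
print(all_errors)
```

With the lazy version, the problem with verbosity stays hidden until some code path happens to read it, while the eager version reports both problems in one pass.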

Proposal for a new software architecture

Two of the criticisms mentioned above should be dealt with when designing a new architecture for our configuration system, namely extendability and error reporting. Performance will also be a determining factor behind any new solution we develop. For example, something that is two times slower yet meets all other criteria would not be an acceptable solution for us. The last key requirement we will have to meet is full backwards compatibility. This solution will essentially be a drop-in replacement for our existing Context object. We may attempt some refactors across the codebase, but these should initially be kept minimal to avoid the risk of unintentionally introducing more bugs in the software.
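As a rough idea of the direction, the simplified Context shown earlier might be expressed with Pydantic type hints along the following lines. This is a sketch under stated assumptions and not the final design: it assumes Pydantic is installed, the class and field names are hypothetical, and it says nothing yet about how values from the different configuration sources would be merged before validation.

```python
from typing import Dict, List

from pydantic import BaseModel, ValidationError


class ContextModel(BaseModel):
    # Hypothetical fields mirroring the simplified Context example.
    string_field: str = "default"
    list_of_int_field: List[int] = [1, 2, 3]
    map_of_float_values_field: Dict[str, float] = {"key": 1.0}


# Valid input: defaults fill in anything that is not supplied.
config = ContextModel(string_field="custom")

# Invalid input: both problems are reported in a single ValidationError.
try:
    ContextModel(list_of_int_field="not a list", map_of_float_values_field=[1, 2])
except ValidationError as exc:
    bad_fields = sorted(error["loc"][0] for error in exc.errors())
    print(bad_fields)
```

Each entry returned by errors() carries the location of the offending field and a message, which is the raw material a custom error formatter would work from.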

Extendability

TBD

Error reporting

TBD

Performance

TBD

Backwards compatibility

TBD