Runway News | Stop Writing YAML: Configuring ML Systems with confingy

Stop Writing YAML: Configuring ML Systems with confingy

May 11, 2026

by Ethan Rosenthal, Research Engineering

This blog post is based on a talk given at PyAI.

This year marks 20 years since the Netflix Prize and even more since MapReduce. Industrial-scale Machine Learning (ML) has been in production for multiple decades, and yet configuring these complicated systems remains a primary challenge across the industry. The tooling has matured, but configuration hasn't: the same solution keeps getting reinvented the same way. This blog post introduces confingy, an open source library for configuring Python-based ML systems. But first, let's try to understand the origin of this problem via a pedagogical story.

The Evolution of Every ML Codebase

At every company, the same system always gets built. It starts with The Do Everything Script. The Do Everything Script repeats itself, hardcodes string values, uses magic numbers everywhere and reads like the stream-of-consciousness code equivalent of On The Road. It's usually written to quickly solve a problem. The user assumes it will only be run once, so who cares how ugly it is?

As a concrete example, let's say one wants to use an LLM to classify some text data. The script reads a single CSV, uses a fixed prompt, calls a specific model and prints out an F1 score.

Figure 1. The Do-Everything Script. Every choice (data file, prompt version, model, metric) is hardcoded in the body of `main.py`.
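To make the starting point concrete, here is a minimal sketch of what such a script might look like. Everything in it is hypothetical: the inline CSV stands in for a hardcoded file path, and `fake_llm_classify` is a keyword rule standing in for a real model call so that the sketch runs on its own.

```python
import csv
import io

# Hypothetical stand-in for a hardcoded path such as "data_final_v2.csv".
RAW_CSV = """text,label
great product,positive
terrible service,negative
would buy again,positive
never again,negative
"""

# Hardcoded prompt: changing it means editing the script.
PROMPT = "Classify the sentiment of this text as positive or negative: {text}"


def fake_llm_classify(prompt: str) -> str:
    """Placeholder for a real LLM call; a keyword rule keeps the sketch runnable."""
    text = prompt.split(": ", 1)[1]
    return "negative" if ("terrible" in text or "never" in text) else "positive"


def main() -> float:
    rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
    preds = [fake_llm_classify(PROMPT.format(text=row["text"])) for row in rows]
    labels = [row["label"] for row in rows]

    # Magic string: F1 is computed for the "positive" class only.
    pairs = list(zip(preds, labels))
    tp = sum(p == "positive" and y == "positive" for p, y in pairs)
    fp = sum(p == "positive" and y == "negative" for p, y in pairs)
    fn = sum(p == "negative" and y == "positive" for p, y in pairs)
    f1 = 2 * tp / (2 * tp + fp + fn)
    print(f"F1: {f1:.3f}")
    return f1


if __name__ == "__main__":
    main()
```

Swapping the data, prompt, model or metric means editing the file in four different places, which is the pressure that drives the next stage.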

The script is good enough to provide some real value, but now stakeholders want it to be improved. This requires experimenting with different data, prompts, models and metrics, and the easiest way to do this is by sticking a CLI in between the user and the script.

Figure 2. The CLI stage. The script's hardcoded values become flags (--data, --prompt, --model and --metric); logic still lives in one file.
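A sketch of this stage using the standard library's `argparse` might look like the following. The four flag names come from the figure; the default values and the `choices` list are made up for illustration.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Classify text with an LLM.")
    # Defaults below are hypothetical placeholders.
    parser.add_argument("--data", default="reviews.csv", help="Path to the input CSV.")
    parser.add_argument("--prompt", default="v1", help="Prompt version to use.")
    parser.add_argument("--model", default="some-model", help="Model identifier.")
    parser.add_argument("--metric", default="f1", choices=["f1", "accuracy"])
    return parser


def main(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # The body of the script is unchanged; it reads from `args` instead of constants.
    print(f"data={args.data} prompt={args.prompt} model={args.model} metric={args.metric}")
    return args
```

Calling `main(["--model", "other-model"])` overrides one value and leaves the rest at their defaults. Every new knob still requires touching the parser and the script body together.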

The script drives more value, and further improvements are requested. This only begets more requirements for flexibility, so one often reaches for a YAML configuration file.

Figure 3. The YAML config stage. Configuration moves out of the call site and into a structured `config.yaml` file with fields for data, prompt, model and metrics.

This induces a painful feedback loop of updating the script each time the structure of the config changes, but surely this is the end, right?

Wrong.

Invariably, there will be different data sources to query, complicated prompts to construct and so on, until one realizes that the script should actually be a DAG where each node is a class or function that follows some higher level abstractions.

Figure 4. The DAG stage. The pipeline becomes a DAG of components (data fetcher, prompt constructor, model maker, metrics, and a final print step); each node is a class or function with its own configuration surface.

This pattern is at odds with a YAML config because it's unclear how to specify and instantiate classes from YAML. Commonly, people end up reaching for YAML tags and implementing string-based dynamic class instantiation, and this is where ML codebases across the industry often end up.

Figure 5. The YAML-as-DI stage. Tags such as !Load reach into the codebase and instantiate classes (here a SQLQuerier with query, start and end fields) by string-based dotted paths.
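The mechanism behind such a tag is usually a few lines of `importlib`. The sketch below is one plausible shape, not any particular codebase's loader: the `node` dict stands in for what a YAML parser would hand back, its `class`/`kwargs` keys are invented for this example, and `datetime.date` stands in for a project class so the code runs without a `src` package.

```python
import importlib
from typing import Any

# Hypothetical result of parsing a tagged YAML node.
node = {
    "class": "datetime.date",  # stand-in for a dotted path into your own codebase
    "kwargs": {"year": 2026, "month": 1, "day": 1},
}


def load_from_dotted_path(node: dict) -> Any:
    """Import the module named in the string, look up the class, and call it."""
    module_name, _, class_name = node["class"].rpartition(".")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**node["kwargs"])


obj = load_from_dotted_path(node)
print(obj)  # 2026-01-01
```

Because the class is only a string until load time, a typo in the path or a renamed keyword argument surfaces as an `ImportError`, `AttributeError` or `TypeError` when the config is loaded, not in the editor.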

This is the system that gets built at every company. Now let's talk about why it's so painful to live with.

The YAML Trap

All of those ML codebases end up drowning in complexity and terrible developer experience.

By pushing so much logic into configuration, one eventually ends up transforming YAML into a Turing-complete DSL. At Runway, we relied on OmegaConf to satisfy our growing need for control. We built a system that allowed configs to inherit from other configs by unioning fields at any arbitrary depth. A single config could inherit from a web of other configs. We supported global variables that could also be overwritten via this inheritance system. We added tags to support inline execution of string-based Python code. A single training config ended up as thousands of lines of YAML inherited from dozens of files.
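"Unioning fields at any arbitrary depth" amounts to a recursive dictionary merge. The following is a generic sketch of that idea using plain dicts; it is not OmegaConf's merge nor the system described above, and the config keys are invented.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins and nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parent and child configs.
base = {"model": {"name": "small", "optimizer": {"lr": 1e-3, "beta": 0.9}}, "seed": 0}
child = {"model": {"optimizer": {"lr": 3e-4}}}

resolved = deep_merge(base, child)
print(resolved["model"]["optimizer"])
```

The child only mentions `lr`, yet the resolved config also carries `beta`, `name` and `seed`. With dozens of parents, answering "where did this value come from?" means replaying the merge by hand.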

The developer experience was just as bad. Cmd-clicking to go to a definition doesn't work on classes defined as strings in YAML. When the parameters of a class's constructor are a YAML dictionary, we lose modern type hinting and validation. Refactoring classes is impossible when their usages are scattered across configs and we can't easily see which ones are used in production.

Beyond all of this, using YAML configs in this way actually hurts the code structure. If all of the classes are dynamically instantiated from YAML configs, then dependency injection becomes annoying since all classes must now support dynamic instantiation. As a result, one must either choose inheritance over composition or create God Classes with continually growing constructor flags.

Just Write Code

At first glance, it seems like one should just be able to write code. Use a dataclass or Pydantic model and simply set the classes as fields.

from dataclasses import dataclass
from src.data import BaseQuerier, SQLQuerier

@dataclass
class Config:
    data_fetcher: BaseQuerier
    ...

my_config = Config(
    data_fetcher=SQLQuerier(
        query="""
        SELECT *
        FROM my_table
        WHERE
          date BETWEEN {start} AND {end}
        """,
        start="2026-01-01",
        end="2026-02-01"
    ),
    ...
)

Unfortunately, this runs into two problems:

How to track the arguments to the class constructors?

Reusability of code is important in experimental workflows – especially in AI, where billion-dollar companies do this as a service. In the above example, how does one keep track of the start and end dates for a particular run of the job?

What if the classes are too "expensive" to instantiate when creating the config?

Maybe the class connects to the database upon instantiation. Maybe it allocates a trillion parameters' worth of memory for an LLM. Maybe the config will be defined locally but won't be run until it's on a remote machine. In all of these scenarios, one needs to be able to define the object now but lazily instantiate it.
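The "define now, instantiate later" requirement can be illustrated with nothing more than `functools.partial`. This is only a sketch of the idea; `ExpensiveConnector` is a hypothetical class whose constructor pretends to do costly work.

```python
from functools import partial


class ExpensiveConnector:
    """Hypothetical class whose constructor does costly work."""

    instances_created = 0

    def __init__(self, connection_string: str):
        # Imagine opening a real database connection here.
        ExpensiveConnector.instances_created += 1
        self.connection_string = connection_string


# Define now: the arguments are recorded but the constructor has not run.
deferred = partial(ExpensiveConnector, connection_string="postgresql://example")
print(ExpensiveConnector.instances_created)  # 0
print(deferred.keywords)

# Instantiate later, for example once the job is running on the remote machine.
connector = deferred()
print(ExpensiveConnector.instances_created)  # 1
```

A bare `partial` defers construction, but it does not check the arguments against the constructor's type hints, and it offers no obvious way to write nested objects out to a file and read them back.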

Sometimes people solve for these problems by associating a config class with every class.

from dataclasses import dataclass
from src.data import BaseQuerier, SQLQuerier, SQLQuerierConfig

@dataclass
class Config:
    data_fetcher_class: type[BaseQuerier]
    data_fetcher_config: SQLQuerierConfig
    ...

my_config = Config(
    data_fetcher_class=SQLQuerier,
    data_fetcher_config=SQLQuerierConfig(
        query="""
        SELECT *
        FROM my_table
        WHERE
          date BETWEEN {start} AND {end}
        """,
        start="2026-01-01",
        end="2026-02-01"
    ),
    ...
)

# Later, at run time, build the object from its class and its config.
data_fetcher = (
    my_config
    .data_fetcher_class
    .from_config(
        my_config.data_fetcher_config
    )
)

While this works, it's a major drag. It will encourage inheritance over composition, because who wants to go through all that boilerplate for a new class?
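To see where the boilerplate comes from, here is a sketch of what might sit behind that `src.data` import. The names `SQLQuerierConfig` and `from_config` appear in the snippet above, but their bodies here are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class SQLQuerierConfig:
    # First copy of the parameter list.
    query: str
    start: str
    end: str


class SQLQuerier:
    # Second copy of the parameter list.
    def __init__(self, query: str, start: str, end: str):
        self.query = query
        self.start = start
        self.end = end

    @classmethod
    def from_config(cls, config: SQLQuerierConfig) -> "SQLQuerier":
        # Third copy, just to forward each field to the constructor.
        return cls(query=config.query, start=config.start, end=config.end)


config = SQLQuerierConfig(query="SELECT 1", start="2026-01-01", end="2026-02-01")
querier = SQLQuerier.from_config(config)
print(asdict(config))  # the arguments are trackable, at the cost of the duplication
```

Adding one constructor parameter means editing three places, and every new class needs its own config twin.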

Everything up to this point crystallizes into the four requirements that any real solution has to satisfy:

Everything should be Python.
Track constructor arguments.
Lazy instantiation.
Don't make me refactor my entire codebase!

Meet confingy

confingy is a Python library that supports the above requirements. Internally, we have migrated all of our YAML configuration over to confingy. You may wonder how we pulled off such a feat, but it turns out refactors are easy when people hate the existing system enough.

The library has four main capabilities: serialization/deserialization, lazy-loading, validation and transpilation. All of them fall out naturally from a single @track class decorator:

from confingy import track

@track
class SQLQuerier:
    def __init__(self, query: str, start: str, end: str):
        self.query = query
        self.start = start
        self.end = end

    def execute(self): ...

querier = SQLQuerier("SELECT * FROM table", "2026-01-01", "2026-02-01")

When a tracked class is instantiated, confingy stores the constructor arguments and other information about the class in a private _tracked_info attribute on the object. In doing so, many features fall out.
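For intuition, a decorator that records constructor arguments can be written in a few lines with `inspect.signature`. This is a toy illustration of the general technique, not confingy's implementation; `track_sketch` and the `_recorded_init` attribute are names invented for this example.

```python
import functools
import inspect


def track_sketch(cls):
    """Toy decorator: store constructor arguments, keyed by parameter name."""
    original_init = cls.__init__
    signature = inspect.signature(original_init)
    self_name = next(iter(signature.parameters))  # usually "self"

    @functools.wraps(original_init)
    def wrapped_init(self, *args, **kwargs):
        # Bind positional and keyword arguments to their parameter names.
        bound = signature.bind(self, *args, **kwargs)
        bound.apply_defaults()
        recorded = dict(bound.arguments)
        recorded.pop(self_name)
        self._recorded_init = recorded  # hypothetical attribute name
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls


@track_sketch
class SQLQuerier:
    def __init__(self, query: str, start: str, end: str):
        self.query = query
        self.start = start
        self.end = end


querier = SQLQuerier("SELECT * FROM table", "2026-01-01", end="2026-02-01")
print(querier._recorded_init)
```

The class body is untouched and callers construct it exactly as before, which is what makes the decorator approach cheap to adopt in an existing codebase.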

Serialization and Deserialization

confingy can serialize any tracked object to JSON. It even tracks a hash of the class's code.

from confingy import serialize_fingy

print(serialize_fingy(querier))

{
    "_confingy_class": "SQLQuerier",
    "_confingy_module": "my_script",
    "_confingy_init": {
        "query": "SELECT * FROM table",
        "start": "2026-01-01",
        "end": "2026-02-01"
    },
    "_confingy_class_hash": "3aa6871..."
}

Serialized "fingys" can then be deserialized back into Python.

from confingy import deserialize_fingy

print(deserialize_fingy(serialize_fingy(querier)))
# <my_script.SQLQuerier object at 0x7c670313c0d0>

If a constructor argument is also tracked, then it will be nicely serialized, yielding dependency injection for free. Let's take the following classes that all stack inside of each other, Matryoshka-style.

@track
class DBConnector:
    def __init__(self, connection_string: str):
        self.connection_string = connection_string

    def connect(self): ...

@track
class SQLQuerier:
    def __init__(self, db: DBConnector, query: str, start: str, end: str):
        self.db = db
        self.query = query
        self.start = start
        self.end = end

    def execute(self): ...

@track
class Dataloader:
    def __init__(self, querier: SQLQuerier, batch_size: int):
        self.querier = querier
        self.batch_size = batch_size

    def load(self): ...

They can be packaged up into a single dataclass field:

@dataclass
class Config:
    data: Dataloader

config = Config(
    data=Dataloader(
        SQLQuerier(
            db=DBConnector("postgresql://user:pass@host:port/db"),
            query="SELECT * FROM table",
            start="2026-01-01",
            end="2026-02-01",
        ),
        batch_size=128,
    )
)

They can then be cleanly serialized (and deserialized!) as a nest of JSON:

{
  "_confingy_class": "Config",
  "_confingy_module": "my_script",
  "_confingy_dataclass": true,
  "_confingy_fields": {
    "data": {
      "_confingy_class": "Dataloader",
      "_confingy_module": "my_script",
      "_confingy_init": {
        "querier": {
          "_confingy_class": "SQLQuerier",
          "_confingy_module": "my_script",
          "_confingy_init": {
            "db": {
              "_confingy_class": "DBConnector",
              "_confingy_module": "my_script",
              "_confingy_init": {
                "connection_string": "postgresql://user:pass@host:port/db"
              },
              "_confingy_class_hash": "21d1d02cf..."
            },
            "query": "SELECT * FROM table",
            "start": "2026-01-01",
            "end": "2026-02-01"
          },
          "_confingy_class_hash": "3aa687197d..."
        },
        "batch_size": 128
      },
      "_confingy_class_hash": "6630e2d4..."
    }
  }
}
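The nesting works because serialization and deserialization are both simple recursions over recorded constructor arguments. The sketch below shows that recursion in isolation. It is not how confingy does it: to stay short it uses a hypothetical `Recorded` base class and a name-to-class `REGISTRY` where the library uses a decorator and module paths, and the `class`/`init` keys are simplified stand-ins for the real JSON fields.

```python
import json

REGISTRY = {}  # class name -> class; toy stand-in for importing by module path


def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls


class Recorded:
    """Toy base class: subclasses hand their constructor kwargs up to be stored."""

    def __init__(self, **init_kwargs):
        self._init_kwargs = init_kwargs


@register
class DBConnector(Recorded):
    def __init__(self, connection_string: str):
        super().__init__(connection_string=connection_string)
        self.connection_string = connection_string


@register
class SQLQuerier(Recorded):
    def __init__(self, db: DBConnector, query: str):
        super().__init__(db=db, query=query)
        self.db = db
        self.query = query


def to_dict(obj):
    """Recursively replace recorded objects with plain, JSON-friendly dicts."""
    if isinstance(obj, Recorded):
        return {
            "class": type(obj).__name__,
            "init": {name: to_dict(value) for name, value in obj._init_kwargs.items()},
        }
    return obj


def from_dict(data):
    """Inverse of to_dict: rebuild nested arguments first, then the outer object."""
    if isinstance(data, dict) and "class" in data:
        cls = REGISTRY[data["class"]]
        return cls(**{name: from_dict(value) for name, value in data["init"].items()})
    return data


querier = SQLQuerier(db=DBConnector("postgresql://example"), query="SELECT 1")
payload = json.dumps(to_dict(querier))
rebuilt = from_dict(json.loads(payload))
print(type(rebuilt.db).__name__, rebuilt.db.connection_string)
```

Because each object is rebuilt by calling its real constructor with its original arguments, the injected dependency (`db`) comes back as a live object and not as a dictionary.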

Lazy-loading

Any tracked class also gets a .lazy() classmethod. Passing the constructor arguments to this method will create a lazy version of the class.

from confingy import Lazy, track

@track
class DBConnector:
    def __init__(self, connection_string: str):
        self.connection_string = connection_string

    def connect(self): ...

lazy_db = DBConnector.lazy("postgresql://user:pass@host:port/db")

print(lazy_db)
# Lazy<DBConnector>(
#   config={'connection_string': 'postgresql://user:pass@host:port/db'}
# )

db = lazy_db.instantiate()

Lazy classes even play nicely with types. The code below type-checks cleanly under mypy, assuming you have added the confingy mypy_plugin.

from confingy import Lazy, track

@track
class SQLQuerier:
    def __init__(self, db: Lazy[DBConnector], query: str, start: str, end: str):
        self.db = db
        self.query = query
        self.start = start
        self.end = end

    def execute(self):
        connection = self.db.instantiate().connect()
        ...

Validation

While lazy instantiation is nice, lazy failing is bad. There's no reason to wait until a cluster is spun up to realize that a string was passed for a float argument. confingy not only tracks constructor arguments, it also parses the constructor parameter names and type hints. As a result, confingy will raise validation errors for both lazy and non-lazy classes.

@track
class DBConnector:
    def __init__(self, connection_string: str):
        self.connection_string = connection_string

    def connect(self): ...

lazy_db = DBConnector.lazy(99)

# confingy.exceptions.ValidationError: Validation failed for DBConnector:
#   • Field 'connection_string': Input should be a valid string (got 99)
#
# Provided configuration:
#   connection_string: 99
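The underlying idea, checking arguments against the constructor's signature before anything is built, can be sketched with the standard library alone. The helper below is a deliberately simple stand-in with plain `isinstance` checks; it is not confingy's validator, handles only non-generic type hints, and `check_arguments` is a name invented for this example.

```python
import inspect
from typing import get_type_hints


def check_arguments(cls, *args, **kwargs) -> dict:
    """Bind arguments to the constructor and check simple type hints, without instantiating."""
    signature = inspect.signature(cls)  # constructor signature, without `self`
    bound = signature.bind(*args, **kwargs)  # raises TypeError on missing or extra arguments
    hints = get_type_hints(cls.__init__)
    for name, value in bound.arguments.items():
        expected = hints.get(name)
        # Only plain classes such as str or float are checked in this sketch.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{cls.__name__}.{name}: expected {expected.__name__}, "
                f"got {type(value).__name__} ({value!r})"
            )
    return dict(bound.arguments)


class DBConnector:
    def __init__(self, connection_string: str):
        self.connection_string = connection_string


print(check_arguments(DBConnector, "postgresql://example"))

try:
    check_arguments(DBConnector, 99)
except TypeError as error:
    message = str(error)
    print(message)
```

The check runs when the config is written, on a laptop, and the constructor itself is never called.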

Transpilation

When a confingy-serialized object is deserialized back into Python, it's hard for a user (or computer or agent) to understand what is inside of this object. While one can open the object in a REPL and tab-complete on its attributes to see what was inside, this is suboptimal. Ideally, one would be able to see the original Python code that produced the serialized object. With confingy.transpile_fingy, a serialized object can be transpiled back into the Python that would have produced it. This Python can then be checked into version control, inspected in an IDE and so on.
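As a rough illustration of what transpilation involves, the toy function below walks a dictionary in the JSON shape shown earlier and prints nested constructor calls. It is not `transpile_fingy` and makes no claim about that function's actual output; `to_python_source` is a hypothetical helper, and it ignores imports and class hashes entirely.

```python
def to_python_source(node, indent: int = 0) -> str:
    """Render a serialized node as a (possibly nested) constructor call."""
    if not (isinstance(node, dict) and "_confingy_class" in node):
        return repr(node)  # plain values: strings, numbers, and so on
    pad = " " * (indent + 4)
    args = node.get("_confingy_init", node.get("_confingy_fields", {}))
    lines = [f"{node['_confingy_class']}("]
    for name, value in args.items():
        lines.append(f"{pad}{name}={to_python_source(value, indent + 4)},")
    lines.append(" " * indent + ")")
    return "\n".join(lines)


serialized = {
    "_confingy_class": "SQLQuerier",
    "_confingy_module": "my_script",
    "_confingy_init": {
        "db": {
            "_confingy_class": "DBConnector",
            "_confingy_module": "my_script",
            "_confingy_init": {"connection_string": "postgresql://user:pass@host:port/db"},
        },
        "query": "SELECT * FROM table",
    },
}

source = to_python_source(serialized)
print(source)
```

A real transpiler would also need to emit the import lines implied by each `_confingy_module` so that the generated file runs on its own.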

Open Source & What's Coming

confingy is now open sourced and installable via PyPI. We welcome feedback and contributions. While it is feature-complete enough to replace our old YAML setup, there are always things to work on. For example, the serialization support can be limiting (e.g. we do not yet support Pydantic). We're also still trying to figure out best practices in this brave new world of code as config. What's the best way to diff changes? How should we handle mutability and immutability?

While configuration is clearly not "solved," and will likely never be, we now (thankfully) spend much less time trying to solve it.

Many thanks to Wei Zhang, Elad Richardson, Pablo Acuaviva, Daniel Mendelevitch, Nasir Khalid and Kamil Sindi for ideas, support and stress-testing of confingy.

If you've found yourself fixing or fighting configuration systems, join us — there's a lot of exciting work ahead.
