Don't Build One AI Pipeline. Build 100s Instead.

A radical new paradigm for developing AI & ML systems that are built to last

AI
Machine Learning
RAG
Gilad Rubin

Nov 25, 2024

In AI development, many teams unknowingly fall into the trap of creating rigid and brittle codebases that are difficult to maintain and adapt over time.

A common telltale sign of this issue is the presence of a single, hard-coded pipeline that results from the development process. This design pattern often leads to significant challenges in AI development:

Common Pipeline Problems
  • Limited Flexibility: Pipelines designed for one use case become hard to adapt when new models, data, or requirements emerge
  • Lost Knowledge with Each Iteration: Insights gained from testing different configurations are often discarded after a better solution has been found, leading to technical debt and wasted effort
  • Slow Iteration: Manual trial-and-error processes replace automated and systematic experimentation, making it difficult to explore different configurations
  • Unintentional Overfitting: Continuous "peeking" at evaluation results increases the risk of over-optimizing only for specific partitions of datasets, missing out on more robust, generalized solutions

As a result, teams waste time on avoidable problems and struggle to keep up with the ever-changing demands of an AI product's lifecycle. Unfortunately, this is a very prevalent issue.

Personally, it took me about a decade to break free from this pattern and begin developing in a new way that finally feels right.

In this article, I'll walk you through these challenges in detail, showing how they manifest in real-world scenarios. Then, I'll introduce a radical new approach for AI workflow development — one that enables flexibility and systematic exploration using an open-source Python framework called Hypster.

A Closer Look at a Typical RAG Pipeline Implementation

Does your AI pipeline look like this?

bm25_retriever = BM25Retriever(top_k=30)  # sparse (keyword-based) retrieval
embedding_retriever = EmbeddingRetriever(model="all-MiniLM-L6-v2", top_k=30)  # dense (vector) retrieval
joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=20)  # merge the two result lists
reranker = CrossEncoderRanker(model="ms-marco-TinyBERT-L-2-v2", top_k=5)  # re-rank the merged candidates
llm = OpenAI(model="gpt-4o", temperature=0.7, max_tokens=500)  # generate the final answer

retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    reranker,
    llm,
]
Note

This is a toy example, not real working code.

At first glance, this code looks fine. It's readable, straightforward, and gets the job done.

However, there's a critical flaw in this design. Can you spot it?

Actually, there are a few flaws in this snippet, most notably the hard-coded strings and "magic numbers". But I'm pointing at something deeper, hiding in plain sight: this code constructs only one pipeline.


Why a Single Pipeline is Never a Good Idea

The unfortunate reality of building a codebase around a single "optimal" pipeline, especially for a system whose structure and hyperparameters are initially unknown, is that it encourages bad coding and scientific practices: commenting out sections, deleting, rewriting, and duplicating code just to test different ideas.

The resulting pipeline code isn't the core issue; it's a symptom of a larger, more fundamental problem. To uncover it, we need to ask ourselves two simple questions:

Critical Questions
  1. How did we arrive at this specific pipeline?
  2. How will we adapt and improve this pipeline over time?

From my experience, the answers to both questions are often disappointing. Let's examine a straightforward, real-life example that illustrates the typical responses to these questions.


Iterating on our RAG pipeline

Let's revisit our (toy) RAG pipeline. Imagine that it is functioning well in production, but we're facing a challenge: users are reporting slow response times. Our task is to reduce latency while maintaining quality.

After some profiling, we've identified that the reranking step is consuming 20% of the processing time. This insight points us towards experimenting with alternatives.

Trial #1: Removing the Reranker

joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=5)  # Changed from 20
# reranker = CrossEncoderRanker(model="ms-marco-TinyBERT-L-2-v2", top_k=5)
llm = OpenAI(model="gpt-4o", temperature=0.7, max_tokens=500)

retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    # reranker,
    llm,
]

Code Changes:

  • Changed the document joiner's top_k from 20 to 5, as the reranker is no longer narrowing down options.
  • Removed the reranker initialization and the item from the pipeline list.

Question: Should we delete the changed top_k value and the removed reranker from the pipeline, or just comment them out?

Experiment Result: Latency is improved, but performance degrades too much.

Trial #2: Switching Reranker Models

joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=20)  # Back to 20

reranker = CrossEncoderRanker(
    # model="ms-marco-TinyBERT-L-2-v2",  # Original
    # model="ms-marco-MiniLM-L-6-v2",    # Faster but worse
    model="ms-marco-MiniLM-L-12-v2",     # Current attempt
    top_k=5,
)

retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    reranker,
    llm,
]

Code Changes:

  • Changed the document joiner's top_k back to 20.
  • Switched the reranker model.
  • Brought the reranker back into the pipeline.

Question: Should we keep the history of previous model names that we tested?

Experiment Result: Performance improves, but latency is still too high for our liking.

Trial #3: Different Reranker Provider

reranker = CohereReranker(model="rerank-multilingual-v3.0", top_k=5)

retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    docs_to_cohere,  # Added this
    reranker,
    llm,
]

Code Changes:

  • Replaced the existing reranker with Cohere's.
  • Introduced a docs_to_cohere step in the pipeline.

Experiment Result: Both performance and latency are satisfactory. Costs have increased, but the trade-off seems worthwhile for now.

Knowledge Loss

Even if everything is stored in version control, logged in your experimentation platform, or kept as comments in your codebase, you're still only preserving snapshots of the individual pipelines you happened to test, not the space of options you explored along the way.

Great! That was a successful run. It took some effort, but now we've gained valuable knowledge from this process. Surely, we can reproduce it, conduct new experiments, and combine different solutions we've discovered along the way. Right?

Well, actually, that's usually not the case. Since we're constantly discarding what didn't make it to the final step, and the final step contains only one pipeline, knowledge is lost along the way. We haven't improved our iteration process at all, our codebase remains mostly rigid, and implementing a change is always a cumbersome process.

This example only scratches the surface of the problem. In larger, more complex codebases, these problems are amplified and compounded, and this can be detrimental to the project's success.


How This Impacts Your Projects

Slow Development

  • Each change requires manual code updates, sometimes in multiple areas of the codebase.
  • The process is inherently error-prone and can lead to significant time waste.
  • Testing new configurations usually means running them manually and waiting for completion to decide on the next step. This limits most experiments to work hours, wasting valuable time.

Suboptimal Performance

  • Overfitting by hyperparameters: When hyperparameters fit specific characteristics of the training data (including noise and outliers), improvements may only apply to the current data partition and may not generalize.
  • Skipping valuable configurations: Some potentially effective configurations are too complex or risky to implement in a brittle codebase, so they remain untested.
  • Partial optimization: Entire sections of the pipeline, particularly preprocessing steps, are frequently left unoptimized.

Lack of Flexibility and Future-Proofing

The landscape keeps shifting while your product is in production:

  • New models emerge with game-changing capabilities.
  • Higher quality training data becomes available.
  • Bug fixes in data processing pipelines alter fundamental assumptions.

Instead of quickly re-evaluating configurations to find new optimal solutions, teams must restart manual trial-and-error. Many potential improvements are left unexplored due to time and resource constraints.


Understanding the Root Cause

Why do we fall into this trap?

I believe it's due to two main reasons:

Complexity Overwhelm

Building real-world AI pipelines involves multiple moving parts, each with its own nuances, configurations, and dependencies. It's incredibly challenging to reach a point where even a single pipeline works. As a result, it's tempting to simply hardcode whatever works and move on.

Missing Tools and Methodologies

Until recently, we lacked proper tools and methodologies for managing complex AI configurations systematically. Most tutorials and solutions focus on single pipelines, as if developers know in advance what needs to go into their pipeline.

Even if someone recognized these bad practices and wanted to do something different, there weren't many satisfying solutions available.


Noteworthy Solutions

Configuration Files

Some teams attempt to address these challenges by using configuration files (YAML/JSON). While this approach mitigates some issues, limitations remain:

  • Configuration files typically store only current settings, not possible options, limiting flexibility.
  • Complex types and conditional logic are difficult to implement, creating tight coupling between configuration files and execution code.
  • This coupling makes it harder for the system to grow in complexity (see the sketch below).
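
As a rough illustration, here is a hypothetical sketch (the file name, keys, and branching code are assumptions, reusing the toy reranker classes from the pipeline above): the YAML file records only the currently chosen values, while the execution code has to encode which keys and combinations are legal.

# config.yaml (hypothetical) -- stores only the *current* choices, not the valid alternatives:
#   reranker: cohere
#   reranker_model: rerank-multilingual-v3.0
#   reranker_top_k: 5

import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# The execution code must know which keys exist and which combinations are valid,
# so every new option touches both the YAML file and this branching logic.
if cfg["reranker"] == "cohere":
    reranker = CohereReranker(model=cfg["reranker_model"], top_k=cfg["reranker_top_k"])
elif cfg["reranker"] == "sentence_transformers":
    reranker = CrossEncoderRanker(model=cfg["reranker_model"], top_k=cfg["reranker_top_k"])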

Hyperparameter Optimization Frameworks

Another approach is using specialized libraries for hyperparameter optimization, such as Optuna, RayTune, or similar frameworks. While these tools excel at large-scale experimentation and automated search, they come with challenges:

  • They are optimized for broad parameter sweeps but become cumbersome when developers need to test specific configurations during day-to-day development.
  • Integrating these frameworks often requires significant architectural changes to your codebase, creating a tight coupling between application logic and the optimization framework (see the sketch below).
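
For example, a minimal Optuna sketch might look like the following (build_pipeline, evaluate, and test_data are hypothetical placeholders): every tunable choice has to be rewritten in terms of the framework's trial API, which is exactly the coupling described above.

import optuna

def objective(trial):
    # Each hyperparameter must be expressed through the trial API,
    # tying the pipeline-building logic to the optimization framework.
    embedder = trial.suggest_categorical("embedder", ["all-MiniLM-L6-v2", "all-mpnet-base-v2"])
    temperature = trial.suggest_float("temperature", 0.0, 1.0)
    pipeline = build_pipeline(embedder=embedder, temperature=temperature)  # hypothetical helper
    return evaluate(pipeline, test_data)  # hypothetical evaluation function

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)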

Moving from a Single AI Pipeline to 100s of Pipelines with Hypster

To address these challenges and limitations, we need a paradigm shift.

After years of contemplation, experimentation, and development, I've concluded that the only sensible approach for AI projects is to transition from building a single pipeline to designing a space of potential pipelines — a collection of possible valid pipeline configurations that can be systematically explored and optimized.

This can be achieved by defining configuration functions (sketched below), which contain:

  • Hyperparameters and their potential values.
  • Conditional dependencies between these hyperparameters.
  • Support for hierarchical and swappable configurations to enable modularity and proper coding practices.
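
To make the idea concrete before introducing any framework, here is a rough, plain-Python sketch reusing the toy components from earlier (all names are illustrative, not a prescribed API): instead of hard-coding one pipeline, the function describes which pipelines are valid.

def build_retrieval_pipeline(reranker_type: str = "sentence_transformers", reranker_top_k: int = 5):
    # The space of valid pipelines, expressed as a function of its hyperparameters
    pipeline = [bm25_retriever, embedding_retriever, joiner]
    if reranker_type == "sentence_transformers":  # conditional dependency
        pipeline.append(CrossEncoderRanker(model="ms-marco-MiniLM-L-12-v2", top_k=reranker_top_k))
    elif reranker_type == "cohere":
        pipeline.extend([docs_to_cohere, CohereReranker(model="rerank-multilingual-v3.0", top_k=reranker_top_k)])
    pipeline.append(llm)
    return pipeline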

In the following section, I'll introduce Hypster, a powerful tool that I've developed specifically for this purpose.

Introducing Hypster

Hypster is a lightweight open-source tool in the Python AI ecosystem that allows developers to define configuration functions and instantiate them into concrete pipelines using an intuitive, Pythonic API.

Key features include:

  • Pythonic API with minimal, intuitive syntax.
  • Hierarchical configurations with nested and swappable components.
  • Built-in type hints and validation using Pydantic.
  • Easy serialization and loading of configurations for portability.
  • Built-in support for hyperparameter optimization.
  • Jupyter widgets integration for interactive parameter selection.

For a deeper dive into the framework itself, see the companion article "Introducing HyPSTER: A Pythonic Framework for Managing Configurations to Build Highly Optimized AI Workflows".


Converting our RAG Pipeline into Configuration Functions

We'll first take a look at how our RAG pipeline can be implemented using Hypster and then cover the benefits of this approach.

Step 1: Defining Separate Reranker Configurations

from hypster import config, HP

@config
def sentence_transformers_reranker(hp: HP):
    model = hp.select([
        "ms-marco-TinyBERT-L-2-v2",
        "ms-marco-MiniLM-L-12-v2",
        "ms-marco-MiniLM-L-6-v2",
    ])
    reranker = CrossEncoderRanker(model=model, top_k=hp.int(5))
    return reranker

sentence_transformers_reranker.save("sentence_transformers_reranker.py")

@config
def cohere_reranker(hp: HP):
    model = hp.select(["rerank-english-v3.0", "rerank-multilingual-v3.0"])
    reranker = CohereReranker(model=model, top_k=hp.int(5))
    return reranker

cohere_reranker.save("cohere_reranker.py")
Note

These configuration functions don't actually create rerankers; they define the possible values and parameters that can be used when instantiating a reranker later.

By separating these configurations, we maintain modularity and can easily reuse them across different pipelines or projects.

Step 2: Configuring the Full Retrieval Pipeline

@config
def retrieval_config(hp: HP):
    retrieval_pipeline = []

    bm25_retriever = BM25Retriever(top_k=hp.int(30))

    embedders = [
        "all-MiniLM-L6-v2",
        "all-mpnet-base-v2",
        "paraphrase-multilingual-MiniLM-L12-v2",
    ]

    embedding_retriever = EmbeddingRetriever(
        model=hp.select(embedders, default="all-MiniLM-L6-v2"),
        top_k=hp.int(30),
    )

    join_modes = [
        "reciprocal_rank_fusion",
        "distribution_based_rank_fusion",
        "merge",
        "concatenate",
    ]

    joiner = DocumentJoiner(
        join_mode=hp.select(join_modes),
        top_k=hp.int(20),
    )

    retrieval_pipeline.extend([
        bm25_retriever,
        embedding_retriever,
        joiner,
    ])

    reranker_type = hp.select(["none", "sentence_transformers", "cohere"])
    match reranker_type:
        case "sentence_transformers":
            reranker = hp.nest("sentence_transformers_reranker.py")
            retrieval_pipeline.append(reranker)
        case "cohere":
            reranker = hp.nest("cohere_reranker.py")
            retrieval_pipeline.extend([docs_to_cohere, reranker])

    llm_model = hp.select(["gpt-4o", "gpt-4o-mini", "o1-mini"])

    llm = OpenAI(
        model=llm_model,
        temperature=hp.number(0.1, min=0, max=1),
        max_tokens=hp.int(500),
    )

    retrieval_pipeline.append(llm)
    return {"retrieval_pipeline": retrieval_pipeline}

Key Components of This Design:

  • Hyperparameters are defined for each component of the pipeline (retrievers, joiner, rerankers, LLMs) along with their potential values.
  • Conditional dependencies determine which reranker configuration is selected.
  • Nested configurations (hp.nest) keep the code modular and reusable across projects.

With this approach, we can define hundreds of potential pipelines by mixing and matching the available configurations. Even ignoring the numeric parameters, the categorical choices alone (3 embedding models × 4 join modes × 6 reranker options × 3 LLMs) yield over 200 distinct pipelines.

For a full working solution, check out Implementing "Modular RAG" with Haystack and Hypster.


Instantiating a Specific Pipeline

Once we've defined the configuration function, the next step is to instantiate a specific pipeline and test it. This is where the power of Hypster comes into play.

# Instantiate a specific pipeline by passing concrete values
results = retrieval_config(values={
    "bm25_retriever.top_k": 30,
    "embedding_retriever.model": "all-mpnet-base-v2",
    "joiner.join_mode": "reciprocal_rank_fusion",
    "reranker_type": "sentence_transformers",
    "reranker.model": "ms-marco-MiniLM-L-12-v2",
    "llm_model": "gpt-4o",
    "llm.temperature": 0.6,
})

pipeline = results["retrieval_pipeline"]

We can perform this process in code by defining our values as shown above. Another option is to use Hypster's native Jupyter UI integration, enabling easy exploration of configurations in a visual and user-friendly way.

from hypster.ui import interactive_config

results = interactive_config(retrieval_config)

Each change to an existing hyperparameter value updates the components and their respective values, based on the conditional dependency structure in the code.


Automated Hyperparameter Optimization

While we can manually define configurations, we can also specify a subspace for automatic exploration. This lets us run searches automatically, and even in parallel, without constant monitoring. Meanwhile, we can continue improving the codebase while the search is running.

# Define the subspace we want to explore
sub_space = {
    "embedding_retriever.model": ["all-MiniLM-L6-v2", "all-mpnet-base-v2"],
    "reranker_type": ["none", "cohere"],
    "reranker.top_k": [5, 10],
    "joiner.join_mode": ["reciprocal_rank_fusion", "merge"],
    "llm.temperature": [0.5, 0.7, 0.9],
}

# "get_combinations" is a placeholder for a function that creates
# a cartesian product of the hyperparameter values defined above
combinations = get_combinations(sub_space)

results = []
for comb in combinations:  # this can be parallelized
    pipeline = retrieval_config(values=comb)
    metrics = evaluate(pipeline["retrieval_pipeline"], test_data)
    results.append({"config": comb, "metrics": metrics})
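
For completeness, a minimal implementation of the "get_combinations" placeholder could look like this, assuming the subspace is a flat dictionary mapping parameter names to lists of candidate values:

from itertools import product

def get_combinations(space: dict) -> list[dict]:
    # Cartesian product of all candidate values, returned as a list of
    # {parameter name: value} dictionaries that can be passed as `values=`
    keys = list(space.keys())
    return [dict(zip(keys, combo)) for combo in product(*space.values())]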

In the future, Hypster will support native integrations with optimization libraries such as Optuna to leverage intelligent optimization algorithms.


Summary — Transforming AI Development

This approach fundamentally changes how we develop AI pipelines. Instead of manually writing code for every experiment or model update, we define configuration functions that allow us to explore and instantiate hundreds of different pipelines from the same codebase.

Problems Solved:

  • Improved code quality through cleaner, modular code with reusable configuration functions.
  • Knowledge accumulation where insights from every experiment are preserved and ready to be reused or combined.
  • Easily manageable configurations, including nested levels, making it simple to track changes.

Benefits:

  • Accelerated development by switching between configurations, testing new ideas, and iterating without rewriting code.
  • End-to-end hyperparameter optimization to systematically find the best-performing pipelines.
  • Flexibility and future-proofing as new models, frameworks, or requirements emerge.

Now, going back to the two questions we asked earlier, the answers are straightforward and satisfying:

  1. How did we arrive at this specific pipeline? We arrived at it by thoroughly examining the space of configurations defined in our configuration functions.
  2. How will we adapt and improve this pipeline over time? We'll continue to adapt and improve by adding new components and re-evaluating experiments based on new data and requirements.

Outro

Making this knowledge accessible is important to me, so your feedback is valuable. If you have any questions or comments about Hypster or this approach in general, please feel free to reach out.

I also offer consultation and freelance services to companies looking for a structured, common-sense approach to solving business problems using state-of-the-art Generative AI and Machine Learning tools.

