In AI development, many teams unknowingly fall into the trap of creating rigid and brittle codebases that are difficult to maintain and adapt over time.
A common telltale sign of this issue is a single, hard-coded pipeline emerging as the end product of the development process. This design pattern often leads to significant challenges in AI development:
- Limited Flexibility: Pipelines designed for one use case become hard to adapt when new models, data, or requirements emerge
- Lost Knowledge with Each Iteration: Insights gained from testing different configurations are often discarded after a better solution has been found, leading to technical debt and wasted effort
- Slow Iteration: Manual trial-and-error processes replace automated and systematic experimentation, making it difficult to explore different configurations
- Unintentional Overfitting: Continuous "peeking" at evaluation results increases the risk of over-optimizing only for specific partitions of datasets, missing out on more robust, generalized solutions
As a result, teams waste time on avoidable problems and struggle to adapt to the ever-changing demands of an AI product's lifecycle. Unfortunately, this is a very prevalent issue.
Personally, it took me about a decade to break free from this pattern and begin developing in a new way that finally feels right.
In this article, I'll walk you through these challenges in detail, showing how they manifest in real-world scenarios. Then, I'll introduce a radical new approach for AI workflow development — one that enables flexibility and systematic exploration using an open-source Python framework called Hypster.
A Closer Look at a Typical RAG Pipeline Implementation
Does your AI pipeline look like this?
bm25_retriever = BM25Retriever(top_k=30)
embedding_retriever = EmbeddingRetriever(model="all-MiniLM-L6-v2", top_k=30)
joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=20)
reranker = CrossEncoderRanker(model="ms-marco-TinyBERT-L-2-v2", top_k=5)
llm = OpenAI(model="gpt-4o", temperature=0.7, max_tokens=500)
retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    reranker,
    llm,
]
This is a toy example, not real working code.
At first glance, this code looks fine. It's readable, straightforward, and gets the job done.
However, there's a critical flaw in this design. Can you spot it?
Actually, there are a couple of flaws in this snippet, most notably the hard-coded strings and "magic numbers". However, I'm pointing at something deeper, hiding in plain sight: this code constructs only one pipeline.
Why a Single Pipeline is Never a Good Idea
The unfortunate reality of building a codebase around one optimal pipeline, particularly for a system whose structure and hyperparameters are initially unknown, is that it encourages bad coding and scientific practices: commenting out sections, deleting, rewriting, and duplicating code just to test different ideas.
The final resulting pipeline code isn't the core issue; it's a symptom of a larger, more fundamental problem. To uncover it, we need to ask ourselves two simple questions:
- How did we arrive at this specific pipeline?
- How will we adapt and improve this pipeline over time?
From my experience, the answers to both questions are often disappointing. Let's examine a straightforward, real-life example that illustrates the typical responses to these questions.
Iterating on our RAG pipeline
Let's revisit our (toy) RAG pipeline. Imagine that it is functioning well in production, but we're facing a challenge: users are reporting slow response times. Our task is to reduce latency while maintaining quality.
After some profiling, we've identified that the reranking step is consuming 20% of the processing time. This insight points us towards experimenting with alternatives.
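As a side note, here is a minimal sketch of what such a stage-by-stage measurement might look like; the run() interface is hypothetical and stands in for whatever your components actually expose:
import time

def profile_pipeline(pipeline, query):
    # Time each stage of a sequential pipeline (toy sketch).
    data, timings = query, {}
    for component in pipeline:
        start = time.perf_counter()
        data = component.run(data)  # hypothetical component interface
        timings[type(component).__name__] = time.perf_counter() - start
    total = sum(timings.values())
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.2f}s ({seconds / total:.0%})")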
Trial #1: Removing the Reranker
joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=5) # Changed from 20
# reranker = CrossEncoderRanker(model="ms-marco-TinyBERT-L-2-v2", top_k=5)
llm = OpenAI(model="gpt-4o", temperature=0.7, max_tokens=500)
retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    # reranker,
    llm,
]
Code Changes:
- Changed the document joiner's top_k from 20 to 5, as the reranker is no longer narrowing down options.
- Removed the reranker initialization and its entry from the pipeline list.
Question: Should we delete the changed top_k value and the removed reranker from the pipeline, or just comment them out?
Experiment Result: Latency is improved, but performance degrades too much.
Trial #2: Switching Reranker Models
reranker = CrossEncoderRanker(
    # model="cross-encoder/ms-marco-TinyBERT-L-2-v2",  # Original
    # model="cross-encoder/ms-marco-MiniLM-L-6-v2",    # Faster but worse
    model="cross-encoder/ms-marco-MiniLM-L-12-v2",     # Current attempt
    top_k=5,
)
retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    reranker,
    llm,
]
Code Changes:
- Changed the document joiner's top_k back to 20.
- Switched the name of the reranker model.
- Brought the reranker back into the pipeline.
Question: Should we keep the history of previous model names that we tested?
Experiment Result: Performance improves, but latency is still too high for our liking.
Trial #3: Different Reranker Provider
reranker = CohereReranker(model="rerank-multilingual-v3.0", top_k=5)
retrieval_pipeline = [
    bm25_retriever,
    embedding_retriever,
    joiner,
    docs_to_cohere,  # Added this
    reranker,
    llm,
]
Code Changes:
- Replaced the existing reranker with Cohere's.
- Introduced a docs_to_cohere step in the pipeline.
Experiment Result: Both performance and latency are satisfactory. Costs have increased, but the trade-off seems worthwhile for now.
Even if everything is stored in version control, logged in your experimentation platform, or kept as comments in your codebase, you're still only preserving snapshots of the individual pipelines that were tested. Knowledge is lost along the way.
Great! That was a successful run. It took some effort, but now we've gained valuable knowledge from this process. Surely, we can reproduce it, conduct new experiments, and combine different solutions we've discovered along the way. Right?
Well, actually, that's usually not the case. Since we're constantly discarding what didn't make it to the final step, and the final step contains only one pipeline, knowledge is lost along the way. We haven't improved our iteration process at all, our codebase remains mostly rigid, and implementing a change is always a cumbersome process.
This example only scratches the surface of the problem. In larger, more complex codebases, these problems are amplified and compounded, and this can be detrimental to the project's success.
How This Impacts Your Projects
Slow Development
- Each change requires manual code updates, sometimes in multiple areas of the codebase.
- The process is inherently error-prone and can lead to significant time waste.
- Testing new configurations usually means running them manually and waiting for completion to decide on the next step. This limits most experiments to work hours, wasting valuable time.
Suboptimal Performance
- Hyperparameter overfitting: when hyperparameters fit specific characteristics of the data they were tuned on (including noise and outliers), improvements may apply only to that data partition and fail to generalize.
- Skipping valuable configurations: Some potentially effective configurations are too complex or risky to implement in a brittle codebase, so they remain untested.
- Partial optimization: Entire sections of the pipeline, particularly preprocessing steps, are frequently left unoptimized.
Lack of Flexibility and Future-Proofing
- New models emerge with game-changing capabilities.
- Higher quality training data becomes available.
- Bug fixes in data processing pipelines alter fundamental assumptions.
When any of these changes arrive, teams must restart the manual trial-and-error process instead of quickly re-evaluating existing configurations to find a new optimum. Many potential improvements are left unexplored due to time and resource constraints.
Understanding the Root Cause
Why do we fall into this trap?
I believe it's due to two main reasons:
Complexity Overwhelm
Building real-world AI pipelines involves multiple moving parts, each with its own nuances, configurations, and dependencies. It's incredibly challenging to reach a point where even a single pipeline works. As a result, it's tempting to simply hardcode whatever works and move on.
Missing Tools and Methodologies
Until recently, we lacked proper tools and methodologies for managing complex AI configurations systematically. Most tutorials and solutions focus on single pipelines, as if developers know in advance what needs to go into their pipeline.
Even if someone recognized these bad practices and wanted to do something different, there weren't many satisfying solutions available.
Noteworthy Solutions
Configuration Files
Some teams attempt to address these challenges by using configuration files (YAML/JSON). While this approach mitigates some issues, limitations remain:
- Configuration files typically store only current settings, not possible options, limiting flexibility.
- Complex types and conditional logic are difficult to implement, creating tight coupling between configuration files and execution code.
- This coupling makes it harder for the system to grow in complexity.
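To make the first limitation concrete, here is a hypothetical config.yaml for the toy pipeline above, inlined for brevity; it captures the one pipeline currently in use, not the alternatives that were considered:
import yaml

CONFIG = """
embedding_model: all-MiniLM-L6-v2
join_mode: reciprocal_rank_fusion
reranker_model: ms-marco-TinyBERT-L-2-v2
reranker_top_k: 5
llm_model: gpt-4o
temperature: 0.7
"""

settings = yaml.safe_load(CONFIG)
# The file records the current choice for each knob, but not the other
# candidates, the conditions under which they apply, or how to swap the
# reranker provider without touching the execution code.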
Hyperparameter Optimization Frameworks
Another approach is using specialized libraries for hyperparameter optimization, such as Optuna, RayTune, or similar frameworks. While these tools excel at large-scale experimentation and automated search, they come with challenges:
- They are optimized for broad parameter sweeps but become cumbersome when developers need to test specific configurations during day-to-day development.
- Integrating these frameworks often requires significant architectural changes to your codebase, creating a tight coupling between application logic and the optimization framework.
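For illustration, wiring the toy pipeline into Optuna might look roughly like this; build_pipeline and evaluate are hypothetical helpers, and the point is that the search space ends up embedded in the objective function itself:
import optuna

def objective(trial: optuna.Trial) -> float:
    # The search space lives inside the objective, entangled with the code
    # that builds and evaluates the pipeline.
    reranker = trial.suggest_categorical(
        "reranker", ["none", "minilm-l6", "minilm-l12", "cohere"]
    )
    joiner_top_k = trial.suggest_int("joiner_top_k", 5, 20)
    pipeline = build_pipeline(reranker=reranker, joiner_top_k=joiner_top_k)  # hypothetical
    return evaluate(pipeline)  # hypothetical

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
Running one specific, hand-picked configuration through this setup, or reusing the same search space from application code, requires extra plumbing.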
Moving from a Single AI Pipeline to 100s of Pipelines with Hypster
To address these challenges and limitations, we need a paradigm shift.
After years of contemplation, experimentation, and development, I've concluded that the only sensible approach for AI projects is to transition from building a single pipeline to designing a space of potential pipelines — a collection of possible valid pipeline configurations that can be systematically explored and optimized.
This can be achieved by defining configuration functions, which contain:
- Hyperparameters and their potential values.
- Conditional dependencies between these hyperparameters.
- Support for hierarchical and swappable configurations to enable modularity and proper coding practices.
In the following section, I'll introduce Hypster, a powerful tool that I've developed specifically for this purpose.
Introducing Hypster
Hypster is a lightweight open-source tool in the Python AI ecosystem that allows developers to define configuration functions and instantiate them into concrete pipelines using an intuitive, Pythonic API.
Key features include:
- Pythonic API with minimal, intuitive syntax.
- Hierarchical configurations with nested and swappable components.
- Built-in type hints and validation using Pydantic.
- Easy serialization and loading of configurations for portability.
- Built-in support for hyperparameter optimization.
- Jupyter widgets integration for interactive parameter selection.
Converting our RAG Pipeline into Configuration Functions
We'll first take a look at how our RAG pipeline can be implemented using Hypster and then cover the benefits of this approach.
Step 1: Defining Separate Reranker Configurations
from hypster import config, HP
@config
def sentence_transformers_reranker(hp: HP):
    model = hp.select([
        "ms-marco-TinyBERT-L-2-v2",
        "cross-encoder/ms-marco-MiniLM-L-12-v2",
        "cross-encoder/ms-marco-MiniLM-L-6-v2",
    ])
    reranker = CrossEncoderRanker(model=model, top_k=hp.int(5))
    return reranker

sentence_transformers_reranker.save("sentence_transformers_reranker.py")

@config
def cohere_reranker(hp: HP):
    model = hp.select(["rerank-english-v3.0", "rerank-multilingual-v3.0"])
    reranker = CohereReranker(model=model, top_k=hp.int(5))
    return reranker

cohere_reranker.save("cohere_reranker.py")
These configuration functions don't actually create rerankers; they define the possible values and parameters that can be used when instantiating a reranker later.
By separating these configurations, we maintain modularity and can easily reuse them across different pipelines or projects.
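Because each reranker lives in its own configuration function, it can also be instantiated on its own, for example in a standalone evaluation script. A minimal sketch, assuming the same values-dict calling convention used for the full pipeline later in this article:
# Instantiate only the Cohere reranker, outside of any pipeline
cohere = cohere_reranker(values={"model": "rerank-multilingual-v3.0"})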
Step 2: Configuring the Full Retrieval Pipeline
@config
def retrieval_config(hp: HP):
    retrieval_pipeline = []

    bm25_retriever = BM25Retriever(top_k=hp.int(30))

    embedders = [
        "all-MiniLM-L6-v2",
        "all-mpnet-base-v2",
        "paraphrase-multilingual-MiniLM-L12-v2",
    ]
    embedding_retriever = EmbeddingRetriever(
        model=hp.select(embedders, default="all-MiniLM-L6-v2"),
        top_k=hp.int(30),
    )

    join_modes = [
        "reciprocal_rank_fusion",
        "distribution_based_rank_fusion",
        "merge",
        "concatenate",
    ]
    joiner = DocumentJoiner(
        join_mode=hp.select(join_modes),
        top_k=hp.int(20),
    )

    retrieval_pipeline.extend([
        bm25_retriever,
        embedding_retriever,
        joiner,
    ])

    reranker_type = hp.select(["none", "sentence_transformers", "cohere"])
    match reranker_type:
        case "sentence_transformers":
            reranker = hp.nest("sentence_transformers_reranker.py")
            retrieval_pipeline.append(reranker)
        case "cohere":
            reranker = hp.nest("cohere_reranker.py")
            retrieval_pipeline.extend([docs_to_cohere, reranker])

    llm_model = hp.select(["gpt-4o", "gpt-4o-mini", "o1-mini"])
    llm = OpenAI(
        model=llm_model,
        temperature=hp.number(0.1, min=0, max=1),
        max_tokens=hp.int(500),
    )
    retrieval_pipeline.append(llm)

    return {"retrieval_pipeline": retrieval_pipeline}
Key Components of This Design:
- Hyperparameters are defined for each component of the pipeline (retrievers, joiner, rerankers, LLMs) along with their potential values.
- Conditional dependencies determine which reranker configuration is selected.
- Nested configurations (hp.nest) keep the code modular and reusable across projects.
With this approach, we can define hundreds of potential pipelines by mixing and matching the available configurations.
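Counting only the categorical choices defined above, and ignoring the numeric hyperparameters, a quick back-of-the-envelope calculation already gives a few hundred distinct pipelines:
# 3 embedding models x 4 join modes
# x (1 "none" + 3 sentence-transformer rerankers + 2 Cohere rerankers)
# x 3 LLMs
n_pipelines = 3 * 4 * (1 + 3 + 2) * 3
print(n_pipelines)  # 216, before varying top_k, temperature, or max_tokens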
For a full working solution for modular RAG with Haystack and Hypster, check out Implementing "Modular RAG" with Haystack and Hypster.
Instantiating a Specific Pipeline
Once we've defined the configuration function, the next step is to instantiate a specific pipeline and test it. This is where the power of Hypster comes into play.
# Instantiate a specific pipeline by passing concrete values
results = retrieval_config(values={
    "bm25_retriever.top_k": 30,
    "embedding_retriever.model": "all-mpnet-base-v2",
    "joiner.join_mode": "reciprocal_rank_fusion",
    "reranker_type": "sentence_transformers",
    "reranker.model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
    "llm_model": "gpt-4o",
    "llm.temperature": 0.6,
})
pipeline = results["retrieval_pipeline"]
We can perform this process in code by defining our values as shown above. Another option is to use Hypster's native Jupyter UI integration, enabling easy exploration of configurations in a visual and user-friendly way.
from hypster.ui import interactive_config
results = interactive_config(retrieval_config)
Each change to an existing hyperparameter value updates the components and their respective values, based on the conditional dependency structure in the code.
Automated Hyperparameter Optimization
While we can define configurations manually, we can also specify a subspace for automatic exploration. This lets us run many configurations automatically, in parallel, and without constant monitoring, and we can keep improving the codebase while the search runs.
# Define the subspace we want to explore
sub_space = {
    "embedding_retriever.model": ["all-MiniLM-L6-v2", "all-mpnet-base-v2"],
    "reranker_type": ["none", "cohere"],
    "reranker.top_k": [5, 10],
    "joiner.join_mode": ["reciprocal_rank_fusion", "merge"],
    "llm.temperature": [0.5, 0.7, 0.9],
}
# "get_combinations" is a placeholder for a function that creates
# a cartesian product of the hyperparameter values defined above
combinations = get_combinations(sub_space)
results = []
for comb in combinations:  # this can be parallelized
    pipeline = retrieval_config(values=comb)
    metrics = evaluate(pipeline["retrieval_pipeline"], test_data)
    results.append({"config": comb, "metrics": metrics})
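A minimal sketch of such a helper, using itertools.product to enumerate the grid:
from itertools import product

def get_combinations(sub_space: dict) -> list[dict]:
    # Expand a dict of lists into one dict per point in the cartesian product.
    keys = list(sub_space)
    return [dict(zip(keys, values)) for values in product(*sub_space.values())]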
In the future, Hypster will support native integrations with optimization libraries such as Optuna to leverage intelligent optimization algorithms.
Summary — Transforming AI Development
This approach fundamentally changes how we approach AI pipeline development. Instead of manually writing code for every experiment or model update, we can define configuration functions that allow us to explore and instantiate hundreds of different pipelines from the same codebase.
Problems Solved:
- Improved code quality through cleaner, modular code with reusable configuration functions.
- Knowledge accumulation where insights from every experiment are preserved and ready to be reused or combined.
- Easily manageable configurations, including nested levels, making it simple to track changes.
Benefits:
- Accelerated development by switching between configurations, testing new ideas, and iterating without rewriting code.
- End-to-end hyperparameter optimization to systematically find the best-performing pipelines.
- Flexibility and future-proofing as new models, frameworks, or requirements emerge.
Now, going back to the two questions that were asked earlier, the answers are straightforward and satisfying:
- How did we arrive at this specific pipeline? We arrived at it by thoroughly examining the space of configurations defined in our configuration functions.
- How will we adapt and improve this pipeline over time? We'll continue to adapt and improve by adding new components and re-evaluating experiments based on new data and requirements.
Outro
Making this knowledge accessible is important to me, so your input is valuable. If you have any questions or comments about Hypster or this approach in general, please feel free to reach out.
I also offer consultation and freelance services to companies looking for a structured, common-sense approach to solving business problems using state-of-the-art Generative AI and Machine Learning tools.