The Tools Landscape for LLM Pipelines Orchestration (Part 1)


Micro-orchestration VS Macro-orchestration

DAMIEN BENVENISTE

DEC 2
Micro-Orchestration
  Prompt Management
  Input preprocessing and output postprocessing
  Handling of model-specific parameters and configurations
  Chaining of multiple LLM calls within a single logical operation
  Integration of external tools or APIs at a task-specific level
  Tracing and Logging

Macro-Orchestration
  Complex Graphical Applications
  Stateful Application
  Agentic Design

For a long time, I was in love with LangChain, mostly because the documentation was structured to educate users about LLM pipeline orchestration and showcased how the team approached building a solution for implementing those pipelines. To some extent, all the existing frameworks take their own opinionated approach to addressing the complexities of LLM pipeline orchestration.

Getting a wide overview of the different capabilities provided by those frameworks is a real learning experience in terms of what it means to build LLM applications, what the typical difficulties are, and how to address them. There are many overlaps in the capabilities of the different frameworks, but I tend to separate them by their specialties:

Micro-orchestration: I refer to Micro-orchestration as the fine-grained coordination and management of individual LLM interactions and related processes. It is more about the granular details of how data flows into, through, and out of an LLM within a single task or a small set of related tasks. It involves things like prompt management, input and output processing, and chaining of LLM calls (detailed in the Micro-Orchestration section below).

Macro-orchestration: This is more about the high-level design, coordination, and management of complex workflows that may incorporate multiple LLM interactions, as well as other AI and non-AI components. It focuses on the overall structure and flow of larger systems or applications.

Agentic Design Frameworks: These frameworks focus on creating and managing autonomous or semi-autonomous AI agents that can perform complex tasks, often involving multiple steps, decision-making, and interaction with other agents or systems.

Optimizer frameworks: These frameworks use algorithmic approaches, often inspired by techniques like backpropagation, to optimize prompts, outputs, and overall system performance in LLM applications. The optimization process is typically guided by specific performance metrics or objectives.

As time went on, I realized it tends to be easier to implement the different utilities provided by micro-orchestration frameworks myself. They tend to over-complicate things, and it can take longer to debug those frameworks for a custom use case than to implement everything from scratch using the underlying APIs. However, it is important not to overlook the capabilities they provide for tracing and logging the different LLM calls.
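For a sense of what that entails, here is a minimal sketch (my own, not any framework's actual implementation) of the kind of call-level tracing those frameworks give you out of the box: a decorator that records the prompt, the response, and the latency of each LLM call.

import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_trace")

def trace_llm_call(func):
    """Log the prompt, a response preview, and the latency of an LLM call."""
    @wraps(func)
    def wrapper(prompt: str, **kwargs):
        start = time.perf_counter()
        response = func(prompt, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info(
            "LLM call | prompt=%r | response=%r | latency=%.2fs",
            prompt[:80], str(response)[:80], elapsed,
        )
        return response
    return wrapper

@trace_llm_call
def call_llm(prompt: str) -> str:
    # Placeholder for a real API call (OpenAI, Anthropic, etc.).
    return "Why was the cat sitting on the computer? To keep an eye on the mouse!"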

I believe, however, that it is critical to look at the macro-orchestration frameworks more closely, as they provide a higher level of control that is fundamental for building large applications. Nevertheless, let's review the utilities provided by micro- and macro-orchestration frameworks!

Micro-Orchestration
I refer to Micro-orchestration in LLM pipelines as the fine-grained
coordination and management of individual LLM interactions and related
processes. It is more about the granular details of how data flows into,
through, and out of language models within a single task or a small set of
closely related tasks. It can involve things like:

Prompt Management

Input preprocessing and output postprocessing

Data connection

Handling of model-specific parameters and configurations

Chaining of multiple LLM calls within a single logical operation

Integration of external tools or APIs at a task-specific level

The best examples of that are LangChain, LlamaIndex, Haystack, Semantic Kernel, and AdalFlow.

Prompt Management

All those frameworks, for better or worse, have a way to structure the prompt inputs to a model. For example, in LangChain, we can wrap a string with the PromptTemplate class:
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
"Tell me a joke about {topic}"
)

prompt_template.invoke({"topic": "cats"})

> StringPromptValue(text='Tell me a joke about cats')

AdalFlow and Haystack, on the other hand, use the Jinja2 package as the templating engine:

from jinja2 import Template

prompt_template = Template("Tell me a joke about {{ topic }}")


prompt_template.render(topic="cats")

> 'Tell me a joke about cats'

This may seem unnecessary in some cases, as we can do pretty much the
same thing with the default Python string:

prompt = "Tell me a joke about {topic}"


prompt.format(topic="cats")

> 'Tell me a joke about cats'

However, this can help with maintenance and safer handling of user inputs, as it allows enforcing that all the required variables are provided. Take, for example, how we create messages in Haystack:

from haystack.dataclasses import ChatMessage


ChatMessage.from_user("Tell me a joke about {topic}")

> ChatMessage(content='Tell me a joke about {topic}', role=<ChatRole.USER: 'user'>, name=None, meta={})

It is a Python data class that provides a more robust object for validating the text input than the simpler:

message = {
"content": "Tell me a joke about {topic}",
"role": "user"
}

For example, in LangChain, we can create a ChatPromptTemplate object that will parse all the information:

from langchain_core.prompts import ChatPromptTemplate

messages = [
    ("system", "You are an AI assistant."),
    ("user", "Tell me a joke about {topic}"),
]
prompt = ChatPromptTemplate.from_messages(messages)

> ChatPromptTemplate(input_variables=['topic'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are an AI assistant.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['topic'], template='Tell me a joke about {topic}'))])

And it becomes easier to manipulate the underlying data. For example, I can
more easily access the input variables:

prompt.input_variables
> ['topic']

Also, the class is going to throw an error if the wrong role is injected:

messages = [
    ("system", "You are an AI assistant."),
    ("wrong_role", "Tell me a joke about {topic}"),
]
prompt = ChatPromptTemplate.from_messages(messages)

Although it is not groundbreaking, it provides intermediate checks across the code for data validation, so bugs are easier to detect.
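For instance, here is a minimal sketch of catching that validation failure early (the exact exception type and message may vary across LangChain versions):

from langchain_core.prompts import ChatPromptTemplate

# Catching the validation error at prompt-construction time, long before any
# malformed message list reaches a model API.
try:
    ChatPromptTemplate.from_messages([
        ("system", "You are an AI assistant."),
        ("wrong_role", "Tell me a joke about {topic}"),
    ])
except ValueError as err:
    print(f"Invalid prompt definition: {err}")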

In most cases, this allows for better integration of the prompting aspect with the rest of the software. For example, it is used in LangChain to integrate with the other components, such as models:

from langchain_openai import ChatOpenAI

model = ChatOpenAI()
chain = prompt | model
chain.invoke('cat')

> AIMessage(content="Sure, here's a cat joke for you:\n\nWhy was the cat sitting on the computer?\n\nBecause it wanted to keep an eye on the mouse!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 23, 'total_tokens': 53, 'completion_tokens_details': {'audio_tokens': 0, 'reasoning_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-040b7a4e-472a-45...6dccf689cf74-0', usage_metadata={'input_tokens': 23, 'output_tokens': 30, 'total_tokens': 53})

This provides a shorthand notation for injecting prompts into a model in a controlled manner.
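Under the hood, the pipe syntax is roughly equivalent to running the two steps by hand, which is a useful mental model (a small sketch, not LangChain's actual internals):

# Roughly what `(prompt | model).invoke('cat')` does: render the prompt first,
# then pass the resulting messages to the model.
rendered = prompt.invoke({"topic": "cat"})
response = model.invoke(rendered)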

Having more control over the prompt object allows the implementation of prompt-specific operations. For example, here is how we can build a few-shot example prompt in LangChain:

from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Define the example template
example_prompt = PromptTemplate.from_template(
    "Question: {question}\n{answer}"
)

# Examples
examples = [
    {"question": "What's 2+2?", "answer": "2+2 = 4"},
    {"question": "What's 3+3?", "answer": "3+3 = 6"}
]

# Build the prompt
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)

prompt.invoke({"input": "What's 5+2?"}).to_string()

> Question: What's 2+2?
2+2 = 4

Question: What's 3+3?
3+3 = 6

Question: What's 5+2?

And here is how you would do the same thing in Jinja2:

from jinja2 import Template

# Define the example template
example_template = Template("Question: {{ question }}\n{{ answer }}")

# Define the full prompt template
prompt_template = Template(
    """{% for example in examples %}
{{ example_template.render(question=example.question, answer=example.answer) }}
{% endfor %}
Question: {{ input }}"""
)

# Render the prompt
prompt = prompt_template.render(
    examples=examples,
    example_template=example_template,
    input="What's 5+2?"
)

Input preprocessing and output postprocessing


Another important aspect of micro-orchestration is the set of utility functions
available to preprocess and post-process the data coming in and out of
models. Most frameworks provide functionalities to load local data:

# LlamaIndex
from llama_index.core import SimpleDirectoryReader
loader = SimpleDirectoryReader("./book")
documents = loader.load_data()

# LangChain
from langchain.document_loaders import DirectoryLoader
loader = DirectoryLoader("./book")
documents = loader.load()

# Haystack
from haystack.components.converters import TextFileToDocument
from pathlib import Path
text_converter = TextFileToDocument()
documents = text_converter.run(
sources=[str(p) for p in Path("./book").glob("*.txt")]
)

> {'documents': [Document(id=cdd554d8c6fb6987d37481b471114eadce6457a2ced36dbdc821d8f0dfdb4b32, content: 'Chapter I.]

It is a truth universally acknowledged, that a single man in possession of a goo...', meta: {'file_path': 'book/pride-and-prejudice.txt'})]}

Those frameworks provide support for a wide variety of data types and file extensions, such as .txt, .pdf, .html, .md, .json, .csv, .docx, .xlsx, .pptx, …, and they make it easy to inject data into the application. All those frameworks maintain a framework-specific Document class to carry text data and its metadata around the pipeline. They all also provide text splitter capabilities to split the text into smaller, manageable chunks that can be passed to LLMs:

# LangChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.split_documents(documents)

# LlamaIndex
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(chunk_size=1200, chunk_overlap=100)
nodes = splitter.get_nodes_from_documents(documents)

# Haystack
from haystack.components.preprocessors import DocumentSplitter
splitter = DocumentSplitter(split_by="sentence", split_length=3)
docs = splitter.run(documents['documents'])

# AdalFlow
from adalflow.components.data_process.text_splitter import TextSplitter
splitter = TextSplitter(split_by="word", chunk_size=50, chunk_overlap=10)
docs = splitter.call(documents=docs)

None of those methods are hard to implement, but they are often useful
utilities that are worth using.
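As a rough illustration of that point, here is a minimal, framework-free sketch of a word-level splitter with overlap (much simpler than what the frameworks actually ship, which add sentence awareness and metadata handling):

# A minimal sketch of a word-level text splitter with overlap.
def split_text(text: str, chunk_size: int = 50, chunk_overlap: int = 10) -> list[str]:
    words = text.split()
    step = max(chunk_size - chunk_overlap, 1)
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), step)
    ]

chunks = split_text(open("./book/pride-and-prejudice.txt").read())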

Post-processors can be very useful for converting the free-form text output of LLMs into structured data that can be used programmatically. All those frameworks contain multiple types of parsers. Here is, for example, how we can parse a JSON-formatted string into a Pydantic model in LangChain and LlamaIndex:
from pydantic import BaseModel, Field
from typing import List

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

json_str = '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'

# Langchain
from langchain_core.output_parsers import PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=Actor)
parser.parse(json_str)

# llamaindex
from llama_index.core.output_parsers import PydanticOutputParser
parser = PydanticOutputParser(output_cls=Actor)
parsed = parser.parse(json_str)

> Actor(name='Tom Hanks', film_names=['Forrest Gump'])
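These parsers also pair naturally with prompt construction. In LangChain, for instance, the parser can generate format instructions that you inject back into the prompt so the model knows the exact JSON schema to produce (a small sketch reusing the Actor model above):

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate

parser = PydanticOutputParser(pydantic_object=Actor)

# The format instructions describe the expected JSON schema for the Actor model.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)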

In LangChain, we can even use the help of another LLM to correct the format in case the previous one misformatted the output. For example, the following is not a valid JSON string:

misformatted = "{'name': 'Tom Hanks', 'film_names': 'Forrest Gump']"

But we can create a new parser to reformat the output correctly:

from langchain.output_parsers import OutputFixingParser

new_parser = OutputFixingParser.from_llm(
    parser=parser, llm=ChatOpenAI()
)
new_parser.parse(misformatted)

> Actor(name='Tom Hanks', film_names=['Forrest Gump'])

Another useful post-processing step is reranking the documents coming from a datastore retrieval. Here is how we can rerank documents in Haystack to induce more diversity in the returned documents based on a specific query:

from haystack.components.rankers import SentenceTransformersDiversityRanker

ranker = SentenceTransformersDiversityRanker(
    model="sentence-transformers/all-MiniLM-L6-v2",
    similarity="cosine"
)
ranker.warm_up()

query = "How can I maintain physical fitness?"
docs = ranker.run(query=query, documents=docs['documents'])

Handling of model-specific parameters and configurations

One important aspect of those frameworks is to abstract away the specifics of the third-party APIs or models you choose to build your pipelines with. The way models are used is standardized across the different APIs. For example, in LangChain, we can instantiate an LLM object interacting with the OpenAI API:
from langchain_openai import ChatOpenAI

# OpenAI model
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
)

but we can do the same thing for a local model when using Llama.cpp:

from langchain_community.llms import LlamaCpp

# Local model using llama.cpp
llm = LlamaCpp(
    model_path="./models/mistral-7b.gguf",
    temperature=0.7,
    max_tokens=500,
    n_ctx=2048,
    n_gpu_layers=1  # Number of layers to offload to GPU
)

As far as the rest of the code is concerned, we just use an LLM object that is independent of the underlying model, and we can run predictions without thinking about the specific API:

chain = prompt | llm
chain.invoke('cat')

That is where the value of those frameworks becomes interesting. When we build pipelines, we need to integrate multiple tools together, such as LLM APIs or datastores, and we want to use different LLMs or tools for different cases without needing to adapt the code for it. So, those frameworks provide a unified platform that takes away the complexity of integration. Most of those frameworks support the following LLM providers:

Amazon Bedrock
Anthropic

Azure OpenAI

Cohere

Google Gemini

HuggingFace

Llama.cpp

Mistral

Nvidia

Ollama

OpenAI

Sagemaker

VertexAI

Those integrations handle the model-specific configurations when the models are instantiated.
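To illustrate how that uniform interface pays off, here is a hypothetical helper (make_llm is my own illustration, not part of LangChain) that swaps providers behind a single entry point while the rest of the pipeline stays unchanged:

from langchain_openai import ChatOpenAI
from langchain_community.llms import LlamaCpp

# Hypothetical factory: the provider-specific configuration lives here, and the
# rest of the pipeline only ever sees a generic LLM object.
def make_llm(provider: str):
    if provider == "openai":
        return ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
    if provider == "llamacpp":
        return LlamaCpp(model_path="./models/mistral-7b.gguf", temperature=0.7)
    raise ValueError(f"Unknown provider: {provider}")

llm = make_llm("openai")
chain = prompt | llm  # identical regardless of the provider behind `llm`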

Chaining of multiple LLM calls within a single logical operation...
