An empty workflow, with some payload. The response will be the same as the payload.
Request:
POST /api/jobs
{
"_workflow": {},
"foo": 555,
"bar": [ 1, 2 ]
}
Response:
{
"foo": 555,
"bar": [ 1, 2 ]
}
A basic workflow, without functions yet. "xin" is a JMESPath expression that transforms the input for the function to invoke. There's no function yet, so the response will be the same as the transformed input.
Request:
POST /api/jobs
{
"_workflow": {
"steps": [
{
"xin": "{ field1: state.foo, field2: state.bar }"
}
]
},
"foo": 555,
"bar": [ 1, 2 ]
}
What happened?
- The input is transformed using the JMESPath expression in "xin".
- The expression is creating a new object, see leading '{' and trailing '}'.
- The object contains two fields: field1 and field2. field1 is set to the value of state.foo, which is 555. field2 is set to the value of state.bar, which is [ 1, 2 ].
Note the use of the state. prefix, which references the input payload. While running the workflow, the orchestrator
maintains an internal context object. The context object contains multiple fields, including:
- start: the initial payload, e.g. start.foo is 555 and start.bar is [ 1, 2 ].
- state: the current state of the payload, which is updated as the workflow progresses.
Response:
{
"field1": 555,
"field2": [ 1, 2 ]
}
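The relationship between start and state can be mimicked in plain Python. The sketch below is only an illustration, not the orchestrator's code: the `context` dictionary layout and the `apply_xin` helper are hypothetical, and the JMESPath expression is replaced by an equivalent hand-written function.

```python
import copy

def apply_xin(context, transform):
    """Apply a transformation to the context and store the result as the new state."""
    new_context = dict(context)
    new_context["state"] = transform(context)
    return new_context

payload = {"foo": 555, "bar": [1, 2]}

# Assumption: at the beginning, "start" and "state" both hold the initial payload.
context = {"start": copy.deepcopy(payload), "state": copy.deepcopy(payload)}

# Hand-written equivalent of "{ field1: state.foo, field2: state.bar }".
def xin(ctx):
    return {"field1": ctx["state"]["foo"], "field2": ctx["state"]["bar"]}

context = apply_xin(context, xin)

print(context["state"])  # {'field1': 555, 'field2': [1, 2]}
print(context["start"])  # {'foo': 555, 'bar': [1, 2]}
```

In this model, start keeps the original payload while state is replaced by the transformed value.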
A basic workflow, without functions yet. "xout" is a JMESPath expression that transforms the function output.
There's no function yet, so the transformation is applied to the output of "xin". The example also shows how to
use the start. prefix to reference the initial payload.
Request:
POST /api/jobs
{
"_workflow": {
"steps": [
{
"xin": "{ field1: start.foo, field2: start.bar }",
"xout": "[ state.field1, state.field2[0] ]"
}
]
},
"foo": 555,
"bar": [ 1, 2 ]
}
What happened?
- The input was transformed using the JMESPath expression in "xin". After this step, the state object contains two fields, field1 and field2: { field1: 555, field2: [ 1, 2 ] }.
- There is no function to invoke, so the output is the same as the state object.
- The current state object is transformed using the JMESPath expression in "xout". The expression creates a new array; note the leading '[' and trailing ']'.
- The array contains two elements: state.field1 and state.field2[0]. state.field1 is 555 and state.field2[0] is 1.
- The final output is [ 555, 1 ].
Response:
[ 555, 1 ]
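The order of operations described above ("xin", then the optional function, then "xout") can be sketched as follows. This is a simplified model, not the orchestrator's implementation: `run_step` is a hypothetical helper, and the two JMESPath expressions are written as Python lambdas.

```python
def run_step(context, xin=None, function=None, xout=None):
    """Run one step: transform the input, call the function (if any), transform the output."""
    ctx = dict(context)
    if xin is not None:
        ctx["state"] = xin(ctx)
    if function is not None:
        ctx["state"] = function(ctx["state"])
    # With no function, the output is simply the current state.
    if xout is not None:
        ctx["state"] = xout(ctx)
    return ctx

payload = {"foo": 555, "bar": [1, 2]}
context = {"start": payload, "state": payload}

result = run_step(
    context,
    # "{ field1: start.foo, field2: start.bar }"
    xin=lambda c: {"field1": c["start"]["foo"], "field2": c["start"]["bar"]},
    # "[ state.field1, state.field2[0] ]"
    xout=lambda c: [c["state"]["field1"], c["state"]["field2"][0]],
)

print(result["state"])  # [555, 1]
```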
The project includes a Wikipedia function, based on the English Wikipedia API, that can be used to fetch content, e.g. to run tests.
The following examples show how to fetch some content from Wikipedia, with and without transformations.
The "wikipedia" function expects a title field in its input, which is provided in the request payload.
POST /api/jobs
{
"_workflow": {
"steps": [
{
"function": "wikipedia/en"
}
]
},
"title": "Moon"
}
The following example transforms the output of the Wikipedia function using a JMESPath expression, so that the output is a single string. Note that the string is JSON encoded.
POST /api/jobs
{
"_workflow": {
"steps": [
{
"function": "wikipedia/en",
"xout": "state.content"
}
]
},
"title": "Moon"
}
The following example transforms the input to read the title field from a configuration input object,
and applies the same transformation to return a string.
Using configuration and context input objects is a fundamental pattern in the framework: it lets you change configuration on the fly, without redeploying the orchestrator or the functions. In fact, functions work better with the framework when they have no hardcoded configuration.
POST /api/jobs
{
"_workflow": {
"steps": [
{
"xin": "{ title: state.configuration.title }",
"function": "wikipedia/en",
"xout": "state.content"
}
]
},
"configuration": {
"title": "Moon"
}
}
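Because the title lives in the payload rather than in the function, a client can change it on every call. The sketch below builds the request body in Python. `build_wikipedia_job` is a hypothetical helper, and nothing is sent over the network, since the orchestrator's host and port depend on your deployment.

```python
import json

def build_wikipedia_job(title):
    """Return the JSON body for POST /api/jobs, with the title passed as configuration."""
    body = {
        "_workflow": {
            "steps": [
                {
                    "xin": "{ title: state.configuration.title }",
                    "function": "wikipedia/en",
                    "xout": "state.content",
                }
            ]
        },
        "configuration": {"title": title},
    }
    return json.dumps(body, indent=2)

# The same workflow definition is reused with different configuration values.
for title in ("Moon", "Mars"):
    print(build_wikipedia_job(title))
```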
The extractor tool provides functions to extract text from documents and images. The following example shows how to extract text from a Word file. The Word file is provided as a base64-encoded string.
See the examples folder for a real base64 value if you want to run the example.
Request:
POST /api/jobs
{
"_workflow": {
"steps": [
{
"function": "extractor"
}
]
},
"fileName": "test.docx",
"mimeType": "",
"content": "UEsDBBQAAAAIALJ4WVrTkNHUd.....54bWxQSwUGAAAAAAwADAAIAwAADisAAAAA"
}
Response:
{
"sections": [
{
"content": "This is a test. The Word document has only one line of text.\n",
"metadata": {
"PageNumber": -1,
"completeSentences": "true"
}
}
],
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"fullText": "This is a test. The Word document has only one line of text.\n",
"size": 11820
}
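A request like the one above can be prepared with Python's standard library. This is a minimal sketch: `build_extractor_job` is a hypothetical helper, and the sample bytes are a placeholder rather than a real Word document.

```python
import base64
import json

def build_extractor_job(file_name, data):
    """Return the JSON body for POST /api/jobs with the file content base64-encoded."""
    body = {
        "_workflow": {"steps": [{"function": "extractor"}]},
        "fileName": file_name,
        "mimeType": "",
        "content": base64.b64encode(data).decode("ascii"),
    }
    return json.dumps(body)

# Placeholder bytes; in practice read the file with open("test.docx", "rb").read().
sample = b"PK\x03\x04 not a real document"
print(build_extractor_job("test.docx", sample))
```

Word files are ZIP archives, so their bytes begin with "PK", which is why real base64 values for .docx files start with "UEsD".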
When sending a request with data and a workflow, it can be useful to include data
that is not part of the workflow, or not used by the first step, and is only needed
in later steps. A classic example is sending configuration settings for each
step. Another example is uploading a file and including tags used to classify the
content.
The following example shows how to send a file together with tags, and how to merge values
using the JMESPath merge function.
Request:
POST /api/jobs
{
"_workflow": {
"steps": [
{
"function": "extractor",
"xout": "merge({ content: state.fullText }, { reference: start.reference, tags: start.tags })"
}
]
},
"fileName": "test.docx",
"content": "UEsDBBQAAAAIALJ4WVrTkNHUd.....54bWxQSwUGAAAAAAwADAAIAwAADisAAAAA",
"tags": [ "user:001", "user:007", "source:doc" ],
"reference": "https://sharepoint.com/sites/contoso/Shared%20Documents/test.docx"
}
Response:
{
"content": "This is a test. The Word document has only one line of text.\n",
"reference": "https://sharepoint.com/sites/contoso/Shared%20Documents/test.docx",
"tags": [
"user:001",
"user:007",
"source:doc"
]
}
Note the use of the start. prefix to reference the initial payload, the state. prefix
to reference the current state of the payload, and the merge function to merge data from
the two objects.
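JMESPath's merge function combines objects from left to right, with later objects overriding earlier ones when keys collide. A rough Python equivalent is shown here only to illustrate the semantics: the `merge` helper is written for this example, and the shortened values are placeholders.

```python
def merge(*objects):
    """Mimic JMESPath merge(): later objects override keys of earlier ones."""
    result = {}
    for obj in objects:
        result.update(obj)
    return result

start = {"tags": ["user:001"], "reference": "https://example.com/test.docx"}
state = {"fullText": "This is a test.\n", "size": 11820}

merged = merge(
    {"content": state["fullText"]},
    {"reference": start["reference"], "tags": start["tags"]},
)
print(merged)

# On a key collision the right-most object wins.
print(merge({"a": 1, "b": 1}, {"b": 2}))  # {'a': 1, 'b': 2}
```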
The following example shows how to combine all the features seen so far, plus the ability
to chain multiple steps into a single workflow. The example shows how to extract text
from a Word file and chunk it into smaller pieces. The request includes all the settings
for the chunking step under a config object.
The final response includes the output of the chunker step plus the tags field from the initial payload. The reference field is merged into the state by the first step, but the final "xout" does not select it.
The response also includes a count field, calculated by the JMESPath length function.
For more functions and JMESPath examples, see the JMESPath documentation.
Request:
POST /api/jobs
{
"_workflow": {
"id": "base64upload",
"steps": [
{
"function": "extractor",
"xout": "merge({ tags:start.tags, reference:start.reference }, state)"
},
{
"xin": "{ text: state.fullText, maxTokensPerChunk: start.config.chunkSize, overlap: start.config.overlap, chunkHeader: start.config.chunkHeader, tokenizer: start.config.tokenizer}",
"function": "chunker",
"xout": "{ chunks: state.chunks, count: length(state.chunks), tags: start.tags }"
}
]
},
"config": {
"chunkSize": 500,
"overlap": 0,
"chunkHeader":"==============\n",
"tokenizer": "cl100k_base"
},
"tags": [
"user:1",
"user:2",
"project:test"
],
"reference": "https://sharepoint.com/sites/contoso/Shared%20Documents/test.docx",
"fileName": "example.docx",
"content": "UEsDBBQAAAAIALJ4WVrTkNHUd.....54bWxQSwUGAAAAAAwADAAIAwAADisAAAAA"
}
Response:
Note: the content is truncated for brevity.
{
"chunks": [
"==============\nA strange and surprising event........\n",
"==============\nId est, Rubio's 'lost in space........\n",
[...]
"==============\nReligious leaders offered dive........\n",
"==============\nSatellite observation networks........"
],
"count": 6,
"tags": [
"user:1",
"user:2",
"project:test"
]
}
This example is similar to the previous one, but uses the Wikipedia tool, which is included in the project for learning and testing purposes. Rather than uploading a file, the example asks the Wikipedia tool to fetch some content and then chunks it.
Request:
POST /api/jobs
{
"_workflow": {
"steps": [
{
"xin": "{ title: start.config.page }",
"function": "wikipedia/en"
},
{
"xin": "{ text: state.content, maxTokensPerChunk: start.config.chunkSize, overlap: start.config.overlap, chunkHeader: start.config.chunkHeader, tokenizer: start.config.tokenizer}",
"function": "chunker"
}
]
},
"config": {
"page": "Microsoft",
"chunkSize": 500,
"overlap": 0,
"chunkHeader": "==============\n",
"tokenizer": "cl100k_base"
}
}
Response:
Note: the content is truncated for brevity.
{
"chunks": [
"==============\nMicrosoft Corporation is.......... ",
"==============\nIts flagship hardware pr.........\n",
[...]
"==============\nMicrosoft was the first .......... ",
"==============\ngovernment to curb the p...........:"
]
}
The following example shows how to generate text embeddings using the embedding-generator tools,
plus a few additional features:
- The embedding generator can work on a single input string or multiple inputs strings, and vectors can be truncated to a specific number of dimensions (if the model supports it).
- String literals can be inlined using the ' single quote; boolean and number values can be inlined using the backtick ` character.
- Steps can have a unique ID, allowing the "input" and "output" of each step to be referenced
in other steps, using the <step id>.in.<field> and <step id>.out.<field> syntax.
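When "xin" expressions are generated by client code, the quoting rules above can be wrapped in a small helper. The sketch below is an assumption-laden illustration: `jmespath_literal` is a hypothetical name, it relies on the standard JMESPath escaping of a single quote as \' inside raw strings, and it does not handle strings containing backslashes.

```python
import json

def jmespath_literal(value):
    """Render a Python value as an inline JMESPath literal."""
    if isinstance(value, str):
        # Raw string literal: single quotes, with inner single quotes escaped.
        return "'" + value.replace("'", "\\'") + "'"
    # JSON literal: backticks around the JSON text; inner backticks are escaped.
    return "`" + json.dumps(value).replace("`", "\\`") + "`"

xin = "{ modelId: %s, input: %s, dimensions: %s }" % (
    jmespath_literal("openai-text-embedding-3-small"),
    jmespath_literal("one"),
    jmespath_literal(3),
)
print(xin)
# { modelId: 'openai-text-embedding-3-small', input: 'one', dimensions: `3` }
```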
Request:
POST /api/jobs
{
"_workflow": {
"steps": [
{
"id": "embedding1",
"xin": "{ modelId: 'openai-text-embedding-3-small', input: 'one', dimensions: `3` }",
"function": "embedding-generator"
},
{
"id": "embedding2",
"xin": "{ modelId: 'openai-text-embedding-3-small', inputs: ['one', 'two'], dimensions: `3` }",
"function": "embedding-generator"
},
{
"xout": "{ example1: embedding1.out.embedding, example2: embedding2.out.embeddings }"
}
]
}
}
Response:
{
"example1": [ -0.11096065, -0.762834, 0.63700235 ],
"example2": [
[ -0.11166364, -0.7626128, 0.6371444 ],
[ -0.15845111, -0.3335682, 0.9293145 ]
]
}
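One way to picture step IDs is that each step's input and output are recorded in the context under the step's ID. The following sketch models that idea with stub functions. `run_workflow` and `fake_embedding` are hypothetical, the vectors are made up, and the real orchestrator may store this data differently.

```python
def fake_embedding(request):
    """Stub standing in for the embedding generator: returns made-up vectors."""
    dims = request["dimensions"]
    if "inputs" in request:
        return {"embeddings": [[float(len(text))] * dims for text in request["inputs"]]}
    return {"embedding": [float(len(request["input"]))] * dims}

def run_workflow(steps, payload):
    """Run steps in order, recording <step id>.in and <step id>.out in the context."""
    context = {"start": payload, "state": payload}
    for step in steps:
        if "xin" in step:
            context["state"] = step["xin"](context)
        step_input = context["state"]
        if "function" in step:
            context["state"] = step["function"](context["state"])
        if "xout" in step:
            context["state"] = step["xout"](context)
        if "id" in step:
            # Assumption: "in" is the input after xin, "out" is the step's final output.
            context[step["id"]] = {"in": step_input, "out": context["state"]}
    return context["state"]

steps = [
    {"id": "embedding1",
     "xin": lambda c: {"input": "one", "dimensions": 3},
     "function": fake_embedding},
    {"id": "embedding2",
     "xin": lambda c: {"inputs": ["one", "two"], "dimensions": 3},
     "function": fake_embedding},
    # "{ example1: embedding1.out.embedding, example2: embedding2.out.embeddings }"
    {"xout": lambda c: {"example1": c["embedding1"]["out"]["embedding"],
                        "example2": c["embedding2"]["out"]["embeddings"]}},
]

result = run_workflow(steps, {})
print(result)
```

The last step has no function; it only uses "xout" to assemble the final response from the outputs of the earlier steps.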
The following example puts several features together: chunking text, generating embeddings with different models, and using custom models not present in the configuration.
- The input includes two strings, one to be chunked and vectorized, one to be just vectorized.
- The two vectorizations are generated with different embedding models.
- One of the embedding models is configured ("openai-text-embedding-3-small"), while the configuration for the other ("text-embedding-3-small") is specified in the workflow, including the endpoint to use.
- The workflow has a defined ID, so that state on disk is stored in a specific folder.
Request:
POST /api/jobs
{
"_workflow": {
"id": "chunkAndEmbed",
"steps": [
{
"id": "chunker",
"xin": "{ text: state.content, maxTokensPerChunk: start.config.chunkSize, overlap: start.config.overlap }",
"function": "chunker"
},
{
"id": "sentenceembedding",
"xin": "{ modelId: start.config.configuredModel, input: start.sentence, dimensions: `3` }",
"function": "embedding-generator"
},
{
"id": "chunksembedding",
"xin": "{ provider: 'azureai', auth: 'AzureIdentity', endpoint: start.config.customEndpoint, modelId: start.config.customModel, inputs: chunker.out.chunks, supportsCustomDimensions: `true`, dimensions: `5` }",
"function": "embedding-generator/custom"
},
{
"xout": "{ tags: start.tags, sentence: sentenceembedding.out.embedding, chunks: chunksembedding.out.embeddings }"
}
]
},
"config": {
"chunkSize": 20, "overlap": 0,
"customEndpoint": "https://contoso.cognitiveservices.azure.com/",
"customModel": "text-embedding-3-small",
"configuredModel": "openai-text-embedding-3-small"
},
"content": "Cerco l'estate tutto l'anno\nE all'improvviso eccola qua\nLei è partita per le spiagge\nE sono solo quassù in città\nSento fischiare sopra i tetti\nUn aeroplano che se ne va",
"sentence": "Hello world, let's play a song",
"tags": [ "type:song", "lang:it-it" ]
}
Response:
{
"tags": [ "type:song", "lang:it-it" ],
"sentence": [ 0.088575736, -0.99545836, 0.034884922 ],
"chunks": [
[ 0.3107477, -0.36262417, 0.25312138, 0.8075352, 0.23612736 ],
[ 0.7127997, -0.26156384, 0.64893836, 0.038096927, -0.030472761 ],
[ 0.11102999, -0.13491312, 0.8772006, 0.17376299, -0.4120635 ]
]
}
Dive into EXAMPLE-RAG-INGESTION.md to learn more.