List Jobs.

Path Parameters

corpusId string required

The ID of the corpus owning the source to list jobs in.

sourceId string required

The ID of the parent source to list jobs in.

Query Parameters

limit int32

The maximum number of results to return. The service may return fewer than this value. If unspecified, the service will choose a sensible default. The maximum allowed value is 1000.

offset int32

The number of entries in the result set to skip.
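The path and query parameters above combine into a single request URL. The sketch below is a minimal illustration under stated assumptions. This reference does not give the endpoint path, so `JOBS_PATH_TEMPLATE` is a hypothetical placeholder, and the base URL in the usage note is made up. The sketch rejects a `limit` outside 1 to 1000 locally; leaving `limit` unset lets the service choose its default.

```python
from urllib.parse import quote, urlencode

# HYPOTHETICAL: this reference does not state the URL path; replace with the real one.
JOBS_PATH_TEMPLATE = "/corpora/{corpusId}/sources/{sourceId}/jobs"
MAX_LIMIT = 1000  # documented maximum for the `limit` query parameter


def list_jobs_url(base_url, corpus_id, source_id, limit=None, offset=None,
                  path_template=JOBS_PATH_TEMPLATE):
    """Return the List Jobs URL for one source, with optional paging parameters."""
    # Percent-encode the IDs so characters such as "/" cannot alter the path.
    path = path_template.format(
        corpusId=quote(corpus_id, safe=""),
        sourceId=quote(source_id, safe=""),
    )
    params = {}
    if limit is not None:
        # Omit `limit` entirely to let the service pick its default.
        if not 1 <= limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
        params["limit"] = limit
    if offset is not None:
        if offset < 0:
            raise ValueError("offset must be non-negative")
        params["offset"] = offset
    url = base_url.rstrip("/") + path
    return f"{url}?{urlencode(params)}" if params else url
```

Under the hypothetical template, `list_jobs_url("https://api.example.com/v1", "my-corpus", "my-source", limit=50, offset=100)` returns `https://api.example.com/v1/corpora/my-corpus/sources/my-source/jobs?limit=50&offset=100`.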

Responses

200

application/json

Schema

jobs object[]

The list of jobs.

Array [

corpusId string

The corpus that this job belongs to.

sourceId string

The source that this job belongs to.

jobId string

The unique ID of this job.

parentJobId string

For document updates and deletions, each job may spawn children to update documents derived from the updated document. If this job is such a child, then this is the parent job ID. (The parent is owned by the same corpus and source.)
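Since a child job names its parent through `parentJobId`, and both belong to the same corpus and source, the jobs returned by this endpoint can be grouped by parent. A minimal sketch, assuming each job is the decoded JSON object shown in the example (a Python dict) and that a missing or empty `parentJobId` marks a top-level job. Children that appear on a later page are only found once that page is fetched.

```python
from collections import defaultdict


def children_by_parent(jobs):
    """Map each parentJobId to the IDs of the child jobs spawned from it."""
    children = defaultdict(list)
    for job in jobs:
        parent = job.get("parentJobId")
        # Top-level jobs have no parent; the field may be absent or empty.
        if parent:
            children[parent].append(job["jobId"])
    return dict(children)
```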

state enum

Possible values: [JOB_STATE_UNSPECIFIED, JOB_STATE_PENDING, JOB_STATE_RUNNING, JOB_STATE_COMPLETED, JOB_STATE_FAILED, JOB_STATE_CANCELLED]

The current state of the job.
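Clients that poll for progress usually need to know whether a job can still change state. This reference does not say which states are final. The sketch below assumes, from their names, that COMPLETED, FAILED, and CANCELLED are terminal. It treats `JOB_STATE_UNSPECIFIED` as unfinished, along with a missing `state`, which protobuf JSON encoders commonly omit for the zero enum value.

```python
# ASSUMPTION: terminal states inferred from their names, not stated by the API.
TERMINAL_STATES = frozenset({
    "JOB_STATE_COMPLETED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
})


def is_terminal(job):
    """Return True if the job has finished, successfully or not."""
    return job.get("state") in TERMINAL_STATES


def unfinished_job_ids(jobs):
    """IDs of jobs that are pending, running, or in an unspecified state."""
    return [job["jobId"] for job in jobs if not is_terminal(job)]
```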

created date-time

The timestamp at which the job was requested.

started date-time

The timestamp at which the job began.

completed date-time

The timestamp at which the job completed.

errorMessage string

If the job failed (or was cancelled), the error message describing the failure.
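The `created`, `started`, and `completed` timestamps can be combined into queue and run times. A minimal sketch, assuming timestamps arrive as UTC strings ending in `Z`, as in the example. RFC 3339 also allows numeric offsets, which this parser deliberately rejects. Fractions longer than six digits (for example, nanoseconds) are truncated because Python's `datetime` stores microseconds.

```python
import re
from datetime import datetime, timezone

# Matches UTC timestamps such as "2024-03-07T22:56:08.251Z" (fraction optional).
_UTC_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_timestamp(value):
    """Parse a UTC date-time string, truncating any fraction to microseconds."""
    match = _UTC_TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    whole = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    fraction = (match.group(2) or "")[:6].ljust(6, "0")
    return whole.replace(microsecond=int(fraction), tzinfo=timezone.utc)


def job_timings(job):
    """Return (queued, running) durations as timedeltas, or None where unknown."""
    created, started, completed = (
        parse_timestamp(job[key]) if job.get(key) else None
        for key in ("created", "started", "completed")
    )
    queued = started - created if created is not None and started is not None else None
    running = completed - started if started is not None and completed is not None else None
    return queued, running
```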

loadSpec object

The specification of how to acquire documents for this source.

maxDocuments int32

The maximum number of documents to ingest. In general, this cannot exceed 200; if you need more documents in a single corpus, contact the Fixie team.

maxDocumentBytes int32

The maximum size of an individual document in bytes. If unset, a reasonable default will be chosen by Fixie.

relevantDocumentTypes object

The types of documents to keep. Any documents surfaced during loading that don't match this filter will be discarded. If unset, all documents will be kept.

include object

MIME types must be in this set to be kept. An empty set means every MIME type is kept except those in the exclude set.

mimeTypes string[]

exclude object

MIME types must not be in this set to be kept. An empty set excludes nothing.

mimeTypes string[]
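The include/exclude rules above reduce to a small predicate. The same wording also governs `relevantDocumentTypes` on process steps and `mimeTypeFilter` in `chunkSpec`. A minimal sketch, assuming MIME types are compared as exact, case-sensitive strings; the reference does not mention wildcards such as `text/*` or parameters such as `charset`.

```python
def mime_type_kept(mime_type, type_filter):
    """Apply include/exclude MIME filter semantics to one MIME type.

    `type_filter` is a relevantDocumentTypes-style dict; missing parts count as empty.
    """
    include = set((type_filter.get("include") or {}).get("mimeTypes") or [])
    exclude = set((type_filter.get("exclude") or {}).get("mimeTypes") or [])
    # An empty include set means "every MIME type"; an empty exclude set excludes nothing.
    included = not include or mime_type in include
    return included and mime_type not in exclude
```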

web object

Allows loading documents by crawling the web.

Only one of the web or static fields may be populated when creating a new source.

startUrls string[] required

The list of start URLs to crawl.

maxDepth int32

The maximum depth of links to traverse. If 0 (or unset), there will be no depth limit.

includeGlobPatterns string[]

A set of glob patterns matched against any additional discovered URLs. URLs matching these patterns will be included in the crawl, unless the URL matches any of the excludeGlobPatterns.

excludeGlobPatterns string[]

A set of glob patterns matched against any additional discovered URLs. URLs matching these patterns will be excluded from the crawl.
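The crawl rules above can be sketched as a single check: include patterns apply, exclude patterns take precedence, and a `maxDepth` of 0 means no limit. This is illustrative only. It uses Python's `fnmatch`, whose `*` also matches `/`, and the service's glob dialect may differ. It also assumes that start URLs are at depth 0 and that every discovered URL is eligible when `includeGlobPatterns` is empty; the reference specifies neither.

```python
from fnmatch import fnmatchcase


def should_crawl(url, depth, web_spec):
    """Decide whether a URL discovered `depth` links from a start URL is followed."""
    max_depth = web_spec.get("maxDepth") or 0
    # A maxDepth of 0 (or unset) means there is no depth limit.
    if max_depth and depth > max_depth:
        return False
    include = web_spec.get("includeGlobPatterns") or []
    exclude = web_spec.get("excludeGlobPatterns") or []
    # Exclusions win over inclusions.
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    # ASSUMPTION: with no include patterns, every discovered URL is eligible.
    return not include or any(fnmatchcase(url, pattern) for pattern in include)
```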

static object

Allows loading documents from a static source (e.g. a file upload).

Only one of the web or static fields may be populated when creating a new source.

documents object[] required

The documents to load.

Array [

filename string required

The filename of the document.

mimeType string required

The MIME type of the document.

contents bytes required

The contents of the document.

metadata object

The metadata to attach to this document.

publicUrl string

The public URL of the document, if any.

language string

The BCP 47 language code of the document, if known.

title string

The title of the document, if known.

description string

The description of the document, if known.

published date-time

The timestamp at which the document was published, if known.

]
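`contents` is declared as `bytes`. In the standard protobuf JSON mapping, `bytes` fields are serialized as base64 strings, which is why the example shows a string, so a client has to decode them. A minimal sketch, assuming the responses follow that mapping. It accepts both the standard and URL-safe base64 alphabets, with or without padding, as the mapping allows when parsing.

```python
import base64


def decode_contents(encoded):
    """Decode a protobuf-JSON `bytes` value (standard or URL-safe base64, padding optional)."""
    padded = encoded + "=" * (-len(encoded) % 4)
    # altchars maps "-" and "_" back to "+" and "/", so both alphabets decode.
    return base64.b64decode(padded, altchars=b"-_")


def static_documents(job):
    """Yield (filename, mimeType, raw bytes) for each static document in a job's loadSpec."""
    static = (job.get("loadSpec") or {}).get("static") or {}
    for doc in static.get("documents", []):
        yield doc["filename"], doc["mimeType"], decode_contents(doc.get("contents", ""))
```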

updatedDocumentId string

The ID of the updated document whose direct children should be reprocessed and whose chunks should be recomputed as part of this job. (The document is owned by the same corpus and source as this job.)

Only one of the loadSpec, updatedDocumentId, or deletedDocumentId fields may be populated.

deletedDocumentId string

The ID of the document to be deleted. As part of this job, its direct children and chunks should be deleted (or, for a child created by an aggregation processing step, updated). (The document is owned by the same corpus and source as this job.)

Only one of the loadSpec, updatedDocumentId, or deletedDocumentId fields may be populated.
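Because only one of `loadSpec`, `updatedDocumentId`, and `deletedDocumentId` is populated, a client can classify each job by which field is present. A minimal sketch; the `"load"`, `"update"`, and `"delete"` labels are hypothetical names chosen for this example, not API values, and "populated" is taken to mean present and non-empty.

```python
def job_kind(job):
    """Classify a job as "load", "update", or "delete" from its populated field."""
    kinds = {
        "loadSpec": "load",
        "updatedDocumentId": "update",
        "deletedDocumentId": "delete",
    }
    present = [kind for field, kind in kinds.items() if job.get(field)]
    if len(present) > 1:
        raise ValueError(f"expected at most one of {sorted(kinds)}, got {present}")
    # None means none of the three fields is populated.
    return present[0] if present else None
```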

processSteps object[]

The ProcessSteps used during this job, as defined for the Source when this job was created. For a child job spawned by a document update or deletion, this may be a subset of the Source's ProcessSteps.

Array [

stepName string required

The human-readable name of the step.

relevantDocumentTypes object

A filter applied to MIME types.

include object

MIME types must be in this set to be kept. An empty set means every MIME type is kept except those in the exclude set.

mimeTypes string[]

exclude object

MIME types must not be in this set to be kept. An empty set excludes nothing.

mimeTypes string[]

htmlToMarkdown object

Transforms an HTML document into Markdown.

unstructuredProcessor object

Transforms a binary document into plain text.

]

chunkSpec object

Specification of how to chunk documents.

inputSelector object

The input documents that should be chunked. Only documents that correspond to UTF-8 encoded text can be chunked. Any other kind of document will fail.

mimeTypeFilter object

Filters documents based on their MIME type.

include object

MIME types must be in this set to be kept. An empty set means every MIME type is kept except those in the exclude set.

mimeTypes string[]

exclude object

MIME types must not be in this set to be kept. An empty set excludes nothing.

mimeTypes string[]

originFilter object

Filters documents based on their origin.

origins object[]

Document origins must match one of these to be kept.

Array [

load boolean

processStep string

]

chunkSize int32

The desired size of each chunk, in tokens. This is both a target and a strict maximum: adjacent chunks are combined if their total size is under this limit.

maxChunksPerDocument int32

The maximum number of chunks to produce for an individual document.

maxChunksTotal int32

The maximum number of chunks to produce in total. In general, this cannot exceed 5000; if you need more chunks in a single source, contact the Fixie team.
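`chunkSize` is described as both a target and a strict maximum, with adjacent chunks combined while their total stays under the limit. One way to get that behavior is a greedy left-to-right merge, sketched below. This is a hypothetical illustration, not Fixie's chunker. It assumes token counts for each piece are already known (the tokenizer is not specified here), rejects pieces larger than the limit instead of splitting them, and applies `maxChunksPerDocument` as a simple truncation.

```python
def merge_adjacent(piece_sizes, chunk_size, max_chunks_per_document=0):
    """Greedily merge adjacent pieces (given as token counts) into chunks.

    Returns a list of chunks, each a list of piece indices, with every chunk
    totaling at most chunk_size tokens.
    """
    chunks, current, current_size = [], [], 0
    for index, size in enumerate(piece_sizes):
        # This sketch does not split oversized pieces; it rejects them.
        if size > chunk_size:
            raise ValueError(f"piece {index} has {size} tokens, over chunk_size")
        # Start a new chunk when adding this piece would exceed the limit.
        if current and current_size + size > chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        chunks.append(current)
    # ASSUMPTION: a cap of 0 (unset) means "no cap".
    if max_chunks_per_document:
        chunks = chunks[:max_chunks_per_document]
    return chunks
```

With piece sizes `[100, 200, 300, 50, 400]` and `chunk_size=500`, it returns `[[0, 1], [2, 3], [4]]`.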

embedSteps object[]

The EmbedSteps used during this job. These are the EmbedSteps defined for the Source at the time this job was created.

Array [

stepName string

The human-readable name for this step.

direct object

Directly embeds chunks.

parentChild object

Embeds chunks using a parent-child strategy. Each chunk is split into multiple children, which are embedded individually. When a child is semantically similar to a query string, its parent is returned. This strategy produces no results for small chunks, because the parent chunk itself is never embedded. To embed the parent chunks as well, use the direct strategy (DirectEmbedStrategy) in addition to this one.

]
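The parentChild strategy searches over child embeddings but returns parent chunks. The in-memory sketch below shows that mapping. The vectors, the cosine-similarity scoring, and ranking each parent by its best-matching child are illustrative assumptions; this reference does not specify how Fixie scores or ranks matches.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def parent_child_search(query_vector, child_vectors, parent_of, top_k=3):
    """Score each child against the query, then return parents ranked by best child.

    child_vectors: {child_id: vector}; parent_of: {child_id: parent_chunk_id}.
    """
    best = {}
    for child_id, vector in child_vectors.items():
        parent = parent_of[child_id]
        score = cosine(query_vector, vector)
        # Keep only the strongest child match for each parent.
        if score > best.get(parent, float("-inf")):
            best[parent] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    # Only parent IDs are returned; children themselves are never results.
    return [parent for parent, _ in ranked[:top_k]]
```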

loadResult object

The results from loading.

started date-time

The timestamp at which loading began.

completed date-time

The timestamp at which loading completed.

createdDocsCount int32

The number of documents created.

updatedDocsCount int32

The number of documents that existed previously and were updated.

unchangedDocsCount int32

The number of documents that existed previously and were not modified.

deletedDocsCount int32

The number of documents deleted because they're no longer present in the source.

sizeFilteredDocsCount int32

The number of documents omitted due to content size.

typeFilteredDocsCount int32

The number of documents omitted due to MIME type.

processStepResults object[]

The results of each processing step.

Array [

stepName string

The step that produced these results.

expectedOutputDocsCount int32

The number of documents expected from this step prior to execution.

started date-time

The timestamp at which this processing step began.

completed date-time

The timestamp at which this processing step completed.

producedDocsCount int32

The total number of documents produced by this step.

failedDocsCount int32

The number of documents that failed to be processed.

createdDocsCount int32

The number of documents created.

updatedDocsCount int32

The number of documents that existed previously and were updated.

unchangedDocsCount int32

The number of documents that existed previously for which processing produced the same result as before.

deletedDocsCount int32

The number of documents deleted because this step produced them previously but did not produce them from the latest input.

]

chunkResult object

The results from chunking.

started date-time

The timestamp at which chunking began.

completed date-time

The timestamp at which chunking completed.

expectedDocsCount int32

The number of documents expected to be chunked prior to execution.

successfulDocsCount int32

The number of documents successfully chunked.

failedDocsCount int32

The number of documents that failed to be chunked.

createdChunksCount int32

The number of chunks created.

unchangedChunksCount int32

The number of chunks that were not modified.

deletedChunksCount int32

The number of chunks deleted.

embedStepResults object[]

The results of each embedding step.

Array [

stepName string

The step that produced these results.

started date-time

The timestamp at which this embedding step began.

completed date-time

The timestamp at which this embedding step completed.

expectedChunksCount int32

The number of chunks expected to be embedded by this step prior to execution.

successfulChunksCount int32

The number of chunks successfully embedded.

failedChunksCount int32

The number of chunks that failed to be embedded.

createdVectorsCount int32

The number of vectors created.

unchangedVectorsCount int32

The number of vectors that were not modified.

deletedVectorsCount int32

The number of vectors deleted.

]

]
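Each result object above reports per-stage counts, so failures can be collected for a quick health check of a job. A minimal sketch: it reads every missing count as 0, assuming the common protobuf JSON behavior of omitting zero-valued fields. It reports the size and type filter counts from loadResult because loadResult has no failure count of its own. The output keys are made up for readability.

```python
def failure_summary(job):
    """Collect non-zero failure and filter counts from a job's result objects."""
    summary = {}
    load = job.get("loadResult") or {}
    for key in ("sizeFilteredDocsCount", "typeFilteredDocsCount"):
        if load.get(key, 0):
            summary[f"load.{key}"] = load[key]
    for step in job.get("processStepResults") or []:
        if step.get("failedDocsCount", 0):
            summary[f"process[{step.get('stepName', '?')}].failedDocsCount"] = step["failedDocsCount"]
    chunk = job.get("chunkResult") or {}
    if chunk.get("failedDocsCount", 0):
        summary["chunk.failedDocsCount"] = chunk["failedDocsCount"]
    for step in job.get("embedStepResults") or []:
        if step.get("failedChunksCount", 0):
            summary[f"embed[{step.get('stepName', '?')}].failedChunksCount"] = step["failedChunksCount"]
    return summary
```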

pageInfo object

Information about the page of results returned.

requestedPageSize int32

The number of results requested.

requestedOffset int32

The offset specified in the request.

totalResultCount int32

The total number of results available.

Example (from schema)

{
  "jobs": [
    {
      "corpusId": "string",
      "sourceId": "string",
      "jobId": "string",
      "parentJobId": "string",
      "state": "JOB_STATE_UNSPECIFIED",
      "created": "2024-03-07T22:56:08.251Z",
      "started": "2024-03-07T22:56:08.251Z",
      "completed": "2024-03-07T22:56:08.251Z",
      "errorMessage": "string",
      "loadSpec": {
        "maxDocuments": 0,
        "maxDocumentBytes": 0,
        "relevantDocumentTypes": {
          "include": {
            "mimeTypes": [
              "string"
            ]
          },
          "exclude": {
            "mimeTypes": [
              "string"
            ]
          }
        },
        "web": {
          "startUrls": [
            "string"
          ],
          "maxDepth": 0,
          "includeGlobPatterns": [
            "string"
          ],
          "excludeGlobPatterns": [
            "string"
          ]
        },
        "static": {
          "documents": [
            {
              "filename": "string",
              "mimeType": "string",
              "contents": "string",
              "metadata": {
                "publicUrl": "string",
                "language": "string",
                "title": "string",
                "description": "string",
                "published": "2024-03-07T22:56:08.254Z"
              }
            }
          ]
        }
      },
      "updatedDocumentId": "string",
      "deletedDocumentId": "string",
      "processSteps": [
        {
          "stepName": "string",
          "relevantDocumentTypes": {
            "include": {
              "mimeTypes": [
                "string"
              ]
            },
            "exclude": {
              "mimeTypes": [
                "string"
              ]
            }
          },
          "htmlToMarkdown": {},
          "unstructuredProcessor": {}
        }
      ],
      "chunkSpec": {
        "inputSelector": {
          "mimeTypeFilter": {
            "include": {
              "mimeTypes": [
                "string"
              ]
            },
            "exclude": {
              "mimeTypes": [
                "string"
              ]
            }
          },
          "originFilter": {
            "origins": [
              {
                "load": true,
                "processStep": "string"
              }
            ]
          }
        },
        "chunkSize": 0,
        "maxChunksPerDocument": 0,
        "maxChunksTotal": 0
      },
      "embedSteps": [
        {
          "stepName": "string",
          "direct": {},
          "parentChild": {}
        }
      ],
      "loadResult": {
        "started": "2024-03-07T22:56:08.256Z",
        "completed": "2024-03-07T22:56:08.256Z",
        "createdDocsCount": 0,
        "updatedDocsCount": 0,
        "unchangedDocsCount": 0,
        "deletedDocsCount": 0,
        "sizeFilteredDocsCount": 0,
        "typeFilteredDocsCount": 0
      },
      "processStepResults": [
        {
          "stepName": "string",
          "expectedOutputDocsCount": 0,
          "started": "2024-03-07T22:56:08.256Z",
          "completed": "2024-03-07T22:56:08.256Z",
          "producedDocsCount": 0,
          "failedDocsCount": 0,
          "createdDocsCount": 0,
          "updatedDocsCount": 0,
          "unchangedDocsCount": 0,
          "deletedDocsCount": 0
        }
      ],
      "chunkResult": {
        "started": "2024-03-07T22:56:08.256Z",
        "completed": "2024-03-07T22:56:08.256Z",
        "expectedDocsCount": 0,
        "successfulDocsCount": 0,
        "failedDocsCount": 0,
        "createdChunksCount": 0,
        "unchangedChunksCount": 0,
        "deletedChunksCount": 0
      },
      "embedStepResults": [
        {
          "stepName": "string",
          "started": "2024-03-07T22:56:08.256Z",
          "completed": "2024-03-07T22:56:08.256Z",
          "expectedChunksCount": 0,
          "successfulChunksCount": 0,
          "failedChunksCount": 0,
          "createdVectorsCount": 0,
          "unchangedVectorsCount": 0,
          "deletedVectorsCount": 0
        }
      ]
    }
  ],
  "pageInfo": {
    "requestedPageSize": 0,
    "requestedOffset": 0,
    "totalResultCount": 0
  }
}
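Because paging is offset-based, a client can walk all jobs by advancing `offset` by the number of jobs received until it reaches `pageInfo.totalResultCount`. A minimal sketch. `fetch_page` is a hypothetical callable you supply, for example an HTTP request to the List Jobs endpoint that decodes the JSON body, which keeps the paging logic independent of any HTTP library. Advancing by `len(jobs)` rather than `page_size` matters because the service may return fewer results than requested. `page_size` should not exceed the documented maximum of 1000.

```python
def iter_jobs(fetch_page, page_size=100):
    """Yield every job by walking offset-based pages until totalResultCount is reached.

    fetch_page(limit, offset) must return one decoded List Jobs response (a dict).
    """
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        jobs = page.get("jobs") or []
        yield from jobs
        total = (page.get("pageInfo") or {}).get("totalResultCount", 0)
        offset += len(jobs)
        # Also stop on an empty page, in case totalResultCount changes mid-walk.
        if not jobs or offset >= total:
            return
```

If jobs are created or deleted while the walk is in progress, offsets can shift, so a job may be skipped or seen twice; deduplicating on `jobId` is a cheap safeguard.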

Default error response

application/json

Schema

code int32

The status code, which should be an enum value of google.rpc.Code.

message string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details object[]

A list of messages that carry the error details. There is a common set of message types for APIs to use.

Array [

@type string

The type of the serialized message.

]

Example (from schema)

{
  "code": 0,
  "message": "string",
  "details": [
    {
      "@type": "string"
    }
  ]
}
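The `code` field carries a google.rpc.Code value. A minimal sketch that maps the standard numeric codes (as defined in google/rpc/code.proto) to their names and lists the `@type` of each detail message. The one-line output format is illustrative, not part of the API.

```python
# Numeric values of google.rpc.Code, as defined in google/rpc/code.proto.
RPC_CODE_NAMES = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 9: "FAILED_PRECONDITION",
    10: "ABORTED", 11: "OUT_OF_RANGE", 12: "UNIMPLEMENTED", 13: "INTERNAL",
    14: "UNAVAILABLE", 15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}


def describe_error(body):
    """Turn a decoded default error response into a one-line description."""
    code = body.get("code", 0)
    name = RPC_CODE_NAMES.get(code, f"CODE_{code}")
    detail_types = [d.get("@type", "?") for d in body.get("details") or []]
    text = f"{name}: {body.get('message', '')}"
    return f"{text} (details: {', '.join(detail_types)})" if detail_types else text
```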
