List Sources in a Corpus.
Path Parameters
The parent corpus to list sources in.
Query Parameters
The maximum number of sources to return. The service may return fewer than this value. If unspecified, Fixie will choose a sensible default. The maximum allowed value is 1000.
The number of entries in the result set to skip.
- 200
- default
OK
Schema
sources object[]
The list of sources.
A human-readable name for this source.
When this source was created.
When this source was last modified.
The corpus that this source belongs to.
The unique ID of this source.
loadSpec object
The specification of how to acquire documents for this source.
The maximum number of documents to ingest. This cannot exceed 200 in general. If you need more documents in a single corpus, please contact the Fixie team.
The maximum size of an individual document in bytes. If unset, a reasonable default will be chosen by Fixie.
relevantDocumentTypes object
The types of documents to keep. Any documents surfaced during loading that don't match this filter will be discarded. If unset, all documents will be kept.
include object
Mime types must be in this set to be kept. Empty implies the universal set. That is, all mime types will be kept save those in the exclude set.
exclude object
Mime types must not be in this set to be kept. Empty implies the empty set.
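The include/exclude rule above can be sketched as a small predicate. This is only an illustration of the documented semantics, not Fixie's implementation; the function name `keep_document` and its parameters are hypothetical.

```python
from typing import Iterable


def keep_document(
    mime_type: str,
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> bool:
    """Return True if a document with this MIME type passes the filter.

    Hypothetical helper: an empty include set is treated as the universal
    set, and an empty exclude set as the empty set, as described above.
    """
    include_set, exclude_set = set(include), set(exclude)
    in_include = not include_set or mime_type in include_set
    return in_include and mime_type not in exclude_set


print(keep_document("text/html"))
print(keep_document("text/html", include=["application/pdf"]))
print(keep_document("text/html", exclude=["text/html"]))
```

With both sets empty every document is kept; a type listed in both sets is dropped, because the exclude check always applies.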
web object
Allows loading documents by crawling the web.
Only one of the web or static fields may be populated when creating a new source.
The list of start URLs to crawl.
The maximum depth of links to traverse. If 0 (or unset), there will be no depth limit.
A set of glob patterns matched against any additional discovered URLs. URLs matching these patterns will be included in the crawl, unless the URL also matches any of the excludeGlobPatterns.
A set of glob patterns matched against any additional discovered URLs. URLs matching these patterns will be excluded from the crawl.
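As a rough sketch of how the depth limit and the two pattern sets might interact for a discovered URL, the function below uses Python's `fnmatch` as a stand-in for the service's glob matching. Several points are assumptions rather than documented behavior: the function name `should_crawl`, how depth is counted, the exact glob dialect, and treating an empty include list as "no include restriction".

```python
from fnmatch import fnmatchcase
from typing import Sequence


def should_crawl(
    url: str,
    depth: int,
    max_depth: int = 0,
    include_globs: Sequence[str] = (),
    exclude_globs: Sequence[str] = (),
) -> bool:
    """Decide whether a discovered URL would be crawled (illustrative only)."""
    # A max_depth of 0 (or unset) means there is no depth limit.
    if max_depth and depth > max_depth:
        return False
    # Exclude patterns win over include patterns.
    if any(fnmatchcase(url, pattern) for pattern in exclude_globs):
        return False
    # Assumption: with no include patterns, nothing is filtered out here.
    return not include_globs or any(
        fnmatchcase(url, pattern) for pattern in include_globs
    )


docs_only = ["https://example.com/docs/*"]
print(should_crawl("https://example.com/docs/intro", 1, include_globs=docs_only))
print(should_crawl("https://example.com/blog", 1, include_globs=docs_only))
```

Note that `fnmatch` lets `*` match across `/`, which may differ from the glob dialect the crawler actually uses.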
static object
Allows loading documents from a static source (e.g. a file upload).
Only one of the web or static fields may be populated when creating a new source.
documents object[] required
The documents to load.
The filename of the document.
The MIME type of the document.
The contents of the document.
metadata object
The metadata to attach to this document.
The public URL of the document, if any.
The BCP47 language code of the document, if known.
The title of the document, if known.
The description of the document, if known.
The timestamp that the document was published, if known.
processSteps object[]
How to process documents for this source. Each step will be run in order over all applicable documents (as defined by each step's relevantDocumentTypes).
If no steps are specified, the service will add default document transformation steps to convert various document types into text easily understood by an LLM.
The human-readable name of the step.
relevantDocumentTypes object
A filter to apply to mime types.
include object
Mime types must be in this set to be kept. Empty implies the universal set. That is, all mime types will be kept save those in the exclude set.
exclude object
Mime types must not be in this set to be kept. Empty implies the empty set.
Transforms an HTML document into Markdown.
Transforms a binary document into plain text.
chunkSpec object
Specification of how to chunk documents.
inputSelector object
The input documents that should be chunked. Only documents that correspond to UTF-8 encoded text can be chunked. Any other kind of document will fail.
mimeTypeFilter object
Filters documents based on their mime type.
include object
Mime types must be in this set to be kept. Empty implies the universal set. That is, all mime types will be kept save those in the exclude set.
exclude object
Mime types must not be in this set to be kept. Empty implies the empty set.
originFilter object
Filters documents based on their origin.
origins object[]
Document origins must match one of these to be kept.
The desired chunk size for each chunk, in tokens. This is a strict maximum, as well as a target. Adjacent chunks will be combined if their total size is under this limit.
The maximum number of chunks to produce for an individual document.
The maximum number of chunks to produce in total. This cannot exceed 5000 in general. If you need more chunks in a single source, please contact the Fixie team.
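The chunk-size rule (a strict maximum that is also a target, with adjacent chunks combined while they fit) can be illustrated with a greedy merge. This is a minimal sketch under stated assumptions, not the service's chunker: tokens are approximated by whitespace-separated words, "under this limit" is read as "at most the limit", each input piece is assumed to already fit within the limit, and the names `merge_chunks` and `max_chunks` are hypothetical.

```python
from typing import List


def merge_chunks(pieces: List[str], chunk_size: int, max_chunks: int = 0) -> List[str]:
    """Greedily combine adjacent pieces without exceeding chunk_size tokens.

    max_chunks caps the number of chunks returned; 0 means no cap.
    """
    merged: List[str] = []
    current: List[str] = []  # tokens of the chunk being built
    for piece in pieces:
        tokens = piece.split()
        # Start a new chunk if adding this piece would exceed the limit.
        if current and len(current) + len(tokens) > chunk_size:
            merged.append(" ".join(current))
            current = []
        current.extend(tokens)
    if current:
        merged.append(" ".join(current))
    return merged[:max_chunks] if max_chunks else merged


print(merge_chunks(["a b", "c", "d e f", "g"], chunk_size=3))
```

Here `"a b"` and `"c"` are combined because together they reach exactly three tokens, while `"d e f"` and `"g"` stay separate.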
embedSteps object[]
How to embed chunks for this source. Defaults will be provided if unset.
The human-readable name for this step.
Directly embeds chunks.
Embeds chunks using a parent-child strategy. Each chunk is split into multiple children, which are embedded individually. When the child is semantically similar to a query string, the parent is returned. This strategy produces no results for small chunks as it never returns the parent chunk itself. To embed the parent chunks also, use the DirectEmbedStrategy in addition to this one.
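The parent-child idea can be shown with a toy index: each parent chunk is split into children, the query is matched against the children, and the parent of the best-matching child is returned. Real embedding similarity is replaced here by plain word overlap (Jaccard), purely so the sketch runs without a model; all function names are hypothetical and nothing here reflects Fixie's internals.

```python
from typing import List, Tuple


def split_children(parent: str, child_words: int) -> List[str]:
    """Split a parent chunk into children of at most child_words words."""
    words = parent.split()
    return [" ".join(words[i:i + child_words]) for i in range(0, len(words), child_words)]


def overlap(a: str, b: str) -> float:
    """Jaccard word overlap, a stand-in for embedding similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def build_index(parents: List[str], child_words: int) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs; only the children are ever matched."""
    return [(child, parent) for parent in parents for child in split_children(parent, child_words)]


def query(index: List[Tuple[str, str]], text: str) -> str:
    """Return the parent whose child best matches the query text."""
    _, best_parent = max(index, key=lambda pair: overlap(pair[0], text))
    return best_parent


parents = [
    "the cat sat on the mat near the door",
    "stock prices rose sharply after the earnings report",
]
index = build_index(parents, child_words=4)
print(query(index, "earnings report"))
```

The match is scored against a four-word child, but the caller receives the full parent chunk, which is the point of the strategy.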
A human-readable description of this source.
stats object
The stats for a source.
Possible values: [SOURCE_STATUS_UNSPECIFIED, SOURCE_STATUS_INITIALIZING, SOURCE_STATUS_READY, SOURCE_STATUS_UPDATING]
The current status of this source, indicating whether it affects queries.
This field should not be populated in client requests.
When a job last completed for this source.
This field should not be populated in client requests.
The total number of documents in this source.
schedule object
How/whether to automatically refresh a source.
Whether to enable automatic refresh for this source.
pageInfo object
Information about the page of results returned.
The number of results requested.
The offset specified in the request.
The total number of results available.
{
"sources": [
{
"displayName": "string",
"created": "2024-03-07T22:56:08.200Z",
"modified": "2024-03-07T22:56:08.200Z",
"corpusId": "string",
"sourceId": "string",
"loadSpec": {
"maxDocuments": 0,
"maxDocumentBytes": 0,
"relevantDocumentTypes": {
"include": {
"mimeTypes": [
"string"
]
},
"exclude": {
"mimeTypes": [
"string"
]
}
},
"web": {
"startUrls": [
"string"
],
"maxDepth": 0,
"includeGlobPatterns": [
"string"
],
"excludeGlobPatterns": [
"string"
]
},
"static": {
"documents": [
{
"filename": "string",
"mimeType": "string",
"contents": "string",
"metadata": {
"publicUrl": "string",
"language": "string",
"title": "string",
"description": "string",
"published": "2024-03-07T22:56:08.201Z"
}
}
]
}
},
"processSteps": [
{
"stepName": "string",
"relevantDocumentTypes": {
"include": {
"mimeTypes": [
"string"
]
},
"exclude": {
"mimeTypes": [
"string"
]
}
},
"htmlToMarkdown": {},
"unstructuredProcessor": {}
}
],
"chunkSpec": {
"inputSelector": {
"mimeTypeFilter": {
"include": {
"mimeTypes": [
"string"
]
},
"exclude": {
"mimeTypes": [
"string"
]
}
},
"originFilter": {
"origins": [
{
"load": true,
"processStep": "string"
}
]
}
},
"chunkSize": 0,
"maxChunksPerDocument": 0,
"maxChunksTotal": 0
},
"embedSteps": [
{
"stepName": "string",
"direct": {},
"parentChild": {}
}
],
"description": "string",
"stats": {
"status": "SOURCE_STATUS_UNSPECIFIED",
"lastUpdated": "2024-03-07T22:56:08.203Z",
"numDocs": 0
},
"schedule": {
"enablePeriodicRefresh": true
}
}
],
"pageInfo": {
"requestedPageSize": 0,
"requestedOffset": 0,
"totalResultCount": 0
}
}
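The offset and page-size parameters, together with the pageInfo block in the response above, suggest a simple pagination loop. The sketch below assumes a hypothetical callable `fetch_page(offset, page_size)` that performs the HTTP request and returns the decoded JSON body; the request URL, authentication, and query parameter names are deliberately left out because they are not shown in this section. `fake_fetch_page` is an in-memory stand-in used only to make the example runnable.

```python
from typing import Callable, Dict, Iterator


def iter_sources(
    fetch_page: Callable[[int, int], Dict],
    page_size: int = 100,
) -> Iterator[Dict]:
    """Yield every source by walking offset-based pages."""
    offset = 0
    while True:
        body = fetch_page(offset, page_size)
        sources = body.get("sources", [])
        yield from sources
        # Advance by what was actually returned, since the service may
        # return fewer entries than requested.
        offset += len(sources)
        total = body.get("pageInfo", {}).get("totalResultCount", 0)
        if not sources or offset >= total:
            break


def fake_fetch_page(offset: int, page_size: int) -> Dict:
    """In-memory stand-in for the real endpoint (illustration only)."""
    all_sources = [{"sourceId": f"s{i}"} for i in range(5)]
    return {
        "sources": all_sources[offset:offset + page_size],
        "pageInfo": {
            "requestedPageSize": page_size,
            "requestedOffset": offset,
            "totalResultCount": len(all_sources),
        },
    }


ids = [s["sourceId"] for s in iter_sources(fake_fetch_page, page_size=2)]
print(ids)
```

Stopping on an empty page as well as on the reported total keeps the loop from spinning if the total changes while paging.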
Default error response
Schema
The status code, which should be an enum value of google.rpc.Code.
A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
details object[]
A list of messages that carry the error details. There is a common set of message types for APIs to use.
The type of the serialized message.
{
"code": 0,
"message": "string",
"details": [
{
"@type": "string"
}
]
}
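A client might turn this error body into a readable message by mapping the numeric code to its google.rpc.Code name. The sketch below is one possible way to do that; the helper name `describe_error` and the sample payload are hypothetical, while the code-to-name table follows the standard google.rpc.Code enum.

```python
import json

# Names from the google.rpc.Code enum, keyed by numeric value.
RPC_CODE_NAMES = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 9: "FAILED_PRECONDITION",
    10: "ABORTED", 11: "OUT_OF_RANGE", 12: "UNIMPLEMENTED", 13: "INTERNAL",
    14: "UNAVAILABLE", 15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}


def describe_error(body: str) -> str:
    """Format an error response body as 'CODE_NAME: message'."""
    error = json.loads(body)
    code = error.get("code", 2)  # fall back to UNKNOWN if the code is missing
    name = RPC_CODE_NAMES.get(code, f"CODE_{code}")
    return f"{name}: {error.get('message', '')}"


# Hypothetical payload in the shape shown above.
sample = '{"code": 5, "message": "corpus not found", "details": []}'
print(describe_error(sample))
```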