API Reference

class averbis.DocumentCollection(project, name)
create_and_run_process(process_name, pipeline, annotation_types=None, send_to_search=None, send_chunks_to_search=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Creates a process and runs the document analysis.

Parameters:
  • process_name – The name of the newly created process.

  • pipeline – The name of the pipeline or a reference to the Pipeline that should be used.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be saved. Supports wildcard expressions:

    • Example 1: “de.averbis.types.*” returns all types with prefix “de.averbis.types”.

    • Example 2: Can also be a list of type names, e.g. [“de.averbis.types.health.Diagnosis”, “de.averbis.types.health.Medication”]

  • send_to_search (Optional[bool]) – Determines whether the created process should be searchable.

  • send_chunks_to_search (Optional[bool]) – Determines whether the created process should make the chunks searchable.
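The wildcard semantics can be illustrated locally. This is a minimal sketch, assuming that a pattern such as “de.averbis.types.*” behaves like a shell-style glob on fully qualified type names; select_types is a hypothetical helper, not part of the averbis API, and the server may match patterns differently.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Union


def select_types(
    type_names: Iterable[str], annotation_types: Union[None, str, List[str]]
) -> List[str]:
    """Return the type names matched by an annotation_types value (local illustration only)."""
    names = list(type_names)
    if annotation_types is None:
        # Assumption: None is treated as "no restriction" in this sketch.
        return names
    # A single string and a list of strings are both accepted, as in the signature above.
    patterns = [annotation_types] if isinstance(annotation_types, str) else list(annotation_types)
    # fnmatchcase: "*" matches any run of characters, including dots; matching is case-sensitive.
    return [name for name in names if any(fnmatchcase(name, p) for p in patterns)]


types = [
    "de.averbis.types.health.Diagnosis",
    "de.averbis.types.health.Medication",
    "de.averbis.extraction.types.Concept",
]
print(select_types(types, "de.averbis.types.*"))
# ['de.averbis.types.health.Diagnosis', 'de.averbis.types.health.Medication']
```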

Return type:

Process

Returns:

The created process

create_process(process_name, is_manual_annotation=False, annotation_types=None, send_to_search=None, send_chunks_to_search=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Creates a process without a pipeline (e.g. for later manual annotation or text analysis result import).

Parameters:
  • process_name – The name of the newly created process.

  • is_manual_annotation (bool) – If True, the created process will be used for manual annotation, i.e. the underlying data structure is initialized immediately and not on a later text analysis result import.

  • annotation_types – Optional parameter (a string or a list of type names) indicating which types should be saved. Supports wildcard expressions:

    • Example 1: “de.averbis.types.*” returns all types with prefix “de.averbis.types”.

    • Example 2: Can also be a list of type names, e.g. [“de.averbis.types.health.Diagnosis”, “de.averbis.types.health.Medication”]

  • send_to_search (Optional[bool]) – Determines whether the created process should be searchable.

  • send_chunks_to_search (Optional[bool]) – Determines whether the created process should make the chunks searchable.

Return type:

Process

Returns:

The created process

delete()

Deletes the document collection.

Return type:

dict

delete_document(document_name)

Deletes the document identified by name from this document collection.

Return type:

dict

export_json_document_stream(document_names=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Export documents from a collection and return a generator of document objects.

Parameters:

document_names (Optional[List[str]]) – Optional list of document names to export (empty list exports all)

Yields:

Document dictionaries from the export response

Return type:

ExportDocumentStream

get_number_of_documents()

Returns the number of documents in that collection.

Return type:

int

get_process(process_name)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Gets a process.

Return type:

Process

Returns:

The process

import_documents(source, mime_type=None, filename=None, typesystem=None, textanalysis_mode=None)

Imports documents from a given file, from a given string or from a dictionary (representing the JSON format). Supported file content types are:

  • plain text (text/plain)

  • JSON (application/json), containing the text in the field ‘content’, the document name in the field ‘documentName’ and optional key-value pair metadata, or multiple documents with these fields in a field named “documents”

  • Averbis Solr XML (application/vnd.averbis.solr+xml)

  • Supported UIMA File Types

If a document is provided as a CAS object, the type system information can be automatically picked from the CAS object and should not be provided explicitly. If a CAS is provided as a string XML representation, then a type system must be explicitly provided.

The method tries to automatically determine the format (mime type) of the provided document, so setting the mime type parameter should usually not be necessary.

If possible, the method obtains the filename from the provided source. If this is not possible (e.g. if the source is a string or a CAS object), the filename should be provided explicitly. Note that a file in the Averbis Solr XML format can contain multiple documents, each with its name encoded within the XML. In this case, setting the filename parameter is not permitted at all.

Parameters:

textanalysis_mode (Optional[TextanalysisMode]) – Optional text analysis mode that controls how the imported document is analysed:

  • TextanalysisMode.TRIGGER_PROCESSES (default): trigger a reprocess of the document in all processes

  • TextanalysisMode.DO_NOTHING: no processes are triggered; the text analysis results of the document are kept

  • TextanalysisMode.REMOVE_RESULTS: remove the text analysis results of all processes for this document

Return type:

List[dict]
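As a sketch of the JSON layout described for import_documents, the hypothetical helper below builds a dictionary holding several documents under “documents”, each with ‘documentName’ and ‘content’. Optional metadata fields are left out because their exact placement is not described here, the file names and texts are invented, and the import call itself is only shown as a comment because it needs a running server and an existing DocumentCollection.

```python
import json
from typing import Dict, List


def build_import_payload(documents: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Wrap {documentName: content} pairs in the multi-document JSON layout described above."""
    return {
        "documents": [
            {"documentName": name, "content": text}
            for name, text in documents.items()
        ]
    }


payload = build_import_payload({
    "note-1.txt": "Patient reports chest pain.",
    "note-2.txt": "No abnormal findings.",
})
body = json.dumps(payload, indent=2)  # the same structure, serialized as application/json
print(body)

# Not executed here: with an existing DocumentCollection `collection`, the dictionary
# could be passed as the source, e.g. collection.import_documents(payload).
```

Whether a filename must also be given for a dictionary source follows the filename rule stated for import_documents.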

import_json_document_stream(document_generator)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Import documents to a collection from a document generator.

Parameters:

document_generator (Iterator[Dict[str, Any]]) – Iterator yielding document dictionaries

Return type:

dict

Returns:

Response object from the import request
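Because export_json_document_stream yields document dictionaries and import_json_document_stream consumes an iterator of them, a generator can sit between the two and transform documents lazily. This is a minimal sketch: with_name_prefix is a hypothetical helper, the ‘documentName’ key of exported documents is an assumption borrowed from the JSON import format, and a plain list stands in for a real export stream.

```python
from typing import Any, Dict, Iterable, Iterator


def with_name_prefix(documents: Iterable[Dict[str, Any]], prefix: str) -> Iterator[Dict[str, Any]]:
    """Lazily yield copies of the document dictionaries with a prefixed 'documentName'."""
    for doc in documents:
        renamed = dict(doc)  # shallow copy, so the exported dictionary is left untouched
        # Assumption: exported documents use the same 'documentName' key as the JSON import format.
        renamed["documentName"] = prefix + doc["documentName"]
        yield renamed


# Stand-in for source.export_json_document_stream(); a real stream needs a server.
exported = iter([
    {"documentName": "a.txt", "content": "first"},
    {"documentName": "b.txt", "content": "second"},
])
for doc in with_name_prefix(exported, "copy-"):
    print(doc["documentName"])

# Not executed here (hypothetical collections `source` and `target`):
# target.import_json_document_stream(with_name_prefix(source.export_json_document_stream(), "copy-"))
```

Filtering by name is better left to the document_names parameter of the export; a generator like this suits per-document transformations that the server does not offer.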

list_documents()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Lists the documents in the collection.

Return type:

dict

list_processes()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Lists the processes of the collection.

Return type:

List[Process]

read_document_text(document_name)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Read the text of the given document.

Parameters:

document_name (str) – Name of the document to read

Return type:

str

Returns:

Document text

class averbis.EvaluationConfiguration(comparison_annotation_type_name, features_to_compare, reference_annotation_type_name=None, **kwargs)
add_feature(feature_name)
Return type:

EvaluationConfiguration

use_enclosing_annotation_partial_match(enclosing_annotation_type_name)

Annotations that are covered by the given annotation type are used to calculate partial positives. Normally, these will replace a FalsePositive or FalseNegative if a partial match is identified.

Return type:

EvaluationConfiguration

use_overlap_partial_match()

Overlapping annotations are used to calculate partial positives. Normally, these will replace a FalsePositive or FalseNegative if a partial match is identified.

Return type:

EvaluationConfiguration

use_range_variance_partial_match(range_variance)

Annotations that are offset by the given variance are used to calculate partial positives. Normally, these will replace a FalsePositive or FalseNegative if a partial match is identified.

Return type:

EvaluationConfiguration
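The three partial-match options differ only in how a reference annotation and a compared annotation must relate to count as a partial positive. The hypothetical helpers below sketch each criterion on plain (begin, end) offset pairs; they are not the library's evaluation code, and the exact tolerance rule used by use_range_variance_partial_match is an assumption.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def within_range_variance(a: Span, b: Span, range_variance: int) -> bool:
    """Assumption: begin and end may each be shifted by at most range_variance characters."""
    return abs(a[0] - b[0]) <= range_variance and abs(a[1] - b[1]) <= range_variance


def covered_by_same(a: Span, b: Span, enclosing: Iterable[Span]) -> bool:
    """True if one enclosing span (e.g. a sentence) covers both annotations."""
    return any(e[0] <= min(a[0], b[0]) and max(a[1], b[1]) <= e[1] for e in enclosing)


reference, candidate = (10, 20), (12, 19)
print(overlaps(reference, candidate), within_range_variance(reference, candidate, 2))  # True True
```

Because every use_* method returns the EvaluationConfiguration, a call such as use_overlap_partial_match() can be chained directly after the constructor.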

exception averbis.ExtendedRequestException(*args, status_code=None, reason=None, url=None, error_message=None, **kwargs)
class averbis.Pear(project, identifier)
delete()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Deletes the PEAR.

get_default_configuration()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Get the default configuration of the PEAR.

Return type:

dict

class averbis.Pipeline(project, name)
STATE_STARTED = 'STARTED'
STATE_STARTING = 'STARTING'
STATE_STOPPED = 'STOPPED'
STATE_STOPPING = 'STOPPING'
analyse_cas_to_cas(source, language=None, timeout=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear. Processes text using a pipeline and returns the result as a UIMA CAS.

Parameters:
  • source (Cas) – The CAS to be analyzed.

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

Return type:

Cas

Returns:

A cassis.Cas object

analyse_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given text or text file in FHIR json format using the pipeline and return json.

Parameters:
  • source (Union[Path, IO, str, dict]) – The document to be analyzed in FHIR format. This can be a FHIR text, a file, or a dictionary containing the FHIR data.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

List[dict]

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

analyse_fhir_to_cas(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given text or text file in FHIR json format using the pipeline and return Cas.

Parameters:
  • source (Union[Path, IO, str, dict]) – The document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

Cas

Returns:

Analyzed document as a Cas

analyse_fhir_to_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given text or text file in FHIR json format using the pipeline and return FHIR json.

Parameters:
  • source (Union[Path, IO, str, dict]) – The document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

dict

Returns:

The raw payload of the server response as a dictionary. Future versions of this library may return a better-suited representation.

analyse_html(source, annotation_types=None, language=None, timeout=None)

Analyze the given HTML string or HTML file using the pipeline.

Parameters:
  • source (Union[Path, IO, str]) – The document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

analyse_pdf_to_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given pdf using the pipeline. The returned analysis is in fhir format.

Parameters:
  • source (Union[Path, IO]) – The pdf document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

analyse_pdf_to_json(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given pdf using the pipeline. The returned analysis is in json format.

Parameters:
  • source (Union[Path, IO]) – The pdf document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

analyse_pdf_to_pdf(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given pdf using the pipeline. The returned analysis is a marked pdf.

Parameters:
  • source (Union[Path, IO]) – The pdf document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

bytes

analyse_text(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given text or text file using the pipeline.

Parameters:
  • source (Union[Path, IO, str]) – The document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

List[dict]

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

analyse_text_to_cas(source, language=None, timeout=None, annotation_types=None, meta_data=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear. Processes text using a pipeline and returns the result as a UIMA CAS.

Parameters:
  • source (Union[Path, IO, str]) – The document to be analyzed.

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”. Available from Health Discovery 7.3.0 onwards.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

Cas

Returns:

A cassis.Cas object

analyse_text_to_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given text or text file using the pipeline and return FHIR.

Parameters:
  • source (Union[Path, IO, str]) – The document to be analyzed.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

dict

Returns:

The raw payload of the server response as a dictionary. Future versions of this library may return a better-suited representation.

analyse_texts(sources, parallelism=0, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given texts or files using the pipeline. If feasible, multiple documents are processed in parallel. Note that this call produces an iterator: individual results are returned as soon as they have been processed, and they may arrive out of order. If you want to hold on to the results while iterating through them, you need to put them into a collection, e.g. by calling list(pipeline.analyse_texts(...)). If you process a large number of documents, however, you are better off handling the results one by one with a simple for loop:

```
for result in pipeline.analyse_texts(...):
    if result.successful():
        response = result.data  # do something with the JSON response
    else:
        print(f"Exception for document with source {result.source}: {result.exception}")
```

Parameters:
  • sources (Iterable[Union[Path, IO, str]]) – The documents to be analyzed.

  • parallelism (int) – Number of parallel instances in the platform.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

Iterator[Result]

Returns:

An iterator over the results produced by the pipeline.
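To see why the results can arrive out of order, the following sketch mimics the documented behaviour locally with concurrent.futures: work is submitted in parallel and each result is yielded as soon as it completes. SketchResult, analyse_concurrently and fake_analyse are hypothetical names; the library's own Result class and request handling may work differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in exposing the fields used in the loop above."""
    source: Any
    data: Any = None
    exception: Optional[BaseException] = None

    def successful(self) -> bool:
        return self.exception is None


def analyse_concurrently(
    sources: Iterable[Any], analyse: Callable[[Any], Any], parallelism: int = 4
) -> Iterator[SketchResult]:
    """Yield one result per source in completion order, not input order."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(analyse, source): source for source in sources}
        for future in as_completed(futures):
            source = futures[future]
            try:
                result = SketchResult(source=source, data=future.result())
            except Exception as exc:  # a failure is reported per document, not raised
                result = SketchResult(source=source, exception=exc)
            yield result


def fake_analyse(text: str) -> dict:
    """Hypothetical analysis function standing in for the server call."""
    if not text:
        raise ValueError("empty document")
    return {"length": len(text)}


for result in analyse_concurrently(["abc", "", "hello"], fake_analyse, parallelism=2):
    if result.successful():
        print(result.source, result.data)
    else:
        print(f"Exception for document with source {result.source!r}: {result.exception}")
```

Reporting failures inside the result, rather than raising, lets one bad document fail without aborting the rest of the batch.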

analyse_texts_to_cas(sources, parallelism=0, language=None, timeout=None, annotation_types=None, meta_data=None)

Analyze the given texts or files using the pipeline. If feasible, multiple documents are processed in parallel. Note that this call produces an iterator: individual results are returned as soon as they have been processed, and they may arrive out of order. If you want to hold on to the results while iterating through them, you need to put them into a collection, e.g. by calling list(pipeline.analyse_texts_to_cas(...)). If you process a large number of documents, however, you are better off handling the results one by one with a simple for loop:

```
for result in pipeline.analyse_texts_to_cas(...):
    if result.successful():
        cas = result.data  # do something with the CAS
    else:
        print(f"Exception for document with source {result.source}: {result.exception}")
```

Parameters:
  • sources (Iterable[Union[Path, IO, str]]) – The documents to be analyzed.

  • parallelism (int) – Number of parallel instances in the platform.

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”. Available from Health Discovery 7.3.0 onwards.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

Iterator[Result]

Returns:

An iterator over the results produced by the pipeline.
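Because results may arrive out of order, code that needs them in input order has to reorder them itself. A minimal sketch, assuming each source is hashable and appears only once; SketchResult and in_input_order are hypothetical, and strings stand in for the cassis.Cas objects a real call would return.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple


@dataclass
class SketchResult:
    """Hypothetical stand-in with the source/data/exception fields and successful() used above."""
    source: Any
    data: Any = None
    exception: Optional[BaseException] = None

    def successful(self) -> bool:
        return self.exception is None


def in_input_order(
    sources: Sequence[Any], results: Iterable[SketchResult]
) -> Tuple[List[Any], Dict[Any, BaseException]]:
    """Return successful payloads in the order of `sources`, plus failures keyed by source."""
    succeeded: Dict[Any, Any] = {}
    failed: Dict[Any, BaseException] = {}
    for result in results:
        if result.successful():
            succeeded[result.source] = result.data
        else:
            failed[result.source] = result.exception
    # Assumption: every source is hashable and appears only once in `sources`.
    return [succeeded[s] for s in sources if s in succeeded], failed


sources = ["a.txt", "b.txt", "c.txt"]
arrived = [  # simulated out-of-order arrival
    SketchResult("c.txt", data="CAS for c"),
    SketchResult("a.txt", data="CAS for a"),
    SketchResult("b.txt", exception=RuntimeError("timeout")),
]
ordered, failures = in_input_order(sources, arrived)
print(ordered)         # ['CAS for a', 'CAS for c']
print(list(failures))  # ['b.txt']
```

Collecting everything like this keeps all CAS objects in memory, so it suits small batches better than the large ones for which one-by-one handling is recommended.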

analyse_texts_to_fhir(sources, parallelism=0, annotation_types=None, language=None, timeout=None, meta_data=None)

Analyze the given texts or files using the pipeline. If feasible, multiple documents are processed in parallel. Note that this call produces an iterator: individual results are returned as soon as they have been processed, and they may arrive out of order. If you want to hold on to the results while iterating through them, you need to put them into a collection, e.g. by calling list(pipeline.analyse_texts_to_fhir(...)). If you process a large number of documents, however, you are better off handling the results one by one with a simple for loop:

```
for result in pipeline.analyse_texts_to_fhir(...):
    if result.successful():
        response = result.data  # do something with the FHIR response
    else:
        print(f"Exception for document with source {result.source}: {result.exception}")
```

Parameters:
  • sources (Iterable[Union[Path, IO, str]]) – The documents to be analyzed.

  • parallelism (int) – Number of parallel instances in the platform.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.

  • timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request is waiting for a server response.

  • meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document

Return type:

Iterator[Result]

Returns:

An iterator over the results produced by the pipeline.

collection_process_complete()

Triggers collection process complete for this pipeline.

Return type:

dict

create_resource_container(name, resources_zip_path=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Creates an empty resource container with the given name for this pipeline, or additionally uploads zipped resources into it.

Return type:

ResourceContainer

delete()

HIGHLY EXPERIMENTAL API - may soon change or disappear. Deletes an existing pipeline from the server.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

delete_resources()

DEPRECATED: Use ResourceContainer.delete() instead.

Delete the resources of the pipeline.

Return type:

None

download_resources(target_zip_path)

DEPRECATED: Use ResourceContainer.export_resources() instead.

Downloads the pipeline resources and stores them at the given path.

Return type:

None

ensure_started()

Checks whether the pipeline has started. If it has not started yet, an attempt is made to start it. The call blocks for a certain time; if that time expires without the pipeline becoming available, an exception is raised.

Return type:

Pipeline

Returns:

the pipeline object for chaining additional calls.

ensure_stopped()

Causes the pipeline on the server to shut down.

Return type:

Pipeline

Returns:

the pipeline object for chaining additional calls.

get_configuration()

Obtain the pipeline configuration.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

get_info()

Obtain information about the server-side pipeline.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

get_type_system()

HIGHLY EXPERIMENTAL API - may soon change or disappear. Obtains the type system of the pipeline.

Return type:

TypeSystem

is_started()

Checks if the pipeline has already started.

Return type:

bool

Returns:

Whether the pipeline has started.

list_resource_containers()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

List the resource containers for the current pipeline.

Return type:

List[ResourceContainer]

list_resources()

DEPRECATED: Use ResourceContainer.list_resources() instead.

List the resources of the pipeline.

Return type:

List[str]

Returns:

The list of pipeline resources.

set_configuration(configuration)

Updates the pipeline configuration. Configuration changes can only be performed while a pipeline is stopped, so if the pipeline is running, it is stopped first and restarted after the configuration has been updated. As a side effect, the cached type system previously obtained for this pipeline is cleared.

Parameters:

configuration (dict) – a pipeline configuration in the form returned by get_configuration()

Return type:

None
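Since set_configuration() expects a configuration in the form returned by get_configuration(), a read-modify-write round trip is a natural way to change a single setting. The sketch below is hypothetical: with_override is not part of the library, and the keys in the example are invented because the structure of a real configuration is not described here.

```python
import copy
from typing import Any, Dict, Sequence


def with_override(configuration: Dict[str, Any], path: Sequence[str], value: Any) -> Dict[str, Any]:
    """Return a deep copy of `configuration` with the nested key at `path` set to `value`."""
    updated = copy.deepcopy(configuration)  # leave the dictionary from get_configuration() untouched
    node = updated
    for key in path[:-1]:
        node = node[key]  # raises KeyError if an intermediate key does not exist
    node[path[-1]] = value
    return updated


original = {"name": "demo", "settings": {"poolSize": 1}}  # hypothetical keys
updated = with_override(original, ["settings", "poolSize"], 2)
print(original["settings"]["poolSize"], updated["settings"]["poolSize"])  # 1 2

# Not executed here (hypothetical `pipeline` object and keys):
# pipeline.set_configuration(with_override(pipeline.get_configuration(), ["someSection", "someKey"], 2))
```

Copying first keeps the original dictionary available, e.g. for rolling back if the updated configuration is rejected.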

start()

Start the server-side pipeline. This call returns immediately. However, the pipeline will usually take a while to boot and become available.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

stop()

Stop the server-side pipeline.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

upload_resources(source, path_in_zip='')

DEPRECATED: Use create_resource_container() instead.

Uploads a file to the pipeline resources. Existing files with the same path/name will be overwritten.

Return type:

List[str]

Returns:

List of resources after upload.

wait_for_pipeline_to_arrive_at_state(target_state)
Return type:

None

wait_for_pipeline_to_leave_transient_state()
Return type:

str
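wait_for_pipeline_to_leave_transient_state() is documented only by its return type. The following is a hypothetical sketch of a polling loop over the STATE_* values listed for Pipeline, assuming STARTING and STOPPING are the transient states; wait_for_stable_state and its timeout and poll_interval parameters are invented for illustration and do not describe the library's implementation.

```python
import time
from typing import Callable

# Values of the Pipeline.STATE_* constants listed above.
STATE_STARTED = "STARTED"
STATE_STARTING = "STARTING"
STATE_STOPPED = "STOPPED"
STATE_STOPPING = "STOPPING"
TRANSIENT_STATES = {STATE_STARTING, STATE_STOPPING}  # assumption: the two "-ING" states


def wait_for_stable_state(
    get_state: Callable[[], str], timeout: float = 30.0, poll_interval: float = 0.5
) -> str:
    """Poll get_state() until it leaves a transient state; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state not in TRANSIENT_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in transient state {state!r} after {timeout} s")
        time.sleep(poll_interval)


# Simulated state sequence instead of a real server-side pipeline.
states = iter([STATE_STARTING, STATE_STARTING, STATE_STARTED])
print(wait_for_stable_state(lambda: next(states), poll_interval=0))  # STARTED
```

time.monotonic() is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.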

class averbis.Process(project, name, document_source_name, pipeline_name=None, preceding_process_name=None)
class ProcessState(*args, **kwargs)
create_and_run_process(process_name, pipeline, send_to_search=None, send_chunks_to_search=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Creates a process upon the results of this process.

Return type:

Process

delete()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Deletes the process as soon as it becomes IDLE. All document analysis results will be deleted.

evaluate_against(reference_process, process_name, evaluation_configurations, number_of_pipeline_instances=1)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Starts the evaluation of this process in comparison to the given one as a new process. Returns the new evaluation process.

See Evaluation process for a usage example and more information.

Return type:

Process

export_text_analysis(annotation_types=None, page=None, page_size=100, document_names=None)

Exports the text analysis results of this process as JSON.

The parameters page and page_size can be used to export only a subset of all results. For instance, setting page=1 and page_size=10 exports documents 1-10 of the text analysis result, and setting page=2 and page_size=30 exports documents 31-60. Documents are currently sorted by their internal ID, not by their filename.

Parameters:
  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

  • page (Optional[int]) – Optional parameter indicating which page (batch of documents) should be exported. This can only be used if document_names is not set.

  • page_size (Optional[int]) – Optional parameter defining how many documents are exported at once (default=100). It only restricts the number of documents if the parameter page is given. This can only be used if document_names is not set.

  • document_names (Optional[List[str]]) – Optional parameter indicating which documents should be exported. If this is set, the page and page_size parameters cannot be used.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.
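The page arithmetic in the example above can be written down directly. The sketch assumes pages are numbered from 1, as that example implies, and that an empty or short batch marks the end of the results; page_bounds, iter_pages and fake_fetch are hypothetical. In real code, fetch_page would wrap export_text_analysis(page=..., page_size=...) and extract the document list from the raw payload, whose structure is not described here.

```python
from typing import Callable, Iterator, List, Tuple


def page_bounds(page: int, page_size: int = 100) -> Tuple[int, int]:
    """1-based, inclusive positions of the documents covered by `page`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size + 1, page * page_size


def iter_pages(fetch_page: Callable[[int, int], List[dict]], page_size: int = 100) -> Iterator[List[dict]]:
    """Yield batches from fetch_page(page, page_size) until a short or empty batch arrives."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            return
        yield batch
        if len(batch) < page_size:
            return  # assumption: a short page is the last one
        page += 1


print(page_bounds(1, 10))  # (1, 10)
print(page_bounds(2, 30))  # (31, 60)

# Simulated export of 25 documents instead of calls to a real process.
documents = [{"id": i} for i in range(1, 26)]


def fake_fetch(page: int, size: int) -> List[dict]:
    return documents[(page - 1) * size : page * size]


print([len(batch) for batch in iter_pages(fake_fetch, page_size=10)])  # [10, 10, 5]
```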

export_text_analysis_to_cas(document_name, type_system=None, annotation_types=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Returns the text analysis result of a document as a UIMA CAS.

Parameters:
  • document_name (str) – The name of the document whose text analysis result will be exported.

  • type_system (Optional[TypeSystem]) – Optional type system that the exported CAS will be set up with.

  • annotation_types (Union[None, str, List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”

Return type:

Cas

get_process_state()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Returns the current process state.

Return type:

ProcessState

import_text_analysis_result(source, document_name, mime_type=None, typesystem=None, overwrite=False)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Adds or updates a text analysis result (i.e. annotated content) for a specified document in this process. This process must be a manual or imported process (i.e. without automatic processing via a pipeline).

Parameters:
  • source – The text analysis result as a CAS object, a stream, or a path to it.

  • document_name – The name of the document that the text analysis result should be associated with.

  • overwrite – Overwrite the text analysis result if it already exists.

The supported file content types are the Supported UIMA File Types.

If a document is provided as a CAS object, the type system information can be automatically picked from the CAS object and should not be provided explicitly. The mime_type is also not needed. If a CAS is provided as a path or stream, then a mime_type needs to be given. The typesystem might need to be provided depending on the content type.

process_unprocessed()

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Triggers a processing of all unprocessed documents in this process.

rename(name)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Renames this process to the given name and returns the process.

Return type:

Process

rerun(document_names=None)

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Triggers a rerun if the process is IDLE. All current results will be deleted and the documents will be reprocessed.

Parameters:

document_names (Optional[List[str]]) – Only rerun the text analysis process on these documents

class averbis.Project(client, name)[source]
classify_text(text, classification_set='Default')[source]

Classify the given text.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

create_document_collection(name)[source]

Creates a new document collection.

Return type:

DocumentCollection

Returns:

The document collection.

create_pipeline(configuration, name=None)[source]

Create a new pipeline.

Return type:

Pipeline

Returns:

The pipeline.

create_resource_container(name, resources_zip_path=None)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Create an empty resource container with the given name for this project, optionally uploading zipped resources into it.

Return type:

ResourceContainer
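If the resources live in a local directory, they first need to be packed into a zip file before being passed as resources_zip_path. A minimal sketch with the standard zipfile module follows. It assumes the container expects the directory's contents at the zip root, with paths relative to that directory; this layout is an assumption rather than a documented requirement, and zip_resources is a hypothetical helper name.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_resources(resource_dir: str, zip_path: str) -> List[str]:
    """Hypothetical helper: pack every file below resource_dir into zip_path.

    Entries are stored relative to resource_dir (assumed layout) with '/' separators.
    """
    root = Path(resource_dir)
    names: List[str] = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(root.rglob("*")):
            if file.is_file():
                arcname = file.relative_to(root).as_posix()
                zf.write(file, arcname)
                names.append(arcname)
    return names
```

Usage would then be `zip_resources("my_resources", "resources.zip")` followed by `project.create_resource_container("my-container", "resources.zip")`.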

create_terminology(terminology_name, label, languages, concept_type='de.averbis.extraction.types.Concept', version='', hierarchical=True)[source]

Create a new terminology.

Return type:

Terminology

Returns:

The terminology.

delete()[source]

Delete the project.

Return type:

None

delete_pear(pear_identifier)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Delete the pear by identifier.

Return type:

dict

delete_resources()[source]

DEPRECATED: Use ResourceContainer.delete() instead.

Delete the resources of the project.

Return type:

None

download_resources(target_zip_path)[source]

DEPRECATED: Use ResourceContainer.export_resources() instead.

Download the project-level pipeline resources and store them at the given path.

Return type:

None

exists_document_collection(name)[source]

Checks if a document collection exists.

Returns:

Whether the collection exists

exists_pipeline(name)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Checks if a pipeline exists.

Return type:

bool

get_document_collection(collection)[source]

Obtain an existing document collection.

Return type:

DocumentCollection

Returns:

The document collection.

get_pipeline(name)[source]

Access an existing pipeline.

Return type:

Pipeline

Returns:

The pipeline.

get_terminology(terminology)[source]

Obtain an existing terminology.

Return type:

Terminology

Returns:

The terminology.

install_pear(file_or_path)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Install a pear by file or path.

Return type:

Pear

list_annotators()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

List the annotators (components) of the current project, i.e. their identifier, displayName, bundle and version.

Return type:

List[dict]

Returns:

List of annotator information.

list_document_collections()[source]

Lists all document collections.

Return type:

List[DocumentCollection]

Returns:

List of DocumentCollection objects

list_pears()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

List all existing pears by identifier.

Return type:

List[str]

Returns:

The list of pear identifiers.

list_pipelines()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

List pipelines for the current project.

Return type:

List[Pipeline]

Returns:

List of pipelines.

list_processes()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

List all existing processes by name and document source name.

Return type:

List[Process]

Returns:

The list of processes.

list_resource_containers()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

List the resource containers for the current project.

Return type:

List[ResourceContainer]

list_resources()[source]

DEPRECATED: Use ResourceContainer.list_resources() instead.

List the resources of the project.

Return type:

List[str]

Returns:

The list of project resources.

list_terminologies()[source]

List all existing terminologies.

Return type:

dict

Returns:

The terminology list.

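neural_search(text=None, pipeline_name=None, language=None, top_k=None, threshold=None, neural_search_parameter=None, **query_params)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.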

Perform a neural (dense vector) search using server-side neural search endpoint.

There are two ways to use this method:

  1. Explicit parameters (recommended): Provide text and pipeline_name (both required in this mode), plus any of the optional parameters.

  2. Dictionary parameters (advanced): Provide all parameters as a dictionary using neural_search_parameter.

Parameters:
  • text (Optional[str]) – The text query to search for (required when using explicit parameters)

  • pipeline_name (Optional[str]) – Name of the pipeline to use for embedding generation (required when using explicit parameters)

  • language (Optional[str]) – Language code (e.g., ‘en’, ‘de’)

  • top_k (Optional[int]) – Maximum number of results to return

  • threshold (Optional[float]) – Minimum similarity threshold for results

  • neural_search_parameter (Optional[NeuralSearchParams]) – Alternative: provide all parameters as a dict (for backward compatibility or advanced usage)

  • query_params (Any) –

    Additional Solr query parameters forwarded to the server. Common parameters include:

    • fq (str): Filter query to restrict results (e.g., ‘category:medical’)

    • sort (str): Sort order (e.g., ‘score desc’, ‘timestamp asc’)

    • start (int): Starting offset for pagination (default: 0)

    • rows (int): Number of results to return

    • fl (str): Field list - comma-separated fields to return (e.g., ‘id,content,score’)

    • debugQuery (bool): Enable debug information in response
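The start and rows parameters follow the usual Solr paging arithmetic: start is a zero-based offset, so start=20 with rows=10 returns results 21-30. The sketch below computes these values for a 1-based page number; page_window is a hypothetical helper, not part of the library.

```python
from typing import Dict


def page_window(page: int, page_size: int) -> Dict[str, int]:
    """Hypothetical helper: Solr-style start/rows values for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # start is a zero-based offset, so page 1 starts at 0.
    return {"start": (page - 1) * page_size, "rows": page_size}


print(page_window(3, 10))  # {'start': 20, 'rows': 10}, i.e. results 21-30
```

Because the values are plain query parameters, they can be forwarded directly, e.g. `project.neural_search(text=..., pipeline_name=..., **page_window(3, 10))`.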

Return type:

Dict[str, Any]

Returns:

The raw payload of the server response (typically a Solr-like JSON response wrapped in payload).

Examples

Using explicit parameters (recommended):

results = project.neural_search(
    text="Wie alt ist der Patient?",  # "How old is the patient?"
    pipeline_name="ChunkEmbedder",
    language="de",
    top_k=5,
    threshold=0.1,
    # Common Solr query parameters:
    rows=20,                # Limit number of results
    start=0,                # Pagination offset
    sort="score desc",      # Sort by relevance
    fl="id,content,score",  # Return only these fields
    fq="status:active",     # Filter results
    debugQuery=True,        # Include debug info
)

Using a dictionary (backward compatibility):

params = {
    "text": "Wie alt ist der Patient?",
    "language": "de",
    "pipelineName": "ChunkEmbedder",
    "topK": 5,
    "threshold": 0.1,
}
results = project.neural_search(
    neural_search_parameter=params,
    rows=10,
    start=20,               # Get results 21-30
    sort="timestamp desc",  # Sort by newest first
    fl="id,title",          # Return only id and title
    fq="category:medical",  # Filter to medical category
    debugQuery=False,
)
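The two calling conventions carry the same information under different key spellings: the dictionary form uses camelCase keys such as pipelineName and topK, as in the dictionary example. The following sketch converts explicit arguments into that dictionary form. The key mapping is read off the example and is an assumption, not a documented contract; to_neural_search_parameter is a hypothetical name.

```python
from typing import Any, Dict, Optional

# Key spellings taken from the dictionary example; assumed, not an exhaustive list.
_KEY_MAP = {
    "text": "text",
    "pipeline_name": "pipelineName",
    "language": "language",
    "top_k": "topK",
    "threshold": "threshold",
}


def to_neural_search_parameter(
    text: str,
    pipeline_name: str,
    language: Optional[str] = None,
    top_k: Optional[int] = None,
    threshold: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the dictionary form from explicit arguments, omitting unset values."""
    values = {
        "text": text,
        "pipeline_name": pipeline_name,
        "language": language,
        "top_k": top_k,
        "threshold": threshold,
    }
    return {_KEY_MAP[key]: value for key, value in values.items() if value is not None}
```

Leaving unset values out rather than sending explicit nulls lets the server apply its own defaults.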

search(query='', **kwargs)[source]

Search for documents matching the query.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

upload_resources(source, path_in_zip='')[source]

DEPRECATED: Use create_resource_container() instead.

Upload a file to the project resources. Existing files with the same path/name will be overwritten.

Return type:

List[str]

Returns:

List of resources after upload.

class averbis.ResourceContainer(client, name, scope, base_url)[source]
delete()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. Delete this resource container.

Return type:

None

delete_resource(resource_path)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. Delete the resource at the given resource_path from the container.

Return type:

None

export_resource(target_path, resource_path)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. Export the resource file at the given resource_path within the container to target_path.

Return type:

None

export_resources(target_path)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. Export the whole resource container as a zip file.

Return type:

None

list_resources()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. List paths of resources in the container.

Return type:

List[str]

upsert_resource(upload_file_path, resource_path)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. Insert or update the resource at the given resource_path in the container with the file located at upload_file_path.

Return type:

None

class averbis.Result(data=None, exception=None, source=None)[source]
successful()[source]
class averbis.Terminology(project, name)[source]
EXPORT_STATE_ABORTED = 'ABORTED'
EXPORT_STATE_ABORTING = 'ABORTING'
EXPORT_STATE_COMPLETED = 'COMPLETED'
EXPORT_STATE_FAILED = 'FAILED'
EXPORT_STATE_PREPARING = 'PREPARING'
EXPORT_STATE_PROCESSING_EXPORT = 'PROCESSING_EXPORT'
IMPORT_STATE_ABORTED = 'ABORTED'
IMPORT_STATE_ABORTING = 'ABORTING'
IMPORT_STATE_COMPLETED = 'COMPLETED'
IMPORT_STATE_FAILED = 'FAILED'
IMPORT_STATE_PREPARING = 'PREPARING'
IMPORT_STATE_PROCESSING_CONCEPTS = 'PROCESSING_CONCEPTS'
IMPORT_STATE_PROCESSING_RELATIONS = 'PROCESSING_RELATIONS'
PENDING_EXPORT_STATES = ['PROCESSING_EXPORT', 'PREPARING']
PENDING_IMPORT_STATES = ['PREPARING', 'PROCESSING_CONCEPTS', 'PROCESSING_RELATIONS']
concept_autosuggest(query, include_concept_identifier=True, max_suggestions=5)[source]

Trigger the concept autosuggest for this terminology.

Parameters:
  • query (str) – The query string for autosuggest.

  • include_concept_identifier (bool) – Whether the conceptId field of the terminology concepts is also searched by the query string.

  • max_suggestions (int) – The maximum number of suggestions to return.

Return type:

dict

Returns:

The raw payload of the server response.

delete()[source]

Delete the terminology.

Return type:

None

get_export_status()[source]

Obtain the status of the terminology export.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

get_import_status()[source]

Obtain the status of the import of the terminology.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

import_data(source, importer='OBO Importer', timeout=120)[source]

Imports the given terminology into the platform and waits for the import process to complete. If the import does not complete successfully within the given period of time, an OperationTimeoutError is raised.
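The waiting behaviour of import_data can be pictured as a polling loop over get_import_status() that stops once the state leaves PENDING_IMPORT_STATES. The sketch below is an assumption about how such a loop could look, not the library's implementation: it takes the status-fetching function as a parameter, assumes the raw status payload carries the state under a "state" key (hypothetical), and raises the built-in TimeoutError where the library raises OperationTimeoutError.

```python
import time
from typing import Callable, Dict

# Copied from Terminology.PENDING_IMPORT_STATES.
PENDING_IMPORT_STATES = ["PREPARING", "PROCESSING_CONCEPTS", "PROCESSING_RELATIONS"]


def wait_for_import(
    get_status: Callable[[], Dict],
    timeout: float = 120,
    poll_delay: float = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll get_status() until the import leaves the pending states or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        # "state" is an assumed key name in the raw server payload.
        # Any state outside PENDING_IMPORT_STATES ends the wait; the caller should inspect it.
        if status.get("state") not in PENDING_IMPORT_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Import still pending after {timeout} seconds")
        sleep(poll_delay)
```

With a real terminology this would be called as `wait_for_import(terminology.get_import_status)` after `terminology.start_import(...)`. The injectable sleep and clock only exist to make the loop testable without waiting.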

Return type:

dict

provision(timeout=120)[source]

Provisions the terminology so it can be used by pipelines. If the data cannot be successfully provisioned within the given period of time, an OperationTimeoutError is raised.

Return type:

Optional[dict]

start_export(terminology_format='Obo 1.4 Exporter')[source]

Trigger the export of the terminology.

Return type:

None

start_import(source, importer='OBO Importer')[source]

Upload the given terminology and trigger its import.

Return type:

None

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

class averbis.TextanalysisMode(*values)[source]

Enum for the text analysis modes used to define process behaviour when importing documents.

DO_NOTHING = 'doNothing'
REMOVE_RESULTS = 'removeResults'
TRIGGER_PROCESSES = 'triggerProcesses'
class averbis.Client(url_or_id, api_token=None, verify_ssl=True, settings=None, username=None, password=None, timeout=None, polling_timeout=30, poll_delay=5)[source]
__init__(url_or_id, api_token=None, verify_ssl=True, settings=None, username=None, password=None, timeout=None, polling_timeout=30, poll_delay=5)[source]

A Client is the base object for all calls within the Averbis Python API.

The Client can be initialized by passing the required parameters (e.g. URL and API Token) or by creating a client-settings.json file in which the information is stored. The client-settings.json allows specifying different profiles for different servers. Please see the example in the project README for more information.

Parameters:
  • url_or_id (str) – The URL to the platform instance or an identifier of a profile in a client settings file

  • api_token (Optional[str]) – The API Token enabling users to perform requests in the platform

  • verify_ssl (Union[str, bool]) – Whether SSL verification should be activated (default=True)

  • settings (Union[str, Path, dict, None]) – Either a dictionary containing settings information or a path to the settings file. As fallback, a “client-settings.json” file is searched in the current directory and in $HOME/.averbis/

  • username (Optional[str]) – If no API token is provided, then a username can be provided together with a password to generate a new API token

  • password (Optional[str]) – If no API token is provided, then a username can be provided together with a password to generate a new API token

  • timeout (Optional[float]) – An optional global timeout (in seconds) specifying how long the Client waits for a server response (default=None).

  • polling_timeout (int) – Timeout (in seconds) after which polling for specific status requests is no longer tried.

  • poll_delay (int) – Time (in seconds) between requests to server for specific status.
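The fallback lookup for the settings file can be sketched as follows. This only models the documented search locations; the helper name find_client_settings is hypothetical, and checking the current directory before $HOME/.averbis/ is assumed from the order in which the two locations are listed.

```python
from pathlib import Path
from typing import Optional

SETTINGS_FILE_NAME = "client-settings.json"


def find_client_settings(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Optional[Path]:
    """Hypothetical helper: return the first client-settings.json found, or None."""
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home
    # Assumed search order: current directory first, then $HOME/.averbis/.
    candidates = [cwd / SETTINGS_FILE_NAME, home / ".averbis" / SETTINGS_FILE_NAME]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Passing settings explicitly to the Client constructor bypasses this fallback entirely.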

change_password(user, old_password, new_password)[source]

Changes the password of the given user.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

create_project(name, description='', exist_ok=False)[source]

Creates a new project.

Parameters:
  • name (str) – The name of the new project

  • description (str) – The description of the new project

  • exist_ok – If exist_ok is False (the default), a ValueError is raised if the project already exists. If exist_ok is True and the project exists, then the existing project is returned.

Return type:

Project

Returns:

The project.

create_resource_container(name, resources_zip_path=None)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. Create an empty global resource container, optionally uploading the provided resources into the new container.

Return type:

ResourceContainer

delete_resources()[source]

DEPRECATED: Use ResourceContainer.delete() instead.

Delete the global resources.

Return type:

None

download_resources(target_zip_path)[source]

DEPRECATED: Use ResourceContainer.export_resources() instead.

Download Client-level pipeline resources and store in given path.

Return type:

None

ensure_available(timeout=120)[source]

Checks whether the server is available and responding. The call blocks for up to the given time if the server is not available. If that time passes without the server becoming available, an exception is raised.

Return type:

Client

exists_project(name)[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear.

Checks if a project exists.

Return type:

bool

generate_api_token(user, password)[source]

Generates an API token using the given user/password and stores the API token in the client for further use. Normally, you would never call this method but rather work with a previously generated API token.

Return type:

Optional[str]

Returns:

the API token that was obtained

get_api_token_status(user, password)[source]

Obtains the status of the given API token.

Return type:

str

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

get_build_info()[source]

Obtains information about the version of the server instance.

Return type:

dict

Returns:

The raw payload of the server response. Future versions of this library may return a better-suited representation.

get_project(name)[source]

Access an existing project.

Return type:

Project

Returns:

The project.

get_spec_version()[source]

Helper function that returns the spec version of the server instance.

Return type:

str

Returns:

The spec version as string

invalidate_api_token(user, password)[source]

Invalidates the API token for the given user. This method does not clear the API token from this client object. If the client is currently using the API token that is being invalidated, subsequent operations will fail.

Return type:

None

list_projects()[source]

Returns a list of the projects.

Return type:

dict

list_resource_containers()[source]

HIGHLY EXPERIMENTAL API - may soon change or disappear. List all global resource containers.

Return type:

List[ResourceContainer]

list_resources()[source]

DEPRECATED: Use ResourceContainer.list_resources() instead.

List the resources that are globally available.

Return type:

List[str]

Returns:

List of resources.

regenerate_api_token(user, password)[source]

Regenerates an API token using the given user/password and stores the API token in the client for further use. Normally, you would never call this method but rather work with a previously generated API token.

Return type:

Optional[str]

Returns:

the API token that was obtained

set_timeout(timeout)[source]

Overwrite the Client-level timeout with a new timeout.

Parameters:

timeout (float) – Timeout duration in seconds

Return type:

Client

Returns:

The client.

upload_resources(source, path_in_zip='')[source]

DEPRECATED: Use create_resource_container() instead.

Upload a file to the global resources. Existing files with the same path/name will be overwritten.

Return type:

List[str]

Returns:

List of resources after upload.

Supported UIMA File Types

The supported file content types (mime_types) are

  • UIMA CAS XMI (application/vnd.uima.cas+xmi)

  • XCAS (application/vnd.uima.cas+xcas)

  • binary CAS (application/vnd.uima.cas+binary)

  • binary TSI (application/vnd.uima.cas+binary.tsi)

  • compressed (application/vnd.uima.cas+compressed)

  • compressed TSI (application/vnd.uima.cas+compressed.tsi)

  • compressed filtered (application/vnd.uima.cas+compressed.filtered)

  • compressed filtered TS (application/vnd.uima.cas+compressed.filtered.ts)

  • compressed filtered TSI (application/vnd.uima.cas+compressed.filtered.tsi)

  • serialized CAS (application/vnd.uima.cas+serialized)

  • serialized TSI (application/vnd.uima.cas+serialized.tsi)
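When importing from a path or stream, the mime_type has to be passed explicitly. A small client-side check against the list above can catch typos before a request is sent. check_mime_type is a hypothetical helper, and the server remains the authority on what it actually accepts.

```python
# The mime types listed above, as a lookup set.
SUPPORTED_UIMA_MIME_TYPES = frozenset({
    "application/vnd.uima.cas+xmi",
    "application/vnd.uima.cas+xcas",
    "application/vnd.uima.cas+binary",
    "application/vnd.uima.cas+binary.tsi",
    "application/vnd.uima.cas+compressed",
    "application/vnd.uima.cas+compressed.tsi",
    "application/vnd.uima.cas+compressed.filtered",
    "application/vnd.uima.cas+compressed.filtered.ts",
    "application/vnd.uima.cas+compressed.filtered.tsi",
    "application/vnd.uima.cas+serialized",
    "application/vnd.uima.cas+serialized.tsi",
})


def check_mime_type(mime_type: str) -> str:
    """Hypothetical helper: return mime_type unchanged if listed, otherwise raise ValueError."""
    if mime_type not in SUPPORTED_UIMA_MIME_TYPES:
        raise ValueError(f"Unsupported UIMA mime type: {mime_type!r}")
    return mime_type
```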

Evaluation process

HIGHLY EXPERIMENTAL API - may soon change or disappear.

The averbis.Process.evaluate_against() method starts an evaluation process that compares the current process to another one, the reference process. Depending on the averbis.EvaluationConfiguration, annotations of a specific type are compared with each other by selected features. The comparison result is stored as an evaluation annotation: for example, a TruePositive annotation is created for an annotation if it matches the corresponding annotation in the reference process, and a FalsePositive is created if the annotation exists in the current process but not in the reference process.

In the evaluation configuration, it is possible to distinguish between exact and partial matches. Annotations are marked as an exact match if their type, features and position in the text are identical. For a more fine-grained comparison than hit or miss, a partial match can also be defined: annotations that are not exactly identical but still meet the partial-match criteria are annotated as PartialPositive.
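To make the matching categories concrete, the sketch below labels spans of a single annotation type from the current process against the reference process. It is an illustration under stated assumptions, not the product's evaluation algorithm: annotations are reduced to (begin, end) pairs, which corresponds to a begin/end configuration, and span overlap is used as the partial-match criterion, which is a hypothetical choice. In the product, the partial-match criteria come from the EvaluationConfiguration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end), end exclusive


def classify(current: List[Span], reference: List[Span]) -> List[str]:
    """Label each annotation of the current process against the reference process."""
    reference_set = set(reference)
    labels = []
    for begin, end in current:
        if (begin, end) in reference_set:
            labels.append("TruePositive")  # identical position
        elif any(begin < r_end and r_begin < end for r_begin, r_end in reference):
            labels.append("PartialPositive")  # overlaps a reference span (assumed criterion)
        else:
            labels.append("FalsePositive")  # no counterpart in the reference process
    return labels


print(classify([(0, 5), (10, 15), (20, 25)], [(0, 5), (12, 18)]))
# ['TruePositive', 'PartialPositive', 'FalsePositive']
```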

Starting an evaluation process for exact matches of Diagnosis annotations

The given evaluation configuration describes an evaluation of diagnosis annotations by their begin and end features i.e. two annotations match if they are Diagnosis annotations at the same position.

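from averbis import EvaluationConfiguration

# `collection` is an existing DocumentCollection containing both processes.
comparison_process = collection.get_process("process name")
reference_process = collection.get_process("reference process name")

# Compare Diagnosis annotations by their begin and end features.
diagnosis_config = EvaluationConfiguration(
    "de.averbis.types.health.Diagnosis",
    ["begin", "end"],
)

evaluation_process = comparison_process.evaluate_against(
    reference_process,
    "evaluation_of_diagnosis",
    [diagnosis_config],
)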

You can then query the state of the evaluation process until it is done and export text analysis results from it using export methods from this API.

Troubleshooting the evaluation

If evaluation annotations are not created as expected, the configured annotation type might not be one that can stand alone but rather one that is only referenced as a feature of other annotations (i.e. the annotation is not in the CAS index). In this case, adapting the evaluation configuration is not sufficient; instead, the way the annotation is created in the product has to be examined.