API Reference
- class averbis.DocumentCollection(project, name)[source]
- create_and_run_process(process_name, pipeline, annotation_types=None, send_to_search=None, send_chunks_to_search=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Creates a process and runs the document analysis.
- Parameters:
process_name – The name of the newly created process.
pipeline – The name of the pipeline or a reference to the Pipeline that should be used.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be saved. Supports wildcard expressions. Example 1: “de.averbis.types.*” returns all types with prefix “de.averbis.types”. Example 2: Can also be a list of type names, e.g. [“de.averbis.types.health.Diagnosis”, “de.averbis.types.health.Medication”].
send_to_search (Optional[bool]) – Determines if the created process should be searchable.
send_chunks_to_search (Optional[bool]) – Determines if the created process should make the chunks searchable.
- Return type:
- Returns:
The created process
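The wildcard behaviour of annotation_types can be pictured with a small client-side sketch. It is only an illustration of which type names a pattern would select; it uses Python's fnmatch module and is not the server's matching code. The helper name select_types and the type name "org.example.Other" are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Union


def select_types(
    available: List[str],
    annotation_types: Union[None, str, List[str]] = None,
) -> List[str]:
    """Return the type names from `available` selected by `annotation_types`."""
    if annotation_types is None:
        # Assumption of this sketch: no filter means every type is kept.
        return list(available)
    # Accept a single pattern or a list of patterns / exact type names.
    patterns = [annotation_types] if isinstance(annotation_types, str) else annotation_types
    return [name for name in available if any(fnmatchcase(name, p) for p in patterns)]


available = [
    "de.averbis.types.health.Diagnosis",
    "de.averbis.types.health.Medication",
    "org.example.Other",  # hypothetical type outside the prefix
]
selected = select_types(available, "de.averbis.types.*")
```

Here selected holds the two types under the “de.averbis.types” prefix, which mirrors Example 1 above.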
- create_process(process_name, is_manual_annotation=False, annotation_types=None, send_to_search=None, send_chunks_to_search=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Creates a process without a pipeline (e.g. for later manual annotation or text analysis result import).
- Parameters:
process_name – The name of the newly created process.
is_manual_annotation (bool) – The created process will be used for manual annotation, i.e. the underlying data structure will be initialized immediately and not on later text analysis result import.
annotation_types – Optional parameter indicating which types should be saved. Supports wildcard expressions. Example 1: “de.averbis.types.*” returns all types with prefix “de.averbis.types”. Example 2: Can also be a list of type names, e.g. [“de.averbis.types.health.Diagnosis”, “de.averbis.types.health.Medication”].
send_to_search (Optional[bool]) – Determines if the created process should be searchable.
send_chunks_to_search (Optional[bool]) – Determines if the created process should make the chunks searchable.
- Return type:
- Returns:
The created process
- delete_document(document_name)[source]
Deletes the document identified by name from this document collection.
- Return type:
dict
- export_json_document_stream(document_names=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Export documents from a collection and return a generator of document objects.
- Parameters:
document_names (Optional[List[str]]) – Optional list of document names to export (empty list exports all).
- Yields:
Document dictionaries from the export response
- Return type:
ExportDocumentStream
- get_number_of_documents()[source]
Returns the number of documents in that collection.
- Return type:
int
- get_process(process_name)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Get a process.
- Return type:
Process
- Returns:
The process
- import_documents(source, mime_type=None, filename=None, typesystem=None, textanalysis_mode=None)[source]
Imports documents from a given file, from a given string or from a dictionary (representing the JSON format). Supported file content types are:
- plain text (text/plain),
- JSON (application/json), containing the text in field ‘content’, the document name in field ‘documentName’ and optional key-value pair metadata, or multiple documents with these fields in a field named “documents”,
- Averbis Solr XML (application/vnd.averbis.solr+xml),
- Supported UIMA File Types.
If a document is provided as a CAS object, the type system information can be automatically picked from the CAS object and should not be provided explicitly. If a CAS is provided as a string XML representation, then a type system must be explicitly provided.
The method tries to automatically determine the format (mime type) of the provided document, so setting the mime type parameter should usually not be necessary.
If possible, the method obtains the filename from the provided source. If this is not possible (e.g. if the source is a string or a CAS object), the filename should be provided explicitly. Note that a file in the Averbis Solr XML format can contain multiple documents, each of which has its name encoded within the XML. In this case, setting the filename parameter is not permitted at all.
- Parameters:
textanalysis_mode (Optional[TextanalysisMode]) – Optional text analysis mode, controls how the imported document is analysed.
TextanalysisMode.TRIGGER_PROCESSES (default): trigger a reprocess in all processes of the document.
TextanalysisMode.DO_NOTHING: no processes are triggered, text analysis results of the document are kept.
TextanalysisMode.REMOVE_RESULTS: remove the text analysis results of all processes for this document.
- Return type:
List[dict]
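As a rough sketch of the JSON layout described above, the following builds a multi-document dictionary with the fields ‘documentName’ and ‘content’ wrapped in a “documents” field. The helper name build_json_import and the sample file names are hypothetical, and the metadata fields are left out because their exact names are not given here.

```python
import json
from typing import Dict, Iterable, Tuple


def build_json_import(docs: Iterable[Tuple[str, str]]) -> Dict[str, list]:
    """Wrap (document name, text) pairs in the multi-document JSON layout."""
    return {
        "documents": [
            {"documentName": name, "content": text} for name, text in docs
        ]
    }


payload = build_json_import([("doc1.txt", "First text."), ("doc2.txt", "Second text.")])
serialized = json.dumps(payload)  # string form, e.g. for writing to a .json file

# Hypothetical call against an existing collection, not executed in this sketch:
# collection.import_documents(payload)
```

Either the dictionary or its serialized string form should correspond to the JSON source variants listed above.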
- import_json_document_stream(document_generator)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Import documents to a collection from a document generator.
- Parameters:
document_generator (Iterator[Dict[str,Any]]) – Iterator yielding document dictionaries.
- Return type:
dict
- Returns:
Response object from the import request
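A document generator for this method could look roughly like the sketch below. The dictionary keys are an assumption borrowed from the JSON format of import_documents (‘documentName’ and ‘content’); the stream endpoint may expect a different layout, so check a real export before relying on it.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def document_generator(pairs: Iterable[Tuple[str, str]]) -> Iterator[Dict[str, Any]]:
    """Lazily yield one document dictionary per (name, text) pair."""
    for name, text in pairs:
        # Assumed keys, borrowed from the JSON import format of import_documents.
        yield {"documentName": name, "content": text}


docs = document_generator([("a.txt", "Alpha"), ("b.txt", "Beta")])
first = next(docs)  # documents are produced one at a time, on demand

# Hypothetical call, not executed in this sketch:
# collection.import_json_document_stream(document_generator(pairs))
```

Because the generator is lazy, a large corpus never has to be held in memory as a whole.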
- list_documents()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Lists the documents in the collection.
- Return type:
dict
- class averbis.EvaluationConfiguration(comparison_annotation_type_name, features_to_compare, reference_annotation_type_name=None, **kwargs)[source]
- use_enclosing_annotation_partial_match(enclosing_annotation_type_name)[source]
Annotations that are covered by the given annotation type are used to calculate partial positives. Normally, these will replace a FalsePositive or FalseNegative if a partial match is identified.
- Return type:
- use_overlap_partial_match()[source]
Overlapping annotations are used to calculate partial positives. Normally, these will replace a FalsePositive or FalseNegative if a partial match is identified.
- Return type:
- exception averbis.ExtendedRequestException(*args, status_code=None, reason=None, url=None, error_message=None, **kwargs)[source]
- class averbis.Pipeline(project, name)[source]
- STATE_STARTED = 'STARTED'
- STATE_STARTING = 'STARTING'
- STATE_STOPPED = 'STOPPED'
- STATE_STOPPING = 'STOPPING'
- analyse_cas_to_cas(source, language=None, timeout=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Processes text using a pipeline and returns the result as a UIMA CAS.
- Parameters:
source (Cas) – The CAS to be analyzed.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
- Return type:
Cas
- Returns:
A cassis.Cas object
- analyse_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given text or text file in FHIR json format using the pipeline and return json.
- Parameters:
source (Union[Path,IO,str,dict]) – The document to be analyzed in FHIR format. This can be a FHIR text, a file or a dictionary containing the FHIR data.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
List[dict]
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- analyse_fhir_to_cas(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given text or text file in FHIR json format using the pipeline and return Cas.
- Parameters:
source (Union[Path,IO,str,dict]) – The document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
Cas
- Returns:
Analyzed document as a Cas
- analyse_fhir_to_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given text or text file in FHIR json format using the pipeline and return FHIR json.
- Parameters:
source (Union[Path,IO,str,dict]) – The document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
dict
- Returns:
The raw payload of the server response as a dictionary. Future versions of this library may return a better-suited representation.
- analyse_html(source, annotation_types=None, language=None, timeout=None)[source]
Analyze the given HTML string or HTML file using the pipeline.
- Parameters:
source (Union[Path,IO,str]) – The document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- analyse_pdf_to_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given pdf using the pipeline. The returned analysis is in fhir format.
- Parameters:
source (Union[Path,IO]) – The PDF document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- analyse_pdf_to_json(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given pdf using the pipeline. The returned analysis is in json format.
- Parameters:
source (Union[Path,IO]) – The PDF document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- analyse_pdf_to_pdf(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given pdf using the pipeline. The returned analysis is a marked pdf.
- Parameters:
source (Union[Path,IO]) – The PDF document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
bytes
- analyse_text(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given text or text file using the pipeline.
- Parameters:
source (Union[Path,IO,str]) – The document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
List[dict]
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- analyse_text_to_cas(source, language=None, timeout=None, annotation_types=None, meta_data=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Processes text using a pipeline and returns the result as a UIMA CAS.
- Parameters:
source (Union[Path,IO,str]) – The document to be analyzed.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”. Available from Health Discovery 7.3.0 onwards.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
Cas
- Returns:
A cassis.Cas object
- analyse_text_to_fhir(source, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given text or text file using the pipeline and return FHIR.
- Parameters:
source (Union[Path,IO,str]) – The document to be analyzed.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
dict
- Returns:
The raw payload of the server response as a dictionary. Future versions of this library may return a better-suited representation.
- analyse_texts(sources, parallelism=0, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given texts or files using the pipeline. If feasible, multiple documents are processed in parallel. Note that this call produces an iterator: you get individual results back as soon as they have been processed, and these results may be out of order. If you want to hold on to the results while iterating through them, you need to put them into some kind of collection, e.g. by calling list(pipeline.analyse_texts(…)). If you process a large number of documents, though, you are better off handling the results one by one. This can be done with a simple for loop:
```
for result in pipeline.analyse_texts(…):
    if result.successful():
        response = result.data  # do something with the json response
    else:
        print(f"Exception for document with source {result.source}: {result.exception}")
```
- Parameters:
sources (Iterable[Union[Path,IO,str]]) – The documents to be analyzed.
parallelism (int) – Number of parallel instances in the platform.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
Iterator[Result]
- Returns:
An iterator over the results produced by the pipeline.
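When both the successful payloads and the failures are needed after the loop, the iterator can be drained into two lists. The sketch below relies only on the Result members mentioned above (successful(), data, source, exception); partition_results and FakeResult are hypothetical names, and FakeResult is a stand-in so the example runs without a server.

```python
from typing import Any, Iterable, List, Tuple


def partition_results(results: Iterable[Any]) -> Tuple[List[Any], List[Tuple[Any, Any]]]:
    """Split results into payloads and (source, exception) pairs."""
    payloads, failures = [], []
    for result in results:
        if result.successful():
            payloads.append(result.data)
        else:
            failures.append((result.source, result.exception))
    return payloads, failures


class FakeResult:
    """Stand-in for the library's Result, only for this sketch."""

    def __init__(self, source, data=None, exception=None):
        self.source, self.data, self.exception = source, data, exception

    def successful(self):
        return self.exception is None


error = ValueError("boom")
payloads, failures = partition_results(
    [FakeResult("a.txt", data={"ok": 1}), FakeResult("b.txt", exception=error)]
)
```

With a real pipeline, the argument would be the iterator returned by analyse_texts; keep in mind that the payloads then arrive in completion order, not input order.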
- analyse_texts_to_cas(sources, parallelism=0, language=None, timeout=None, annotation_types=None, meta_data=None)[source]
Analyze the given texts or files using the pipeline. If feasible, multiple documents are processed in parallel. Note that this call produces an iterator: you get individual results back as soon as they have been processed, and these results may be out of order. If you want to hold on to the results while iterating through them, you need to put them into some kind of collection, e.g. by calling list(pipeline.analyse_texts_to_cas(…)). If you process a large number of documents, though, you are better off handling the results one by one. This can be done with a simple for loop:
```
for result in pipeline.analyse_texts_to_cas(…):
    if result.successful():
        cas = result.data  # do something with the CAS
    else:
        print(f"Exception for document with source {result.source}: {result.exception}")
```
- Parameters:
sources (Iterable[Union[Path,IO,str]]) – The documents to be analyzed.
parallelism (int) – Number of parallel instances in the platform.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”. Available from Health Discovery 7.3.0 onwards.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
Iterator[Result]
- Returns:
An iterator over the results produced by the pipeline.
- analyse_texts_to_fhir(sources, parallelism=0, annotation_types=None, language=None, timeout=None, meta_data=None)[source]
Analyze the given texts or files using the pipeline and return FHIR. If feasible, multiple documents are processed in parallel. Note that this call produces an iterator: you get individual results back as soon as they have been processed, and these results may be out of order. If you want to hold on to the results while iterating through them, you need to put them into some kind of collection, e.g. by calling list(pipeline.analyse_texts_to_fhir(…)). If you process a large number of documents, though, you are better off handling the results one by one. This can be done with a simple for loop:
```
for result in pipeline.analyse_texts_to_fhir(…):
    if result.successful():
        response = result.data  # do something with the FHIR response
    else:
        print(f"Exception for document with source {result.source}: {result.exception}")
```
- Parameters:
sources (Iterable[Union[Path,IO,str]]) – The documents to be analyzed.
parallelism (int) – Number of parallel instances in the platform.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
language (Optional[str]) – Optional parameter setting the language of the document, e.g. “en” or “de”.
timeout (Optional[float]) – Optional timeout (in seconds) specifying how long the request waits for a server response.
meta_data (Optional[dict]) – Optional key-value pairs that are used as generic metadata in addition to the document.
- Return type:
Iterator[Result]
- Returns:
An iterator over the results produced by the pipeline.
- collection_process_complete()[source]
Trigger collection process complete of the given pipeline.
- Return type:
dict
- create_resource_container(name, resources_zip_path=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Create empty resource container with given name for this pipeline or additionally upload zipped resources.
- Return type:
- delete()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Deletes an existing pipeline from the server.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- delete_resources()[source]
DEPRECATED: Use ResourceContainer.delete() instead.
Delete the resources of the pipeline.
- Return type:
None
- download_resources(target_zip_path)[source]
DEPRECATED: Use ResourceContainer.export_resources() instead.
Download pipeline resources and store in given path.
- Return type:
None
- ensure_started()[source]
Checks if the pipeline has started. If the pipeline has not started yet, an attempt is made to start it. The call blocks for a certain time. If that time expires without the pipeline becoming available, an exception is raised.
- Return type:
- Returns:
the pipeline object for chaining additional calls.
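The blocking behaviour described for ensure_started() can be pictured as a start-then-poll loop. The following is a sketch under stated assumptions, not the library's implementation: the function name wait_until_started, the timeout and poll_interval values, the choice of TimeoutError and the FakePipeline stand-in are all hypothetical; only start() and is_started() come from this reference.

```python
import time


def wait_until_started(pipeline, timeout: float = 60.0, poll_interval: float = 1.0):
    """Start `pipeline` if needed and block until is_started() returns True."""
    if not pipeline.is_started():
        pipeline.start()  # returns immediately; the pipeline boots in the background
    deadline = time.monotonic() + timeout
    while not pipeline.is_started():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Pipeline not available after {timeout} seconds")
        time.sleep(poll_interval)
    return pipeline  # returned for chaining additional calls


class FakePipeline:
    """Stand-in that reports 'started' after a fixed number of polls."""

    def __init__(self, polls_until_ready: int):
        self.polls_left = polls_until_ready
        self.start_calls = 0

    def start(self):
        self.start_calls += 1
        return {}

    def is_started(self):
        if self.start_calls == 0:
            return False
        self.polls_left -= 1
        return self.polls_left <= 0


fake = FakePipeline(polls_until_ready=2)
ready = wait_until_started(fake, timeout=5.0, poll_interval=0.01)
```

Using time.monotonic() for the deadline keeps the wait unaffected by changes to the system clock.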
- ensure_stopped()[source]
Causes the pipeline on the server to shut down.
- Return type:
- Returns:
the pipeline object for chaining additional calls.
- get_configuration()[source]
Obtain the pipeline configuration.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- get_info()[source]
Obtain information about the server-side pipeline.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- get_type_system()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Obtains the type system of the pipeline.
- Return type:
TypeSystem
- is_started()[source]
Checks if the pipeline has already started.
- Return type:
bool
- Returns:
Whether the pipeline has started.
- list_resource_containers()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
List the resource containers for the current pipeline.
- Return type:
List[ResourceContainer]
- list_resources()[source]
DEPRECATED: Use ResourceContainer.list_resources() instead.
List the resources of the pipeline.
- Return type:
List[str]
- Returns:
The list of pipeline resources.
- set_configuration(configuration)[source]
Updates the pipeline configuration. If the pipeline is already running, it will be stopped because changes to the configuration can only be performed while a pipeline is stopped. If the pipeline was stopped, it is restarted after the configuration has been updated. As a side-effect, the cached type system previously obtained for this pipeline is cleared.
- Parameters:
configuration (dict) – A pipeline configuration in the form returned by get_configuration().
- Return type:
None
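Since set_configuration() expects a dictionary in the form returned by get_configuration(), a typical change is a read-modify-write cycle. The sketch below shows that pattern with a stand-in object; update_configuration, FakePipeline and the key names "description" and "numberOfInstances" are hypothetical, because the actual configuration layout is not described here.

```python
from typing import Any, Dict


def update_configuration(pipeline, changes: Dict[str, Any]) -> Dict[str, Any]:
    """Fetch the current configuration, apply top-level changes and write it back."""
    configuration = dict(pipeline.get_configuration())  # shallow copy of the payload
    configuration.update(changes)
    pipeline.set_configuration(configuration)
    return configuration


class FakePipeline:
    """Stand-in exposing only the two configuration methods."""

    def __init__(self, configuration: Dict[str, Any]):
        self._configuration = configuration

    def get_configuration(self) -> Dict[str, Any]:
        return dict(self._configuration)

    def set_configuration(self, configuration: Dict[str, Any]) -> None:
        self._configuration = dict(configuration)


# Hypothetical keys, chosen only for illustration.
fake = FakePipeline({"description": "old", "numberOfInstances": 1})
updated = update_configuration(fake, {"description": "new"})
```

With a real pipeline, keep in mind the stop and restart behaviour and the cleared type system cache described above.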
- start()[source]
Start the server-side pipeline. This call returns immediately. However, the pipeline will usually take a while to boot and become available.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- stop()[source]
Stop the server-side pipeline.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- class averbis.Process(project, name, document_source_name, pipeline_name=None, preceding_process_name=None)[source]
- create_and_run_process(process_name, pipeline, send_to_search=None, send_chunks_to_search=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Creates a process upon the results of this process.
- Return type:
- delete()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Deletes the process as soon as it becomes IDLE. All document analysis results will be deleted.
- evaluate_against(reference_process, process_name, evaluation_configurations, number_of_pipeline_instances=1)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Starts the evaluation of this process in comparison to the given one as a new process. Returns the new evaluation process.
See Evaluation process for a usage example and more information.
- Return type:
- export_text_analysis(annotation_types=None, page=None, page_size=100, document_names=None)[source]
Exports a given text analysis process as a json.
The parameters page and page_size can be used to export only a subset of all results. For instance, setting page=1 and page_size=10 will export documents 1-10 of the text analysis result. Setting page=2 and page_size=30 will export documents 31-60 of the text analysis result. Documents are currently sorted by their internal ID and not by the filename.
- Parameters:
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
page (Optional[int]) – Optional parameter indicating which page (batch of documents) should be exported. This can only be used if document_names is not used.
page_size (Optional[int]) – Optional parameter defining how many documents are exported at once (default=100). Only restricts the number of documents if the parameter page is given. This can only be used if document_names is not used.
document_names (Optional[List[str]]) – Optional parameter indicating which documents should be exported. If this is set, the page and page_size parameters cannot be used.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
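The paging arithmetic from the description above can be checked with a small sketch. The function names exported_range and check_export_arguments are hypothetical; the code only reproduces the 1-based ranges given in the examples and the rule that page and document_names exclude each other.

```python
from typing import List, Optional, Tuple


def exported_range(page: int, page_size: int = 100) -> Tuple[int, int]:
    """Return the 1-based (first, last) document positions covered by a page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    first = (page - 1) * page_size + 1
    last = page * page_size
    return first, last


def check_export_arguments(
    page: Optional[int] = None, document_names: Optional[List[str]] = None
) -> None:
    """Reject the combination of paging and explicit document names."""
    if page is not None and document_names is not None:
        raise ValueError("page cannot be combined with document_names")


first_page = exported_range(1, 10)
second_page = exported_range(2, 30)
```

The last position is an upper bound: the final page of a result may contain fewer documents.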
- export_text_analysis_to_cas(document_name, type_system=None, annotation_types=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Returns an analysis as a UIMA CAS.
- Parameters:
document_name (str) – The name of the document whose text analysis result will be exported.
type_system (Optional[TypeSystem]) – Optional parameter for the type system that the exported CAS will be set up with.
annotation_types (Union[None,str,List[str]]) – Optional parameter indicating which types should be returned. Supports wildcard expressions, e.g. “de.averbis.types.*” returns all types with prefix “de.averbis.types”.
- Return type:
Cas
- get_process_state()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Returns the current process state.
- Return type:
ProcessState
- import_text_analysis_result(source, document_name, mime_type=None, typesystem=None, overwrite=False)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Add or update a text analysis result (i.e. annotated content) to a specified document in this process. This process must be a manual or imported process (i.e. without automatic processing via pipeline).
- Parameters:
source – The text analysis result as a CAS object, stream or path to it.
document_name – The name of the document that the text analysis result should be associated with.
overwrite – Overwrite the found text analysis result if it already exists.
The supported file content types are the Supported UIMA File Types.
If a document is provided as a CAS object, the type system information can be automatically picked from the CAS object and should not be provided explicitly. The mime_type is also not needed. If a CAS is provided as a path or stream, then a mime_type needs to be given. The typesystem might need to be provided depending on the content type.
- process_unprocessed()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Triggers a processing of all unprocessed documents in this process.
- rename(name)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Renames this process to the given name and returns the process.
- Return type:
- rerun(document_names=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Triggers a rerun if the process is IDLE. All current results will be deleted and the documents will be reprocessed.
- Parameters:
document_names (Optional[List[str]]) – Only rerun the text analysis process on these documents.
- class averbis.Project(client, name)[source]
- classify_text(text, classification_set='Default')[source]
Classify the given text.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- create_document_collection(name)[source]
Creates a new document collection.
- Return type:
- Returns:
The document collection.
- create_pipeline(configuration, name=None)[source]
Create a new pipeline.
- Return type:
- Returns:
The pipeline.
- create_resource_container(name, resources_zip_path=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Create empty resource container with given name for this project or additionally upload zipped resources.
- Return type:
- create_terminology(terminology_name, label, languages, concept_type='de.averbis.extraction.types.Concept', version='', hierarchical=True)[source]
Create a new terminology.
- Return type:
- Returns:
The terminology.
- delete_pear(pear_identifier)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Delete the pear by identifier.
- Return type:
dict
- delete_resources()[source]
DEPRECATED: Use ResourceContainer.delete() instead.
Delete the resources of the project.
- Return type:
None
- download_resources(target_zip_path)[source]
DEPRECATED: Use ResourceContainer.export_resources() instead.
Download Project-level pipeline resources and store in given path.
- Return type:
None
- exists_document_collection(name)[source]
Checks if a document collection exists.
- Returns:
Whether the collection exists
- exists_pipeline(name)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Checks if a pipeline exists.
- Return type:
bool
- get_document_collection(collection)[source]
Obtain an existing document collection.
- Return type:
- Returns:
The document collection.
- get_terminology(terminology)[source]
Obtain an existing terminology.
- Return type:
Terminology
- Returns:
The terminology.
- install_pear(file_or_path)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Install a pear by file or path.
- Return type:
- list_annotators()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
List annotators (components) for the current project i.e. their identifier, displayName, bundle and version.
- Return type:
List[dict]
- Returns:
List of annotator information.
- list_document_collections()[source]
Lists all document collections.
- Return type:
List[DocumentCollection]
- Returns:
List of DocumentCollection objects
- list_pears()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
List all existing pears by identifier.
- Return type:
List[str]
- Returns:
The list of pear identifiers.
- list_pipelines()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
List pipelines for the current project.
- Return type:
List[Pipeline]
- Returns:
List of pipelines.
- list_processes()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
List all existing processes by name and document source name.
- Return type:
List[Process]
- Returns:
The list of processes.
- list_resource_containers()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
List the resource containers for the current project.
- Return type:
List[ResourceContainer]
- list_resources()[source]
DEPRECATED: Use ResourceContainer.list_resources() instead.
List the resources of the project.
- Return type:
List[str]
- Returns:
The list of project resources.
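Because Project.list_resources() is deprecated in favour of ResourceContainer.list_resources(), code that needs all resources of a project can iterate over the containers instead. The sketch below is an assumption-laden illustration: list_resources_by_container is a hypothetical helper, each container is assumed to expose its name as a name attribute, and the value returned by ResourceContainer.list_resources() is passed through unchanged because its type is not stated here.

```python
from typing import Any, Dict


def list_resources_by_container(project: Any) -> Dict[str, Any]:
    """Collect the resources of every container of `project`.

    Keys are container names (assumed attribute `name`); values are whatever
    ResourceContainer.list_resources() returns for that container.
    """
    return {
        container.name: container.list_resources()
        for container in project.list_resource_containers()
    }
```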
- list_terminologies()[source]
List all existing terminologies.
- Return type:
dict
- Returns:
The terminology list.
- neural_search(text=None, pipeline_name=None, language=None, top_k=None, threshold=None, *, neural_search_parameter=None, **query_params)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Perform a neural (dense vector) search using the server-side neural search endpoint.
There are two ways to use this method:
Explicit parameters (recommended): Provide text and pipeline_name as required parameters, with optional parameters.
Dictionary parameters (advanced): Provide all parameters as a dictionary using neural_search_parameter.
- Parameters:
text (Optional[str]) – The text query to search for (required when using explicit parameters)
pipeline_name (Optional[str]) – Name of the pipeline to use for embedding generation (required when using explicit parameters)
language (Optional[str]) – Language code (e.g., ‘en’, ‘de’)
top_k (Optional[int]) – Maximum number of results to return
threshold (Optional[float]) – Minimum similarity threshold for results
neural_search_parameter (Optional[NeuralSearchParams]) – Alternative: provide all parameters as a dict (for backward compatibility or advanced usage)
query_params (Any) – Additional Solr query parameters forwarded to the server. Common parameters include:
fq (str): Filter query to restrict results (e.g., ‘category:medical’)
sort (str): Sort order (e.g., ‘score desc’, ‘timestamp asc’)
start (int): Starting offset for pagination (default: 0)
rows (int): Number of results to return
fl (str): Field list - comma-separated fields to return (e.g., ‘id,content,score’)
debugQuery (bool): Enable debug information in response
- Return type:
Dict[str,Any]
- Returns:
The raw payload of the server response (typically a Solr-like JSON response wrapped in payload).
Examples
# Using explicit parameters (recommended)
results = project.neural_search(
    text="Wie alt ist der Patient?",
    pipeline_name="ChunkEmbedder",
    language="de",
    top_k=5,
    threshold=0.1,
    # Common Solr query parameters:
    rows=20,  # Limit number of results
    start=0,  # Pagination offset
    sort="score desc",  # Sort by relevance
    fl="id,content,score",  # Return only these fields
    fq="status:active",  # Filter results
    debugQuery=True,  # Include debug info
)

# Using dict (backward compatibility)
params = {
    "text": "Wie alt ist der Patient?",
    "language": "de",
    "pipelineName": "ChunkEmbedder",
    "topK": 5,
    "threshold": 0.1,
}
results = project.neural_search(
    neural_search_parameter=params,
    rows=10,
    start=20,  # Get results 21-30
    sort="timestamp desc",  # Sort by newest first
    fl="id,title",  # Return only id and title
    fq="category:medical",  # Filter to medical category
    debugQuery=False,
)
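The start and rows query parameters can be derived from a page index, which keeps pagination arithmetic out of the calling code. The following is a small sketch under stated assumptions: page_params and search_page are hypothetical helper names, project is assumed to be an averbis.Project, and the raw payload is returned unchanged because its exact layout is not specified here.

```python
from typing import Any, Dict


def page_params(page: int, page_size: int = 10) -> Dict[str, int]:
    """Translate a zero-based page index into Solr-style start/rows values."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"start": page * page_size, "rows": page_size}


def search_page(
    project: Any, text: str, pipeline_name: str, page: int, page_size: int = 10
) -> Dict[str, Any]:
    """Fetch one page of neural search results from `project`.

    Only the documented neural_search parameters text, pipeline_name,
    start and rows are used.
    """
    return project.neural_search(
        text=text,
        pipeline_name=pipeline_name,
        **page_params(page, page_size),
    )
```

For example, page_params(2, 10) yields start=20 and rows=10, which corresponds to results 21-30.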
- class averbis.ResourceContainer(client, name, scope, base_url)[source]
- delete()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Delete this resource container.
- Return type:
None
- delete_resource(resource_path)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Delete the resource located at the given path from the container.
- Return type:
None
- export_resource(target_path, resource_path)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Export a specific resource file from within the container at the given resource_path.
- Return type:
None
- export_resources(target_path)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Export the whole resource container as a zip file.
- Return type:
None
- class averbis.Terminology(project, name)[source]
- EXPORT_STATE_ABORTED = 'ABORTED'
- EXPORT_STATE_ABORTING = 'ABORTING'
- EXPORT_STATE_COMPLETED = 'COMPLETED'
- EXPORT_STATE_FAILED = 'FAILED'
- EXPORT_STATE_PREPARING = 'PREPARING'
- EXPORT_STATE_PROCESSING_EXPORT = 'PROCESSING_EXPORT'
- IMPORT_STATE_ABORTED = 'ABORTED'
- IMPORT_STATE_ABORTING = 'ABORTING'
- IMPORT_STATE_COMPLETED = 'COMPLETED'
- IMPORT_STATE_FAILED = 'FAILED'
- IMPORT_STATE_PREPARING = 'PREPARING'
- IMPORT_STATE_PROCESSING_CONCEPTS = 'PROCESSING_CONCEPTS'
- IMPORT_STATE_PROCESSING_RELATIONS = 'PROCESSING_RELATIONS'
- PENDING_EXPORT_STATES = ['PROCESSING_EXPORT', 'PREPARING']
- PENDING_IMPORT_STATES = ['PREPARING', 'PROCESSING_CONCEPTS', 'PROCESSING_RELATIONS']
- concept_autosuggest(query, include_concept_identifier=True, max_suggestions=5)[source]
Trigger the concept autosuggest for this terminology.
- Parameters:
query (str) – The query string for autosuggest.
include_concept_identifier (bool) – Whether the conceptId field of the terminology concepts is also searched by the query string.
max_suggestions (int) – The maximum number of suggestions to return.
- Return type:
dict
- Returns:
The raw payload of the server response.
- get_export_status()[source]
Obtain the status of the terminology export.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- get_import_status()[source]
Obtain the status of the import of the terminology.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
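The pending-state lists of Terminology lend themselves to a simple polling loop. The sketch below is not the library's implementation: wait_while_pending is a hypothetical helper, it raises the built-in TimeoutError rather than OperationTimeoutError, and the caller must supply a get_state callable because the key under which the state appears in the status payload is not documented here. The state list is copied from PENDING_IMPORT_STATES.

```python
import time
from typing import Callable, Sequence

# Copied from Terminology.PENDING_IMPORT_STATES
PENDING_IMPORT_STATES = ["PREPARING", "PROCESSING_CONCEPTS", "PROCESSING_RELATIONS"]


def wait_while_pending(
    get_state: Callable[[], str],
    pending_states: Sequence[str] = PENDING_IMPORT_STATES,
    timeout: float = 120.0,
    poll_delay: float = 5.0,
) -> str:
    """Poll `get_state` until it returns a state outside `pending_states`.

    Returns the first non-pending state. Raises TimeoutError if the state is
    still pending once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state not in pending_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout} s")
        time.sleep(poll_delay)
```

A caller would pass a function that extracts the state string from the payload of get_import_status() or get_export_status(), using PENDING_EXPORT_STATES for exports.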
- import_data(source, importer='OBO Importer', timeout=120)[source]
Imports the given terminology into the platform and waits for the import process to complete. If the import does not complete successfully within the given period of time, an OperationTimeoutError is raised.
- Return type:
dict
- provision(timeout=120)[source]
Provisions the terminology so it can be used by pipelines. If the data cannot be successfully provisioned within the given period of time, an OperationTimeoutError is raised.
- Return type:
Optional[dict]
- class averbis.TextanalysisMode(*values)[source]
Enum for the text analysis modes used to define process behaviour when importing documents
- DO_NOTHING = 'doNothing'
- REMOVE_RESULTS = 'removeResults'
- TRIGGER_PROCESSES = 'triggerProcesses'
- class averbis.Client(url_or_id, api_token=None, verify_ssl=True, settings=None, username=None, password=None, timeout=None, polling_timeout=30, poll_delay=5)[source]
- __init__(url_or_id, api_token=None, verify_ssl=True, settings=None, username=None, password=None, timeout=None, polling_timeout=30, poll_delay=5)[source]
A Client is the base object for all calls within the Averbis Python API.
The Client can be initialized by passing the required parameters (e.g. URL and API Token) or by creating a client-settings.json file in which the information is stored. The client-settings.json allows specifying different profiles for different servers. Please see the example in the project README for more information.
- Parameters:
url_or_id (str) – The URL of the platform instance or the identifier of a profile in a client settings file
api_token (Optional[str]) – The API token that allows the user to perform requests on the platform
verify_ssl (Union[str,bool]) – Whether SSL verification should be activated (default=True)
settings (Union[str,Path,dict,None]) – Either a dictionary containing settings information or a path to the settings file. As a fallback, a “client-settings.json” file is searched for in the current directory and in $HOME/.averbis/
username (Optional[str]) – If no API token is provided, a username can be provided together with a password to generate a new API token
password (Optional[str]) – If no API token is provided, a password can be provided together with a username to generate a new API token
timeout (Optional[float]) – An optional global timeout (in seconds) specifying how long the Client waits for a server response (default=None)
polling_timeout (int) – Timeout (in seconds) after which polling for specific status requests is no longer attempted
poll_delay (int) – Time (in seconds) between requests to the server for a specific status
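The fallback lookup for the settings file can be pictured as a search over two candidate locations. The sketch below only illustrates that lookup and is not the Client's actual code: candidate_settings_paths and find_settings_file are hypothetical helpers, and checking the current directory before $HOME/.averbis/ is an assumption about the order.

```python
from pathlib import Path
from typing import List, Optional

SETTINGS_FILE_NAME = "client-settings.json"


def candidate_settings_paths(cwd: Path, home: Path) -> List[Path]:
    """Candidate settings locations; current directory first (assumed order)."""
    return [cwd / SETTINGS_FILE_NAME, home / ".averbis" / SETTINGS_FILE_NAME]


def find_settings_file(
    cwd: Optional[Path] = None, home: Optional[Path] = None
) -> Optional[Path]:
    """Return the first existing candidate path, or None if there is none."""
    cwd = cwd or Path.cwd()
    home = home or Path.home()
    for path in candidate_settings_paths(cwd, home):
        if path.is_file():
            return path
    return None
```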
- change_password(user, old_password, new_password)[source]
Changes the password of the given user.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- create_project(name, description='', exist_ok=False)[source]
Creates a new project.
- Parameters:
name (str) – The name of the new project
description (str) – The description of the new project
exist_ok – If exist_ok is False (the default), a ValueError is raised if the project already exists. If exist_ok is True and the project exists, then the existing project is returned.
- Return type:
Project
- Returns:
The project.
- create_resource_container(name, resources_zip_path=None)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. Create global empty resource container or additionally upload provided resources to the new container.
- Return type:
ResourceContainer
- delete_resources()[source]
DEPRECATED: Use ResourceContainer.delete() instead.
Delete the global resources.
- Return type:
None
- download_resources(target_zip_path)[source]
DEPRECATED: Use ResourceContainer.export_resources() instead.
Download Client-level pipeline resources and store in given path.
- Return type:
None
- ensure_available(timeout=120)[source]
Checks whether the server is available and responding. If the server is not available, the call blocks for up to the given time. If the server has not become available by then, an exception is raised.
- Return type:
- exists_project(name)[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear.
Checks if a project exists.
- Return type:
bool
- generate_api_token(user, password)[source]
Generates an API token using the given user/password and stores the API token in the client for further use. Normally, you would never call this method but rather work with a previously generated API token.
- Return type:
Optional[str]
- Returns:
the API token that was obtained
- get_api_token_status(user, password)[source]
Obtains the status of the given API token.
- Return type:
str
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- get_build_info()[source]
Obtains information about the version of the server instance.
- Return type:
dict
- Returns:
The raw payload of the server response. Future versions of this library may return a better-suited representation.
- get_spec_version()[source]
Helper function that returns the spec version of the server instance.
- Return type:
str
- Returns:
The spec version as string
- invalidate_api_token(user, password)[source]
Invalidates the API token for the given user. This method does not clear the API token from this client object. If the client is currently using the API token that is being cleared, subsequent operations will fail.
- Return type:
None
- list_resource_containers()[source]
HIGHLY EXPERIMENTAL API - may soon change or disappear. List all global resource containers.
- Return type:
List[ResourceContainer]
- list_resources()[source]
DEPRECATED: Use ResourceContainer.list_resources() instead.
List the resources that are globally available.
- Return type:
List[str]
- Returns:
List of resources.
- regenerate_api_token(user, password)[source]
Regenerates an API token using the given user/password and stores the API token in the client for further use. Normally, you would never call this method but rather work with a previously generated API token.
- Return type:
Optional[str]
- Returns:
the API token that was obtained
Supported UIMA File Types
The supported file content types (mime_types) are:
UIMA CAS XMI (application/vnd.uima.cas+xmi)
XCAS (application/vnd.uima.cas+xcas)
binary CAS (application/vnd.uima.cas+binary)
binary TSI (application/vnd.uima.cas+binary.tsi)
compressed (application/vnd.uima.cas+compressed)
compressed TSI (application/vnd.uima.cas+compressed.tsi)
compressed filtered (application/vnd.uima.cas+compressed.filtered)
compressed filtered TS (application/vnd.uima.cas+compressed.filtered.ts)
compressed filtered TSI (application/vnd.uima.cas+compressed.filtered.tsi)
serialized CAS (application/vnd.uima.cas+serialized)
serialized TSI (application/vnd.uima.cas+serialized.tsi)
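When a script accepts a MIME type from user input or configuration, it can be checked against the list above before any upload is attempted. The following sketch copies the eleven entries verbatim; SUPPORTED_UIMA_MIME_TYPES and is_supported_mime_type are hypothetical names that are not provided by the library.

```python
# Display name -> MIME type, copied from the list of supported UIMA file types.
SUPPORTED_UIMA_MIME_TYPES = {
    "UIMA CAS XMI": "application/vnd.uima.cas+xmi",
    "XCAS": "application/vnd.uima.cas+xcas",
    "binary CAS": "application/vnd.uima.cas+binary",
    "binary TSI": "application/vnd.uima.cas+binary.tsi",
    "compressed": "application/vnd.uima.cas+compressed",
    "compressed TSI": "application/vnd.uima.cas+compressed.tsi",
    "compressed filtered": "application/vnd.uima.cas+compressed.filtered",
    "compressed filtered TS": "application/vnd.uima.cas+compressed.filtered.ts",
    "compressed filtered TSI": "application/vnd.uima.cas+compressed.filtered.tsi",
    "serialized CAS": "application/vnd.uima.cas+serialized",
    "serialized TSI": "application/vnd.uima.cas+serialized.tsi",
}


def is_supported_mime_type(mime_type: str) -> bool:
    """Return True if `mime_type` is one of the listed UIMA content types."""
    return mime_type in SUPPORTED_UIMA_MIME_TYPES.values()
```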
Evaluation process
HIGHLY EXPERIMENTAL API - may soon change or disappear.
It is possible to start an evaluation process that compares the current process to another one, the reference process, using the averbis.Process.evaluate_against() method. Depending on the averbis.EvaluationConfiguration, annotations of a specific type are compared to each other by selected features. The comparison result is recorded as an evaluation annotation: for example, a TruePositive annotation is created for an annotation that matches the corresponding annotation in the reference process. A FalsePositive is created if the annotation exists in the current process but not in the reference process.
During evaluation configuration, it is possible to distinguish between exact and partial matches. Annotations are marked as an exact match if their type, features and position in the text are identical. For a more fine-grained comparison than a hit or a miss, it is also possible to define a partial match. Annotations that are not exactly identical, but still meet these criteria, are annotated as PartialPositive.
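The distinction between exact, partial and missing matches can be illustrated with plain (begin, end) spans. This is a conceptual sketch only and not the platform's evaluation algorithm: it compares positions alone, classify and overlaps are hypothetical names, and treating any overlap as a partial match is an assumed criterion, since the actual partial-match criteria are set in the evaluation configuration.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def classify(current: Iterable[Span], reference: Iterable[Span]) -> List[Tuple[Span, str]]:
    """Label each span of the current process against the reference spans."""
    reference = list(reference)
    labelled = []
    for span in current:
        if span in reference:
            label = "TruePositive"  # identical begin and end
        elif any(overlaps(span, ref) for ref in reference):
            label = "PartialPositive"  # assumed partial-match criterion: overlap
        else:
            label = "FalsePositive"  # no counterpart in the reference
        labelled.append((span, label))
    return labelled
```

With current spans (0, 5), (10, 20), (30, 35) and reference spans (0, 5), (15, 25), the first is an exact match, the second overlaps a reference span and the third has no counterpart.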
Starting an evaluation process for exact matches of Diagnosis annotations
The given evaluation configuration describes an evaluation of diagnosis annotations by their begin and end features i.e. two annotations match if they are Diagnosis annotations at the same position.
from averbis import EvaluationConfiguration

# `collection` is an existing averbis.DocumentCollection
comparison_process = collection.get_process("process name")
reference_process = collection.get_process("reference process name")

# Compare Diagnosis annotations by their begin and end features
diagnosis_config = EvaluationConfiguration(
    "de.averbis.types.health.Diagnosis",
    ["begin", "end"]
)

evaluation_process = comparison_process.evaluate_against(
    reference_process,
    "evaluation_of_diagnosis",
    [diagnosis_config]
)
You can then query the state of the evaluation process until it is done and export text analysis results from it using export methods from this API.
Troubleshooting the evaluation
If evaluation annotations are not created as expected, the annotation type configured for the evaluation may not be a standalone annotation but one that is only referenced as a feature of other annotations (i.e. the annotation is not in the CAS index). In that case, adapting the evaluation configuration is not sufficient; instead, the way the annotation is created in the product has to be examined.