Source Files

Source files are the files you want to annotate or run models on. PDF and .txt formats are currently supported.

Retrieve source information

GET https://api.annolab.ai/v1/source/{source_id}

Returns basic source information, including a signed URL to download the original file and its tags.

Path Parameters

Name | Type | Description

source_id* | Int | Id of the source

Headers

Name | Type | Description

Authorization* | String | Your API key {"Authorization": "Api-Key XXXXXXX-XXXXXXX-XXXXXXX"}

// Example Response
{
    "id": 2,
    "projectName": "My Project",
    "projectId": 1,
    "directoryName": "Uploads",
    "directoryId": 1,
    "name": "REGISTRATION.pdf",
    "sourceName": "REGISTRATION.pdf",
    "type": "pdf",
    "text": "Example PDF Text",
    "url": "https://download-example-url.pdf",
    "createdAt": "2023-05-22T20:32:02.633Z",
    "tags": [
        {
            "domainEntityId": 1,
            "typeName": "Airframe Inventory",
            "attributes": [
                {
                    "name": "Make",
                    "value": "CESSNA"
                },
                {
                    "name": "Model",
                    "value": "421C"
                },
                {
                    "name": "Serial Number",
                    "value": "421C-5837"
                }
            ],
            "createdBy": {
                "id": 58473,
                "email": "testuser@gmail.com",
                "username": "testuser"
            }
        }
    ]
}
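
The request and response above can be sketched in Python as follows. This is a minimal sketch, not an official client: the helper names (`build_source_request`, `fetch_source`, `tag_attributes`) are hypothetical, and it uses only the standard library so that it runs without extra dependencies. `tag_attributes` assumes the response has the shape shown in the example response.

```python
import json
import urllib.request

API_BASE = "https://api.annolab.ai"


def build_source_request(source_id, api_key):
    """Build (but do not send) the GET request for a single source."""
    return urllib.request.Request(
        f"{API_BASE}/v1/source/{source_id}",
        headers={"Authorization": f"Api-Key {api_key}"},
        method="GET",
    )


def fetch_source(source_id, api_key):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_source_request(source_id, api_key)) as resp:
        return json.load(resp)


def tag_attributes(source):
    """Flatten each tag's attribute list into a {name: value} dict."""
    return [
        {attr["name"]: attr["value"] for attr in tag.get("attributes", [])}
        for tag in source.get("tags", [])
    ]


# Offline demonstration using a trimmed copy of the example response.
sample = {
    "id": 2,
    "tags": [
        {
            "typeName": "Airframe Inventory",
            "attributes": [
                {"name": "Make", "value": "CESSNA"},
                {"name": "Model", "value": "421C"},
            ],
        }
    ],
}
print(tag_attributes(sample))
```

With a real key, `fetch_source(2, ANNO_LAB_API_KEY)` would return the full response, and its `url` field could then be used to download the original file.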

Upload a PDF

POST https://api.annolab.ai/v1/source/upload-pdf

Upload a PDF and specify an OCR method to apply. Optionally, invoke a workflow of AI models on the uploaded file.

Headers

Name | Type | Description

Authorization* | String | Where you put your API key. Uploading a PDF requires a key with "Write" permissions. {"Authorization": "Api-Key XXXXXXX-XXXXXXX-XXXXXXX"}

Request Body

Name | Type | Description

projectIdentifier* | string or number | Either the id or the name of the project where the file will reside

directoryIdentifier | string | Name of the directory where the file will reside

sourceIdentifier* | string | Name of the source that will be created

ocrProvider | string | Only used if processMode is set to "OCR". Valid values are "textract", "textract_plus", and "gcv". "textract_plus" is recommended for highest quality

preprocessor | string | Valid options are "faa" and None

groupName* | string | Name of the group that owns the project

tags | CanonicalTag[] | Array of CanonicalTag objects

workflow | string | Workflow (a package of AI models) that will be invoked immediately after upload. Recommended values are "FAA_CD" and "FAA_CD_WITH_TAGGING"

processMode | string | Use "OCR" if the PDF does not already contain embedded text. Use "EXTRACT" if the PDF already has text embedded

import requests

ANNO_LAB_API_KEY = 'XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX'

url_base = 'https://api.annolab.ai'

input_pdf = '/Users/grantdelozier/devel/ocr-these3/TEST-REGISTRATION.PDF'

headers = {
  'Authorization': 'Api-Key '+ANNO_LAB_API_KEY,
}

requestBody = {
  'groupName': 'AnnoLab',
  'projectIdentifier': 'title-demo',
  'directoryIdentifier': 'testing',
  'sourceIdentifier': 'TEST-REGISTRATION.PDF',
  'preprocessor': 'faa',
  'processMode': 'OCR',
  'ocrProvider': 'textract_plus',
  'workflow': 'FAA_CD'
}

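url = url_base+'/v1/source/upload-pdf'

# Open the PDF in a context manager so the file handle is closed after the upload
with open(input_pdf, 'rb') as pdf_file:
  fileToUpload = {
    'file': ('TEST-REGISTRATION.PDF', pdf_file, 'application/pdf')
  }
  # Send the form fields and the file together as a multipart request
  response = requests.post(url, headers=headers, data=requestBody, files=fileToUpload)

print(response.json())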
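
The constraints listed in the request body table can also be checked on the client before uploading. The following is a minimal sketch under stated assumptions: `check_upload_body` is a hypothetical helper, not part of the AnnoLab API, and it assumes that parameter values are case-sensitive and that the fields marked with `*` are required. The server's own validation may differ.

```python
# Values taken from the request body table; case sensitivity is an assumption.
VALID_PROCESS_MODES = {"OCR", "EXTRACT"}
VALID_OCR_PROVIDERS = {"textract", "textract_plus", "gcv"}
VALID_PREPROCESSORS = {"faa", None}
REQUIRED_FIELDS = ("projectIdentifier", "sourceIdentifier", "groupName")


def check_upload_body(body):
    """Return a list of problems found in an upload-pdf request body.

    An empty list means this sketch found nothing wrong.
    """
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if body.get(name) in (None, "")
    ]
    mode = body.get("processMode")
    if mode is not None and mode not in VALID_PROCESS_MODES:
        problems.append(f"unknown processMode: {mode!r}")
    provider = body.get("ocrProvider")
    if provider is not None:
        if provider not in VALID_OCR_PROVIDERS:
            problems.append(f"unknown ocrProvider: {provider!r}")
        if mode != "OCR":
            problems.append("ocrProvider is only used when processMode is 'OCR'")
    if body.get("preprocessor") not in VALID_PREPROCESSORS:
        problems.append(f"unknown preprocessor: {body['preprocessor']!r}")
    return problems


# The request body from the upload example passes every check.
body = {
    "groupName": "AnnoLab",
    "projectIdentifier": "title-demo",
    "directoryIdentifier": "testing",
    "sourceIdentifier": "TEST-REGISTRATION.PDF",
    "preprocessor": "faa",
    "processMode": "OCR",
    "ocrProvider": "textract_plus",
    "workflow": "FAA_CD",
}
print(check_upload_body(body))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once before spending time on an upload.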

Create Source Text

POST https://api.annolab.ai/v1/source/create-text

Create a new text file source within a directory.

Headers

Name | Type | Description

Authorization* | string | Where you put your API key. Creating a text source requires a key with "Write" permissions. {"Authorization": "Api-Key XXXXXXX-XXXXXXX-XXXXXXX"}

Request Body

Name | Type | Description

projectIdentifier | string | Identifier for the project that will contain the source file. Either the id or the unique name

directoryIdentifier | string | Identifier for the directory that will contain the source file. Either the id or the unique name

sourceName | string | Name of the file you wish to create

text | string | Text that exists within the file

// Example Response
{
    "sourceName": "athens.txt",
    "directoryName": "Wikipedia Subset",
    "directoryId": 12,
    "projectName": "New NER Project",
    "projectId": 22,
    "id": 145
}

This code shows how to create a new text file source:

import requests

ANNO_LAB_API_KEY = 'XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX'

source = {
  'projectIdentifier': 'New NER Project',
  'directoryIdentifier': 'Wikipedia Subset',
  'sourceName': 'athens.txt',
  'text': 'Athens (Greek: Αθήνα, Athína), is the capital city of Greece with a metropolitan population of 3.7 million inhabitants.'
}

headers = {
  'Authorization': 'Api-Key '+ANNO_LAB_API_KEY,
}

url = 'https://api.annolab.ai/v1/source/create-text'

response = requests.post(url, headers=headers, json=source)

print(response.json())
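
When the text already lives in a local .txt file, the request body can be built from that file instead of being typed inline. This is a minimal sketch: `text_source_payload` is a hypothetical helper, and it assumes the file is UTF-8 encoded and that the file name should be reused as `sourceName`.

```python
import tempfile
from pathlib import Path


def text_source_payload(path, project, directory):
    """Build a create-text request body from a local UTF-8 text file."""
    file_path = Path(path)
    return {
        "projectIdentifier": project,
        "directoryIdentifier": directory,
        "sourceName": file_path.name,  # assumption: reuse the local file name
        "text": file_path.read_text(encoding="utf-8"),
    }


# Offline demonstration: write a small file, then build the payload from it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "athens.txt"
    sample.write_text("Athens is the capital city of Greece.", encoding="utf-8")
    payload = text_source_payload(sample, "New NER Project", "Wikipedia Subset")

print(payload["sourceName"])
```

The resulting dictionary can be passed as `json=payload` to `requests.post`, in the same way as `source` in the example above.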
