Source Files
Source files are the files you want to annotate or run models on. PDF and .txt file formats are currently supported.
Retrieve source information
GET
https://api.annolab.ai/v1/source/{source_id}
Returns basic source information, including a signed URL to download the original file and its tags.
Path Parameters
source_id*
Int
Id of the source
Headers
Authorization*
String
Your API key
{"Authorization": "Api-Key XXXXXXX-XXXXXXX-XXXXXXX"}
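As a minimal sketch, the request described above could be built in Python with only the standard library. The helper names `build_source_request` and `fetch_source` and the placeholder key are illustrative assumptions, not part of the AnnoLab API. Only the request construction runs here; `fetch_source` needs a valid key and network access.

```python
import json
import urllib.request

API_BASE = 'https://api.annolab.ai'


def build_source_request(source_id, api_key):
    """Build (but do not send) the GET request for a single source.

    Hypothetical helper: the name is illustrative, not part of the API.
    """
    url = f'{API_BASE}/v1/source/{int(source_id)}'
    headers = {'Authorization': 'Api-Key ' + api_key}
    return urllib.request.Request(url, headers=headers, method='GET')


def fetch_source(source_id, api_key):
    """Send the request and decode the JSON body (requires network access)."""
    request = build_source_request(source_id, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Inspect the request without sending it
request = build_source_request(2, 'XXXXXXX-XXXXXXX-XXXXXXX')
print(request.full_url)
```

Calling `fetch_source(2, api_key)` with a real key would be expected to return a dictionary shaped like the example response.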
// Example Response
{
"id": 2,
"projectName": "My Project",
"projectId": 1,
"directoryName": "Uploads",
"directoryId": 1,
"name": "REGISTRATION.pdf",
"sourceName": "REGISTRATION.pdf",
"type": "pdf",
"text": "Example PDF Text",
"url": "https://download-example-url.pdf",
"createdAt": "2023-05-22T20:32:02.633Z",
"tags": [
{
"domainEntityId": 1,
"typeName": "Airframe Inventory",
"attributes": [
{
"name": "Make",
"value": "CESSNA"
},
{
"name": "Model",
"value": "421C"
},
{
"name": "Serial Number",
"value": "421C-5837"
}
],
"createdBy": {
"id": 58473,
"email": "[email protected]",
"username": "testuser"
}
}
]
}
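The `tags` array nests each tag's attributes as a list of name/value pairs. A short sketch of flattening that structure into plain dictionaries is shown here. The function name `tags_by_type` is a hypothetical helper, and the input is a trimmed copy of the example response rather than live API output.

```python
def tags_by_type(source):
    """Group tag attributes by tag type name.

    Returns a dict mapping each typeName to a list of {name: value} dicts,
    one per tag of that type. Hypothetical helper, not part of the API.
    """
    grouped = {}
    for tag in source.get('tags', []):
        attributes = {
            attr['name']: attr['value'] for attr in tag.get('attributes', [])
        }
        grouped.setdefault(tag['typeName'], []).append(attributes)
    return grouped


# Trimmed copy of the example response
example_source = {
    'id': 2,
    'name': 'REGISTRATION.pdf',
    'tags': [
        {
            'domainEntityId': 1,
            'typeName': 'Airframe Inventory',
            'attributes': [
                {'name': 'Make', 'value': 'CESSNA'},
                {'name': 'Model', 'value': '421C'},
                {'name': 'Serial Number', 'value': '421C-5837'},
            ],
        }
    ],
}

airframes = tags_by_type(example_source)['Airframe Inventory']
print(airframes[0]['Model'])  # prints 421C
```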
Upload a PDF
POST
https://api.annolab.ai/v1/source/upload-pdf
Upload a PDF and specify an OCR method to apply. Optionally, invoke a workflow of AI models immediately after upload.
Headers
Authorization*
String
Your API key. This endpoint requires a key with "Write" permissions.
{"Authorization": "Api-Key XXXXXXX-XXXXXXX-XXXXXXX"}
Request Body
projectIdentifier*
string|number
Either the id or the name of the project where the file will reside
directoryIdentifier
string
Name of the directory where the file will reside
sourceIdentifier*
string
Name of the source that will be created
ocrProvider
string
Only used if processMode is set to "OCR". Valid values are "textract", "textract_plus", and "gcv". "textract_plus" is recommended for the highest quality.
preprocessor
string
Valid options are "faa" and None.
groupName*
string
Name of the group that owns the project
workflow
string
Workflow (a package of AI models) that will be invoked immediately after upload. Recommended values are "FAA_CD" or "FAA_CD_WITH_TAGGING".
processMode
string
Use "OCR" if the PDF does not already contain embedded text. Use "EXTRACT" if the PDF already has text embedded.
import requests

ANNO_LAB_API_KEY = 'XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX'
url_base = 'https://api.annolab.ai'
input_pdf = '/Users/grantdelozier/devel/ocr-these3/TEST-REGISTRATION.PDF'

headers = {
    'Authorization': 'Api-Key ' + ANNO_LAB_API_KEY,
}

# Form fields sent alongside the uploaded file
requestBody = {
    'groupName': 'AnnoLab',
    'projectIdentifier': 'title-demo',
    'directoryIdentifier': 'testing',
    'sourceIdentifier': 'TEST-REGISTRATION.PDF',
    'preprocessor': 'faa',
    'processMode': 'OCR',
    'ocrProvider': 'textract_plus',
    'workflow': 'FAA_CD'
}

url = url_base + '/v1/source/upload-pdf'

# Open the PDF in binary mode; the context manager closes it after the upload
with open(input_pdf, 'rb') as pdf_file:
    fileToUpload = {
        'file': ('TEST-REGISTRATION.PDF', pdf_file, 'application/pdf')
    }
    response = requests.post(url, headers=headers, data=requestBody, files=fileToUpload)

print(response.json())
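The request body fields listed for this endpoint have a few stated constraints, and checking them locally before uploading can save a failed round trip. The sketch below is one way to do that. The function name `validate_upload_body` is hypothetical, treating the asterisk-marked fields as required is an assumption, and the server may apply rules beyond those documented here.

```python
# Allowed values as listed in the request body description
VALID_OCR_PROVIDERS = {'textract', 'textract_plus', 'gcv'}
VALID_PROCESS_MODES = {'OCR', 'EXTRACT'}
# Assumption: fields marked with an asterisk are required
REQUIRED_FIELDS = ('projectIdentifier', 'sourceIdentifier', 'groupName')


def validate_upload_body(body):
    """Return a list of problems found in an upload-pdf request body."""
    problems = []
    for field in REQUIRED_FIELDS:
        if body.get(field) in (None, ''):
            problems.append(f'missing required field: {field}')
    mode = body.get('processMode')
    if mode is not None and mode not in VALID_PROCESS_MODES:
        problems.append(f'invalid processMode: {mode!r}')
    provider = body.get('ocrProvider')
    if provider is not None and provider not in VALID_OCR_PROVIDERS:
        problems.append(f'invalid ocrProvider: {provider!r}')
    preprocessor = body.get('preprocessor')
    if preprocessor not in (None, 'faa'):
        problems.append(f'invalid preprocessor: {preprocessor!r}')
    return problems


good_body = {
    'groupName': 'AnnoLab',
    'projectIdentifier': 'title-demo',
    'sourceIdentifier': 'TEST-REGISTRATION.PDF',
    'processMode': 'OCR',
    'ocrProvider': 'textract_plus',
}
bad_body = {
    'projectIdentifier': 1,
    'processMode': 'ocr',
    'ocrProvider': 'tesseract',
}

print(validate_upload_body(good_body))
for problem in validate_upload_body(bad_body):
    print(problem)
```

An empty list means the body passes these local checks. It does not guarantee that the server will accept the request.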
Create Source Text
POST
https://api.annolab.ai/v1/source/create-text
Create a new text file source within a directory.
Headers
Authorization*
string
Your API key. This endpoint requires a key with "Write" permissions.
{"Authorization": "Api-Key XXXXXXX-XXXXXXX-XXXXXXX"}
Request Body
projectIdentifier
string
Identifier for the project that will contain the source file. Either the id or the unique name
directoryIdentifier
string
Identifier for the directory that will contain the source file. Either the id or the unique name
sourceName
string
Name of the file you wish to create
text
string
Text that exists within the file
// Example Response
{
"sourceName": "athens.txt",
"directoryName": "Wikipedia Subset",
"directoryId": 12,
"projectName": "New NER Project",
"projectId": 22,
"id": 145
}
This code shows how to create a new text file source:
import requests

ANNO_LAB_API_KEY = 'XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX'

# Fields describing the new text source
source = {
    'projectIdentifier': 'New NER Project',
    'directoryIdentifier': 'Wikipedia Subset',
    'sourceName': 'athens.txt',
    'text': 'Athens (Greek: Αθήνα, Athína), is the capital city of Greece with a metropolitan population of 3.7 million inhabitants.'
}

headers = {
    'Authorization': 'Api-Key ' + ANNO_LAB_API_KEY,
}

url = 'https://api.annolab.ai/v1/source/create-text'
response = requests.post(url, headers=headers, json=source)
print(response.json())
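In practice the text often comes from a file on disk rather than a string literal. The following sketch builds the same request body from a local .txt file. The helper name `build_text_source` is hypothetical, and the temporary file exists only so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def build_text_source(path, project_identifier, directory_identifier):
    """Build a create-text request body from a local UTF-8 text file.

    Hypothetical helper: the file name becomes sourceName and the file
    contents become text.
    """
    file_path = Path(path)
    return {
        'projectIdentifier': project_identifier,
        'directoryIdentifier': directory_identifier,
        'sourceName': file_path.name,
        'text': file_path.read_text(encoding='utf-8'),
    }


# Demonstrate with a throwaway file so the sketch does not depend on local paths
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / 'athens.txt'
    sample.write_text('Athens is the capital city of Greece.', encoding='utf-8')
    payload = build_text_source(sample, 'New NER Project', 'Wikipedia Subset')

print(payload['sourceName'])  # prints athens.txt
```

The resulting `payload` can be passed as the `json` argument of `requests.post`, in place of the hand-written `source` dictionary.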