Integration Guide
An overview of the basic features you need to support in order to integrate with the Agolo Entity Intelligence platform.
1. Seed the Knowledge Base
Seeding the Agolo Entity Intelligence Knowledge Base with known identities gives the system more meaningful entity linking results from the start.
Results are generally best when the seeds are identities that appear in, or are related to entities that appear in, the unstructured text documents you plan to process.
1a. Individually
Imagine a use case in which a biographer is writing the life story of Muhammad Ali and wants to ingest thousands of news articles about him to build a knowledge base. The obvious first seed is the identity for Muhammad Ali himself. Here is how you would add the boxing legend to your knowledge base.
curl --location '{YOUR_ENV_BASE_URL}/entity-analytics/v2/identities' \
--header 'Content-Type: application/json' \
--data '{
  "id": "Q36107",
  "label": "Muhammad Ali",
  "status": "Authoritative",
  "aliases": {
    "multi_lan": [
      {
        "label": "El Mejor"
      }
    ],
    "en": [
      {
        "label": "Cassius Clay"
      }
    ]
  },
  "references": {
    "multi_lan": [
      {
        "text": "Después de 20 segundos así, Ali lo castigó con un gancho seguido de un recto de derecha que lo hicieron doblarse; fue el principio del fin de la pelea."
      }
    ],
    "en": [
      {
        "text": "Born Cassius Marcellus Clay on Jan. 17, 1942 in Louisville, Kentucky, to middle-class parents, Ali started boxing when he was 12, winning Golden Gloves titles before heading to the 1960 Olympics in Rome, where he won a gold medal as a light heavyweight."
      }
    ]
  }
}'
Note: In the above example, replace the following parameter:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
1b. Bulk
Seeding the knowledge base in bulk is done by providing the identities in a .jsonl file, with each identity formatted as in the above example and as described in the Agolo Entity Intelligence API documentation. A JSONL file contains a series of JSON objects, one per line, so each identity must be written on a single line rather than pretty-printed across several.
curl --location '{YOUR_ENV_BASE_URL}/entity-analytics/v2/identities/upload' \
--form 'seeds=@"{YOUR_LOCAL_KB_SEED_FILE}.jsonl"'
Note: In the above example, replace the following parameters:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
{YOUR_LOCAL_KB_SEED_FILE}
- replace with a local seed file.
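Because a single pretty-printed object can break the one-object-per-line layout, it can help to sanity-check the seed file locally before uploading it. The following is a minimal sketch, not part of the Agolo API: the helper name check_jsonl is hypothetical, and it only verifies that every line starts with { and ends with }. It does not parse the JSON or validate it against the identity schema.

```shell
# Hypothetical helper (an assumption for illustration, not an Agolo tool).
# check_jsonl FILE -> exit status 0 if every line looks like one JSON object.
# This is a rough structural check only; it does not parse the JSON.
check_jsonl() (
  file=$1
  [ -s "$file" ] || { echo "empty or missing file: $file" >&2; exit 1; }
  n=0
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    case $line in
      "{"*"}") ;;   # line starts with { and ends with }
      *) echo "line $n is not a single-line JSON object" >&2; exit 1 ;;
    esac
  done < "$file"
  echo "$n identities look well-formed"
)
```

A typical use would be to run check_jsonl on the seed file first and only send the upload request when it succeeds. Blank lines and trailing whitespace are reported as errors by this sketch.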
2-3. Processing Documents:
With the knowledge base seeded, you can begin uploading documents for processing. Documents must be uploaded to a specific collection, so we'll cover Collection Management first and then move on to Documents.
2. Collection Management
When installed, the Entity Intelligence platform creates a collection called default, which is the default collection for document uploads. However, in a real production environment, it typically makes sense to create collections for logical groupings of documents. The following examples cover collection management, such as listing, creating, and deleting collections.
2a. List all collections
Let's find out what collections exist already on your system.
curl -X 'GET' \
'{YOUR_ENV_BASE_URL}/entity-analytics/v2/collections?pageIndex=0&pageSize=100' \
-H 'accept: application/json'
Note: In the above example, replace the following parameter:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
2b. Create a Collection
Let's create a new collection for the Muhammad Ali project referenced in the seeding example above. The id is optional; if provided, it should contain only alphanumeric characters, hyphens, and underscores, with no spaces or other special characters.
curl --location '{YOUR_ENV_BASE_URL}/entity-analytics/v2/collections' \
--header 'Content-Type: application/json' \
--data '{
  "id": "muhammad-ali",
  "name": "Muhammad Ali"
}'
Note: In the above example, replace the following parameter:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
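The ID rule above (alphanumeric characters, hyphens, and underscores only) can be checked locally before calling the API. This is a small sketch under that stated rule; the function name is_valid_collection_id is hypothetical, and the platform's own validation may be stricter or differ in details such as length limits.

```shell
# Hypothetical helper: succeeds (status 0) when the ID contains only
# letters, digits, hyphens, or underscores. An empty ID is rejected here;
# to let the platform assign one, omit the "id" field instead.
is_valid_collection_id() {
  case $1 in
    "" | *[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains another character
    *) return 0 ;;
  esac
}

# Example: prints "ok" for the ID used in this guide.
is_valid_collection_id "muhammad-ali" && echo "ok"
```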
2c. Delete a Collection
Deleting a collection is as simple as creating one:
curl -X 'DELETE' '{YOUR_ENV_BASE_URL}/entity-analytics/v2/collections/muhammad-ali' \
-H 'accept: */*'
Note: In the above example, replace the following parameter:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
3. Exploiting a Document
Now that we understand that documents need to be uploaded to a specific collection, and we know how to create new collections, let's begin processing some documents. We'll set the API flag 'overwriteIfExists' so that the call won't fail even if we've already exploited this same document; instead, it will overwrite the existing exploitation job. This call accepts many other API flags, but we'll keep this example simple. The defaults are mostly a good starting point. The example uploads to the default collection, whose ID appears in the URL path.
curl --location '{YOUR_ENV_BASE_URL}/entity-analytics/v2/collections/default/documents/link' \
--form 'documents=@"{YOUR_LOCAL_TXT_FILE}.txt"' \
--form 'overwriteIfExists="true"'
Note: In the above example, replace the following parameters:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
{YOUR_LOCAL_TXT_FILE}
- replace with a local unstructured text document file in .TXT format.
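For a project such as the Muhammad Ali example, with many articles to ingest, the same call can be repeated for each file. The following is a minimal sketch that sends one request per .txt file, mirroring the single-file example above. The function name link_directory and the CURL override variable are assumptions made for illustration; the sketch does not inspect responses or retry failures.

```shell
# Sketch: link every .txt file in a directory to one collection.
# link_directory BASE_URL COLLECTION_ID DIRECTORY
# link_directory and CURL are hypothetical names, not part of the Agolo API.
# Set CURL=echo to print the arguments instead of sending the requests.
link_directory() (
  base_url=$1
  collection=$2
  dir=$3
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    ${CURL:-curl} --location \
      "$base_url/entity-analytics/v2/collections/$collection/documents/link" \
      --form "documents=@\"$f\"" \
      --form 'overwriteIfExists="true"'
  done
)
```

For example, CURL=echo link_directory "$BASE_URL" default ./articles would list the requests for every .txt file in ./articles without contacting the server, where BASE_URL holds the same value as {YOUR_ENV_BASE_URL}.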
4. Searching the Knowledge Base
To search the knowledge base, you can optionally specify a semantic query (text string), any number of collection IDs, any number of Identity IDs, as well as pagination options:
curl -X 'POST' '{YOUR_ENV_BASE_URL}/entity-analytics/v2/references/search' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "searchBy": {
    "text": "politics",
    "collectionIds": [
      "default"
    ],
    "identityIds": [
      "Q212"
    ]
  },
  "pagination": {
    "pageIndex": 0,
    "pageSize": 100
  }
}'
Note: In the above example, replace the following parameter:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
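When paging through results, only pageIndex changes between requests, so it can be convenient to generate the request body from a small helper. This is a sketch with a hypothetical function name, search_payload; it mirrors the fields of the example above with one collection ID and one identity ID, and it does not escape quotes or backslashes in its arguments, so it is only suitable for simple values.

```shell
# Hypothetical helper: prints a search request body.
# search_payload TEXT COLLECTION_ID IDENTITY_ID PAGE_INDEX [PAGE_SIZE]
# PAGE_SIZE defaults to 100, as in the example above.
# No JSON escaping is performed on the arguments.
search_payload() {
  printf '{"searchBy": {"text": "%s", "collectionIds": ["%s"], "identityIds": ["%s"]}, "pagination": {"pageIndex": %d, "pageSize": %d}}\n' \
    "$1" "$2" "$3" "$4" "${5:-100}"
}

# Usage (requests the second page; BASE_URL is your {YOUR_ENV_BASE_URL} value):
# curl -X POST "$BASE_URL/entity-analytics/v2/references/search" \
#   -H 'accept: application/json' -H 'Content-Type: application/json' \
#   -d "$(search_payload politics default Q212 1)"
```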
5. Curating the Knowledge Base
5a. Merging Identities
Manual human curation is often necessary to clean up false negatives. A false negative occurs when a Ghost identity was created from a processed document, but the entity mention should actually have been linked to an existing Authoritative Identity in the Knowledge Base. In this scenario, a user can merge the two identities, typically merging the Ghost into the Authoritative Identity. Using the API, this looks as follows:
curl --location --request PATCH '{YOUR_ENV_BASE_URL}/entity-analytics/v1/identities/merge' \
--header 'Content-Type: application/json' \
--data '{
  "primaryIdentityID": "kKaoQ4gBDu5cfCPH-Vk_",
  "secondaryIdentityID": "jqaoQ4gBDu5cfCPH-Vk_"
}'
Note: In the above example, replace the following parameter:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
5b. Splitting Identities
The converse of merging identities is splitting identities. This is done when two identities were incorrectly linked to one another. The following is how a human curator would split two identities in the knowledge base that should not have been associated:
curl -X 'POST' '{YOUR_ENV_BASE_URL}/entity-analytics/v2/identities/split' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "referenceIds": [
    "{REFERENCE_ID_TO_SPLIT}"
  ]
}'
Note: In the above example, replace the following parameters:
{YOUR_ENV_BASE_URL}
- replace with either the hosted demo URL Agolo provided you with or else your own deployment URL if you have already installed the platform locally.
{REFERENCE_ID_TO_SPLIT}
- replace with the ID of the reference to split; this will split the reference into a Ghost identity.