Extract Structured Data • Documind Documentation

Request

Let’s run a sample extraction with the Documind open-source package.

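// Top-level `await` requires an ES module context
// (for example, a .mjs file or "type": "module" in package.json).
import { extract } from 'documind';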

const result = await extract({
  file: 'https://example.com/bank_statement.pdf',
  schema: [
    {
      "name": "accountNumber",
      "type": "string",
      "description": "The account number of the bank statement."
    },
    {
      "name": "openingBalance",
      "type": "number",
      "description": "The opening balance in the account."
    },
    {
      "name": "transactions",
      "type": "array",
      "description": "A list of transactions in the account.",
      "children": [
        {
          "name": "date",
          "type": "string",
          "description": "The date of the transaction."
        },
        {
          "name": "creditAmount",
          "type": "number",
          "description": "The amount credited in the transaction."
        },
        {
          "name": "debitAmount",
          "type": "number",
          "description": "The amount debited in the transaction."
        },
        {
          "name": "description",
          "type": "string",
          "description": "A short note about the transaction."
        }
      ]
    },
    {
      "name": "closingBalance",
      "type": "number",
      "description": "The closing balance in the account."
    },
    {
      "name": "highValueAccount",
      "type": "boolean",
      "description": "Whether the closing balance is more than 50000."
    },
    {
      "name": "statementType",
      "type": "enum",
      "description": "The type of bank statement.",
      "values": ["Current Account", "Savings Account"]
    }
  ]
});

console.log(result);
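The response includes a success flag alongside data (see Example Output below). This page does not say whether extract also throws on failure, so it can help to check the flag before reading data. The sketch below is a hypothetical helper, `getExtractedData`. It relies only on the response fields documented on this page and is not part of the Documind API.

```javascript
// Hypothetical helper (not part of Documind): returns the extracted data
// from an extract() response, or throws if the response reports failure.
// It relies only on the documented response fields.
function getExtractedData(result) {
  if (!result || result.success !== true) {
    const name = result && result.fileName ? result.fileName : "unknown file";
    throw new Error(`Extraction failed for ${name}`);
  }
  return result.data;
}

// Usage with a response shaped like the Example Output (abridged).
const sample = {
  success: true,
  pages: 1,
  data: { accountNumber: "100002345", closingBalance: 2420 },
  fileName: "bank_statement.pdf",
  markdown: "## Bank Statement",
};
const data = getExtractedData(sample);
console.log(data.accountNumber); // prints 100002345
```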

Parameters

Currently, only URLs are accepted. Ensure your document is hosted and accessible via a public URL.

file

string

required

The public URL of the document to extract data from.
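Because file must be a URL, a quick local check can catch malformed values, such as local paths, before a request is made. The sketch below is a hypothetical helper, `isHttpUrl`, built on the standard `URL` class. It assumes both http and https URLs are acceptable, which this page does not state. It only checks the syntax and cannot confirm that the URL is actually public.

```javascript
// Hypothetical pre-flight check (not part of Documind): confirms that a
// string parses as an absolute http(s) URL before it is passed as `file`.
// It cannot confirm the URL is publicly reachable; that needs a network request.
function isHttpUrl(value) {
  let url;
  try {
    url = new URL(value); // throws TypeError on malformed or relative input
  } catch {
    return false;
  }
  return url.protocol === "https:" || url.protocol === "http:";
}

console.log(isHttpUrl("https://example.com/bank_statement.pdf")); // true
console.log(isHttpUrl("./bank_statement.pdf")); // false (local path)
console.log(isHttpUrl("ftp://example.com/file.pdf")); // false
```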

schema

object[]

required

The schema that defines the structure of the data you want to extract. Read more on how to define a schema.
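The request example uses five field types. Three are plain: string, number and boolean. Two take extra properties: enum (with a values list) and array (with nested children). The sketch below is a hypothetical validator, `validateSchema`, that checks a schema against that shape. The set of allowed types is inferred from the example only and may not match everything Documind supports, so treat the schema guide as authoritative.

```javascript
// Types observed in the request example above; this list is an assumption,
// not Documind's authoritative set of supported types.
const KNOWN_TYPES = new Set(["string", "number", "boolean", "enum", "array"]);

// Hypothetical validator (not part of Documind): returns a list of problems
// found in a schema array, using dotted paths such as "transactions.date".
function validateSchema(fields, prefix = "") {
  const problems = [];
  for (const field of fields) {
    const path = prefix + (field.name || "<unnamed>");
    if (!field.name) problems.push(`${path}: missing name`);
    if (!KNOWN_TYPES.has(field.type)) problems.push(`${path}: unknown type "${field.type}"`);
    if (!field.description) problems.push(`${path}: missing description`);
    if (field.type === "enum" && !(Array.isArray(field.values) && field.values.length > 0)) {
      problems.push(`${path}: enum needs a non-empty values array`);
    }
    if (field.type === "array") {
      if (!Array.isArray(field.children) || field.children.length === 0) {
        problems.push(`${path}: array needs a non-empty children array`);
      } else {
        // Recurse into nested fields, extending the dotted path.
        problems.push(...validateSchema(field.children, path + "."));
      }
    }
  }
  return problems;
}

// An enum without values and a misspelled nested type are both reported.
const problems = validateSchema([
  { name: "accountNumber", type: "string", description: "The account number." },
  { name: "statementType", type: "enum", description: "The type of bank statement." },
  { name: "transactions", type: "array", description: "A list of transactions.",
    children: [{ name: "date", type: "strng", description: "The transaction date." }] },
]);
console.log(problems);
```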

model

string

The model used for extraction. See the list of supported models.

template

string

You can select a template schema that matches your document. See [Template options](/guides/templates/overview).

autoSchema

boolean | object

Use autoSchema to generate your schema automatically.

Example Output

Once extraction is complete, extract returns a structured JSON object containing the extracted data:

{
  "success": true,
  "pages": 1,
  "data": {
    "accountNumber": "100002345",
    "openingBalance": 3200,
    "transactions": [
      {
        "date": "2021-05-12",
        "creditAmount": null,
        "debitAmount": 100,
        "description": "transfer to Tom"
      },
      {
        "date": "2021-05-12",
        "creditAmount": 50,
        "debitAmount": null,
        "description": "For lunch the other day"
      },
      {
        "date": "2021-05-13",
        "creditAmount": 20,
        "debitAmount": null,
        "description": "Refund for voucher"
      },
      {
        "date": "2021-05-13",
        "creditAmount": null,
        "debitAmount": 750,
        "description": "May's rent"
      }
    ],
    "closingBalance": 2420,
    "highValueAccount": false,
    "statementType": "Savings Account"
  },
  "fileName": "bank_statement.pdf",
  "markdown": "## Bank Statement\n\n**Account Number:** 100002345\n\n**Opening Balance:** $3200.00\n\n**Closing Balance:** $2420.00\n\n**Statement Type:** Savings Account\n\n**High Value Account:** No\n\n## Transactions\n\n| Date       | Description                | Credit | Debit |\n|------------|----------------------------|--------|-------|\n| 2021-05-12 | transfer to Tom           |        | $100.00 |\n| 2021-05-12 | For lunch the other day   | $50.00 |       |\n| 2021-05-13 | Refund for voucher        | $20.00 |       |\n| 2021-05-13 | May's rent                 |        | $750.00 |\n\n"
}
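The figures in this sample output are internally consistent: 3200 - 100 + 50 + 20 - 750 = 2420, which matches closingBalance. Because 2420 is not more than 50000, highValueAccount is false. Absent amounts come back as null, so any reconciliation check has to treat null as zero. The sketch below is a hypothetical post-processing step, `reconcile`. It uses the field names from this page's schema and is not a Documind feature.

```javascript
// Hypothetical post-extraction check (not part of Documind): verifies that
// openingBalance + credits - debits equals closingBalance. Absent amounts
// come back as null, which is treated as 0 here.
function reconcile(data, tolerance = 0.005) {
  let expected = data.openingBalance;
  for (const t of data.transactions) {
    expected += (t.creditAmount ?? 0) - (t.debitAmount ?? 0);
  }
  // Compare with a tolerance to absorb floating-point rounding on cents.
  return { expected, matches: Math.abs(expected - data.closingBalance) <= tolerance };
}

// Amounts taken from the example output above (dates and descriptions omitted).
const statement = {
  openingBalance: 3200,
  transactions: [
    { creditAmount: null, debitAmount: 100 },
    { creditAmount: 50, debitAmount: null },
    { creditAmount: 20, debitAmount: null },
    { creditAmount: null, debitAmount: 750 },
  ],
  closingBalance: 2420,
};
console.log(reconcile(statement)); // { expected: 2420, matches: true }
```

A mismatch does not show which value is wrong. It only indicates that a transaction or balance may have been misread and that the document is worth reviewing.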

success

boolean

Indicates whether the extraction was successful or not.

pages

number

The number of pages processed in the document.

data

object

The extracted data based on the schema.
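In the example output, data is keyed by each schema field's name, and array fields hold objects keyed by their children's names. The sketch below is a hypothetical check, `checkDataTypes`, that compares each returned value with the type declared in the schema. It accepts null for any field, as the example output does for unused amounts. Whether Documind itself guarantees these types is not stated here.

```javascript
// Hypothetical output check (not part of Documind): compares each value in
// `data` with the type declared for the same field name in the schema.
// null (or a missing value) is accepted for any field.
function checkDataTypes(schema, data, prefix = "") {
  const mismatches = [];
  for (const field of schema) {
    const path = prefix + field.name;
    const value = data[field.name];
    if (value === null || value === undefined) continue;
    switch (field.type) {
      case "string":
      case "number":
      case "boolean":
        // typeof returns exactly these three type names.
        if (typeof value !== field.type) mismatches.push(`${path}: expected ${field.type}`);
        break;
      case "enum":
        if (!(field.values || []).includes(value)) mismatches.push(`${path}: "${value}" not in values`);
        break;
      case "array":
        if (!Array.isArray(value)) {
          mismatches.push(`${path}: expected array`);
          break;
        }
        // Check every element against the nested child fields.
        value.forEach((item, i) => {
          const itemPath = `${path}[${i}]`;
          if (item === null || typeof item !== "object") {
            mismatches.push(`${itemPath}: expected object`);
          } else {
            mismatches.push(...checkDataTypes(field.children, item, itemPath + "."));
          }
        });
        break;
    }
  }
  return mismatches;
}

const schema = [
  { name: "openingBalance", type: "number" },
  { name: "statementType", type: "enum", values: ["Current Account", "Savings Account"] },
  { name: "transactions", type: "array", children: [{ name: "debitAmount", type: "number" }] },
];
const mismatches = checkDataTypes(schema, {
  openingBalance: "3200", // a string where a number was declared
  statementType: "Savings Account",
  transactions: [{ debitAmount: 100 }, { debitAmount: null }],
});
console.log(mismatches); // [ 'openingBalance: expected number' ]
```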

fileName

string

The name of the processed file.

markdown

string

The document's content rendered as Markdown.
