Request

Let’s run a sample extraction with the Documind open-source package.

import { extract } from 'documind';

const result = await extract({
  file: 'https://example.com/bank_statement.pdf',
  schema: [
    {
      "name": "accountNumber",
      "type": "string",
      "description": "The account number of the bank statement."
    },
    {
      "name": "openingBalance",
      "type": "number",
      "description": "The opening balance in the account."
    },
    {
      "name": "transactions",
      "type": "array",
      "description": "A list of transactions in the account.",
      "children": [
        {
          "name": "date",
          "type": "string",
          "description": "The date of the transaction."
        },
        {
          "name": "creditAmount",
          "type": "number",
          "description": "The amount credited in the transaction."
        },
        {
          "name": "debitAmount",
          "type": "number",
          "description": "The amount debited in the transaction."
        },
        {
          "name": "description",
          "type": "string",
          "description": "A short note about the transaction."
        }
      ]
    },
    {
      "name": "closingBalance",
      "type": "number",
      "description": "The closing balance in the account."
    },
    {
      "name": "highValueAccount",
      "type": "boolean",
      "description": "Closing balance is more than 50000."
    },
    {
      "name": "statementType",
      "type": "enum",
      "description": "The type of document.",
      "values": ["Current Account", "Savings Account"]
    }
  ]
});

console.log(result);

Parameters

Currently, only URLs are accepted. Ensure your document is hosted and accessible via a public URL.
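
Because only URLs are accepted, it can be useful to check the value before calling extract. The sketch below is a hypothetical helper (isHttpUrl is not part of Documind) that only confirms the value parses as an absolute http(s) URL. It does not verify that the document is publicly reachable.

```javascript
// Hypothetical helper, not part of Documind: checks that a value looks like
// an absolute http(s) URL before it is passed as `file`.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    // new URL() throws on relative paths and malformed input.
    return false;
  }
}
```

A local path such as './bank_statement.pdf' or a file: URL is rejected by this check, which mirrors the hosting requirement stated above.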

file
string
required

The file URL.

schema
object[]
required

The schema that defines the structure of the data you want to extract. Read more on how to define a schema.

model
string

The model to use for the extraction. See the list of supported models.

template
string

You can select a template schema that matches your document. See [Template options](/guides/templates/overview).

autoSchema
boolean | object

Use autoSchema to auto-generate your schema.
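
The schema parameter takes an array of field objects, and a malformed entry is easier to catch before a request is sent. The following is a minimal sketch of such a check. findSchemaProblems and EXAMPLE_TYPES are hypothetical names, and the set of types is assumed from the request example above, so Documind may well accept types that are not listed here.

```javascript
// Field types that appear in the request example. This set is an assumption
// drawn from that example, not an authoritative list of supported types.
const EXAMPLE_TYPES = new Set(['string', 'number', 'boolean', 'array', 'enum']);

// Hypothetical helper, not part of Documind: returns a list of problems
// found in a schema array (an empty list means none were found).
function findSchemaProblems(fields, path = 'schema') {
  if (!Array.isArray(fields)) return [`${path} must be an array`];
  const problems = [];
  fields.forEach((field, i) => {
    const where = `${path}[${i}]`;
    if (typeof field?.name !== 'string' || field.name === '') {
      problems.push(`${where}: missing name`);
    }
    if (!EXAMPLE_TYPES.has(field?.type)) {
      problems.push(`${where}: unexpected type ${JSON.stringify(field?.type)}`);
    }
    // In the example, enum fields list their allowed values...
    if (field?.type === 'enum' && !Array.isArray(field.values)) {
      problems.push(`${where}: enum field without values`);
    }
    // ...and array fields describe their items under children.
    if (field?.type === 'array' && field.children !== undefined) {
      problems.push(...findSchemaProblems(field.children, `${where}.children`));
    }
  });
  return problems;
}
```

Returning a list rather than throwing on the first issue lets all problems in a nested schema be reported at once.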

Example Output

Once the extraction is complete, the call returns a structured JSON object containing the extracted data:

{
  "success": true,
  "pages": 1,
  "data": {
    "accountNumber": "100002345",
    "openingBalance": 3200,
    "transactions": [
      {
        "date": "2021-05-12",
        "creditAmount": null,
        "debitAmount": 100,
        "description": "transfer to Tom"
      },
      {
        "date": "2021-05-12",
        "creditAmount": 50,
        "debitAmount": null,
        "description": "For lunch the other day"
      },
      {
        "date": "2021-05-13",
        "creditAmount": 20,
        "debitAmount": null,
        "description": "Refund for voucher"
      },
      {
        "date": "2021-05-13",
        "creditAmount": null,
        "debitAmount": 750,
        "description": "May's rent"
      }
    ],
    "closingBalance": 2420,
    "highValueAccount": false,
    "statementType": "Savings Account"
  },
  "fileName": "bank_statement.pdf",
  "markdown": "## Bank Statement\n\n**Account Number:** 100002345\n\n**Opening Balance:** $3200.00\n\n**Closing Balance:** $2420.00\n\n**Statement Type:** Savings Account\n\n**High Value Account:** No\n\n## Transactions\n\n| Date       | Description                | Credit | Debit |\n|------------|----------------------------|--------|-------|\n| 2021-05-12 | transfer to Tom           |        | $100.00 |\n| 2021-05-12 | For lunch the other day   | $50.00 |       |\n| 2021-05-13 | Refund for voucher        | $20.00 |       |\n| 2021-05-13 | May's rent                 |        | $750.00 |\n\n"
}

success
boolean

Indicates whether the extraction was successful or not.

pages
number

The number of pages processed in the document.

data
object

The extracted data based on the schema.

fileName
string

The name of the processed file.

markdown
string

The Markdown representation of the file's content.
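
Since the returned data follows the schema, it can be sanity-checked in code. The sketch below is one possible check for the bank statement example: it confirms that the extracted transactions account for the difference between the opening and closing balances. reconcileStatement is a hypothetical helper, not part of Documind, and the field names are the ones defined in the example schema.

```javascript
// Hypothetical helper, not part of Documind: checks that the extracted
// transactions explain the change from opening to closing balance.
function reconcileStatement(result, tolerance = 0.005) {
  if (!result || result.success !== true) {
    throw new Error('extraction did not succeed');
  }
  const { openingBalance, closingBalance, transactions = [] } = result.data;
  // Missing amounts are null in the example output, so treat them as 0.
  const net = transactions.reduce(
    (sum, t) => sum + (t.creditAmount ?? 0) - (t.debitAmount ?? 0),
    0
  );
  const expectedClosing = openingBalance + net;
  return {
    expectedClosing,
    // A small tolerance absorbs floating-point rounding on decimal amounts.
    matches: Math.abs(expectedClosing - closingBalance) <= tolerance,
  };
}
```

For the example output above, the credits and debits net to -780, so expectedClosing is 2420 and matches the extracted closingBalance.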