Product Updates
Open-Source
🎉 Initial Release
The first public release of Documind. Here’s what’s included:
🚀 Features:
- Data Extraction: Extract structured data from PDF documents based on custom schema definitions.
- Schema Support: Easily define schemas to tailor your extraction process for specific document types.
- Accurate Parsing: Handles nested data structures and complex formatting.
- Simple Setup: Start processing documents immediately.
Removed Supabase dependency
The npm package no longer depends on Supabase, so no external storage is required.
Autoschema and ingestion formats
- No need to specify schemas: just set `autoSchema` to true, and Documind will automatically generate a suitable schema and apply it.
- We’ve included functions that convert your documents into formats suitable for LLMs. Currently only text and markdown formats are supported.
New file and schema field types
- Support for DOC, TXT, PNG, JPG and HTML file types
- Schema fields now support boolean and enum types
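To make the field types concrete, here is a minimal sketch of a schema mixing nested, boolean, and enum fields, together with a validator that checks an extracted result against it. The `SchemaField` shape (`name`, `type`, `values`, `children`), the `invoiceSchema` example, and the `validate` helper are hypothetical illustrations, not Documind's actual schema format or API.

```typescript
// Hypothetical schema shape for illustration; NOT Documind's actual type definitions.
type FieldType = "string" | "number" | "boolean" | "enum" | "object" | "array";

interface SchemaField {
  name: string;
  type: FieldType;
  values?: string[]; // allowed values when type is "enum"
  children?: SchemaField[]; // nested fields for "object" and "array" types
}

type Row = Record<string, unknown>;

// Example schema for an invoice (illustrative only).
const invoiceSchema: SchemaField[] = [
  { name: "invoice_number", type: "string" },
  { name: "paid", type: "boolean" },
  { name: "currency", type: "enum", values: ["USD", "EUR", "GBP"] },
  {
    name: "line_items",
    type: "array",
    children: [
      { name: "description", type: "string" },
      { name: "amount", type: "number" },
    ],
  },
];

// True for plain objects (not null, not arrays).
function isRow(value: unknown): value is Row {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Checks extracted data against a schema; returns readable errors (empty list = valid).
function validate(fields: SchemaField[], data: Row, path = ""): string[] {
  const errors: string[] = [];
  for (const field of fields) {
    const value = data[field.name];
    const where = path + field.name;
    switch (field.type) {
      case "string":
      case "number":
      case "boolean":
        if (typeof value !== field.type) errors.push(`${where}: expected ${field.type}`);
        break;
      case "enum": {
        const allowed = field.values ?? [];
        if (typeof value !== "string" || !allowed.includes(value)) {
          errors.push(`${where}: expected one of ${allowed.join(", ")}`);
        }
        break;
      }
      case "object":
        if (isRow(value)) errors.push(...validate(field.children ?? [], value, `${where}.`));
        else errors.push(`${where}: expected object`);
        break;
      case "array":
        if (!Array.isArray(value)) {
          errors.push(`${where}: expected array`);
          break;
        }
        // Validate each array item against the nested child fields.
        value.forEach((item, i) => {
          if (isRow(item)) errors.push(...validate(field.children ?? [], item, `${where}[${i}].`));
          else errors.push(`${where}[${i}]: expected object`);
        });
        break;
    }
  }
  return errors;
}
```

With this sketch, a result whose `currency` is `"JPY"` would produce the error `currency: expected one of USD, EUR, GBP`, and a string in `line_items[0].amount` would be reported with its full nested path.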
Release of v1.1.0
- Flexible Arrays: If you’re extracting a list of single-type data, results are now returned as a simple array instead of being wrapped in unnecessary objects.
- Markdown in Results: Results now include the document’s markdown alongside the extracted data.
- Full Ollama integration: An OpenAI key is no longer required when using a local Ollama model; just provide the base URL and you’re good to go.
- Google Gemini Models: More options! You can now use Documind with Google’s Gemini models.
- Autoschema Upgrade: Just tell Documind what to extract in plain English, and it will generate the schema for you automatically.
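The "Flexible Arrays" change is easiest to see as a before/after of the output shape. The sketch below assumes a list of single-field objects is collapsed into a plain array of values; `unwrapSingleField` is a hypothetical helper written for illustration, not Documind's implementation.

```typescript
// Hypothetical helper illustrating the v1.1.0 "Flexible Arrays" output shape:
// when every item is an object with the same single key, return just the values.
type Row = Record<string, unknown>;

function unwrapSingleField(items: Row[]): unknown[] {
  if (items.length === 0) return items;
  const keys = Object.keys(items[0]);
  if (keys.length !== 1) return items; // multi-field objects stay wrapped
  const key = keys[0];
  // Only unwrap if every item has exactly that one key.
  const uniform = items.every((item) => {
    const itemKeys = Object.keys(item);
    return itemKeys.length === 1 && itemKeys[0] === key;
  });
  return uniform ? items.map((item) => item[key]) : items;
}

// Before: a list of emails wrapped in objects.
const wrapped = [{ email: "a@example.com" }, { email: "b@example.com" }];
// After: a simple array of strings, ["a@example.com", "b@example.com"].
const flat = unwrapSingleField(wrapped);
```

Items with several fields, or with differing keys, are returned unchanged, since flattening them would lose information.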