# pdf-parser

Here are 108 public repositories matching this topic...

opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.

python pdf parser ocr pdf-converter extract-data document-analysis pdf-parser layout-analysis ai4science pdf-extractor-rag pdf-extractor-llm pdf-extractor-pretrain

Updated Nov 14, 2024
Python

py-pdf / pypdf

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

python pdf help-wanted pdf-documents pypdf2 pdf-manipulation pdf-parsing pdf-parser

Updated Nov 13, 2024
Python

dromara / yft-design

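A beautiful and powerful online design tool with poster design and image editing features; an open-source version of 稿定设计 (Gaoding Design) based on fabric.js. It suits many scenarios, such as poster generation, e-commerce product images, long-form article images, and cover editing for videos and WeChat official accounts.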

fabricjs online-editor clipper image-crop pdf-parser pdf-editor poster-design online-design canvas-editor element-plus fabric-editor text2path vue3-fabric psd-parse psd-editor

Updated Nov 10, 2024
TypeScript

adithya-s-k / marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

api rest-api pdf-converter pdf-files marker pdf-parsing pdf-parser fastapi

Updated Oct 15, 2024
Python

titipata / scipdf_parser

Python PDF parser for scientific publications: content and figures

pdf parser pdf-parser python-parser grobid scipdf-parser

Updated Mar 21, 2024
Python

michelcrypt4d4mus / pdfalyzer

Analyze PDFs. With colors. And Yara.

pdf malware-analysis pdf-documents pdf-format pdf-parser malicious-pdf-files

Updated Oct 26, 2024
Python

lazyFrogLOL / llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

nlp ocr chunking document-analysis pdf-parser pdfparser rag llm text-chunking

Updated Aug 6, 2024
Python

ispras / dedoc

Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

html pdf ocr table-of-contents excel html-parser docx documents doc scanned-documents txt document-analysis odt pdf-parser table-recognition docx-parser document-content-extraction logical-structure-extraction

Updated Nov 14, 2024
Python

sypht-team / sypht-python-client

A Python client for the Sypht API

Updated Jul 10, 2024
Python

codereverser / casparser

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

parser python3 cas capital-gain mutual-funds cams pdf-parser capital-gains capital-gains-calculator consolidated-account-statements karvy mutual-fund-portfolio kfintech 112a

Updated Apr 25, 2024
Python

yobix-ai / extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

nlp rust pdf machine-learning natural-language-processing ocr etl tika extraction docx data-pipelines pdf-parser unstructured unstructured-data rag etl-pipelines llm

Updated Nov 14, 2024
Rust

sypht-team / sypht-java-client

A Java client for the Sypht API

Updated Jun 4, 2021
Java

datalogics / adobe-pdf-library-samples

Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library

pdf ocr pdf-converter pdf-document pdf-conversion pdf-generation pdf-to-text pdf-manipulation pdfa pdf-split pdf-merger pdf-parser pdf-to-image pdf-tools pdf-compression pdf-lib pdf-render ocr-pdf pdf-to-office

Updated May 22, 2023

BitMiracle / Docotic.Pdf.Samples

C# and VB.NET samples for Docotic.Pdf library

Updated Oct 22, 2024
Visual Basic .NET

tuffstuff9 / nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

nextjs content-extraction pdf-parsing react-pdf pdf-parser pdf2json filepond pdf-upload pdf-parse nextjs-pdf-parser nextjs-pdf react-pdf-parser nextjs-pdf-parse nextjs-pdf-parsing

Updated Dec 8, 2023
TypeScript

davendw49 / sciparser

PDF parsing toolkit for preparing academic text corpus

pdf-parser large-language-models

Updated Jul 12, 2024
Python

k16shikano / hpdft

Tools to poke at PDFs using Haskell

Updated Jun 17, 2024
Haskell

ashutoshvarma / pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

python pdf cython pdf-converter pdftotext pdf-parser xpdf pdfparser pdftohtml xpdf-reader pdftopng

Updated Dec 15, 2023
Cython

SimpleApp / PDFParser

Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser

swift truetype pdf-parser

Updated Aug 5, 2019
Swift

sypht-team / sypht-golang-client

A Golang client for the Sypht API

Updated Jul 3, 2020
Go
