
Analysis of open source tools for converting PDF to Markdown

2024-07-12


Marker: Analysis of open source tools for converting PDF to Markdown
Marker is an open source project developed by VikParuchuri on GitHub. Its core function is to convert PDF files into Markdown format. The following is a detailed analysis of the Marker project:

Project Overview:

Project link: https://github.com/VikParuchuri/marker.git
Maintainer: VikParuchuri
Main function: Quickly and accurately convert PDF to Markdown format, supporting a variety of document types, especially books and scientific papers.

Technical features:

Deep learning models: Marker uses a series of deep learning models to extract text, detect page layout, clean and format text blocks, and finally combine them into Markdown documents.
OCR support: For scenarios that require OCR, Marker supports the use of OCR tools such as Surya and Tesseract to ensure the accuracy of text extraction.
Multi-platform support: Marker can run on GPU, CPU or MPS to meet the needs of different hardware environments.
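
The choice between GPU, MPS, and CPU can be illustrated with a generic PyTorch check. This is a minimal sketch and not Marker's own code; the helper name `pick_device` is hypothetical, and it simply falls back to the CPU when PyTorch is not installed.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' based on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so assume CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple Silicon) only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

On a machine without a supported GPU this prints `cpu`, which matches the `on device cpu` lines in the conversion log later in this article.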

Functional details:

Document processing: removes headers, footers, and other artifacts; formats tables and code blocks; extracts and saves images.
Language support: Marker is designed to work with documents in any language, and users can improve OCR results by specifying a list of languages.
Equation conversion: converts most equations to LaTeX, which makes it easy to embed mathematical formulas in Markdown documents.
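
To give a feel for what removing headers and footers involves, here is a simplified frequency heuristic: a line that recurs on most pages is probably page furniture rather than body text. This is only an illustrative sketch, not Marker's actual implementation; the function name `strip_repeated_lines` and the `min_fraction` threshold are assumptions made for the example.

```python
from collections import Counter


def strip_repeated_lines(pages, min_fraction=0.6):
    """Drop lines that appear on at least min_fraction of the pages."""
    if len(pages) < 2:
        # With a single page there is no repetition to detect.
        return [list(page) for page in pages]
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page if line.strip()})
    threshold = min_fraction * len(pages)
    repeated = {line for line, n in counts.items() if n >= threshold}
    return [[line for line in page if line.strip() not in repeated] for page in pages]


pages = [
    ["Journal of Examples", "Intro text on page one.", "Page footer"],
    ["Journal of Examples", "More body text.", "Page footer"],
    ["Journal of Examples", "Final remarks.", "Page footer"],
]
cleaned = strip_repeated_lines(pages)
print(cleaned)
```

Only the body lines survive in `cleaned`; a real tool also has to handle page numbers and headers that change from page to page.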

Performance:

Speed and accuracy: the project reports that Marker compares favorably in both speed and accuracy with other tools such as Nougat.
Resource usage: on an A6000 Ada, each task uses about 4 GB of VRAM on average, which leaves room to process multiple documents in parallel.
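
The VRAM figure above translates into a rough upper bound on parallel tasks. The sketch below assumes a card with 48 GB of VRAM and ignores any other memory overhead; both the 48 GB value and the helper name `max_parallel_tasks` are assumptions for illustration, not numbers from the project.

```python
def max_parallel_tasks(total_vram_gb, vram_per_task_gb=4.0):
    """Estimate how many conversion tasks fit in VRAM at once."""
    if vram_per_task_gb <= 0:
        raise ValueError("vram_per_task_gb must be positive")
    return int(total_vram_gb // vram_per_task_gb)


# Assumed 48 GB card at about 4 GB per task.
print(max_parallel_tasks(48))  # 12
```

In practice the usable number is likely lower, since the driver and other processes also consume memory.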

User guide:

Installation: install the marker-pdf package with pip:

pip install marker-pdf 

(GraphRAG) PS D:\python-workspace\GraphRAG> pip install marker-pdf
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting marker-pdf
  Downloading https://mirrors.aliyun.com/pypi/packages/05/c1/782f56407ea60bd35c127c829b8e43da99a0da41f6c9ee002cab97e430c5/marker_pdf-0.2.15-py3-none-any.whl (63 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.2/63.2 kB 563.9 kB/s eta 0:00:00
Requirement already satisfied: Pillow<11.0.0,>=10.1.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (10.4.0)
Requirement already satisfied: filetype<2.0.0,>=1.2.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (1.2.0)
Collecting ftfy<7.0.0,>=6.1.1 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/f4/f0/21efef51304172736b823689aaf82f33dbc64f54e9b046b75f5212d5cee7/ftfy-6.2.0-py3-none-any.whl (54 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.4/54.4 kB 353.5 kB/s eta 0:00:00
Requirement already satisfied: grpcio<2.0.0,>=1.63.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (1.64.1)
Requirement already satisfied: numpy<2.0.0,>=1.26.1 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (1.26.4)
Collecting pdftext<0.4.0,>=0.3.10 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/54/78/8dd39d5ed3b90fb7ecaa20f92ff09c4594877a88501de6352d22e8c53aa0/pdftext-0.3.10-py3-none-any.whl (25 kB)
Requirement already satisfied: pydantic<3.0.0,>=2.4.2 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (2.8.0)
Collecting pydantic-settings<3.0.0,>=2.0.3 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/e8/4f/aad03d5f711717d94d7de9684cb542343b392df1ad6889118636674fc983/pydantic_settings-2.3.4-py3-none-any.whl (22 kB)
Requirement already satisfied: python-dotenv<2.0.0,>=1.0.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (1.0.1)
Collecting rapidfuzz<4.0.0,>=3.8.1 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/60/a6/6c2f5e9be933150a6d55ffce4ff6d9701ddfc5b267c789a84674eadbd373/rapidfuzz-3.9.4-cp311-cp311-win_amd64.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 684.6 kB/s eta 0:00:00
Requirement already satisfied: regex<2025.0.0,>=2024.4.28 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (2024.5.15)
Requirement already satisfied: scikit-learn<2.0.0,>=1.3.2 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (1.5.0)
Collecting surya-ocr<0.5.0,>=0.4.14 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/62/a8/dd78c484fa9a459e388a31aa3a45d23eb454c6aeb2a17710284631088615/surya_ocr-0.4.14-py3-none-any.whl (94 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.5/94.5 kB 773.2 kB/s eta 0:00:00
Collecting tabulate<0.10.0,>=0.9.0 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl (35 kB)
Collecting texify<0.2.0,>=0.1.10 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/76/26/c12d194dd90bd78b524a7054e9125685efc32149d29005ca61c72ff4c126/texify-0.1.10-py3-none-any.whl (30 kB)
Collecting torch<3.0.0,>=2.2.2 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/d3/1d/a257913c89572de61316461db91867f87519146e58132cdeace3d9ffbe1f/torch-2.3.1-cp311-cp311-win_amd64.whl (159.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.8/159.8 MB 635.4 kB/s eta 0:00:00
Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from marker-pdf) (4.66.4)
Collecting transformers<5.0.0,>=4.36.2 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/20/5c/244db59e074e80248fdfa60495eeee257e4d97c3df3487df68be30cd60c8/transformers-4.42.3-py3-none-any.whl (9.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.3/9.3 MB 639.8 kB/s eta 0:00:00
Collecting wcwidth<0.3.0,>=0.2.12 (from ftfy<7.0.0,>=6.1.1->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/fd/84/fd2ba7aafacbad3c4201d395674fc6348826569da3c0937e75505ead3528/wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Collecting pypdfium2<5.0.0,>=4.29.0 (from pdftext<0.4.0,>=0.3.10->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/25/bd/56d9ec6b9f0fc4e0d95288759f3179f0fcd34b1a1526b75673d2f6d5196f/pypdfium2-4.30.0-py3-none-win_amd64.whl (2.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 627.3 kB/s eta 0:00:00
Requirement already satisfied: annotated-types>=0.4.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from pydantic<3.0.0,>=2.4.2->marker-pdf) (0.7.0)
Requirement already satisfied: pydantic-core==2.20.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from pydantic<3.0.0,>=2.4.2->marker-pdf) (2.20.0)
Requirement already satisfied: typing-extensions>=4.6.1 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from pydantic<3.0.0,>=2.4.2->marker-pdf) (4.12.2)
Requirement already satisfied: scipy>=1.6.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from scikit-learn<2.0.0,>=1.3.2->marker-pdf) (1.12.0)
Requirement already satisfied: joblib>=1.2.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from scikit-learn<2.0.0,>=1.3.2->marker-pdf) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from scikit-learn<2.0.0,>=1.3.2->marker-pdf) (3.5.0)
Collecting opencv-python<5.0.0.0,>=4.9.0.80 (from surya-ocr<0.5.0,>=0.4.14->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/ec/6c/fab8113424af5049f85717e8e527ca3773299a3c6b02506e66436e19874f/opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.8/38.8 MB 546.5 kB/s eta 0:00:00
Collecting filelock (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/ae/f0/48285f0262fe47103a4a45972ed2f9b93e4c80b8fd609fa98da78b2a5706/filelock-3.15.4-py3-none-any.whl (16 kB)
Collecting sympy (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/61/53/e18c8c97d0b2724d85c9830477e3ebea3acf1dcdc6deb344d5d9c93a9946/sympy-1.12.1-py3-none-any.whl (5.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 624.0 kB/s eta 0:00:00
Requirement already satisfied: networkx in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from torch<3.0.0,>=2.2.2->marker-pdf) (3.3)
Collecting jinja2 (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/31/80/3a54838c3fb461f6fec263ebf3a3a41771bd05190238de3486aae8540c36/jinja2-3.1.4-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 492.1 kB/s eta 0:00:00
Requirement already satisfied: fsspec in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from torch<3.0.0,>=2.2.2->marker-pdf) (2024.6.1)
Collecting mkl<=2021.4.0,>=2021.1.1 (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/fe/1c/5f6dbf18e8b73e0a5472466f0ea8d48ce9efae39bd2ff38cebf8dce61259/mkl-2021.4.0-py2.py3-none-win_amd64.whl (228.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 228.5/228.5 MB 597.0 kB/s eta 0:00:00
Requirement already satisfied: colorama in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from tqdm<5.0.0,>=4.66.1->marker-pdf) (0.4.6)
Collecting huggingface-hub<1.0,>=0.23.2 (from transformers<5.0.0,>=4.36.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/69/d6/73f9d1b7c4da5f0544bc17680d0fa9932445423b90cd38e1ee77d001a4f5/huggingface_hub-0.23.4-py3-none-any.whl (402 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 402.6/402.6 kB 598.2 kB/s eta 0:00:00
Requirement already satisfied: packaging>=20.0 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from transformers<5.0.0,>=4.36.2->marker-pdf) (23.2)
Requirement already satisfied: pyyaml>=5.1 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from transformers<5.0.0,>=4.36.2->marker-pdf) (6.0.1)
Requirement already satisfied: requests in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from transformers<5.0.0,>=4.36.2->marker-pdf) (2.32.3)
Collecting safetensors>=0.4.1 (from transformers<5.0.0,>=4.36.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/cb/f6/19f268662be898ff2a23ac06f8dd0d2956b2ecd204c96e1ee07ba292c119/safetensors-0.4.3-cp311-none-win_amd64.whl (287 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 287.3/287.3 kB 571.3 kB/s eta 0:00:00
Collecting tokenizers<0.20,>=0.19 (from transformers<5.0.0,>=4.36.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/65/8e/6d7d72b28f22c422cff8beae10ac3c2e4376b9be721ef8167b7eecd1da62/tokenizers-0.19.1-cp311-none-win_amd64.whl (2.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 625.9 kB/s eta 0:00:00
Collecting intel-openmp==2021.* (from mkl<=2021.4.0,>=2021.1.1->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/6f/21/b590c0cc3888b24f2ac9898c41d852d7454a1695fbad34bee85dba6dc408/intel_openmp-2021.4.0-py2.py3-none-win_amd64.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 487.9 kB/s eta 0:00:00
Collecting tbb==2021.* (from mkl<=2021.4.0,>=2021.1.1->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/f1/24/500811330b3b070e5995c3275181dbcd00c06cef26c6ebfe6ee1ca9b6223/tbb-2021.13.0-py3-none-win_amd64.whl (286 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 286.9/286.9 kB 505.8 kB/s eta 0:00:00
Collecting MarkupSafe>=2.0 (from jinja2->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/b7/a2/c78a06a9ec6d04b3445a949615c4c7ed86a0b2eb68e44e7541b9d57067cc/MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in e:\programdata\miniconda3\envs\graphrag\lib\site-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (2024.6.2)
Collecting mpmath<1.4.0,>=1.1.0 (from sympy->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 495.3 kB/s eta 0:00:00
Installing collected packages: wcwidth, tbb, mpmath, intel-openmp, tabulate, sympy, safetensors, rapidfuzz, pypdfium2, opencv-python, mkl, MarkupSafe, ftfy, filelock, jinja2, huggingface-hub, torch, tokenizers, pydantic-settings, transformers, pdftext, texify, surya-ocr, marker-pdf
Successfully installed MarkupSafe-2.1.5 filelock-3.15.4 ftfy-6.2.0 huggingface-hub-0.23.4 intel-openmp-2021.4.0 jinja2-3.1.4 marker-pdf-0.2.15 mkl-2021.4.0 mpmath-1.3.0 opencv-python-4.10.0.84 pdftext-0.3.10 pydantic-settings-2.3.4 pypdfium2-4.30.0 rapidfuzz-3.9.4 safetensors-0.4.3 surya-ocr-0.4.14 sympy-1.12.1 tabulate-0.9.0 tbb-2021.13.0 texify-0.1.10 tokenizers-0.19.1 torch-2.3.1 transformers-4.42.3 wcwidth-0.2.13
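
After an install like the one logged above, the installed version can be confirmed from Python using the standard library. This is a small sketch; the helper name `installed_version` is hypothetical, and it returns `None` rather than raising when the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("marker-pdf")
print(f"marker-pdf: {found or 'not installed'}")
```

In the environment from the log this would be expected to report 0.2.15, the version pip says it installed.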


Usage example:

```bash
(GraphRAG) PS D:\python-workspace\GraphRAG> marker_single GPT.pdf ./folder --batch_multiplier 2 --max_pages 52 --langs English
config.json: 100%|█████████████████████████████████████████████████████████████████████| 1.18k/1.18k [00:00<?, ?B/s] 
model.safetensors: 100%|█████████████████████████████████████████████████████████| 120M/120M [00:07<00:00, 16.7MB/s] 
Loaded detection model vikp/surya_det2 on device cpu with dtype torch.float32
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 430/430 [00:00<?, ?B/s] 
config.json: 100%|█████████████████████████████████████████████████████████████████████| 1.57k/1.57k [00:00<?, ?B/s] 
model.safetensors: 100%|█████████████████████████████████████████████████████████| 120M/120M [00:06<00:00, 18.0MB/s] 
Loaded detection model vikp/surya_layout2 on device cpu with dtype torch.float32
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 430/430 [00:00<?, ?B/s] 
config.json: 100%|█████████████████████████████████████████████████████████████████████| 5.04k/5.04k [00:00<?, ?B/s] 
model.safetensors: 100%|█████████████████████████████████████████████████████████| 550M/550M [00:34<00:00, 16.2MB/s] 
generation_config.json: 100%|██████████████████████████████████████████████████████████████| 160/160 [00:00<?, ?B/s] 
Loaded reading order model vikp/surya_order on device cpu with dtype torch.float32
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 684/684 [00:00<?, ?B/s] 
config.json: 100%|█████████████████████████████████████████████████████████████| 6.91k/6.91k [00:00<00:00, 6.82MB/s] 
model.safetensors: 100%|███████████████████████████████████████████████████████| 1.05G/1.05G [01:04<00:00, 16.2MB/s] 
generation_config.json: 100%|██████████████████████████████████████████████████████████████| 181/181 [00:00<?, ?B/s]
Loaded recognition model vikp/surya_rec on device cpu with dtype torch.float32
preprocessor_config.json: 100%|█████████████████████████████████████████████████████| 608/608 [00:00<00:00, 605kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████| 4.92k/4.92k [00:00<?, ?B/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████| 625M/625M [00:38<00:00, 16.4MB/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████| 191/191 [00:00<?, ?B/s]
Loaded texify model to cpu with torch.float32 dtype
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 617/617 [00:00<?, ?B/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████| 4.49k/4.49k [00:00<?, ?B/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████| 2.14M/2.14M [00:00<00:00, 2.85MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████| 18.3k/18.3k [00:00<?, ?B/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████| 552/552 [00:00<00:00, 6.29MB/s] 
Detecting bboxes: 100%|███████████████████████████████████████████████████████████████| 7/7 [05:49<00:00, 49.99s/it] 
Recognizing Text: 100%|███████████████████████████████████████████████████████████████| 1/1 [00:11<00:00, 11.37s/it] 
Detecting bboxes: 100%|███████████████████████████████████████████████████████████████| 5/5 [05:32<00:00, 66.45s/it] 
Finding reading order: 100%|██████████████████████████████████████████████████████████| 5/5 [03:15<00:00, 39.04s/it] 
Saved markdown to the ./folder\GPT folder
```


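Configuration: users can adjust Marker's behavior through environment variables or a configuration file, for example to set the OCR engine, choose the GPU device, or configure memory usage.
Command-line tools: Marker provides command-line tools that convert a single PDF or multiple PDFs in batch.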
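
The command-line and environment-variable options can be combined when driving Marker from a script. The sketch below rebuilds the `marker_single` call from the usage example, using only the flags that appear there. The environment variable names `TORCH_DEVICE` and `OCR_ENGINE` and the comma-separated form of `--langs` are assumptions based on the project's README for the 0.2.x releases and should be checked against the installed version; both helper functions are hypothetical.

```python
import os
import shlex


def build_marker_single_command(pdf_path, out_dir, batch_multiplier=None,
                                max_pages=None, langs=None):
    """Build the argument list for marker_single from the flags shown in this article."""
    cmd = ["marker_single", pdf_path, out_dir]
    if batch_multiplier is not None:
        cmd += ["--batch_multiplier", str(batch_multiplier)]
    if max_pages is not None:
        cmd += ["--max_pages", str(max_pages)]
    if langs:
        # Assumption: multiple languages are passed as one comma-separated value.
        cmd += ["--langs", ",".join(langs)]
    return cmd


def build_env(torch_device=None, ocr_engine=None):
    """Copy the current environment and add optional Marker settings."""
    env = dict(os.environ)
    if torch_device:
        env["TORCH_DEVICE"] = torch_device  # assumed setting name
    if ocr_engine:
        env["OCR_ENGINE"] = ocr_engine  # assumed setting name
    return env


cmd = build_marker_single_command(
    "GPT.pdf", "./folder", batch_multiplier=2, max_pages=52, langs=["English"]
)
env = build_env(torch_device="cpu")
print(shlex.join(cmd))
# To run the conversion: subprocess.run(cmd, env=env, check=True)
```

The printed command matches the one used in the usage example; the `subprocess.run` call is left as a comment so the sketch works without Marker installed.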




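Commercial use and licensing:

Commercial restrictions: Marker is free for research and personal use, but commercial use is subject to certain restrictions. The model weights are licensed under cc-by-nc-sa-4.0, although the author waives that license for qualifying small organizations.
Dual licensing: commercial users who need to remove the GPL license requirements or who exceed the revenue limit can obtain a dual license.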


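Community and support:

Discord community: users can discuss Marker's future development and other related topics on Discord.
Documentation and examples: the GitHub repository provides detailed documentation and examples to help users get started quickly.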



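Summary:
Marker is a powerful, easy-to-use tool for converting PDF to Markdown. By combining deep learning models with OCR, it converts documents efficiently and accurately. It supports a variety of document types and languages, and offers rich configuration options and command-line tools to meet different users' needs. Marker also has solid community support and documentation, which makes for a good user experience.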