
An Analysis of Open-Source Tools for Converting PDF to Markdown

2024-07-12


Marker: An Analysis of an Open-Source Tool for Converting PDF to Markdown
Marker is an open-source project developed by Vik Paruchuri on GitHub. Its core function is converting PDF files to Markdown. A detailed analysis of the Marker project follows.

Project overview:

Project link: https://github.com/VikParuchuri/marker.git
Maintainer: VikParuchuri
Main features: converts PDF to Markdown quickly and accurately, and supports many document types, especially books and scientific papers.

Technical features:

Deep learning models: Marker uses a series of deep learning models to extract text, detect page layout, clean and format text blocks, and finally combine them into a Markdown document.
OCR support: where OCR is needed, Marker can use OCR tools such as Surya and Tesseract to keep text extraction accurate.
Multi-platform support: Marker runs on GPU, CPU, or MPS, to suit different hardware environments.
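
The GPU, CPU, or MPS choice can be illustrated with a small device-selection sketch. This is not Marker's own code: `pick_device` and `detect_device` are hypothetical helper names, and the preference order (CUDA, then MPS, then CPU) is an assumption. The only library calls used are PyTorch's `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the sketch falls back to CPU when PyTorch is not installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older torch builds may not ship the MPS backend module at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(f"Selected device: {detect_device()}")
```

In the usage log later in this article, the models load "on device cpu", which is what this sketch would report on a machine without a usable GPU.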

Feature details:

Document processing: removes headers, footers, and other artifacts; formats tables and code blocks; extracts and saves images.
Language support: Marker supports all languages, and users can specify a list of languages to improve OCR results.
Equation conversion: converts most equations to LaTeX, which makes it easy to embed mathematical formulas in Markdown documents.
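
To make the header and footer removal step more concrete, here is a minimal sketch of one common heuristic: drop any line that repeats on most pages. This is an illustration only, not how Marker does it (Marker relies on layout models). The function name `strip_repeated_lines` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def strip_repeated_lines(pages, min_fraction=0.6):
    """Remove lines that recur on at least `min_fraction` of the pages.

    `pages` is a list of pages, each page a list of text lines.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page if line.strip()})

    repeated = {
        line
        for line, n in counts.items()
        if n >= 2 and n / len(pages) >= min_fraction
    }
    return [[line for line in page if line.strip() not in repeated] for page in pages]


if __name__ == "__main__":
    pages = [
        ["My Book", "Chapter one text", "Example Press"],
        ["My Book", "More text", "Example Press"],
        ["My Book", "Final text", "Example Press"],
    ]
    print(strip_repeated_lines(pages))
```

A heuristic like this misses footers that change from page to page, such as page numbers, which is one reason a layout-aware model is the more robust approach.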

Performance:

Speed and accuracy: Marker performs well on both speed and accuracy, with a significant advantage over other tools such as Nougat.
Resource usage: on an A6000 Ada, each task uses about 4GB of VRAM on average, which allows several documents to be processed in parallel.
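
The 4GB-per-task figure gives a rough way to estimate how many documents can be converted in parallel on one card. The sketch below is back-of-the-envelope arithmetic only. The helper name `max_parallel_tasks`, the `reserve_gb` headroom parameter, and the 48 GB card size in the example are assumptions, not figures from the project.

```python
import math


def max_parallel_tasks(total_vram_gb: float,
                       vram_per_task_gb: float = 4.0,
                       reserve_gb: float = 0.0) -> int:
    """Estimate how many conversion tasks fit in GPU memory at once."""
    if vram_per_task_gb <= 0:
        raise ValueError("vram_per_task_gb must be positive")
    usable = total_vram_gb - reserve_gb
    return max(0, math.floor(usable / vram_per_task_gb))


if __name__ == "__main__":
    # Hypothetical 48 GB card, about 4 GB per task as quoted above.
    print(max_parallel_tasks(48))                # 12
    print(max_parallel_tasks(48, reserve_gb=2))  # 11
```

Since 4GB is an average, real peak usage per task can be higher, so leaving some headroom is the safer choice.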

User guide:

Installation: install the marker-pdf package with pip.

pip install marker-pdf 

(GraphRAG) PS D:python-workspaceGraphRAG> pip install marker-pdf 
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting marker-pdf
  Downloading https://mirrors.aliyun.com/pypi/packages/05/c1/782f56407ea60bd35c127c829b8e43da99a0da41f6c9ee002cab97e430c5/marker_pdf-0.2.15-py3-none-any.whl (63 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.2/63.2 kB 563.9 kB/s eta 0:00:00
Requirement already satisfied: Pillow<11.0.0,>=10.1.0 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (10.4.0)
Requirement already satisfied: filetype<2.0.0,>=1.2.0 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (1.2.0)
Collecting ftfy<7.0.0,>=6.1.1 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/f4/f0/21efef51304172736b823689aaf82f33dbc64f54e9b046b75f5212d5cee7/ftfy-6.2.0-py3-none-any.whl (54 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.4/54.4 kB 353.5 kB/s eta 0:00:00
Requirement already satisfied: grpcio<2.0.0,>=1.63.0 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (1.64.1)
Requirement already satisfied: numpy<2.0.0,>=1.26.1 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (1.26.4)
Collecting pdftext<0.4.0,>=0.3.10 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/54/78/8dd39d5ed3b90fb7ecaa20f92ff09c4594877a88501de6352d22e8c53aa0/pdftext-0.3.10-py3-none-any.whl (25 kB)
Requirement already satisfied: pydantic<3.0.0,>=2.4.2 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (2.8.0)
Collecting pydantic-settings<3.0.0,>=2.0.3 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/e8/4f/aad03d5f711717d94d7de9684cb542343b392df1ad6889118636674fc983/pydantic_settings-2.3.4-py3-none-any.whl (22 kB)
Requirement already satisfied: python-dotenv<2.0.0,>=1.0.0 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (1.0.1)
Collecting rapidfuzz<4.0.0,>=3.8.1 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/60/a6/6c2f5e9be933150a6d55ffce4ff6d9701ddfc5b267c789a84674eadbd373/rapidfuzz-3.9.4-cp311-cp311-win_amd64.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 684.6 kB/s eta 0:00:00
Requirement already satisfied: regex<2025.0.0,>=2024.4.28 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (2024.5.15)
Requirement already satisfied: scikit-learn<2.0.0,>=1.3.2 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (1.5.0)
Collecting surya-ocr<0.5.0,>=0.4.14 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/62/a8/dd78c484fa9a459e388a31aa3a45d23eb454c6aeb2a17710284631088615/surya_ocr-0.4.14-py3-none-any.whl (94 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.5/94.5 kB 773.2 kB/s eta 0:00:00
Collecting tabulate<0.10.0,>=0.9.0 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl (35 kB)
Collecting texify<0.2.0,>=0.1.10 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/76/26/c12d194dd90bd78b524a7054e9125685efc32149d29005ca61c72ff4c126/texify-0.1.10-py3-none-any.whl (30 kB)
Collecting torch<3.0.0,>=2.2.2 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/d3/1d/a257913c89572de61316461db91867f87519146e58132cdeace3d9ffbe1f/torch-2.3.1-cp311-cp311-win_amd64.whl (159.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.8/159.8 MB 635.4 kB/s eta 0:00:00
Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in e:programdataminiconda3envsgraphraglibsite-packages (from marker-pdf) (4.66.4)
Collecting transformers<5.0.0,>=4.36.2 (from marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/20/5c/244db59e074e80248fdfa60495eeee257e4d97c3df3487df68be30cd60c8/transformers-4.42.3-py3-none-any.whl (9.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.3/9.3 MB 639.8 kB/s eta 0:00:00
Collecting wcwidth<0.3.0,>=0.2.12 (from ftfy<7.0.0,>=6.1.1->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/fd/84/fd2ba7aafacbad3c4201d395674fc6348826569da3c0937e75505ead3528/wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Collecting pypdfium2<5.0.0,>=4.29.0 (from pdftext<0.4.0,>=0.3.10->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/25/bd/56d9ec6b9f0fc4e0d95288759f3179f0fcd34b1a1526b75673d2f6d5196f/pypdfium2-4.30.0-py3-none-win_amd64.whl (2.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 627.3 kB/s eta 0:00:00
Requirement already satisfied: annotated-types>=0.4.0 in e:programdataminiconda3envsgraphraglibsite-packages (from pydantic<3.0.0,>=2.4.2->marker-pdf) (0.7.0)
Requirement already satisfied: pydantic-core==2.20.0 in e:programdataminiconda3envsgraphraglibsite-packages (from pydantic<3.0.0,>=2.4.2->marker-pdf) (2.20.0)
Requirement already satisfied: typing-extensions>=4.6.1 in e:programdataminiconda3envsgraphraglibsite-packages (from pydantic<3.0.0,>=2.4.2->marker-pdf) (4.12.2)
Requirement already satisfied: scipy>=1.6.0 in e:programdataminiconda3envsgraphraglibsite-packages (from scikit-learn<2.0.0,>=1.3.2->marker-pdf) (1.12.0)
Requirement already satisfied: joblib>=1.2.0 in e:programdataminiconda3envsgraphraglibsite-packages (from scikit-learn<2.0.0,>=1.3.2->marker-pdf) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in e:programdataminiconda3envsgraphraglibsite-packages (from scikit-learn<2.0.0,>=1.3.2->marker-pdf) (3.5.0)
Collecting opencv-python<5.0.0.0,>=4.9.0.80 (from surya-ocr<0.5.0,>=0.4.14->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/ec/6c/fab8113424af5049f85717e8e527ca3773299a3c6b02506e66436e19874f/opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.8/38.8 MB 546.5 kB/s eta 0:00:00
Collecting filelock (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/ae/f0/48285f0262fe47103a4a45972ed2f9b93e4c80b8fd609fa98da78b2a5706/filelock-3.15.4-py3-none-any.whl (16 kB)
Collecting sympy (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/61/53/e18c8c97d0b2724d85c9830477e3ebea3acf1dcdc6deb344d5d9c93a9946/sympy-1.12.1-py3-none-any.whl (5.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 624.0 kB/s eta 0:00:00
Requirement already satisfied: networkx in e:programdataminiconda3envsgraphraglibsite-packages (from torch<3.0.0,>=2.2.2->marker-pdf) (3.3)
Collecting jinja2 (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/31/80/3a54838c3fb461f6fec263ebf3a3a41771bd05190238de3486aae8540c36/jinja2-3.1.4-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 492.1 kB/s eta 0:00:00
Requirement already satisfied: fsspec in e:programdataminiconda3envsgraphraglibsite-packages (from torch<3.0.0,>=2.2.2->marker-pdf) (2024.6.1)
Collecting mkl<=2021.4.0,>=2021.1.1 (from torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/fe/1c/5f6dbf18e8b73e0a5472466f0ea8d48ce9efae39bd2ff38cebf8dce61259/mkl-2021.4.0-py2.py3-none-win_amd64.whl (228.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 228.5/228.5 MB 597.0 kB/s eta 0:00:00
Requirement already satisfied: colorama in e:programdataminiconda3envsgraphraglibsite-packages (from tqdm<5.0.0,>=4.66.1->marker-pdf) (0.4.6)
Collecting huggingface-hub<1.0,>=0.23.2 (from transformers<5.0.0,>=4.36.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/69/d6/73f9d1b7c4da5f0544bc17680d0fa9932445423b90cd38e1ee77d001a4f5/huggingface_hub-0.23.4-py3-none-any.whl (402 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 402.6/402.6 kB 598.2 kB/s eta 0:00:00
Requirement already satisfied: packaging>=20.0 in e:programdataminiconda3envsgraphraglibsite-packages (from transformers<5.0.0,>=4.36.2->marker-pdf) (23.2)
Requirement already satisfied: pyyaml>=5.1 in e:programdataminiconda3envsgraphraglibsite-packages (from transformers<5.0.0,>=4.36.2->marker-pdf) (6.0.1)
Requirement already satisfied: requests in e:programdataminiconda3envsgraphraglibsite-packages (from transformers<5.0.0,>=4.36.2->marker-pdf) (2.32.3)
Collecting safetensors>=0.4.1 (from transformers<5.0.0,>=4.36.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/cb/f6/19f268662be898ff2a23ac06f8dd0d2956b2ecd204c96e1ee07ba292c119/safetensors-0.4.3-cp311-none-win_amd64.whl (287 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 287.3/287.3 kB 571.3 kB/s eta 0:00:00
Collecting tokenizers<0.20,>=0.19 (from transformers<5.0.0,>=4.36.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/65/8e/6d7d72b28f22c422cff8beae10ac3c2e4376b9be721ef8167b7eecd1da62/tokenizers-0.19.1-cp311-none-win_amd64.whl (2.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 625.9 kB/s eta 0:00:00
Collecting intel-openmp==2021.* (from mkl<=2021.4.0,>=2021.1.1->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/6f/21/b590c0cc3888b24f2ac9898c41d852d7454a1695fbad34bee85dba6dc408/intel_openmp-2021.4.0-py2.py3-none-win_amd64.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 487.9 kB/s eta 0:00:00
Collecting tbb==2021.* (from mkl<=2021.4.0,>=2021.1.1->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/f1/24/500811330b3b070e5995c3275181dbcd00c06cef26c6ebfe6ee1ca9b6223/tbb-2021.13.0-py3-none-win_amd64.whl (286 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 286.9/286.9 kB 505.8 kB/s eta 0:00:00
Collecting MarkupSafe>=2.0 (from jinja2->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/b7/a2/c78a06a9ec6d04b3445a949615c4c7ed86a0b2eb68e44e7541b9d57067cc/MarkupSafe-2.1.5-cp311-cp311-win_amd64.whl (17 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in e:programdataminiconda3envsgraphraglibsite-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in e:programdataminiconda3envsgraphraglibsite-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in e:programdataminiconda3envsgraphraglibsite-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in e:programdataminiconda3envsgraphraglibsite-packages (from requests->transformers<5.0.0,>=4.36.2->marker-pdf) (2024.6.2)
Collecting mpmath<1.4.0,>=1.1.0 (from sympy->torch<3.0.0,>=2.2.2->marker-pdf)
  Downloading https://mirrors.aliyun.com/pypi/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 495.3 kB/s eta 0:00:00
Installing collected packages: wcwidth, tbb, mpmath, intel-openmp, tabulate, sympy, safetensors, rapidfuzz, pypdfium2, opencv-python, mkl, MarkupSafe, ftfy, filelock, jinja2, huggingface-hub, torch, tokenizers, pydantic-settings, transformers, pdftext, texify, surya-ocr, marker-pdf
Successfully installed MarkupSafe-2.1.5 filelock-3.15.4 ftfy-6.2.0 huggingface-hub-0.23.4 intel-openmp-2021.4.0 jinja2-3.1.4 marker-pdf-0.2.15 mkl-2021.4.0 mpmath-1.3.0 opencv-python-4.10.0.84 pdftext-0.3.10 pydantic-settings-2.3.4 pypdfium2-4.30.0 rapidfuzz-3.9.4 safetensors-0.4.3 surya-ocr-0.4.14 sympy-1.12.1 tabulate-0.9.0 tbb-2021.13.0 texify-0.1.10 tokenizers-0.19.1 torch-2.3.1 transformers-4.42.3 wcwidth-0.2.13
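
After an install like the one logged above, one way to confirm which versions ended up in the environment is to query the package metadata from Python. This is a small sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and the package names are taken from the pip log.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names as they appear in the pip output above.
    for name in ("marker-pdf", "surya-ocr", "texify", "torch"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```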


Usage example:

```bash
(GraphRAG) PS D:python-workspaceGraphRAG> marker_single GPT.pdf ./folder --batch_multiplier 2 --max_pages 52 --langs English
config.json: 100%|█████████████████████████████████████████████████████████████████████| 1.18k/1.18k [00:00<?, ?B/s] 
model.safetensors: 100%|█████████████████████████████████████████████████████████| 120M/120M [00:07<00:00, 16.7MB/s] 
Loaded detection model vikp/surya_det2 on device cpu with dtype torch.float32
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 430/430 [00:00<?, ?B/s] 
config.json: 100%|█████████████████████████████████████████████████████████████████████| 1.57k/1.57k [00:00<?, ?B/s] 
model.safetensors: 100%|█████████████████████████████████████████████████████████| 120M/120M [00:06<00:00, 18.0MB/s] 
Loaded detection model vikp/surya_layout2 on device cpu with dtype torch.float32
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 430/430 [00:00<?, ?B/s] 
config.json: 100%|█████████████████████████████████████████████████████████████████████| 5.04k/5.04k [00:00<?, ?B/s] 
model.safetensors: 100%|█████████████████████████████████████████████████████████| 550M/550M [00:34<00:00, 16.2MB/s] 
generation_config.json: 100%|██████████████████████████████████████████████████████████████| 160/160 [00:00<?, ?B/s] 
Loaded reading order model vikp/surya_order on device cpu with dtype torch.float32
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 684/684 [00:00<?, ?B/s] 
config.json: 100%|█████████████████████████████████████████████████████████████| 6.91k/6.91k [00:00<00:00, 6.82MB/s] 
model.safetensors: 100%|███████████████████████████████████████████████████████| 1.05G/1.05G [01:04<00:00, 16.2MB/s] 
generation_config.json: 100%|██████████████████████████████████████████████████████████████| 181/181 [00:00<?, ?B/s]
Loaded recognition model vikp/surya_rec on device cpu with dtype torch.float32
preprocessor_config.json: 100%|█████████████████████████████████████████████████████| 608/608 [00:00<00:00, 605kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████| 4.92k/4.92k [00:00<?, ?B/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████| 625M/625M [00:38<00:00, 16.4MB/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████| 191/191 [00:00<?, ?B/s]
Loaded texify model to cpu with torch.float32 dtype
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████| 617/617 [00:00<?, ?B/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████| 4.49k/4.49k [00:00<?, ?B/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████| 2.14M/2.14M [00:00<00:00, 2.85MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████| 18.3k/18.3k [00:00<?, ?B/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████| 552/552 [00:00<00:00, 6.29MB/s] 
Detecting bboxes: 100%|███████████████████████████████████████████████████████████████| 7/7 [05:49<00:00, 49.99s/it] 
Recognizing Text: 100%|███████████████████████████████████████████████████████████████| 1/1 [00:11<00:00, 11.37s/it] 
Detecting bboxes: 100%|███████████████████████████████████████████████████████████████| 5/5 [05:32<00:00, 66.45s/it] 
Finding reading order: 100%|██████████████████████████████████████████████████████████| 5/5 [03:15<00:00, 39.04s/it] 
Saved markdown to the ./folderGPT folder
```


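Configuration: users can adjust Marker's behavior through environment variables or a configuration file, for example to set the OCR engine, choose the GPU device, or configure memory usage.
Command-line tools: Marker provides command-line tools for converting a single PDF or many PDFs in batch.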
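
The two points above can be combined in a small wrapper that assembles a `marker_single` command and an environment for it. This is a sketch under stated assumptions rather than a documented API. The flags `--batch_multiplier`, `--max_pages`, and `--langs` are the ones used in the usage example in this article. Joining several languages with commas is an assumption. The environment variable names `TORCH_DEVICE` and `OCR_ENGINE` are also assumptions and should be checked against the project's settings before relying on them. The helper names `build_marker_single_cmd` and `build_env` are hypothetical.

```python
import os
import shlex


def build_marker_single_cmd(pdf, out_dir, batch_multiplier=None,
                            max_pages=None, langs=None):
    """Assemble the argv list for a single-file conversion."""
    cmd = ["marker_single", str(pdf), str(out_dir)]
    if batch_multiplier is not None:
        cmd += ["--batch_multiplier", str(batch_multiplier)]
    if max_pages is not None:
        cmd += ["--max_pages", str(max_pages)]
    if langs:
        # Assumption: multiple languages are passed as a comma-separated list.
        cmd += ["--langs", ",".join(langs)]
    return cmd


def build_env(torch_device=None, ocr_engine=None, base=None):
    """Copy the environment and add overrides (variable names are assumed)."""
    env = dict(os.environ if base is None else base)
    if torch_device is not None:
        env["TORCH_DEVICE"] = torch_device  # assumed setting name
    if ocr_engine is not None:
        env["OCR_ENGINE"] = ocr_engine      # assumed setting name
    return env


if __name__ == "__main__":
    cmd = build_marker_single_cmd("GPT.pdf", "./folder", batch_multiplier=2,
                                  max_pages=52, langs=["English"])
    print(shlex.join(cmd))
    # To actually run the conversion:
    # import subprocess
    # subprocess.run(cmd, env=build_env(torch_device="cpu"), check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces, and looping this helper over a directory is a simple way to convert several PDFs one after another.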




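Commercial use and licensing:

Commercial restrictions: research and personal use are free, but commercial use is subject to some restrictions. The model weights are licensed under cc-by-nc-sa-4.0, but the author waives that license for qualifying small organizations.
Dual licensing: a dual-license option is available for commercial users who need to remove the GPL license requirements or who exceed the revenue limit.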


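Community and support:

Discord community: users can discuss Marker's future development and related questions on Discord.
Documentation and examples: the GitHub repository provides detailed documentation and examples to help users get started quickly.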



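Summary:
Marker is a capable, easy-to-use tool for converting PDF to Markdown. By combining deep learning models with OCR, it converts documents efficiently and accurately. It supports many document types and languages, and it offers a range of configuration options and command-line tools to suit different users. The project also has an active community and thorough documentation, which make it easy to get started.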