Handlers Module
Overview
File format handlers for different document types.
- CSV Handler - CSV file format handler
- DOC Handler - DOC file format handler
- DOCX Handler - DOCX file format handler
- HTML Handler - HTML file format handler
- JSON Handler - JSON file format handler
- MD Handler - MD file format handler
- PDF Handler - PDF file format handler
- RTF Handler - RTF file format handler
- TXT Handler - TXT file format handler
- XML Handler - XML file format handler
- ZIP Handler - ZIP file format handler
File type-specific handlers package.
Modules:
Name | Description |
---|---|
csv | CSV file handler for text extraction. |
doc | DOC file handler for text extraction. |
docx | DOCX file handler for text extraction. |
html | HTML file handler for text extraction. |
json | JSON file handler for text extraction. |
md | Markdown (.md) file handler for text extraction. |
pdf | PDF file handler for text extraction. |
rtf | RTF file handler for text extraction. |
txt | TXT file handler for text extraction. |
xml | XML file handler for text extraction. |
zip | ZIP file handler for text extraction. |
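The package provides one handler module per file type. As a rough illustration of how such a layout can be used, the sketch below picks a handler by file extension. The `HANDLERS` registry, the `get_handler` helper, and the stand-in classes are hypothetical names introduced for this example; they are not confirmed to be part of textxtract's API, and the real `FileTypeHandler` interface may differ.

```python
from pathlib import Path


class FileTypeHandler:
    """Stand-in base class; the real FileTypeHandler interface may differ."""

    def extract(self, file_path: Path) -> str:
        raise NotImplementedError


class TXTHandler(FileTypeHandler):
    """Illustrative handler that reads a plain-text file as UTF-8."""

    def extract(self, file_path: Path) -> str:
        return Path(file_path).read_text(encoding="utf-8")


# Hypothetical registry mapping a lowercase extension to a handler class.
HANDLERS: dict[str, type[FileTypeHandler]] = {".txt": TXTHandler}


def get_handler(file_path: Path) -> FileTypeHandler:
    """Return a handler instance for the file's extension or raise ValueError."""
    suffix = Path(file_path).suffix.lower()
    try:
        return HANDLERS[suffix]()
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
```

Keying the registry on the lowercased suffix means `report.TXT` and `report.txt` resolve to the same handler.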
Modules
csv
CSV file handler for text extraction.
Classes:
Name | Description |
---|---|
CSVHandler | Handler for extracting text from CSV files. |
Classes
CSVHandler
Bases: FileTypeHandler
Handler for extracting text from CSV files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/csv.py
Functions
extract
Source code in textxtract/handlers/csv.py
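The source of `CSVHandler.extract` is not reproduced on this page. As a minimal sketch of what text extraction from a CSV file can look like, the function below uses only the standard-library `csv` module. The name `extract_csv_text` and the choice to join cells with spaces are assumptions made for illustration, not the handler's documented behavior.

```python
import csv


def extract_csv_text(file_path, encoding="utf-8"):
    """Return the CSV content as text: one line per row, cells joined by spaces."""
    lines = []
    # newline="" lets the csv module handle line breaks inside quoted cells.
    with open(file_path, newline="", encoding=encoding) as fh:
        for row in csv.reader(fh):
            lines.append(" ".join(row))
    return "\n".join(lines)
```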
doc
DOC file handler for text extraction.
Classes:
Name | Description |
---|---|
DOCHandler | Handler for extracting text from DOC files with fallback options. |
Classes
DOCHandler
Bases: FileTypeHandler
Handler for extracting text from DOC files with fallback options.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/doc.py
Functions
extract
Source code in textxtract/handlers/doc.py
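The description mentions fallback options but does not say which ones `DOCHandler` uses. The sketch below shows the general pattern only: try a list of extraction strategies in order and return the first result that succeeds. `extract_with_fallbacks` and the extractor callables are hypothetical and stand in for whatever tools the real handler tries.

```python
from typing import Callable, Sequence


def extract_with_fallbacks(
    file_path: str, extractors: Sequence[Callable[[str], str]]
) -> str:
    """Try each extractor in order; return the first successful result."""
    errors = []
    for extractor in extractors:
        try:
            return extractor(file_path)
        except Exception as exc:  # broad on purpose: any failure triggers the next fallback
            name = getattr(extractor, "__name__", repr(extractor))
            errors.append(f"{name}: {exc}")
    # Every strategy failed, so report all collected reasons at once.
    raise RuntimeError("All extractors failed: " + "; ".join(errors))
```

Collecting the individual error messages keeps the final exception useful when none of the strategies is available on the host.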
docx
DOCX file handler for text extraction.
Classes:
Name | Description |
---|---|
DOCXHandler | Handler for extracting text from DOCX files. |
Classes
DOCXHandler
Bases: FileTypeHandler
Handler for extracting text from DOCX files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/docx.py
Functions
extract
Source code in textxtract/handlers/docx.py
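Like the other handlers on this page, `DOCXHandler` lists both `extract` and `extract_async`, without describing how the two relate. One common way to pair a blocking method with an async variant is to run the blocking call in a worker thread. The sketch below assumes that approach; `EchoHandler` is a hypothetical class used only for illustration, and the real signatures may take additional arguments.

```python
import asyncio


class EchoHandler:
    """Hypothetical handler used only to illustrate the sync/async pairing."""

    def extract(self, file_path: str) -> str:
        # Blocking read of the whole file.
        with open(file_path, encoding="utf-8") as fh:
            return fh.read()

    async def extract_async(self, file_path: str) -> str:
        # Offload the blocking call so the event loop stays responsive.
        return await asyncio.to_thread(self.extract, file_path)
```

`asyncio.to_thread` requires Python 3.9 or later.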
html
HTML file handler for text extraction.
Classes:
Name | Description |
---|---|
HTMLHandler | Handler for extracting text from HTML files. |
Classes
HTMLHandler
Bases: FileTypeHandler
Handler for extracting text from HTML files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/html.py
Functions
extract
Source code in textxtract/handlers/html.py
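This page does not show which parser `HTMLHandler` relies on. As a minimal sketch of HTML text extraction, the example below uses the standard-library `html.parser` module and skips the contents of `script` and `style` elements. `_TextCollector` and `extract_html_text` are hypothetical names, and the real handler may use a different library.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, ignoring script and style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_html_text(markup: str) -> str:
    """Return the text content of an HTML string, chunks joined by spaces."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.text()
```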
json
JSON file handler for text extraction.
Classes:
Name | Description |
---|---|
JSONHandler | Handler for extracting text from JSON files. |
Classes
JSONHandler
Bases: FileTypeHandler
Handler for extracting text from JSON files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/json.py
Functions
extract
Source code in textxtract/handlers/json.py
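What counts as text in a JSON document is a design choice that this page does not spell out. The sketch below takes one plausible reading: walk the parsed structure and collect every string value, ignoring keys, numbers, booleans, and nulls. `iter_json_strings` and `extract_json_text` are hypothetical names, and `JSONHandler` may behave differently.

```python
import json


def iter_json_strings(node):
    """Yield every string value found in a parsed JSON structure, depth first."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_json_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_json_strings(item)
    # Numbers, booleans, and null carry no text and are skipped.


def extract_json_text(raw: str) -> str:
    """Parse a JSON string and join its string values with newlines."""
    return "\n".join(iter_json_strings(json.loads(raw)))
```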
md
Markdown (.md) file handler for text extraction.
Classes:
Name | Description |
---|---|
MDHandler | Handler for extracting text from Markdown files. |
Classes
MDHandler
Bases: FileTypeHandler
Handler for extracting text from Markdown files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/md.py
Functions
extract
Source code in textxtract/handlers/md.py
pdf
PDF file handler for text extraction.
Classes:
Name | Description |
---|---|
PDFHandler | Handler for extracting text from PDF files with improved error handling. |
Classes
PDFHandler
Bases: FileTypeHandler
Handler for extracting text from PDF files with improved error handling.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/pdf.py
Functions
extract
Source code in textxtract/handlers/pdf.py
rtf
RTF file handler for text extraction.
Classes:
Name | Description |
---|---|
RTFHandler | Handler for extracting text from RTF files. |
Classes
RTFHandler
Bases: FileTypeHandler
Handler for extracting text from RTF files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/rtf.py
Functions
extract
Source code in textxtract/handlers/rtf.py
txt
TXT file handler for text extraction.
Classes:
Name | Description |
---|---|
TXTHandler | Handler for extracting text from TXT files. |
Classes
TXTHandler
Bases: FileTypeHandler
Handler for extracting text from TXT files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/txt.py
Functions
extract
Source code in textxtract/handlers/txt.py
xml
XML file handler for text extraction.
Classes:
Name | Description |
---|---|
XMLHandler | Handler for extracting text from XML files. |
Classes
XMLHandler
Bases: FileTypeHandler
Handler for extracting text from XML files.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Source code in textxtract/handlers/xml.py
Functions
extract
Source code in textxtract/handlers/xml.py
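The parser used by `XMLHandler` is not shown here. As a minimal sketch of XML text extraction, the function below uses the standard-library `xml.etree.ElementTree` module and gathers the text of every element in document order. `extract_xml_text` is a hypothetical name, and the real handler may use another parser or output format.

```python
import xml.etree.ElementTree as ET


def extract_xml_text(raw: str) -> str:
    """Return all element text from an XML string, joined by single spaces."""
    root = ET.fromstring(raw)
    # itertext() walks the tree in document order and yields text and tail strings.
    chunks = (chunk.strip() for chunk in root.itertext())
    return " ".join(chunk for chunk in chunks if chunk)
```

The Python documentation warns that the standard XML modules are not secure against maliciously constructed data, so untrusted input deserves extra care.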
zip
ZIP file handler for text extraction.
Classes:
Name | Description |
---|---|
ZIPHandler | Handler for extracting text from ZIP archives with security checks. |
Attributes:
Name | Type | Description |
---|---|---|
logger | | |
Classes
ZIPHandler
Bases: FileTypeHandler
Handler for extracting text from ZIP archives with security checks.
Methods:
Name | Description |
---|---|
extract | |
extract_async | |
Attributes:
Name | Type | Description |
---|---|---|
MAX_EXTRACT_SIZE | | |
MAX_FILES | | |
Source code in textxtract/handlers/zip.py
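`ZIPHandler` is described as performing security checks and exposes `MAX_EXTRACT_SIZE` and `MAX_FILES`, but neither the checks nor the attribute values are shown on this page. The sketch below illustrates the kind of validation such limits typically support: capping the number of members, capping the total uncompressed size, and rejecting paths that escape the extraction directory. The limit values and the `check_zip_is_safe` helper are assumptions made for illustration.

```python
import zipfile

# Assumed limits for illustration; the real attribute values are not shown on this page.
MAX_EXTRACT_SIZE = 10 * 1024 * 1024  # bytes, summed over all members
MAX_FILES = 100


def check_zip_is_safe(archive: zipfile.ZipFile) -> None:
    """Raise ValueError if the archive breaks a limit or contains an unsafe path."""
    infos = archive.infolist()
    if len(infos) > MAX_FILES:
        raise ValueError(f"Archive has {len(infos)} members, limit is {MAX_FILES}")
    total = 0
    for info in infos:
        name = info.filename
        parts = name.replace("\\", "/").split("/")
        # Reject absolute paths and parent-directory traversal.
        if name.startswith(("/", "\\")) or ".." in parts:
            raise ValueError(f"Unsafe path in archive: {name!r}")
        # file_size is the declared uncompressed size of the member.
        total += info.file_size
        if total > MAX_EXTRACT_SIZE:
            raise ValueError("Archive expands beyond the size limit")
```

Running the checks on the archive's metadata before reading any member keeps an oversized or malicious archive from being unpacked at all. This sketch does not cover every case, for example Windows drive-letter paths.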
Functions
extract