Core Module
Overview
Core components of the text extraction framework.
- Base Classes - Abstract base classes
- Configuration - Configuration management
- Exceptions - Custom exceptions
- Registry - Handler registry
- Utils - Utility functions
Core components for the textxtract package.
Modules:
Name | Description |
---|---|
base | Abstract base classes for text extraction. |
config | Configuration and customization for the textxtract package. |
exceptions | Custom exceptions for the textxtract package. |
logging_config | Logging configuration for the textxtract package. |
registry | Handler registry for centralized handler management. |
utils | Utility functions for the textxtract package. |
Modules
base
Abstract base classes for text extraction.
Classes:
Name | Description |
---|---|
FileTypeHandler | Abstract base class for file type-specific handlers. |
TextExtractor | Abstract base class for text extractors. |
Classes
FileTypeHandler
Bases: ABC
Abstract base class for file type-specific handlers.
Methods:
Name | Description |
---|---|
extract | Extract text synchronously from a file. |
extract_async | Extract text asynchronously from a file. |
Source code in textxtract/core/base.py
Functions
extract
abstractmethod
extract_async
abstractmethod
async
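The two abstract methods above can be illustrated with a small stand-in. This is a sketch only: `SketchFileTypeHandler` and `PlainTextHandler` are hypothetical names, and the `file_path` and `config` parameters are assumptions, since this page does not show the real signatures.

```python
import asyncio
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional


class SketchFileTypeHandler(ABC):
    """Stand-in for the documented interface; parameter names are assumed."""

    @abstractmethod
    def extract(self, file_path: Path, config: Optional[dict] = None) -> str:
        """Extract text synchronously from a file."""

    @abstractmethod
    async def extract_async(self, file_path: Path, config: Optional[dict] = None) -> str:
        """Extract text asynchronously from a file."""


class PlainTextHandler(SketchFileTypeHandler):
    """Hypothetical handler that returns the file content unchanged."""

    def extract(self, file_path, config=None):
        encoding = (config or {}).get("encoding", "utf-8")
        return Path(file_path).read_text(encoding=encoding)

    async def extract_async(self, file_path, config=None):
        # Run the blocking read in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.extract, file_path, config)
```

Because both methods are abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement both can be created.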
TextExtractor
Bases: ABC
Abstract base class for text extractors.
Methods:
Name | Description |
---|---|
extract | Extract text synchronously from file path or bytes. |
Source code in textxtract/core/base.py
Functions
extract
abstractmethod
Extract text synchronously from file path or bytes.
config
Configuration and customization for the textxtract package.
Classes:
Name | Description |
---|---|
ExtractorConfig | Enhanced configuration options for text extraction with validation. |
Classes
ExtractorConfig
Enhanced configuration options for text extraction with validation.
Methods:
Name | Description |
---|---|
__init__ | |
__repr__ | |
from_file | Load configuration from a file (JSON, YAML, or TOML). |
get_handler | Retrieve a handler for a given file extension. |
get_handler_config | Get configuration specific to a handler. |
register_handler | Register a custom file type handler. |
to_dict | Convert configuration to dictionary. |
Attributes:
Name | Type | Description |
---|---|---|
custom_handlers | | |
encoding | | |
extra_config | | |
logging_format | | |
logging_level | | |
max_file_size | | |
max_memory_usage | | |
timeout | | |
Source code in textxtract/core/config.py
Attributes
logging_format
instance-attribute
Functions
__init__
__init__(encoding='utf-8', logging_level='INFO', logging_format=None, timeout=None, max_file_size=None, max_memory_usage=None, custom_handlers=None, **kwargs)
Source code in textxtract/core/config.py
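To make the constructor signature concrete, here is a minimal stand-in with the same parameters. `SketchConfig` is a hypothetical class, not the real `ExtractorConfig`. Two behaviours are assumptions: that numeric limits must be positive, and that unknown keyword arguments end up in `extra_config`.

```python
from typing import Any, Dict


class SketchConfig:
    """Stand-in mirroring the documented __init__ signature (validation rules assumed)."""

    def __init__(self, encoding="utf-8", logging_level="INFO", logging_format=None,
                 timeout=None, max_file_size=None, max_memory_usage=None,
                 custom_handlers=None, **kwargs):
        # Assumed rule: when a numeric limit is given, it must be positive.
        limits = {
            "timeout": timeout,
            "max_file_size": max_file_size,
            "max_memory_usage": max_memory_usage,
        }
        for name, value in limits.items():
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive, got {value!r}")
        self.encoding = encoding
        self.logging_level = logging_level
        self.logging_format = logging_format
        self.timeout = timeout
        self.max_file_size = max_file_size
        self.max_memory_usage = max_memory_usage
        self.custom_handlers = dict(custom_handlers or {})
        # Assumed: keyword arguments not named in the signature are kept here.
        self.extra_config = dict(kwargs)

    def to_dict(self) -> Dict[str, Any]:
        """Return the plain settings, leaving out handler objects."""
        return {k: v for k, v in vars(self).items() if k != "custom_handlers"}


cfg = SketchConfig(timeout=30, ocr_language="eng")
```

In this sketch, `ocr_language` is an invented extra option used only to show where unrecognised keywords would land.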
__repr__
from_file
classmethod
Load configuration from a file (JSON, YAML, or TOML).
Source code in textxtract/core/config.py
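Loading from JSON, YAML, or TOML usually comes down to choosing a parser by file suffix. The helper below is a sketch of that idea, not the library's implementation: `load_config_dict` is a hypothetical name, TOML parsing assumes Python 3.11+ (`tomllib`), and YAML parsing assumes the third-party PyYAML package is installed.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config_dict(path) -> Dict[str, Any]:
    """Parse a JSON, YAML, or TOML file into a plain dict, chosen by file suffix."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open("r", encoding="utf-8") as fh:
            return json.load(fh)
    if suffix == ".toml":
        import tomllib  # standard library from Python 3.11 onwards

        with path.open("rb") as fh:
            return tomllib.load(fh)
    if suffix in (".yaml", ".yml"):
        import yaml  # third-party PyYAML, imported only when actually needed

        with path.open("r", encoding="utf-8") as fh:
            return yaml.safe_load(fh) or {}
    raise ValueError(f"Unsupported config format: {suffix!r}")
```

A classmethod such as `from_file` could then pass the resulting dict to the constructor as keyword arguments.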
get_handler
get_handler_config
Get configuration specific to a handler.
Source code in textxtract/core/config.py
register_handler
Register a custom file type handler.
to_dict
Convert configuration to dictionary.
Source code in textxtract/core/config.py
exceptions
Custom exceptions for the textxtract package.
Classes:
Name | Description |
---|---|
ExtractionError | Raised when a general extraction error occurs. |
ExtractionTimeoutError | Raised when extraction exceeds the allowed timeout. |
FileTypeNotSupportedError | Raised when the file type is not supported. |
InvalidFileError | Raised when the file is invalid or unsupported. |
Classes
ExtractionError
ExtractionTimeoutError
Bases: ExtractionError
Raised when extraction exceeds the allowed timeout.
FileTypeNotSupportedError
Bases: ExtractionError
Raised when the file type is not supported.
InvalidFileError
Bases: ExtractionError
Raised when the file is invalid or unsupported.
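The hierarchy above means callers can catch specific failures first and fall back to `ExtractionError` for everything else. The classes below are local stand-ins that reuse the documented names. That `ExtractionError` derives directly from `Exception` is an assumption, and `classify` is a hypothetical helper.

```python
class ExtractionError(Exception):
    """Stand-in base class (assumed to derive from Exception)."""


class ExtractionTimeoutError(ExtractionError):
    """Raised when extraction exceeds the allowed timeout."""


class FileTypeNotSupportedError(ExtractionError):
    """Raised when the file type is not supported."""


class InvalidFileError(ExtractionError):
    """Raised when the file is invalid or unsupported."""


def classify(exc: Exception) -> str:
    """Map an exception to a short label, most specific handler first."""
    try:
        raise exc
    except ExtractionTimeoutError:
        return "timeout"
    except FileTypeNotSupportedError:
        return "unsupported"
    except InvalidFileError:
        return "invalid"
    except ExtractionError:
        # Catch-all for any other extraction failure in the hierarchy.
        return "failed"
```

Exceptions outside the hierarchy, such as `OSError`, are not caught by `classify` and propagate to the caller.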
logging_config
Logging configuration for the textxtract package.
Functions:
Name | Description |
---|---|
setup_logging | Configure logging for the package. |
Functions
setup_logging
Configure logging for the package.
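This page does not show the signature of `setup_logging`, so the following is only a sketch of what package-level logging setup commonly looks like with the standard `logging` module. The function name `setup_logging_sketch`, its parameters, and the default format string are all assumptions.

```python
import logging
from typing import Optional

# Assumed default; the real package may use a different format.
DEFAULT_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def setup_logging_sketch(level: str = "INFO", fmt: Optional[str] = None,
                         logger_name: str = "textxtract") -> logging.Logger:
    """Configure a named logger once and return it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level.upper())  # raises ValueError for unknown level names
    if not logger.handlers:
        # Attach a single stream handler; repeated calls do not add duplicates.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(fmt or DEFAULT_FORMAT))
        logger.addHandler(handler)
    return logger
```

Configuring a named logger rather than the root logger keeps the package from changing the host application's logging output.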
registry
Handler registry for centralized handler management.
Classes:
Name | Description |
---|---|
HandlerRegistry | Central registry for file type handlers with caching and lazy loading. |
Attributes:
Name | Type | Description |
---|---|---|
logger | | |
registry | | |
Attributes
Classes
HandlerRegistry
Central registry for file type handlers with caching and lazy loading.
Methods:
Name | Description |
---|---|
__init__ | |
__new__ | |
get_handler | Get handler instance for file extension with caching. |
get_supported_extensions | Get list of all supported file extensions. |
is_supported | Check if a file extension is supported. |
register_handler | Register a custom handler for a file extension. |
Source code in textxtract/core/registry.py
Functions
__init__
__new__
get_handler
cached
Get handler instance for file extension with caching.
Source code in textxtract/core/registry.py
get_supported_extensions
is_supported
register_handler
Register a custom handler for a file extension.
Source code in textxtract/core/registry.py
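The combination of `__new__`, caching, and lazy loading described above can be sketched as a singleton that stores handler factories and instantiates them on first use. `SketchRegistry` is a hypothetical stand-in. The dict-based cache, the extension normalisation, and the `KeyError` for unknown extensions are assumptions rather than the library's actual behaviour.

```python
from typing import Any, Callable, Dict, List


class SketchRegistry:
    """Singleton registry: factories are stored, instances are built lazily."""

    _instance = None

    def __new__(cls):
        # Create the single shared instance on first call, reuse it afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._factories = {}
            cls._instance._cache = {}
        return cls._instance

    @staticmethod
    def _normalize(ext: str) -> str:
        ext = ext.lower()
        return ext if ext.startswith(".") else "." + ext

    def register_handler(self, ext: str, factory: Callable[[], Any]) -> None:
        key = self._normalize(ext)
        self._factories[key] = factory
        self._cache.pop(key, None)  # drop any stale cached instance

    def is_supported(self, ext: str) -> bool:
        return self._normalize(ext) in self._factories

    def get_supported_extensions(self) -> List[str]:
        return sorted(self._factories)

    def get_handler(self, ext: str) -> Any:
        key = self._normalize(ext)
        if key not in self._factories:
            raise KeyError(f"No handler registered for {key!r}")
        if key not in self._cache:
            # Lazy loading: the handler is only constructed when first requested.
            self._cache[key] = self._factories[key]()
        return self._cache[key]
```

Storing factories instead of instances means a handler with heavy dependencies costs nothing until a file of that type is actually processed.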
utils
Utility functions for the textxtract package.
Classes:
Name | Description |
---|---|
FileInfo | File information data class. |
Functions:
Name | Description |
---|---|
create_temp_file | Create a temporary file from bytes and return its path with security validation. |
get_file_info | Get file information for logging and debugging. |
safe_unlink | Safely delete a file if it exists, optionally logging errors. |
validate_file_extension | Check if the file has an allowed extension. |
validate_file_size | Validate file size doesn't exceed limits. |
validate_filename | Validate filename for security issues. |
Attributes:
Name | Type | Description |
---|---|---|
DEFAULT_MAX_FILE_SIZE | | |
DEFAULT_MAX_TEMP_FILES | | |
Attributes
Classes
FileInfo
dataclass
Functions
create_temp_file
Create a temporary file from bytes and return its path with security validation.
Source code in textxtract/core/utils.py
get_file_info
Get file information for logging and debugging.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
| Union[Path, str, bytes] | File path or file bytes | required |
| Optional[str] | Required if source is bytes, optional for file paths | None |
Returns:
Name | Type | Description |
---|---|---|
FileInfo | FileInfo | Data class with file information |
Source code in textxtract/core/utils.py
safe_unlink
Safely delete a file if it exists, optionally logging errors.
Source code in textxtract/core/utils.py
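A temporary file created from bytes eventually has to be removed, so `create_temp_file` and `safe_unlink` can be sketched together. The functions below are hypothetical stand-ins with assumed signatures. The "security validation" shown here (null-byte rejection and stripping directory components) is one plausible interpretation, not the library's confirmed behaviour.

```python
import os
import tempfile
from pathlib import Path


def create_temp_file_sketch(data: bytes, filename: str) -> Path:
    """Write bytes to a fresh temp file, keeping only the original suffix."""
    if not filename or "\x00" in filename:
        raise ValueError("Invalid filename")
    # Keep only the last path component so "../" segments cannot matter.
    safe_name = os.path.basename(filename.replace("\\", "/"))
    if safe_name in ("", ".", ".."):
        raise ValueError("Invalid filename")
    fd, tmp_path = tempfile.mkstemp(suffix=Path(safe_name).suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
    except Exception:
        os.unlink(tmp_path)  # do not leave a partial file behind
        raise
    return Path(tmp_path)


def safe_unlink_sketch(path) -> bool:
    """Delete a file if it exists; return True only if something was removed."""
    try:
        Path(path).unlink()
        return True
    except FileNotFoundError:
        return False
```

`tempfile.mkstemp` chooses the directory and a random name itself, so the caller-supplied filename only influences the suffix of the temp file.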
validate_file_extension
validate_file_size
Validate file size doesn't exceed limits.
Source code in textxtract/core/utils.py
validate_filename
Validate filename for security issues.
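The three validators above are listed without signatures, so the following sketch only illustrates the kind of checks their descriptions suggest. All function names, parameters, return conventions, and the size limit are assumptions. The real `DEFAULT_MAX_FILE_SIZE` value is not shown on this page.

```python
from pathlib import Path
from typing import Iterable

ASSUMED_MAX_FILE_SIZE = 100 * 1024 * 1024  # hypothetical limit of 100 MiB


def validate_file_extension_sketch(filename: str, allowed: Iterable[str]) -> bool:
    """Return True if the filename's suffix is in the allowed set (case-insensitive)."""
    return Path(filename).suffix.lower() in {ext.lower() for ext in allowed}


def validate_file_size_sketch(data: bytes, max_size: int = ASSUMED_MAX_FILE_SIZE) -> None:
    """Raise ValueError if the payload is larger than max_size bytes."""
    if len(data) > max_size:
        raise ValueError(f"File size {len(data)} exceeds limit {max_size}")


def validate_filename_sketch(filename: str) -> None:
    """Raise ValueError for empty names, null bytes, or parent-directory segments."""
    if not filename:
        raise ValueError("Filename is empty")
    if "\x00" in filename:
        raise ValueError("Filename contains a null byte")
    # Treat both separator styles alike before looking for traversal segments.
    if ".." in filename.replace("\\", "/").split("/"):
        raise ValueError("Filename contains a path traversal segment")
```

Running checks like these before any bytes are written to disk keeps malformed input from reaching the file-type handlers.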