Testing
Comprehensive testing guide for the textxtract
package.
π§ͺ Running Tests
The project uses pytest for all tests with support for both synchronous and asynchronous testing.
Prerequisites
Install the package with all optional dependencies for complete testing:
Basic Test Execution
# Run all tests
pytest
# Run with verbose output
pytest -v
# Run specific test file
pytest tests/test_sync.py
pytest tests/test_async.py
# Run tests with coverage
pytest --cov=textxtract
Test Categories
# Run only synchronous tests
pytest tests/test_sync.py
# Run only asynchronous tests
pytest tests/test_async.py
# Run exception handling tests
pytest tests/test_exceptions.py
# Run edge case tests
pytest tests/test_edge_cases.py
π Test Structure
tests/
βββ __init__.py
βββ test_sync.py # Synchronous extractor tests
βββ test_async.py # Asynchronous extractor tests
βββ test_exceptions.py # Error handling tests
βββ test_edge_cases.py # Edge cases and validation
βββ files/ # Sample test files
βββ text_file.txt
βββ text_file.pdf
βββ text_file.docx
βββ markdown.md
βββ text.csv
βββ text.json
βββ text.html
βββ text.xml
βββ ...
π§ Test Coverage
File Type Coverage
The test suite covers all supported file types:
File Type | Test Files | Sync Tests | Async Tests |
---|---|---|---|
Plain Text | text_file.txt , text_file.text |
β | β |
Markdown | markdown.md |
β | β |
text_file.pdf |
β | β | |
Word | text_file.docx |
β | β |
Legacy Word | text_file.doc |
β | β |
Rich Text | text_file.rtf |
β | β |
HTML | text.html |
β | β |
CSV | text.csv |
β | β |
JSON | text.json |
β | β |
XML | text.xml |
β | β |
ZIP | text_zip.zip |
β | β |
Input Method Coverage
- β
File path extraction (
extractor.extract("/path/to/file.pdf")
) - β
Bytes extraction (
extractor.extract(file_bytes, "file.pdf")
) - β Both sync and async methods
- β Error handling for unsupported types
- β Context manager usage
Error Handling Coverage
- β
FileTypeNotSupportedError
for unsupported extensions - β
InvalidFileError
for corrupted/missing files - β
ExtractionError
for extraction failures - β
ValueError
for missing filename with bytes input
π― Writing Custom Tests
Testing File Extraction
import pytest
from pathlib import Path
from textxtract import SyncTextExtractor
from textxtract.core.exceptions import FileTypeNotSupportedError
def test_custom_file_extraction():
extractor = SyncTextExtractor()
# Test with file path
text = extractor.extract("path/to/test/file.txt")
assert isinstance(text, str)
assert len(text) > 0
# Test with bytes
with open("path/to/test/file.txt", "rb") as f:
file_bytes = f.read()
text = extractor.extract(file_bytes, "file.txt")
assert isinstance(text, str)
assert len(text) > 0
Testing Async Extraction
import pytest
from textxtract import AsyncTextExtractor
@pytest.mark.asyncio
async def test_async_extraction():
extractor = AsyncTextExtractor()
text = await extractor.extract("path/to/test/file.txt")
assert isinstance(text, str)
assert len(text) > 0
Testing Error Conditions
import pytest
from textxtract import SyncTextExtractor
from textxtract.core.exceptions import (
FileTypeNotSupportedError,
InvalidFileError
)
def test_error_handling():
extractor = SyncTextExtractor()
# Test unsupported file type
with pytest.raises(FileTypeNotSupportedError):
extractor.extract(b"dummy", "file.unsupported")
# Test missing file
with pytest.raises(InvalidFileError):
extractor.extract("nonexistent_file.txt")
# Test missing filename with bytes
with pytest.raises(ValueError):
extractor.extract(b"dummy bytes")
π Performance Testing
Memory Usage Testing
import psutil
import os
from textxtract import SyncTextExtractor
def test_memory_usage():
process = psutil.Process(os.getpid())
initial_memory = process.memory_info().rss
extractor = SyncTextExtractor()
# Process large file
text = extractor.extract("large_file.pdf")
final_memory = process.memory_info().rss
memory_increase = final_memory - initial_memory
# Assert reasonable memory usage
assert memory_increase < 100 * 1024 * 1024 # Less than 100MB
Concurrent Processing Testing
import asyncio
import pytest
from textxtract import AsyncTextExtractor
@pytest.mark.asyncio
async def test_concurrent_extraction():
extractor = AsyncTextExtractor()
files = ["file1.txt", "file2.pdf", "file3.docx"]
# Process files concurrently
tasks = [extractor.extract(file) for file in files]
results = await asyncio.gather(*tasks, return_exceptions=True)
# Verify all succeeded
for result in results:
assert isinstance(result, str)
assert len(result) > 0
π Test Configuration
pytest.ini Configuration
[tool:pytest]
minversion = 6.0
addopts = -ra -q --tb=short
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
markers =
slow: marks tests as slow (deselect with '-m "not slow"')
integration: marks tests as integration tests
Running Specific Test Categories
# Run only fast tests
pytest -m "not slow"
# Run only integration tests
pytest -m integration
# Run tests with specific keyword
pytest -k "test_sync"
# Run tests and stop on first failure
pytest -x
π Debugging Tests
Verbose Output
# Maximum verbosity
pytest -vvv
# Show local variables in tracebacks
pytest --tb=long
# Show stdout/stderr
pytest -s
Debugging with pdb
import pytest
def test_with_debugger():
extractor = SyncTextExtractor()
# Set breakpoint
pytest.set_trace()
text = extractor.extract("test_file.txt")
assert text
π Test Reports
Coverage Reports
# Generate coverage report
pytest --cov=textxtract --cov-report=html
# View coverage in terminal
pytest --cov=textxtract --cov-report=term-missing
# Generate XML coverage for CI
pytest --cov=textxtract --cov-report=xml
JUnit XML Reports
π Continuous Integration
GitHub Actions Example
name: Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.9, 3.10, 3.11, 3.12]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
pip install -e .[all]
pip install pytest pytest-asyncio pytest-cov
- name: Run tests
run: pytest --cov=textxtract
β Validation Checklist
Before submitting changes, ensure:
- [ ] All existing tests pass
- [ ] New features have corresponding tests
- [ ] Error conditions are tested
- [ ] Both sync and async methods are tested
- [ ] Documentation examples are tested
- [ ] Performance regressions are checked
- [ ] Memory leaks are verified
π Troubleshooting Tests
Common Issues
Missing dependencies:
Import errors:
# Ensure correct import
from textxtract import SyncTextExtractor # Correct
from text_extractor import SyncTextExtractor # Wrong
Async test issues:
File not found errors:
# Use absolute paths in tests
TEST_FILES_DIR = Path(__file__).parent / "files"
file_path = TEST_FILES_DIR / "test_file.txt"
For more testing help, see the API Reference or Usage Guide.