Learn Python programming with practical examples including web crawling and AsciiDoc validation. This comprehensive tutorial covers modern Python development using Python 3.12+ features. You’ll build real-world applications including a web crawler and a document validation tool.

1. Overview

1.1. What is Python

Python is a high-level, interpreted programming language renowned for its simplicity, readability, and powerful capabilities. Created by Guido van Rossum and first released in 1991, Python has become one of the world’s most popular programming languages.

The name Python comes from the British comedy series Monty Python’s Flying Circus, reflecting the language’s emphasis on fun and accessibility. The Python interpreter compiles source code into bytecode, which is then executed by the Python virtual machine.
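
To make the bytecode step tangible, the standard library's dis module can show the instructions the interpreter generates for a function. The sketch below uses a hypothetical add function that is not part of the tutorial's projects. The exact instruction names differ between Python versions, so no output is shown.

```python
import dis


def add(a: int, b: int) -> int:
    """Hypothetical function used only to inspect its bytecode."""
    return a + b


# Collect the names of the bytecode instructions compiled for add().
opnames = [instruction.opname for instruction in dis.Bytecode(add)]
print(opnames)

# Print a human-readable disassembly; the listing varies by Python version.
dis.dis(add)
```

The source file itself is never modified: compilation to bytecode happens automatically each time a module is loaded or a script is run.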

Python’s popularity stems from several key strengths:

  • Readable syntax: Code looks almost like natural English

  • Versatile applications: Web development, data science, automation, AI/ML, and more

  • Rich ecosystem: Extensive standard library and third-party packages via PyPI

  • Cross-platform: Runs on Windows, macOS, Linux, and many other platforms

  • Strong community: Excellent documentation, tutorials, and community support

  • Rapid development: Concise syntax and a large standard library usually mean less code to write and maintain

1.2. Modern Python Features (3.12+)

Python continues to evolve. The features below arrived over several releases and are all available in Python 3.12:

  • F-strings (since 3.6): String formatting with embedded expressions

  • Type hints (since 3.5): Optional static typing for better code documentation

  • Async/await (since 3.5): Built-in syntax for asynchronous programming

  • Pattern matching (since 3.10): Structural pattern matching (match/case statements)

  • Performance improvements: Faster startup and execution, most notably since 3.11

  • Better error messages: More precise and more helpful tracebacks since 3.10

1.3. Real-World Applications

In this tutorial, you’ll build practical applications that demonstrate Python’s capabilities:

  • Web Crawler: Extract and process data from websites using requests and BeautifulSoup

  • Document Validator: Check AsciiDoc files for formatting issues using the LanguageTool API

  • Data Processing: Handle files, APIs, and structured data with modern Python techniques

1.4. About this tutorial

This tutorial provides a hands-on approach to learning Python through practical examples. You’ll start with Python fundamentals, then progress to building real applications. By the end, you’ll have the skills to create your own Python projects and understand modern development practices.

2. Setting Up Your Python Development Environment

2.1. Installing Python

Python 3.12 or later is recommended for modern development. Many Linux distributions ship with Python pre-installed, but often not the latest release, so check which version you have before you start.

2.1.1. Windows

Download Python from https://www.python.org/downloads/ and run the installer. Important: Check "Add Python to PATH" during installation.

# Verify installation
python --version
# or
python3 --version

2.1.2. macOS

Use Homebrew (recommended) or download from python.org:

# Install Homebrew first, then:
brew install python

# Verify
python3 --version

2.1.3. Linux (Ubuntu/Debian)

sudo apt update
sudo apt install python3 python3-pip python3-venv

# Verify
python3 --version
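
The version can also be checked from inside Python, which is handy at the top of a script. This is a small sketch: the REQUIRED constant and the meets_requirement helper are illustrative names, and the 3.12 minimum simply mirrors the recommendation in this tutorial.

```python
import sys

REQUIRED = (3, 12)  # minimum (major, minor) version recommended by this tutorial


def meets_requirement(version=None, required=REQUIRED) -> bool:
    """Return True if the (major, minor) part of version is at least required."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(required)


print(f"Running Python {sys.version.split()[0]}")
if not meets_requirement():
    print(f"Python {REQUIRED[0]}.{REQUIRED[1]} or newer is recommended")
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.12".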

2.2. Virtual Environments

Virtual environments isolate your project dependencies. Always use virtual environments for Python projects.

# Create a virtual environment (on Windows, use "python" if "python3" is not found)
python3 -m venv myproject_env

# Activate it
# On Windows:
myproject_env\Scripts\activate
# On macOS/Linux:
source myproject_env/bin/activate

# Install packages
pip install requests beautifulsoup4 language-tool-python

# Deactivate when done
deactivate
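
To confirm from within Python that a virtual environment is active, you can compare sys.prefix with sys.base_prefix. The helper name in_virtualenv below is an illustrative assumption, and the check covers environments created with venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtualenv()}")
```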

2.3. Recommended Development Tools

While you can use any text editor, these tools enhance productivity:

  • Visual Studio Code: Free, powerful, with excellent Python extension

  • PyCharm: Full-featured IDE (Community edition is free)

  • Jupyter Notebooks: Great for data analysis and learning

  • Terminal/Command Line: Essential for running Python scripts

2.3.1. Visual Studio Code Setup

  1. Download from https://code.visualstudio.com/

  2. Install the Python extension by Microsoft

  3. Open a Python file - VS Code will help you select the right interpreter

2.4. Package Management with pip

pip is Python’s package installer. Use requirements.txt files to manage dependencies:

# Install a package
pip install requests

# Install from requirements file
pip install -r requirements.txt

# List installed packages
pip list

# Generate requirements file
pip freeze > requirements.txt
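
Installed packages can also be inspected from Python through the standard library's importlib.metadata module. The sketch below checks the three packages installed earlier in this tutorial. The installed_version helper is a hypothetical name, and the result depends on your environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ("requests", "beautifulsoup4", "language-tool-python"):
    version = installed_version(package)
    status = version if version is not None else "not installed"
    print(f"{package}: {status}")
```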

3. Your First Python Program

Let’s start with a simple but practical Python program that demonstrates modern syntax and best practices.

3.1. Hello, World! - Modern Style

Create a new file called hello_world.py:

#!/usr/bin/env python3
"""
A simple Hello World program demonstrating modern Python syntax.
"""

def greet(name: str) -> str:
    """Return a personalized greeting."""
    return f"Hello, {name}! Welcome to Python programming."

def main():
    """Main program entry point."""
    # Get user input
    user_name = input("What's your name? ")
    
    # Create and display greeting
    greeting = greet(user_name)
    print(greeting)
    
    # Show some Python features
    languages = ["Python", "Java", "JavaScript", "Go"]
    print("\nHere are some popular programming languages:")
    for i, lang in enumerate(languages, 1):
        print(f"{i}. {lang}")

if __name__ == "__main__":
    main()

Run it from your terminal:

python3 hello_world.py

3.2. A More Practical Example

Let’s create a simple file analyzer that demonstrates Python’s strengths:

#!/usr/bin/env python3
"""
File analyzer that demonstrates modern Python features.
"""

from pathlib import Path
from typing import Any, Dict, List

def analyze_file(file_path: Path) -> Dict[str, Any]:
    """Analyze a text file and return statistics."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
            # splitlines() does not report an extra empty line after a trailing newline
            lines = content.splitlines()
            
            return {
                'filename': file_path.name,
                'size_bytes': file_path.stat().st_size,
                'line_count': len(lines),
                'word_count': len(content.split()),
                'char_count': len(content),
                'extension': file_path.suffix
            }
    except FileNotFoundError:
        return {'error': f'File not found: {file_path}'}
    except Exception as e:
        return {'error': f'Error reading file: {e}'}

def analyze_directory(directory: str) -> List[Dict[str, Any]]:
    """Analyze all text files in a directory."""
    path = Path(directory)
    results = []
    
    if not path.exists():
        return [{'error': f'Directory not found: {directory}'}]
    
    # Find text files
    text_extensions = {'.txt', '.py', '.md', '.adoc', '.rst'}
    
    for file_path in path.iterdir():
        if file_path.is_file() and file_path.suffix in text_extensions:
            results.append(analyze_file(file_path))
    
    return results

def main():
    """Main program demonstrating file analysis."""
    print("🔍 File Analyzer - Modern Python Example")
    print("=" * 40)
    
    # Analyze current directory
    directory = "."
    results = analyze_directory(directory)
    
    if not results:
        print("No text files found in current directory.")
        return
    
    # Display results
    total_files = len(results)
    total_lines = sum(r.get('line_count', 0) for r in results if 'error' not in r)
    
    print(f"\nFound {total_files} text files:")
    print(f"Total lines: {total_lines:,}")
    print("\nFile details:")
    print("-" * 60)
    
    for result in results:
        if 'error' in result:
            print(f"❌ {result['error']}")
        else:
            print(f"📄 {result['filename']:20} | "
                  f"{result['line_count']:4} lines | "
                  f"{result['size_bytes']:6} bytes")

if __name__ == "__main__":
    main()

This example shows:

  • Modern f-string formatting

  • Type hints for better code documentation

  • Exception handling

  • Working with files and paths

  • Using Python’s standard library

3.3. Interactive Python Development

Python includes an interactive interpreter perfect for experimentation:

# Start interactive Python
python3

# Try some expressions
>>> name = "Python"
>>> print(f"Hello, {name}!")
Hello, Python!

>>> numbers = [1, 2, 3, 4, 5]
>>> sum(numbers)
15

>>> exit()

3.4. Organizing Your First Project

For real projects, use this structure:

my_python_project/
├── requirements.txt          # Project dependencies
├── README.md                # Project documentation
├── main.py                  # Entry point
├── src/                     # Source code
│   ├── __init__.py
│   └── my_module.py
└── tests/                   # Test files
    ├── __init__.py
    └── test_my_module.py
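
As a rough idea of what the test file in this layout might contain, here is a self-contained sketch using the standard library's unittest module. The word_count function is a hypothetical stand-in for code that would live in src/my_module.py. In a real project, the test file would import it instead of defining it.

```python
import unittest


def word_count(text: str) -> int:
    """Hypothetical function that would live in src/my_module.py."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    """Hypothetical tests that would live in tests/test_my_module.py."""

    def test_counts_words(self) -> None:
        self.assertEqual(word_count("Python is fun"), 3)

    def test_empty_string(self) -> None:
        self.assertEqual(word_count(""), 0)


if __name__ == "__main__":
    # Load and run the tests explicitly so the script does not exit afterwards.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

With the directory layout above, the whole test suite is usually run from the project root with python3 -m unittest.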

4. Python Programming Fundamentals

4.1. Python Syntax Overview

Python 3.12+ includes many features that make code more readable and maintainable. Let’s explore the most important concepts.

4.2. Variables and Type Hints

Python is dynamically typed, but you can add type hints for better code documentation:

#!/usr/bin/env python3
"""
Modern Python variables and type hints examples.
"""

from typing import List, Dict, Optional

# Basic variables with type hints
name: str = "Alice"
age: int = 30
height: float = 5.6
is_student: bool = True

# Collections with type hints
numbers: List[int] = [1, 2, 3, 4, 5]
scores: Dict[str, int] = {"math": 95, "science": 88, "history": 92}
middle_name: Optional[str] = None  # Can be None or string

# Dynamic typing still works
dynamic_var = "starts as string"
dynamic_var = 42  # now it's an integer
dynamic_var = ["now", "it's", "a", "list"]

# Multiple assignment
x, y, z = 1, 2, 3
first, *rest = [1, 2, 3, 4, 5]  # first=1, rest=[2,3,4,5]

# Constants (by convention, use UPPER_CASE)
MAX_CONNECTIONS: int = 100
API_URL: str = "https://api.example.com"

print(f"Hello, {name}! You are {age} years old.")
print(f"Your scores: {scores}")
print(f"First number: {first}, rest: {rest}")

4.3. String Operations

Python offers powerful string manipulation, and f-strings are the preferred way to format strings:

#!/usr/bin/env python3
"""
Modern Python string operations with f-strings and advanced techniques.
"""

# F-string formatting (preferred in Python 3.6+)
name = "Python"
version = "3.12"  # keep versions as strings: the float 3.10 would print as 3.1
print(f"Welcome to {name} {version}!")

# Multi-line f-strings
user = {"name": "Alice", "age": 30, "city": "New York"}
message = f"""
Hello {user['name']}!
You are {user['age']} years old
and live in {user['city']}.
"""
print(message)

# F-strings with expressions and formatting
numbers = [1, 2, 3, 4, 5]
print(f"Sum of {numbers} = {sum(numbers)}")
print(f"Pi to 3 decimal places: {3.14159:.3f}")

# String methods and operations
text = "  Hello, World!  "
print(f"Original: '{text}'")
print(f"Stripped: '{text.strip()}'")
print(f"Uppercase: '{text.upper()}'")
print(f"Lowercase: '{text.lower()}'")
print(f"Title Case: '{text.title()}'")

# String slicing and indexing
sentence = "Python programming is fun"
print(f"First word: {sentence[:6]}")
print(f"Last word: {sentence.split()[-1]}")
print(f"Every 2nd character: {sentence[::2]}")

# String checking methods
email = "user@example.com"
print(f"Contains @: {'@' in email}")
print(f"Starts with 'user': {email.startswith('user')}")
print(f"Ends with '.com': {email.endswith('.com')}")

# Joining and splitting
words = ["Python", "is", "awesome"]
joined = " ".join(words)
print(f"Joined: {joined}")
print(f"Split back: {joined.split()}")

# Raw strings for regex patterns
import re
pattern = r"\d{3}-\d{3}-\d{4}"  # Phone number pattern
phone = "123-456-7890"
# fullmatch() requires the whole string to match; match() only anchors at the start
print(f"Phone match: {bool(re.fullmatch(pattern, phone))}")
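
F-strings also accept a format specification after a colon, which covers most day-to-day formatting needs. The values below are made up for illustration. The comments show the resulting strings.

```python
price = 1234.5
ratio = 0.256
name = "Ada"

debug = f"{price=}"        # "price=1234.5"  (the = specifier needs Python 3.8+)
money = f"{price:,.2f}"    # "1,234.50"      (thousands separator, 2 decimals)
percent = f"{ratio:.1%}"   # "25.6%"         (multiplied by 100, 1 decimal)
padded = f"[{name:<6}]"    # "[Ada   ]"      (left-aligned in 6 characters)
centered = f"[{name:^7}]"  # "[  Ada  ]"     (centered in 7 characters)
binary = f"{10:08b}"       # "00001010"      (binary, zero-padded to 8 digits)

for label, value in [("debug", debug), ("money", money), ("percent", percent),
                     ("padded", padded), ("centered", centered), ("binary", binary)]:
    print(f"{label:>8}: {value}")
```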

4.4. Working with Collections

Python provides rich data structures for organizing and manipulating data:

#!/usr/bin/env python3
"""
Working with Python collections: lists, dictionaries, sets, and tuples.
"""

from collections import defaultdict, Counter
from typing import Any, List, Dict, Set, Tuple

# Lists - ordered, mutable collections
fruits: List[str] = ["apple", "banana", "cherry", "date"]
print(f"Fruits: {fruits}")

# List comprehensions (Pythonic way to create lists)
squares = [x**2 for x in range(1, 6)]
even_numbers = [x for x in range(20) if x % 2 == 0]
print(f"Squares: {squares}")
print(f"Even numbers: {even_numbers}")

# List operations
fruits.append("elderberry")
fruits.extend(["fig", "grape"])
print(f"After additions: {fruits}")

# Dictionaries - key-value pairs
# (typing.Any is the "any type" annotation; the builtin any() is a function)
person: Dict[str, Any] = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "Java", "JavaScript"],
    "is_employed": True
}

# Dictionary comprehension
word_lengths = {word: len(word) for word in fruits}
print(f"Word lengths: {word_lengths}")

# Safe dictionary access
age = person.get("age", 0)  # Returns 0 if "age" not found
print(f"Age: {age}")

# Sets - unique, unordered collections
unique_numbers: Set[int] = {1, 2, 3, 3, 4, 4, 5}
print(f"Unique numbers: {unique_numbers}")

# Set operations
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
print(f"Union: {set1 | set2}")
print(f"Intersection: {set1 & set2}")
print(f"Difference: {set1 - set2}")

# Tuples - immutable sequences
coordinates: Tuple[float, float] = (10.5, 20.3)
rgb_color: Tuple[int, int, int] = (255, 128, 0)
print(f"Coordinates: {coordinates}")

# Named tuples for better structure
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)
print(f"Point: x={p.x}, y={p.y}")

# Advanced collections
# defaultdict - provides default values
word_count = defaultdict(int)
text = "hello world hello python world"
for word in text.split():
    word_count[word] += 1
print(f"Word count: {dict(word_count)}")

# Counter - counts occurrences
counter = Counter(text.split())
print(f"Most common word: {counter.most_common(1)}")

# Unpacking and packing
numbers = [1, 2, 3, 4, 5]
first, second, *rest = numbers
print(f"First: {first}, Second: {second}, Rest: {rest}")

# Zip for parallel iteration
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")
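
A few more collection idioms come up constantly in practice: merging dictionaries, sorting by a key, and grouping items. The sketch below uses made-up sample data. The dictionary merge operator requires Python 3.9 or later.

```python
from collections import defaultdict

# Merge two dictionaries; values from the right-hand side win on conflicts.
defaults = {"debug": False, "port": 8080}
overrides = {"port": 3000, "host": "localhost"}
merged = defaults | overrides
print(merged)  # {'debug': False, 'port': 3000, 'host': 'localhost'}

# Sort a list of tuples by the second element (age).
people = [("Alice", 25), ("Bob", 30), ("Charlie", 22)]
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Charlie', 22), ('Alice', 25), ('Bob', 30)]

# Group words by their first letter using defaultdict(list).
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))
```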

4.5. Functions with Modern Features

Functions in Python support default arguments, type hints, and advanced features:

#!/usr/bin/env python3
"""
Modern Python functions with type hints and advanced features.
"""

from typing import List, Optional, Callable, Any
from functools import wraps

# Basic function with type hints
def greet(name: str, age: int = 25) -> str:
    """Return a personalized greeting."""
    return f"Hello, {name}! You are {age} years old."

# Function with optional parameters
def create_user(name: str, email: str, age: Optional[int] = None) -> dict:
    """Create a user dictionary with optional age."""
    user = {"name": name, "email": email}
    if age is not None:
        user["age"] = age
    return user

# Function with variable arguments
def calculate_average(*numbers: float) -> float:
    """Calculate average of any number of values."""
    if not numbers:
        return 0.0
    return sum(numbers) / len(numbers)

# Function with keyword arguments
def create_config(**kwargs: Any) -> dict:
    """Create configuration dictionary from keyword arguments."""
    defaults = {"debug": False, "port": 8080}
    defaults.update(kwargs)
    return defaults

# Lambda functions (anonymous functions)
square = lambda x: x**2
add = lambda x, y: x + y

# Higher-order functions
def apply_operation(numbers: List[int], operation: Callable[[int], int]) -> List[int]:
    """Apply an operation to each number in the list."""
    return [operation(n) for n in numbers]

# Decorator function
import time

def timer(func):
    """Decorator to time function execution."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # perf_counter() is a monotonic, high-resolution clock meant for timing
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took {end - start:.4f} seconds")
        return result
    return wrapper

@timer
def slow_function():
    """A function that takes some time."""
    time.sleep(0.1)
    return "Done!"

# Generator function
def fibonacci(n: int):
    """Generate the first n Fibonacci numbers."""
    a, b = 0, 1
    count = 0
    while count < n:
        yield a
        a, b = b, a + b
        count += 1

# Example usage
if __name__ == "__main__":
    # Basic functions
    print(greet("Alice"))
    print(greet("Bob", 30))
    
    # User creation
    user1 = create_user("Alice", "alice@example.com")
    user2 = create_user("Bob", "bob@example.com", 30)
    print(f"User 1: {user1}")
    print(f"User 2: {user2}")
    
    # Variable arguments
    avg = calculate_average(10, 20, 30, 40, 50)
    print(f"Average: {avg}")
    
    # Keyword arguments
    config = create_config(debug=True, host="localhost", port=3000)
    print(f"Config: {config}")
    
    # Lambda and higher-order functions
    numbers = [1, 2, 3, 4, 5]
    squared = apply_operation(numbers, square)
    print(f"Squared: {squared}")
    
    # Decorator
    slow_function()
    
    # Generator
    fib_numbers = list(fibonacci(10))
    print(f"Fibonacci: {fib_numbers}")
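
Because generators produce values lazily, they can even represent infinite sequences, and the consumer decides how much to take. The following sketch is a variation on the generator above, not a replacement for it. The name fibonacci_stream is an illustrative assumption.

```python
from itertools import islice, takewhile
from typing import Iterator


def fibonacci_stream() -> Iterator[int]:
    """Yield Fibonacci numbers forever; callers decide when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take exactly ten values from the infinite stream.
first_ten = list(islice(fibonacci_stream(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Take values for as long as they stay below 100.
below_100 = list(takewhile(lambda n: n < 100, fibonacci_stream()))
print(below_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Calling list() directly on fibonacci_stream() would never finish, which is why islice or takewhile is needed to bound the iteration.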

4.6. Modern Class Design

Object-oriented programming in Python with modern best practices:

#!/usr/bin/env python3
"""
Modern Python classes with type hints, dataclasses, and properties.
"""

from dataclasses import dataclass
from typing import List, Optional, ClassVar
from abc import ABC, abstractmethod

# Modern class with type hints and properties
class Person:
    """A person with name, age, and email."""
    
    # Class variable
    species: ClassVar[str] = "Homo sapiens"
    
    def __init__(self, name: str, age: int, email: str) -> None:
        self._name = name
        self._age = age
        self._email = email
        self._friends: List[str] = []
    
    @property
    def name(self) -> str:
        """Get the person's name."""
        return self._name
    
    @property
    def age(self) -> int:
        """Get the person's age."""
        return self._age
    
    @age.setter
    def age(self, value: int) -> None:
        """Set the person's age with validation."""
        if value < 0:
            raise ValueError("Age cannot be negative")
        self._age = value
    
    @property
    def email(self) -> str:
        """Get the person's email."""
        return self._email
    
    def add_friend(self, friend_name: str) -> None:
        """Add a friend to the person's friend list."""
        if friend_name not in self._friends:
            self._friends.append(friend_name)
    
    def get_friends(self) -> List[str]:
        """Get a copy of the friend list."""
        return self._friends.copy()
    
    def __str__(self) -> str:
        return f"Person(name='{self.name}', age={self.age}, email='{self.email}')"
    
    def __repr__(self) -> str:
        return self.__str__()

# Dataclass - automatically generates __init__, __repr__, __eq__, etc.
@dataclass
class Product:
    """A product with name, price, and quantity."""
    name: str
    price: float
    quantity: int = 0
    category: Optional[str] = None
    
    def total_value(self) -> float:
        """Calculate total value of this product."""
        return self.price * self.quantity
    
    def __post_init__(self):
        """Validate data after initialization."""
        if self.price < 0:
            raise ValueError("Price cannot be negative")

# Abstract base class
class Animal(ABC):
    """Abstract animal class."""
    
    def __init__(self, name: str, species: str):
        self.name = name
        self.species = species
    
    @abstractmethod
    def make_sound(self) -> str:
        """Make a sound - must be implemented by subclasses."""
        pass
    
    def sleep(self) -> str:
        """All animals can sleep."""
        return f"{self.name} is sleeping..."

# Concrete implementation
class Dog(Animal):
    """A dog that inherits from Animal."""
    
    def __init__(self, name: str, breed: str):
        super().__init__(name, "Canis lupus")
        self.breed = breed
    
    def make_sound(self) -> str:
        return f"{self.name} says Woof!"
    
    def fetch(self, item: str) -> str:
        return f"{self.name} fetches the {item}!"

# Class with static and class methods
class MathUtils:
    """Utility class for mathematical operations."""
    
    PI: ClassVar[float] = 3.14159
    
    @staticmethod
    def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b
    
    @classmethod
    def circle_area(cls, radius: float) -> float:
        """Calculate circle area using class constant."""
        return cls.PI * radius * radius

# Example usage
if __name__ == "__main__":
    # Regular class
    person = Person("Alice", 30, "alice@example.com")
    person.add_friend("Bob")
    person.add_friend("Charlie")
    print(person)
    print(f"Friends: {person.get_friends()}")
    
    # Dataclass
    product = Product("Laptop", 999.99, 5, "Electronics")
    print(f"Product: {product}")
    print(f"Total value: ${product.total_value():.2f}")
    
    # Inheritance and polymorphism
    dog = Dog("Buddy", "Golden Retriever")
    print(dog.make_sound())
    print(dog.fetch("ball"))
    print(dog.sleep())
    
    # Static and class methods
    result = MathUtils.add(5, 3)
    area = MathUtils.circle_area(10)
    print(f"5 + 3 = {result}")
    print(f"Circle area (r=10): {area:.2f}")
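
Dataclasses take a few options that are worth knowing early. The sketch below uses two hypothetical classes, Version and Cart, to show immutable instances (frozen=True), generated comparison methods (order=True), and safe mutable defaults via field(default_factory=...).

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True, order=True)
class Version:
    """Immutable, comparable version number (hypothetical example class)."""
    major: int
    minor: int = 0
    patch: int = 0

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


@dataclass
class Cart:
    """Shopping cart whose item list is created fresh for every instance."""
    owner: str
    items: list[str] = field(default_factory=list)


# order=True compares instances field by field, like tuples.
releases = sorted([Version(3, 12), Version(3, 9, 1), Version(3, 10)])
print([str(v) for v in releases])  # ['3.9.1', '3.10.0', '3.12.0']

# frozen=True makes attribute assignment raise an error.
try:
    releases[0].major = 4
except FrozenInstanceError:
    print("Version instances are read-only")

# default_factory avoids sharing one list between all carts.
alice, bob = Cart("Alice"), Cart("Bob")
alice.items.append("laptop")
print(alice.items, bob.items)  # ['laptop'] []
```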

4.7. Error Handling and Exceptions

Robust error handling is essential for reliable applications:

#!/usr/bin/env python3
"""
Modern error handling and exception management in Python.
"""

import logging
from typing import Optional, List, Dict
from pathlib import Path

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Custom exceptions
class ValidationError(Exception):
    """Raised when data validation fails."""
    pass

class NetworkError(Exception):
    """Raised when network operations fail."""
    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code

# Basic exception handling
def safe_divide(a: float, b: float) -> Optional[float]:
    """Safely divide two numbers."""
    try:
        result = a / b
        return result
    except ZeroDivisionError:
        logger.error("Cannot divide by zero")
        return None
    except TypeError as e:
        logger.error(f"Type error: {e}")
        return None

# Multiple exception handling
def process_user_input(user_input: str) -> Optional[int]:
    """Process user input and return integer."""
    try:
        # Try to convert to integer
        number = int(user_input)
        
        # Validate range
        if number < 0:
            raise ValidationError("Number must not be negative")
        
        return number
        
    except ValueError:
        logger.error(f"'{user_input}' is not a valid number")
        return None
    except ValidationError as e:
        logger.error(f"Validation error: {e}")
        return None

# File operations with exception handling
def read_config_file(filename: str) -> Dict[str, str]:
    """Read configuration from a file with proper error handling."""
    config = {}
    file_path = Path(filename)
    
    try:
        # Check if file exists
        if not file_path.exists():
            raise FileNotFoundError(f"Config file {filename} not found")
        
        # Read and parse file
        with open(file_path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                line = line.strip()
                if line and not line.startswith('#'):
                    try:
                        key, value = line.split('=', 1)
                        config[key.strip()] = value.strip()
                    except ValueError:
                        logger.warning(f"Invalid line {line_num}: {line}")
        
        return config
        
    except FileNotFoundError as e:
        logger.error(f"File error: {e}")
        return {"error": "file_not_found"}
    except PermissionError:
        logger.error(f"Permission denied reading {filename}")
        return {"error": "permission_denied"}
    except Exception as e:
        logger.error(f"Unexpected error reading {filename}: {e}")
        return {"error": "unexpected_error"}

# Context manager for resource handling
class DatabaseConnection:
    """Mock database connection with proper cleanup."""
    
    def __init__(self, connection_string: str):
        self.connection_string = connection_string
        self.connected = False
    
    def __enter__(self):
        """Enter context - establish connection."""
        logger.info("Connecting to database...")
        self.connected = True
        return self
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        """Exit context - clean up connection."""
        if self.connected:
            logger.info("Closing database connection...")
            self.connected = False
        
        # Handle exceptions that occurred in the context
        if exc_type is not None:
            logger.error(f"Exception in context: {exc_type.__name__}: {exc_val}")
        
        # Return False to propagate exceptions
        return False
    
    def query(self, sql: str) -> List[Dict]:
        """Execute a database query."""
        if not self.connected:
            raise ConnectionError("Not connected to database")
        
        # Simulate database operation
        logger.info(f"Executing query: {sql}")
        return [{"id": 1, "name": "example"}]

# Finally block example
def process_data_with_cleanup(data_file: str) -> bool:
    """Process data file with guaranteed cleanup."""
    temp_file = None
    try:
        # Open temporary file
        temp_file = open("temp_processing.txt", "w", encoding="utf-8")
        
        # Process data (might raise exceptions)
        with open(data_file, "r", encoding="utf-8") as file:
            data = file.read()
            temp_file.write(data.upper())
        
        logger.info("Data processed successfully")
        return True
        
    except FileNotFoundError:
        logger.error(f"Data file {data_file} not found")
        return False
    except Exception as e:
        logger.error(f"Error processing data: {e}")
        return False
    finally:
        # This always runs, even if exception occurred
        if temp_file and not temp_file.closed:
            temp_file.close()
            logger.info("Temporary file closed")

# Example usage
if __name__ == "__main__":
    # Safe division
    print(f"10 / 2 = {safe_divide(10, 2)}")
    print(f"10 / 0 = {safe_divide(10, 0)}")
    
    # User input processing
    test_inputs = ["42", "-5", "not_a_number", "100"]
    for inp in test_inputs:
        result = process_user_input(inp)
        print(f"Input '{inp}' -> {result}")
    
    # Context manager usage
    try:
        with DatabaseConnection("sqlite://memory") as db:
            results = db.query("SELECT * FROM users")
            print(f"Query results: {results}")
    except Exception as e:
        print(f"Database operation failed: {e}")
    
    # Configuration file reading
    config = read_config_file("nonexistent_config.txt")
    print(f"Config: {config}")
    
    # Processing with cleanup
    success = process_data_with_cleanup("nonexistent_data.txt")
    print(f"Processing successful: {success}")
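
Two related techniques complement the examples above: exception chaining with raise ... from, which preserves the original error as the cause, and contextlib.contextmanager, which builds a context manager from a generator instead of a class. The names ConfigError, parse_port, and managed_resource are hypothetical and exist only for this sketch.

```python
from contextlib import contextmanager
from typing import Iterator


class ConfigError(Exception):
    """Hypothetical application-level error."""


def parse_port(raw: str) -> int:
    """Convert a string to a port number, wrapping low-level errors."""
    try:
        return int(raw)
    except ValueError as exc:
        # "from exc" stores the original error in __cause__ for debugging.
        raise ConfigError(f"Invalid port: {raw!r}") from exc


@contextmanager
def managed_resource(name: str, log: list[str]) -> Iterator[str]:
    """Record open/close events around the body of a with block."""
    log.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs even when the with block raises an exception.
        log.append(f"close {name}")


events: list[str] = []
try:
    with managed_resource("db", events) as resource:
        events.append(f"use {resource}")
        parse_port("eighty")
except ConfigError as error:
    events.append(f"error: {error} (cause: {type(error.__cause__).__name__})")

for event in events:
    print(event)
```

The recorded events show that the resource is closed before the exception reaches the except clause, which is the same guarantee the class-based DatabaseConnection provides through __exit__.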

4.8. File Operations and Context Managers

Working with files safely using context managers:

#!/usr/bin/env python3
"""
Modern file operations using context managers and pathlib.
"""

from pathlib import Path
from typing import List, Dict, Optional
import json
import csv
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Reading files with context managers
def read_text_file(filename: str) -> Optional[str]:
    """Read a text file safely using context manager."""
    try:
        file_path = Path(filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        logger.info(f"Successfully read {file_path.name} ({len(content)} characters)")
        return content
    except FileNotFoundError:
        logger.error(f"File {filename} not found")
        return None
    except Exception as e:
        logger.error(f"Error reading {filename}: {e}")
        return None

# Writing files with automatic cleanup
def write_text_file(filename: str, content: str) -> bool:
    """Write content to a text file."""
    try:
        file_path = Path(filename)
        # Create parent directories if they don't exist
        file_path.parent.mkdir(parents=True, exist_ok=True)
        
        with open(file_path, 'w', encoding='utf-8') as file:
            file.write(content)
        
        logger.info(f"Successfully wrote to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing to {filename}: {e}")
        return False

# Working with JSON files
def read_json_file(filename: str) -> Optional[Dict]:
    """Read and parse JSON file."""
    try:
        file_path = Path(filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
        logger.info(f"Successfully loaded JSON from {file_path.name}")
        return data
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON in {filename}: {e}")
        return None
    except FileNotFoundError:
        logger.error(f"JSON file {filename} not found")
        return None

def write_json_file(filename: str, data: Dict) -> bool:
    """Write data to JSON file with proper formatting."""
    try:
        file_path = Path(filename)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        
        with open(file_path, 'w', encoding='utf-8') as file:
            json.dump(data, file, indent=2, ensure_ascii=False)
        
        logger.info(f"Successfully wrote JSON to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing JSON to {filename}: {e}")
        return False

# Working with CSV files
def read_csv_file(filename: str) -> Optional[List[Dict]]:
    """Read CSV file and return list of dictionaries."""
    try:
        file_path = Path(filename)
        data = []
        
        with open(file_path, 'r', encoding='utf-8', newline='') as file:
            reader = csv.DictReader(file)
            for row in reader:
                data.append(row)
        
        logger.info(f"Successfully read {len(data)} rows from {file_path.name}")
        return data
    except Exception as e:
        logger.error(f"Error reading CSV {filename}: {e}")
        return None

def write_csv_file(filename: str, data: List[Dict], fieldnames: List[str]) -> bool:
    """Write data to CSV file."""
    try:
        file_path = Path(filename)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        
        with open(file_path, 'w', encoding='utf-8', newline='') as file:
            writer = csv.DictWriter(file, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(data)
        
        logger.info(f"Successfully wrote {len(data)} rows to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing CSV to {filename}: {e}")
        return False

# Working with paths using pathlib
def analyze_directory(directory: str) -> Dict[str, object]:
    """Analyze directory contents using pathlib."""
    try:
        dir_path = Path(directory)
        if not dir_path.exists():
            return {"error": f"Directory {directory} does not exist"}
        
        if not dir_path.is_dir():
            return {"error": f"{directory} is not a directory"}
        
        files = []
        total_size = 0
        
        for file_path in dir_path.iterdir():
            if file_path.is_file():
                stat = file_path.stat()  # Query the file system once per file
                files.append({
                    "name": file_path.name,
                    "size": stat.st_size,
                    "extension": file_path.suffix,
                    "modified": stat.st_mtime
                })
                total_size += stat.st_size
        
        return {
            "directory": str(dir_path),
            "file_count": len(files),
            "total_size": total_size,
            "files": files
        }
    except Exception as e:
        return {"error": f"Error analyzing directory: {e}"}

# Processing lines from large files
def process_large_file(filename: str, line_processor=None) -> int:
    """Process a large file line by line to save memory."""
    if line_processor is None:
        line_processor = lambda line, num: print(f"Line {num}: {line.strip()}")
    
    try:
        file_path = Path(filename)
        line_count = 0
        
        with open(file_path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                line_processor(line, line_num)
                line_count += 1
        
        logger.info(f"Processed {line_count} lines from {file_path.name}")
        return line_count
    except Exception as e:
        logger.error(f"Error processing file {filename}: {e}")
        return 0

# Example usage and demonstrations
def create_sample_files():
    """Create sample files for demonstration."""
    # Sample text file
    text_content = """This is a sample text file.
It contains multiple lines.
Each line demonstrates file handling capabilities."""
    write_text_file("sample_data/sample.txt", text_content)
    
    # Sample JSON file
    json_data = {
        "name": "Python Tutorial",
        "version": "3.12",
        "features": ["modern syntax", "type hints", "async support"],
        "author": {"name": "Alice", "email": "alice@example.com"}
    }
    write_json_file("sample_data/config.json", json_data)
    
    # Sample CSV file
    csv_data = [
        {"name": "Alice", "age": "30", "city": "New York"},
        {"name": "Bob", "age": "25", "city": "San Francisco"},
        {"name": "Charlie", "age": "35", "city": "Chicago"}
    ]
    write_csv_file("sample_data/users.csv", csv_data, ["name", "age", "city"])

if __name__ == "__main__":
    # Create sample files
    create_sample_files()
    
    # Read and display files
    text = read_text_file("sample_data/sample.txt")
    if text:
        print("Text file content:")
        print(text)
    
    json_data = read_json_file("sample_data/config.json")
    if json_data:
        print(f"JSON data: {json_data}")
    
    csv_data = read_csv_file("sample_data/users.csv")
    if csv_data:
        print(f"CSV data: {csv_data}")
    
    # Analyze directory
    analysis = analyze_directory("sample_data")
    print(f"Directory analysis: {analysis}")
    
    # Process file line by line
    def word_counter(line, line_num):
        words = len(line.split())
        print(f"Line {line_num} has {words} words")
    
    process_large_file("sample_data/sample.txt", word_counter)

5. Modern Python Deployment

Today’s Python applications can be deployed in many ways, from traditional web hosting to modern cloud platforms.

5.1. Popular Deployment Platforms

Cloud Platforms:

  • Heroku: Simple deployment with Git integration

  • Google Cloud Platform: Powerful infrastructure with App Engine and Cloud Run

  • AWS: Comprehensive services including Lambda, EC2, and Elastic Beanstalk

  • Microsoft Azure: Full-featured cloud platform with App Service

  • DigitalOcean: Developer-friendly with App Platform

Containerization:

  • Docker: Package applications with all dependencies

  • Kubernetes: Orchestrate containers at scale

5.2. Modern Web Frameworks

For web development, consider these popular Python frameworks:

  • FastAPI: Modern, fast API framework with automatic documentation

  • Django: Full-featured web framework with admin interface

  • Flask: Lightweight and flexible micro-framework

  • Streamlit: Quick data science web apps

5.3. Simple Deployment Example

Here’s how to create a basic web API with FastAPI:

# main.py
from fastapi import FastAPI

app = FastAPI(title="Python Tutorial API")

@app.get("/")
def read_root():
    return {"message": "Hello from Python!"}

@app.get("/crawl-status")
def crawl_status():
    return {"status": "Web crawler ready", "version": "1.0"}

Install dependencies:

pip install fastapi uvicorn

Run locally:

uvicorn main:app --reload

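Uvicorn serves the API locally, by default at http://127.0.0.1:8000, and FastAPI generates interactive documentation at /docs. The same application can be deployed to any platform that can run an ASGI server such as uvicorn.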

6. Building a Web Crawler

Web crawling is a common task in data science, SEO analysis, and content aggregation. Our web crawler will extract data from web pages using modern Python libraries.

6.1. Installation and Setup

First, install the required libraries:

pip install requests beautifulsoup4 lxml

6.2. Web Crawler Implementation

Create a file named web_crawler.py with the following content:

#!/usr/bin/env python3
"""
Modern web crawler using requests and BeautifulSoup.
Demonstrates best practices for web scraping in Python.
"""

import requests
from bs4 import BeautifulSoup, Tag
from typing import Dict, List, Optional, Set
from urllib.parse import urljoin, urlparse
import time
import logging
from dataclasses import dataclass
from pathlib import Path
import json

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

@dataclass
class CrawlResult:
    """Data structure for crawl results."""
    url: str
    title: str
    status_code: int
    links: List[str]
    text_content: str
    meta_description: str
    word_count: int
    crawl_time: float

class WebCrawler:
    """A respectful web crawler with rate limiting and error handling."""
    
    def __init__(self, delay: float = 1.0, max_retries: int = 3):
        """
        Initialize the web crawler.
        
        Args:
            delay: Delay between requests in seconds
            max_retries: Maximum number of retry attempts
        """
        self.delay = delay
        self.max_retries = max_retries
        self.session = requests.Session()
        
        # Set a reasonable user agent
        self.session.headers.update({
            'User-Agent': 'Python Web Crawler Tutorial Bot 1.0 (+https://example.com/bot)'
        })
        
        self.crawled_urls: Set[str] = set()
        self.results: List[CrawlResult] = []
    
    def crawl_page(self, url: str) -> Optional[CrawlResult]:
        """
        Crawl a single web page and extract information.
        
        Args:
            url: The URL to crawl
            
        Returns:
            CrawlResult object or None if crawl failed
        """
        if url in self.crawled_urls:
            logger.info(f"Already crawled: {url}")
            return None
        
        logger.info(f"Crawling: {url}")
        start_time = time.time()
        
        try:
            # Make request with retry logic
            response = self._make_request(url)
            if not response:
                return None
            
            # Parse HTML content
            soup = BeautifulSoup(response.content, 'html.parser')
            
            # Extract page information
            title = self._extract_title(soup)
            links = self._extract_links(soup, url)
            text_content = self._extract_text(soup)
            meta_description = self._extract_meta_description(soup)
            word_count = len(text_content.split())
            
            crawl_time = time.time() - start_time
            
            result = CrawlResult(
                url=url,
                title=title,
                status_code=response.status_code,
                links=links,
                text_content=text_content[:500] + "..." if len(text_content) > 500 else text_content,
                meta_description=meta_description,
                word_count=word_count,
                crawl_time=crawl_time
            )
            
            self.crawled_urls.add(url)
            self.results.append(result)
            
            # Respect the website with delay
            time.sleep(self.delay)
            
            return result
            
        except Exception as e:
            logger.error(f"Error crawling {url}: {e}")
            return None
    
    def _make_request(self, url: str) -> Optional[requests.Response]:
        """Make HTTP request with retry logic."""
        for attempt in range(self.max_retries):
            try:
                response = self.session.get(url, timeout=10)
                response.raise_for_status()
                return response
                
            except requests.exceptions.RequestException as e:
                logger.warning(f"Attempt {attempt + 1} failed for {url}: {e}")
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                else:
                    logger.error(f"All retry attempts failed for {url}")
                    
        return None
    
    def _extract_title(self, soup: BeautifulSoup) -> str:
        """Extract page title."""
        title_tag = soup.find('title')
        if title_tag and isinstance(title_tag, Tag):
            return title_tag.get_text().strip()
        return "No title found"
    
    def _extract_links(self, soup: BeautifulSoup, base_url: str) -> List[str]:
        """Extract all links from the page."""
        links = []
        for link in soup.find_all('a', href=True):
            if isinstance(link, Tag):
                href = link['href']
                absolute_url = urljoin(base_url, href)
                links.append(absolute_url)
        return links
    
    def _extract_text(self, soup: BeautifulSoup) -> str:
        """Extract visible text content from the page."""
        # Remove script and style elements
        for script in soup(["script", "style"]):
            script.decompose()
        
        # Get text and clean it up
        text = soup.get_text()
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = ' '.join(chunk for chunk in chunks if chunk)
        
        return text
    
    def _extract_meta_description(self, soup: BeautifulSoup) -> str:
        """Extract meta description."""
        meta_desc = soup.find('meta', attrs={'name': 'description'})
        if meta_desc and isinstance(meta_desc, Tag):
            return meta_desc.get('content', '')
        return ""
    
    def crawl_multiple_pages(self, urls: List[str]) -> List[CrawlResult]:
        """Crawl multiple pages and return results."""
        logger.info(f"Starting crawl of {len(urls)} pages")
        
        results = []
        for url in urls:
            result = self.crawl_page(url)
            if result:
                results.append(result)
        
        logger.info(f"Crawling completed. Successfully crawled {len(results)} pages")
        return results
    
    def save_results(self, filename: str = "crawl_results.json") -> bool:
        """Save crawl results to JSON file."""
        try:
            # Convert dataclass objects to dictionaries
            results_data = [
                {
                    'url': result.url,
                    'title': result.title,
                    'status_code': result.status_code,
                    'links_count': len(result.links),
                    'first_10_links': result.links[:10],  # Save only first 10 links
                    'text_preview': result.text_content,
                    'meta_description': result.meta_description,
                    'word_count': result.word_count,
                    'crawl_time': result.crawl_time
                }
                for result in self.results
            ]
            
            with open(filename, 'w', encoding='utf-8') as f:
                json.dump(results_data, f, indent=2, ensure_ascii=False)
            
            logger.info(f"Results saved to {filename}")
            return True
            
        except Exception as e:
            logger.error(f"Error saving results: {e}")
            return False
    
    def get_statistics(self) -> Dict[str, object]:
        """Get crawling statistics."""
        if not self.results:
            return {"error": "No crawling results available"}
        
        total_words = sum(result.word_count for result in self.results)
        avg_crawl_time = sum(result.crawl_time for result in self.results) / len(self.results)
        total_links = sum(len(result.links) for result in self.results)
        
        return {
            "pages_crawled": len(self.results),
            "total_words": total_words,
            "average_words_per_page": total_words // len(self.results),
            "total_links_found": total_links,
            "average_crawl_time": round(avg_crawl_time, 2),
            "successful_crawls": len(self.results),
            "domains_crawled": len(set(urlparse(result.url).netloc for result in self.results))
        }

if __name__ == "__main__":
    # Example usage
    crawler = WebCrawler(delay=1.0)
    
    # Example URLs (using HTTP examples that are safe to crawl)
    test_urls = [
        "http://httpbin.org/html",
        "http://httpbin.org/robots.txt",
    ]
    
    # Crawl the pages
    results = crawler.crawl_multiple_pages(test_urls)
    
    # Display results
    for result in results:
        print(f"\n{'='*60}")
        print(f"URL: {result.url}")
        print(f"Title: {result.title}")
        print(f"Status: {result.status_code}")
        print(f"Word Count: {result.word_count}")
        print(f"Links Found: {len(result.links)}")
        print(f"Crawl Time: {result.crawl_time:.2f}s")
        print(f"Text Preview: {result.text_content[:100]}...")
    
    # Show statistics
    stats = crawler.get_statistics()
    print(f"\n{'='*60}")
    print("CRAWLING STATISTICS")
    print(f"{'='*60}")
    for key, value in stats.items():
        print(f"{key.replace('_', ' ').title()}: {value}")
    
    # Save results
    crawler.save_results("web_crawl_results.json")

This web crawler demonstrates:

  • HTTP requests with proper error handling

  • HTML parsing with BeautifulSoup

  • Rate limiting to be respectful to websites

  • Data extraction and structuring

  • Robustness with retry logic
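
The crawler above waits between requests, but it does not consult a site's robots.txt file. The following is a minimal sketch of such a check using the standard library's urllib.robotparser. The names USER_AGENT, build_robot_parser, is_allowed, and sample_rules are illustrative assumptions and not part of the crawler; the rules are parsed from a string so that the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Assumed user agent name, chosen for illustration only
USER_AGENT = "Python Web Crawler Tutorial Bot 1.0"


def build_robot_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, used only for this demonstration
sample_rules = """\
User-agent: *
Disallow: /private/
"""

parser = build_robot_parser(sample_rules)
print(is_allowed(parser, "https://example.com/index.html"))
print(is_allowed(parser, "https://example.com/private/data.html"))
```

Against a real site, you could call set_url() with the site's robots.txt address followed by read() to download the rules, and then skip any URL for which can_fetch() returns False before requesting it.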

6.3. Using the Web Crawler

#!/usr/bin/env python3
"""
Example usage of the web crawler.
"""

from web_crawler import WebCrawler
import logging

# Configure logging to see crawler activity
logging.basicConfig(level=logging.INFO)

def main():
    """Demonstrate web crawler usage."""
    print("🕷️ Web Crawler Example")
    print("=" * 50)
    
    # Create crawler with 2-second delay between requests
    crawler = WebCrawler(delay=2.0, max_retries=2)
    
    # URLs to crawl (using safe test websites)
    urls_to_crawl = [
        "http://httpbin.org/html",
        "http://httpbin.org/robots.txt",
        "https://jsonplaceholder.typicode.com/",  # API service with HTML
    ]
    
    print(f"Crawling {len(urls_to_crawl)} URLs...")
    
    # Perform the crawl
    results = crawler.crawl_multiple_pages(urls_to_crawl)
    
    # Display detailed results
    print(f"\n✅ Successfully crawled {len(results)} pages\n")
    
    for i, result in enumerate(results, 1):
        print(f"📄 Page {i}: {result.url}")
        print(f"   Title: {result.title}")
        print(f"   Status: {result.status_code}")
        print(f"   Words: {result.word_count}")
        print(f"   Links: {len(result.links)}")
        print(f"   Time: {result.crawl_time:.2f}s")
        if result.meta_description:
            print(f"   Description: {result.meta_description}")
        print(f"   Preview: {result.text_content[:100]}...\n")
    
    # Show overall statistics
    stats = crawler.get_statistics()
    print("📊 Crawling Statistics:")
    print("-" * 30)
    for key, value in stats.items():
        print(f"{key.replace('_', ' ').title()}: {value}")
    
    # Save results to file
    saved = crawler.save_results("example_crawl_results.json")
    if saved:
        print("\n💾 Results saved to example_crawl_results.json")

if __name__ == "__main__":
    main()
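
Each CrawlResult stores every link found on a page, including external sites and non-HTTP schemes. If you want to feed those links back into the crawler, it usually helps to filter them first. The sketch below is one possible approach using only urllib.parse; filter_internal_links and the sample URLs are assumptions made for illustration, not part of the crawler.

```python
from urllib.parse import urldefrag, urlparse


def filter_internal_links(links: list[str], start_url: str) -> list[str]:
    """Keep unique http(s) links on the same host as start_url, without fragments."""
    start_host = urlparse(start_url).netloc
    seen: set[str] = set()
    internal: list[str] = []
    for link in links:
        clean, _fragment = urldefrag(link)  # drop "#section" parts
        parsed = urlparse(clean)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parsed.netloc != start_host:
            continue  # skip links to other hosts
        if clean not in seen:
            seen.add(clean)
            internal.append(clean)
    return internal


# Hypothetical links, as they might appear in CrawlResult.links
sample_links = [
    "https://example.com/about",
    "https://example.com/about#team",
    "mailto:info@example.com",
    "https://other.example.org/",
]
internal_links = filter_internal_links(sample_links, "https://example.com/")
print(internal_links)
```

Stripping fragments before the duplicate check means that two links to different anchors on the same page count as one page.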

6.4. Best Practices Demonstrated

The web crawler example showcases important Python development practices:

  • Type hints: Make code more maintainable and self-documenting

  • Error handling: Graceful failure handling with informative messages

  • Logging: Proper logging for debugging and monitoring

  • Modular design: Functions and classes with single responsibilities

  • Documentation: Clear docstrings and comments

  • External libraries: Leveraging the Python ecosystem

  • Resource management: Proper cleanup and context managers

6.5. Extending the Web Crawler

Consider these enhancements to deepen your learning:

  • Add support for different content types (PDF, images)

  • Implement concurrent crawling with asyncio

  • Add data storage to databases or files

  • Create a web interface with Flask or FastAPI

AsciiDoc Validation with LanguageTool

LanguageTool provides grammar, spelling, and style checking for text documents. In this section, we'll build a tool that checks AsciiDoc files for writing issues.

6.6. Installation and Setup

Install the required library:

pip install language-tool-python

Note: On first use, the library downloads the LanguageTool server, which requires a Java runtime on your machine.

6.7. AsciiDoc Validator Implementation

Create a file named asciidoc_validator.py with the following content:

#!/usr/bin/env python3
"""
AsciiDoc validator using LanguageTools to check grammar and style.
Demonstrates file processing and external API integration.
"""

import language_tool_python
from pathlib import Path
from typing import List, Dict, Optional, NamedTuple
import re
import logging
import json
from dataclasses import dataclass
import argparse

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class ValidationIssue(NamedTuple):
    """Structure for validation issues."""
    line_number: int
    column: int
    message: str
    rule_id: str
    suggestions: List[str]
    context: str

@dataclass
class FileReport:
    """Report for a single file."""
    file_path: str
    total_lines: int
    issues_found: int
    issues: List[ValidationIssue]
    processing_time: float

class AsciiDocValidator:
    """Validator for AsciiDoc files using LanguageTools."""
    
    def __init__(self, language: str = 'en-US'):
        """
        Initialize the validator.
        
        Args:
            language: Language code for LanguageTools (default: en-US)
        """
        self.language = language
        self.tool: Optional[language_tool_python.LanguageTool] = None
        self.reports: List[FileReport] = []
        
        # Patterns to ignore in AsciiDoc files (matched at the start of a line)
        self.ignore_patterns = [
            r'include::', # Include directives
            r'image::', # Image directives
            r'\[source,', # Source code block attributes
            r'----', # Code block delimiters
            r'====', # Example block delimiters
            r'^\|', # Table rows
            r'^\+', # List continuation
            r'^\*\*\*', # Section breaks
            r'^=+\s', # Section titles (=, ==, ===, ...)
            r'^//', # Comments
            r'^\#', # Markdown-style headers
            r'^\[', # Attribute definitions
            r'^\:', # Attribute assignments
        ]
    
    def _initialize_language_tool(self):
        """Initialize LanguageTools (lazy loading)."""
        if self.tool is None:
            logger.info("Initializing LanguageTools... (this may take a moment)")
            try:
                self.tool = language_tool_python.LanguageTool(self.language)
                logger.info("LanguageTools initialized successfully")
            except Exception as e:
                logger.error(f"Failed to initialize LanguageTools: {e}")
                raise
    
    def _should_check_line(self, line: str) -> bool:
        """Determine if a line should be checked for language issues."""
        line_stripped = line.strip()
        
        # Skip empty lines
        if not line_stripped:
            return False
        
        # Check against ignore patterns
        for pattern in self.ignore_patterns:
            if re.match(pattern, line_stripped):
                return False
        
        return True
    
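    def _extract_text_content(self, file_path: Path) -> List[tuple[int, str]]:
        """Extract (line number, text) pairs for the lines that should be checked."""
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                lines = file.readlines()
            
            # Filter lines that should be checked, skipping everything
            # between ---- delimiters (listing blocks contain code, not prose)
            text_lines = []
            in_listing_block = False
            for line_num, line in enumerate(lines, 1):
                if line.strip().startswith('----'):
                    in_listing_block = not in_listing_block
                    continue
                if not in_listing_block and self._should_check_line(line):
                    text_lines.append((line_num, line.strip()))
            
            return text_lines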
            
        except Exception as e:
            logger.error(f"Error reading file {file_path}: {e}")
            return []
    
    def validate_file(self, file_path: Path) -> FileReport:
        """
        Validate a single AsciiDoc file.
        
        Args:
            file_path: Path to the AsciiDoc file
            
        Returns:
            FileReport with validation results
        """
        import time
        start_time = time.time()
        
        logger.info(f"Validating: {file_path}")
        
        # Initialize LanguageTools if needed
        self._initialize_language_tool()
        
        # Extract text content
        text_lines = self._extract_text_content(file_path)
        
        if not text_lines:
            processing_time = time.time() - start_time
            return FileReport(
                file_path=str(file_path),
                total_lines=0,
                issues_found=0,
                issues=[],
                processing_time=processing_time
            )
        
        # Check each line for issues
        all_issues = []
        
        for line_number, text in text_lines:
            try:
                matches = self.tool.check(text)
                
                for match in matches:
                    issue = ValidationIssue(
                        line_number=line_number,
                        column=match.offset,
                        message=match.message,
                        rule_id=match.ruleId,
                        suggestions=[s for s in match.replacements[:3]],  # First 3 suggestions
                        context=text[max(0, match.offset-10):match.offset+match.errorLength+10]
                    )
                    all_issues.append(issue)
                    
            except Exception as e:
                logger.warning(f"Error checking line {line_number}: {e}")
        
        processing_time = time.time() - start_time
        
        report = FileReport(
            file_path=str(file_path),
            total_lines=len(text_lines),
            issues_found=len(all_issues),
            issues=all_issues,
            processing_time=processing_time
        )
        
        self.reports.append(report)
        return report
    
    def validate_directory(self, directory: Path, pattern: str = "*.adoc") -> List[FileReport]:
        """
        Validate all AsciiDoc files in a directory.
        
        Args:
            directory: Directory to scan
            pattern: File pattern to match (default: *.adoc)
            
        Returns:
            List of FileReport objects
        """
        logger.info(f"Scanning directory: {directory}")
        
        if not directory.exists():
            logger.error(f"Directory not found: {directory}")
            return []
        
        # Find all matching files
        adoc_files = list(directory.glob(pattern))
        if not adoc_files:
            logger.warning(f"No {pattern} files found in {directory}")
            return []
        
        logger.info(f"Found {len(adoc_files)} files to validate")
        
        # Validate each file
        reports = []
        for file_path in adoc_files:
            report = self.validate_file(file_path)
            reports.append(report)
        
        return reports
    
    def generate_report(self, output_file: Optional[str] = None) -> Dict:
        """Generate summary report of all validations."""
        if not self.reports:
            return {"error": "No validation reports available"}
        
        total_files = len(self.reports)
        total_issues = sum(report.issues_found for report in self.reports)
        files_with_issues = sum(1 for report in self.reports if report.issues_found > 0)
        
        # Group issues by rule ID
        issue_types = {}
        for report in self.reports:
            for issue in report.issues:
                rule_id = issue.rule_id
                if rule_id not in issue_types:
                    issue_types[rule_id] = {"count": 0, "message": issue.message}
                issue_types[rule_id]["count"] += 1
        
        # Create summary report
        summary = {
            "validation_summary": {
                "total_files": total_files,
                "files_with_issues": files_with_issues,
                "total_issues": total_issues,
                "average_issues_per_file": round(total_issues / total_files, 2),
            },
            "issue_breakdown": issue_types,
            "file_reports": [
                {
                    "file": report.file_path,
                    "lines_checked": report.total_lines,
                    "issues": report.issues_found,
                    "processing_time": round(report.processing_time, 2)
                }
                for report in self.reports
            ]
        }
        
        # Save to file if requested
        if output_file:
            try:
                with open(output_file, 'w', encoding='utf-8') as f:
                    json.dump(summary, f, indent=2, ensure_ascii=False)
                logger.info(f"Report saved to {output_file}")
            except Exception as e:
                logger.error(f"Error saving report: {e}")
        
        return summary
    
    def print_detailed_report(self):
        """Print detailed validation report to console."""
        if not self.reports:
            print("No validation reports available.")
            return
        
        print("\n" + "="*60)
        print("ASCIIDOC VALIDATION REPORT")
        print("="*60)
        
        total_issues = sum(report.issues_found for report in self.reports)
        files_with_issues = [r for r in self.reports if r.issues_found > 0]
        
        print(f"Files scanned: {len(self.reports)}")
        print(f"Files with issues: {len(files_with_issues)}")
        print(f"Total issues found: {total_issues}")
        print("\n" + "-"*60)
        
        # Show issues by file
        for report in self.reports:
            if report.issues_found > 0:
                print(f"\n📄 {Path(report.file_path).name}")
                print(f"   Issues: {report.issues_found}")
                
                for issue in report.issues[:5]:  # Show first 5 issues
                    print(f"   Line {issue.line_number}: {issue.message}")
                    if issue.suggestions:
                        suggestions = ", ".join(issue.suggestions)
                        print(f"   Suggestions: {suggestions}")
                
                if len(report.issues) > 5:
                    print(f"   ... and {len(report.issues) - 5} more issues")

def main():
    """Command-line interface for the validator."""
    parser = argparse.ArgumentParser(description="Validate AsciiDoc files using LanguageTools")
    parser.add_argument("path", help="File or directory path to validate")
    parser.add_argument("--language", default="en-US", help="Language code (default: en-US)")
    parser.add_argument("--output", help="Output file for JSON report")
    parser.add_argument("--pattern", default="*.adoc", help="File pattern for directory scanning")
    
    args = parser.parse_args()
    
    validator = AsciiDocValidator(language=args.language)
    path = Path(args.path)
    
    if path.is_file():
        # Validate single file
        validator.validate_file(path)
    elif path.is_dir():
        # Validate directory
        validator.validate_directory(path, args.pattern)
    else:
        print(f"Error: Path {path} does not exist")
        return
    
    # Generate and display report
    validator.print_detailed_report()
    
    if args.output:
        validator.generate_report(args.output)

if __name__ == "__main__":
    main()

This validator demonstrates:

  • Working with file system paths

  • Text processing and filtering

  • Integration with external tools

  • Report generation

  • Command-line interface design

6.8. Using the AsciiDoc Validator

Create the following file named Test.adoc.

include::res/practical/Test.adoc

Run the program against a folder that contains AsciiDoc (*.adoc) files to validate them:

python asciidoc_validator.py ~/git/content/TestContent

6.9. Ignoring Specific Words

When working with technical documentation, you often have specialized terms, product names, or abbreviations that should be ignored by the spell checker. You can create an external file to maintain a list of words to exclude from spell checking.

6.9.1. Creating an Ignore List

Create a file named ignored_words.txt with one word per line:

# Technical terms
JFace
SWT
OSGi
Maven
Tycho
IDE
APIs
AsciiDoc
AsciiDoctor

# Company and product names
vogella
Eclipse
IntelliJ
VSCode

# Programming terms
foreach
classpath
runtime
workflow

6.9.2. Updated Validator Implementation

Here’s how to modify the validator to use the ignore list:

def load_ignored_words(file_path: str) -> set[str]:
    """Load words to ignore from a file (one word per line, # starts a comment)."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            # Read lines, strip whitespace, and filter out comments and empty lines
            return {line.strip() for line in f
                    if line.strip() and not line.strip().startswith('#')}
    except FileNotFoundError:
        print(f"Warning: Ignore file {file_path} not found. No words will be ignored.")
        return set()

def is_valid_word(word: str, ignored_words: set[str]) -> bool:
    """Return True if the word is not on the ignore list (case-insensitive)."""
    return word.lower() not in {w.lower() for w in ignored_words}

# In validate_file(), load the list once before the loop over text_lines:
ignored_words = load_ignored_words('ignored_words.txt')

# Then, at the top of the `for match in matches:` loop, skip ignored words:
#     word = text[match.offset:match.offset + match.errorLength]
#     if not is_valid_word(word, ignored_words):
#         continue  # Skip this match

6.9.3. Usage with Ignored Words

Run the validator with the ignore list:

# The ignored_words.txt file will be loaded automatically
python asciidoc_validator.py ~/git/content/TestContent

The validator will now skip any words found in the ignore list. This is particularly useful for:

  • Technical terms (e.g., JFace, OSGi)

  • Product names (e.g., Eclipse, IntelliJ)

  • Programming terminology

  • Company names and trademarks

Keep your ignored_words.txt under version control to share it with your team and maintain consistency across your documentation.
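
To see how the ignore list interacts with the offsets that a checker reports, here is a small self-contained sketch. FakeMatch is a hypothetical stand-in that carries only the offset and errorLength fields used by the validator above, so the example runs without a LanguageTool installation. The helper names flagged_text and drop_ignored are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeMatch:
    """Stand-in for a checker match; only the two fields used here."""
    offset: int
    errorLength: int


def flagged_text(text: str, match: FakeMatch) -> str:
    """Return the part of the line that the match points at."""
    return text[match.offset:match.offset + match.errorLength]


def drop_ignored(text: str, matches: list[FakeMatch], ignored_words: set[str]) -> list[FakeMatch]:
    """Keep only matches whose flagged text is not on the ignore list."""
    ignored_lower = {word.lower() for word in ignored_words}  # built once per call
    return [m for m in matches if flagged_text(text, m).lower() not in ignored_lower]


line = "The OSGi runtime loads the bundel."
matches = [
    FakeMatch(offset=4, errorLength=4),   # points at "OSGi"
    FakeMatch(offset=27, errorLength=6),  # points at "bundel"
]
remaining = drop_ignored(line, matches, {"OSGi", "JFace"})
remaining_words = [flagged_text(line, m) for m in remaining]
print(remaining_words)
```

Building the lowercase set once per call, rather than once per word, keeps the lookup cheap when a line produces many matches.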

6.10. Best Practices Demonstrated

The validator example showcases important Python development practices:

  • Type hints: Make code more maintainable and self-documenting

  • Error handling: Graceful failure handling with informative messages

  • Logging: Proper logging for debugging and monitoring

  • Modular design: Functions and classes with single responsibilities

  • Documentation: Clear docstrings and comments

  • External libraries: Leveraging the Python ecosystem

  • Resource management: Proper cleanup and context managers

6.11. Extending the Validator

Consider these enhancements to deepen your learning:

  • Support for multiple document formats

  • Integration with CI/CD pipelines

  • Custom rule definitions

  • Batch processing of multiple directories

  • HTML report generation
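
As a starting point for the HTML report idea, the following sketch renders a summary dictionary as a minimal HTML page. It assumes the same keys that generate_report() returns (validation_summary and file_reports); render_html_report and sample_summary are hypothetical names and data introduced only for this example.

```python
from html import escape


def render_html_report(summary: dict) -> str:
    """Render a validation summary as a minimal HTML page."""
    totals = summary["validation_summary"]
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(str(report["file"])),  # escape file names before embedding them
            report["lines_checked"],
            report["issues"],
        )
        for report in summary["file_reports"]
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Validation Report</title></head>\n'
        "<body>\n"
        "<h1>Validation Report</h1>\n"
        f"<p>Files: {totals['total_files']}, issues: {totals['total_issues']}</p>\n"
        "<table>\n"
        "<tr><th>File</th><th>Lines checked</th><th>Issues</th></tr>\n"
        f"{rows}\n"
        "</table>\n"
        "</body></html>\n"
    )


# Hypothetical summary, shaped like the output of generate_report()
sample_summary = {
    "validation_summary": {"total_files": 1, "total_issues": 2},
    "file_reports": [
        {"file": "docs/<intro>.adoc", "lines_checked": 10, "issues": 2},
    ],
}
html_page = render_html_report(sample_summary)
print(html_page)
```

Escaping every value that comes from the file system or from checked text avoids broken markup when a name contains characters such as < or &.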

7. Links and Literature