Learn Python through practical examples. This tutorial covers modern Python development with Python 3.12+ and walks you through building real-world applications, including a web crawler and an AsciiDoc document validation tool.
1. Overview
1.1. What is Python
Python is a high-level, interpreted programming language renowned for its simplicity, readability, and powerful capabilities. Created by Guido van Rossum and first released in 1991, Python has become one of the world’s most popular programming languages.
The name Python comes from the BBC comedy series Monty Python’s Flying Circus, reflecting the language’s emphasis on fun and accessibility. The reference implementation, CPython, compiles source code into bytecode, which is then executed by the Python virtual machine.
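To see the bytecode step for yourself, the standard-library dis module can disassemble a function. The following is a minimal sketch; the add function is a made-up example, and the exact instructions printed differ between Python versions.

```python
import dis


def add(a, b):
    """Add two values (hypothetical example function)."""
    return a + b


# The compiled bytecode is stored on the function's code object.
bytecode = add.__code__.co_code
print(type(bytecode).__name__, len(bytecode))

# dis.dis prints a human-readable listing of the instructions.
# Opcode names and counts vary between Python versions.
dis.dis(add)
```

The listing is mainly useful for curiosity and debugging; you rarely need to read bytecode to write Python.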
1.1.1. Why Python is Popular
Python’s popularity stems from several key strengths:
-
Readable syntax: Code looks almost like natural English
-
Versatile applications: Web development, data science, automation, AI/ML, and more
-
Rich ecosystem: Extensive standard library and third-party packages via PyPI
-
Cross-platform: Runs on Windows, macOS, Linux, and many other platforms
-
Strong community: Excellent documentation, tutorials, and community support
-
Rapid development: Faster to write and maintain than many other languages
1.2. Modern Python Features (3.12+)
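Python continues to evolve. The features below arrived over several releases (type hints and async/await in 3.5, f-strings in 3.6, pattern matching in 3.10) and are all available in 3.12: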
-
F-strings: Modern string formatting with embedded expressions
-
Type hints: Optional static typing for better code documentation
-
Async/await: Built-in support for asynchronous programming
-
Pattern matching: Structural pattern matching (match/case statements)
-
Performance improvements: Faster startup times and execution
-
Better error messages: More helpful debugging information
1.3. Real-World Applications
In this tutorial, you’ll build practical applications that demonstrate Python’s capabilities:
-
Web Crawler: Extract and process data from websites using requests and BeautifulSoup
-
Document Validator: Check AsciiDoc files for formatting issues using the LanguageTool API
-
Data Processing: Handle files, APIs, and structured data with modern Python techniques
1.4. About this tutorial
This tutorial provides a hands-on approach to learning Python through practical examples. You’ll start with Python fundamentals, then progress to building real applications. By the end, you’ll have the skills to create your own Python projects and understand modern development practices.
2. Setting Up Your Python Development Environment
2.1. Installing Python
Python 3.12+ is recommended for modern development. Many Linux distributions ship with Python pre-installed, but the bundled version may be older, so check which version you have before you start.
2.1.1. Windows
Download Python from https://www.python.org/downloads/ and run the installer. Important: Check "Add Python to PATH" during installation.
# Verify installation
python --version
# or
python3 --version
2.1.2. macOS
Use Homebrew (recommended) or download from python.org:
# Install Homebrew first, then:
brew install python
# Verify
python3 --version
2.1.3. Linux (Ubuntu/Debian)
sudo apt update
sudo apt install python3 python3-pip python3-venv
# Verify
python3 --version
2.2. Virtual Environments
Virtual environments isolate your project dependencies. Always use virtual environments for Python projects.
# Create a virtual environment
python3 -m venv myproject_env
# Activate it
# On Windows:
myproject_env\Scripts\activate
# On macOS/Linux:
source myproject_env/bin/activate
# Install packages
pip install requests beautifulsoup4 language-tool-python
# Deactivate when done
deactivate
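If you are unsure whether the environment is active, you can ask the interpreter. This is a small sketch that relies on the fact that, inside an environment created by venv, sys.prefix differs from sys.base_prefix; the helper name is made up for this example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtual_environment()}")
```

Run it once before and once after activating myproject_env to see the interpreter path change.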
2.3. Recommended Development Tools
While you can use any text editor, these tools enhance productivity:
-
Visual Studio Code: Free, powerful, with excellent Python extension
-
PyCharm: Full-featured IDE (Community edition is free)
-
Jupyter Notebooks: Great for data analysis and learning
-
Terminal/Command Line: Essential for running Python scripts
2.3.1. Visual Studio Code Setup
-
Download from https://code.visualstudio.com/
-
Install the Python extension by Microsoft
-
Open a Python file - VS Code will help you select the right interpreter
2.4. Package Management with pip
pip is Python’s package installer. Use requirements.txt files to manage dependencies:
# Install a package
pip install requests
# Install from requirements file
pip install -r requirements.txt
# List installed packages
pip list
# Generate requirements file
pip freeze > requirements.txt
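You can also check installed packages from inside Python with the standard-library importlib.metadata module. The sketch below looks up the packages installed earlier in this tutorial; the installed_version helper is a hypothetical name, and the output depends on what is installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ["requests", "beautifulsoup4", "language-tool-python"]:
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

Note that the lookup uses the distribution name from PyPI (for example beautifulsoup4), not the import name (bs4).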
3. Your First Python Program
Let’s start with a simple but practical Python program that demonstrates modern syntax and best practices.
3.1. Hello, World! - Modern Style
Create a new file called hello_world.py:
#!/usr/bin/env python3
"""
A simple Hello World program demonstrating modern Python syntax.
"""

def greet(name: str) -> str:
    """Return a personalized greeting."""
    return f"Hello, {name}! Welcome to Python programming."

def main():
    """Main program entry point."""
    # Get user input
    user_name = input("What's your name? ")
    # Create and display greeting
    greeting = greet(user_name)
    print(greeting)
    # Show some Python features
    languages = ["Python", "Java", "JavaScript", "Go"]
    print("\nHere are some popular programming languages:")
    for i, lang in enumerate(languages, 1):
        print(f"{i}. {lang}")

if __name__ == "__main__":
    main()
Run it from your terminal:
python3 hello_world.py
3.2. A More Practical Example
Let’s create a simple file analyzer that demonstrates Python’s strengths:
#!/usr/bin/env python3
"""
File analyzer that demonstrates modern Python features.
"""
from pathlib import Path
from typing import Any, Dict, List

def analyze_file(file_path: Path) -> Dict[str, Any]:
    """Analyze a text file and return statistics."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        # splitlines() does not count a trailing newline as an extra line
        lines = content.splitlines()
        return {
            'filename': file_path.name,
            'size_bytes': file_path.stat().st_size,
            'line_count': len(lines),
            'word_count': len(content.split()),
            'char_count': len(content),
            'extension': file_path.suffix
        }
    except FileNotFoundError:
        return {'error': f'File not found: {file_path}'}
    except Exception as e:
        return {'error': f'Error reading file: {e}'}

def analyze_directory(directory: str) -> List[Dict[str, Any]]:
    """Analyze all text files in a directory."""
    path = Path(directory)
    results = []
    if not path.exists():
        return [{'error': f'Directory not found: {directory}'}]
    # Find text files
    text_extensions = {'.txt', '.py', '.md', '.adoc', '.rst'}
    for file_path in path.iterdir():
        if file_path.is_file() and file_path.suffix in text_extensions:
            results.append(analyze_file(file_path))
    return results

def main():
    """Main program demonstrating file analysis."""
    print("🔍 File Analyzer - Modern Python Example")
    print("=" * 40)
    # Analyze current directory
    directory = "."
    results = analyze_directory(directory)
    if not results:
        print("No text files found in current directory.")
        return
    # Display results
    total_files = len(results)
    total_lines = sum(r.get('line_count', 0) for r in results if 'error' not in r)
    print(f"\nFound {total_files} text files:")
    print(f"Total lines: {total_lines:,}")
    print("\nFile details:")
    print("-" * 60)
    for result in results:
        if 'error' in result:
            print(f"❌ {result['error']}")
        else:
            print(f"📄 {result['filename']:20} | "
                  f"{result['line_count']:4} lines | "
                  f"{result['size_bytes']:6} bytes")

if __name__ == "__main__":
    main()
This example shows:
-
Modern f-string formatting
-
Type hints for better code documentation
-
Exception handling
-
Working with files and paths
-
Using Python’s standard library
3.3. Interactive Python Development
Python includes an interactive interpreter perfect for experimentation:
# Start interactive Python
python3
# Try some expressions
>>> name = "Python"
>>> print(f"Hello, {name}!")
Hello, Python!
>>> numbers = [1, 2, 3, 4, 5]
>>> sum(numbers)
15
>>> exit()
3.4. Organizing Your First Project
For real projects, use this structure:
my_python_project/
├── requirements.txt      # Project dependencies
├── README.md             # Project documentation
├── main.py               # Entry point
├── src/                  # Source code
│   ├── __init__.py
│   └── my_module.py
└── tests/                # Test files
    ├── __init__.py
    └── test_my_module.py
4. Python Programming Fundamentals
4.1. Python Syntax Overview
Python 3.12+ includes many features that make code more readable and maintainable. Let’s explore the most important concepts.
4.2. Variables and Type Hints
Python is dynamically typed, but you can add type hints for better code documentation:
#!/usr/bin/env python3
"""
Modern Python variables and type hints examples.
"""
from typing import List, Dict, Optional
# Basic variables with type hints
name: str = "Alice"
age: int = 30
height: float = 5.6
is_student: bool = True
# Collections with type hints
numbers: List[int] = [1, 2, 3, 4, 5]
scores: Dict[str, int] = {"math": 95, "science": 88, "history": 92}
middle_name: Optional[str] = None # Can be None or string
# Dynamic typing still works
dynamic_var = "starts as string"
dynamic_var = 42 # now it's an integer
dynamic_var = ["now", "it's", "a", "list"]
# Multiple assignment
x, y, z = 1, 2, 3
first, *rest = [1, 2, 3, 4, 5] # first=1, rest=[2,3,4,5]
# Constants (by convention, use UPPER_CASE)
MAX_CONNECTIONS: int = 100
API_URL: str = "https://api.example.com"
print(f"Hello, {name}! You are {age} years old.")
print(f"Your scores: {scores}")
print(f"First number: {first}, rest: {rest}")
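One point worth stressing is that type hints are documentation, not enforcement: Python stores them as metadata and does not check them at runtime. The sketch below shows this with a made-up double function; a separate static checker such as mypy would flag the second call.

```python
def double(value: int) -> int:
    """Return the value multiplied by two."""
    return value * 2


# The annotations are stored as metadata on the function...
print(double.__annotations__)

# ...but Python does not check them when the function is called.
print(double(21))    # an int, as the hint suggests
print(double("ab"))  # a str is accepted too and gets repeated
```

This is why the dynamic_var reassignment above works without complaint, and why type hints pay off most when combined with an editor or checker that reads them.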
4.3. String Operations
Python offers powerful string manipulation with f-strings being the preferred approach:
#!/usr/bin/env python3
"""
Modern Python string operations with f-strings and advanced techniques.
"""
# F-string formatting (preferred in Python 3.6+)
name = "Python"
version = 3.12
print(f"Welcome to {name} {version}!")
# Multi-line f-strings
user = {"name": "Alice", "age": 30, "city": "New York"}
message = f"""
Hello {user['name']}!
You are {user['age']} years old
and live in {user['city']}.
"""
print(message)
# F-strings with expressions and formatting
numbers = [1, 2, 3, 4, 5]
print(f"Sum of {numbers} = {sum(numbers)}")
print(f"Pi to 3 decimal places: {3.14159:.3f}")
# String methods and operations
text = " Hello, World! "
print(f"Original: '{text}'")
print(f"Stripped: '{text.strip()}'")
print(f"Uppercase: '{text.upper()}'")
print(f"Lowercase: '{text.lower()}'")
print(f"Title Case: '{text.title()}'")
# String slicing and indexing
sentence = "Python programming is fun"
print(f"First word: {sentence[:6]}")
print(f"Last word: {sentence.split()[-1]}")
print(f"Every 2nd character: {sentence[::2]}")
# String checking methods
email = "user@example.com"
print(f"Contains @: {'@' in email}")
print(f"Starts with 'user': {email.startswith('user')}")
print(f"Ends with '.com': {email.endswith('.com')}")
# Joining and splitting
words = ["Python", "is", "awesome"]
joined = " ".join(words)
print(f"Joined: {joined}")
print(f"Split back: {joined.split()}")
# Raw strings for regex patterns
import re
pattern = r"\d{3}-\d{3}-\d{4}" # Phone number pattern
phone = "123-456-7890"
print(f"Phone match: {bool(re.match(pattern, phone))}")
4.4. Working with Collections
Python provides rich data structures for organizing and manipulating data:
#!/usr/bin/env python3
"""
Working with Python collections: lists, dictionaries, sets, and tuples.
"""
from collections import defaultdict, Counter
from typing import Any, Dict, List, Set, Tuple
# Lists - ordered, mutable collections
fruits: List[str] = ["apple", "banana", "cherry", "date"]
print(f"Fruits: {fruits}")
# List comprehensions (Pythonic way to create lists)
squares = [x**2 for x in range(1, 6)]
even_numbers = [x for x in range(20) if x % 2 == 0]
print(f"Squares: {squares}")
print(f"Even numbers: {even_numbers}")
# List operations
fruits.append("elderberry")
fruits.extend(["fig", "grape"])
print(f"After additions: {fruits}")
# Dictionaries - key-value pairs
person: Dict[str, Any] = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "Java", "JavaScript"],
    "is_employed": True
}
# Dictionary comprehension
word_lengths = {word: len(word) for word in fruits}
print(f"Word lengths: {word_lengths}")
# Safe dictionary access
age = person.get("age", 0) # Returns 0 if "age" not found
print(f"Age: {age}")
# Sets - unique, unordered collections
unique_numbers: Set[int] = {1, 2, 3, 3, 4, 4, 5}
print(f"Unique numbers: {unique_numbers}")
# Set operations
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
print(f"Union: {set1 | set2}")
print(f"Intersection: {set1 & set2}")
print(f"Difference: {set1 - set2}")
# Tuples - immutable sequences
coordinates: Tuple[float, float] = (10.5, 20.3)
rgb_color: Tuple[int, int, int] = (255, 128, 0)
print(f"Coordinates: {coordinates}")
# Named tuples for better structure
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)
print(f"Point: x={p.x}, y={p.y}")
# Advanced collections
# defaultdict - provides default values
word_count = defaultdict(int)
text = "hello world hello python world"
for word in text.split():
    word_count[word] += 1
print(f"Word count: {dict(word_count)}")
# Counter - counts occurrences
counter = Counter(text.split())
print(f"Most common word: {counter.most_common(1)}")
# Unpacking and packing
numbers = [1, 2, 3, 4, 5]
first, second, *rest = numbers
print(f"First: {first}, Second: {second}, Rest: {rest}")
# Zip for parallel iteration
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")
4.5. Functions with Modern Features
Functions in Python support default arguments, type hints, and advanced features:
#!/usr/bin/env python3
"""
Modern Python functions with type hints and advanced features.
"""
from typing import List, Optional, Callable, Any
from functools import wraps

# Basic function with type hints
def greet(name: str, age: int = 25) -> str:
    """Return a personalized greeting."""
    return f"Hello, {name}! You are {age} years old."

# Function with optional parameters
def create_user(name: str, email: str, age: Optional[int] = None) -> dict:
    """Create a user dictionary with optional age."""
    user = {"name": name, "email": email}
    if age is not None:
        user["age"] = age
    return user

# Function with variable arguments
def calculate_average(*numbers: float) -> float:
    """Calculate average of any number of values."""
    if not numbers:
        return 0.0
    return sum(numbers) / len(numbers)

# Function with keyword arguments
def create_config(**kwargs: Any) -> dict:
    """Create configuration dictionary from keyword arguments."""
    defaults = {"debug": False, "port": 8080}
    defaults.update(kwargs)
    return defaults

# Lambda functions (anonymous functions)
square = lambda x: x**2
add = lambda x, y: x + y

# Higher-order functions
def apply_operation(numbers: List[int], operation: Callable[[int], int]) -> List[int]:
    """Apply an operation to each number in the list."""
    return [operation(n) for n in numbers]

# Decorator function
def timer(func):
    """Decorator to time function execution."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        import time
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f"{func.__name__} took {end - start:.4f} seconds")
        return result
    return wrapper

@timer
def slow_function():
    """A function that takes some time."""
    import time
    time.sleep(0.1)
    return "Done!"

# Generator function
def fibonacci(n: int):
    """Generate the first n Fibonacci numbers."""
    a, b = 0, 1
    count = 0
    while count < n:
        yield a
        a, b = b, a + b
        count += 1

# Example usage
if __name__ == "__main__":
    # Basic functions
    print(greet("Alice"))
    print(greet("Bob", 30))
    # User creation
    user1 = create_user("Alice", "alice@example.com")
    user2 = create_user("Bob", "bob@example.com", 30)
    print(f"User 1: {user1}")
    print(f"User 2: {user2}")
    # Variable arguments
    avg = calculate_average(10, 20, 30, 40, 50)
    print(f"Average: {avg}")
    # Keyword arguments
    config = create_config(debug=True, host="localhost", port=3000)
    print(f"Config: {config}")
    # Lambda and higher-order functions
    numbers = [1, 2, 3, 4, 5]
    squared = apply_operation(numbers, square)
    print(f"Squared: {squared}")
    # Decorator
    slow_function()
    # Generator
    fib_numbers = list(fibonacci(10))
    print(f"Fibonacci: {fib_numbers}")
4.6. Modern Class Design
Object-oriented programming in Python with modern best practices:
#!/usr/bin/env python3
"""
Modern Python classes with type hints, dataclasses, and properties.
"""
from dataclasses import dataclass
from typing import List, Optional, ClassVar
from abc import ABC, abstractmethod

# Modern class with type hints and properties
class Person:
    """A person with name, age, and email."""
    # Class variable
    species: ClassVar[str] = "Homo sapiens"

    def __init__(self, name: str, age: int, email: str) -> None:
        self._name = name
        self._age = age
        self._email = email
        self._friends: List[str] = []

    @property
    def name(self) -> str:
        """Get the person's name."""
        return self._name

    @property
    def age(self) -> int:
        """Get the person's age."""
        return self._age

    @age.setter
    def age(self, value: int) -> None:
        """Set the person's age with validation."""
        if value < 0:
            raise ValueError("Age cannot be negative")
        self._age = value

    @property
    def email(self) -> str:
        """Get the person's email."""
        return self._email

    def add_friend(self, friend_name: str) -> None:
        """Add a friend to the person's friend list."""
        if friend_name not in self._friends:
            self._friends.append(friend_name)

    def get_friends(self) -> List[str]:
        """Get a copy of the friend list."""
        return self._friends.copy()

    def __str__(self) -> str:
        return f"Person(name='{self.name}', age={self.age}, email='{self.email}')"

    def __repr__(self) -> str:
        return self.__str__()

# Dataclass - automatically generates __init__, __repr__, __eq__, etc.
@dataclass
class Product:
    """A product with name, price, and quantity."""
    name: str
    price: float
    quantity: int = 0
    category: Optional[str] = None

    def total_value(self) -> float:
        """Calculate total value of this product."""
        return self.price * self.quantity

    def __post_init__(self):
        """Validate data after initialization."""
        if self.price < 0:
            raise ValueError("Price cannot be negative")

# Abstract base class
class Animal(ABC):
    """Abstract animal class."""

    def __init__(self, name: str, species: str):
        self.name = name
        self.species = species

    @abstractmethod
    def make_sound(self) -> str:
        """Make a sound - must be implemented by subclasses."""
        pass

    def sleep(self) -> str:
        """All animals can sleep."""
        return f"{self.name} is sleeping..."

# Concrete implementation
class Dog(Animal):
    """A dog that inherits from Animal."""

    def __init__(self, name: str, breed: str):
        super().__init__(name, "Canis lupus")
        self.breed = breed

    def make_sound(self) -> str:
        return f"{self.name} says Woof!"

    def fetch(self, item: str) -> str:
        return f"{self.name} fetches the {item}!"

# Class with static and class methods
class MathUtils:
    """Utility class for mathematical operations."""
    PI: ClassVar[float] = 3.14159

    @staticmethod
    def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    @classmethod
    def circle_area(cls, radius: float) -> float:
        """Calculate circle area using class constant."""
        return cls.PI * radius * radius

# Example usage
if __name__ == "__main__":
    # Regular class
    person = Person("Alice", 30, "alice@example.com")
    person.add_friend("Bob")
    person.add_friend("Charlie")
    print(person)
    print(f"Friends: {person.get_friends()}")
    # Dataclass
    product = Product("Laptop", 999.99, 5, "Electronics")
    print(f"Product: {product}")
    print(f"Total value: ${product.total_value():.2f}")
    # Inheritance and polymorphism
    dog = Dog("Buddy", "Golden Retriever")
    print(dog.make_sound())
    print(dog.fetch("ball"))
    print(dog.sleep())
    # Static and class methods
    result = MathUtils.add(5, 3)
    area = MathUtils.circle_area(10)
    print(f"5 + 3 = {result}")
    print(f"Circle area (r=10): {area:.2f}")
4.7. Error Handling and Exceptions
Robust error handling is essential for reliable applications:
#!/usr/bin/env python3
"""
Modern error handling and exception management in Python.
"""
import logging
from typing import Any, Optional, List, Dict
from pathlib import Path

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Custom exceptions
class ValidationError(Exception):
    """Raised when data validation fails."""
    pass

class NetworkError(Exception):
    """Raised when network operations fail."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code

# Basic exception handling
def safe_divide(a: float, b: float) -> Optional[float]:
    """Safely divide two numbers."""
    try:
        result = a / b
        return result
    except ZeroDivisionError:
        logger.error("Cannot divide by zero")
        return None
    except TypeError as e:
        logger.error(f"Type error: {e}")
        return None

# Multiple exception handling
def process_user_input(user_input: str) -> Optional[int]:
    """Process user input and return integer."""
    try:
        # Try to convert to integer
        number = int(user_input)
        # Validate range
        if number < 0:
            raise ValidationError("Number must be non-negative")
        return number
    except ValueError:
        logger.error(f"'{user_input}' is not a valid number")
        return None
    except ValidationError as e:
        logger.error(f"Validation error: {e}")
        return None

# File operations with exception handling
def read_config_file(filename: str) -> Dict[str, Any]:
    """Read configuration from a file with proper error handling."""
    config = {}
    file_path = Path(filename)
    try:
        # Check if file exists
        if not file_path.exists():
            raise FileNotFoundError(f"Config file {filename} not found")
        # Read and parse file
        with open(file_path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                line = line.strip()
                if line and not line.startswith('#'):
                    try:
                        key, value = line.split('=', 1)
                        config[key.strip()] = value.strip()
                    except ValueError:
                        logger.warning(f"Invalid line {line_num}: {line}")
        return config
    except FileNotFoundError as e:
        logger.error(f"File error: {e}")
        return {"error": "file_not_found"}
    except PermissionError:
        logger.error(f"Permission denied reading {filename}")
        return {"error": "permission_denied"}
    except Exception as e:
        logger.error(f"Unexpected error reading {filename}: {e}")
        return {"error": "unexpected_error"}

# Context manager for resource handling
class DatabaseConnection:
    """Mock database connection with proper cleanup."""

    def __init__(self, connection_string: str):
        self.connection_string = connection_string
        self.connected = False

    def __enter__(self):
        """Enter context - establish connection."""
        logger.info("Connecting to database...")
        self.connected = True
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Exit context - clean up connection."""
        if self.connected:
            logger.info("Closing database connection...")
            self.connected = False
        # Handle exceptions that occurred in the context
        if exc_type is not None:
            logger.error(f"Exception in context: {exc_type.__name__}: {exc_val}")
        # Return False to propagate exceptions
        return False

    def query(self, sql: str) -> List[Dict]:
        """Execute a database query."""
        if not self.connected:
            raise ConnectionError("Not connected to database")
        # Simulate database operation
        logger.info(f"Executing query: {sql}")
        return [{"id": 1, "name": "example"}]

# Finally block example
def process_data_with_cleanup(data_file: str) -> bool:
    """Process data file with guaranteed cleanup."""
    temp_file = None
    try:
        # Open temporary file
        temp_file = open("temp_processing.txt", "w")
        # Process data (might raise exceptions)
        with open(data_file, "r") as file:
            data = file.read()
            temp_file.write(data.upper())
        logger.info("Data processed successfully")
        return True
    except FileNotFoundError:
        logger.error(f"Data file {data_file} not found")
        return False
    except Exception as e:
        logger.error(f"Error processing data: {e}")
        return False
    finally:
        # This always runs, even if exception occurred
        if temp_file and not temp_file.closed:
            temp_file.close()
            logger.info("Temporary file closed")

# Example usage
if __name__ == "__main__":
    # Safe division
    print(f"10 / 2 = {safe_divide(10, 2)}")
    print(f"10 / 0 = {safe_divide(10, 0)}")
    # User input processing
    test_inputs = ["42", "-5", "not_a_number", "100"]
    for inp in test_inputs:
        result = process_user_input(inp)
        print(f"Input '{inp}' -> {result}")
    # Context manager usage
    try:
        with DatabaseConnection("sqlite://memory") as db:
            results = db.query("SELECT * FROM users")
            print(f"Query results: {results}")
    except Exception as e:
        print(f"Database operation failed: {e}")
    # Configuration file reading
    config = read_config_file("nonexistent_config.txt")
    print(f"Config: {config}")
    # Processing with cleanup
    success = process_data_with_cleanup("nonexistent_data.txt")
    print(f"Processing successful: {success}")
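For simple cases, a class with __enter__ and __exit__ can be replaced by a generator function decorated with contextlib.contextmanager. The sketch below is one possible shape, not part of the program above; managed_connection and the events list are hypothetical names used only to make the order of operations visible.

```python
from contextlib import contextmanager
from typing import Iterator, List

events: List[str] = []  # records what happened, for illustration only


@contextmanager
def managed_connection(name: str) -> Iterator[str]:
    """Hypothetical stand-in for a connection that must always be closed."""
    events.append(f"open {name}")
    try:
        # Everything before yield plays the role of __enter__.
        yield name
    finally:
        # Runs on normal exit and when the with-block raises.
        events.append(f"close {name}")


with managed_connection("demo") as conn:
    events.append(f"use {conn}")

try:
    with managed_connection("failing"):
        raise RuntimeError("simulated failure")
except RuntimeError as exc:
    events.append(f"caught {exc}")

print(events)
```

Because the cleanup sits in a finally clause, the connection is closed before the exception reaches the surrounding except block, just as with the class-based version.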
4.8. File Operations and Context Managers
Working with files safely using context managers:
#!/usr/bin/env python3
"""
Modern file operations using context managers and pathlib.
"""
from pathlib import Path
from typing import Any, List, Dict, Optional
import json
import csv
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Reading files with context managers
def read_text_file(filename: str) -> Optional[str]:
    """Read a text file safely using context manager."""
    try:
        file_path = Path(filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        logger.info(f"Successfully read {file_path.name} ({len(content)} characters)")
        return content
    except FileNotFoundError:
        logger.error(f"File {filename} not found")
        return None
    except Exception as e:
        logger.error(f"Error reading {filename}: {e}")
        return None

# Writing files with automatic cleanup
def write_text_file(filename: str, content: str) -> bool:
    """Write content to a text file."""
    try:
        file_path = Path(filename)
        # Create parent directories if they don't exist
        file_path.parent.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8') as file:
            file.write(content)
        logger.info(f"Successfully wrote to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing to {filename}: {e}")
        return False

# Working with JSON files
def read_json_file(filename: str) -> Optional[Dict]:
    """Read and parse JSON file."""
    try:
        file_path = Path(filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
        logger.info(f"Successfully loaded JSON from {file_path.name}")
        return data
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON in {filename}: {e}")
        return None
    except FileNotFoundError:
        logger.error(f"JSON file {filename} not found")
        return None

def write_json_file(filename: str, data: Dict) -> bool:
    """Write data to JSON file with proper formatting."""
    try:
        file_path = Path(filename)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8') as file:
            json.dump(data, file, indent=2, ensure_ascii=False)
        logger.info(f"Successfully wrote JSON to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing JSON to {filename}: {e}")
        return False

# Working with CSV files
def read_csv_file(filename: str) -> Optional[List[Dict]]:
    """Read CSV file and return list of dictionaries."""
    try:
        file_path = Path(filename)
        data = []
        with open(file_path, 'r', encoding='utf-8', newline='') as file:
            reader = csv.DictReader(file)
            for row in reader:
                data.append(row)
        logger.info(f"Successfully read {len(data)} rows from {file_path.name}")
        return data
    except Exception as e:
        logger.error(f"Error reading CSV {filename}: {e}")
        return None

def write_csv_file(filename: str, data: List[Dict], fieldnames: List[str]) -> bool:
    """Write data to CSV file."""
    try:
        file_path = Path(filename)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8', newline='') as file:
            writer = csv.DictWriter(file, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(data)
        logger.info(f"Successfully wrote {len(data)} rows to {file_path.name}")
        return True
    except Exception as e:
        logger.error(f"Error writing CSV to {filename}: {e}")
        return False

# Working with paths using pathlib
def analyze_directory(directory: str) -> Dict[str, Any]:
    """Analyze directory contents using pathlib."""
    try:
        dir_path = Path(directory)
        if not dir_path.exists():
            return {"error": f"Directory {directory} does not exist"}
        if not dir_path.is_dir():
            return {"error": f"{directory} is not a directory"}
        files = []
        total_size = 0
        for file_path in dir_path.iterdir():
            if file_path.is_file():
                size = file_path.stat().st_size
                files.append({
                    "name": file_path.name,
                    "size": size,
                    "extension": file_path.suffix,
                    "modified": file_path.stat().st_mtime
                })
                total_size += size
        return {
            "directory": str(dir_path),
            "file_count": len(files),
            "total_size": total_size,
            "files": files
        }
    except Exception as e:
        return {"error": f"Error analyzing directory: {e}"}

# Processing lines from large files
def process_large_file(filename: str, line_processor=None) -> int:
    """Process a large file line by line to save memory."""
    if line_processor is None:
        line_processor = lambda line, num: print(f"Line {num}: {line.strip()}")
    try:
        file_path = Path(filename)
        line_count = 0
        with open(file_path, 'r', encoding='utf-8') as file:
            for line_num, line in enumerate(file, 1):
                line_processor(line, line_num)
                line_count += 1
        logger.info(f"Processed {line_count} lines from {file_path.name}")
        return line_count
    except Exception as e:
        logger.error(f"Error processing file {filename}: {e}")
        return 0

# Example usage and demonstrations
def create_sample_files():
    """Create sample files for demonstration."""
    # Sample text file
    text_content = """This is a sample text file.
It contains multiple lines.
Each line demonstrates file handling capabilities."""
    write_text_file("sample_data/sample.txt", text_content)
    # Sample JSON file
    json_data = {
        "name": "Python Tutorial",
        "version": "3.12",
        "features": ["modern syntax", "type hints", "async support"],
        "author": {"name": "Alice", "email": "alice@example.com"}
    }
    write_json_file("sample_data/config.json", json_data)
    # Sample CSV file
    csv_data = [
        {"name": "Alice", "age": "30", "city": "New York"},
        {"name": "Bob", "age": "25", "city": "San Francisco"},
        {"name": "Charlie", "age": "35", "city": "Chicago"}
    ]
    write_csv_file("sample_data/users.csv", csv_data, ["name", "age", "city"])

if __name__ == "__main__":
    # Create sample files
    create_sample_files()
    # Read and display files
    text = read_text_file("sample_data/sample.txt")
    if text:
        print("Text file content:")
        print(text)
    json_data = read_json_file("sample_data/config.json")
    if json_data:
        print(f"JSON data: {json_data}")
    csv_data = read_csv_file("sample_data/users.csv")
    if csv_data:
        print(f"CSV data: {csv_data}")
    # Analyze directory
    analysis = analyze_directory("sample_data")
    print(f"Directory analysis: {analysis}")
    # Process file line by line
    def word_counter(line, line_num):
        words = len(line.split())
        print(f"Line {line_num} has {words} words")
    process_large_file("sample_data/sample.txt", word_counter)
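For small files, pathlib also offers read_text and write_text, which open, read or write, and close the file in a single call. The sketch below works in a temporary directory so it leaves nothing behind; the file and directory names are made up for this example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes" / "todo.txt"
    # Create the parent directory before writing, as in write_text_file above.
    notes.parent.mkdir(parents=True, exist_ok=True)
    # write_text and read_text handle opening and closing the file themselves.
    notes.write_text("first line\nsecond line\n", encoding="utf-8")
    content = notes.read_text(encoding="utf-8")
    lines = content.splitlines()
    # rglob searches the directory tree recursively.
    found = sorted(p.name for p in Path(tmp).rglob("*.txt"))

print(lines)
print(found)
```

These shortcuts load the whole file into memory, so the line-by-line approach in process_large_file remains the better fit for large inputs.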
5. Modern Python Deployment
Today’s Python applications can be deployed in many ways, from traditional web hosting to modern cloud platforms.
5.1. Popular Deployment Platforms
Cloud Platforms:
-
Heroku: Simple deployment with Git integration
-
Google Cloud Platform: Powerful infrastructure with App Engine and Cloud Run
-
AWS: Comprehensive services including Lambda, EC2, and Elastic Beanstalk
-
Microsoft Azure: Full-featured cloud platform with App Service
-
DigitalOcean: Developer-friendly with App Platform
Containerization:
-
Docker: Package applications with all dependencies
-
Kubernetes: Orchestrate containers at scale
5.2. Modern Web Frameworks
For web development, consider these popular Python frameworks:
-
FastAPI: Modern, fast API framework with automatic documentation
-
Django: Full-featured web framework with admin interface
-
Flask: Lightweight and flexible micro-framework
-
Streamlit: Quick data science web apps
5.3. Simple Deployment Example
Here’s how to create a basic web API with FastAPI:
# main.py
from fastapi import FastAPI

app = FastAPI(title="Python Tutorial API")

@app.get("/")
def read_root():
    return {"message": "Hello from Python!"}

@app.get("/crawl-status")
def crawl_status():
    return {"status": "Web crawler ready", "version": "1.0"}
Install dependencies:
pip install fastapi uvicorn
Run locally:
uvicorn main:app --reload
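This creates a small REST API with two JSON endpoints. FastAPI also serves interactive documentation at /docs, and the application can be deployed to any platform that can run an ASGI server such as uvicorn.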
6. Building a Web Crawler
Web crawling is a common task in data science, SEO analysis, and content aggregation. Our web crawler will extract data from web pages using modern Python libraries.
6.1. Installation and Setup
First, install the required libraries:
pip install requests beautifulsoup4 lxml
6.2. Web Crawler Implementation
#!/usr/bin/env python3
"""
Modern web crawler using requests and BeautifulSoup.
Demonstrates best practices for web scraping in Python.
"""
import requests
from bs4 import BeautifulSoup, Tag
from typing import Any, Dict, List, Optional, Set
from urllib.parse import urljoin, urlparse
import time
import logging
from dataclasses import dataclass
from pathlib import Path
import json

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

@dataclass
class CrawlResult:
    """Data structure for crawl results."""
    url: str
    title: str
    status_code: int
    links: List[str]
    text_content: str
    meta_description: str
    word_count: int
    crawl_time: float

class WebCrawler:
    """A respectful web crawler with rate limiting and error handling."""

    def __init__(self, delay: float = 1.0, max_retries: int = 3):
        """
        Initialize the web crawler.
        Args:
            delay: Delay between requests in seconds
            max_retries: Maximum number of retry attempts
        """
        self.delay = delay
        self.max_retries = max_retries
        self.session = requests.Session()
        # Set a reasonable user agent
        self.session.headers.update({
            'User-Agent': 'Python Web Crawler Tutorial Bot 1.0 (+https://example.com/bot)'
        })
        self.crawled_urls: Set[str] = set()
        self.results: List[CrawlResult] = []

    def crawl_page(self, url: str) -> Optional[CrawlResult]:
        """
        Crawl a single web page and extract information.
        Args:
            url: The URL to crawl
        Returns:
            CrawlResult object or None if crawl failed
        """
        if url in self.crawled_urls:
            logger.info(f"Already crawled: {url}")
            return None
        logger.info(f"Crawling: {url}")
        start_time = time.time()
        try:
            # Make request with retry logic
            response = self._make_request(url)
            if not response:
                return None
            # Parse HTML content
            soup = BeautifulSoup(response.content, 'html.parser')
            # Extract page information
            title = self._extract_title(soup)
            links = self._extract_links(soup, url)
            text_content = self._extract_text(soup)
            meta_description = self._extract_meta_description(soup)
            word_count = len(text_content.split())
            crawl_time = time.time() - start_time
            result = CrawlResult(
                url=url,
                title=title,
                status_code=response.status_code,
                links=links,
                # Store only a 500-character preview of the page text
                text_content=(text_content[:500] + "..." if len(text_content) > 500 else text_content),
                meta_description=meta_description,
                word_count=word_count,
                crawl_time=crawl_time
            )
            self.crawled_urls.add(url)
            self.results.append(result)
            # Respect the website with delay
            time.sleep(self.delay)
            return result
        except Exception as e:
            logger.error(f"Error crawling {url}: {e}")
            return None

    def _make_request(self, url: str) -> Optional[requests.Response]:
        """Make HTTP request with retry logic."""
        for attempt in range(self.max_retries):
            try:
                response = self.session.get(url, timeout=10)
                response.raise_for_status()
                return response
            except requests.exceptions.RequestException as e:
                logger.warning(f"Attempt {attempt + 1} failed for {url}: {e}")
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                else:
                    logger.error(f"All retry attempts failed for {url}")
        return None

    def _extract_title(self, soup: BeautifulSoup) -> str:
        """Extract page title."""
        title_tag = soup.find('title')
        if title_tag and isinstance(title_tag, Tag):
            return title_tag.get_text().strip()
        return "No title found"

    def _extract_links(self, soup: BeautifulSoup, base_url: str) -> List[str]:
        """Extract all links from the page."""
        links = []
        for link in soup.find_all('a', href=True):
            if isinstance(link, Tag):
                href = link['href']
                absolute_url = urljoin(base_url, href)
                links.append(absolute_url)
        return links

    def _extract_text(self, soup: BeautifulSoup) -> str:
        """Extract visible text content from the page."""
        # Remove script and style elements
        for script in soup(["script", "style"]):
            script.decompose()
        # Get text and clean it up
        text = soup.get_text()
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        text = ' '.join(chunk for chunk in chunks if chunk)
        return text

    def _extract_meta_description(self, soup: BeautifulSoup) -> str:
        """Extract meta description."""
        meta_desc = soup.find('meta', attrs={'name': 'description'})
        if meta_desc and isinstance(meta_desc, Tag):
            return meta_desc.get('content', '')
        return ""

    def crawl_multiple_pages(self, urls: List[str]) -> List[CrawlResult]:
        """Crawl multiple pages and return results."""
        logger.info(f"Starting crawl of {len(urls)} pages")
        results = []
        for url in urls:
            result = self.crawl_page(url)
            if result:
                results.append(result)
        logger.info(f"Crawling completed. Successfully crawled {len(results)} pages")
        return results

    def save_results(self, filename: str = "crawl_results.json") -> bool:
        """Save crawl results to JSON file."""
        try:
            # Convert dataclass objects to dictionaries
            results_data = [
                {
                    'url': result.url,
                    'title': result.title,
                    'status_code': result.status_code,
                    'links_count': len(result.links),
                    'first_10_links': result.links[:10],  # Save only first 10 links
                    'text_preview': result.text_content,
                    'meta_description': result.meta_description,
                    'word_count': result.word_count,
                    'crawl_time': result.crawl_time
                }
                for result in self.results
            ]
            with open(filename, 'w', encoding='utf-8') as f:
                json.dump(results_data, f, indent=2, ensure_ascii=False)
            logger.info(f"Results saved to {filename}")
            return True
        except Exception as e:
            logger.error(f"Error saving results: {e}")
            return False

    def get_statistics(self) -> Dict[str, Any]:
        """Get crawling statistics."""
        if not self.results:
            return {"error": "No crawling results available"}
        total_words = sum(result.word_count for result in self.results)
        avg_crawl_time = sum(result.crawl_time for result in self.results) / len(self.results)
        total_links = sum(len(result.links) for result in self.results)
        return {
            "pages_crawled": len(self.results),
            "total_words": total_words,
            "average_words_per_page": total_words // len(self.results),
            "total_links_found": total_links,
            "average_crawl_time": round(avg_crawl_time, 2),
            "successful_crawls": len(self.results),
            "domains_crawled": len(set(urlparse(result.url).netloc for result in self.results))
        }

if __name__ == "__main__":
    # Example usage
    crawler = WebCrawler(delay=1.0)
    # Example URLs (using HTTP examples that are safe to crawl)
    test_urls = [
        "http://httpbin.org/html",
        "http://httpbin.org/robots.txt",
    ]
    # Crawl the pages
    results = crawler.crawl_multiple_pages(test_urls)
    # Display results
    for result in results:
        print(f"\n{'='*60}")
        print(f"URL: {result.url}")
        print(f"Title: {result.title}")
        print(f"Status: {result.status_code}")
        print(f"Word Count: {result.word_count}")
        print(f"Links Found: {len(result.links)}")
        print(f"Crawl Time: {result.crawl_time:.2f}s")
        print(f"Text Preview: {result.text_content[:100]}...")
    # Show statistics
    stats = crawler.get_statistics()
    print(f"\n{'='*60}")
    print("CRAWLING STATISTICS")
    print(f"{'='*60}")
    for key, value in stats.items():
        print(f"{key.replace('_', ' ').title()}: {value}")
    # Save results
    crawler.save_results("web_crawl_results.json")
This web crawler demonstrates:
-
HTTP requests with proper error handling
-
HTML parsing with BeautifulSoup
-
Rate limiting to be respectful to websites
-
Data extraction and structuring
-
Robustness with retry logic
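Rate limiting is one part of respectful crawling. Many sites also publish a robots.txt file that states which paths crawlers may fetch. The crawler above does not read it, so the following is only a minimal sketch of how such a check could look with the standard library's urllib.robotparser. The robots.txt content, the helper names build_parser and is_allowed, and the example URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download this file
# from the target host before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)

parser = build_parser(ROBOTS_TXT)
agent = "Python Web Crawler Tutorial Bot"

print(is_allowed(parser, agent, "https://example.com/index.html"))    # True
print(is_allowed(parser, agent, "https://example.com/private/data"))  # False
print(parser.crawl_delay(agent))                                      # 2
```

One possible integration is to call such a check at the start of crawl_page and to use the reported crawl delay as a lower bound for the delay parameter.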
6.3. Using the Web Crawler
#!/usr/bin/env python3
"""
Example usage of the web crawler.
"""
# Assumes the crawler from the previous section is saved as web_crawler.py
from web_crawler import WebCrawler
import logging

# Configure logging to see crawler activity
logging.basicConfig(level=logging.INFO)

def main():
    """Demonstrate web crawler usage."""
    print("🕷️ Web Crawler Example")
    print("=" * 50)
    # Create crawler with 2-second delay between requests
    crawler = WebCrawler(delay=2.0, max_retries=2)
    # URLs to crawl (using safe test websites)
    urls_to_crawl = [
        "http://httpbin.org/html",
        "http://httpbin.org/robots.txt",
        "https://jsonplaceholder.typicode.com/",  # API service with HTML
    ]
    print(f"Crawling {len(urls_to_crawl)} URLs...")
    # Perform the crawl
    results = crawler.crawl_multiple_pages(urls_to_crawl)
    # Display detailed results
    print(f"\n✅ Successfully crawled {len(results)} pages\n")
    for i, result in enumerate(results, 1):
        print(f"📄 Page {i}: {result.url}")
        print(f" Title: {result.title}")
        print(f" Status: {result.status_code}")
        print(f" Words: {result.word_count}")
        print(f" Links: {len(result.links)}")
        print(f" Time: {result.crawl_time:.2f}s")
        if result.meta_description:
            print(f" Description: {result.meta_description}")
        print(f" Preview: {result.text_content[:100]}...\n")
    # Show overall statistics
    stats = crawler.get_statistics()
    print("📊 Crawling Statistics:")
    print("-" * 30)
    for key, value in stats.items():
        print(f"{key.replace('_', ' ').title()}: {value}")
    # Save results to file
    saved = crawler.save_results("example_crawl_results.json")
    if saved:
        print("\n💾 Results saved to example_crawl_results.json")

if __name__ == "__main__":
    main()
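Each CrawlResult carries the absolute links found on the page, but the example only counts them. If you want to follow links, you usually restrict the crawl to the starting site and drop duplicates first. The following is a minimal sketch using only urllib.parse. The function name same_site_links and the sample URLs are hypothetical and not part of the crawler above.

```python
from urllib.parse import urldefrag, urlparse

def same_site_links(page_url: str, links: list[str]) -> list[str]:
    """Return unique http(s) links on the same host as page_url, in order."""
    host = urlparse(page_url).netloc.lower()
    seen: set[str] = set()
    selected: list[str] = []
    for link in links:
        clean, _fragment = urldefrag(link)  # drop "#section" anchors
        parts = urlparse(clean)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parts.netloc.lower() != host:
            continue  # skip links to other sites
        if clean not in seen:
            seen.add(clean)
            selected.append(clean)
    return selected

links = [
    "https://example.com/about",
    "https://example.com/about#team",
    "https://other.example.org/",
    "mailto:info@example.com",
]
print(same_site_links("https://example.com/", links))
# ['https://example.com/about']
```

The returned list could be passed to crawl_multiple_pages for a second round, ideally with an upper limit on the number of pages.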
6.4. Best Practices Demonstrated
The web crawler example showcases important Python development practices:
-
Type hints: Make code more maintainable and self-documenting
-
Error handling: Graceful failure handling with informative messages
-
Logging: Proper logging for debugging and monitoring
-
Modular design: Functions and classes with single responsibilities
-
Documentation: Clear docstrings and comments
-
External libraries: Leveraging the Python ecosystem
-
Resource management: Proper cleanup and context managers
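The retry loop in _make_request combines two of these practices: it handles errors gracefully and waits longer after each failed attempt. As an illustration, the same pattern can be factored into a small helper that is independent of requests. This is a sketch under assumptions: the helper name retry_with_backoff, its parameters, and the flaky test function are hypothetical, and the sleep function is injectable only so that the demo runs without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call operation until it succeeds or max_retries attempts have failed."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt < max_retries - 1:
                # Wait base_delay, then 2x, then 4x, ... before the next attempt
                sleep(base_delay * 2 ** attempt)
    return None

# Demo: an operation that fails twice and then succeeds.
calls = {"count": 0}

def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

waits: list[float] = []  # records the delays instead of actually sleeping
result = retry_with_backoff(flaky, max_retries=3, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

In production code you would typically catch only the exceptions you expect, as the crawler does with requests.exceptions.RequestException.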
6.5. Extending the Web Crawler
Consider these enhancements to deepen your learning:
-
Add support for different content types (PDF, images)
-
Implement concurrent crawling with asyncio
-
Add data storage to databases or files
-
Create a web interface with Flask or FastAPI
AsciiDoc Validation with LanguageTool
LanguageTool provides grammar and style checking for text documents. We’ll create a tool to check AsciiDoc files for writing issues.
6.6. Installation and Setup
Install the required library:
pip install language-tool-python
Note: The library downloads the LanguageTool server on first use. Running the server locally also requires a Java runtime on your machine.
6.7. AsciiDoc Validator Implementation
Create a file named asciidoc_validator.py with the following content:
#!/usr/bin/env python3
"""
AsciiDoc validator using LanguageTool to check grammar and style.
Demonstrates file processing and external API integration.
"""
import language_tool_python
from pathlib import Path
from typing import List, Dict, Optional, NamedTuple, Tuple
import re
import time
import logging
import json
from dataclasses import dataclass
import argparse

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class ValidationIssue(NamedTuple):
    """Structure for validation issues."""
    line_number: int
    column: int
    message: str
    rule_id: str
    suggestions: List[str]
    context: str

@dataclass
class FileReport:
    """Report for a single file."""
    file_path: str
    total_lines: int
    issues_found: int
    issues: List[ValidationIssue]
    processing_time: float

class AsciiDocValidator:
    """Validator for AsciiDoc files using LanguageTool."""

    def __init__(self, language: str = 'en-US'):
        """
        Initialize the validator.
        Args:
            language: Language code for LanguageTool (default: en-US)
        """
        self.language = language
        self.tool: Optional[language_tool_python.LanguageTool] = None
        self.reports: List[FileReport] = []
        # Patterns to ignore in AsciiDoc files
        self.ignore_patterns = [
            r'include::',  # Include directives
            r'image::',    # Image directives
            r'\[source,',  # Source code blocks
            r'----',       # Code block delimiters
            r'====',       # Example block delimiters
            r'^\|',        # Table rows
            r'^\+',        # Table continuation
            r'^\*\*\*',    # Section breaks
            r'^\=\=\=',    # Headers
            r'^\#',        # Comments
            r'^\[',        # Attribute definitions
            r'^\:',        # Attribute assignments
        ]

    def _initialize_language_tool(self):
        """Initialize LanguageTool (lazy loading)."""
        if self.tool is None:
            logger.info("Initializing LanguageTool... (this may take a moment)")
            try:
                self.tool = language_tool_python.LanguageTool(self.language)
                logger.info("LanguageTool initialized successfully")
            except Exception as e:
                logger.error(f"Failed to initialize LanguageTool: {e}")
                raise

    def _should_check_line(self, line: str) -> bool:
        """Determine if a line should be checked for language issues."""
        line_stripped = line.strip()
        # Skip empty lines
        if not line_stripped:
            return False
        # Check against ignore patterns
        for pattern in self.ignore_patterns:
            if re.match(pattern, line_stripped):
                return False
        return True

    def _extract_text_content(self, file_path: Path) -> List[Tuple[int, str]]:
        """Extract (line number, text) pairs from an AsciiDoc file."""
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                lines = file.readlines()
            # Filter lines that should be checked
            text_lines = []
            for line_num, line in enumerate(lines, 1):
                if self._should_check_line(line):
                    text_lines.append((line_num, line.strip()))
            return text_lines
        except Exception as e:
            logger.error(f"Error reading file {file_path}: {e}")
            return []

    def validate_file(self, file_path: Path) -> FileReport:
        """
        Validate a single AsciiDoc file.
        Args:
            file_path: Path to the AsciiDoc file
        Returns:
            FileReport with validation results
        """
        start_time = time.time()
        logger.info(f"Validating: {file_path}")
        # Initialize LanguageTool if needed
        self._initialize_language_tool()
        # Extract text content
        text_lines = self._extract_text_content(file_path)
        if not text_lines:
            processing_time = time.time() - start_time
            return FileReport(
                file_path=str(file_path),
                total_lines=0,
                issues_found=0,
                issues=[],
                processing_time=processing_time
            )
        # Check each line for issues
        all_issues = []
        for line_number, text in text_lines:
            try:
                matches = self.tool.check(text)
                for match in matches:
                    issue = ValidationIssue(
                        line_number=line_number,
                        column=match.offset,
                        message=match.message,
                        rule_id=match.ruleId,
                        suggestions=[s for s in match.replacements[:3]],  # First 3 suggestions
                        context=text[max(0, match.offset-10):match.offset+match.errorLength+10]
                    )
                    all_issues.append(issue)
            except Exception as e:
                logger.warning(f"Error checking line {line_number}: {e}")
        processing_time = time.time() - start_time
        report = FileReport(
            file_path=str(file_path),
            total_lines=len(text_lines),
            issues_found=len(all_issues),
            issues=all_issues,
            processing_time=processing_time
        )
        self.reports.append(report)
        return report

    def validate_directory(self, directory: Path, pattern: str = "*.adoc") -> List[FileReport]:
        """
        Validate all AsciiDoc files in a directory.
        Args:
            directory: Directory to scan
            pattern: File pattern to match (default: *.adoc)
        Returns:
            List of FileReport objects
        """
        logger.info(f"Scanning directory: {directory}")
        if not directory.exists():
            logger.error(f"Directory not found: {directory}")
            return []
        # Find all matching files
        adoc_files = list(directory.glob(pattern))
        if not adoc_files:
            logger.warning(f"No {pattern} files found in {directory}")
            return []
        logger.info(f"Found {len(adoc_files)} files to validate")
        # Validate each file
        reports = []
        for file_path in adoc_files:
            report = self.validate_file(file_path)
            reports.append(report)
        return reports

    def generate_report(self, output_file: Optional[str] = None) -> Dict:
        """Generate summary report of all validations."""
        if not self.reports:
            return {"error": "No validation reports available"}
        total_files = len(self.reports)
        total_issues = sum(report.issues_found for report in self.reports)
        files_with_issues = sum(1 for report in self.reports if report.issues_found > 0)
        # Group issues by rule ID
        issue_types = {}
        for report in self.reports:
            for issue in report.issues:
                rule_id = issue.rule_id
                if rule_id not in issue_types:
                    issue_types[rule_id] = {"count": 0, "message": issue.message}
                issue_types[rule_id]["count"] += 1
        # Create summary report
        summary = {
            "validation_summary": {
                "total_files": total_files,
                "files_with_issues": files_with_issues,
                "total_issues": total_issues,
                "average_issues_per_file": round(total_issues / total_files, 2),
            },
            "issue_breakdown": issue_types,
            "file_reports": [
                {
                    "file": report.file_path,
                    "lines_checked": report.total_lines,
                    "issues": report.issues_found,
                    "processing_time": round(report.processing_time, 2)
                }
                for report in self.reports
            ]
        }
        # Save to file if requested
        if output_file:
            try:
                with open(output_file, 'w', encoding='utf-8') as f:
                    json.dump(summary, f, indent=2, ensure_ascii=False)
                logger.info(f"Report saved to {output_file}")
            except Exception as e:
                logger.error(f"Error saving report: {e}")
        return summary

    def print_detailed_report(self):
        """Print detailed validation report to console."""
        if not self.reports:
            print("No validation reports available.")
            return
        print("\n" + "="*60)
        print("ASCIIDOC VALIDATION REPORT")
        print("="*60)
        total_issues = sum(report.issues_found for report in self.reports)
        files_with_issues = [r for r in self.reports if r.issues_found > 0]
        print(f"Files scanned: {len(self.reports)}")
        print(f"Files with issues: {len(files_with_issues)}")
        print(f"Total issues found: {total_issues}")
        print("\n" + "-"*60)
        # Show issues by file
        for report in self.reports:
            if report.issues_found > 0:
                print(f"\n📄 {Path(report.file_path).name}")
                print(f" Issues: {report.issues_found}")
                for issue in report.issues[:5]:  # Show first 5 issues
                    print(f" Line {issue.line_number}: {issue.message}")
                    if issue.suggestions:
                        suggestions = ", ".join(issue.suggestions)
                        print(f" Suggestions: {suggestions}")
                if len(report.issues) > 5:
                    print(f" ... and {len(report.issues) - 5} more issues")

def main():
    """Command-line interface for the validator."""
    parser = argparse.ArgumentParser(description="Validate AsciiDoc files using LanguageTool")
    parser.add_argument("path", help="File or directory path to validate")
    parser.add_argument("--language", default="en-US", help="Language code (default: en-US)")
    parser.add_argument("--output", help="Output file for JSON report")
    parser.add_argument("--pattern", default="*.adoc", help="File pattern for directory scanning")
    args = parser.parse_args()
    validator = AsciiDocValidator(language=args.language)
    path = Path(args.path)
    if path.is_file():
        # Validate single file
        validator.validate_file(path)
    elif path.is_dir():
        # Validate directory
        validator.validate_directory(path, args.pattern)
    else:
        print(f"Error: Path {path} does not exist")
        return
    # Generate and display report
    validator.print_detailed_report()
    if args.output:
        validator.generate_report(args.output)

if __name__ == "__main__":
    main()
This validator demonstrates:
-
Working with file system paths
-
Text processing and filtering
-
Integration with external tools
-
Report generation
-
Command-line interface design
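One limitation of the line filter is worth knowing: the ignore patterns skip the ---- delimiter lines themselves, but the source code between two delimiters is still sent to the grammar checker. The following sketch shows one way to track whether a line is inside a delimited block. It is an assumption-laden simplification: the function name iter_prose_lines is hypothetical, and only the exact delimiters ---- and .... are recognized.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Listing and literal block delimiters (simplified: exact four-character form only)
BLOCK_DELIMITERS = ("----", "....")

def iter_prose_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for non-empty lines outside delimited blocks."""
    open_delimiter: Optional[str] = None
    for number, raw in enumerate(lines, 1):
        text = raw.strip()
        if open_delimiter is not None:
            if text == open_delimiter:
                open_delimiter = None  # the block ends here
            continue  # skip everything inside the block, including its end
        if text in BLOCK_DELIMITERS:
            open_delimiter = text  # a block starts here
            continue
        if text:
            yield number, text

sample = [
    "Run the script as follows.",
    "[source,python]",
    "----",
    "print('helo wrld')",
    "----",
    "The output appears in the console.",
]
print(list(iter_prose_lines(sample)))
# [(1, 'Run the script as follows.'), (2, '[source,python]'), (6, 'The output appears in the console.')]
```

The attribute line [source,python] still passes through this generator. In the validator it would be removed afterwards by the existing check in _should_check_line.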
6.8. Using the AsciiDoc Validator
Create the following file named Test.adoc.
include::res/practical/Test.adoc
Run the program and pass it a file or a folder that contains AsciiDoc (*.adoc) files to validate them.
python asciidoc_validator.py ~/git/content/TestContent
6.9. Ignoring Specific Words
When working with technical documentation, you often have specialized terms, product names, or abbreviations that should be ignored by the spell checker. You can create an external file to maintain a list of words to exclude from spell checking.
6.9.1. Creating an Ignore List
Create a file named ignored_words.txt with one word per line:
# Technical terms
JFace
SWT
OSGi
Maven
Tycho
IDE
APIs
AsciiDoc
AsciiDoctor
# Company and product names
vogella
Eclipse
IntelliJ
VSCode
# Programming terms
foreach
classpath
runtime
workflow
6.9.2. Updated Validator Implementation
Here’s how to modify the validator to use the ignore list:
def load_ignored_words(file_path: str) -> set[str]:
    """Load words to ignore from a file."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            # Read lines, strip whitespace, and filter out comments and empty lines
            return {line.strip() for line in f
                    if line.strip() and not line.strip().startswith('#')}
    except FileNotFoundError:
        print(f"Warning: Ignore file {file_path} not found. No words will be ignored.")
        return set()

def is_valid_word(word: str, ignored_words: set[str]) -> bool:
    """Check if a word should be validated (comparison ignores case)."""
    return word.lower() not in {w.lower() for w in ignored_words}

# In your main validation function:
ignored_words = load_ignored_words('ignored_words.txt')

# When checking words inside your validation loop, add:
#     if not is_valid_word(word, ignored_words):
#         continue  # Skip this word
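The snippet above leaves open where the checked word comes from. In the validator, each match reports an offset and a length within the checked line, so the flagged text can be cut out of the line and compared with the ignore list. The following is a minimal sketch of that idea. The Flagged tuple is a hypothetical stand-in for a LanguageTool match, and drop_ignored is not part of the validator shown earlier.

```python
from typing import List, NamedTuple, Set

class Flagged(NamedTuple):
    """Hypothetical stand-in for one checker match on a line of text."""
    offset: int   # start index of the flagged span
    length: int   # number of flagged characters
    message: str

def drop_ignored(text: str, flagged: List[Flagged], ignored: Set[str]) -> List[Flagged]:
    """Keep only matches whose flagged text is not on the ignore list."""
    ignored_lower = {word.lower() for word in ignored}
    kept = []
    for item in flagged:
        span = text[item.offset:item.offset + item.length]
        if span.lower() not in ignored_lower:
            kept.append(item)
    return kept

line = "The OSGi runtime loads the plugn."
matches = [
    Flagged(offset=4, length=4, message="Possible spelling mistake"),
    Flagged(offset=27, length=5, message="Possible spelling mistake"),
]
remaining = drop_ignored(line, matches, {"OSGi", "Maven"})
print([line[m.offset:m.offset + m.length] for m in remaining])  # ['plugn']
```

Building the lowercase set once per call, instead of once per word, keeps the check cheap for long ignore lists.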
6.9.3. Usage with Ignored Words
Run the validator with the ignore list:
# The ignored_words.txt file will be loaded automatically
python asciidoc_validator.py ~/git/content/TestContent
The validator will now skip any words found in the ignore list. This is particularly useful for:
-
Technical terms (e.g., JFace, OSGi)
-
Product names (e.g., Eclipse, IntelliJ)
-
Programming terminology
-
Company names and trademarks
Keep your ignored_words.txt under version control to share it with your team and maintain consistency across your documentation.
6.10. Best Practices Demonstrated
The validator example showcases important Python development practices:
-
Type hints: Make code more maintainable and self-documenting
-
Error handling: Graceful failure handling with informative messages
-
Logging: Proper logging for debugging and monitoring
-
Modular design: Functions and classes with single responsibilities
-
Documentation: Clear docstrings and comments
-
External libraries: Leveraging the Python ecosystem
-
Resource management: Proper cleanup and context managers
6.11. Extending the Validator
Consider these enhancements to deepen your learning:
-
Support for multiple document formats
-
Integration with CI/CD pipelines
-
Custom rule definitions
-
Batch processing of multiple directories
-
HTML report generation
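For the CI/CD integration, a build step usually needs an exit code rather than a printed report. As a sketch, the summary dictionary returned by generate_report can be mapped to an exit code with a configurable issue budget. The function name exit_code_for and the max_issues parameter are assumptions for illustration.

```python
from typing import Any, Dict

def exit_code_for(summary: Dict[str, Any], max_issues: int = 0) -> int:
    """Return 0 if the report stays within the issue budget, otherwise 1."""
    if "error" in summary:
        return 1  # no validation reports were generated
    total = summary["validation_summary"]["total_issues"]
    return 0 if total <= max_issues else 1

# Reduced example summaries with only the key this function reads
clean = {"validation_summary": {"total_issues": 0}}
noisy = {"validation_summary": {"total_issues": 7}}

print(exit_code_for(clean), exit_code_for(noisy), exit_code_for(noisy, max_issues=10))
# 0 1 0
```

In the command-line interface, the result could be passed to sys.exit at the end of main so that a pipeline fails when too many issues are found.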
7. Links and Literature