By Dargslan in Python Programming — 09 Nov 2025

Parsing XML Files in Python (ElementTree Examples)

In the vast ecosystem of data formats, XML remains a cornerstone of information exchange, configuration management, and document structuring across countless applications and industries. Despite the rise of JSON and other alternatives, XML's self-descriptive nature, hierarchical structure, and widespread adoption in enterprise systems, web services, and legacy applications make it an indispensable skill for developers working with data integration, API consumption, and system interoperability. Understanding how to parse, manipulate, and generate XML documents efficiently can be the difference between a robust, maintainable solution and a brittle, error-prone implementation.

Parsing XML files means extracting structured information from documents written in Extensible Markup Language—a markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable. Python offers several libraries for this purpose, with ElementTree standing out as the standard library solution that balances simplicity, performance, and functionality. This built-in module provides a lightweight, Pythonic API for parsing and creating XML data, making it the go-to choice for most XML processing tasks without requiring external dependencies.

Throughout this comprehensive exploration, you'll discover practical techniques for reading XML files, navigating complex document structures, extracting specific data elements, modifying existing content, and creating new XML documents from scratch. We'll examine real-world scenarios, performance considerations, error handling strategies, and best practices that will transform you from an XML novice into a confident practitioner capable of tackling diverse XML processing challenges with elegance and efficiency.

Understanding the ElementTree Architecture

The ElementTree module implements a simple and efficient API for parsing and creating XML data, organizing the document as a tree structure where each node represents an element with its attributes, text content, and child elements. This hierarchical model mirrors the nested structure of XML documents themselves, making navigation and manipulation intuitive for developers familiar with tree-based data structures. The module provides two primary classes: ElementTree represents the entire XML document as a tree, while Element represents a single node within that tree, serving as the fundamental building block for all XML operations.

When working with ElementTree, you'll encounter several key concepts that form the foundation of XML processing. Elements contain tags (the name of the element), attributes (key-value pairs providing metadata), text (the content between opening and closing tags), and children (nested elements). The root element serves as the entry point to the document tree, from which you can traverse downward to access any part of the structure. Understanding these relationships enables you to navigate even the most complex XML schemas with confidence and precision.
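
To make these four parts concrete, here is a minimal sketch that parses a small inline document and prints each one. The sample XML and its names (library, book, title) are made up for illustration and do not come from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration
sample = '<library name="city"><book id="b1"><title>Dune</title></book></library>'

root = ET.fromstring(sample)   # the root element is the entry point
book = root[0]                 # children are reachable by index or iteration
title = book[0]

print(root.tag)      # tag: the element name
print(root.attrib)   # attributes: a dict of key-value pairs
print(title.text)    # text: content between the opening and closing tags
print(len(root))     # children: number of direct child elements
```

Indexing is used here only because the structure is known in advance; the search methods covered later are usually a better fit for real documents.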

"The beauty of ElementTree lies in its simplicity—it doesn't try to be everything to everyone, but rather provides exactly what most developers need for everyday XML processing tasks."

The module's design philosophy emphasizes practicality over completeness, offering a streamlined interface that covers the majority of use cases without the overhead of more comprehensive XML libraries. This approach results in a small API, no external dependencies, and a gentler learning curve. ElementTree is typically faster and lighter on memory than minidom, while the third-party lxml library is often faster still and offers considerably more features. For applications requiring full XPath 1.0, XSLT transformations, or XML Schema validation, a library such as lxml may be necessary, but ElementTree remains a sensible starting point for most XML processing scenarios.

Import Strategies and Module Variations

Historically, Python shipped two implementations of the ElementTree API: the pure Python version in xml.etree.ElementTree and a faster C accelerator in xml.etree.cElementTree. Since Python 3.3, importing xml.etree.ElementTree uses the C accelerator automatically whenever it is available. The separate cElementTree module was deprecated in 3.3 and removed in Python 3.9, so new code should not import it. The standard import convention uses the alias ET to keep code concise while maintaining readability, a practice widely adopted throughout the Python community.

import xml.etree.ElementTree as ET

# Alternative imports for specific needs
from xml.etree.ElementTree import Element, SubElement, tostring
from xml.etree import ElementTree

Security considerations demand attention when parsing XML from untrusted sources, as XML parsers can be vulnerable to various attacks including billion laughs attacks, external entity expansion, and DTD retrieval. ElementTree does not expand external entities or retrieve external DTDs, but the standard library documentation states that the module is not secure against maliciously constructed data: entity expansion attacks such as billion laughs and quadratic blowup can still exhaust memory or CPU, depending on the version of the underlying Expat parser. When handling untrusted input, consider the third-party defusedxml library, which wraps ElementTree with additional safeguards against these vulnerabilities.

Loading XML Documents into Memory

The first step in any XML processing workflow involves loading the document into memory where it can be accessed, queried, and manipulated. ElementTree offers multiple methods for parsing XML data, each suited to different scenarios and data sources. The most straightforward approach uses the parse() function to read XML from a file path, returning an ElementTree object that represents the complete document structure. This method handles file opening, parsing, and resource cleanup automatically, making it the preferred choice for file-based XML processing.

import xml.etree.ElementTree as ET

# Parse from file path
tree = ET.parse('data.xml')
root = tree.getroot()

# Access root element information
print(f"Root tag: {root.tag}")
print(f"Root attributes: {root.attrib}")

For XML content already in memory as a string—perhaps received from an API response, database query, or user input—the fromstring() function provides direct parsing without file system interaction. This function returns an Element object representing the root of the parsed tree, bypassing the ElementTree wrapper since you're working with a fragment rather than a complete document. This distinction matters when you need to write the modified XML back to a file, as only ElementTree objects have a write() method.

xml_string = '''
<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <price>44.95</price>
    </book>
</catalog>
'''

root = ET.fromstring(xml_string)

# For working with the full tree from a string
tree = ET.ElementTree(ET.fromstring(xml_string))
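
Strings that arrive from APIs or user input are not always well-formed, and fromstring() raises ET.ParseError when they are not. The following is a minimal sketch of catching that error; the helper name parse_xml_string and the sample inputs are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_xml_string(text):
    """Return the root Element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        # exc.position is a (line, column) tuple reported by the parser
        line, column = exc.position
        print(f"Malformed XML at line {line}, column {column}: {exc}")
        return None

good = parse_xml_string('<catalog><book id="bk101"/></catalog>')
bad = parse_xml_string('<catalog><book></catalog>')  # mismatched closing tag
```

Returning None is only one possible policy; re-raising a domain-specific exception may suit applications that must not silently skip bad input.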

Handling File-Like Objects and Streams

Real-world applications often need to parse XML from sources beyond simple file paths—network streams, compressed files, in-memory buffers, or other file-like objects. ElementTree's parse() function accepts any object supporting the file protocol, providing flexibility for diverse data sources. This capability enables seamless integration with libraries like requests for HTTP responses, gzip for compressed files, or io.BytesIO for in-memory operations.

import xml.etree.ElementTree as ET
import gzip
from io import BytesIO

# Parse from compressed file
with gzip.open('data.xml.gz', 'rb') as f:
    tree = ET.parse(f)
    root = tree.getroot()

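# Parse from HTTP response
import requests
response = requests.get('https://example.com/data.xml', timeout=10)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
root = ET.fromstring(response.content)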

# Parse from bytes in memory
xml_bytes = b'<root><child>text</child></root>'
root = ET.fromstring(xml_bytes)

"Flexibility in data sources is not just a convenience—it's a necessity in modern applications where XML might arrive from APIs, databases, message queues, or any number of integration points."

Navigating the XML Tree Structure

Once you've loaded an XML document into memory, the next challenge involves navigating its hierarchical structure to locate specific elements and extract their data. ElementTree provides several approaches for tree traversal, each with distinct advantages depending on your document structure and access patterns. Direct child access through iteration offers simplicity for shallow hierarchies, while recursive methods enable deep traversal of complex nested structures. Understanding these navigation techniques empowers you to efficiently locate and process any element within even the most elaborate XML schemas.

The most basic navigation pattern involves iterating over an element's direct children using a simple for loop. Each element behaves as an iterable container of its child elements, making it natural to process siblings sequentially. This approach works beautifully when you know the document structure and need to access elements at a specific level of the hierarchy. Combined with conditional logic, you can filter children based on tag names, attributes, or content, creating focused processing pipelines for specific element types.

import xml.etree.ElementTree as ET

tree = ET.parse('books.xml')
root = tree.getroot()

# Iterate over direct children
for child in root:
    print(f"Tag: {child.tag}, Attributes: {child.attrib}")
    
# Access specific child by index
first_book = root[0]
print(first_book.tag)

# Find first child with specific tag
for book in root:
    if book.tag == 'book':
        print(f"Found book: {book.get('id')}")
        break

Finding Elements with Search Methods

For more sophisticated element location, ElementTree provides find() and findall() methods that search for elements matching specific criteria. The find() method returns the first matching element or None if no match exists, making it ideal for retrieving unique elements or when you only need the first occurrence. The findall() method returns a list of all matching elements, perfect for processing collections of similar items like multiple book entries, product records, or transaction details.

Method	Return Type	Use Case	Example
find(tag)	Element or None	Locate first matching direct child	root.find('book')
findall(tag)	List of Elements	Get all matching direct children	root.findall('book')
find(path)	Element or None	Navigate nested structure	root.find('./book/author')
findall(path)	List of Elements	Get all elements at path	root.findall('.//price')
iter(tag)	Iterator	Recursively find all descendants	root.iter('title')

# Find first book element
book = root.find('book')
if book is not None:
    print(f"First book ID: {book.get('id')}")

# Find all book elements
books = root.findall('book')
print(f"Total books: {len(books)}")

# Find nested elements using path
author = root.find('./book/author')
if author is not None:
    print(f"First author: {author.text}")

# Find all price elements anywhere in tree
prices = root.findall('.//price')
for price in prices:
    print(f"Price: {price.text}")

Advanced Traversal with XPath-Like Expressions

ElementTree supports a limited subset of XPath expressions, providing powerful pattern-matching capabilities for complex element selection. While not as comprehensive as the full XPath 1.0 support found in lxml, ElementTree's XPath subset covers the most common use cases with expressions for descendant selection, attribute matching, and positional filtering. These expressions use a familiar syntax that will feel natural to developers experienced with XPath in other contexts, though some features, such as numeric comparisons and most XPath functions, are not available.

# Find all books regardless of nesting level
all_books = root.findall('.//book')

# Find books that carry a given attribute (any value)
categorized_books = root.findall(".//book[@category]")

# Find all titles within books
titles = root.findall('.//book/title')

# Get elements at specific positions
first_book = root.find('./book[1]')  # Note: 1-indexed

# Find elements with specific attribute value
classic = root.find(".//book[@category='classic']")

# Iterate over all elements of specific type
for title in root.iter('title'):
    print(title.text)
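
The supported subset and one of its limits can be seen in a self-contained sketch. The catalog below is hypothetical sample data; the predicates shown (attribute value, child text, last(), and the parent step) are part of ElementTree's documented XPath subset, while numeric comparison is not, so that filter is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only for this demonstration
root = ET.fromstring('''
<catalog>
    <book id="bk101" category="classic">
        <title>Emma</title>
        <price>9.50</price>
    </book>
    <book id="bk102" category="modern">
        <title>Neuromancer</title>
        <price>24.00</price>
    </book>
</catalog>
''')

# Supported: attribute value, child text, and position predicates
classic = root.find(".//book[@category='classic']")
by_title = root.find(".//book[title='Neuromancer']")
last_book = root.find("./book[last()]")

# Supported: '..' selects the parent of each matched element
parents_of_titles = root.findall(".//title/..")

# Not supported: numeric comparisons such as [price > 20],
# so filter in Python instead
expensive = [
    book for book in root.findall("book")
    if float(book.findtext("price", default="0")) > 20.0
]

print(classic.get("id"), by_title.get("id"), last_book.get("id"))
print([book.get("id") for book in expensive])
```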

"The key to efficient XML navigation is choosing the right tool for the job—direct iteration for simple structures, find methods for targeted access, and iter() for comprehensive traversal."

Extracting Data from Elements

Navigating to the right elements represents only half the battle—extracting their data in usable forms completes the parsing workflow. Elements contain three primary types of information: the tag name identifying the element type, attributes providing metadata as key-value pairs, and text content representing the actual data payload. Mastering the extraction of each data type enables you to transform raw XML into structured Python objects ready for business logic processing, database insertion, or further transformation.

Element text content, accessed through the .text attribute, contains the character data between opening and closing tags. This attribute returns a string for elements with text content or None for empty elements, requiring careful handling to avoid AttributeError exceptions when processing mixed content. For elements with child elements, .text contains only the text before the first child, while .tail contains text after the element's closing tag, a distinction crucial for preserving document structure during manipulation.

import xml.etree.ElementTree as ET

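# Sample document; the root name and attribute values are illustrative
xml_data = '''
<library>
    <book id="b1" genre="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price currency="USD">12.99</price>
    </book>
</library>
'''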

root = ET.fromstring(xml_data)
book = root.find('book')

# Extract text content
title = book.find('title').text
author = book.find('author').text
year = book.find('year').text

print(f"Title: {title}")
print(f"Author: {author}")
print(f"Year: {year}")

# Safe text extraction with default
description = book.find('description')
desc_text = description.text if description is not None else "No description available"
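
The .text and .tail distinction described above is easiest to see with mixed content. This is a small sketch using a made-up paragraph element; itertext() is shown as one way to recover the full character data in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content paragraph
para = ET.fromstring('<p>Read <b>this</b> carefully.</p>')
bold = para.find('b')

print(repr(para.text))   # 'Read '        (text before the first child)
print(repr(bold.text))   # 'this'         (text inside <b>)
print(repr(bold.tail))   # ' carefully.'  (text after </b>, stored on the child)

# itertext() walks text and tail values in document order
full_text = ''.join(para.itertext())
print(full_text)         # Read this carefully.
```

Because the trailing text lives on the child, removing that child from its parent also discards the tail unless you copy it elsewhere first.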

Working with Attributes

Attributes provide metadata about elements without increasing document depth, making them ideal for IDs, categories, flags, and other qualifying information. ElementTree exposes attributes through the .attrib dictionary, enabling standard dictionary operations for attribute access, modification, and enumeration. The .get() method provides safe attribute access with optional default values, preventing KeyError exceptions when attributes might be absent—a common scenario in documents with optional metadata fields.

# Access attributes dictionary
attributes = book.attrib
print(f"All attributes: {attributes}")

# Get specific attribute
book_id = book.get('id')
genre = book.get('genre')
print(f"Book ID: {book_id}, Genre: {genre}")

# Safe attribute access with default
isbn = book.get('isbn', 'N/A')
print(f"ISBN: {isbn}")

# Check attribute existence
if 'id' in book.attrib:
    print("Book has an ID")

# Get nested element attributes
price = book.find('price')
currency = price.get('currency', 'USD')
amount = float(price.text)
print(f"Price: {amount} {currency}")

Type Conversion and Data Validation

XML stores all data as text strings, requiring explicit type conversion for numeric, boolean, or date values. Implementing robust conversion logic with error handling ensures your application gracefully handles malformed data, missing elements, or unexpected formats. Combining extraction with validation creates defensive parsing code that fails gracefully rather than crashing on invalid input, a critical characteristic for production systems processing XML from external sources.

def safe_int(element, default=0):
    """Safely convert element text to integer."""
    try:
        return int(element.text) if element is not None and element.text else default
    except (ValueError, AttributeError):
        return default

def safe_float(element, default=0.0):
    """Safely convert element text to float."""
    try:
        return float(element.text) if element is not None and element.text else default
    except (ValueError, AttributeError):
        return default

# Extract and convert data
year = safe_int(book.find('year'), default=0)
price = safe_float(book.find('price'), default=0.0)

# Boolean conversion (tolerates a missing element and empty text)
in_stock = book.find('in_stock')
is_available = in_stock is not None and (in_stock.text or '').strip().lower() == 'true'

# Date conversion
from datetime import datetime

published = book.find('published')
pub_date = None  # stays None when the element is missing, empty, or malformed
if published is not None and published.text:
    try:
        pub_date = datetime.strptime(published.text.strip(), '%Y-%m-%d')
    except ValueError:
        pub_date = None

"Data extraction without validation is like building on sand—it works until the first unexpected input brings everything crashing down."

Practical Parsing Patterns

Real-world XML processing rarely involves simple, flat structures with predictable content. Production XML documents often contain deeply nested hierarchies, repeated elements, optional fields, mixed content, and namespaces that complicate parsing logic. Developing robust patterns for common scenarios—list processing, hierarchical data extraction, and namespace handling—transforms you from a casual XML user into a proficient practitioner capable of tackling enterprise-grade documents with confidence and efficiency.

Processing Collections of Similar Elements

Many XML documents represent collections of similar items—product catalogs, transaction logs, contact lists, or configuration entries. Processing these collections efficiently requires patterns that minimize code duplication while maintaining readability and error handling. List comprehensions combined with extraction functions create concise, expressive code for transforming XML collections into Python data structures like lists of dictionaries, ready for further processing or database insertion.

import xml.etree.ElementTree as ET

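# Sample catalog; attribute values are illustrative
xml_catalog = '''
<catalog>
    <book id="bk101" category="computer">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
    </book>
    <book id="bk102" category="fantasy">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <price>5.95</price>
        <publish_date>2000-12-16</publish_date>
    </book>
    <book id="bk103" category="fantasy">
        <author>Corets, Eva</author>
        <title>Maeve Ascendant</title>
        <price>5.95</price>
        <publish_date>2000-11-17</publish_date>
    </book>
</catalog>
'''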

root = ET.fromstring(xml_catalog)

# Extract all books as dictionaries
books = []
for book in root.findall('book'):
    book_data = {
        'id': book.get('id'),
        'category': book.get('category'),
        'author': book.find('author').text,
        'title': book.find('title').text,
        'price': float(book.find('price').text),
        'publish_date': book.find('publish_date').text
    }
    books.append(book_data)

# List comprehension approach
books_compact = [
    {
        'id': book.get('id'),
        'title': book.find('title').text,
        'price': float(book.find('price').text)
    }
    for book in root.findall('book')
]

# Filter and process
expensive_books = [
    book for book in books 
    if book['price'] > 20.0
]

print(f"Found {len(expensive_books)} expensive books")
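
The extraction loops above assume every child element is present. As a hedged variation, the sketch below uses findtext(), which returns a default instead of raising when a child is missing; the helper name book_to_dict and the sample catalog are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog in which the second book lacks a price
root = ET.fromstring('''
<catalog>
    <book id="bk101"><title>XML Developer's Guide</title><price>44.95</price></book>
    <book id="bk102"><title>Midnight Rain</title></book>
</catalog>
''')

def book_to_dict(book):
    """Extract one book, tolerating missing child elements."""
    price_text = book.findtext('price')  # None when <price> is absent
    return {
        'id': book.get('id'),
        'title': book.findtext('title', default='(untitled)'),
        'price': float(price_text) if price_text else None,
    }

books = [book_to_dict(book) for book in root.findall('book')]
print(books)
```

Keeping the per-item logic in one function means the list comprehension stays short while the handling of optional fields lives in a single place.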

Handling Nested Hierarchies

Hierarchical data structures—organizational charts, file systems, category trees—require recursive processing to capture parent-child relationships at arbitrary depths. Recursive functions elegantly handle these structures by processing each level independently while maintaining context through function parameters. This approach scales naturally from simple two-level hierarchies to complex multi-level structures without requiring different code paths for different depths.

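# Sample hierarchy; element and attribute names are illustrative
xml_hierarchy = '''
<company name="Acme">
    <department name="Engineering">
        <team name="Backend">
            <employee>John Smith</employee>
            <employee>Jane Doe</employee>
        </team>
        <team name="Frontend">
            <employee>Bob Johnson</employee>
        </team>
    </department>
    <department name="Sales">
        <employee>Alice Williams</employee>
    </department>
</company>
'''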

def process_hierarchy(element, level=0):
    """Recursively process hierarchical structure."""
    indent = "  " * level
    print(f"{indent}{element.tag}: {element.get('name', element.text or '')}")
    
    for child in element:
        process_hierarchy(child, level + 1)

root = ET.fromstring(xml_hierarchy)
process_hierarchy(root)

# Build nested dictionary structure
def element_to_dict(element):
    """Convert element tree to nested dictionary."""
    result = {
        'tag': element.tag,
        'attributes': element.attrib,
        'text': element.text.strip() if element.text else None,
        'children': [element_to_dict(child) for child in element]
    }
    return result

hierarchy_dict = element_to_dict(root)

Namespace Management

XML namespaces prevent tag name collisions when combining documents from different vocabularies, but they complicate parsing by requiring namespace-aware element access. ElementTree represents namespaced elements with fully-qualified names in Clark notation—{namespace}localname—requiring you to include the namespace URI when searching for elements. Creating a namespace dictionary and using it consistently throughout your parsing code maintains readability while ensuring correct element matching.

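xml_with_ns = '''
<root xmlns:book="http://example.com/book"
      xmlns:author="http://example.com/author">
    <book:catalog>
        <book:item>
            <book:title>Python Guide</book:title>
            <author:name>John Smith</author:name>
        </book:item>
    </book:catalog>
</root>
'''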

root = ET.fromstring(xml_with_ns)

# Define namespace dictionary
namespaces = {
    'book': 'http://example.com/book',
    'author': 'http://example.com/author'
}

# Find elements using namespaces
catalog = root.find('book:catalog', namespaces)
items = root.findall('.//book:item', namespaces)

for item in items:
    title = item.find('book:title', namespaces)
    author = item.find('author:name', namespaces)
    print(f"Title: {title.text}, Author: {author.text}")

# Alternative: use full Clark notation
catalog_clark = root.find('{http://example.com/book}catalog')
title_clark = catalog_clark.find('.//{http://example.com/book}title')

Namespace Scenario	Approach	Example	Notes
Single namespace	Clark notation	{http://example.com}tag	Direct but verbose
Multiple namespaces	Namespace dictionary	find('ns:tag', namespaces)	More readable
Default namespace	Map a prefix in the namespace dictionary	find('d:tag', {'d': uri})	register_namespace() affects output only
No namespace	Standard methods	find('tag')	Simplest case
Unknown or varying namespace	Wildcard matching	find('{*}tag')	Python 3.8+; local-name() requires lxml
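
Default namespaces are a frequent source of empty search results, because the tags look unqualified in the file but are namespaced once parsed. The sketch below uses a made-up namespace URI and an arbitrary prefix 'c'; the {*} wildcard form assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace (no prefix on the tags)
root = ET.fromstring('''
<catalog xmlns="http://example.com/catalog">
    <item><title>Python Guide</title></item>
</catalog>
''')

# The parsed tag carries the namespace URI in Clark notation
print(root.tag)

# A bare name does not match, because the element is namespaced
print(root.find('item'))

# Map any prefix of your choosing to the URI for use in queries
ns = {'c': 'http://example.com/catalog'}
title = root.find('c:item/c:title', ns)
print(title.text)

# Python 3.8+ also accepts a {*} wildcard that matches any namespace
same_title = root.find('{*}item/{*}title')
print(same_title is title)
```

The prefix in the dictionary is local to your queries and does not need to match any prefix used in the document.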

"Namespaces are XML's way of preventing chaos when different vocabularies collide—embrace them with proper abstractions rather than fighting against their complexity."

Modifying XML Documents

Parsing XML represents only half of the XML processing equation—many applications require modifying existing documents, updating values, adding new elements, or removing obsolete content. ElementTree provides comprehensive modification capabilities through intuitive methods that mirror common data structure operations. Understanding these modification patterns enables you to build XML transformation pipelines, configuration updaters, and document processors that maintain structural integrity while adapting content to changing requirements.

Updating Element Content and Attributes

Modifying element text and attributes uses simple assignment operations that feel natural to Python developers. The .text attribute accepts string assignments to update element content, while the .attrib dictionary supports standard dictionary operations for attribute manipulation. These modifications occur in-memory, requiring explicit write operations to persist changes to disk, giving you fine-grained control over when modifications become permanent.

import xml.etree.ElementTree as ET

tree = ET.parse('books.xml')
root = tree.getroot()

# Update element text
book = root.find('book')
title = book.find('title')
title.text = "Updated Title"

# Update attributes
book.set('id', 'bk999')
book.set('category', 'updated')

# Update attribute via dictionary
book.attrib['status'] = 'modified'

# Remove attribute
if 'old_attr' in book.attrib:
    del book.attrib['old_attr']

# Update nested element
price = book.find('price')
price.text = "29.99"
price.set('currency', 'EUR')

# Save changes
tree.write('books_updated.xml', encoding='utf-8', xml_declaration=True)

Adding New Elements

Creating new elements requires the Element constructor or the SubElement helper function, which automatically adds the new element as a child of a parent element. SubElement reduces boilerplate code for common scenarios where you create and immediately attach elements, making your code more concise and readable. Setting text content and attributes during or after creation provides flexibility for different data sources and construction patterns.

from xml.etree.ElementTree import Element, SubElement

# Create new elements manually
new_book = Element('book')
new_book.set('id', 'bk104')
new_book.set('category', 'technology')

title = SubElement(new_book, 'title')
title.text = "Python Advanced Techniques"

author = SubElement(new_book, 'author')
author.text = "Jane Developer"

price = SubElement(new_book, 'price')
price.text = "39.99"
price.set('currency', 'USD')

# Add to existing tree
root.append(new_book)

# Create multiple elements in loop
authors_data = [
    {'name': 'John Smith', 'role': 'lead'},
    {'name': 'Jane Doe', 'role': 'contributor'}
]

authors_section = SubElement(new_book, 'authors')
for author_info in authors_data:
    author_elem = SubElement(authors_section, 'author')
    author_elem.text = author_info['name']
    author_elem.set('role', author_info['role'])

Removing Elements

Element removal requires calling the remove() method on the parent element, not on the element to be removed, which is a common source of confusion for newcomers. This design reflects the fact that ElementTree elements don't maintain references to their parents, so removal always goes from parent to child. When the element to remove is found deep in the tree, you must locate its parent yourself: ElementTree offers no upward navigation, so either search from the parent down or build a child-to-parent map during traversal.

# Remove specific child element
book = root.find('book')
description = book.find('description')
if description is not None:
    book.remove(description)

# Remove all children matching criteria
for book in root.findall('book'):
    if float(book.find('price').text) > 50.0:
        root.remove(book)

# Remove all children of specific type
for review in book.findall('review'):
    book.remove(review)

# Reset an element in place while keeping it in the tree
book.clear()  # Removes all children, text, tail, and attributes

# Remove element and preserve siblings
parent = root
for child in list(parent):  # Create list copy to avoid modification during iteration
    if child.get('status') == 'obsolete':
        parent.remove(child)

Reordering and Restructuring

XML element order often carries semantic meaning (book chapters, process steps, priority rankings), requiring careful handling during restructuring operations. ElementTree maintains child order in the sequence they were added. It has no dedicated sort method, so you either remove and re-add elements in the desired sequence or assign a reordered list back to the element's children with slice assignment. For complex restructuring, building new subtrees and replacing entire sections often proves simpler than manipulating existing structures in place.

# Sort books by price
books = root.findall('book')
sorted_books = sorted(books, key=lambda b: float(b.find('price').text))

# Remove all books
for book in books:
    root.remove(book)

# Add back in sorted order
for book in sorted_books:
    root.append(book)

# Move element to different parent
book = root.find('book')
archive = root.find('archive')
if archive is None:
    archive = SubElement(root, 'archive')

root.remove(book)
archive.append(book)

# Insert element at specific position
new_book = Element('book')
root.insert(0, new_book)  # Insert at beginning
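
As an alternative to removing and re-appending, an element's children can be replaced in one step with slice assignment. The sketch below sorts a made-up catalog by price; note that it reorders every child of the root, so it fits best when all children are of the same kind.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog with books out of price order
root = ET.fromstring('''
<catalog>
    <book id="bk101"><price>44.95</price></book>
    <book id="bk102"><price>5.95</price></book>
    <book id="bk103"><price>19.99</price></book>
</catalog>
''')

# Slice assignment replaces all children in one step
root[:] = sorted(root, key=lambda book: float(book.findtext('price')))

order = [book.get('id') for book in root]
print(order)
```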

"Modification operations should always be followed by validation—ensuring your changes maintain document integrity prevents cascading failures in downstream processing."

Creating XML Documents from Scratch

While parsing existing XML documents addresses many use cases, generating new XML documents programmatically enables configuration file creation, data export, API response formatting, and document generation from database records. ElementTree's construction API provides everything needed to build complete XML documents with proper structure, namespaces, and formatting. Mastering document creation transforms your applications from passive XML consumers into active participants in XML-based data ecosystems.

import xml.etree.ElementTree as ET  # needed for ET.tostring() in prettify()
from xml.etree.ElementTree import Element, SubElement, ElementTree
from xml.dom import minidom

def create_book_catalog():
    """Create a complete XML catalog from scratch."""
    # Create root element
    catalog = Element('catalog')
    catalog.set('version', '1.0')
    
    # Add books
    books_data = [
        {
            'id': 'bk101',
            'category': 'programming',
            'title': 'XML Mastery',
            'author': 'John Smith',
            'price': 44.95,
            'currency': 'USD'
        },
        {
            'id': 'bk102',
            'category': 'programming',
            'title': 'Python Excellence',
            'author': 'Jane Doe',
            'price': 39.95,
            'currency': 'USD'
        }
    ]
    
    for book_data in books_data:
        book = SubElement(catalog, 'book')
        book.set('id', book_data['id'])
        book.set('category', book_data['category'])
        
        title = SubElement(book, 'title')
        title.text = book_data['title']
        
        author = SubElement(book, 'author')
        author.text = book_data['author']
        
        price = SubElement(book, 'price')
        price.text = str(book_data['price'])
        price.set('currency', book_data['currency'])
    
    return catalog

# Create document
root = create_book_catalog()
tree = ElementTree(root)

# Write to file
tree.write('new_catalog.xml', encoding='utf-8', xml_declaration=True)

# Pretty print (requires minidom)
def prettify(element):
    """Return a pretty-printed XML string."""
    rough_string = ET.tostring(element, encoding='utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")

pretty_xml = prettify(root)
with open('catalog_pretty.xml', 'w', encoding='utf-8') as f:
    f.write(pretty_xml)
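
On Python 3.9 and later, ET.indent() offers a way to pretty-print without the minidom round trip. The following is a minimal sketch with a made-up two-level document; the hasattr() guard is only there so the snippet still runs, without indentation, on older interpreters.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level document
catalog = ET.Element('catalog')
book = ET.SubElement(catalog, 'book', id='bk101')
ET.SubElement(book, 'title').text = 'XML Mastery'

# ET.indent() was added in Python 3.9; it edits whitespace in place
if hasattr(ET, 'indent'):
    ET.indent(catalog, space='  ')

pretty = ET.tostring(catalog, encoding='unicode')
print(pretty)
```

Because indent() modifies the tree itself, call it just before serializing rather than on a tree you will keep querying for exact text values.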

Working with XML Declarations and Processing Instructions

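Complete XML documents often require XML declarations specifying version and encoding, along with processing instructions for stylesheets or other document metadata. ElementTree's write() method accepts parameters for XML declarations. Processing instructions are created with ProcessingInstruction(), but a stylesheet instruction belongs in the prolog, before the root element, and an ElementTree holds only the root element and its descendants, so the prolog has to be written by hand. Understanding these document-level features ensures your generated XML conforms to specifications and integrates seamlessly with other XML tools and validators.

# Write with XML declaration
tree.write('output.xml', 
           encoding='utf-8', 
           xml_declaration=True,
           method='xml')

# Create a stylesheet processing instruction
from xml.etree.ElementTree import ProcessingInstruction

pi = ProcessingInstruction('xml-stylesheet', 
                           'type="text/xsl" href="style.xsl"')

# Inserting it into root would place it inside the root element,
# so write the declaration and instruction manually, then the tree
with open('output_styled.xml', 'wb') as f:
    f.write(b'<?xml version="1.0" encoding="utf-8"?>\n')
    f.write(ET.tostring(pi) + b'\n')
    f.write(ET.tostring(root, encoding='utf-8'))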

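# Create document with namespaces: register the preferred prefixes first,
# then name elements in Clark notation so the serializer emits the
# xmlns declarations itself (setting 'xmlns' as a plain attribute does
# not put elements in a namespace)
ET.register_namespace('', 'http://example.com/catalog')
ET.register_namespace('book', 'http://example.com/book')

root = Element('{http://example.com/catalog}catalog')
item = SubElement(root, '{http://example.com/book}item')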

Generating XML from Data Structures

Real-world applications often need to transform Python data structures—dictionaries, lists, database query results, or API responses—into XML format. Creating reusable conversion functions that handle nested structures, type conversion, and attribute mapping streamlines this process, enabling declarative data-to-XML transformations. These functions become building blocks for larger systems that bridge relational databases, NoSQL stores, or in-memory objects with XML-based interfaces.

def dict_to_xml(tag, data):
    """Convert dictionary to XML element recursively."""
    element = Element(tag)
    
    for key, value in data.items():
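        if key == '#text':
            # Text content of the element itself (used by 'price' below)
            element.text = str(value)
        elif key.startswith('@'):
            # Attribute (keys starting with @)
            element.set(key[1:], str(value))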
        elif isinstance(value, dict):
            # Nested dictionary
            child = dict_to_xml(key, value)
            element.append(child)
        elif isinstance(value, list):
            # List of items
            for item in value:
                if isinstance(item, dict):
                    child = dict_to_xml(key, item)
                    element.append(child)
                else:
                    child = SubElement(element, key)
                    child.text = str(item)
        else:
            # Simple value
            child = SubElement(element, key)
            child.text = str(value)
    
    return element

# Example usage
data = {
    '@version': '1.0',
    'book': [
        {
            '@id': 'bk101',
            'title': 'XML Guide',
            'author': 'John Smith',
            'price': {
                '@currency': 'USD',
                '#text': '44.95'
            }
        },
        {
            '@id': 'bk102',
            'title': 'Python Handbook',
            'author': 'Jane Doe',
            'price': {
                '@currency': 'USD',
                '#text': '39.95'
            }
        }
    ]
}

catalog = dict_to_xml('catalog', data)
tree = ElementTree(catalog)
tree.write('generated.xml', encoding='utf-8', xml_declaration=True)

Performance Optimization Techniques

XML processing performance becomes critical when handling large files, processing high-volume streams, or operating under strict latency requirements. ElementTree loads entire documents into memory, which can become problematic for multi-gigabyte files or memory-constrained environments. Understanding performance characteristics, memory usage patterns, and optimization strategies enables you to build efficient XML processing pipelines that scale from small configuration files to enterprise data feeds without sacrificing reliability or maintainability.

Streaming Large Files with iterparse

The iterparse() function provides event-driven parsing that processes XML documents incrementally, loading only small portions into memory at any given time. This streaming approach enables processing arbitrarily large files with constant memory usage, making it essential for log files, data exports, or any scenario where file size exceeds available RAM. By processing elements as they're encountered and clearing them after use, you can handle gigabyte-scale documents on modest hardware.

import xml.etree.ElementTree as ET

def process_large_file(filename):
    """Process large XML file with constant memory usage."""
    context = ET.iterparse(filename, events=('start', 'end'))
    context = iter(context)
    
    # Get root element
    event, root = next(context)
    
    book_count = 0
    total_price = 0.0
    
    for event, elem in context:
        if event == 'end' and elem.tag == 'book':
            # Process book element
            price = elem.find('price')
            if price is not None:
                total_price += float(price.text)
            book_count += 1
            
            # Clear element to free memory
            elem.clear()
            root.clear()
    
    avg_price = total_price / book_count if book_count > 0 else 0
    print(f"Processed {book_count} books, average price: ${avg_price:.2f}")

# Process file without loading entire document
process_large_file('large_catalog.xml')

# Extract specific elements efficiently
def extract_titles(filename):
    """Extract all titles without loading full document."""
    titles = []
    for event, elem in ET.iterparse(filename, events=('end',)):
        if elem.tag == 'title':
            titles.append(elem.text)
        # Clear every finished element, not only titles, so the tree stays small
        elem.clear()
    return titles
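
Calling elem.clear() empties an element but leaves the empty shell attached to its parent. A variation that also detaches each finished element from the root can be sketched as follows. This is an illustration on an in-memory sample rather than a real file; stream_books and SAMPLE are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<catalog>
  <book><title>XML Guide</title><price>44.95</price></book>
  <book><title>Python Handbook</title><price>39.95</price></book>
</catalog>"""

def stream_books(source):
    """Yield (title, price) pairs, removing each book from the root afterwards."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "book":
            yield elem.findtext("title"), float(elem.findtext("price", default="0"))
            root.remove(elem)  # detach the processed <book> so it can be freed

books = list(stream_books(io.BytesIO(SAMPLE)))
print(books)
```

The same generator accepts a filename instead of a BytesIO object, since iterparse() takes either.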

Minimizing Memory Usage

Memory consumption during XML processing stems from two primary sources: the parsed tree structure itself and Python object overhead for each element. For read-only operations, clearing processed elements after extraction prevents memory accumulation during iteration. For write operations, building documents incrementally and writing in chunks rather than accumulating large structures in memory reduces peak usage. These techniques prove essential when processing XML in memory-constrained environments like containers, embedded systems, or serverless functions.

# Memory-efficient batch processing
def process_in_batches(filename, batch_size=1000):
    """Process XML in batches to control memory usage."""
    batch = []
    
    for event, elem in ET.iterparse(filename, events=('end',)):
        if elem.tag == 'book':
            book_data = {
                'title': elem.find('title').text,
                'author': elem.find('author').text,
                'price': float(elem.find('price').text)
            }
            batch.append(book_data)
            
            # Process batch when full
            if len(batch) >= batch_size:
                process_batch(batch)
                batch.clear()
            
            elem.clear()
    
    # Process remaining items
    if batch:
        process_batch(batch)

def process_batch(books):
    """Process batch of book data."""
    # Insert into database, write to file, etc.
    pass

# Incremental document building
def build_large_document(data_iterator):
    """Build large XML document incrementally."""
    with open('output.xml', 'wb') as f:
        # Write the declaration and the opening root tag by hand
        f.write(b'<?xml version="1.0" encoding="utf-8"?>\n')
        f.write(b'<catalog>\n')
        
        # Write elements incrementally
        for item in data_iterator:
            book = ET.Element('book')
            # ... populate book element ...
            
            # Serialize this element; nothing accumulates in memory
            f.write(ET.tostring(book, encoding='utf-8'))
            f.write(b'\n')
        
        # Close the root element
        f.write(b'</catalog>\n')

"Performance optimization begins with choosing the right tool—iterparse for large files, standard parsing for small documents, and lxml when you need maximum speed with complex operations."

Benchmarking and Profiling

Optimization without measurement leads to premature optimization and wasted effort. Profiling XML processing code identifies actual bottlenecks—parsing speed, memory allocation, string operations, or I/O overhead—enabling targeted improvements where they matter most. Using Python's built-in profiling tools alongside memory profilers reveals performance characteristics and guides optimization decisions based on data rather than assumptions.

import time
import xml.etree.ElementTree as ET
from memory_profiler import profile

@profile
def parse_standard(filename):
    """Standard parsing approach."""
    tree = ET.parse(filename)
    root = tree.getroot()
    books = root.findall('.//book')
    return len(books)

@profile
def parse_streaming(filename):
    """Streaming parsing approach."""
    count = 0
    for event, elem in ET.iterparse(filename, events=('end',)):
        if elem.tag == 'book':
            count += 1
            elem.clear()
    return count

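# Benchmark different approaches
# Note: the @profile decorators above add overhead of their own, so remove
# them when you only care about wall-clock timings
def benchmark():
    """Compare parsing methods."""
    filename = 'large_catalog.xml'
    
    # Standard parsing (perf_counter is the clock intended for timing)
    start = time.perf_counter()
    count1 = parse_standard(filename)
    time1 = time.perf_counter() - start
    
    # Streaming parsing
    start = time.perf_counter()
    count2 = parse_streaming(filename)
    time2 = time.perf_counter() - start
    
    print(f"Standard: {count1} books in {time1:.2f}s")
    print(f"Streaming: {count2} books in {time2:.2f}s")
    if time2 > 0:
        print(f"Standard/streaming time ratio: {time1/time2:.2f}x")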

if __name__ == '__main__':
    benchmark()
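
If installing memory_profiler is not an option, the standard library's tracemalloc module gives a rough peak-memory figure. The sketch below is one possible approach, not a replacement for a full profiler: measure and count_books are hypothetical helper names, the sample document is generated in memory, and tracemalloc only sees allocations made through Python's memory allocators.

```python
import io
import time
import tracemalloc
import xml.etree.ElementTree as ET

def measure(func, *args):
    """Return (result, seconds, peak_bytes) for a single call of func."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, elapsed, peak

def count_books(xml_bytes):
    """Count <book> elements with iterparse, clearing elements as we go."""
    count = 0
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "book":
            count += 1
        elem.clear()
    return count

sample = b"<catalog>" + b"<book><title>T</title></book>" * 1000 + b"</catalog>"
count, seconds, peak = measure(count_books, sample)
print(f"{count} books in {seconds:.4f}s, peak {peak / 1024:.1f} KiB")
```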

Error Handling and Validation

Robust XML processing requires comprehensive error handling to gracefully manage malformed documents, missing elements, unexpected structures, and encoding issues. Production systems processing XML from external sources must assume that inputs may be invalid, incomplete, or malicious, implementing defensive parsing strategies that fail gracefully while providing actionable error messages. Combining ElementTree's exception handling with validation logic creates resilient parsers that maintain system stability even when confronted with problematic data.

Common Exceptions and Recovery Strategies

Parsing and manipulation code can fail in several distinct ways, each calling for a different recovery strategy. ParseError signals malformed XML syntax, FileNotFoundError and PermissionError come from opening the file rather than from the parser, and AttributeError usually means code called .text on the None that find() returns for a missing element. Wrapping parsing operations in try-except blocks with specific handlers enables graceful degradation, logging for debugging, and user-friendly error messages that guide resolution.

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError
import logging

def safe_parse(filename):
    """Parse XML file with comprehensive error handling."""
    try:
        tree = ET.parse(filename)
        return tree.getroot()
    except FileNotFoundError:
        logging.error(f"XML file not found: {filename}")
        return None
    except ParseError as e:
        logging.error(f"XML parsing error: {e}")
        return None
    except PermissionError:
        logging.error(f"Permission denied reading file: {filename}")
        return None
    except Exception as e:
        logging.error(f"Unexpected error parsing XML: {e}")
        return None

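def safe_extract(element, path, default=None):
    """Return the text at path, or default if the element is missing or empty."""
    if element is None:
        return default
    found = element.find(path)
    # An empty element such as <price/> has text None, so fall back as well
    if found is None or found.text is None:
        return default
    return found.text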

# Usage with error handling
root = safe_parse('data.xml')
if root is not None:
    for book in root.findall('book'):
        title = safe_extract(book, 'title', 'Unknown Title')
        author = safe_extract(book, 'author', 'Unknown Author')
        
        # Safe numeric conversion
        try:
            price = float(safe_extract(book, 'price', '0.0'))
        except ValueError:
            price = 0.0
            logging.warning(f"Invalid price for book: {title}")
        
        print(f"{title} by {author}: ${price:.2f}")
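
ParseError also carries the location of the problem in its position attribute, a (line, column) tuple, which is useful for actionable log messages. A small sketch, with describe_parse_error as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def describe_parse_error(xml_text):
    """Return None if xml_text parses, else a dict describing the failure."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position  # where the parser gave up
        return {"line": line, "column": column, "message": str(exc)}
    return None

broken = "<catalog><book></catalog>"  # <book> is never closed
problem = describe_parse_error(broken)
print(problem)
```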

Schema Validation

While ElementTree focuses on parsing and manipulation rather than validation, ensuring XML documents conform to expected schemas prevents downstream processing errors. For rigorous validation against XML Schema (XSD) or DTD definitions, the lxml library provides comprehensive validation capabilities. Implementing validation as a separate step before processing enables early failure detection, clearer error messages, and separation of concerns between structural validation and business logic.

# Basic structure validation
def validate_book_structure(book_element):
    """Validate book element has required children."""
    required_fields = ['title', 'author', 'price']
    missing_fields = []
    
    for field in required_fields:
        if book_element.find(field) is None:
            missing_fields.append(field)
    
    if missing_fields:
        return False, f"Missing required fields: {', '.join(missing_fields)}"
    
    return True, "Valid"

def validate_catalog(root):
    """Validate entire catalog structure."""
    if root.tag != 'catalog':
        return False, "Root element must be 'catalog'"
    
    books = root.findall('book')
    if not books:
        return False, "Catalog must contain at least one book"
    
    for i, book in enumerate(books):
        valid, message = validate_book_structure(book)
        if not valid:
            return False, f"Book {i+1}: {message}"
    
    return True, "Catalog is valid"

# Validate before processing
root = ET.parse('catalog.xml').getroot()
valid, message = validate_catalog(root)

if valid:
    # Process valid catalog (process_catalog stands in for your own function)
    process_catalog(root)
else:
    logging.error(f"Validation failed: {message}")

# For XSD validation (requires lxml)
try:
    from lxml import etree
    
    def validate_against_xsd(xml_file, xsd_file):
        """Validate XML against XSD schema."""
        try:
            schema = etree.XMLSchema(file=xsd_file)
            doc = etree.parse(xml_file)
            return schema.validate(doc), schema.error_log
        except Exception as e:
            return False, str(e)
except ImportError:
    logging.warning("lxml not available, XSD validation disabled")

"Error handling is not defensive programming—it's responsible programming that acknowledges the reality of imperfect inputs and unreliable systems."

Real-World Application Examples

Theoretical knowledge transforms into practical skill through application to real-world scenarios. The following examples demonstrate complete workflows for common XML processing tasks, combining parsing, extraction, transformation, and generation techniques into cohesive solutions. These patterns serve as templates for your own applications, adaptable to specific requirements while maintaining best practices for error handling, performance, and maintainability.

📄 Processing Configuration Files

Configuration files represent one of the most common XML use cases, storing application settings, database connections, feature flags, and environment-specific parameters. Building a configuration parser that loads XML settings into Python objects enables centralized configuration management with type safety, validation, and easy access throughout your application.

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DatabaseConfig:
    host: str
    port: int
    database: str
    username: str
    password: str
    pool_size: int = 10

@dataclass
class AppConfig:
    app_name: str
    debug: bool
    database: DatabaseConfig
    features: List[str]

def parse_config(filename):
    """Parse application configuration from XML."""
    tree = ET.parse(filename)
    root = tree.getroot()
    
    # Parse database configuration
    db_elem = root.find('database')
    database = DatabaseConfig(
        host=db_elem.find('host').text,
        port=int(db_elem.find('port').text),
        database=db_elem.find('database').text,
        username=db_elem.find('username').text,
        password=db_elem.find('password').text,
        # findtext() returns None for a missing element instead of raising
        pool_size=int(db_elem.findtext('pool_size') or '10')
    )
    
    # Parse application settings
    app_elem = root.find('application')
    app_name = app_elem.find('name').text
    debug = app_elem.find('debug').text.lower() == 'true'
    
    # Parse feature flags
    features = [
        feature.text 
        for feature in root.findall('.//features/feature')
        if feature.get('enabled', 'true') == 'true'
    ]
    
    return AppConfig(
        app_name=app_name,
        debug=debug,
        database=database,
        features=features
    )

# Load and use configuration
config = parse_config('config.xml')
print(f"Connecting to {config.database.host}:{config.database.port}")
print(f"Enabled features: {', '.join(config.features)}")
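
Optional settings are a common source of AttributeError, because find() returns None when an element is absent. One way to read them defensively is findtext() combined with a fallback, sketched here on an inline document; CONFIG_XML and read_int are illustrative names.

```python
import xml.etree.ElementTree as ET

CONFIG_XML = """<config>
  <database>
    <host>db.example.com</host>
    <port>6543</port>
  </database>
</config>"""

def read_int(parent, tag, default):
    """Return the integer in <tag>, or default if it is missing or empty."""
    text = parent.findtext(tag)
    return int(text) if text and text.strip() else default

db = ET.fromstring(CONFIG_XML).find("database")
port = read_int(db, "port", 5432)           # present in the document
pool_size = read_int(db, "pool_size", 10)   # absent, so the default applies
print(port, pool_size)
```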

🔄 Converting Between XML and JSON

Modern applications often need to bridge XML and JSON formats, converting between them for API integration, data migration, or format standardization. Building bidirectional converters requires careful handling of structural differences—XML attributes versus JSON properties, element order preservation, and type conversion—while maintaining data fidelity through the transformation process.

import xml.etree.ElementTree as ET
import json

def xml_to_dict(element):
    """Convert XML element to dictionary recursively."""
    result = {}
    
    # Add attributes
    if element.attrib:
        result['@attributes'] = element.attrib
    
    # Add text content
    if element.text and element.text.strip():
        result['#text'] = element.text.strip()
    
    # Add children
    children = {}
    for child in element:
        child_data = xml_to_dict(child)
        
        if child.tag in children:
            # Multiple children with same tag -> list
            if not isinstance(children[child.tag], list):
                children[child.tag] = [children[child.tag]]
            children[child.tag].append(child_data)
        else:
            children[child.tag] = child_data
    
    result.update(children)
    
    return result

def dict_to_xml(tag, data):
    """Convert dictionary to XML element recursively."""
    element = ET.Element(tag)
    
    for key, value in data.items():
        if key == '@attributes':
            # Restore attributes
            for attr_name, attr_value in value.items():
                element.set(attr_name, str(attr_value))
        elif key == '#text':
            # Restore text content
            element.text = str(value)
        elif isinstance(value, list):
            # Create multiple child elements
            for item in value:
                child = dict_to_xml(key, item)
                element.append(child)
        elif isinstance(value, dict):
            # Create nested element
            child = dict_to_xml(key, value)
            element.append(child)
        else:
            # Create simple child element
            child = ET.SubElement(element, key)
            child.text = str(value)
    
    return element

# Convert XML to JSON
root = ET.parse('data.xml').getroot()
data_dict = {root.tag: xml_to_dict(root)}
json_output = json.dumps(data_dict, indent=2)
print(json_output)

# Convert JSON back to XML
data = json.loads(json_output)
root_tag = list(data.keys())[0]
xml_root = dict_to_xml(root_tag, data[root_tag])
tree = ET.ElementTree(xml_root)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

📊 Generating Reports from Database Queries

Exporting database query results to XML format enables data exchange with external systems, archival storage, or report generation. Building a generic database-to-XML converter that handles various query structures, data types, and formatting requirements creates a reusable component for multiple reporting scenarios.

import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

def query_to_xml(query_results, root_tag='data', row_tag='row'):
    """Convert database query results to XML."""
    root = ET.Element(root_tag)
    root.set('generated', datetime.now().isoformat())
    root.set('count', str(len(query_results)))
    
    for row in query_results:
        row_elem = ET.SubElement(root, row_tag)
        
        for column, value in row.items():
            col_elem = ET.SubElement(row_elem, column)
            
            # Handle different data types
            if value is None:
                col_elem.set('null', 'true')
            elif isinstance(value, datetime):
                col_elem.text = value.isoformat()
                col_elem.set('type', 'datetime')
            elif isinstance(value, Decimal):
                col_elem.text = str(value)
                col_elem.set('type', 'decimal')
            elif isinstance(value, bool):
                col_elem.text = str(value).lower()
                col_elem.set('type', 'boolean')
            elif isinstance(value, (int, float)):
                col_elem.text = str(value)
                col_elem.set('type', 'number')
            else:
                col_elem.text = str(value)
    
    return root

# Example: Export sales data
sales_data = [
    {
        'order_id': 1001,
        'customer': 'John Smith',
        'amount': Decimal('299.99'),
        'order_date': datetime(2024, 1, 15),
        'shipped': True
    },
    {
        'order_id': 1002,
        'customer': 'Jane Doe',
        'amount': Decimal('149.50'),
        'order_date': datetime(2024, 1, 16),
        'shipped': False
    }
]

root = query_to_xml(sales_data, 'sales_report', 'order')
tree = ET.ElementTree(root)
tree.write('sales_report.xml', encoding='utf-8', xml_declaration=True)

🌐 Consuming SOAP Web Services

SOAP web services communicate using XML messages, requiring parsing of SOAP envelopes, extracting response data, and handling faults. Building a lightweight SOAP client demonstrates practical XML processing in API integration scenarios, combining HTTP requests with XML parsing and generation.

import xml.etree.ElementTree as ET
import requests

class SOAPClient:
    """Simple SOAP web service client."""
    
    def __init__(self, endpoint_url):
        self.endpoint_url = endpoint_url
        self.namespaces = {
            'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
            'ns': 'http://example.com/service'
        }
    
    def create_envelope(self, method, params):
        """Create SOAP envelope."""
        envelope = ET.Element('{http://schemas.xmlsoap.org/soap/envelope/}Envelope')
        body = ET.SubElement(envelope, '{http://schemas.xmlsoap.org/soap/envelope/}Body')
        
        method_elem = ET.SubElement(body, f'{{http://example.com/service}}{method}')
        
        for key, value in params.items():
            param = ET.SubElement(method_elem, key)
            param.text = str(value)
        
        return envelope
    
    def call(self, method, params):
        """Call SOAP method."""
        envelope = self.create_envelope(method, params)
        
        headers = {
            'Content-Type': 'text/xml; charset=utf-8',
            'SOAPAction': f'http://example.com/service/{method}'
        }
        
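        response = requests.post(
            self.endpoint_url,
            data=ET.tostring(envelope, encoding='utf-8'),
            headers=headers,
            timeout=30
        )
        
        # SOAP 1.1 faults normally arrive with HTTP 500, so parse those
        # bodies too and let parse_response raise the fault message
        if response.status_code in (200, 500):
            return self.parse_response(response.content)
        else:
            raise Exception(f"SOAP call failed: {response.status_code}")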
    
    def parse_response(self, xml_content):
        """Parse SOAP response."""
        root = ET.fromstring(xml_content)
        
        # Check for SOAP fault
        fault = root.find('.//soap:Fault', self.namespaces)
        if fault is not None:
            fault_string = fault.findtext('faultstring', default='Unknown fault')
            raise Exception(f"SOAP Fault: {fault_string}")
        
        # Extract response data
        body = root.find('.//soap:Body', self.namespaces)
        return body

# Use SOAP client
client = SOAPClient('http://example.com/service')
result = client.call('GetBookInfo', {'isbn': '978-0-123456-78-9'})
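
Fault handling can be exercised without a live endpoint by parsing a canned response. The sketch below assumes a SOAP 1.1 fault, where faultstring is an unqualified child of soap:Fault; the sample message and the extract_fault helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}

FAULT_RESPONSE = b"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Unknown ISBN</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""

def extract_fault(xml_content):
    """Return the fault message, or None if the response holds no fault."""
    root = ET.fromstring(xml_content)
    fault = root.find("soap:Body/soap:Fault", SOAP_NS)
    if fault is None:
        return None
    return fault.findtext("faultstring", default="Unknown fault")

message = extract_fault(FAULT_RESPONSE)
print(message)
```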

🔍 Log File Analysis

Analyzing XML-formatted log files requires efficient streaming processing to handle large files, extraction of specific events, aggregation of metrics, and identification of patterns or anomalies. This scenario combines iterparse for memory efficiency with data extraction and analysis logic.

import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

def analyze_xml_logs(filename):
    """Analyze XML log file and generate statistics."""
    stats = {
        'total_events': 0,
        'events_by_level': defaultdict(int),
        'events_by_source': defaultdict(int),
        'errors': [],
        'time_range': {'start': None, 'end': None}
    }
    
    for event, elem in ET.iterparse(filename, events=('end',)):
        if elem.tag == 'event':
            stats['total_events'] += 1
            
            # Count by level
            level = elem.get('level', 'INFO')
            stats['events_by_level'][level] += 1
            
            # Count by source
            source = elem.find('source')
            if source is not None:
                stats['events_by_source'][source.text] += 1
            
            # Collect errors
            if level == 'ERROR':
                message = elem.find('message')
                timestamp = elem.find('timestamp')
                stats['errors'].append({
                    'timestamp': timestamp.text if timestamp is not None else 'Unknown',
                    'message': message.text if message is not None else 'No message'
                })
            
            # Track time range
            timestamp = elem.find('timestamp')
            if timestamp is not None:
                try:
                    ts = datetime.fromisoformat(timestamp.text)
                    if stats['time_range']['start'] is None or ts < stats['time_range']['start']:
                        stats['time_range']['start'] = ts
                    if stats['time_range']['end'] is None or ts > stats['time_range']['end']:
                        stats['time_range']['end'] = ts
                except (TypeError, ValueError):
                    # Skip empty or non-ISO timestamps
                    pass
            
            # Clear element to free memory
            elem.clear()
    
    return stats

# Analyze logs
stats = analyze_xml_logs('application.log.xml')
print(f"Total events: {stats['total_events']}")
print(f"Error count: {stats['events_by_level']['ERROR']}")
print(f"Time range: {stats['time_range']['start']} to {stats['time_range']['end']}")

"Real-world XML processing is rarely about parsing alone—it's about transforming, validating, integrating, and extracting value from structured data in production environments."

Alternative Libraries and When to Use Them

While ElementTree provides an excellent balance of simplicity, performance, and functionality for most XML processing tasks, certain scenarios benefit from alternative libraries with specialized capabilities. Understanding the strengths and limitations of each option enables informed technology choices based on specific requirements rather than familiarity or convention. The Python XML ecosystem offers several mature libraries, each optimized for different use cases and performance characteristics.

lxml: Power and Performance

The lxml library wraps the libxml2 and libxslt C libraries and is typically faster than the standard library for parsing and querying. It adds full XPath 1.0, XSLT 1.0, XML Schema validation, and RelaxNG support. For applications requiring maximum performance, advanced querying capabilities, or strict schema validation, lxml is usually the better choice despite its external dependency and steeper learning curve. Its etree module is largely API-compatible with ElementTree, making migration straightforward when additional capabilities become necessary.

# lxml offers enhanced capabilities
from lxml import etree

# Advanced XPath queries
root = etree.parse('data.xml').getroot()
expensive_books = root.xpath('//book[price > 30]/title/text()')

# XML Schema validation
schema = etree.XMLSchema(file='schema.xsd')
valid = schema.validate(root)

# XSLT transformation
xslt = etree.parse('transform.xsl')
transform = etree.XSLT(xslt)
result = transform(root)

# Pretty printing built-in
print(etree.tostring(root, pretty_print=True, encoding='unicode'))

minidom: DOM API Compatibility

The xml.dom.minidom module implements the W3C Document Object Model API, providing compatibility with DOM-based code from other languages and platforms. While slower and more memory-intensive than ElementTree, minidom's familiar API benefits developers with DOM experience or applications requiring DOM-specific features like node types, document fragments, or attribute nodes as first-class objects.
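
For comparison, here is a short sketch of the DOM style using only the standard library. The sample document is invented for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<catalog><book id="bk101"><title>XML Guide</title></book></catalog>'
)

titles = []
for book in doc.getElementsByTagName("book"):
    title_node = book.getElementsByTagName("title")[0]
    # Text lives in a separate text node under the element
    titles.append((book.getAttribute("id"), title_node.firstChild.data))

print(titles)
```

Note how text is reached through firstChild.data rather than a .text attribute, which is the main day-to-day difference from ElementTree.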

SAX: Event-Driven Processing

Simple API for XML (SAX) provides event-driven parsing through callback handlers, offering the lowest memory footprint for sequential XML processing. SAX suits scenarios where you process documents linearly without needing random access or document modification, particularly when memory constraints prohibit tree-based parsing or when extracting small amounts of data from very large files.
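
A minimal SAX sketch that collects titles might look like the following. TitleHandler is a hypothetical handler written for this example; the text is buffered because characters() may be called several times for a single element.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(
    b"<catalog><book><title>XML Guide</title></book>"
    b"<book><title>Python Handbook</title></book></catalog>",
    handler,
)
print(handler.titles)
```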

✨ Use ElementTree for standard XML processing with good performance and no external dependencies
⚡ Use lxml when you need maximum performance, advanced XPath, XSLT, or schema validation
🔄 Use minidom for DOM API compatibility or when migrating DOM-based code from other languages
💾 Use SAX for extremely large files where memory constraints prohibit tree-based parsing
🛡️ Use defusedxml when parsing untrusted XML to prevent security vulnerabilities

What is the difference between parse() and fromstring() in ElementTree?

The parse() function reads XML from a file path or file-like object and returns an ElementTree object representing the complete document, while fromstring() parses XML from a string and returns an Element object representing just the root element. Use parse() when working with files and need the full tree structure for writing back to disk, and fromstring() when working with XML strings in memory from APIs or other sources.
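
A quick illustration of the two return types, using an in-memory file object in place of a file on disk:

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<catalog><book id='bk101'/></catalog>"

# parse() takes a path or file object and returns an ElementTree
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# fromstring() takes a string and returns the root Element directly
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__, root_from_file.tag, root_from_string.tag)
```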

How do I handle XML namespaces in ElementTree?

ElementTree requires namespace-aware element access using either Clark notation ({namespace}localname) or a namespace dictionary passed to find/findall methods. Create a namespace dictionary mapping prefixes to URIs, then use it in queries: namespaces = {'ns': 'http://example.com'}; element.find('ns:tag', namespaces). For cleaner output when creating XML, register namespaces using ET.register_namespace() before building the tree.
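
Both lookup styles side by side, on a made-up document. The prefixes in the dictionary are local to your code and do not have to match the prefixes used in the file.

```python
import xml.etree.ElementTree as ET

xml_text = """<catalog xmlns="http://example.com/catalog"
         xmlns:bk="http://example.com/book">
  <bk:book><bk:title>XML Guide</bk:title></bk:book>
</catalog>"""

root = ET.fromstring(xml_text)
ns = {"b": "http://example.com/book"}

# Prefix form, resolved through the namespace dictionary
title_via_prefix = root.findtext("b:book/b:title", namespaces=ns)

# Clark notation, with the URI spelled out in braces
book_uri = "{http://example.com/book}"
title_via_clark = root.findtext(f"{book_uri}book/{book_uri}title")

print(root.tag, title_via_prefix, title_via_clark)
```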

Can ElementTree handle very large XML files?

Standard ElementTree parsing loads entire documents into memory, which becomes problematic for multi-gigabyte files. Use the iterparse() function for streaming processing of large files, which builds the tree incrementally as the file is read. Memory stays roughly constant only if you release processed elements, so call elem.clear() after extraction (and, for very long documents, remove finished children from the root) to process arbitrarily large files without memory exhaustion.

How do I pretty-print XML output from ElementTree?

ElementTree's write() method doesn't include built-in pretty-printing. For formatted output, use xml.dom.minidom.parseString() to reparse the XML and call toprettyxml(), or switch to lxml which includes pretty_print=True in its tostring() function. Alternatively, use the indent() function added in Python 3.9 which modifies the tree in-place to add whitespace for indentation.
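
A short sketch of the indent() route, assuming Python 3.9 or newer:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<catalog><book><title>XML Guide</title></book></catalog>")

# indent() rewrites text/tail whitespace in place and returns None
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because indent() modifies the tree itself, apply it just before serializing if the whitespace should not leak into later processing.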

Is ElementTree safe for parsing untrusted XML?

ElementTree does not retrieve external entities, which gives basic protection against XXE attacks, but entity-expansion attacks such as billion laughs can still exhaust memory or CPU, depending on the version of the underlying Expat library. For parsing untrusted XML, use the defusedxml library, which wraps ElementTree with additional safeguards, and limit input size and complexity before parsing to prevent resource exhaustion attacks.

How do I modify XML and write it back to a file?

Parse the XML using parse() to get an ElementTree object, modify elements using standard methods (element.text = 'new value', element.set('attr', 'value'), element.append(child)), then call tree.write('output.xml', encoding='utf-8', xml_declaration=True) to save changes. Remember that modifications happen in-memory until explicitly written, allowing you to make multiple changes before persisting to disk.
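
The full read-modify-write cycle, sketched with in-memory buffers standing in for files on disk:

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<catalog><book id='bk101'><price>44.95</price></book></catalog>")
tree = ET.parse(source)
root = tree.getroot()

# All changes below happen in memory only
book = root.find("book")
book.find("price").text = "39.95"
book.set("sale", "true")
ET.SubElement(book, "note").text = "Discounted"

# Nothing is persisted until write() is called
output = io.BytesIO()
tree.write(output, encoding="utf-8", xml_declaration=True)
saved = output.getvalue()
print(saved.decode("utf-8"))
```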