Parse XML documents with Python

What is XML?

XML stands for eXtensible Markup Language.

XML is a markup language much like HTML.

XML was designed to store and transport data, typically in small to medium amounts.

About XML

XML works much like HTML: it uses start tags and end tags.

Tags indicate the beginning and ending of elements.

Attributes are key-value pairs on the start tag.

In XML, extra whitespace between elements generally doesn’t matter.

    <name>Product name</name>
    <brand>Calvin Klein</brand>
    <color>Black Velvet</color>
    <size available="L, XL, XXL"/>
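As a minimal sketch of these points, the snippet below parses the same elements twice with Python's standard xml.etree.ElementTree module, once written compactly and once with extra spaces and line breaks. The <product> wrapper element is an assumption added here so that the fragment has a single root, which a well-formed XML document requires.

```python
import xml.etree.ElementTree as ET

# <product> is a hypothetical root element, added so the fragment is well-formed.
compact = '<product><name>Product name</name><size available="L, XL, XXL"/></product>'
spaced = '''<product>
    <name>Product name</name>

        <size    available="L, XL, XXL"   />
</product>'''

compact_root = ET.fromstring(compact)
spaced_root = ET.fromstring(spaced)

# Both versions yield the same child elements in the same order.
compact_tags = [child.tag for child in compact_root]
spaced_tags = [child.tag for child in spaced_root]
print(compact_tags == spaced_tags)

# Text sits between the start tag and the end tag; attributes live on the start tag.
name_text = spaced_root.find('name').text
size_attributes = spaced_root.find('size').attrib
print(name_text)
print(size_attributes)
```

Whitespace inside text content, such as the space in "Product name", is part of the data and is preserved.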

The Difference Between XML and HTML

XML was designed to carry data, with a focus on what the data is.

HTML was designed to display data, with a focus on how the data looks.

XML has no predefined tags, while HTML tags are predefined.

What is Serialization / Deserialization?

Serialization is the process of converting data into a common format that can be stored or transmitted between systems in a platform-independent manner. Deserialization is the reverse: rebuilding the data from that format.
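A round trip can be sketched in a few lines of Python. The serialize and deserialize helpers and the <product> root element are hypothetical names chosen for this illustration; the sketch assumes a flat dictionary whose keys are valid XML tag names and whose values are strings.

```python
import xml.etree.ElementTree as ET

def serialize(product):
    """Turn a flat dict of strings into an XML string (hypothetical helper)."""
    root = ET.Element('product')
    for key, value in product.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding='unicode')

def deserialize(xml_text):
    """Rebuild the dict from the XML string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

original = {'name': 'Product name', 'brand': 'Calvin Klein'}
xml_text = serialize(original)      # this string can be stored or sent to another system
restored = deserialize(xml_text)    # the receiving side rebuilds the data

print(xml_text)
print(restored == original)
```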

What is an XML schema?

An XML schema is used to define the structure of an XML document, describing what a given XML document can contain.

XML validation is the process of verifying that a document conforms to its schema, that is, that the data is in the right format.

An XML schema is often used to specify a contract between systems. The contract itself is an XML document.

XSD Restrictions

Restrictions on Values

For an integer value

The age element has to be an integer:

<xs:element name="age">
    <xs:simpleType><xs:restriction base="xs:integer"/></xs:simpleType>
</xs:element>
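To make the restriction concrete, here is a rough Python sketch of the kind of check a validator performs for xs:integer: an optional sign followed by decimal digits. The helper name is_xs_integer is hypothetical, and this is a hand-written approximation, not a schema validator.

```python
import re
import xml.etree.ElementTree as ET

# Approximation of the xs:integer lexical form: optional sign, then digits.
INTEGER_PATTERN = re.compile(r'[+-]?[0-9]+')

def is_xs_integer(text):
    """Return True if text looks like an xs:integer (hypothetical helper)."""
    if text is None:
        return False
    return INTEGER_PATTERN.fullmatch(text.strip()) is not None

valid_age = ET.fromstring('<age>42</age>')
invalid_age = ET.fromstring('<age>forty-two</age>')

print(is_xs_integer(valid_age.text))
print(is_xs_integer(invalid_age.text))
```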

Restrictions on a Set of Values

To limit the content of an XML element to a set of acceptable values, use the enumeration constraint.
The example below defines an element called “color” with a restriction. The only acceptable values are: red, blue, white:

<xs:element name="color">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="red"/>
      <xs:enumeration value="blue"/>
      <xs:enumeration value="white"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
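Python's standard xml.etree.ElementTree module does not validate documents against an XSD, so the sketch below only illustrates the idea: it reads the enumeration values out of a complete schema and checks a candidate value by hand. The surrounding xs:schema wrapper and the helper name is_allowed_color are assumptions made for this example; real validation would typically use a third-party library.

```python
import xml.etree.ElementTree as ET

# Namespace map so the xs: prefix can be used in search paths.
XS = {'xs': 'http://www.w3.org/2001/XMLSchema'}

schema_text = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="color">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="red"/>
        <xs:enumeration value="blue"/>
        <xs:enumeration value="white"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>'''

schema = ET.fromstring(schema_text)

# Locate the declaration of "color", then collect its enumeration values.
color_declaration = schema.find("xs:element[@name='color']", XS)
allowed_colors = [
    enum.get('value')
    for enum in color_declaration.findall('.//xs:enumeration', XS)
]

def is_allowed_color(value):
    """Hand-rolled membership check (hypothetical helper, not real XSD validation)."""
    return value in allowed_colors

print(allowed_colors)
print(is_allowed_color('blue'))
print(is_allowed_color('green'))
```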

If you were building an XML Schema and wanted to limit the values allowed in an xs:string field to only those in a particular list, what XML tag would you use in your XML Schema definition?

Answer: the xs:enumeration tag, placed inside an xs:restriction.

Python lets you parse and modify XML documents. With xml.etree.ElementTree, fromstring() and parse() load the entire document into memory as a tree.
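Modifying a parsed document might look like the following sketch. The element names reuse the product fragment from earlier, and the <product> root is an assumption; the sketch changes some text, updates an attribute, adds a child, and serializes the result.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<product><color>Black Velvet</color><size available="L, XL"/></product>'
)

# Change element text, update an attribute, and append a new child element.
root.find('color').text = 'Red'
root.find('size').set('available', 'S, M')
ET.SubElement(root, 'brand').text = 'Calvin Klein'

# Serialize the modified tree back to a string.
updated = ET.tostring(root, encoding='unicode')
print(updated)

# Parsing the result again confirms that the changes were written out.
reparsed = ET.fromstring(updated)
print(reparsed.find('color').text)
```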

How to use ElementTree to print nicely formatted XML documents?

The fromstring() method parses XML from a string directly into an Element, which is the root element of the parsed tree.

The findall() method finds all matching subelements by tag name or path. A bare tag name matches only direct children of the element it is called on.
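The sketch below puts both methods to work and then prints the tree with indentation. The <catalog> document is made up for this illustration, and ET.indent() is assumed to be available, which requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# A made-up, single-line document used only for this illustration.
root = ET.fromstring(
    '<catalog>'
    '<product><size available="L"/></product>'
    '<product><size available="S"/></product>'
    '</catalog>'
)

direct = root.findall('size')            # bare tag name: direct children only
by_path = root.findall('product/size')   # path: <size> children of each <product>
nested = root.findall('.//size')         # .// searches all levels below root
print(len(direct), len(by_path), len(nested))

# ET.indent() rewrites the whitespace in place so the output is nicely formatted.
ET.indent(root, space='  ')
pretty = ET.tostring(root, encoding='unicode')
print(pretty)
```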

Parse XML with xml.etree.ElementTree Example

import xml.etree.ElementTree as et

data = '''<products>
  <product>
    <name>Women's Sleeveless Round Neck Fit and Flare Dress</name>
    <brand>Calvin Klein</brand>
    <color>Black Velvet</color>
    <size available="L, XL, XXL"/>
  </product>
  <product>
    <name>Womens Summer Dresses Soft Loose</name>
    <color>Red</color>
    <size available="S, M, L, XL"/>
  </product>
</products>'''

# products_tree is an Element: the root of the parsed tree
products_tree = et.fromstring(data)

# Find all <product> children of the root; findall() returns a list of Elements
products = products_tree.findall('product')
for product in products:
    # Get the name
    print('Name:', product.find('name').text)
    # Get the color
    print('Color:', product.find('color').text)
    # Get available sizes from the "available" attribute
    print('Sizes:', product.find('size').get('available'))
    print()

Output:

Name: Women's Sleeveless Round Neck Fit and Flare Dress
Color: Black Velvet
Sizes: L, XL, XXL

Name: Womens Summer Dresses Soft Loose
Color: Red
Sizes: S, M, L, XL
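One thing to watch for in a loop like this: find() returns None when a child element is missing, so calling .text on the result raises AttributeError. The second product lists no brand, for example. A possible way around this, sketched below with a made-up placeholder string, is findtext() with a default value.

```python
import xml.etree.ElementTree as ET

data = '''<products>
  <product>
    <name>Women's Sleeveless Round Neck Fit and Flare Dress</name>
    <brand>Calvin Klein</brand>
  </product>
  <product>
    <name>Womens Summer Dresses Soft Loose</name>
  </product>
</products>'''

root = ET.fromstring(data)

brands = []
for product in root.findall('product'):
    # findtext() returns the default instead of failing when <brand> is absent.
    brand = product.findtext('brand', default='(no brand listed)')
    brands.append(brand)
    print(product.findtext('name'), '-', brand)
```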

