1 Description
This post describes several methods for accessing the fields in Xmerl records:
- Access through the Elixir Record module.
- Access using functions whose source code is generated from the Xmerl records.
- Access using functions that are created from the Xmerl record definitions using Elixir metaprogramming.
- Access by converting Xmerl record instances to Elixir Structs.
The various Xmerl record types (xmlElement, xmlAttribute, etc.) are available to Elixir as very regular data structures: tagged tuples. Once you are a little familiar with those record definitions, accessing individual fields through any of the methods listed above and explained below is quite easy. This post attempts to help you get started.
Regardless of the strategy that you choose, you will want to familiarize yourself with the Xmerl record definitions. You can find them in your Erlang source code distribution, in this file: otp_src_22.2/lib/xmerl/include/xmerl.hrl.
And, in all these strategies, you will need to convert the XML instance document that you want to process into Xmerl tuples. I suggest that you use SweetXml for that.
You can learn about SweetXml here: https://hexdocs.pm/sweet_xml/SweetXml.html#content.
And you can add it to your mix project by adding the following in the deps section of your mix.exs:
    defp deps do
      [
        {:sweet_xml, ">0.0.0"},
      ]
    end
There is a tutorial here: https://www.youtube.com/watch?v=T-3SPxK1RBo
Here is an example of converting an XML document:
    $ cd my_mix_project
    $ iex -S mix
    Erlang/OTP 22 [erts-10.6] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [hipe]

    Interactive Elixir (1.10.0) - press Ctrl+C to exit (type h() ENTER for help)
    iex> rec = File.stream!("path/to/my/xml/doc.xml") |> SweetXml.parse
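To see the shape of those tuples without adding any dependency, here is a minimal sketch that calls Erlang's `:xmerl_scan.string/1` directly (Xmerl ships with Erlang/OTP). The sample XML string is made up for illustration; in a real project you would more likely parse a file with SweetXml as shown above.

```elixir
# A made-up sample document (an assumption, not taken from this post).
xml = ~c(<person id="7">Alice</person>)

# :xmerl_scan.string/1 takes a charlist and returns {root_element, rest}.
{root, _rest} = :xmerl_scan.string(xml)

# Every Xmerl record is a plain tuple whose first element is the record tag
# and whose remaining elements are the fields, in the order given in xmerl.hrl.
IO.inspect(elem(root, 0), label: "record tag")
IO.inspect(elem(root, 1), label: "element name")
IO.inspect(tuple_size(root), label: "tuple size")
```

Inspecting `root` itself in iex prints the whole nested tuple, which is a handy way to compare the data against the field list in xmerl.hrl.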
2 Access through the Elixir Record module
Given that you have this module in your project:
    defmodule XmerlRecs do
      @moduledoc """
      Define Xmerl records using record definitions extracted from Erlang Xmerl.
      """

      require Record

      Record.defrecord(:xmlElement, Record.extract(:xmlElement, from_lib: "xmerl/include/xmerl.hrl"))
      Record.defrecord(:xmlText, Record.extract(:xmlText, from_lib: "xmerl/include/xmerl.hrl"))
      Record.defrecord(:xmlAttribute, Record.extract(:xmlAttribute, from_lib: "xmerl/include/xmerl.hrl"))
      Record.defrecord(:xmlNamespace, Record.extract(:xmlNamespace, from_lib: "xmerl/include/xmerl.hrl"))
      Record.defrecord(:xmlComment, Record.extract(:xmlComment, from_lib: "xmerl/include/xmerl.hrl"))
    end
You can write something like the following:
    defmodule Test do
      # The XmerlRecs accessors are macros, so the module must be required.
      require XmerlRecs

      def demo1 do
        element = File.stream!("path/to/my/doc.xml") |> SweetXml.parse
        name = XmerlRecs.xmlElement(element, :name)
        IO.puts("element name: #{name}")

        XmerlRecs.xmlElement(element, :attributes)
        |> Enum.each(fn attr ->
          attrname = XmerlRecs.xmlAttribute(attr, :name)
          attrvalue = XmerlRecs.xmlAttribute(attr, :value)
          IO.puts("    attribute -- name: #{attrname}  value: #{attrvalue}")
        end)

        XmerlRecs.xmlElement(element, :content)
        |> Enum.each(fn item ->
          # The first element of each record tuple is its record tag.
          case elem(item, 0) do
            :xmlText ->
              IO.puts("    text -- value: #{XmerlRecs.xmlText(item, :value)}")
            _ -> nil
          end
        end)
      end
    end
Notes:
We use SweetXml to parse an XML document. That returns nested tuples that represent an element. The element has a name, a list of attributes, and a list of (sub-)contents.
We use the following to access the fields in that element (tuple):
    XmerlRecs.xmlElement(element, :name)
    XmerlRecs.xmlElement(element, :attributes)
    XmerlRecs.xmlElement(element, :content)
While handling the content, we match the first element of each content tuple against the atom :xmlText to determine whether that item is a text item.
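The same record macros can also be used in patterns, which avoids the explicit `elem(item, 0)` test. The following is a minimal sketch under a few assumptions: the module names `RecsSketch` and `DescribeSketch` and the sample XML are hypothetical, and the document is parsed with `:xmerl_scan.string/1` so that the example runs without SweetXml.

```elixir
defmodule RecsSketch do
  require Record

  # Same extraction technique as XmerlRecs, limited to two record types.
  Record.defrecord(:xmlElement, Record.extract(:xmlElement, from_lib: "xmerl/include/xmerl.hrl"))
  Record.defrecord(:xmlText, Record.extract(:xmlText, from_lib: "xmerl/include/xmerl.hrl"))
end

defmodule DescribeSketch do
  require Record
  import RecsSketch

  # In a function head, a record macro expands to a tuple pattern,
  # so each clause only matches tuples carrying that record tag.
  def describe(xmlElement(name: name, attributes: attrs)), do: {:element, name, length(attrs)}
  def describe(xmlText(value: value)), do: {:text, to_string(value)}
  # Fallback for any other record-shaped tuple (comments, etc.).
  def describe(item) when Record.is_record(item), do: {:other, elem(item, 0)}

  # Describe each child of an element.
  def children(xmlElement(content: content)), do: Enum.map(content, &describe/1)
end

# Made-up sample document.
{root, _rest} = :xmerl_scan.string(~c(<note lang="en">hello</note>))
IO.inspect(DescribeSketch.describe(root))
IO.inspect(DescribeSketch.children(root))
```

Pattern matching in function heads keeps the dispatch on record type in one place, at the cost of importing (or requiring) the module that defines the record macros.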
3 Generating source code for access functions
With quite a small amount of code, we can use the Xmerl record definitions to generate Elixir source code containing functions that access each field in the Xmerl records. For example:
    defmodule Xml.Element do
      require XmerlRecs
      def get_name(item), do: XmerlRecs.xmlElement(item, :name)
      def get_expanded_name(item), do: XmerlRecs.xmlElement(item, :expanded_name)
      def get_nsinfo(item), do: XmerlRecs.xmlElement(item, :nsinfo)
      def get_namespace(item), do: XmerlRecs.xmlElement(item, :namespace)
      def get_parents(item), do: XmerlRecs.xmlElement(item, :parents)
      def get_pos(item), do: XmerlRecs.xmlElement(item, :pos)
      def get_attributes(item), do: XmerlRecs.xmlElement(item, :attributes)
      def get_content(item), do: XmerlRecs.xmlElement(item, :content)
      def get_language(item), do: XmerlRecs.xmlElement(item, :language)
      def get_xmlbase(item), do: XmerlRecs.xmlElement(item, :xmlbase)
      def get_elementdef(item), do: XmerlRecs.xmlElement(item, :elementdef)
    end

    defmodule Xml.Attribute do
      require XmerlRecs
      def get_name(item), do: XmerlRecs.xmlAttribute(item, :name)
      def get_expanded_name(item), do: XmerlRecs.xmlAttribute(item, :expanded_name)
      def get_nsinfo(item), do: XmerlRecs.xmlAttribute(item, :nsinfo)
      def get_namespace(item), do: XmerlRecs.xmlAttribute(item, :namespace)
      def get_parents(item), do: XmerlRecs.xmlAttribute(item, :parents)
      def get_pos(item), do: XmerlRecs.xmlAttribute(item, :pos)
      def get_language(item), do: XmerlRecs.xmlAttribute(item, :language)
      def get_value(item), do: XmerlRecs.xmlAttribute(item, :value)
      def get_normalized(item), do: XmerlRecs.xmlAttribute(item, :normalized)
    end
As you can see from the above code, these functions are simple wrappers around the access technique that we saw in the previous section.
You can find this code in the XmlElixirStructs repo at: https://github.com/dkuhlman/xmlelixirstructs.git. It's in lib/generate.ex, and is short enough that I'll repeat it here:
    defmodule GenerateFuncs do
      @moduledoc """
      This module can be used to generate an Elixir (.ex) file containing
      Xmerl accessor functions.
      """

      # A writer is any one-argument function that emits a line of source code.
      @type writer :: (String.t() -> any)
      @type_names [:attribute, :comment, :element, :namespace, :text]

      @doc """
      Write out source code for accessor functions for Xmerl records.

      **Caution:** This function is destructive.  It will over-write an
      existing file without warning.

      ## Examples

          iex> GenerateFuncs.generate("path/to/output/file.ex")

      """
      @spec generate(Path.t() | writer) :: :ok
      def generate(path) when is_binary(path) do
        {:ok, dev} = File.open(path, [:write])
        wrt = fn val -> IO.write(dev, val <> "\n") end
        generate(wrt)
        # Close the device so that the output is flushed to disk.
        File.close(dev)
      end

      def generate(wrt) when is_function(wrt, 1) do
        @type_names
        |> Enum.each(fn item ->
          name = to_string(item)
          cap_name = String.capitalize(name)
          wrt.("defmodule Xml.#{cap_name} do")
          wrt.("  require XmerlRecs")
          # Record.extract/2 returns a keyword list of {field, default} pairs.
          Record.extract(
            String.to_atom("xml#{cap_name}"),
            from_lib: "xmerl/include/xmerl.hrl")
          |> Enum.each(fn {field, _} ->
            wrt.("  def get_#{field}(item), do: XmerlRecs.xml#{cap_name}(item, :#{field})")
          end)
          wrt.("end\n")
        end)
      end
    end
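The generator depends on what `Record.extract/2` returns: a keyword list of `{field, default}` pairs. The following minimal sketch builds the accessor lines for a single record type in memory instead of writing a file, so you can inspect the result in iex. The variable names are illustrative only.

```elixir
# Extract the field definitions for one record type from xmerl.hrl.
fields = Record.extract(:xmlAttribute, from_lib: "xmerl/include/xmerl.hrl")
field_names = Keyword.keys(fields)
IO.inspect(field_names, label: "xmlAttribute fields")

# Build the same kind of accessor lines that GenerateFuncs writes out.
lines =
  for field <- field_names do
    "  def get_#{field}(item), do: XmerlRecs.xmlAttribute(item, :#{field})"
  end

source =
  (["defmodule Xml.Attribute do", "  require XmerlRecs"] ++ lines ++ ["end"])
  |> Enum.join("\n")

IO.puts(source)

# Parsing the text checks that it is at least syntactically valid Elixir.
{:ok, _ast} = Code.string_to_quoted(source)
```

Checking the generated text with `Code.string_to_quoted/1` is a cheap sanity test before the file is written and compiled.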
And, here is our demo, rewritten to use those generated access functions:
    defmodule Test do
      def demo2 do
        element = File.stream!("Data/test02.xml") |> SweetXml.parse
        name = Xml.Element.get_name(element)
        IO.puts("element name: #{name}")

        Xml.Element.get_attributes(element)
        |> Enum.each(fn attr ->
          # Use the generated accessors here as well, instead of the record macros.
          attrname = Xml.Attribute.get_name(attr)
          attrvalue = Xml.Attribute.get_value(attr)
          IO.puts("    attribute -- name: #{attrname}  value: #{attrvalue}")
        end)

        Xml.Element.get_content(element)
        |> Enum.each(fn item ->
          case elem(item, 0) do
            :xmlText ->
              IO.puts("    text -- value: #{Xml.Text.get_value(item)}")
            _ -> nil
          end
        end)
      end
    end
4 Using Elixir metaprogramming to generate access functions
Now, let's try to use Elixir metaprogramming to produce functions similar to those described in the previous section.
Here is some code. You can find this in lib/xmlmetaprogramming.ex in the XmlElixirStructs repository at https://github.com/dkuhlman/xmlelixirstructs.git:
    defmodule XmerlAccess do
      @moduledoc """
      Use Elixir meta-programming to generate test and accessor functions.

      For each Xmerl record type generate the following:

      - A test function, e.g. `is_element/1`, `is_attribute/1`, etc.
      - A set of accessor functions, one for each field, e.g.
        `get_element_name/1`, `get_element_attributes/1`, ...,
        `get_attribute_name/1`, etc.
      """

      require XmerlRecs

      @record_types ["element", "attribute", "text", "namespace", "comment"]

      @record_types
      |> Enum.each(fn record_type_str ->
        record_type_string = "xml#{String.capitalize(record_type_str)}"
        record_type_atom = String.to_atom(record_type_string)
        is_method_name_str = "is_#{record_type_str}"                             #1
        is_method_name_atom = String.to_atom(is_method_name_str)
        #2
        is_method_body_str = """
        if is_tuple(item) and tuple_size(item) > 0 do
          case elem(item, 0) do
            :#{record_type_string} -> true
            _ -> false
          end
        else
          false
        end
        """
        {:ok, is_method_body_ast} = Code.string_to_quoted(is_method_body_str)    #3
        def unquote(is_method_name_atom)(item) do                                #4
          unquote(is_method_body_ast)
        end
        Record.extract(record_type_atom, from_lib: "xmerl/include/xmerl.hrl")
        |> Enum.each(fn {field_name_atom, _} ->                                  #5
          method_name_str = "get_#{record_type_str}_#{to_string(field_name_atom)}"
          method_name_atom = String.to_atom(method_name_str)
          method_body_str = "XmerlRecs.#{to_string(record_type_atom)}(item, :#{to_string(field_name_atom)})"
          {:ok, method_body_ast} = Code.string_to_quoted(method_body_str)
          def unquote(method_name_atom)(item) do
            unquote(method_body_ast)
          end
        end)
      end)
    end
Notes -- These notes correspond to the comment numbers in the above code:
- (#1) For each XML record type (e.g. for each xmlElement, xmlAttribute, xmlText, etc) we create a string to represent the name of the function that we want to define. Then, we convert that string to an atom.
- (#2) We create a string containing the body of the function that we want to define.
- (#3) We convert the string that represents the code in the body of the function into an AST (abstract syntax tree) by calling Code.string_to_quoted/1.
- (#4) We define the function. Note that because def is a macro and because macros return an AST, the def will result in inserting the AST into our module. That defines a function.
- (#5) We do something similar to the above for each field in that XML type (Xmerl record), which defines our "getter" functions (e.g. get_element_name/1, get_element_attributes/1, get_attribute_name/1, etc.).
Given that you have included the above in your Elixir mix project, you can write code like this:
    defmodule Test do
      def test1() do
        rec = File.stream!("path/to/my/doc.xml") |> SweetXml.parse
        if XmerlAccess.is_element(rec) do
          IO.puts("element -- name: #{XmerlAccess.get_element_name(rec)}")
          XmerlAccess.get_element_attributes(rec)
          |> Enum.each(fn attr ->
            attrname = XmerlAccess.get_attribute_name(attr)
            attrvalue = XmerlAccess.get_attribute_value(attr)
            IO.puts("    attribute -- name: #{attrname}  value: #{attrvalue}")
          end)
        end
      end
    end
You can get some help by typing h XmerlAccess:
    iex> h XmerlAccess

                                      XmerlAccess

    Use Elixir meta-programming to generate test and accessor functions.

    For each Xmerl record type generate the following:

      • A test function, e.g. is_element/1, is_attribute/1, etc.
      • A set of accessor functions, one for each field, e.g.
        get_element_name/1, get_element_attributes/1, ...,
        get_attribute_name/1, etc.
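As a variation on the approach in this section, the function names and bodies can be built with unquote fragments instead of strings passed to `Code.string_to_quoted/1`. The following is a minimal sketch, not the code from the repository. It makes two assumptions: the module name `XmerlAccessSketch` is hypothetical, and field values are read by tuple position, which relies on the standard Erlang record layout `{tag, field1, field2, ...}` rather than on the `XmerlRecs` macros.

```elixir
defmodule XmerlAccessSketch do
  # Short name used in function names => Xmerl record tag.
  @record_types [element: :xmlElement, attribute: :xmlAttribute, text: :xmlText]

  for {short, tag} <- @record_types do
    fields = Record.extract(tag, from_lib: "xmerl/include/xmerl.hrl")

    # Test function, e.g. is_element/1.
    def unquote(:"is_#{short}")(item) do
      is_tuple(item) and tuple_size(item) > 0 and elem(item, 0) == unquote(tag)
    end

    # Accessor functions, e.g. get_element_name/1. Index 0 holds the record
    # tag, so the first field sits at index 1.
    for {{field, _default}, index} <- Enum.with_index(fields, 1) do
      def unquote(:"get_#{short}_#{field}")(item) when elem(item, 0) == unquote(tag) do
        elem(item, unquote(index))
      end
    end
  end
end

# Made-up sample document, parsed with xmerl directly.
{root, _rest} = :xmerl_scan.string(~c(<a x="1">hi</a>))
IO.inspect(XmerlAccessSketch.is_element(root), label: "is_element")
IO.inspect(XmerlAccessSketch.get_element_name(root), label: "name")
```

The guard on each accessor makes a call with the wrong record type fail with a FunctionClauseError instead of silently returning an unrelated field.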
5 Access by converting Xmerl record instances to Elixir Structs
This strategy is described in a previous post, which is here: http://reifywork.com/xml-elixir-structs.html.
You can find the code here: https://github.com/dkuhlman/xmlelixirstructs
You can add this code to your Elixir Mix project by adding the following to the dependencies in your mix.exs file:
    defp deps do
      [
        # {:dep_from_hexpm, "~> 0.3.0"},
        # {:dep_from_git, git: "https://github.com/elixir-lang/my_dep.git", tag: "0.1.0"}
        {:xmlelixirstructs, github: "dkuhlman/xmlelixirstructs"},
      ]
    end
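To give a feel for the general idea of converting a record into a keyed data structure, here is a minimal sketch that turns an Xmerl record tuple into a plain map. It is not the API of the xmlelixirstructs package: the module `RecordToMapSketch` and its `to_map/1` function are hypothetical, it produces maps rather than structs, and it converts only one level (nested records stay as tuples).

```elixir
defmodule RecordToMapSketch do
  # Hypothetical helper, not part of xmlelixirstructs.
  @tags [:xmlElement, :xmlAttribute, :xmlText]

  for tag <- @tags do
    # Field names in record order, taken from xmerl.hrl at compile time.
    keys = tag |> Record.extract(from_lib: "xmerl/include/xmerl.hrl") |> Keyword.keys()

    def to_map(record) when elem(record, 0) == unquote(tag) do
      # Drop the record tag, then pair each field name with its value.
      [_tag | values] = Tuple.to_list(record)
      unquote(keys) |> Enum.zip(values) |> Map.new()
    end
  end
end

# Made-up sample document, parsed with xmerl directly.
{root, _rest} = :xmerl_scan.string(~c(<a x="1">hi</a>))
element = RecordToMapSketch.to_map(root)
IO.inspect(element.name, label: "name")
IO.inspect(Enum.map(element.attributes, &RecordToMapSketch.to_map/1), label: "attributes")
```

Once the data is in a map (or a struct), fields can be read with the usual `element.name` syntax and matched with `%{name: name}` patterns.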
6 Ideas for further exploration
Since the Xmerl data structures are so regular and are available to Elixir code, why not use Elixir metaprogramming to generate the field access functions described above? That was originally an idea for a future post, and it is now done. See the section above: Using Elixir metaprogramming to generate access functions.