1 Introduction
I've been interested in structured (and semi-structured) data for a long time. Both Erlang and Elixir provide a variety of ways to represent structured data. This document summarizes them and serves as a brief reference for using them.
2 Erlang data structures
2.1 Property lists
A property list is a list of 2-tuples of the form {Key, Value}; a bare atom in the list is shorthand for {Atom, true}. The closest equivalent in Elixir is the keyword list.
Examples:
1> Fruits = [{apr, "Apricot"}, {apl, "Apple"}, {grp, "Grape"}].
[{apr,"Apricot"},{apl,"Apple"},{grp,"Grape"}]
4> proplists:get_value(grp, Fruits).
"Grape"
5> proplists:get_value(apl, Fruits).
"Apple"
6> proplists:get_value(aplx, Fruits).
undefined
7> proplists:lookup(apl, Fruits).
{apl,"Apple"}
8> proplists:get_value(aplx, Fruits, not_found).
not_found
Notes:
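- lookup/2 returns the whole tuple, whereas get_value/2 returns only the value associated with a property. get_value/2 returns undefined when the key is missing; get_value/3 takes a default value to return instead.
- The proplists module treats a bare atom as shorthand for {Atom, true}. Other terms that are neither tuples nor atoms are allowed in the list but are ignored.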
2.2 Records
Erlang records of a given type have a fixed number of fields with fixed field names.
Define a record with the -record module attribute. Examples:
-record(complexType, {
    name=nil,
    childDefs=[]
}).
-record(element, {
    name=nil,
    type=nil
}).
-record(simpleType, {
    name=nil,
    base=nil,
    constraints=nil
}).
And, from .kerl/20.0/lib/xmerl-1.3.15/include/xmerl.hrl:
-record(xmlElement,{
    name,                   % atom()
    expanded_name = [],     % string() | {URI,Local} | {"xmlns",Local}
    nsinfo = [],            % {Prefix, Local} | []
    namespace=#xmlNamespace{},
    parents = [],           % [{atom(),integer()}]
    pos,                    % integer()
    attributes = [],        % [#xmlAttribute()]
    content = [],
    language = "",          % string()
    xmlbase="",             % string() XML Base path, for relative URI:s
    elementdef=undeclared   % atom(), one of [undeclared | prolog | external | element]
}).
Notes:
- A default value for a field is optional. A field declared without one defaults to the atom undefined.
Create an instance of a record -- Example:
Rec = #simpleType{name=Name, base=Base, constraints=Constraints},
Access a field in a record using dot notation -- Example:
Namespace = ComplexType#xmlElement.namespace,
Nodes = Namespace#xmlNamespace.nodes,
Or, more compactly:
Nodes = ComplexType#xmlElement.namespace#xmlNamespace.nodes,
Access fields using pattern matching -- Example:
#xmlElement{namespace=NSpace} = Root,
Modify the value of a field in a record -- Example:
NewRoot = Root#xmlElement{name=newTag}
2.3 Maps
Maps provide a key-value lookup data structure in Erlang. The maps module provides helper functions for maps.
Create a map -- Example:
33> Veggies = #{cucumber => "green and crunchy", tomato => "red and juicy"}.
#{cucumber => "green and crunchy",tomato => "red and juicy"}
Get (retrieve) a value from a map -- Example:
34> maps:get(tomato, Veggies).
"red and juicy"
Update a map -- Example:
37> Veggies1 = maps:update(cucumber, "good in salads", Veggies).
#{cucumber => "good in salads",tomato => "red and juicy"}
Size -- Determine the number of pairs in a map -- Example:
41> map_size(Veggies1).
2
42> maps:size(Veggies1).
2
Process each of the items in a map -- Example:
43> Fun1 = fun (K, V, Acc) -> [{K, V * 2} | Acc] end.
#Fun<erl_eval.18.99386804>
45> M1 = #{aa => 11, bb => 22, cc => 33}.
#{aa => 11,bb => 22,cc => 33}
46> maps:fold(Fun1, [], M1).
[{cc,66},{bb,44},{aa,22}]
Convert a map to a list of 2-tuples -- Example:
47> maps:to_list(M1).
[{aa,11},{bb,22},{cc,33}]
Determine whether a key exists in a map -- Example:
48> maps:is_key(bb, M1).
true
49> maps:is_key(xx, M1).
false
Get all the keys or the values from a map -- Examples:
50> maps:keys(M1).
[aa,bb,cc]
51> maps:values(M1).
[11,22,33]
Filter (select) items based on a predicate (function) and create a new map -- Example:
52> Fun2 = fun (K, V) -> if V =< 22 -> true; true -> false end end.
#Fun<erl_eval.12.99386804>
53> maps:filter(Fun2, M1).
#{aa => 11,bb => 22}
Filter (select or reject) items based on a list of keys -- Examples:
54> maps:with([aa, cc], M1).
#{aa => 11,cc => 33}
55> maps:without([aa, cc], M1).
#{bb => 22}
Erlang data is immutable, so "updating" a complex data structure produces a new one rather than changing the old one in place. However, the Erlang virtual machine (the BEAM) does not necessarily copy the whole structure; it tries to share unchanged parts between the old and new versions, which saves both space and time. For small maps, this likely does not matter. If your Erlang work involves large maps, you may want to read this thread on the "erlang-questions" mailing list: http://erlang.org/pipermail/erlang-questions/2017-August/093155.html. Another option for large maps (suggested in that thread) is to use ETS and, for persistence, DETS.
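A minimal sketch of the ETS option, written in Elixir, where the Erlang module is reached as :ets (in Erlang the same calls are ets:new/2, ets:insert/2, and ets:lookup/2). The table name and its contents are made up for illustration; treat this as a starting point, not a tuning recommendation.

```elixir
# Create an in-memory table; :set means one object per key.
table = :ets.new(:veggies, [:set, :public])

# Objects are tuples; by default the first element is the key.
true = :ets.insert(table, {:cucumber, "green and crunchy"})
true = :ets.insert(table, {:tomato, "red and juicy"})

# Inserting with an existing key replaces the stored object.
true = :ets.insert(table, {:cucumber, "good in salads"})

found = :ets.lookup(table, :cucumber)
missing = :ets.lookup(table, :radish)
size = :ets.info(table, :size)

IO.inspect({found, missing, size})

# Tables are not garbage collected; delete them (or let the owner exit).
:ets.delete(table)
```

Unlike a map, the table is updated in place, so no new copy of the collection is produced on each insert.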
3 Elixir data structures
Reminders:
- Erlang atoms begin with a lower case letter or are enclosed in single quotes. Erlang variables begin with an upper case letter.
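- Elixir atoms begin with a colon. A name that begins with an upper case letter (for example, String) is an alias, which is also an atom (:"Elixir.String"). Elixir variables begin with a lower case letter or an underscore.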
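A short sketch of these reminders as an Elixir script. It assumes nothing beyond the standard library; the variable names are chosen here for illustration.

```elixir
# A capitalized name is an alias, and an alias is an atom.
alias_is_atom = is_atom(String)
same_atom = String == :"Elixir.String"

# Erlang modules are plain lower-case atoms, so Erlang functions
# can be called directly from Elixir.
veggies = %{cucumber: "green and crunchy", tomato: "red and juicy"}
tomato = :maps.get(:tomato, veggies)

IO.inspect({alias_is_atom, same_atom, tomato})
```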
Additional help -- There is a very helpful set of Elixir examples, including ones that use Elixir data structures, here: https://elixir-examples.github.io/.
3.1 Tuples
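Elixir tuples are Erlang tuples: the same data type with the same literal syntax. Note that elem/2 in Elixir is zero-based, whereas element/2 in Erlang is one-based.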
Get the size and access individual elements -- Examples:
iex(2)> a = {11, 22, 33, 44}
{11, 22, 33, 44}
iex(3)> tuple_size(a)
4
iex(4)> elem(a, 0)
11
iex(5)> elem(a, 1)
22
There are helper functions in the Tuple module. See: https://hexdocs.pm/elixir/Tuple.html#content.
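As a hedged illustration of those helpers, the sketch below uses put_elem/3 from Kernel together with Tuple.insert_at/3, Tuple.delete_at/2, and Tuple.to_list/1. The values are arbitrary.

```elixir
a = {11, 22, 33, 44}

# Elixir's elem/2 is zero-based; Erlang's element/2 is one-based.
first_elixir = elem(a, 0)
first_erlang = :erlang.element(1, a)

# Tuples are immutable: each call returns a new tuple and leaves `a` alone.
b = put_elem(a, 1, :changed)
c = Tuple.insert_at(a, 4, 55)
d = Tuple.delete_at(a, 0)
as_list = Tuple.to_list(a)

IO.inspect({first_elixir, first_erlang, b, c, d, as_list})
```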
3.2 Keyword lists
Elixir keyword lists are lists of 2-tuples, where the first item of each 2-tuple is used as a key. Keyword lists are (roughly) the equivalent of Erlang property lists. The Elixir Keyword module provides helper functions.
Note that the "keys" must be atoms.
You can create 2-tuple lists explicitly, or use Elixir's special syntax -- Examples:
iex(2)> veggies = [{:cucumber, "crunchy"}, {:tomato, "juicy"}]
[cucumber: "crunchy", tomato: "juicy"]
iex(3)> fruits = [peach: "succulent", watermelon: "yummy"]
[peach: "succulent", watermelon: "yummy"]
Retrieve values and keys from a keyword list -- Examples:
iex(6)> Keyword.fetch(fruits, :watermelon)
{:ok, "yummy"}
iex(7)> Keyword.fetch(fruits, :watermelonxxx)
:error
iex(8)> Keyword.get(fruits, :watermelon)
"yummy"
iex(9)> Keyword.get(fruits, :watermelonxxx)
nil
iex(10)> Keyword.keys(fruits)
[:peach, :watermelon]
iex(11)> Keyword.values(fruits)
["succulent", "yummy"]
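Because a keyword list is an ordinary list, it keeps its order and may contain the same key more than once. The sketch below illustrates that, and also that the Erlang proplists module accepts a keyword list unchanged. The option names are invented for the example.

```elixir
# Keys may repeat, and insertion order is preserved.
opts = [color: "red", size: 10, color: "blue"]

first_color = Keyword.get(opts, :color)
all_colors = Keyword.get_values(opts, :color)

# Keyword.put/3 removes every existing entry for the key first.
replaced = Keyword.put(opts, :color, "green")

# A keyword list is also a valid Erlang property list.
via_proplists = :proplists.get_value(:size, opts)

# Lookups scan the list, so larger collections are better kept in a map.
as_map = Map.new(opts)

IO.inspect({first_color, all_colors, replaced, via_proplists, as_map})
```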
For helper functions and more information, see the documentation for the Keyword module on HexDocs.
3.3 Records
Elixir has no direct equivalent of Erlang records. (1) If you want something reasonably similar to Erlang records in Elixir, use structures. (2) If you have code written in Erlang and it contains Erlang records that you want to use from Elixir, then look at the Elixir Record module, and consider the following notes.
Remember that underneath, records are just tuples whose first element is the record name. So, you can, at the least, just work with them as you would Elixir tuples. In particular, you can use:
- Pattern matching.
- tuple_size/1 -- Get the number of items in a tuple.
- elem/2 -- Get a specific item from a tuple by index.
Examples:
iex(20)> a = {:aa, :bb, :cc}
{:aa, :bb, :cc}
iex(21)> tuple_size(a)
3
iex(22)> elem(a, 2)
:cc
There is discussion here: https://stackoverflow.com/questions/28891758/elixir-how-to-convert-a-map-struct-to-a-record-struct
And, there is an example here: https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/file/stat.ex
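A minimal sketch of the Record module approach, using Record.defrecord/2 to generate accessor macros for a tagged tuple. The record name point and its fields are hypothetical. For a record that is defined in an Erlang header file, the Record module also documents Record.extract/2, which reads the field list from the .hrl file; that step is not shown here.

```elixir
defmodule Shapes do
  require Record

  # Defines the macros point/0, point/1 and point/2 for tuples tagged :point.
  # The record name and its fields are made up for this example.
  Record.defrecord(:point, x: 0, y: 0)
end

defmodule ShapesDemo do
  require Record
  # The generated macros must be imported (or required) before use.
  import Shapes

  def run do
    p = point(x: 3, y: 4)          # create: {:point, 3, 4}
    x = point(p, :x)               # read one field
    moved = point(p, y: 10)        # "update": returns a new tuple
    point(x: px, y: py) = moved    # pattern match on fields
    {p, x, moved, {px, py}, Record.is_record(moved, :point)}
  end
end

IO.inspect(ShapesDemo.run())
```

The macros expand to plain tuple operations at compile time, so the resulting values can be passed directly to Erlang code that expects the corresponding record.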
3.4 Maps
Maps are the standard key-value store in Elixir.
Create a map with the "%{}" syntax, or with the helper function from the Map module. Examples:
iex(6)> a = %{}
%{}
iex(7)> b = %{:aa => 11, :bb => 22, :cc => 33}
%{aa: 11, bb: 22, cc: 33}
iex(10)> c = Map.new()
%{}
iex(12)> d = Map.new([{:aa, 11}, {:bb, 22}])
%{aa: 11, bb: 22}
If all the keys for a map are atoms, then you can use the following more convenient syntax for creating a map:
iex(25)> e = %{aa: 11, bb: 22, cc: 33}
%{aa: 11, bb: 22, cc: 33}
Access items in a map with the indexing syntax -- Examples:
iex(9)> b[:bb]
22
Or, with helper functions from Map -- Examples:
iex(14)> Map.fetch(b, :bb)
{:ok, 22}
iex(15)> Map.get(b, :bb)
22
And, if the keys are atoms, then you can use dot notation to access items in a map. Example:
iex(20)> b.bb
22
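The access styles differ in how they treat a missing key, which is worth knowing when choosing between them. The sketch below, with a made-up key :zz, compares them.

```elixir
b = Map.new(aa: 11, bb: 22, cc: 33)

bracket = b[:zz]                     # missing key gives nil
fetched = Map.fetch(b, :zz)          # missing key gives :error
with_default = Map.get(b, :zz, 0)    # missing key gives the default

# Dot access raises KeyError when the key is absent.
dot =
  try do
    b.zz
  rescue
    e in KeyError -> {:raised, e.key}
  end

IO.inspect({bracket, fetched, with_default, dot})
```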
Determine whether a map contains a specified key -- Examples:
iex(20)> Map.has_key?(b, :cc)
true
iex(21)> Map.has_key?(b, :ccxxyy)
false
Get a list of keys or values from a map -- Examples:
iex(17)> Map.keys(b)
[:aa, :bb, :cc]
iex(18)> Map.values(b)
[11, 22, 33]
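Maps are immutable, so "updating" one returns a new map. A brief sketch of the common ways to do that, reusing the sample keys from above:

```elixir
b = %{aa: 11, bb: 22, cc: 33}

# Map.put/3 adds the key or replaces its value.
with_dd = Map.put(b, :dd, 44)

# The update syntax only works for keys that already exist.
changed = %{b | bb: 99}

# Map.update!/3 applies a function to the current value.
doubled = Map.update!(b, :cc, fn v -> v * 2 end)

removed = Map.delete(b, :aa)

IO.inspect({with_dd, changed, doubled, removed, b})
```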
For helper functions and more information on maps, see the documentation for the Map module on HexDocs.
3.5 Structures
According to the Elixir documentation, "structs are extensions built on top of maps that provide compile-time checks and default values."
In particular: (1) A struct is a data structure with named fields. (2) Only the field names given in the definition may be used, and this is checked when the code is compiled (which prevents errors).
Define a struct. Examples:
iex(40)> defmodule RGB do
...(40)>   defstruct red: 0, green: 0, blue: 0
...(40)> end
Notes:
- The structure takes on the name of the module in which it is defined.
Create an instance of a struct. Examples:
iex(44)> a = %RGB{}
%RGB{blue: 0, green: 0, red: 0}
iex(45)> red = %RGB{red: 255}
%RGB{blue: 0, green: 0, red: 255}
iex(46)> green = %RGB{green: 255}
%RGB{blue: 0, green: 255, red: 0}
iex(47)> blue = %RGB{blue: 255}
%RGB{blue: 255, green: 0, red: 0}
Note that if we attempt to create an instance of a struct using a key that is not in that struct's definition, we get an error. Example:
iex(49)> stuff = %RGB{junk: 255}
** (KeyError) key :junk not found in: %RGB{blue: 0, green: 0, red: 0}
A struct is a map that has a special key :__struct__ (whose value is the name of the struct):
iex(48)> is_map(blue)
true
iex(49)> blue
%RGB{blue: 255, green: 0, red: 0}
iex(50)> blue.__struct__
RGB
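Since a struct is a map underneath, functions from the Map module accept it. A small sketch, redefining the RGB struct so that the script is self-contained:

```elixir
defmodule RGB do
  defstruct red: 0, green: 0, blue: 0
end

blue = %RGB{blue: 255}

# Functions from the Map module accept structs.
keys = blue |> Map.keys() |> Enum.sort()

# Map.from_struct/1 drops the :__struct__ key and returns a plain map.
plain = Map.from_struct(blue)
plain_has_tag = Map.has_key?(plain, :__struct__)

# A struct also matches an ordinary map pattern.
matches_map = match?(%{blue: 255}, blue)

IO.inspect({keys, plain, plain_has_tag, matches_map})
```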
We can access values in a struct with dot notation (as with maps). Examples:
iex(51)> blue.blue
255
iex(52)> blue.red
0
Update a struct -- Examples:
iex(57)> red_green = %RGB{red: 200, green: 200}
%RGB{blue: 0, green: 200, red: 200}
iex(58)> red_green_alt = %{red_green | red: 100, green: 100}
%RGB{blue: 0, green: 100, red: 100}
iex(59)> red_green
%RGB{blue: 0, green: 200, red: 200}
iex(60)> red_green_alt
%RGB{blue: 0, green: 100, red: 100}
We can also use pattern matching to access fields. Examples:
iex(65)> %RGB{red: red_amount} = red_green
%RGB{blue: 0, green: 200, red: 200}
iex(66)> red_amount
200
We can define a struct without giving default values to some fields, in which case the default value is nil. And, we can require that some fields are given a value when an instance is created. Examples:
iex(68)> defmodule Color do
...(68)>   @enforce_keys [:desc]
...(68)>   defstruct [:desc, :red, :green, :blue]
...(68)> end
{:module, Color, <<...>>, %Color{blue: nil, desc: nil, green: nil, red: nil}}
iex(69)> color1 = %Color{}
** (ArgumentError) the following keys must also be given when building struct Color: [:desc]
iex(70)> color1 = %Color{desc: "color_no_1"}
%Color{blue: nil, desc: "color_no_1", green: nil, red: nil}
iex(71)> color2 = %Color{desc: "color_no_1", red: 0, green: 0, blue: 0}
%Color{blue: 0, desc: "color_no_1", green: 0, red: 0}
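The %Color{} syntax is checked when the code is compiled. When the fields are only known at run time (for example, they arrive as a keyword list), the Kernel functions struct!/2 and struct/2 can build the struct instead. The sketch below redefines Color so that it runs as a standalone script; the field values are arbitrary.

```elixir
defmodule Color do
  @enforce_keys [:desc]
  defstruct [:desc, :red, :green, :blue]
end

# struct!/2 builds a struct at run time and checks the keys.
ok = struct!(Color, desc: "color_no_1", red: 255)

missing_required =
  try do
    struct!(Color, red: 255)
  rescue
    e in ArgumentError -> {:error, e.__struct__}
  end

unknown_key =
  try do
    struct!(Color, desc: "x", junk: 1)
  rescue
    e in KeyError -> {:error, e.key}
  end

# struct/2 is lenient: unknown keys are dropped and
# @enforce_keys is not checked.
lenient = struct(Color, junk: 1)

IO.inspect({ok, missing_required, unknown_key, lenient})
```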
3.6 Enumerables and streams
For information on enumerables, streams, etc. see:
- https://hexdocs.pm/elixir/Enum.html
- https://hexdocs.pm/elixir/Stream.html
- https://hexdocs.pm/elixir/Enumerable.html
- https://hexdocs.pm/elixir/List.html
Functions in the Enum module are eager: they consume the items in an enumerable immediately. Functions in the Stream module are "lazy": they produce an object (a stream) that is capable of producing items when demanded. Example:
iex(11)> f1 = fn(x) -> x + 3 end
#Function<6.99386804/1 in :erl_eval.expr/5>
iex(12)> a = Stream.iterate(1, f1)
#Function<61.34375656/2 in Stream.unfold/2>
iex(13)> Enum.take(a, 10)
[1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
Notes:
- Stream.iterate/2 produces a stream.
- Enum.take/2 creates a list that has the first n items from an enumerable.
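Stream functions can be chained without computing anything until an Enum function asks for items. A small sketch that builds on the Stream.iterate/2 example; the extra map and filter steps are added here purely for illustration.

```elixir
# Nothing is computed while the pipeline is being built,
# even though the underlying sequence is infinite.
multiples_of_four =
  Stream.iterate(1, fn x -> x + 3 end)
  |> Stream.map(fn x -> x * 2 end)
  |> Stream.filter(fn x -> rem(x, 4) == 0 end)

# Enum.take/2 pulls just enough items through the pipeline.
first_five = Enum.take(multiples_of_four, 5)

# Stream.take/2 stays lazy; Enum.to_list/1 is safe because the
# stream is now finite.
first_four_raw =
  Stream.iterate(1, fn x -> x + 3 end)
  |> Stream.take(4)
  |> Enum.to_list()

IO.inspect({first_five, first_four_raw})
```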
A few suggestions:
- You can "force" evaluation of a Stream by using Enum.each/2.
- You can convert a Stream into a list with Enum.to_list/1, but first make sure that the stream is finite and of reasonable length; Enum.to_list/1 never returns for an infinite stream such as the one produced by Stream.iterate/2.
- You can "pipe" an enumerable from a function that produces it to a function that consumes it with the pipe operator "|>". The pipe operator takes the value of the expression on its left and passes it as the first argument to the function call on its right. Because the functions in Enum and Stream take the enumerable as their first argument, they chain naturally. Examples:
iex(22)> f1 = fn(x) -> x + 10 end
#Function<6.99386804/1 in :erl_eval.expr/5>
iex(23)> Stream.iterate(1, f1) |> Enum.take(8)
[1, 11, 21, 31, 41, 51, 61, 71]
iex(31)> b = Stream.iterate(10, f1)
#Function<61.34375656/2 in Stream.unfold/2>
iex(32)> b |> Enum.take(9)
[10, 20, 30, 40, 50, 60, 70, 80, 90]
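The pipe operator is not specific to enumerables: x |> f(y) is rewritten to f(x, y) for any value x. A brief sketch with strings and a map; the sample data is arbitrary.

```elixir
# Each step receives the previous result as its first argument.
words =
  "  structured data  "
  |> String.trim()
  |> String.upcase()
  |> String.split(" ")

# The same chain written with nested calls reads inside-out.
nested = String.split(String.upcase(String.trim("  structured data  ")), " ")

total =
  %{aa: 11, bb: 22, cc: 33}
  |> Map.values()
  |> Enum.sum()

IO.inspect({words, nested, total})
```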
Look in the Enum, List, and Stream modules for more help with enumerables.
4 Advanced data structures
As in most programming languages, we can construct more complex data structures out of simpler ones, for example, lists containing tuples containing maps containing ... Elixir offers another and more advanced way: DSLs (domain-specific languages). This section explores and gives a simple, first-level explanation of that facility.
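To give a first impression of what a DSL can look like, here is a minimal sketch that uses macros to let a module declare typed fields. Everything in it (the FieldDSL module, the field macro, the Person module) is hypothetical and written only for this illustration; real libraries such as Ecto use similar but far richer techniques.

```elixir
defmodule FieldDSL do
  # `use FieldDSL` imports the field macro and prepares an accumulating
  # module attribute in the calling module.
  defmacro __using__(_opts) do
    quote do
      import FieldDSL, only: [field: 2]
      Module.register_attribute(__MODULE__, :fields, accumulate: true)
      @before_compile FieldDSL
    end
  end

  # Each `field name, type` line records one {name, type} pair.
  defmacro field(name, type) do
    quote do
      @fields {unquote(name), unquote(type)}
    end
  end

  # Just before the calling module is compiled, generate fields/0.
  # Accumulated attributes are stored newest first, hence the reverse.
  defmacro __before_compile__(_env) do
    quote do
      def fields, do: Enum.reverse(@fields)
    end
  end
end

defmodule Person do
  use FieldDSL

  field :name, :string
  field :age, :integer
end

IO.inspect(Person.fields())
```

The declarations inside Person read like data, yet they are ordinary macro calls that run at compile time and produce a regular function.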
Other things to explore with respect to data structures:
- Ecto -- https://hexdocs.pm/ecto/
- quote, unquote, the Macro module -- https://elixir-lang.org/getting-started/meta/quote-and-unquote.html
- Erlang and Elixir support for Docutils/reStructuredText, Asciidoc, and Markdown.
- Text analysis and parsing of ad hoc text data.