This document provides help with, and links to information about, various data representation formats.
There are roughly two kinds of data representations that we will consider: (1) those used to write documentation and to format readable content, for example, Asciidoc/Asciidoctor, reST/Docutils, and Markdown; and (2) those used to encode data, usually intended for machine processing, for example, XML, Yaml, JSON, and CSV. We’ll discuss each of these below. We’ll also discuss two binary storage formats, HDF5 and Sqlite3.
1. Asciidoc / Asciidoctor
From the Asciidoctor Web site — "Asciidoctor is a fast, open source text processor and publishing toolchain for converting AsciiDoc content to HTML5, DocBook, PDF, and other formats."
More information: https://asciidoctor.org/
Installing Asciidoctor — There are instructions here https://asciidoctor.org/docs/install-toolchain/. Or, do the following:
- First, make sure that you have a reasonably up-to-date version of Ruby installed on your machine. For example, do:
  $ ruby --version
- Then install with gem:
  $ gem install asciidoctor
Or, on Linux, install with your favorite package manager (these commands typically need to be run with sudo). For example:
  $ apt-get install asciidoctor
  # or, alternatively:
  $ aptitude install asciidoctor
Then, read the instructions here https://asciidoctor.org/docs/#get-started-with-asciidoctor.
Interesting features provided by Asciidoctor:
- There are converters that produce each of PDF, EPUB3, and LaTeX. See Supplemental Converters.
- There is a built-in backend that produces Docbook. Use:
  $ asciidoctor -b docbook5 mydoc.txt
Good to know:
- Built-in attributes help you customize the look of the document you generate. See Built-in Attributes.
- Some built-in attributes can be set either (1) in the document header or (2) from the command line. See the link above to determine which built-in attributes can only be set in the document header. To set attributes on the command line, use the -a command line option, which can be repeated. These examples customize the TOC (table of contents):
  $ asciidoctor -a toc mydoc.txt
  $ asciidoctor -a toc=left -a toclevels=4 -a sectnums mydoc.txt
  $ asciidoctor -a toc=left -a toclevels=4 -a sectnums -a toc-title="The TOC" mydoc.txt
- For help with displaying and customizing a table of contents, see Table of Contents.
- For help with writing Asciidoctor content, see Quick Reference, the Writer’s Guide, and other guides and manuals at Asciidoctor Docs.
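If you drive Asciidoctor from a script, the repeatable -a option maps naturally onto a dictionary of attributes. The following is a minimal Python sketch, not part of Asciidoctor itself; the helper names asciidoctor_command and run_asciidoctor are hypothetical, and actually running the command assumes asciidoctor is on your PATH. An attribute whose value is None is passed as a bare flag (like -a sectnums); any other value is passed as name=value.

```python
import shlex
import subprocess


def asciidoctor_command(doc_path, attributes=None, backend=None):
    """Build an asciidoctor argument list (hypothetical helper).

    Each attribute becomes its own -a option, because -a can be repeated.
    A value of None produces a bare flag such as "-a sectnums".
    """
    cmd = ["asciidoctor"]
    if backend is not None:
        cmd += ["-b", backend]
    for name, value in (attributes or {}).items():
        cmd += ["-a", name if value is None else f"{name}={value}"]
    cmd.append(doc_path)
    return cmd


def run_asciidoctor(doc_path, attributes=None, backend=None):
    """Run asciidoctor (assumed to be on PATH); raise if it exits with an error."""
    subprocess.run(asciidoctor_command(doc_path, attributes, backend), check=True)


if __name__ == "__main__":
    # Mirrors the last command-line example above, without running it.
    attrs = {"toc": "left", "toclevels": 4, "sectnums": None, "toc-title": "The TOC"}
    print(shlex.join(asciidoctor_command("mydoc.txt", attrs)))
```

Passing a list of arguments to subprocess.run (rather than a single shell string) means a value containing spaces, such as the toc-title above, needs no extra quoting.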
1.1. Transforming Asciidoc to reST
We can convert an Asciidoc/Asciidoctor document to reST (reStructuredText) with help from pandoc. We use asciidoctor to produce Docbook content, then use pandoc to convert the Docbook content to reST.
Notes:
- On Linux, pandoc can be installed with the aptitude or the apt-get package manager.
- pandoc is also available in the Anaconda distribution of Python.
The following shell scripts show how to convert an Asciidoctor document to reST. This one writes the result to a file:
#!/bin/sh
# $1: Asciidoc input file; $2: reST output file
asciidoctor -b docbook5 -o - "$1" | pandoc -f docbook -t rst -o "$2"
Or, to write to stdout:
#!/bin/sh
# $1: Asciidoc input file
asciidoctor -b docbook5 -o - "$1" | pandoc -f docbook -t rst
2. reStructuredText — Docutils
Information is here: https://docutils.sourceforge.io/.
reStructuredText, or reST, is, like Asciidoc, a lightweight markup language. reStructuredText and Docutils are analogous to Asciidoc and Asciidoctor. Their source text formats have both significant similarities and significant differences. And, the tool chains are different; they are two separate implementations. You can compare the source text formats here: reStructuredText and Asciidoc.
Installation:
- It’s available at the Python Package Index: https://pypi.org/project/docutils/#description. Install it with pip:
  $ pip install docutils
- Or, install from source. Get the source here: Docutils source code. Then, follow the instructions in the README file.
There is plenty of documentation to help you get started writing reStructuredText and using Docutils at the Documentation Overview.
3. Markdown
For information, see:
There are various implementations. See https://www.w3.org/community/markdown/wiki/MarkdownImplementations.
4. HTML
You can also write documentation directly in HTML. Of course, other tools, several of which are discussed in this document, can make this task less excruciating. Another strategy is to use a WYSIWYG editor that enables you to generate HTML, e.g. LibreOffice Writer.
5. XML
5.1. Python
Python has several tools for processing XML, for example:
- ElementTree: It is in the Python standard library. See https://docs.python.org/3/library/xml.etree.elementtree.html. Here is a sample code snippet:
  [ins] In [12]: import xml.etree.ElementTree as etree
  [ins] In [13]: doc = etree.parse('data_representation.xml')
  [ins] In [14]: rootnode = doc.getroot()
  [ins] In [15]: print('root tag:', rootnode.tag)
  root tag: {http://docbook.org/ns/docbook}article
- Lxml: Lxml attempts to implement the ElementTree API. It also has extensions to that API (each element object in the element tree has added methods) and additional capabilities, for example, xpath searches and XSLT (Extensible Stylesheet Language Transformations). See https://lxml.de/index.html. Here is a sample code snippet:
  [ins] In [23]: from lxml import etree
  [ins] In [24]: doc = etree.parse('data_representation.xml')
  [ins] In [25]: root = doc.getroot()
  [ins] In [26]: tag = root.tag
  [ins] In [27]: print('root tag:', tag)
  root tag: {http://docbook.org/ns/docbook}article
  [ins] In [28]: nsmap = {'db': root.nsmap[None]}
  [ins] In [29]: nodes = root.xpath('//db:section', namespaces=nsmap)
  [ins] In [30]: nodes
  Out[30]:
  [<Element {http://docbook.org/ns/docbook}section at 0x7f0864116230>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f259f00>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f1b5b90>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f1b5320>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f1b5140>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f1b5af0>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f1b58c0>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f1b5280>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f1b5a50>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f085f30ac30>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f08640b4a00>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f08640b4870>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f08640b4b90>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f08640b41e0>,
   <Element {http://docbook.org/ns/docbook}section at 0x7f08640b44b0>]
  [ins] In [31]: node1 = nodes[0]
  [ins] In [32]: node1.attrib
  Out[32]: {'{http://www.w3.org/XML/1998/namespace}id': '_asciidoc_asciidoctor'}
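The standard ElementTree module supports only a limited subset of XPath, but that subset is enough for a section search like the lxml one above. Here is a minimal sketch, assuming a small inline DocBook-like document in place of data_representation.xml; the helper name section_ids is hypothetical. The namespaces argument to findall maps the db: prefix to the DocBook namespace, much as the nsmap dictionary does for lxml's xpath method.

```python
import xml.etree.ElementTree as etree

DOCBOOK_NS = "http://docbook.org/ns/docbook"
# The "xml" prefix is predefined, so xml:id is stored under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A tiny, made-up stand-in for a DocBook file such as data_representation.xml.
SAMPLE = f"""<article xmlns="{DOCBOOK_NS}">
  <section xml:id="_first"><title>First</title></section>
  <section xml:id="_second"><title>Second</title>
    <section xml:id="_nested"><title>Nested</title></section>
  </section>
</article>"""


def section_ids(root):
    """Return the xml:id of every section element, at any depth, in document order."""
    nsmap = {"db": DOCBOOK_NS}
    # ".//db:section" selects all descendant section elements.
    return [node.get(XML_ID) for node in root.findall(".//db:section", nsmap)]


root = etree.fromstring(SAMPLE)
print("root tag:", root.tag)
print(section_ids(root))
```

For anything beyond this subset (functions, predicates on text, XSLT), lxml is the better tool.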
5.2. Elixir
Consider using SweetXml. It’s built on top of the Xmerl XML implementation in Erlang.
I’ve written several Blog posts on processing XML with Elixir that you might find helpful:
6. Yaml
6.1. Python
Information is here:
- See this about the deprecation of the yaml.load function: https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
Here is an example that loads Python data objects from a Yaml file (for untrusted input, yaml.safe_load is the safer choice):
[ins] In [19]: import pprint, yaml
[ins] In [20]: with open('Data/data06.yaml', 'r') as infile:
          ...:     data = yaml.full_load(infile)
          ...:
[ins] In [21]: pprint.pprint(data)
['test 1',
{'name1': 'value1', 'name2': 'value2', 'name3': 'value3'},
{'name4': 'value4', 'name5': 'value5', 'name6': 'value6'},
[11, 22, 33],
[44, 55, {'name7': 'value7', 'name8': 'value8'}, 66]]
And, here is an example that writes (dumps) Python data structures to a Yaml file:
[ins] In [25]: with open("junk01.yaml", 'w') as outfile:
          ...:     yaml.dump(data, outfile)
          ...:
6.2. Elixir
I’ve found several Yaml implementations for Elixir.
6.2.1. fast_yaml
Learn about it here: https://github.com/processone/fast_yaml
Add the following to your mix.exs:
defp deps do
[
{:fast_yaml, "~> 1.0.24"},
]
end
These examples decode/parse data from (1) a file and (2) a string:
iex> {:ok, data1} = :fast_yaml.decode_from_file("Data/data06.yaml")
{:ok,
[
[
"test 1",
[{"name1", "value1"}, {"name2", "value2"}, {"name3", "value3"}],
[{"name4", "value4"}, {"name5", "value5"}, {"name6", "value6"}],
[11, 22, 33],
[44, 55, [{"name7", "value7"}, {"name8", "value8"}], 66]
]
]}
iex> {:ok, data2} = :fast_yaml.decode(content)
{:ok,
[
[
"test 1",
[{"name1", "value1"}, {"name2", "value2"}, {"name3", "value3"}],
[{"name4", "value4"}, {"name5", "value5"}, {"name6", "value6"}],
[11, 22, 33],
[44, 55, [{"name7", "value7"}, {"name8", "value8"}], 66]
]
]}
Encoding an Elixir data structure to a string: since fast_yaml is an Erlang library, it produces Erlang character data (char lists) rather than Elixir strings. But, you can convert that to an Elixir string with IO.chardata_to_string/1. Example:
iex> data3encoded = :fast_yaml.encode(data3)
iex> data3encoded_as_strings = IO.chardata_to_string(data3encoded)
To write the encoded string to a file, try something like this:
iex> data3encoded = :fast_yaml.encode(data3)
iex> File.write!("junk.yaml", data3encoded)
6.2.2. yaml-elixir
Learn about it here: https://github.com/KamilLelonek/yaml-elixir
Add the following to your mix.exs:
defp deps do
[
{:yaml_elixir, "~> 2.4.0"},
]
end
Parsing Yaml data from a file — Use either of the following:
iex> data = YamlElixir.read_from_file!("Data/data06.yaml")
[
"test 1",
%{"name1" => "value1", "name2" => "value2", "name3" => "value3"},
%{"name4" => "value4", "name5" => "value5", "name6" => "value6"},
[11, 22, 33],
[44, 55, %{"name7" => "value7", "name8" => "value8"}, 66]
]
iex> {:ok, data} = YamlElixir.read_from_file("Data/data06.yaml")
{:ok,
[
"test 1",
%{"name1" => "value1", "name2" => "value2", "name3" => "value3"},
%{"name4" => "value4", "name5" => "value5", "name6" => "value6"},
[11, 22, 33],
[44, 55, %{"name7" => "value7", "name8" => "value8"}, 66]
]}
You can also read and parse data from a string. An example:
iex> IO.puts(content)
- "test 1"
-
"name1": "value1"
"name2": "value2"
"name3": "value3"
-
"name4": "value4"
"name5": "value5"
"name6": "value6"
-
- 11
- 22
- 33
-
- 44
- 55
-
"name7": "value7"
"name8": "value8"
- 66
:ok
iex> data = YamlElixir.read_from_string!(content)
[
"test 1",
%{"name1" => "value1", "name2" => "value2", "name3" => "value3"},
%{"name4" => "value4", "name5" => "value5", "name6" => "value6"},
[11, 22, 33],
[44, 55, %{"name7" => "value7", "name8" => "value8"}, 66]
]
Apparently, yaml-elixir does not provide the ability to convert an Elixir data structure to a Yaml string.
7. JSON
7.1. Python
JSON support is in the Python standard library. See json — JSON encoder and decoder.
Here is an example of its use:
[ins] In [6]: import json
[ins] In [7]: mymap = {"name": "lemon", "color": "yellow", "sizes": [12, 14, 16]}
[ins] In [8]: data = json.dumps(mymap)
[ins] In [9]: data
Out[9]: '{"name": "lemon", "color": "yellow", "sizes": [12, 14, 16]}'
[ins] In [10]: json.loads(data)
Out[10]: {'name': 'lemon', 'color': 'yellow', 'sizes': [12, 14, 16]}
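json.dumps and json.loads work with strings; json.dump and json.load read and write file objects directly. Here is a minimal sketch of a round trip through a file; the helper names and the file name fruit.json are hypothetical. The indent and sort_keys arguments only affect formatting: they make the file easier to read and to diff.

```python
import json
import os
import tempfile


def write_json(data, path):
    """Serialize data to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as outfile:
        # indent and sort_keys change the layout of the file, not its content.
        json.dump(data, outfile, indent=2, sort_keys=True)


def read_json(path):
    """Load a JSON file back into Python objects (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as infile:
        return json.load(infile)


mymap = {"name": "lemon", "color": "yellow", "sizes": [12, 14, 16]}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fruit.json")
    write_json(mymap, path)
    print(read_json(path) == mymap)  # prints True
```

Note that a round trip is not always exact: a Python tuple, for example, is written as a JSON array and comes back as a list.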
7.2. Elixir
There are several Elixir modules for JSON.
7.2.1. Jason
You can add it to your Mix dependencies in mix.exs. For example:
defp deps do
[
{:jason, ">0.0.0"}
]
end
Here is an example of its use:
iex> mymap = %{name: "lemon", color: "yellow", sizes: [12, 14, 16]}
%{color: "yellow", name: "lemon", sizes: [12, 14, 16]}
iex> {:ok, data} = Jason.encode(mymap)
{:ok, "{\"color\":\"yellow\",\"name\":\"lemon\",\"sizes\":[12,14,16]}"}
iex> Jason.decode(data)
{:ok, %{"color" => "yellow", "name" => "lemon", "sizes" => [12, 14, 16]}}
7.2.2. JSON
Information is here:
Add the following to your mix.exs:
defp deps do
[
{:json, "~> 1.3"},
]
end
Here is an example of its use:
iex> mymap = %{name: "lemon", color: "yellow", sizes: [12, 14, 16]}
%{color: "yellow", name: "lemon", sizes: [12, 14, 16]}
iex> {:ok, data} = JSON.encode(mymap)
{:ok, "{\"color\":\"yellow\",\"name\":\"lemon\",\"sizes\":[12,14,16]}"}
iex> JSON.decode(data)
{:ok, %{"color" => "yellow", "name" => "lemon", "sizes" => [12, 14, 16]}}
8. CSV
8.1. Python
More information:
CSV support is in the Python standard library.
An example:
[ins] In [1]: import csv
[ins] In [2]: with open('tmp10.csv', 'r', newline='') as infile:
         ...:     rows = list(csv.reader(infile))
         ...:
[ins] In [3]: rows
Out[3]:
[['Device_Software_Image_Version', '11.12.20'],
 ['Device_Product', 'Catalyst'],
 ['Product_B_version', '1.1.2.1'],
 ...
 ['Device_hw_type', 'virtual appliance']]
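The csv module can also write rows, and csv.DictReader uses the first row as field names. The sketch below uses io.StringIO in place of a real file so that it is self-contained; the helper names and sample rows are hypothetical. Note that csv does no type conversion: every field comes back as a string.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Encode a list of rows (lists of fields) as CSV text (hypothetical helper)."""
    # newline="" stops any newline translation, as the csv docs advise for files.
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def csv_text_to_dicts(text):
    """Parse CSV text, using the header row as the keys of each dict."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


sample_rows = [["name", "amount"], ["carrot", 25], ["tomato", 35]]
text = rows_to_csv_text(sample_rows)
print(repr(text))
for record in csv_text_to_dicts(text):
    print(record["name"], record["amount"])
```

With a real file, the same calls work on a file object opened with newline='', e.g. open('tmp10.csv', 'w', newline='').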
8.2. Elixir
Information is here:
Insert this in your project’s mix.exs file:
defp deps do
[
{:csv, "~> 2.3.1"},
]
end
And, here are two example functions that use this CSV module. One writes a few lines to a CSV file. The other reads lines from a CSV file, splits each line into fields, and displays them.
defmodule Test24.CSV do
  @csvdata1 [
    ~w(dog cat bird),
    ~w(tomato radish squash),
    ~w(lemon orange tangerine),
    ~w(poppy sunflower phacelia)
  ]

  @doc """
  Write a few lines to a CSV file.

  Options:

    * :force -- If true, overwrite existing file.

  ## Examples

      iex> Test24.CSV.write_csv_file("data01.csv", force: true)
      :ok
      iex> Test24.CSV.write_csv_file("data01.csv")
      {:error, "file data01.csv exists"}

  """
  @spec write_csv_file(String.t(), Keyword.t()) :: :ok | {:error, String.t()}
  def write_csv_file(out_file_path, opts \\ []) do
    if Keyword.get(opts, :force) != true and File.exists?(out_file_path) do
      {:error, "file #{out_file_path} exists"}
    else
      out_file = File.open!(out_file_path, [:write])

      # CSV.encode turns each row (a list of fields) into an encoded CSV line.
      @csvdata1
      |> CSV.encode()
      |> Enum.each(fn line -> IO.write(out_file, line) end)

      File.close(out_file)
      :ok
    end
  end

  @doc """
  Read CSV lines from file. Parse into fields. Display them.

  ## Examples

      iex> Test24.CSV.read_csv_file("data01.csv")
      dog|cat|bird
      tomato|radish|squash
      lemon|orange|tangerine
      poppy|sunflower|phacelia
      :ok
      iex> Test24.CSV.read_csv_file("missing_file.csv")
      {:error, "file missing_file.csv not found"}

  """
  @spec read_csv_file(String.t()) :: :ok | {:error, String.t()}
  def read_csv_file(in_file_path) do
    if not File.exists?(in_file_path) do
      {:error, "file #{in_file_path} not found"}
    else
      # CSV.decode! yields each row as a list of fields.
      in_file_path
      |> File.stream!()
      |> CSV.decode!()
      |> Enum.each(fn fields -> IO.puts(Enum.join(fields, "|")) end)

      :ok
    end
  end
end
9. HDF5
9.1. Python
More information:
Python has several packages that support HDF5 data files. Two of them are h5py and pytables.
9.1.1. h5py
More information:
An example:
import h5py


def test():
    # Open the HDF5 file read-only; "with" closes it automatically.
    with h5py.File('testdata05.hdf5', 'r') as infile:
        print('infile keys:', list(infile.keys()))
        subgroup = infile['subgroup03']
        print('subgroup keys:', list(subgroup.keys()))
        dataset1 = subgroup['dataset3-3']
        print('dataset1:', dataset1)
        # Iterating over a dataset yields its rows along the first axis.
        for row in dataset1:
            print(row)
        # dataset1[()] reads the entire dataset into memory.
        print('dataset1 contents:\n', dataset1[()])


test()
9.2. Elixir
Erlhdf5: see https://github.com/RomanShestakov/erlhdf5.
10. Sqlite
10.1. Python
For relational tables stored in a file, Python has the sqlite3 module. It’s in the standard library. See https://docs.python.org/3/library/sqlite3.html. sqlite3 implements the Python DB-API 2.0 specification. See https://www.python.org/dev/peps/pep-0249/.
Here is a small amount of sample code:
[ins] In [10]: import sqlite3
[ins] In [11]: con = sqlite3.connect('tmp01.db')
[ins] In [12]: cursor = con.execute('select * from samples')
[ins] In [13]: for row in cursor:
          ...:     print(row)
          ...:
('carrot', 25)
('tomato', 35)
('radish', 15)
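To add rows, the DB-API placeholder style lets sqlite3 quote values for you, which avoids SQL injection; this is safer than building the query text with string formatting. Here is a minimal sketch, assuming a samples table with columns name and amount (the column names and the helper functions add_sample and samples_by_amount are hypothetical), using an in-memory database so that it runs anywhere.

```python
import sqlite3


def add_sample(con, name, amount):
    """Insert one row (hypothetical helper; assumes columns name and amount)."""
    # The "?" placeholders let sqlite3 quote the values safely.
    con.execute("insert into samples (name, amount) values (?, ?)", (name, amount))


def samples_by_amount(con):
    """Return all rows as tuples, smallest amount first."""
    return con.execute("select name, amount from samples order by amount").fetchall()


con = sqlite3.connect(":memory:")  # a file name such as 'tmp01.db' would persist
con.execute("create table samples (name text, amount integer)")
with con:  # commits on success, rolls back if an exception is raised
    for name, amount in [("carrot", 25), ("tomato", 35), ("radish", 15)]:
        add_sample(con, name, amount)
for row in samples_by_amount(con):
    print(row)
con.close()
```

Placeholders work only for values; table and column names cannot be parameterized, so those must come from trusted code.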
10.2. Elixir
Elixir has a module giving support for Sqlite.
There is documentation here:
Put this in your mix.exs file:
defp deps do
[
{:sqlite, "1.1.0"},
]
end
Examples:
- Here is a sample function that uses Sqlite to show the rows in a table in an Sqlite file/database:
  @doc """
  Open connection to DB file and show rows in table.

  ## Examples

      iex> Test24.show_rows("test01.db", "samples")
      Rows from Sqlite DB test01.db:
      ---------------------------------------------
      name: carrot amount: 25
      name: tomato amount: 35
      name: radish amount: 15
      :ok
      iex> Test24.show_rows("test01.db", "samples", "size")
      Rows from Sqlite DB test01.db:
      ---------------------------------------------
      name: radish amount: 15
      name: pepper amount: 20
      name: carrot amount: 25
      name: tomato amount: 35
      name: arugula amount: 40
      :ok

  """
  @spec show_rows(String.t(), String.t(), String.t()) :: :ok
  def show_rows(db_file_name, table_name, order \\ "") do
    order_by =
      if order == "" do
        ""
      else
        " order by #{order}"
      end

    {:ok, connection} = Sqlite.open(db_file_name)
    IO.puts("Rows from Sqlite DB #{db_file_name}:")
    IO.puts("---------------------------------------------")
    # table_name and order are interpolated directly, so pass only trusted input.
    query = "select * from #{table_name}#{order_by}"

    Sqlite.q(query, connection)
    |> Enum.each(fn {n, amt} -> IO.puts("name: #{n} amount: #{amt}") end)

    Sqlite.close(connection)
    :ok
  end
- Here is a function that can add a row to a table in an Sqlite database:
  @doc """
  Add a row to a table in a database file.

  ## Examples

      iex> Test24.add_row("test01.db", "samples", "\"parsley\", \"60\"")
      :ok

  """
  @spec add_row(String.t(), String.t(), String.t()) :: :ok
  def add_row(db_file_name, table_name, columns) do
    {:ok, connection} = Sqlite.open(db_file_name)
    # table_name and columns are interpolated directly, so pass only trusted input.
    query = "insert into #{table_name} values (#{columns})"
    Sqlite.q(query, connection)
    Sqlite.close(connection)
    :ok
  end
11. Appendix A: Copyright and License
Copyright © 2020 Dave Kuhlman. Free use of this documentation is granted under the terms of the MIT License. See https://opensource.org/licenses/MIT.
12. Appendix B: Document source
This document is written in the Asciidoc lightweight markup language format. It has been processed with Asciidoctor. The document source is here: source.