Algebraix Technology Core Library¶
algebraixlib
is a library that provides constructs and facilities to harness the fundamentals
of data algebra. You find us:
- On PyPI. Here you find the
pip
installer. Install aspip install algebraixlib
(system-wide) orpip install algebraixlib --user <username>
(for a single user) or download the package. - On GitHub. Here you find “Getting Started” quick-start instructions, the full sources and can use the bug tracker.
- On Algebraix Data’s GitHub project page. Here you find support information.
Also check out our tutorials and example code in the examples directory on GitHub.
API Documentation¶
Algebra Documentation¶
A Beginner’s Introduction to Data Algebra¶
The purpose of this section is to provide a brief introduction to data algebra. In particular, we would like to motivate the various definitions and terms in a way that is not daunting to beginners, and shows the usefulness of the data algebra framework. Note that this introduction will make use of set notation, of which there is a short primer at Set Notation if needed.
We will do this by taking a dataset and analyze it using data algebra. Consider the following table of information:
year | providedScientificName | ITISscientificName | ITIScommonName | ITIStsn | validAcceptedITIStsn | decimalLatitude | decimalLongitude |
---|---|---|---|---|---|---|---|
1970 | Micrurus tener Baird & Girard, 1853 | Micrurus tener | Texas Coralsnake | 683040 | 683040 | 30.43099976 | -98.05999756 |
2008 | Masticophis taeniatus Hallowell, 1852 | Masticophis taeniatus | Culebra-chirriadora adornada;Striped Whipsnake | 174240 | 174240 | 30.43399048 | -97.96090698 |
1951 | Kinosternon flavescens Agassiz, 1857 | Kinosternon flavescens | Tortuga-pecho quebrado amarilla;Yellow Mud Turtle | 173766 | 173766 | 30.43678093 | -97.66889191 |
1951 | Acris crepitans Baird, 1854 | Acris crepitans | Northern Cricket Frog;Rana-grillo norte | 173520 | 173520 | 30.43678093 | -97.66889191 |
2011 | Parus bicolor Linnaeus, 1766 | Baeolophus bicolor | Carbonero cresta negra;Tufted Titmouse | 178738 | 554138 | 30.4736805 | -97.96916962 |
Table Sightings is taken from BISON (Biodiversity Information Serving Our Nation) [1], a publicly available dataset from the US Geological Survey, and consists of animal sightings in Travis County, TX.
Couplets: The Basic Pieces of Data¶
Every data entry, or datum, is represented in data algebra using couplets. The idea being any piece of data consists of two pieces of information. For example in table Sightings 2011 refers to a year, and 30.7436805 is a decimalLatitude. In data algebra notation we write these as
and
The object before the arrow is called the left component, and the object after the arrow is called the right component. For example the couplet \(year{\mapsto}2011\) has left component \(year\) and right right component \(2011\).
Relations: Sets of Couplets¶
Each row of Sightings is a set of couplets, which we call a relation. For example, the first row of Sightings, let us denote it by \(R_1\), is the relation
While there are other ways of forming relations from the table, for our purposes we will use rows to form relations. One reason we will do this is that each row relation is in fact a function in this case. It is often the case that row relations are functional.
Getting Data from a Relation¶
One of our primary methods of extracting information from a dataset is composition. Let us say we want to know the \(ITIScommonName\) of the first row of Sightings. What we can do is compose \(R_1\) with the relation
Just like function composition, the output of the first relation becomes the input for the next relation. In this case, our first relation has only one output, or right component, which corresponds to only one input, or left component, in \(R_1\), hence
which tells us that \(ITIScommonName\) for the first row is \(Texas\ Coralsnake\). (Note that, just like with functions, compositions are evaluated from right to left. In particular, given relation composition \(r_2 \circ r_1\) we would apply \(r_1\) first, and then \(r_2\).)
Clans: Sets of Relations¶
A set of relations is called a clan. In particular, any table can be divided up into a set of row relations, which means any table can be represented by a clan. We will refer to the table Sightings as a clan whose relations are the row relations. Once again, we can use composition to extract data out of our clan.
Getting Data from a Clan¶
For example, if we want the projection (in terms of relational algebra) of Sightings over \(ITISscientificName\) and \(ITIScommonName\), we can form the relation
Let us use \(\mathbb{S}\) to denote the Sightings clan. If we use \(R_k\) to denote the \(k\)th row of Sightings, then
Note that
and \(R_k \circ D\) will give you the \(ITISscientificName\) and \(ITIScommonName\) in the \(k\)th row of Sightings. In particular we have
which is the projection of Sightings onto \(ITISscientificName\) and \(ITIScommonName\) as we wanted.
[1] | BISON can be accessed at http://bison.usgs.ornl.gov To obtain the data in the table, on the map click on Texas, then on Travis county, and one can then download all of the wildlife sightings recorded for Travis county. The above table is only a small subset of the many sightings in Travis county. |
Algebra Reference¶
- absolute
Applied to couplets, sets, constructs derived from them (like relations) and algebras. Such a construct is called absolute if its ground set is based on set A (\(A\)).
See for example absolute set.
- A relation that has only members that are elements of the Cartesian product \(A \times A\) is an ‘absolute relation’. Example: \(\{a{\mapsto}1, b{\mapsto}2\}\). (However, an absolute relation is not an absolute set; the members of the relation are couplets, not atoms.)
- The relation \(\{a{\mapsto}1, b{\mapsto}\{2\}\}\) is not an absolute relation, as one of the right elements of the member couplets is not an atom (it is a set \(\{2\}\)).
- absolute clan
- An absolute clan is a clan that is an element of the second power set of the Cartesian product of set A (\(A\)) (\(P^2(A \times A)\)). Such a clan has only atoms as members of the couplets in the relations. Example: \(\{ \{2{\mapsto}1, 3{\mapsto}2\}, \{5{\mapsto}4, 9{\mapsto}7\} \}\).
- absolute couplet
- An absolute couplet is a couplet that is an element of the Cartesian product of set A (\(A\)) (\(A \times A\)). Such a couplet has only atoms as members. Example: \(2{\mapsto}1\).
- absolute ground set
Our algebras normally have a ground set and an absolute ground set. The absolute ground set is the ground set with the elements of the algebra being expressed in terms of set A (\(A\)).
For example, the ground set of the algebra of relations is \(P(M \times M)\); this allows sets and couplets as elements of the couplets that form the relations. The absolute ground set of the algebra of relations is \(P(A \times A)\); this requires atoms to form the couplets that form the relations.
- absolute multiclan
- An absolute multiclan is a multiclan that has only atoms as members of the couplets in the relations. See also absolute clan.
- absolute multiset
- An absolute multiset is a multiset that is an element of the power set of the Cartesian product of set A (\(A\)) with set N (\(N\)) (\(P(A \times N)\). Such a multiset has only atoms as elements. Example: \(\{1{:}3, 'a'{:}5\}\).
- absolute relation
- An absolute relation is a set that is an element of the power set of the Cartesian product of set A (\(A\)) (\(P(A \times A)\). Such a set has only couplets as elements that consist only of atoms. Example: \(\{2{\mapsto}1, 5{\mapsto}2\}\).
- absolute set
- An absolute set is a set that is an element of the power set of set A (\(A\)) (\(P(A)\). Such a set has only atoms as elements. Example: \(\{1, 2\}\).
- algebra
- An algebra is a set together with one or more operations defined on it. For a more detailed definition see Algebras.
- algebra of clans
- The algebra of clans is the algebra on sets of relations. As a result its ground set is the second power set of couplets, which allows us to use more operations in addition to those on an algebra of sets. See Algebra of Clans for a more detailed explanation.
- algebra of couplets
- Given a collection of couplets we define an algebra on it by including the operations of composition and transposition. See Algebra of Couplets for a more detailed definition.
- algebra of multiclans
- An algebra defined on multisets of relations. This algebra is similar to the algebra of clans, but takes into account multiplicities of relations. See Algebra of Multiclans for more information.
- algebra of multisets
- An algebra on the set of all multisets of set M (\(M\)) with the operations of multiset union, multiset intersection, multiset addition, multiset difference, and submultiset. See Multiset Algebra for more detailed information.
- algebra of relations
- The algebra of relations is an algebra on a set of couplets under the operations of composition and transposition. See Algebra of Relations for a more detailed definition.
- algebra of sets
- Also called set algebra, is the algebra formed by taking the power set of a set and applying the operations of union and intersection. See Algebra of Sets for a more detailed definition.
- atom
- An atom is a datum that is not a set or a couplet. The set of all atoms is set A (\(A\)).
- bijective
- A relation is bijective if it is both left-functional and right-functional.
- binary extension
A binary extension is an extension of a binary operation from a given algebra to an extension of the algebra that consists of sets of the elements of the original algebra:
\[binaryExtn(op, S1, S2) := \{op(s1, s2)\ :\ s1 \in S1 \text{ and } s2 \in S2 \text{ where } op(s1, s2) \text{ is defined}\}\]- binary intersection
- A binary intersection is an intersection of two sets. See intersection for a complete definition.
- binary multi-extension
A binary multi-extension is an extension of a binary operation from a given algebra to an extension of the algebra that consists of multisets of the elements of the original algebra:
\[binMultiExtn(op, S1, S2) := \{op(s1, s2){:}(\dot{S1}(s1) \cdot \dot{S2}(s2))\ :\ s1 \in S1 \text{ and } s2 \in S2 \text{ where } op(s1, s2) \text{ is defined}\}\]- binary operation
- A binary operation is an operation with two arguments, typically with a result that belongs to the same ground set as the arguments (when the operation is a member of an algebra).
- binary relation
- We represent a binary relation as a set where every member is a couplet. We also call this simply a relation.
- binary union
- A binary union is a union of two sets. See union for more information.
- Cartesian product
- The Cartesian product of two sets \(X \times Y\) is the set of all couplets where the first member of the couplet is a member of \(X\) and the second member of the couplet is a member of \(Y\).
- clan
- A clan is a set of relations.
- clan diagonal
- A clan diagonal is a clan with a single relation that is a diagonal.
- complement
The complement of a given set is the collection of elements not in the given set. This definition depends on a choice of a larger set in which context every other set is a subset of. In particular, given an algebra of sets whose ground set is the power set \(P(U)\) and \(X\in P(U)\), then the complement of \(X\) is:
\[X' := \{x \in U: x \not\in X\} = U - X\]- composition
The composition of the couplets \(a{\mapsto}b\) and \(c{\mapsto}d\) is defined as:
\[\begin{split}c{\mapsto}d \circ a{\mapsto}b := \begin{cases} a{\mapsto}d & \text{if } b = c \\ \text{undefined} & \text{if } b \ne c \end{cases}\end{split}\]The operation may be extended to extended algebras using the binary extension and – if there is no danger of ambiguities – is then also called simply ‘composition’.
Specific extensions:
Algebra of relations: \(R_2 \circ R_1 := \{c_2 \circ c_1\ :\ c_1 \in R_1,\ c_2 \in R_2\}\)
(\(R_1\) and \(R_2\) are relations; \(c_1\) and \(c_2\) are couplets.)
Algebra of clans: \(C_2 \circ C_1 := \{R_2 \circ R_1\ :\ R_1 \in C_1,\ R_2 \in C_2\}\)
(\(C_1\) and \(C_2\) are clans; \(R_1\) and \(R_2\) are relations.)
Algebra of multiclans: \(\dot{C}_2 \circ \dot{C}_1 := \{(R_2 \circ R_1){:}(\dot{C}_2(R_2) \cdot \dot{C}_1(R_1))\ :\ R_1 \in \dot{C}_1,\ R_2 \in \dot{C}_2\}\)
(\(\dot{C}_1\) and \(\dot{C}_2\) are multiclans; \(R_1\) and \(R_2\) are relations.)
- couplet
- A couplet is an ordered pair, following the Kuratowski definition of an ordered pair defined as \(\{\{l\}, \{l, r\}\}\). It is the mathematical object used to represent a datum or data point. We denote it by \(l{\mapsto}r\), with \(l\) called the left component, and \(r\) the right component.
- cross-functional union
- A short name for cross-left-functional union in cases where no ambiguities are expected.
- cross-intersection
The cross-intersection is an extension of intersection (or multiset intersection). Depending on the specific form of extension, it may be a binary extension (when extending to an algebra of sets) or a binary multi-extension (when extending to an algebra of multisets).
In the specific context, cross-intersection is the binary extension of intersection from an algebra of sets to an algebra of sets of sets (for example, from the algebra of relations to the algebra of clans). The cross-intersection of the sets (of sets) \(\mathbb{S}\) and \(\mathbb{T}\) is defined as:
\[\mathbb{S} \blacktriangle \mathbb{T} = \{X \cap Y\ : X \in \mathbb{S} \text{ and } Y \in \mathbb{T}\}\]For the version of cross-intersection on the algebra of multiclans and on an algebra of multisets in general see multi-cross-intersection.
- cross-left-functional union
The cross-left-functional union, (or cross-functional union), of two clans \(\mathbb{C}\) and \(\mathbb{D}\) is a binary extension of the left-functional union from the algebra of relations to the algebra of clans:
\[\mathbb{C} \underset{f}{\blacktriangledown} \mathbb{D} = \{R \underset{f}{\cup} Q\ : R \in \mathbb{C} \text{ and } Q \in \mathbb{D}\}\]- cross-right-functional union
The cross-right-functional union of two clans \(\mathbb{C}\) and \(\mathbb{D}\) is a binary extension of the right-functional union from the algebra of relations to the algebra of clans:
\[\mathbb{C} \underset{rf}{\blacktriangledown} \mathbb{D} = \{R \underset{rf}{\cup} Q\ : R \in \mathbb{C} \text{ and } Q \in \mathbb{D}\}\]- cross-substriction
The cross-substriction is a binary extension of substriction from an algebra of sets to an algebra of sets of sets (for example, from the algebra of relations to the algebra of clans). The cross-substriction of the sets (of sets) \(\mathbb{S}\) and \(\mathbb{T}\) is defined as:
\[\mathbb{S} \blacktriangleleft \mathbb{T} = \{X \vartriangleleft Y\ : X \in \mathbb{S} \text{ and } Y \in \mathbb{T}\} = \{X : X \in \mathbb{S} \text{ and } X \subset Y \text{ for some } Y \in \mathbb{T} \}\]We also have a binary multi-extension of cross-substriction called the multi-cross-substriction.
- cross-superstriction
The cross-superstriction is a binary extension of superstriction from an algebra of sets to an algebra of sets of sets (for example, from the algebra of relations to the algebra of clans). The cross-superstriction of the sets (of sets) \(\mathbb{S}\) and \(\mathbb{T}\) is defined as:
\[\mathbb{S} \blacktriangleright \mathbb{T} = \{X \vartriangleright Y\ : X \in \mathbb{S} \text{ and } Y \in \mathbb{T}\} = \{X : X \in \mathbb{S} \text{ and } X \supset Y \text{ for some } Y \in \mathbb{T} \}\]We also have a binary multi-extension of cross-superstriction called the multi-cross-superstriction.
- cross-union
The cross-union is a binary extension of union from an algebra of sets to an algebra of sets of sets (for example, from the algebra of relations to the algebra of clans). The cross-union of the sets (of sets) \(\mathbb{S}\) and \(\mathbb{T}\) is defined as:
\[\mathbb{S} \blacktriangledown \mathbb{T} = \{X \cup Y\ : X \in \mathbb{S} \text{ and } Y \in \mathbb{T}\}\]We also have a binary multi-extension of cross-union called the multi-cross-union.
- diagonal
The diagonal is an equivalence relation. The diagonal of a set \(S\) is defined as:
\[D_S = \{x{\mapsto}x \in S \times S\ : x \in S\}\]- difference
The difference of two sets \(S\) and \(T\) is the set of elements in \(S\) but not in \(T\) (see also Wikipedia: Relative complement). The definition is:
\[S \setminus T = \{x: x \in S\ \&\ x \notin T\}\]- equivalence relation
- A relation is said to be an equivalence relation if it is reflexive, symmetric and transitive.
- extension
- An extension of a given set or algebra, is an extension of the set or algebra either by expanding the ground set, the set of operations on the set or both. For a more detailed definition, see Extension.
- finite intersection
- A finite intersection is the intersection of a finite collection of sets. See intersection for a complete definition.
- finite union
- A finite union is the union of a finite collection of sets. See union for a more detailed explanation.
- function
- A function is a left-functional relation.
- functional
- A short name for left-functional in cases where no ambiguities are expected.
- functional union
- A short name for left-functional union in cases where no ambiguities are expected.
- ground set
- The set that contains all the elements of an algebra.
- identity element
- The element of the ground set of an algebra that, when used as one of the arguments of a binary operation produces the other argument of the operation as result of the operation.
- intersection
An operation on sets that creates a set by collecting the elements in common to two or more individual sets into a new set. In mathematical terms, if \(\mathbb{S}\) is a collection of sets, then the intersection of all of the sets in \(\mathbb{S}\) is denoted:
\[\bigcap \mathbb{S} = \bigcap_{T\in\mathbb{S}}T,\]and is the set \(\{x\ : \forall T \in \mathbb{S},\ x \in T\}\). If \(\mathbb{S}\) consists of only two sets, the intersection is called a binary intersection. If \(\mathbb{S}\) consists of a finite collection of sets, the intersection is called a finite intersection. See also Wikipedia: Intersection.
- left
- A short name for left component where no ambiguity is expected.
- left component
- Given a couplet represented by \(l{\mapsto}r\), the component \(l\) is called the left component.
- left set
The left set of a relation \(R\) is the set of all left components of its members:
\[left(R) = \{ l\ : l{\mapsto}r \in R \}\]The left set of a clan \(\mathbb{C}\) is the union of all left sets of its member relations:
\[left(\mathbb{C}) = \underset{R\ \in\ \mathbb{C}}{\bigcup} left(R)\]- left-functional
A relation \(R\) is said to be left-functional, or simply functional, if:
\[x{\mapsto}y \in R\ \& \ x{\mapsto}z \in R \implies y = z\]A clan \(\mathbb{C}\) is said to be (left-)functional if all its relations are functional:
\[\forall R \in \mathbb{C}: R \text{ is left-functional}\]- left-functional union
The left-functional union (or functional union) of two functions \(R\) and \(Q\) is the union of the two relations if the result is left-functional, otherwise the result is not defined:
\[\begin{split}R \underset{f}{\cup} Q = \begin{cases} R \cup Q & \text{if }R \cup Q \text{ is left-functional} \\ \text{undefined} & \text{if it is not left-functional} \end{cases}\end{split}\]- left-regular
A clan \(\mathbb{C}\) is said to be left-regular (or short regular) if it is left-functional and the left sets of all its relations are the same:
\[\begin{split}\begin{align*} \forall R \in \mathbb{C}&: R \text{ is left-functional} \text{ and } \\ \forall R, Q \in \mathbb{C}&: left(R) = left(Q) \end{align*}\end{split}\]- lhs-cross-functional union
- A short name for lhs-cross-left-functional union in cases where no ambiguities are expected.
- lhs-cross-left-functional union
Given two clans the lhs-cross-left-functional union (or lhs-cross-functional union) takes the cross-left-functional union, then any relations in the clan on the left side of the operation that resulted in the empty set are collected by taking their union and combined with the result of the cross-left-functional union. Mathematically, if \(\mathbb{C}\) and \(\mathbb{D}\) are the two clans in question, then their lhs-cross-left-functional union is:
\[\mathbb{C} \overrightarrow{\underset{f}\blacktriangledown} \mathbb{D} = \mathbb{C} \underset{f}{\blacktriangledown} \mathbb{D} \bigcup\{T : T\in\mathbb{C}\ \& \ T \underset{f}{\blacktriangledown} \mathbb{D} = \varnothing \}\]- multi-cross-intersection
Let \(\dot{\mathbb{S}}\) and \(\dot{\mathbb{T}}\) be multisets of sets with \(X \in \dot{\mathbb{S}}\) and \(Y \in \dot{\mathbb{T}}\), then their multi-cross-intersection is defined in the same way as the cross-intersection with the multiplicities satisfying:
\[\dot{\mathbb{S}} \blacktriangle \dot{\mathbb{T}}(X \cap Y) = \dot{\mathbb{S}}(X) \cdot \dot{\mathbb{T}}(Y)\]Where there is no ambiguity we will refer to the multi-cross-intersection simply as the cross-intersection.
- multi-cross-substriction
Let \(\dot{\mathbb{S}}\) and \(\dot{\mathbb{T}}\) be multisets of sets with \(X \in \dot{\mathbb{S}}\) and \(Y \in \dot{\mathbb{T}}\), then their multi-cross-substriction is defined in the same way as the cross-substriction with the multiplicities satisfying:
\[\dot{\mathbb{S}} \blacktriangleleft \dot{\mathbb{T}}(X \vartriangleleft Y) = \dot{\mathbb{S}}(X)\]Where there is no ambiguity we will refer to the multi-cross-substriction simply as the cross-substriction.
- multi-cross-superstriction
Let \(\dot{\mathbb{S}}\) and \(\dot{\mathbb{T}}\) be multisets of sets with \(X \in \dot{\mathbb{S}}\) and \(Y \in \dot{\mathbb{T}}\), then their multi-cross-superstriction is defined in the same way as the cross-superstriction with the multiplicities satisfying:
\[\dot{\mathbb{S}} \blacktriangleright \dot{\mathbb{T}}(X \vartriangleright Y) = \dot{\mathbb{S}}(X)\]Where there is no ambiguity we will refer to the multi-cross-superstriction simply as the cross-superstriction.
- multi-cross-union
Let \(\dot{\mathbb{S}}\) and \(\dot{\mathbb{T}}\) be multisets of sets with \(X \in \dot{\mathbb{S}}\) and \(Y \in \dot{\mathbb{T}}\), then their multi-cross-union is defined in the same way as the cross-union with the multiplicities satisfying:
\[\dot{\mathbb{S}} \blacktriangledown \dot{\mathbb{T}}(X \cup Y) = \dot{\mathbb{S}}(X) \cdot \dot{\mathbb{T}}(Y)\]Where there is no ambiguity we will refer to the multi-cross-union simply as the cross-union.
- multiclan
- A multiclan is a multiset of relations.
- multiplicity
- Given an element of a multiset, the multiplicity of that element is the number of times the element appears in the multiset. See Multiset for more information.
- multiset
- A multiset, also sometimes called a bag, is a generalization of the idea of a set where multiple instances of the same element are allowed. See Multiset for more information.
- multiset addition
The multiset addition of two multisets \(\dot{S}\) and \(\dot{T}\) is defined as follows:
\[\big(\dot{S} + \dot{T}\big)(x) = \dot{S}(x) + \dot{T}(x)\]for any \(x\), where here we take \(\dot{S}(x)=0\) for \(x \not\in \dot{S}\), and \(\dot{T}(x)=0\) for \(x \not\in \dot{T}\).
- multiset difference
The multiset difference of two multisets \(\dot{S}\) and \(\dot{T}\) is defined as follows:
\[\begin{split}\big(\dot{S} \setminus \dot{T}\big)(x) = \begin{cases} \dot{S}(x) - \dot{T}(x) & \text{if } \dot{S}(x) - \dot{T}(x)>0 \\ \text{undefined} & \text{if } \dot{S}(x) - \dot{T}(x) \leq 0 \end{cases}\end{split}\]for any \(x\), where here we take \(\dot{S}(x)=0\) for \(x \not\in \dot{S}\), and \(\dot{T}(x)=0\) for \(x \not\in \dot{T}\).
- multiset intersection
We define the multiset intersection of two multisets \(\dot{S}\) and \(\dot{T}\) to be:
\[\big(\dot{S} \cap \dot{T}\big)(x) = min \big(\dot{S}(x), \dot{T}(x)\big)\]for any \(x\), where here we take \(\dot{S}(x)=0\) for \(x \not\in \dot{S}\), and \(\dot{T}(x)=0\) for \(x \not\in \dot{T}\).
- multiset union
We define the multiset union of two multisets \(\dot{S}\) and \(\dot{T}\) to be:
\[\big(\dot{S} \cup \dot{T}\big)(x) = max \big(\dot{S}(x), \dot{T}(x)\big)\]for any \(x\), where here we take \(\dot{S}(x)=0\) for \(x \not\in \dot{S}\), and \(\dot{T}(x)=0\) for \(x \not\in \dot{T}\).
- partition
- A partition of a set is the splitting of a set into a collection of smaller subsets. Mathematically, given a set \(S\), we create a set of subsets of \(S\) such that the union of those sets is \(S\), and whose pairwise intersection is the empty set (another term for this is that any two sets are disjoint).
- power set
- The power set of any set \(S\), written \(P(S)\), is the set of all subsets of \(S\), including the empty set and \(S\) itself. We also use the expressions ‘second power set’, ‘third power set’ and so on to mean successive application of the power set operation for the indicated number of times: ‘power set of the power set of the ... of \(S\). (Adapted from Wikipedia: Power set.)
- projection
Given a clan, call it \(C\), and a collection of elements, call it \(lefts\), the projection of \(C\) onto \(lefts\) is a new clan where all of the left components of all of the couplets of the relations of \(C\) are in the set \(lefts\).
To obtain the projection of \(C\) onto \(lefts\) mathematically we can do this as follows:
\[\begin{split}D_{lefts} := \{l{\mapsto}l\ : l\in lefts \} \\ project(C,lefts) = \{ R \circ D_{lefts}\ : R\in C\}\end{split}\]- reflexive
A relation \(R\) is said to be reflexive if:
\[\forall x \in left(R) \cup right(R): x{\mapsto}x \in R\]A couplet can also be reflexive if it is of the form \(x{\mapsto}x\).
- regular
- A short name for left-regular in cases where no ambiguities are expected.
- relation
- A relation is a set of couplets. See also binary relation.
- right
- A short name for right component where no ambiguity is expected.
- right component
- Given a couplet represented by \(l{\mapsto}r\) the component \(r\) is the right component.
- right multiset
The right multiset of a multiclan \(\mathbb{C}\) is the multiset addition of all right sets of its member relations:
\[right(\mathbb{C}) = \underset{R\ \in\ \mathbb{C}}{+} right(R)\]- right set
The right set of a relation \(R\) is the set of all right components of its members:
\[right(R) = \{ r\ : l{\mapsto}r \in R \}\]The right set of a clan \(\mathbb{C}\) is the union of all right sets of its member relations:
\[right(\mathbb{C}) = \underset{R\ \in\ \mathbb{C}}{\bigcup} right(R)\]- right-functional
A relation \(R\) is said to be right-functional if:
\[y{\mapsto}x \in R\ \&\ z{\mapsto}x \in R \implies y = z\]- right-functional union
The right-functional union of two right-functional relations \(R\) and \(Q\) is the union of the two relations if the result is right-functional, otherwise the result is not defined:
\[\begin{split}R \underset{rf}{\cup} Q = \begin{cases} R \cup Q & \text{if }R \cup Q \text{ is right-functional} \\ \text{undefined} & \text{if it is not right-functional} \end{cases}\end{split}\]- right-regular
A clan \(\mathbb{C}\) is said to be right-regular if the right sets of all its relations are the same:
\[\forall R, Q \in \mathbb{C}: right(R) = right(Q)\]- set
- A set is a collection of distinct objects. Each object of a set is called an element of the set. In particular, if \(X\) is a set, then we denote the fact that \(x\) is an element of \(X\) by writing \(x\in X\). We will apply the axioms of Zermelo-Fraenkel set theory with choice (ZFC) to the sets. See also Set Notation for more information about set notation
- set A
- The set \(A\) is the set of all atoms. It is a subset of the set M (\(M\)).
- set M
- The set \(M\) is the set of all elements that can be represented in a given system, including atoms, couplets and sets. (A consequence of this is that the power set of \(M\) \(P(M)\) cannot be represented in a given system, and therefore is not an element of \(M\).)
- set N
- The set \(N\) is the set of all positive integers.
- submultiset
The submultiset relation is a binary relation of multisets. A multiset \(\dot{S}\) is a submultiset of a multiset \(\dot{T}\) if the following holds:
\[\dot{S} \subset \dot{T} \iff \forall x \in \dot{S}, \: \dot{S}(x) \leq \dot{T}(x).\]- subset
The subset relation is a binary relation of sets. A set \(S\) is a subset of a set \(T\) if every element of \(S\) is also an element of \(T\):
\[S \subset T \implies \forall x\ [\ x \in S\ \implies\ x \in T\ ]\]- substriction
Substriction is a partial binary operation on sets and multisets. The substriction of two sets or multisets \(S\) and \(T\) is defined as:
\[S \vartriangleleft T = S\ \ \text{if}\ \ S \subset T\](When extended to an algebra of sets of sets (for example, the algebra of clans), we obtain the cross-substriction, which is also sometimes called ‘substriction’.)
- supermultiset
The supermultiset relation is a binary relation of multisets. A multiset \(\dot{S}\) is a supermultiset of a multiset \(\dot{T}\) if the following holds:
\[\dot{S} \supset \dot{T} \iff \forall x \in \dot{T}, \: \dot{S}(x) \geq \dot{T}(x).\]- superset
The superset relation is a binary relation of sets. A set \(S\) is a superset of a set \(T\) if every element of \(T\) is also an element of \(S\):
\[S \supset T \implies \forall x\ [\ x \in T\ \implies\ x \in S\ ]\]- superstriction
Superstriction is a partial binary operation on sets and multisets. The superstriction of two sets or multisets \(S\) and \(T\) is defined as:
\[S \vartriangleright T := S\ \ \text{if}\ \ S \supset T\](When extended to an algebra of sets of sets (for example, the algebra of clans), we obtain the cross-superstriction, which is also sometimes called ‘superstriction’.)
- symmetric
A relation \(R\) is said to be symmetric if:
\[\forall x, y \in left(R) \cup right(R): x{\mapsto}y \in R \implies y{\mapsto}x \in R\]- symmetric difference
The symmetric difference of two sets \(S\) and \(T\) is the set of elements that are only in one of the sets. The definition is:
\[S \vartriangle T = (S \cup T) \setminus (S \cap T)\]- transitive
A relation \(R\) is said to be transitive if:
\[\forall x, y, z \in left(R) \cup right(R): (x{\mapsto}y \in R \ \& \ y{\mapsto}z \in R) \implies x{\mapsto}z \in R\]- transposition
Transposition is a unary operation on couplets. The transposition of a couplet \(a{\mapsto}b\) is defined as:
\[\overleftrightarrow{a{\mapsto}b} = b{\mapsto}a\]The operation may be extended to extended algebras (like the algebra of relations) using the unary extension and – if there is no danger of ambiguities – is then also called simply ‘transposition’.
In multisets and multiclans the operation is the same and the multiplicities do not change.
- unary extension
The unary extension is the operation that extends a unary operation from its algebra to an extended algebra (which is an algebra of sets):
\[unaryExtn(op, S) := \{op(s)\ :\ s \in S \text{ where } op(s) \text{ is defined}\}\]- unary multi-extension
The unary multi-extension is the operation that extends a unary operation from its algebra to an extended algebra (which is an algebra of multisets). For this extension, the multiplicities do not change:
\[unaryExtn(op, \dot{S}) := \{op(s){:}\dot{S}(s)\ :\ s \in S \text{ where } op(s) \text{ is defined}\}\]- unary operation
- An operation with only one argument, typically with a result that belongs to the same ground set as the argument (when the operation is a member of an algebra).
- union
An operation on sets that creates a set by collecting the elements of two or more individual sets into a new set. In mathematical terms, if \(\mathbb{S}\) is a collection of sets, then the union of all of the sets in \(\mathbb{S}\) is denoted:
\[\bigcup \mathbb{S} = \bigcup_{T\in\mathbb{S}}T,\]and is the set \(\{x\ : \exists T \in \mathbb{S},\ x \in T\}\). If \(\mathbb{S}\) consists of only two sets, the union is called a binary union. If \(\mathbb{S}\) consists of a finite collection of sets, the union is called a finite union. See also Wikipedia: Union.
- Zermelo-Fraenkel set theory with choice (ZFC)
- A system of axioms on sets that is the standard form of set theory and the foundation of much of modern mathematics. See also Wikipedia: Zermelo-Fraenkel set theory with choice (ZFC).
Algebras¶
Definition
In mathematics (abstract algebra), an algebra (also referred to as an algebraic structure consists of a set (called the ground set, which is the set that contains the elements to which the algebra applies) with one or more operations (finitary operation) on the set such as:
- A set of binary operations (each with an optional identity element).
- A set of unary operations.
- A set of relations defined on the ground set.
The ground set combined with the set of operations and identity elements together is called the signature of the algebra.
Some types of algebras:
A semigroup is a non-empty set \(X\) together with a binary operation \(\ast\) which is associative.
A monoid is a semigroup \(X\) with an identity element \(e\) under \(\ast\). We call \(e\) and identity element if: \(\begin{equation*} x\ast e=e\ast x\text{ for all }x\in X. \end{equation*}\)
A group is a monoid where every element has an inverse under \(\ast\), namely, for each \(x\in X\) there exists a \(y\in X\) such that \(x\ast y=e=y\ast x\).
A Boolean algebra consists of a non-empty set \(B\) together with two binary operations we will call \(\oplus\) and \(\odot\), a unary operation \(^{*}\), and two distinguished elements \(\mathbf{0}\) and \(\mathbf{1}\) (these are not, in general, the integers 0 and 1) such that:
- \(\oplus\) and \(\odot\) are each commutative and associative, and each distributes over the other.
- For each \(x\in B\)
\[\begin{split}\begin{equation*} \begin{array}{lll} x\oplus \mathbf{0}=x & \text{ } & x\odot \mathbf{0}=\mathbf{0} \\ x\oplus \mathbf{1}=\mathbf{1} & \text{ } & x\odot \mathbf{1}=x \\ x\oplus x\text{*}=\mathbf{1} & \text{ } & x\odot x\text{*}=\mathbf{0}% \end{array}% . \end{equation*}\end{split}\]- For all \(x,y\in B\)
\[\begin{equation*} \begin{array}{lll} x\odot (x\oplus y)=x & \text{ } & x\oplus (x\odot y)=x% \end{array}% . \end{equation*}\]
Algebra of Sets¶
Definition:
The algebra of sets is an algebra with the signature
Its ground set is the power set of set M (\(M\)). The binary operations are union (\(\cup\), with the identity element \(\varnothing\), the empty set) and intersection (\(\cap\), with the identity element \(M\)), the one unary operation is the complement (\('\)) and the one relation is the subset relation.
Other binary operations (that can be derived from the basic operations and relations) include difference, substriction and superstriction; other relations include the superset relation.
Remark and useful fact: The algebra of any non-empty set is always a Boolean algebra. This means that in particular the algebra on set M (\(M\)) is a Boolean algebra. It also true that any Boolean algebra with a finite ground set \(B\) is the algebra of some set.
Other useful operations on sets:
- The difference of two sets.
- The symmetric difference of two sets.
This gives us two alternative algebras of sets where we use difference and symmetric difference:
\[\bigg[ P(M),[-],[\bigtriangleup ],\subset ,^{\prime }\bigg]\]
and an extension
\[\bigg[ P(M), [\cup ,\varnothing ], [\cap ,M],[-], [\bigtriangleup ],\subset ,^{\prime } \bigg].\]
Identities concerning difference and symmetric difference: Given subsets \(S,T\) and \(U\) of set M (\(M\)):
- \(S-T=S\cap T^{\prime }\).
- \(S-S=\varnothing\).
- \(S-\varnothing =S\).
- \(\varnothing -S=\varnothing\).
- \(S-M=\varnothing\).
- \(M-S=S^{\prime }\).
- \(S\bigtriangleup T=T\bigtriangleup S\). (\(\bigtriangleup\) is commutative.)
- \(S\bigtriangleup (T\bigtriangleup U)=(S\bigtriangleup T)\bigtriangleup U\). (\(\bigtriangleup\) is associative.)
- \(S\bigtriangleup S=\varnothing\).
- \(S\bigtriangleup \varnothing =S\).
- \(S\bigtriangleup M=M-S=S^{\prime }\).
Algebra of Multisets¶
Multiset¶
Definition
A multiset is a set that allows multiple instances of any given element. We define it as a function from a given subset of set M (\(M\)) to the positive integers. In particular, we call \(\dot{S}\) a multiset of \(M\) if for certain \(x\in M\) we have \(\dot{S}(x)\) is a positive integer that we call the multiplicity of \(x\).
Example
Let \(M=\{a,b,c\}\) and \(\dot{S}\) represent the multiset \(\{a,a,a,b,b\}\), then we have \(\dot{S}(a)=3\), \(\dot{S}(b)=2\), and \(\dot{S}(c)\) is not defined. In this case \(a\) has multiplicity 3, and \(b\) has multiplicity 2.
Multiset Algebra¶
Definition
Let \(\dot{P}(M)\) represent the set of all multisets of \(M\). Using the definitions of multiset union, multiset addition, multiset intersection, and submultiset as applied to multisets we have an algebra on multisets with the following signature:
Algebra of Couplets¶
The algebra of couplets is an algebra with the signature
\[\bigg[ M \times M , \big\{\ \circ\ \big\} , \big\{ \leftrightarrow \big\} \bigg]\]
Its ground set is the Cartesian product of set M (\(M\)) with itself. The only binary operation is composition (\(\circ\), without identity element) and the only unary operation is the transposition (\(\leftrightarrow\)).
Algebra of Relations¶
Definition
The algebra of relations is the algebra of sets of couplets; its signature is
\[\bigg[ P(M \times M) , \big\{ [ \circ, D_M ] \big\} , \big\{ \leftrightarrow \big\} \bigg]\]
Its ground set is the power set of the Cartesian product of set M (\(M\)) with itself. The only binary operation is composition (\(\circ\), with the identity element \(D_M\), which is the diagonal on set M (\(M\))) and the only unary operation is the transposition (\(\leftrightarrow\)).
The binary operations union and intersection, the unary operation complement and the subset relation are implicitly present, inherited from the algebra of sets (since the algebra of relations is also an algebra of sets).
Other operations and relations (that can be derived from the basic operations and relations) include the binary operations difference, substriction, superstriction, left-functional union and right-functional union and the superset relation.
Algebra of Clans¶
Definition
The algebra of clans is the algebra of sets of relations, which is sets of sets couplets. Its ground set is therefore the second power set of the Cartesian product of set M (\(M\)) with itself. As a result of being a second power set, the operations of cross-union and cross-intersection can be applied, in addition to the operations on an algebra of sets. Moreover, the operations transposition and composition from the algebra of relations extend in a natural way to the algebra of clans. We also include substriction, superstriction, cross-substriction and cross-superstriction. As a result, it has the following signature:
Note that included in the operations of union and cross-union are the special left-functional and right-functional cases of cross-left-functional union and cross-right-functional union.
Algebra of Multiclans¶
Definition
The algebra of multiclans is an algebra that generalizes the algebra of clans. Since we use \(P(M \times M)\) to denote the set of couplets, a multiset of relations is a multiset of sets of couplets, so we will use \(\dot{P}(P(M \times M))\) to denote the multiset of relations. Combining the multiset algebra with the clan algebra we have
Here the binary operations are all binary multi-extensions of the operations from the algebra of clans, with the exception of multiset addition, and subset is replaced with submultiset. For example, cross-substriction is replaced with multi-cross-substriction. Note that there is no extension of the complement operation. As in the case of clans, the multiset versions of the operations of cross-left-functional union and cross-right-functional union are implied.
Extension¶
Definition
Given a set, or algebra \(S\), we say that \(T\) is an extension of \(S\), or extends \(S\), if one or more of the following holds:
- \(S \subseteq T\) and \(T\) has operations that \(S\) does not.
- \(S \subset T\) and any operation \(op\) on \(S\) can be obtained from an operation \(OP\) on \(T\) by restricting \(OP\) to elements only in \(S\).
For the purposes of data algebra, we will deal almost exclusively with algebras of sets and their extensions. Mathematically an algebra can be extended in many different ways. Below are some general examples, as well as some specific to data algebra.
Examples of extensions¶
Basic examples¶
- Any set \(S\) with no pre-defined operations on it extends to its set algebra: \(\bigg[P(S), \big \{ [ \cup, \varnothing ] , [ \cap, M ] \big\} , \big\{\ '\ \big\} , \big\{ \subset \big\} \bigg]\).
- Let \(\mathbb{Z}\) represent the set of integers, and \(\mathbb{Q}\) represent the set of rational numbers. The set \(\mathbb{Z}\) has a natural algebra under addition, and multiplication, and negation with signature: \(\bigg[ \mathbb{Z}, \big \{[+,0],[\cdot ,1]\}, \big \{-\}, \bigg]\). This algebra naturally extends to an algebra on the rational numbers with signature: \(\bigg[ \mathbb{Q}, \big \{[+,0],[\cdot ,1]\}, \big \{-\}, \bigg]\).
Examples from data algebra¶
The algebra of couplets on a set \(M\) with signature \(\bigg[ M \times M , \big\{\ \circ\ \big\} , \big\{ \leftrightarrow \big\} \bigg]\) extends to the algebra of relations \(\bigg[P(M \times M),\big\{[ \circ, D_M ] \big\} , \big\{ \leftrightarrow \big\}\bigg]\), where \(\circ\) is composition, \(\leftrightarrow\) is transposition, and \(D_M\) is the diagonal of \(M\).
The power set of a power set has an algebra given by the algebra of sets. Hence, if \(U\) is a set, we have an algebra with signature \(\bigg[ P^{2}(U),[\cup ,\varnothing ],[\mathbb{\cap },P(U)],\subset ,\prime \bigg]\). This algebra extends to the algebra \(\bigg[ P^{2}(U),[\cup ,\varnothing ],[\mathbb{\cap },P(U)], [\blacktriangledown ,\{\varnothing\}],[\mathbb{\blacktriangle },\{U\}],\subset ,\prime \bigg]\), where \(\blacktriangledown\) is the cross-intersection and \(\blacktriangle\) is the cross-union, which are binary extensions of intersection and union resprectively from \(P(U)\) to \(P^{2}(U)\). This is how we extend the algebra of relations to the algebra of clans.
An algebra of sets can also be extended to an algebra of multisets. In particular we can simply take binary multi-extensions of all of the binary operations, and replace subset with submultiset. We can also include multiset addition, but we will have to take out complement, as there is not multiset equivalent of that. Therefore, the algebra
\[\bigg[ P(M) , \big\{ [ \cup, \varnothing ] , [ \cap, M ] \big\} , \big\{\ '\ \big\} , \big\{ \subset \big\} \bigg]\]extends to
\[\bigg[\dot{P}(M), [\cup,\varnothing], [+,\varnothing],\cap,\subset \bigg].\]Similar to the previous example an algebra of clans can extend to an algebra of multiclans, by taking binary multi-extensions of all the operations. Hence, an algebra of clans with signature:
\[\bigg[ P^{2}(M \times M),[\cup ,\varnothing ],[\mathbb{\cap },P(M \times M)], [\blacktriangledown ,\{\varnothing\}],[\mathbb{\blacktriangle },\{M \times M\}], \vartriangleleft, \vartriangleright, \blacktriangleleft, \blacktriangleright, \subset , \prime \bigg]\]can be extended to an algebra of multiclans with signature
\[\bigg[ \dot{P}(P(M \times M)),[\cup ,\varnothing ],[+,\varnothing],\mathbb{\cap }, [\blacktriangledown ,\{\varnothing\}],[\mathbb{\blacktriangle },\{M \times M\}], \vartriangleleft, \vartriangleright, \blacktriangleleft, \blacktriangleright, \subset \bigg].\]
An example of adding set operations to an algebra with no given set structure¶
Let us take the algebra of integers with signature \(\bigg[ \mathbb{Z}, \big \{[+,0],[\cdot ,1]\}, \big \{-\}, \bigg]\) as in an earlier example. Since \(\mathbb{Z}\) is a set, it also posesses the set algebra \(\bigg[P(\mathbb{Z}), \big \{ [ \cup, \varnothing ] , [ \cap, M ] \big\} , \big\{\ '\ \big\} , \big\{ \subset \big\} \bigg]\). We can extend \(\bigg[ \mathbb{Z}, \big \{[+,0],[\cdot ,1]\}, \big \{-\}, \bigg]\) to the set algebra by defining addition, multiplication, and negation on subsets of \(\mathbb{Z}\) as follows: Given subsets \(A,B\subset\mathbb{Z}\)
\[\begin{split}\begin{align*} A + B &:= \{c \in \mathbb{Z} : c = a + b \text{ for some }a \in A \text{ and for some } b \in B\} \\ A \cdot B &:= \{c \in \mathbb{Z} : c = a \cdot b \text{ for some } a \in A \text{ and for some } b \in B\} \\ -A &:= \{c \in \mathbb{Z} : c = -a \text{ for some } a \in A\} \end{align*}\end{split}\]In words, the above equations say that over the integers:
- The sum of sets is the set of sums.
- The product of sets the set of products.
- The negative of a set is the set of negatives.
So for example if \(A=\{3,-5,9\}\) and \(B=\{4,12\}\), then
\[\begin{split}\begin{eqnarray*} \{3,-5,9\}+\{4,12\} &=&\left\{ \begin{array}{c} 3+4,3+12, \\ -5+4,-5+12, \\ 9+4,9+12% \end{array}% \right\} \\ &=&\left\{ \begin{array}{c} 7,15, \\ -1,7, \\ 13,21% \end{array}% \right\} \\ &=&\{7,15,-1,13,21\}. \end{eqnarray*}\end{split}\]and,
\[\begin{split}\begin{eqnarray*} \{3,-5,9\}\cdot \{4,12\} &=&\left\{ \begin{array}{c} 3\cdot 4,3\cdot 12, \\ -5\cdot 4,-5\cdot 12, \\ 9\cdot 4,9\cdot 12% \end{array}% \right\} \\ &=&\left\{ \begin{array}{c} 12,36, \\ -20,-60, \\ 36,108% \end{array}% \right\} \\ &=&\{12,36,-20,-60,36,108\}. \end{eqnarray*}\end{split}\]and
\[-A=-\{3,-5,9\}=\{-3,5,-9\}\]In conclusion, this shows that the algebra \(\bigg[ \mathbb{Z}, \big \{[+,0],[\cdot ,1]\}, \big \{-\}, \bigg]\) extends to the algebra
\[\begin{equation*} \bigg[ P(\mathbb{Z}),\{[\cup ,\varnothing ],[\mathbb{\cap },% \mathbb{Z}],[+,\{0\}],[\cdot ,\{1\}]\},\{-,^{\prime }\},\{\subset \}\bigg] , \end{equation*}\]
This page is here to help review notation and terminology for those new to, or requiring a refresher, of sets and set theory.
Set Notation¶
Recall that a set is a collection of distinct objects. In any given set each object in the set is called an element of the set.
For example, if \(S\) represents a set, and \(a\) is an element of set \(S\), we denote this by writing
\[a \in S.\]
If \(a\) is not an element of set \(S\), we write it as
\[a \not\in S.\]
We can more explicitly write out what the specific elements of a set are by using braces: \(\{,\}\). For example, if I wanted \(S\) to be the set of integers between 2 and 6 inclusively, then I can write
\[S = \{2,3,4,5,6\}.\]
There is also the set with no elements at all in it, and it is called the empty set, and is denoted by \(\emptyset\). Other ways of denoting are by using \(\varnothing\), or even \(\{\}\).
We also use colons to represent conditions on elements in a particular set. This lets us build up more complicated sets. For example, if I use \(\mathbb{Z}\) to represent the set of all integers, then if I want \(T\) to represent the set of all integers greater than five, I can write it as
\[T = \{n \in \mathbb{Z} : n > 5\}.\]
Colons in combination with braces and \(\in\) is called set builder notation and lets us create very complicated sets. Let’s say I want to create the set of all irrational numbers, then I can do this by starting with the sets of real numbers, rational numbers, and using set builder notation:
Let \(\mathbb{R}\) represent the set of real numbers, and \(\mathbb{Q}\) the set of rational numbers, then if we use \(X\) to represent the set of irrational numbers, we can write it as
\[X = \{ r \in \mathbb{R} : r \not\in \mathbb{Q} \}.\]
We can add even more conditions. Let us use \(Y\) to represent the set of negative rational numbers, then we can write it as
\[Y = \{ r \in\mathbb{R} : r\not\in\mathbb{Q}\ \&\ r<0 \},\]
or alternatively by going back to \(X\) and saying
\[Y = \{ x \in X : x<0 \}.\]
To help write conditions precisely without creating too much clutter, we will sometimes use symbols to represent logical quantifiers such as \(\exists\) to say “there exists” or \(\forall\) to represent “for all”. For example, if we wanted to say that a set \(S\) was a subset of a set \(T\), that is, any element of \(S\) is also an element of \(T\), we can write that as
\[\forall x, x \in S \implies x \in T.\]
See also set notation and set (mathematics) for more information.