Sizes and Indices
This post is part of a series of notes on machine learning.
This post is literate Haskell; you can load the source into GHCi and play along.
First some boilerplate.
{-# LANGUAGE LambdaCase #-}
module Indices where
import Data.List
import Test.QuickCheck
import Test.QuickCheck.Test
import System.Exit
This post is just some preliminary ideas about tensors - nothing learning-specific yet.
Fundamentally, supervised learning models are (nonlinear) functions involving vector spaces over \(\mathbb{R}\). A lot of literature refers to the elements of these spaces as “tensors”, because, well, that’s what they are. But I think this word “tensor” can be unhelpful in this context for several reasons. For one thing, the correct answer to the question “what is a tensor?” quickly veers into multilinear functions and massive quotient spaces and universal properties and omg I just wanted to write a program to tell the difference between cats and dogs. So I’ll just say that a tensor “is” just a multidimensional array, in the sense that a linear transformation “is” a matrix, since for the most part we really want to think of tensors as data, and don’t care so much about the more abstract bits.
With that said, what exactly is a multidimensional array? One definition is that it’s an element of a set like \[\mathbb{R}^{k_1 \times k_2 \times \cdots \times k_t}\] where each \(k_t\) is a natural number. And this is totally appropriate. But I am going to do something a little different. I’m not really comfortable defining things in terms of ellipses; that “dot dot dot” hides enough details to make rigorous calculations awkward to my taste. And since from a machine learning perspective we don’t really need the full power of tensor algebra, I suspect we can afford to take a different tack.
Instead, let’s think for a minute about what we want out of a multidimensional array. An array is a really simple kind of data structure, consisting of entries that are accessed using their index or position in the array. That word dimension is a funny thing – what does it really mean here? In the strictest sense, it measures the number of “coordinates” needed to specify an entry in the array. So, for example, an array in \(\mathbb{R}^{2 \times 3 \times 4}\) has “dimension” 3, since each entry has an address along 3 different “axes”. But then \(\mathbb{R}^{5 \times 6 \times 7}\) also has dimension 3 in this sense. (In tensor language we’d call this the rank rather than dimension.)
The reason why we attach numbers to things is typically to quantify how alike they are. So: how are arrays in \(\mathbb{R}^{2 \times 3 \times 4}\) and \(\mathbb{R}^{5 \times 6 \times 7}\) alike? Are they alike enough to warrant using a hefty word like dimension to express their similarity, especially when there’s a more relevant notion of vector space dimension floating around? I don’t think so.
In this post I’ll define a couple of algebras in an attempt to nail down a useful notion of dimension, as well as shape, size, and index for multidimensional arrays. For now, when I say algebra I mean universal algebra; that is, a set with some functions on it that satisfy 0 or more universally quantified axioms.
Let’s think again about that vector dimension \(2 \times 3 \times 4\). This is a funny way to write a dimension. Yes, we can think of a natural number as the set of numbers less than itself, and that \(\times\) like the cartesian product of sets, and then \(\mathbb{R}^{2 \times 3 \times 4}\) can be thought of as a literal set of functions from the set \(2 \times 3 \times 4\) to \(\mathbb{R}\), as the notation suggests. But that \(2 \times 3 \times 4\) makes it look like we want to express some kind of arithmetic that remembers where it came from, in the sense that \(2 \times 3\) and \(3 \times 2\) are different. Doing this with actual sets is a little awkward, though, so lets make an algebra instead.
We denote by \(\mathbb{S}\) the free algebra over \(\mathbb{N}\) with two function symbols of arity 2, denoted \(\oplus\) and \(\otimes\). Elements of \(\mathbb{S}\) are called sizes, and we’ll sometimes refer to \(\mathbb{S}\) as the algebra of sizes.
For example, \(2 \oplus 4\) and \(5 \otimes (3 \oplus 6)\) are elements of \(\mathbb{S}\). Eventually we’ll use, for instance, the element \(2 \otimes 3\) to describe the “size” of a \(2 \times 3\) matrix. The size algebra has no axioms, so for example \(a \otimes (b \otimes c)\) and \((a \otimes b) \otimes c\) are not equal. And just ignore the \(\oplus\) for now. :)
So the elements of \(\mathbb{S}\) look like unevaluated arithmetic expressions with plus and times. By the way, one benefit of using free algebras is we can implement them in Haskell with algebraic data types.
data Size
= Size Integer
| Size :+ Size
| Size :* Size
deriving Eq
instance Show Size where
show =
let
p x = if ' ' `elem` x
then "(" ++ x ++ ")" else x
in
\case
Size k -> show k
a :+ b -> concat [p $ show a, " + ", p $ show b]
a :* b -> concat [p $ show a, " x ", p $ show b]
-- so we can define them with numeric literals
instance Num Size where
fromInteger k = if k >= 0
then Size k
else error "sizes cannot be negative."
(+) = (:+)
(*) = (:*)
abs = error "Size Num instance: abs makes no sense."
signum = error "Size Num instance: signum makes no sense."
negate = error "Size Num instance: negate makes no sense."
If you’re following along with GHCi, try defining some Size
s. (The Num
instance is just there to make the notation less awkward.)
Another nice thing about free algebras is that we get universal mappings for free! For example:
We denote by \(\mathbb{H}\) the free algebra over \(\ast = \{\ast\}\) with two function symbols of arity 2, denoted \(\oplus\) and \(\otimes\). Elements of \(\mathbb{H}\) are called shapes, and we’ll sometimes refer to \(\mathbb{H}\) as the algebra of shapes. Define \(h : \mathbb{N} \rightarrow \ast\) by \(h(k) = \ast\), and let \(H : \mathbb{S} \rightarrow \mathbb{H}\) be the map induced by \(h\). If \(s \in \mathbb{S}\), we say \(H(s)\) is the shape of \(s\).
Note that \((\mathbb{N},+,\times)\) is an algebra with two function symbols of arity 2. Let \(D : \mathbb{S} \rightarrow \mathbb{N}\) be the map induced by the identity function on \(\mathbb{N}\). If \(s \in \mathbb{S}\), we say \(D(s)\) is the dimension of \(s\).
Again, we can implement these in code in the usual way.
data Shape
= HAtom
| HPlus Shape Shape
| HTimes Shape Shape
deriving Eq
instance Show Shape where
show = \case
HAtom -> "*"
HPlus a b -> concat ["(", show a, " + ", show b, ")"]
HTimes a b -> concat ["(", show a, " x ", show b, ")"]
shapeOf :: Size -> Shape
shapeOf = \case
Size _ -> HAtom
a :+ b -> HPlus (shapeOf a) (shapeOf b)
a :* b -> HTimes (shapeOf a) (shapeOf b)
dimOf :: Size -> Integer
dimOf = \case
Size k -> k
a :+ b -> (dimOf a) + (dimOf b)
a :* b -> (dimOf a) * (dimOf b)
Eventually, \(s \in \mathbb{S}\) will represent the “size” of a tensor and \(D(s) \in \mathbb{N}\) will be the vector space dimension of the space it comes from.
Indices
This is well and good; we have a type, Size
, that will eventually represent the size of a multidimensional array, and we can extract the “shape” and “dimension” of a size. But we also need a reasonable understanding of how to refer to the entries of an array of a given size.
However we define indices, which indices make sense for a given size will depend on the structure of the size. For instance, a natural number size \(k\) might be indexed by \(k\) contiguous natural numbers, starting from 0 or 1 or whatever. A product-shaped size like \(a \otimes b\) might be indexed by a pair \((u,v)\), where \(u\) is an index of \(a\) and \(v\) an index of \(b\). The sum size is a little stranger: to index \(a \oplus b\), we need an index for either \(a\) or \(b\), and some way to distinguish which is which.
Putting this together, we will define an algebra of indices like so.
We denote by \(\mathbb{I}\) the free algebra over \(\mathbb{N}\) with two function symbols of arity 1 and one of arity 2, denoted \(\mathsf{L}\), \(\mathsf{R}\), and \(\&\). Elements of \(\mathbb{I}\) are called indices, and we’ll sometimes refer to \(\mathbb{I}\) as the algebra of indices.
Again, since \(\mathbb{I}\) is a free algebra we can represent it as an algebraic type.
data Index
= Index Integer
| L Index
| R Index
| Index :& Index
deriving Eq
instance Show Index where
show = \case
Index k -> show k
L a -> "L(" ++ show a ++ ")"
R b -> "R(" ++ show b ++ ")"
a :& b -> concat ["(", show a, ",", show b, ")"]
instance Num Index where
fromInteger = Index
(+) = error "Index Num instance: (+) does not make sense"
(*) = error "Index Num instance: (*) does not make sense"
negate = error "Index Num instance: negate does not make sense"
abs = error "Index Num instance: abs does not make sense"
signum = error "Index Num instance: signum does not make sense"
Now given an index and a size, it may or may not make sense to talk about an entry at the index in a structure of the given size – like asking for the item at index 10 in an array of length 5. To capture this, we define a compatibility relation to detect when an index can be used on a given size.
isIndexOf :: Index -> Size -> Bool
(Index t) `isIndexOf` (Size k)
= 0 <= t && t < k
(L u) `isIndexOf` (a :+ _)
= isIndexOf u a
(R v) `isIndexOf` (_ :+ b)
= isIndexOf v b
(u :& v) `isIndexOf` (a :* b)
= (u `isIndexOf` a) && (v `isIndexOf` b)
_ `isIndexOf` _
= False
From now on, if \(s\) is a size, I’ll also use \(s\) to denote the set of indices compatible with \(s\). So for example, if \(s = 5\), we might say somthing like \[\sum_{i \in s} f(i)\] without ambiguity.
We’d like to be able to construct \(s\) as a list; this is what indicesOf
does. I’m going to play a little fast and loose with the proof because laziness.
indicesOf :: Size -> [Index]
indicesOf = \case
Size k -> map Index [0..(k-1)]
a :+ b -> map L (indicesOf a) ++ map R (indicesOf b)
a :* b -> [ u :& v | v <- indicesOf b, u <- indicesOf a ]
The number of different indices for a given size should be equal to the size’s dimension. This suggests a simple test: the length of the index list is the dimension, and all entries of the index list are distinct.
_test_index_count :: Test (Size -> Bool)
_test_index_count =
testName "dimOf s == length $ indicesOf s" $
\s ->
(dimOf s) == (fromIntegral $ length $ indicesOf s)
_test_indices_distinct :: Test (Size -> Bool)
_test_indices_distinct =
testName "indicesOf s all distinct" $
\s -> (indicesOf s) == (nub $ indicesOf s)
In later posts, \(s \in \mathbb{S}\) will represent the size (and shape) of the elements in a vector space consisting of tensors, which itself has vector space dimension \(D(s)\). But it will sometimes be convenient to think of these tensors canonically as \(D(s)\)-dimensional vectors. To do this, we’ll set up a bijection between the indices of a given size \(s\) and the natural numbers less than \(D(s)\). I’ll call the function from indices to numbers “flatten”, since it turns a complicated thing into a one-dimensional thing, and call the inverse “buildup”.
flatten :: Size -> Index -> Integer
flatten (Size k) (Index t)
= if 0 <= t && t < k then t else error "index out of bounds"
flatten (a :+ _) (L u)
= flatten a u
flatten (a :+ b) (R v)
= (dimOf a) + (flatten b v)
flatten (a :* b) (u :& v)
= (flatten a u) + (flatten b v)*(dimOf a)
buildup :: Size -> Integer -> Index
buildup (Size k) t
= if 0 <= t && t < k
then Index t
else error "integer index out of bounds"
buildup (a :+ b) t
= if t < dimOf a
then L $ buildup a t
else R $ buildup b (t - dimOf a)
buildup (a :* b) t
= (buildup a (t `rem` (dimOf a))) :& (buildup b (t `quot` (dimOf a)))
Now flatten
and buildup
should be inverses of each other, which we can test.
_test_flatten_buildup :: Test (Size -> Bool)
_test_flatten_buildup =
testName "flatten s . buildup s == id" $
\s ->
let ks = [0..((dimOf s) - 1)]
in ks == map (flatten s . buildup s) ks
_test_buildup_flatten :: Test (Size -> Bool)
_test_buildup_flatten =
testName "buildup s . flatten s == id" $
\s ->
let ks = indicesOf s
in ks == map (buildup s . flatten s) ks
To wrap up, in this post we defined two algebraic types, Size
and Index
, to represent the sizes and indices of multidimensional arrays, and two functions, flatten
and buildup
, that canonically map the indices of a given size to a 0-indexed list of natural numbers.
In the next post, we’ll use Size
and Index
to define and manipulate multidimensional arrays.
Tests
Math heavy code is well suited to automated tests, so we’ll write some along the way using the QuickCheck
library.
First off, we won’t be needing the full complexity of QuickCheck, so here are some helper functions to make the tests a little simpler to write.
type Test prop = (String, prop)
testName :: String -> prop -> Test prop
testName name prop = (name, prop)
runTest, chattyTest, skipTest :: Testable prop => Args -> Test prop -> IO ()
runTest args (name, prop) = do
putStrLn ("\x1b[1;34m" ++ name ++ "\x1b[0;39;49m")
result <- quickCheckWithResult args prop
if isSuccess result
then return ()
else (putStrLn (show result)) >> exitFailure
chattyTest args (name, prop) = do
putStrLn ("\x1b[1;35m" ++ name ++ "\x1b[0;39;49m")
result <- verboseCheckWithResult args prop
if isSuccess result
then return ()
else (putStrLn (show result))
-- when testing tests
skipTest _ (name, _) =
putStrLn ("\x1b[1;33mskipped: " ++ name ++ "\x1b[0;39;49m")
testLabel :: String -> IO ()
testLabel msg = putStrLn ("\n\x1b[1;32m" ++ msg ++ "\x1b[0;39;49m")
class TypeName t where
typeName :: t -> String
instance TypeName Int where typeName _ = "Int"
instance TypeName Integer where typeName _ = "Integer"
instance TypeName Double where typeName _ = "Double"
pairOf :: (Monad m) => m a -> m b -> m (a,b)
pairOf ma mb = do
x <- ma
y <- mb
return (x,y)
forAll2 :: (Show a, Show b, Testable prop)
=> Gen a -> Gen b -> (a -> b -> prop) -> Property
forAll2 ga gb f = forAll genPair (uncurry f)
where
genPair = do
x <- ga
y <- gb
return (x,y)
forAll3 :: (Show a, Show b, Show c, Testable prop)
=> Gen a -> Gen b -> Gen c -> (a -> b -> c -> prop) -> Property
forAll3 ga gb gc f = forAll genTriple g
where
genTriple = do
x <- ga
y <- gb
z <- gc
return (x,y,z)
g (x,y,z) = f x y z
To write QuickCheck tests for a given type it needs to be an instance of Arbitrary
, which provides two basic functions: arbitrary
, which generates a “random” element of the type, and shrink
, which takes an element and makes it “smaller” in some way. Defining these functions for a given type may be ugly, but only has to be done once.
instance Arbitrary Size where
arbitrary = sized arbSize
shrink = \case
Size k -> if k <= 0 then [] else [Size (k-1)]
u :+ v -> [u, v]
u :* v -> [u, v]
arbSize :: Int -> Gen Size
arbSize 0 = do
k <- elements [0,1,1,2,2,2,2]
return (Size k)
arbSize n = do
switch <- arbitrary :: Gen Int
m <- choose (1,n)
case switch `mod` 5 of
0 -> do
u <- arbSize $ n-1
v <- arbSize $ n-1
return (u :* v)
1 -> do
u <- arbSize $ n-1
v <- arbSize $ n-1
return (u :+ v)
_ -> do
k <- elements [0,1,2]
return (Size k)
Now we can wrap up our tests in a little suite, _test_index
. The arguments for this function are (1) the number of test cases to generate and (2) how big they should be.
-- run all tests for Size and Index
_test_index :: Int -> Int -> IO ()
_test_index num size = do
testLabel "Size and Index"
let
args = stdArgs
{ maxSuccess = num
, maxSize = size
}
runTest args _test_index_count
runTest args _test_indices_distinct
runTest args _test_flatten_buildup
runTest args _test_buildup_flatten
main_index :: IO ()
main_index = _test_index 200 20