Skip to content

Pattern matching F# Lists

Here are some simple examples I’ve come up with for working with lists in F#. These follow on from previous work looking at finding a value in a list and again this is definitely not efficient F# code. I’m just playing around in the language and trying to understand things.

First I tried walking a list using a recursive function and a match pattern. All I wanted to do was pop the head of the list, print it and process the remainder of the list.

The printfn statements in my example use the %A format specifier which provides pretty printing for any value. Also the ; symbol used to separate multiple expressions in a match rule causes the first expression to be evaluated and its result discarded before the second expression is evaluated and used as the result. This method can be used to execute side-affecting statements such as old fashioned debug printing.

#light;

let rec listWalk list =
  printfn "-----";
  match list with 
  | [] -> printfn "Empty List";
  | h::t -> printfn "Head is %A" h; printfn "Tail is %A" t; listWalk t;;

let s = "A string of words";;
let l = s |> String.split [' '];;

The list of words l is used in all the examples for this post.

> listWalk l;;
-----
Head is "A"
Tail is ["string"; "of"; "words"]
-----
Head is "string"
Tail is ["of"; "words"]
-----
Head is "of"
Tail is ["words"]
-----
Head is "words"
Tail is []
-----
Empty List
val it : unit = ()
>

The first match clause matches on an empty list ([]) and the second matches on a list construction. The head element is bound to the identifier h and the tail to the identifier t. Looking at the output from printfn, h is a value of type string (denoted by the surrounding double quotes) and t is a value of type list of string. Notice that when t is an empty list its string representation is the same as the pattern used to denote an empty list.

Next I wanted to a match rule to find the last element in the list, that is where there is a value for head and the tail is an empty list. So I added a rule that combined the previous two and matched on a list constructed with a head and an empty list.

let rec listWalk list =
 printfn "-----";
 match list with 
 | [] -> printfn "Empty List";
 | h::[] -> printfn "Last element is %A" h; 
 | h::t -> printfn "Head is %A" h; printfn "Tail is %A" t; listWalk t;; 

Using the same list of words as above.

> listWalk l;;
-----
Head is "A"
Tail is ["string"; "of"; "words"]
-----
Head is "string"
Tail is ["of"; "words"]
-----
Head is "of"
Tail is ["words"]
-----
Last element is "words"
val it : unit = ()

This time the rule for an empty list ([]) did not match because a recursive call was not made when a list with a single element was detected.

There is an easy mistake to make when using match patterns like this though. The patterns h::t and h::[] will both match the pattern of a list with one remaining element, as [] just means an empty list. So if the rules are sequenced incorrectly my new rule to detect the last element in the list will not work.

let rec listWalk list =
  printfn "-----";
  match list with 
  | [] -> printfn "Empty List";
  | h::t -> printfn "Head is %A" h; printfn "Tail is %A" t; listWalk t;
  | h::[] -> printfn "Last element is %A" h;;

Results in…

> listWalk l;;
-----
Head is "A"
Tail is ["string"; "of"; "words"]
-----
Head is "string"
Tail is ["of"; "words"]
-----
Head is "of"
Tail is ["words"]
-----
Head is "words"
Tail is []
-----
Empty List
val it : unit = ()
>

In fact the compiler is able to work this out outputs a warning when compiling the function.

warning FS0026: This rule will never be matched.

Of course if all I wanted to do was iterate over a list there are many functions in the List Module that would be a better choice. For example using one of the four iter(ation) functions.

let listWalk list =
  printfn "-----";
  list |> List.iter (fun e -> printfn "Element is %A" e);;
> > listWalk l;;
-----
Element is "A"
Element is "string"
Element is "of"
Element is "words"
val it : unit = ()
>

Matching patterns in F#

Patterns in F# are kind of like switch statements in C#, but much more powerful. For my first experiment with them I decided to see if I could find a word in a sentence. This is definitely the wrong way to go about solving this problem, calling String.Contains() would be easier. But I am still very much learning the language and this is the first step in investigating whether patterns would be of use solving another problem I have in mind.

So with that in mind lets get started in F# interactive mode and the #light syntax.

My first draft was not pretty, but it worked and got my head into the game. The algorithm involves examining the elements of a list one by one and comparing the value to the value of a variable.

#light;

> let sentence = "there is a red dog";;
val sentence : string

> let tokens = sentence |> String.split [' '];;
val tokens : string list

> tokens;;
val it : string list = ["there"; "is"; "a"; "red"; "dog"]

> let find = "red";;
val find : string

> let rec search tokens find= 
 match tokens with
 | [] -> "no match"
 | h::t when h = find -> "match"
 | h::t -> search t find;;

val search : 'a list -> 'a -> string

First I defined and initialized the value sentence with the string I wanted to search, letting the compiler correctly infer the type to be string. The value of sentence was then passed as the last argument to the String.split function using the pipeline operator (|>). The first argument passed was a list containing the delimiters to split the string on, in this case just the space character. The result of calling split is another list containing the words split from the sentence; executing tokens;; in the interactive session returns the value.

The value to search for was stored in the variable ‘find’. I’m trying to avoid calling variables ‘variables’ as by default they are immutable in F# and so cannot be changed. So I tend to think of them more like values with a name, but this seems to make things a little confusing at times.

Then I defined the function search using the rec modifier to instruct the compiler to allow recursive calls. The function takes two parameters, the first is the list to search and the second is the value to search for. The match statement has three patterns, one of which makes a (tail) recursive call to continue processing the list of words. Recursion was probably not the best construct to use but I wanted to see how it works in F#. The three patterns are evaluated in order and work as follows.

  1. Match on an empty list, execute the expression to return the string “no match”
  2. Match on a list with at least one element, bind the head of the list to h and the remainder of the list to t. If the value of h is equivalent to the value of ‘find’ execute the expression to return the string “match”
  3. Match on a list with at least one element, bind the head of the list to h and the remainder of the list to t. Make a recursive call to search and pass the tail (t) of the list.

Some simple tests showed to function to be working as expected.

> search tokens find;;
val it : string = "match"

> let find = "blue";;
val find : string

> search tokens find;;
val it : string = "no match"

Function Values

That worked as a first draft but it was not very nice, so I set about it with my refactoring stick.

The first thing I did was to modify the search function to return a boolean. I did not have to explicitly say it returned a boolean value, just change the string return values to the boolean constants and the compiler worked it out. Otherwise the search function itself was fine.

> let rec search tokens find = 
 match tokens with
 | [] -> false 
 | h::t when h = find -> true
 | h::t -> search t find;;
val search : 'a list -> 'a -> bool

As well as inferring the return type the compiler was also able to automatically generalize the function. There was nothing in the search function that required the value of find or the list of tokens to be a strings. So the compiler created a function that accepts a list of a generic type (”‘a list”), a single value of the same generic type (”‘a”) and returns a boolean value (bool).

Next I used partial function application / currying to create a function that accepts a string sentence to search in and returns an anonymous function. The code in the wordSearch function to split the sentence is only executed once, and only when the anonymous function is executed. At the moment I don’t understand the language enough to explain why, but the key element to achieving this result is the anonymous function requiring a parameter to be passed.

> let wordSearch s = 
 let tokens = String.split [' '] s
 fun (word: string) -> search tokens word;;
val wordSearch : string -> (string -> bool)

> let s = wordSearch "there is a red dog";;
val s : (string -> bool)

> s "red";;
val it : bool = true

> s "blue";;
val it : bool = false

Next I hope to modify the function to search the list of tokens for any element from a list.

Initialization In F and C #

There are some common compiler tricks in F# and C#, for example object initialization.

In C# 3.0 we have Object Initializers which allow for an object to be instantiated via an explicit constructor call and properties to be set in one statement.

Form f = new Form1()
{
    Text = "Hello"
};
Application.Run(f);

And in F# we have initial property settings where named values are first mapped to constructor parameters and the unused values used to set properties.

> open System.Windows.Forms;;

> let f = new Form(Text = "Hello");;

val f : Form

> f.Show();;
val it : unit = ()

Inferred Generics in F#

The first hundred or so pages of Expert F# have been really interesting, the compiler feels so much more advanced than C#. Although I think it has more to do with the nature of functional programming than any deficiency in the C# compiler. The code does not tell the machine how to do something rather it says what needs to be done. Which means the compiler is free to do as it wishes and may compile things to have a broader meaning, so long as the intention of the code is honored.

One of the neat tricks is when a function is automatically inferred to be generic through a process known as Generalization. This occurs when a function does not constrain or use a parameter is such a way that the value must be of a specific type.

Lets start with something that is not Generalized. A function that takes a tuple of two values and subtracts 1 from each value.

> let sub (a,b) = (a-1, b-1);;
val sub : int * int -> int * int

> sub (2,3);;
val it : int * int = (1, 2)

While I cannot explain why the sub function is not generalizable in terms of the rules laid out in the language specification of Generalization, it’s pretty clear what’s happened. Subtracting the integer 1 from each of the tuple values requires that the tuple be of type int * int, i.e. two integers. So the function sub takes a tuple of two integers and returns a tuple of two integers.

Now something that can be generalized, returning the first value in a tuple of two values.

> let getFirst (a,b) = a;;
val getFirst : 'a * 'b -> 'a

> getFirst (1,2);;
val it : int = 1

> getFirst ("a", "b");;
val it : string = "a"

Types are denoted by the token, so the tuplea * “b is a tuple of a value of type a and a value of type b. Because of the generalization getFirst can be called with two integers (1,2) or two strings (”a”, “b”). This is just how the built in fst operator works. It takes a tuple of two values and returns the first (there is also a snd to return the second value).

>fst;;
val it : ('a * 'b -> 'a) = <fun:clo@0_6>

> fst (1,2);;
val it : int = 1

> fst ("a", "b");;
val it : string = "a"

Persisting Computed XML

It’s possible to define an XML column as a computed column. And in most cases I think it would make sense to persist the column to save doing all that work again. As Bob Beauchemin points out there’s a trick to doing it.

The first thing you need is a user defined scalar function to that returns an XML type. As with all persisted computed columns the function must be deterministic and as Bob points out that’s fine because all XQuery functions are deterministic. However the UDF must also be marked with SCHEMABINDING to ensure it’s determinism.

SCHEMABINDING guarantees that the dependancies of the function cannot be changed without first altering the function which has two effects. First the function will continue to execute without an exception (e.g. bad column name), which could be considered a type of determinism. Second column types and definitions cannot change; changes could lead to truncation, overflow or a dependancy on a column changed to a non deterministic computed definition.

So first the function, you’ll need to pass all the column values into the function.

create function dbo.udf_event_data
(
    @start_time as smalldatetime, 
    @end_time as smalldatetime  
)
returns xml 
with schemabinding
as
begin

-- times must be cast to formatted strings for XQuery
declare @start_time_char varchar(128)
set @start_time_char = convert(varchar(128), @start_time, 126) + 'Z'    

declare @end_time_char varchar(128)
set @end_time_char = convert(varchar(128), @end_time, 126) + 'Z'    

declare @x xml
set @x = '';

set @x = 
    @x.query('
        declare namespace xsi="http://www.w3.org/2001/XMLSchema-instance";

        
                    
                {   
                    if ( empty(sql:variable("@start_time_char") cast as xs:dateTime? )) then                    
                        attribute xsi:nil {"true"}  
                    else () 
                }
                {
                    sql:variable("@start_time_char") cast as xs:dateTime?
                }
            
                  
                {   
                    if ( empty(sql:variable("@end_time_char") cast as xs:dateTime? )) then                  
                        attribute xsi:nil {"true"}  
                    else () 
                }
                {
                    sql:variable("@end_time_char") cast as xs:dateTime?
                }
            
        
    ');     
return @x
end     

The next step is to create a computed column and mark the XML column as PERSISTED. All the columns values needed by the function must be passed in the definition which should look something like.

...
[event_data]  AS ([dbo].[udf_event_data]([start_time],[end_time])) PERSISTED,

The UDF above uses the mechanism for handling SQL NULL discussed previously.