• Algebra and Geometry

    Posted on July 19th, 2007 sridhar No comments

    Algebra and Geometry

    It is always interesting to note in an enterprise software development team, there would be some specific people who can write better SQL Queries than others. There are some who are good at OO concepts, complex algorithms but loathe writing SQL Queries. I would compare normal code to Algebra and SQL to say Trignometry or Geometry. These are two very different problem spaces in mathematics. But if you could take the spatial element of geometry and transform it through the aid of what is called coordinate geometry, you could represent circle and lines using algebraic equations. Infact mathematicians sometimes use geometry to solve algebraic problems as in the proof of Fermat’s theorem by Andrew Wiles. The kind of geometry Andrew Wiles used in his proof is called Algebraic geometry.

    Like how initially geometry and algebra formed separate problem spaces, writing stored procedures and writing other kinds of enterprise code (Business Logic and Presentation Layer) are regarded as distinctly different problem spaces. But in this case some would argue that they truly are different unlike the difference between geometry and algebra. Let us investigate if it is true.

     

     

    Set theory and SQL

    At the very core SQL Query’s job is selection of a subset of elements from a bigger set. When I want to select all the customers who belong to California , I am interested in the subset of the “customers” set. [Please note that all the rows (elements) in the “Customers” table in your database constitute yet another subset of the set “all the people in the world”. By adding a new customer row entry you are increasing that subset and by removing an entry you are decreasing that subset.]

    The “where” clause in a SQL Query specifies the selection criteria, described in terms of the properties which define each element of the subset. When we have two conditions in the where clause, then that is equivalent to extraction of two subsets for each condition, and making intersection/union of these two subsets based on the conjunction (OR, AND…).

    The “Join” clause is another interesting keyword, which correlates two totally different sets like say “Customers” and “States”, but has a relationship “Customers reside in a State”. Here we take two sets and do a “Set Product”. A Set Product is a set whose elements are pairs consisting of one element from each operand set. So if there are 10 students and 2 courses, the number of elements in the set product will consist of 20 elements tying each customer with all the states. But then if we impose a constraint that a customer resides in only one state (by making say state a column in customer table – popularly called foreign key relation), then we could create a subset from the “Set Product”, using this constraint and we realize a Join. Again Join is a process of extracting a subset after doing a product operation. Similarly all other SQL clauses could be explained through Set Theory and that would be left as an exercise to the reader!

     

     

    Declarative vs Imperative Programming

    Having covered in brief the underlying set theory aspect of SQL, I would now like to concentrate on another interesting aspect of SQL. People differentiate between declarative programming and imperative programming. SQL is a pretty high level declarative language compared to say a language like C#. But then C# could be called declarative compared to say assembly code. But in general, declarative coding indicates that the coder provides less detail on how to do something rather specifies only what to do. It is up to the instructed (SQL Engine or the compiler as the case may be) to work out the mechanics of how to do. More you specify “how to do”, more imperative your code becomes. Now using code, I would say you could MicroManage till you reach “Processor Instructions”, but if you want to control more and don’t feel that the processor is not doing a good job at executing your instructions, then you might have to think of writing your own processor. But whatever way we think, definitely C# coding seems to be more imperative than SQL queries.

    But there are places where C# code has a tinge of declarative programming, for instance in the use of attributes. Attributes are used to decorate a class or member method, indicating to the runtime as to what additional functionality we would like runtime to provide the class, but stopping at that and not elaborating as to how to achieve it. It is left upto the runtime or provider classes which hook into the runtime to make sense of such attributes and act accordingly.

     

     

    Lists and Iterators

    We have all worked with lists and have iterated through the lists. Lists of objects can be thought of as sets of elements. All the set operations which we explained above done using SQL can be done on lists too through explicit functions operating on the sets. List after all is a data structure quite similar to a table, with columns representing properties and foreign keys representing reference properties. Say we need to use C# to do the same stuff we have done using SQL in extracting relevant subsets from a List, how would you go about it.

    Using the same example, suppose we want to select from a Customer List, whose entries which correspond to the customers who are from California , what would we do? We would typically iterate through the list and investigate each customer element in the list and find out if the State property is equal to California and if so we would add to a result list. In the end we would return this result list which is the subset of original list. This is similar to what SQL Server would be doing behind scenes when we execute a SQL Query though here the source list is an in-memory list while there the list resides in the database.

    All kinds of lists, needs to be iterated to traverse the list elements. A List which implements IEnumerable interface or a generic list which implements IEnumerable interface, which in turn has a function GetEnumerator which inturn returns IEnumerator reference, then such a list is called a “Sequence”. In these list to iterate we could use the “for each” statement. IEnumerator has three methods to traverse the list, MoveNext, Current and Reset and the “for each” internally uses these methods. IEnumerator allows Forward Only traversal. Such well defined iteration interface mechanism is very useful for external functions and classes to act on Lists without ambiguity and uniformly.

     

     

    We have class and class member functions. Let us say, we have written a class, but we want to add one more function to the class, but say I don’t or cannot change the code of the class, what do we do? This is also true if you have a class written by a third party provider, but you would like to add an additional method to the existing set of member methods. Typically this is performed through by composition-delegation or implementation inheritance, adding the additional functionality in the composing or derived class. Instead of going through the headache of doing all that, C# has provided what they call syntactic sugar, to inject an extra method into the existing class definition, and can be called with the same syntax used in calling existing class methods. But please note that though we seem to inject a member method, this member method can only access public members of the class! So though they seem to give a feel as if they are a part of the class, they are in essence outsiders. These members are defined as static members of a static class with the first parameter as this

    .It is as shown below:

     

     

    namespace Extend

    {

    public static class ExtendUtil

    {

    public static string ExtensionMethod(this OriginalClass oc)

    {

    -

    }

    }

    }

    The member method “ExtensionMethod” of the static class ExtendUtil extends the OriginalClass with this new method.

    It can be invoked as follows.

    OriginalClass oc = new OriginalClass();

    Console.WriteLine(“oc.ExtensionMethod = {0}”, oc.ExtensionMethod());

    Please note that the call will be translated to in IL as follows.

    call string Extend. ExtendUtil::ExtensionMethod (class OriginalClass)

    This clearly indicates that this is just Syntactic Sugar as ultimately this call is getting converted to an external call and also the extension method has only access to public members of the original class reiterating the same point.

    Lambda Expressions and Anonymous Delegates

    delegate R Func(A arg);

    The above declaration is a declaration of a generic delegate type Func which takes an argument of any type A and returns a value of any type R.

    A specific instantiation of that would be like below.

    Func<int, int> f1 = new Func<int, int> (MethodClass.OnlyMethod);

    But then in certain cases you do not want to write a class and write a method, when you are just interested in a single function. Infact if you look at it we are more and more moving towards the creation of a global function. In order to remove the trouble of writing MethodClass and have a member OnlyMethod, C# gives a nice way of creating a function inline through what is called anonymous delegates. It can be done as follows.

    Func<int, int> f1 = delegate(int i) { return 2*i; };

    Now we could call this function like below.

    Console.WriteLine(f1(12));

    What we have achieved is creation of a nominal global function without having an explicit class. If suppose the function is a very straight forward algebraic transformation like the case above, then we could simplify it further by writing it as below.

    Func<int, int> f1 = x => 2*x;

    The expression “x => 2*x” is called a lambda expression which takes input parameter as x and returns out as 2*x. So if you look at it Lambda expression can be achieved through existing c# language constructs and is not a fundamental keyword, but as they call it is just syntactic sugar. Here the Lambda expression is just an anonymous delegate which cannot be inspected and changed at runtime. If at all you need to inspect it, you would need to inspect IL. Expression trees are a higher form of Lamba expressions where the expression or algebraic computation is stored as Data, which can inspected and changed at runtime.

    Putting it all together - Realizing Algebraic Geometry

    Now we have looked at many different concepts and are ready to approach LinQ in its elementary form. LinQ expands as “Language Integrated Query”. LinQ as we see sounds as if it is trying to close the gap between the imperative C# and declarative SQL. Let us see how it does that taking a simple case.

    Suppose I have a Customer table and I would like to get the list of customers who belong to a particular state. Using ADO.NET you could execute a SProc (using say the data access application block) passing the state as a parameter. Inside the stored proc it would use that parameter in the where clause – Select * from Customer where state = @state.

    Suppose I want to achieve this on a customer list using C#. This may be how I would go about it. Let us say I have a customer list and I have method to populate the list and I populate the list like below:

    List<Customer> customers = GetCustomerList();

    Then I would need now to get the list of customers belong to a “ California ”. That is easy.

    Write another function

    List<Customer> FilterCustomerListByState(List<Customer> customers, string state)

    – which takes the state as a parameter. Inside the function we can iterate the list using the provided enumerator and create a new list of only those customers who belong to that state. But what happens when I now decide to want to get customers who do not belong to California ? Again I have to write another function for the same. Instead let me write a generic filter function, which accepts a filter criteria and uses the enumerator to iterate the list, apply the filter criteria and returns a new list with elements only satisfying the filter criteria. How will that function look??

    List<Customer> FilterCustomerList (List<Customer> customers, filter)

    But what is the Type of filter? If it is a string, then it should be in a defined format and we need to parse it. That might not be the best way to do it as we need to define an elaborate filter language and grammar. How about saying that filter should be a function which tells whether a customer qualifies or not? This is normally called a predicate.

    In our case it could be a delegate of the following type.

    public delegate bool filterDelegate (Customer cust);

    We can have the real function implementation in a class CustomerPredicatesImpl and let us say the function is named fiterImpl.

    filterDelegate filter = new filterDelegate(CustomerPredicatesImpl.fiterImpl);

    After we get the delegate instance, we could pass it to the FilterCustomerList function like as below.

    List<Customer> filteredCustomers = FilterCustomerList(customers, filter);

    If we write the function FilterCustomerList as a Extension method for the List class like how we discussed before then we could have written the invocation as follows, which is more friendlier

    List<Customer> filteredCustomers = customers.FilterCustomerList (filter);

    Suppose I where to use anonymous delegates and Lambda expressions, then I wouldn’t need to do that roundabout way of creating the filter delegate and I could have written like below.

    List<Customer> filteredCustomers = customers.FilterCustomerList (c => c.State == ” California “);

    Here the lambda expression c => c.State == ” California ” would be taken and converted to an anonymous delegate which fits the parameter type of method FilterCustomerList.

    If we generalize the extension method not extending just List, but any class which implements IEnumerable then that is what is called as the “where” query operator in LinQ which is defined as below.

    public static IEnumerable Where(
        
    this IEnumerable source,
        
    Func< SPAN>bool> predicate);

    This is nothing but the extension method which we were discussing above.

    So the statement above could be written as:

    IEnumerable filteredCustomers = customers.Where (c => c.State == ” California “);

    and suddenly it starts looking like LinQ.

    There is a select operator which is also an extension method similar to Where, which projects or transforms the input set to an output set of a different type or same type.

    Suppose say we want to only select customer names, and not the customers themselves, then it could be done as follows.

    IEnumerable<string> filteredCustomers =

    customers.

    Where (c => c.State == ” California “).

    Select (c=> c.Name);

    Another way to write this is as below by removing some brackets and periods which can be inferred by the compiler.

    IEnumerable<string> filteredCustomers = from c in customers

    where c.State=” California ”

    select c.Name;

    Now doesn’t this look like a SQL Statement? But you are staring at C# code though!!

     

    Leave a reply