The LINQ Set Operations - they're not just for math!

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. Today we are going to examine the LINQ set operations that are part of the IEnumerable<T> extension methods.

A quick note on equality first, since every set operation has to decide whether two elements are equal. For nearly all the primitive types (int, double, char, etc.), the string class, structs, and some BCL reference types, equality is well defined and implemented, and they will work fine as is. However, custom classes you write can present a problem, because the default implementation of equality most likely will not meet your needs. Just keep that in mind for now; we'll come back to it at the end with examples of how this can bite you and how to mitigate it.

Intersect() – finding the common elements

Intersections are very useful in determining where two sets overlap (that is, what elements two sets have in common). The two forms of Intersect(), assuming extension method syntax where the first argument is the target, are:

Intersect(IEnumerable<TSource> second)
    Returns a sequence of the distinct elements that are common between the first sequence and the second sequence, using the default equality comparer for TSource.

Intersect(IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
    Returns a sequence of the distinct elements that are common between the first sequence and the second sequence, using the definition of equality specified by the passed-in equality comparer.

Let's play with string since it already has a good default equality comparer.
So say we have two enumerable sequences of string: healthyStuff, a List<string> of healthy stuff to consume, and myStuff, a second sequence holding the things I eat. Intersecting them tells us which of the things I eat are healthy. Intersection is commutative (A ∩ B contains the same elements as B ∩ A), so in our example the following two statements are logically identical:

    // these two are identical
    results = myStuff.Intersect(healthyStuff);
    results = healthyStuff.Intersect(myStuff);

This makes sense, because asking what healthy foods exist in the list that I eat is the same as asking what foods I eat that exist in the healthy foods list.

Union() – combining the unique elements

The Union() method combines the unique elements from two different enumerable sequences, just like the set union operation dictates. Unions are very useful for combining two sets without duplicates. Thus, if in our example we wanted to get a list of all the healthy foods and all the foods I eat, we could union the two sets. The two forms of Union(), assuming extension method syntax where the first argument is the target, are:

Union(IEnumerable<TSource> second)
    Returns a sequence of unique elements from the first and second sequence combined, using the default equality comparer for TSource.

Union(IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
    Returns a sequence of unique elements from the first and second sequence combined, using the definition of equality specified by the passed-in equality comparer.

By unique elements, I don't mean that only items that had no duplicates appear in the result; I mean that every item appears in the result exactly once, with any duplicates eliminated. Union() is also commutative, so A ∪ B contains the same elements as B ∪ A. That said, the ordering will differ, since the elements from the first sequence appear first in the resulting sequence, followed by any elements in the result that came only from the second sequence.
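To make the commutativity and ordering points concrete, here is a minimal sketch. The list contents are invented for illustration (the fields HealthyStuff and MyStuff only mirror the healthyStuff and myStuff sequences from the text), so treat the specific foods as assumptions rather than the original data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetOperationsDemo
{
    // Hypothetical sample data: the names mirror healthyStuff and myStuff
    // from the text, but the contents are invented for this sketch.
    public static readonly List<string> HealthyStuff =
        new List<string> { "fruits", "vegetables", "proteins", "fiber" };

    public static readonly List<string> MyStuff =
        new List<string> { "soda", "chips", "proteins", "sugar" };

    public static void Main()
    {
        // Intersect(): the same element comes back whichever list is the target.
        Console.WriteLine(string.Join(", ", MyStuff.Intersect(HealthyStuff)));
        // prints: proteins
        Console.WriteLine(string.Join(", ", HealthyStuff.Intersect(MyStuff)));
        // prints: proteins

        // Union(): same members either way, but the target's elements come first.
        Console.WriteLine(string.Join(", ", HealthyStuff.Union(MyStuff)));
        // prints: fruits, vegetables, proteins, fiber, soda, chips, sugar
        Console.WriteLine(string.Join(", ", MyStuff.Union(HealthyStuff)));
        // prints: soda, chips, proteins, sugar, fruits, vegetables, fiber
    }
}
```

Both unions contain the same seven items; only the order differs, which is worth remembering if later code depends on position rather than membership.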
The nice thing about Union() is that it gives you an easy way to join two sequences and eliminate duplicates. Note that this is very different from the Concat() extension method in LINQ, which just appends one sequence to the end of the other and makes no attempt to remove duplicates.

Except() – subtracting the elements of another sequence

The Except() method performs the set difference operation. That is, A – B yields the items in A minus any items in B that happen to be in A. Any items that were unique to B alone are ignored. Thus, if we wanted to get a list of the food I eat that is NOT healthy food, I could take the set difference between what I eat and the healthy things to eat. The two forms of Except(), assuming extension method syntax where the first argument is the target, are:

Except(IEnumerable<TSource> second)
    Returns the unique items in the first sequence, minus any items that also appear in the second sequence, using the default equality comparer for TSource.

Except(IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
    Returns the unique items in the first sequence, minus any items that also appear in the second sequence, using the definition of equality specified by the passed-in equality comparer.

Once again, this is a simple set difference operation. For instance, if 1 and 5 appear in both the first and second set, they are removed from the result; an 8 that exists only in the second set has no effect, since it was never in the first set. So the resulting sequence contains only the unique items from the first set that are not also in the second set.

As you can probably tell, difference is not commutative: if you reverse the order of the sets in the difference, you get two different things. Once again, if we look at the food example:

    // this is a list of the things I eat that are not healthy
    // soda, chips, fat, sugar
    var results = myStuff.Except(healthyStuff);

    // this is a list of healthy things that I do not eat
    // fruits, vegetables, simple carbs, fiber
    results = healthyStuff.Except(myStuff);

So as you can see, Except() is a handy way to get a list of elements in a sequence that do not match the items from a second sequence.

A quick note on deferred execution.

Like most LINQ operations, the set operations use deferred execution: the call returns an iterator, and the source sequences are not read until you actually enumerate the result. Thus if you did something like this:

    results = healthyStuff.Except(myStuff);

    // because results is an iterator (deferred execution) this clears
    // myStuff before it is actually used, which alters our results
    myStuff.Clear();

    foreach (var item in results) { ... }

the loop would see every item in healthyStuff, because myStuff is already empty by the time Except() actually runs. If you need the result as it stood at the time of the call, force evaluation with ToList() or ToArray() before modifying the sources.

A quick note on covariance.

The two sequences do not need to have exactly the same element type, because IEnumerable<T> is covariant (as of .NET 4.0). For example, if you had class Employee and class SalariedEmployee which inherits from Employee, then you can perform set operations between the two sequences, and the resulting element type is the wider of the two types (that is, the one higher up the inheritance chain: Employee in this case).

A quick note on mixed containers.

Also notice that the only thing required for these set operations in System.Linq to work is that both sequences implement IEnumerable<T>. This means they can be an array, a List<T>, a HashSet<T>, or an iterator from another query of type T (and so on). Essentially, this is just to say that you can intersect a HashSet<string> with a List<string> and so on; the only thing that is important is that their element types are the same (or covariant, as stated above).

A final note on equality in complex classes.

I hinted before that these operations will work exactly as you expect for primitives, strings, structs, and any reference types that correctly implement the concept of equality. And I hinted that custom classes you write may be in danger of not working. But why? Well, you may think that the problem is that, for a class, the default Equals() is a reference comparison. While this is true, it is only half the issue. Let's say we define an Employee class and override Equals() on it. Even then, the set operations can still treat two equal employees as different items. Why?

We can get a hint from the second forms of Union(), Intersect(), and Except(), which take an IEqualityComparer<TSource>. Why IEqualityComparer, why not just IComparer? The answer is that IEqualityComparer requires both an Equals() and a GetHashCode() method to be defined. So this should be a good hint to us that we need to provide not only a meaningful Equals() override but also a GetHashCode() override in our custom classes (or provide a separate custom IEqualityComparer, of course). For Employee, that means a GetHashCode() that combines the hash code of Name (using 0 when Name is null) with the hash code of Salary and returns the combined value.

Many of the LINQ extension methods use the hash codes of the items in the sequences to quickly and efficiently work their way through the lists. Two objects whose hash codes differ are never even passed to Equals(), so an Equals() override without a matching GetHashCode() is not enough. We don't have this issue with primitives and classes such as string, which already override Equals() and GetHashCode() appropriately, and struct doesn't have this issue because ValueType already supplies a field-based Equals() and GetHashCode() by default. Thus, another way we could have corrected this would be to make Employee a struct, though this has larger ramifications to consider and shouldn't be done lightly.

The LINQ set operations can come in handy when combining sequences with no duplicates (Union()), seeing if two sequences have elements in common (Intersect()), or seeing what elements in a sequence are not part of another sequence (Except()). While set operations are typically thought of as math operations, they can be applied to many computer science problems and should be considered whenever you need to check membership between two sequences of items.
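Returning to the equality discussion above, here is a minimal sketch of both fixes: overriding Equals() together with GetHashCode(), and supplying a separate IEqualityComparer. The Employee class below is hypothetical; it assumes only the Name and Salary members the text mentions, and the way the two hash codes are combined (multiply by 31 and add) is my own choice rather than the original listing. EmployeeNameComparer and the sample data are likewise invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Employee with just the two members the text mentions.
public class Employee
{
    public string Name { get; set; }
    public double Salary { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as Employee;
        return other != null
            && Name == other.Name
            && Salary == other.Salary;
    }

    // Equal objects must produce equal hash codes, so hash the same
    // members that Equals() compares. The combining step is an assumption.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Name != null ? Name.GetHashCode() : 0;
            hash = (hash * 31) + Salary.GetHashCode();
            return hash;
        }
    }
}

// Alternative: leave the class alone and pass a comparer to the set
// operation. This hypothetical comparer treats employees as equal by Name.
public class EmployeeNameComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Name == y.Name;
    }

    public int GetHashCode(Employee e)
    {
        return e != null && e.Name != null ? e.Name.GetHashCode() : 0;
    }
}

public static class EqualityDemo
{
    // Invented sample data: "Bob" appears in both arrays as separate instances.
    public static readonly Employee[] First =
    {
        new Employee { Name = "Ann", Salary = 50000 },
        new Employee { Name = "Bob", Salary = 60000 }
    };

    public static readonly Employee[] Second =
    {
        new Employee { Name = "Bob", Salary = 60000 },
        new Employee { Name = "Cat", Salary = 70000 }
    };

    public static void Main()
    {
        // Uses the overridden Equals()/GetHashCode(): prints "Bob".
        foreach (var e in First.Intersect(Second))
            Console.WriteLine(e.Name);

        // Uses the explicit comparer instead: prints "Ann".
        foreach (var e in First.Except(Second, new EmployeeNameComparer()))
            Console.WriteLine(e.Name);
    }
}
```

If the GetHashCode() override were removed, the two "Bob" instances would almost always land in different hash buckets, and Intersect() would typically return nothing even though Equals() says they match. The comparer approach is handy when you cannot change the class or need a different notion of equality for one particular query.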