I have been using LINQ for a while now for pretty standard queries, usually against object collections. One of the extension methods in the System.Linq.Enumerable class that I find I’m using more and more is the SelectMany( ) method. When I first saw it, its purpose wasn’t obvious to me at all.
This is best illustrated by an example. Suppose I have defined some classes that represent a League that contains Teams, where each Team contains Players. This is a fairly typical hierarchical collection of objects that I work with every day. In this example, I want to find all the Players across all Leagues and Teams that meet certain criteria. The “poor man’s” way to do this is to use nested for-loops to walk through each League, then each Team, and collect the Players that I’m looking for. Well, this is exactly what SelectMany( ) can do for us: it projects each element of a sequence to a child collection and flattens the results into a single sequence.
My collection of Leagues, Teams, and Players has a structure like the following:
- League: AFC-West
  - Team: Chargers
    - Player: Rivers
    - Player: Tomlinson
    - Player: Gates
  - Team: Broncos
    - Player: Cutler
    - Player: Bailey
    - Player: Marshall
- League: AFC-South
  - Team: Colts
    - Player: Manning
    - Player: Addai
    - Player: Vinatieri
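As a rough sketch, that structure could be modeled as below. The League, Team, and Player records, their Teams and Players properties, and the MakeTeam helper are hypothetical names chosen for illustration, and the code assumes C# 9 top-level statements. It also shows the nested-loop approach as a baseline, with an arbitrary filter (names starting with "M").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper to keep the sample data compact.
Team MakeTeam(string name, params string[] players) =>
    new Team(name, players.Select(p => new Player(p)).ToList());

// Sample data mirroring the outline above.
var leagues = new List<League>
{
    new League("AFC-West", new List<Team>
    {
        MakeTeam("Chargers", "Rivers", "Tomlinson", "Gates"),
        MakeTeam("Broncos", "Cutler", "Bailey", "Marshall"),
    }),
    new League("AFC-South", new List<Team>
    {
        MakeTeam("Colts", "Manning", "Addai", "Vinatieri"),
    }),
};

// The "poor man's" way: nested loops that collect matching players by hand.
var found = new List<Player>();
foreach (var league in leagues)
    foreach (var team in league.Teams)
        foreach (var player in team.Players)
            if (player.Name.StartsWith("M", StringComparison.Ordinal))
                found.Add(player);

Console.WriteLine(string.Join(", ", found.Select(p => p.Name)));
// prints "Marshall, Manning"

// Hypothetical model types (names are assumptions for this sketch).
record Player(string Name);
record Team(string Name, List<Player> Players);
record League(string Name, List<Team> Teams);
```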
Now let’s write some queries (I use LINQPad to play with these queries and get immediate feedback/output).
The most basic version gives you a flat IEnumerable<Team> of all the teams across all leagues.
You can chain the SelectMany( ) calls together to dive as deep as you want in the hierarchy. This returns an IEnumerable<Player> of all players in all leagues and teams.
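A minimal sketch of such a query, assuming hypothetical League, Team, and Player records with Teams and Players list properties and a made-up MakeTeam helper (C# 9 top-level statements):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper to keep the sample data compact.
Team MakeTeam(string name, params string[] players) =>
    new Team(name, players.Select(p => new Player(p)).ToList());

var leagues = new List<League>
{
    new League("AFC-West", new List<Team>
    {
        MakeTeam("Chargers", "Rivers", "Tomlinson", "Gates"),
        MakeTeam("Broncos", "Cutler", "Bailey", "Marshall"),
    }),
    new League("AFC-South", new List<Team>
    {
        MakeTeam("Colts", "Manning", "Addai", "Vinatieri"),
    }),
};

// Each league yields its own list of teams; SelectMany concatenates
// those lists into one flat sequence.
IEnumerable<Team> allTeams = leagues.SelectMany(l => l.Teams);

Console.WriteLine(string.Join(", ", allTeams.Select(t => t.Name)));
// prints "Chargers, Broncos, Colts"

// Hypothetical model types (names are assumptions for this sketch).
record Player(string Name);
record Team(string Name, List<Player> Players);
record League(string Name, List<Team> Teams);
```

By contrast, a plain Select(l => l.Teams) would return a sequence of lists (one per league) rather than a flat sequence of teams.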
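Chaining might look something like this. It is a sketch under the assumption of hypothetical League, Team, and Player records and a made-up MakeTeam helper, written with C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper to keep the sample data compact.
Team MakeTeam(string name, params string[] players) =>
    new Team(name, players.Select(p => new Player(p)).ToList());

var leagues = new List<League>
{
    new League("AFC-West", new List<Team>
    {
        MakeTeam("Chargers", "Rivers", "Tomlinson", "Gates"),
        MakeTeam("Broncos", "Cutler", "Bailey", "Marshall"),
    }),
    new League("AFC-South", new List<Team>
    {
        MakeTeam("Colts", "Manning", "Addai", "Vinatieri"),
    }),
};

// First flatten leagues into teams, then flatten teams into players.
IEnumerable<Player> allPlayers = leagues
    .SelectMany(l => l.Teams)
    .SelectMany(t => t.Players);

Console.WriteLine(string.Join(", ", allPlayers.Select(p => p.Name)));
// prints "Rivers, Tomlinson, Gates, Cutler, Bailey, Marshall, Manning, Addai, Vinatieri"

// Hypothetical model types (names are assumptions for this sketch).
record Player(string Name);
record Team(string Name, List<Player> Players);
record League(string Name, List<Team> Teams);
```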
And of course you can add where clause criteria to further refine which players you get now that you’re operating on the flat list of players.
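For instance, a filter on the flattened players could be sketched like this. The League, Team, and Player records and the MakeTeam helper are hypothetical, the "starts with M" criterion is arbitrary, and the code assumes C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper to keep the sample data compact.
Team MakeTeam(string name, params string[] players) =>
    new Team(name, players.Select(p => new Player(p)).ToList());

var leagues = new List<League>
{
    new League("AFC-West", new List<Team>
    {
        MakeTeam("Chargers", "Rivers", "Tomlinson", "Gates"),
        MakeTeam("Broncos", "Cutler", "Bailey", "Marshall"),
    }),
    new League("AFC-South", new List<Team>
    {
        MakeTeam("Colts", "Manning", "Addai", "Vinatieri"),
    }),
};

// Flatten down to players, then filter the flat list.
IEnumerable<Player> matches = leagues
    .SelectMany(l => l.Teams)
    .SelectMany(t => t.Players)
    .Where(p => p.Name.StartsWith("M", StringComparison.Ordinal));

Console.WriteLine(string.Join(", ", matches.Select(p => p.Name)));
// prints "Marshall, Manning"

// Hypothetical model types (names are assumptions for this sketch).
record Player(string Name);
record Team(string Name, List<Player> Players);
record League(string Name, List<Team> Teams);
```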
Here’s a good way to chain Where( ) in between the SelectMany( ) calls to get the players from only certain teams.
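One way this could look, again as a sketch with hypothetical League, Team, and Player records, a made-up MakeTeam helper, and an arbitrary choice of teams (C# 9 top-level statements):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper to keep the sample data compact.
Team MakeTeam(string name, params string[] players) =>
    new Team(name, players.Select(p => new Player(p)).ToList());

var leagues = new List<League>
{
    new League("AFC-West", new List<Team>
    {
        MakeTeam("Chargers", "Rivers", "Tomlinson", "Gates"),
        MakeTeam("Broncos", "Cutler", "Bailey", "Marshall"),
    }),
    new League("AFC-South", new List<Team>
    {
        MakeTeam("Colts", "Manning", "Addai", "Vinatieri"),
    }),
};

// Filter at the team level first, then flatten only the surviving teams.
IEnumerable<Player> players = leagues
    .SelectMany(l => l.Teams)
    .Where(t => t.Name == "Chargers" || t.Name == "Colts")
    .SelectMany(t => t.Players);

Console.WriteLine(string.Join(", ", players.Select(p => p.Name)));
// prints "Rivers, Tomlinson, Gates, Manning, Addai, Vinatieri"

// Hypothetical model types (names are assumptions for this sketch).
record Player(string Name);
record Team(string Name, List<Player> Players);
record League(string Name, List<Team> Teams);
```

Filtering before the second SelectMany( ) means the players of excluded teams are never enumerated at all.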
The 3rd and 4th overloads of the SelectMany( ) method, with the additional “result selector” parameter, take a bit more explanation. I look at this extra result selector parameter as a helper object that lets you keep track of the relationship between the parent and child collections. Say you need a result collection that not only has the full list of Teams that match some criteria in all leagues, but also tells you what League each team is in. If you use one of the above queries, you get the flat list of Team objects, but have lost their connection to the League they’re from. (Sometimes you have “back references” in your object model to get from a child object to its parent, which makes using this result selector unnecessary, but you don’t always have that benefit.) The result selector is an intermediate object available within the scope of the query to give you the information you need, and it’s up to you to decide what data to put in it. Here’s an example:
So the helper object is created by the 2-argument Func<>, which takes the object from the parent collection and the object from the child collection that are getting paired up during the query processing. You can do whatever you want with them. I’m just doing something easy that creates a new anonymous type with both objects embedded in it, which gives me the ability to use that data in the where clause, as well as in the results of the query. (Note: you can also just “select” the whole helper object in the final line of the query.)
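A sketch of the result-selector overload, assuming hypothetical League, Team, and Player records, a made-up MakeTeam helper, and an arbitrary filter (team names starting with "C"), written with C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper to keep the sample data compact.
Team MakeTeam(string name, params string[] players) =>
    new Team(name, players.Select(p => new Player(p)).ToList());

var leagues = new List<League>
{
    new League("AFC-West", new List<Team>
    {
        MakeTeam("Chargers", "Rivers", "Tomlinson", "Gates"),
        MakeTeam("Broncos", "Cutler", "Bailey", "Marshall"),
    }),
    new League("AFC-South", new List<Team>
    {
        MakeTeam("Colts", "Manning", "Addai", "Vinatieri"),
    }),
};

// The second lambda is the result selector: it receives each
// (parent, child) pair and builds the helper object carried downstream.
var teamsWithLeague = leagues
    .SelectMany(l => l.Teams, (league, team) => new { league, team })
    .Where(lt => lt.team.Name.StartsWith("C", StringComparison.Ordinal))
    .Select(lt => $"{lt.team.Name} ({lt.league.Name})");

Console.WriteLine(string.Join(", ", teamsWithLeague));
// prints "Chargers (AFC-West), Colts (AFC-South)"

// Hypothetical model types (names are assumptions for this sketch).
record Player(string Name);
record Team(string Name, List<Player> Players);
record League(string Name, List<Team> Teams);
```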