Careful with those enumerables

Ever since .Net 2.0 introduced the yield keyword for creating enumerators, and even more after the introduction of LINQ in C# 3.0, I've seen more and more APIs return IEnumerable<T> instead of IList<T> or ICollection<T> or their older cousins the ArrayList and the array object.

That makes sense most of the time, especially for collections that aren't meant to be modified, and choosing between those different return types is not what I'm about to discuss here. You can find endless articles and threads about that.

The caller's perspective

What has caused me some trouble recently was being caught off guard by some unexpected performance penalties when using one of those IEnumerable<T>.

Not that IEnumerable<T> has any performance issue by itself but the way we deliver it can misguide the caller. Let me try to make that statement a little clearer with a small example.

Consider this ProductCatalog class that basically wraps a collection of Product objects.

public class ProductCatalog
{
  private readonly IInventoryService _inventoryService;
  private List<Product> _products;

  public ProductCatalog(IInventoryService inventoryService)
  {
    _inventoryService = inventoryService;
    _products = new List<Product>();
  }

  public void Populate()
  {
    //imagine this will populate from a database
    _products.Add(new Product {Id = 1, Price = 12.34m});
    _products.Add(new Product {Id = 2, Price = 11.22m});
    _products.Add(new Product {Id = 3, Price = 7.99m});
    _products.Add(new Product {Id = 4, Price = 3.49m});
	//...
    _products.Add(new Product {Id = 10000, Price = 75.99m});
  }

  public IEnumerable<Product> Products { get { return _products; } }

  public IEnumerable<Product> AvailableProducts
  {
    get
    {
      return Products
        .Where(product => _inventoryService.IsProductInStock(product));
    }
  }
}

And here's the code using it.

var catalog = new ProductCatalog(new InventoryService());
catalog.Populate();

var available = catalog.AvailableProducts;

foreach (var product in available)
{
  Console.Out.WriteLine("Product Id " + product.Id + " is available.");
}

var priceSum = 0m;

priceSum = available.Sum(product => product.Price);
Console.Out.WriteLine(
  "If I buy one of each product I'll pay: " + priceSum.ToString("c"));

The InventoryService is something like this:

public class InventoryService : IInventoryService
{
  public bool IsProductInStock(Product product)
  {
    Console.Out.WriteLine("Expensive verification for prod id: " + product.Id);
    //imagine something a little more complex and lengthy is happening here.
    return true;
  }
}

It seems trivial enough, but let's look at what the output gives us:

Expensive verification for prod id: 1
Product Id 1 is available.
Expensive verification for prod id: 2
Product Id 2 is available.
Expensive verification for prod id: 3
Product Id 3 is available.
Expensive verification for prod id: 4
Product Id 4 is available.
Expensive verification for prod id: 10000
Product Id 10000 is available.
Expensive verification for prod id: 1
Expensive verification for prod id: 2
Expensive verification for prod id: 3
Expensive verification for prod id: 4
Expensive verification for prod id: 10000
If I buy one of each product I'll pay: $111.03

See? If I didn't know how ProductCatalog.AvailableProducts was implemented I would be stumped by this behavior. Looking at this from the caller's point of view, I was just trying to use an object's collection property twice, probably thinking the return value would be the same collection of objects each time.

Well, they were the same individual Product objects in each call but they most definitely were not packaged in the same collection structure, and I don't know about you but I would never, in a million years, think that accessing that property would cause the collection to be rebuilt each time.

What's an API designer to do?

My suggestion in situations like the above, where the collection is closer to an object feed than a fixed list, is to implement that member as a method, not a property. Callers are more likely to associate a method call to something expensive than in the property access case.

Properties carry a historical expectation of being cheap to use and consistent. If your property's design includes the chance of not honoring this expectation, like in my example which depended on an injected IInventoryService, then use a method instead.

If you can't use a method for whatever reason, try to at least lazy load or cache the returned collection.

The last thing you want is requiring your callers to know implementation details of all your collection-returning members to decide how they should use the collection.

So this is not about the return type, huh?

As you may have noticed, the problem I went into had nothing to do with IEnumerable<T>, it could have happened with any of the other mentioned types — they're all enumerable anyway. I could certainly convert my example to IList<T> and still have the same problem.

The reason I called out IEnumerable<T> more explicitly is that it seems to happen more with it than the other ones. Maybe that's because the LINQ extension methods return IEnumerable<T> or because an IList<T> property backed by a fixed List<T> collection is a very common implementation choice.


Posted 05-09-2010 11:20 AM by sergiopereira
Filed under:

[Advertisement]

Comments

James wrote re: Careful with those enumerables
on 05-09-2010 4:27 PM

And returning IList<T> is bad from an API design point of view if the collection is not to be mutable :)

I've found ICollection<T> to be useful in scenarios where you dont want to expose the ability to modify a collection, but people may still want to know how many items are in it.

You could also just put a .ToArray() in the end of your where clause in AvailableProducts to prevent the enumerable from being lazy evaluated twice.

sergiopereira wrote re: Careful with those enumerables
on 05-09-2010 5:13 PM

@James,

Adding the ToArray() at the end would fix my problem in this particular example because I'm saving the result in that "available" local variable. If I had used the property directly twice I would still see the whole thing run again.

My point is that I would still need to know that I should save the results of a simple property access.

armin wrote re: Careful with those enumerables
on 05-10-2010 7:42 AM

I like the idea of using a function to indicate, that computation might be expensive.

Additionally, I suggest to add some "indication" in the function name. Usually I prefix such methods with "Compute" or "Query" to make them look more expensive in terms of performance.

sergiopereira wrote re: Careful with those enumerables
on 05-10-2010 9:08 AM

@armin, if you know for sure it will be expensive then giving a hint in the name could be a good idea. Many times, though, like in the example, you may not know if the operation is going to be expensive because it depends on what is passed in as that IInventoryService instance.

Nick wrote re: Careful with those enumerables
on 05-10-2010 10:12 AM

What you have is the classic Select N + 1 issue associated with lazy loading. It looks like your InventoryService is coupled to your Product; so, you could implement a GetAvailableProducts() method in your service and call it once in your Populate function storing the sequence in a local member variable. When a caller calls AvailableProducts, return _availableProducts.Intersect(_products)...this would satisfy the assumed contract. However, this is all moot if you don't have control over the service.

Carsten wrote re: Careful with those enumerables
on 05-10-2010 11:57 AM

hmm... I never had any problems with this behavior (indeed I expect none the less - you are calling GetEnumerator() twice so of course the enumeration is changing).

I many FP languagues there are some kind of "memoized lists/enumerations/whatever" that helps you prevent this.

And it's not to hard to implement such an type/functor (aka extension method on top of IEnumerable that returns one of your MemoizedEnumerable-class) - just use the allready mentioned ToArray() method inside your wrapper.

Of course you might implement your ProductCatalog a bit cleaner .... Lazy comes to mind

arnshea wrote re: Careful with those enumerables
on 05-10-2010 2:50 PM

The guidance for Properties, in particular at msdn.microsoft.com/.../ms229006.aspx , would have prevented this problem.

"Do use a method, rather than a property, in the following situations..... The operation returns a different result each time it is caled...."

xe wrote re: Careful with those enumerables
on 05-10-2010 3:42 PM

If a collection is supposed do be immutable ReadOnlyCollection<T> from System.Collections.ObjectModel namespace could be a good choice

sergiopereira wrote re: Careful with those enumerables
on 05-10-2010 6:13 PM

@xe, the problem (as I mention in the last two paragraphs) is not really with the chosen return type.

You could have the same problem returning read-only collections if you weren't careful to avoid repeating the looping through the source elements - assuming that is possible at all.

Brad wrote re: Careful with those enumerables
on 05-10-2010 7:06 PM

You could also expose the collection as IQueryable<T> so the consumer of the API knows there is a query that is deferred-executed (and will be a evaluated each time).

Balaji Birajdar wrote re: Careful with those enumerables
on 05-11-2010 12:14 AM

Good catch Sergio.

I usually use IEnumerable<T>  to return my collections. But now I think I should carefully think about the return type.

sergiopereira wrote re: Careful with those enumerables
on 05-11-2010 9:20 AM

@Balaji, please not that just changing the return type will not really prevent this type of situation.

The problem here is making it clear for your callers that the data is expensive to get or trying to minimize that cost.

Dan wrote re: Careful with those enumerables
on 05-12-2010 12:35 AM

I'm really confused about this. Can anyone explain why the behavior is not reproduceable if I switch the Console.Out.WriteLine in the IsProductInStock method vs a Thread.Sleep. Then it will only be writing one line for each product...

Eric Smith wrote re: Careful with those enumerables
on 05-12-2010 1:22 AM

@sergio, shouldn't your article be titled "Careful with those properties"?

Dan wrote re: Careful with those enumerables
on 05-12-2010 5:12 AM

Sorry 'bout my last incomprehensible comment. Please ignore it.

I was mislead by the example and I can agree upon that the IEnumerable<T> can be quite misleading for properties.

This post clearly exemplifies that you need to be careful with the IEnumerable types for performance issues and not see these as constant lists. Rather see them as the foreach loops they are.

sergiopereira wrote re: Careful with those enumerables
on 05-12-2010 9:20 AM

@Eric, kind of. In the post's example, if we had changed the AvailableProdutcs to end in an .ToList() or .ToArray(), as @James pointed out,  the problem would go away because I'm saving the property value in a local variable. So there's something to be said about purely yield-backed IEnumerables.

Eric G wrote re: Careful with those enumerables
on 05-17-2010 9:28 AM

I think the guidance on method vs. property is helpful.   You don't expect the result of a property to change between calls unless you changed it.

Then couple that with consistent implementation of the LINQ rule, the client should realize an IEnumerable with ToArray or ToList and store locally when you want to ensure an enumeration won't be re-evaluated.

The main problem I see with AvailableProducts returning either ToList() or ToArray() as a realization of IEnumerable is you've just created the problem on the other side of the coin.  A client that might EXPECT a local reference to an IEnumerable to be re-evaluated each time it's enumerated.  A client should assume nothing about what actually implements the IEnumerable and the server shouldn't require that the client be aware of the realization in order be implemented correctly.  

sergiopereira wrote re: Careful with those enumerables
on 05-17-2010 10:01 AM

@Eric even if you make that property end with .ToList() or .ToArray() it will still be evaluated each time it's called.

What will feel bizarre is that if you store the result of the first call in a local var it behaves one way and if you use the property directly each time it will behave another way.

Even changing the property to a method that returns IEnumerable will still feel surprising when you store the result of the method in a local var and see the whole internal loop being execute each time you use the local var.

Eric G wrote re: Careful with those enumerables
on 05-17-2010 2:14 PM

@Sergio, We're on the same page.  I agree that semantically AvailableProducts is better being a method.  

"The last thing you want is requiring your callers to know implementation details of all your collection-returning members to decide how they should use the collection."  

I fully agree.  When a client calls catalog.AvailableProducts, whether as property or method, all the client knows is 'I have an IEnumerable'.  It might be a list, an array, an IQueryable, a result from yield, a foo:IEnumerable, whatever.  The client can't, or shouldn't, know which it will be just by seeing IEnumerable.

If the server complets with ToList() or ToArray() you'll end up with inconsistent behaviour using the local var vs. the property/method.  Backing the return with yield alone provides consistency.  It may then be surprising that enumerating the local var twice causes re-evaluation, but that's consistent with LINQ in general.

When designing an API I would aim for consistency with LINQ.  And the client is then responsible for deciding if the IEnumerable should be realized as and array or list to avoid re-evaluation.

I'd go so far as to say that if the call completes with ToList() or ToArray() then the return type should be array or IList, not IEnumerable.  The API has made a decision about what the appropriate realized return type should be and should make that decision known.  It also removes any doubt if enumerating over the returned value multiple times will cause re-evaluated.

sergiopereira wrote re: Careful with those enumerables
on 05-17-2010 3:22 PM

@Eric,  I like that.

About The CodeBetter.Com Blog Network
CodeBetter.Com FAQ

Our Mission

Advertisers should contact Brendan

Subscribe
Google Reader or Homepage

del.icio.us CodeBetter.com Latest Items
Add to My Yahoo!
Subscribe with Bloglines
Subscribe in NewsGator Online
Subscribe with myFeedster
Add to My AOL
Furl CodeBetter.com Latest Items
Subscribe in Rojo

Member Projects
DimeCasts.Net - Derik Whittaker

Friends of Devlicio.us
Red-Gate Tools For SQL and .NET

NDepend

SlickEdit
 
SmartInspect .NET Logging
NGEDIT: ViEmu and Codekana
LiteAccounting.Com
DevExpress
Fixx
NHibernate Profiler
Unfuddle
Balsamiq Mockups
Scrumy
JetBrains - ReSharper
Umbraco
NServiceBus
RavenDb
Web Sequence Diagrams
Ducksboard<-- NEW Friend!

 



Site Copyright © 2007 CodeBetter.Com
Content Copyright Individual Bloggers

 

Community Server (Commercial Edition)