Python’s Yield

This week, I’ve been playing with implementing an HTTP client in Python. Why Python? It seemed like a straightforward language for this sort of thing, and the fact that I don’t know it at all is a bonus learning opportunity! In any case, I make no claim to be any kind of authority on the language. I am certain there are better ways to do all of these things, but I don’t know them yet!

After a bit of googling and reading and copying and pasting, I ended up with the following methods.

import re

def getLine(s):
    line = ''
    for l in iter(lambda: s.recv(1), '\r'):
        if (l != '\n'):
            line += l
    return line

def getHeader(s):
    for line in iter(lambda:getLine(s), ''):
        yield line

def getContentLength(header):
    for line in header:
        if re.match(r"Content-Length: \d+$", line):
            return int(line[line.find(": ")+2:])
    return -1

def fetchlines(s):
    header = getHeader(s)
    contentLength = getContentLength(header)

# other methods to get message


Assume fetchlines is the entry point to the functions above. First, we get header lines up to the first empty line; then we pull the content length from those lines. In later code (not shown), I get the body of the message based on the content length. But I had a strange problem. Here’s the header I was receiving:

HTTP/1.1 200 OK
Date: Wed, 26 Jan 2011 06:36:39 GMT
Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8a DAV/2 PHP/5.2.1
Last-Modified: Tue, 14 Dec 2010 17:23:19 GMT
ETag: "1104fb7-1ea9-4976213a4cfc0"
Accept-Ranges: bytes
Content-Length: 7849
Vary: Accept-Encoding
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
...


I was expecting to pop everything down through the Content-Type line into header, then get the content length value in contentLength. Then, socket reading being a one-way operation, the message-reading code would start right in at the DOCTYPE line. Instead, every time I started reading the message body, I ended up at the Vary line! I wasn’t sure why: I was specifically reading down to the first blank line, and there was no blank line before Vary. I looked for hidden \r characters; nothing.
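
The symptom can be reproduced without a network connection. The following is a minimal sketch, not the original client: it is a Python 3 port (so the socket data is bytes), FakeSocket is a hypothetical stand-in for a real socket, the snake_case names are my own, and the response is a shortened version of the header above. get_line also stops on an empty read, an added guard so the sketch cannot loop forever when the canned data runs out.

```python
import re


class FakeSocket:
    """Hypothetical stand-in for a socket: serves canned bytes via recv()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def recv(self, n):
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


def get_line(s):
    # Read one byte at a time up to '\r', skipping the '\n' left over
    # from the previous line ending. Also stop if the data runs out.
    line = b''
    while True:
        ch = s.recv(1)
        if ch in (b'', b'\r'):
            break
        if ch != b'\n':
            line += ch
    return line.decode('ascii')


def get_header(s):
    # Generator version, as in the post: lines are read only on demand.
    for line in iter(lambda: get_line(s), ''):
        yield line


def get_content_length(header):
    for line in header:
        if re.match(r"Content-Length: \d+$", line):
            return int(line.split(": ", 1)[1])
    return -1


RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Accept-Ranges: bytes\r\n"
    b"Content-Length: 7849\r\n"
    b"Vary: Accept-Encoding\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<!DOCTYPE html>\r\n"
)

sock = FakeSocket(RESPONSE)
content_length = get_content_length(get_header(sock))
next_line = get_line(sock)  # what the body-reading code would see first

print(content_length)
print(next_line)
```

Here next_line comes back as the Vary line rather than the DOCTYPE line, which matches the behavior described above.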

After some reading, I discovered the problem: yield. From looking at a few examples, I’d been under the impression that yield was essentially a nice, terse loop-returning-collection construct; i.e. it would run through the loop, collect up all the results, and return them as a collection all by itself. I did remember hearing something about yield being *weird* in C#, but I hadn’t ever actually used it, and couldn’t remember what the problem was. Besides, this was Python, and it seemed to be working, except for this little issue. How magical.

Uh... no.

Turns out this is a case where it may have sorta looked like a duck, but it was something else entirely... Though I was treating the function’s return value like the list I believed it to be, getHeader wasn’t returning a list. It was returning a generator: an iterator over the body of the generator function. When you ask that iterator for an item, the function body runs until it reaches a yield, hands that value back to the caller, saves its state, and just hangs out until you ask for the next item, at which point it resumes right after the yield. So the loop in getHeader wasn’t actually run when the function was initially called. Instead, each pass through that loop happens once for each pass through getContentLength’s loop. Since getContentLength exits as soon as it finds the content length, it never pulls the rest of the items from the iterator, so the Vary line never gets read by getHeader. The blank-line condition is never even hit, and the Vary line is still waiting on the socket when it’s time to read the message body. Not what we want!
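
To make that concrete, here is a small sketch in Python 3 that has nothing to do with sockets. The names numbers and first_even are hypothetical, made up for this illustration. A log list records when the generator body actually runs, so you can see that calling the function runs nothing, and that a consumer that returns early leaves the rest of the items unpulled.

```python
def numbers(log):
    # Generator function: nothing in this body runs until iteration starts.
    log.append("body started")
    for n in (1, 2, 3):
        log.append("about to yield %d" % n)
        yield n
    log.append("body finished")


def first_even(items):
    # Returns as soon as it finds a match, like getContentLength does.
    for item in items:
        if item % 2 == 0:
            return item
    return None


log = []
gen = numbers(log)
log_after_call = list(log)        # snapshot: calling numbers() ran no body code

result = first_even(gen)          # pulls 1, then 2, then returns
log_after_early_exit = list(log)  # snapshot: the generator is paused at "yield 2"

remaining = list(gen)             # resumes the same generator where it left off

print(log_after_call)
print(result, log_after_early_exit)
print(remaining, log[-1])
```

The 3 is still sitting in the paused generator after first_even returns, the same way the Vary line was still sitting on the socket.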

A quick fix? Take out the yield and build a regular list. It adds an extra line or two over the yield, but this way, I’m sure the full header gets read before pulling the message body.

def getHeader(s):
    header = []
    for line in iter(lambda:getLine(s), ''):
        header.append(line)
        print(line)
    return header
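
Another option, as far as I can tell, is to keep the generator and wrap the call in list(), which forces it to run to completion before anything else touches the socket. This is only a sketch in Python 3: header_lines and read_line are hypothetical names, and read_line pops from a canned list instead of calling getLine(s) on a real socket.

```python
def header_lines(read_line):
    # Generator: yields lines until read_line() returns the empty string.
    for line in iter(read_line, ''):
        yield line


# Hypothetical line source standing in for getLine(s) on a real socket.
pending = [
    "HTTP/1.1 200 OK",
    "Content-Length: 7849",
    "Vary: Accept-Encoding",
    "",
    "<!DOCTYPE html>",
]


def read_line():
    return pending.pop(0)


# list() drains the generator, so every header line (and the blank line
# that ends the header) is consumed before the body is read.
header = list(header_lines(read_line))

print(header)
print(pending)
```

After the list() call, the only thing left in pending is the DOCTYPE line, so body-reading code would start in the right place.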


Lessons to take from this: 1) a function that uses yield does NOT build a list; calling it returns a generator, a lazy iterator that only runs the function body as items are requested, which is entirely different, and on a broader note, 2) copying and pasting code that you don’t understand can result in behavior you don’t understand, or even worse, behavior that you only think you understand!


Posted 01-27-2011 12:40 AM by Anne Epstein

Comments

Dan S wrote re: Python’s Yield
on 01-27-2011 1:29 PM

for line in iter(lambda:getLine(s), '')
