Using App Engine's task queue to break problems down

Two weeks ago, when Twilio announced their developer contest for their new SMS API, I decided to build a mobile application that let me query the Madison Metro bus system to determine when my bus would arrive.

Although the entry did not win the contest, it was named mashup of the day last week at Programmable Web! It's called SMS My Bus, and if you live in Madison and ride the bus I encourage you to take advantage of it! You can find details about it here...

http://www.smsmybus.com

The basic architecture of the application is straightforward. SMS messages are sent to my Twilio phone number and Twilio routes them to my server via HTTP POST requests. I do a schedule look-up based on the user's input and return the results.
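
As a rough illustration of the first step, here is a minimal sketch in plain Python 3 (not the app's actual handler) that pulls the caller and the query out of a form-encoded POST body. It assumes Twilio's `From` and `Body` form fields. The message format ("<stop>" or "<route> <stop>") and the function name are hypothetical, chosen only for this example.

```python
from urllib.parse import parse_qs


def parse_sms_request(post_body):
    """Pull the caller and the query out of a form-encoded webhook body.

    Assumes the `From` and `Body` form fields, plus a hypothetical message
    format of either "<stop>" or "<route> <stop>".
    """
    fields = parse_qs(post_body)
    caller = fields.get("From", [""])[0]
    words = fields.get("Body", [""])[0].split()

    if len(words) == 1:
        route, stop = None, words[0]      # no route given: all routes at the stop
    elif len(words) == 2:
        route, stop = words
    else:
        raise ValueError("expected '<stop>' or '<route> <stop>'")
    return {"caller": caller, "route": route, "stop": stop}


# made-up phone number and stop, for illustration only
request = parse_sms_request("From=%2B16085551234&Body=4+1101")
print(request)
```

A request without a route comes back with `route` set to `None`, which is the case that requires looking up every route at the stop.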

The tricky parts stem primarily from the fact that:

1. The Madison Metro doesn't actually provide web services for this data. The consequence for an app like mine is that I need to do a bit of screen scraping to find the data I'm after.

2. I chose to deploy this app using Google App Engine, and URL scraping can become a show stopper since GAE limits the resources available to every request. GAE will only let a single request run for about 30 seconds.

Needless to say, I would love it if my fine city of Madison would join the Gov 2.0 movement and open up more of its rich data via standard web services.

In the meantime... I needed a solution that would allow me to gather disparate data across many, many URLs. As an example, the busiest stop in the Metro system has 34 buses passing through it. I may need to grab 34 different web pages to begin to piece together the schedule as it relates to the caller at that stop.

I took advantage of GAE's Task Queue API and memcache counters to tackle this problem by farming out autonomous jobs that find the next available bus per route per stop, and at the end aggregating the results.
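
The overall shape of this fan-out/fan-in pattern can be sketched in plain Python 3 without any App Engine services. In this simulation a dict stands in for the memcache counters, another dict for the datastore, and a deque for the task queue. All of the names and the fake schedule data are made up for illustration.

```python
from collections import deque

counters = {}      # stand-in for memcache counters, keyed by transaction id
results = {}       # stand-in for datastore rows, keyed by transaction id
queue = deque()    # stand-in for a task queue


def fan_out(sid, routes):
    """Record how many jobs are outstanding, then enqueue one job per route."""
    counters[sid] = len(routes)
    results[sid] = []
    for route in routes:
        queue.append(("lookup", sid, route))


def run_queue(lookup):
    """Drain the queue; the last lookup to finish enqueues the aggregate job."""
    replies = {}
    while queue:
        kind, sid, payload = queue.popleft()
        if kind == "lookup":
            results[sid].append((payload, lookup(payload)))
            counters[sid] -= 1
            if counters[sid] == 0:
                del counters[sid]
                queue.append(("aggregate", sid, None))
        else:
            # fan-in: combine every per-route result, soonest arrival first
            replies[sid] = sorted(results.pop(sid), key=lambda pair: pair[1])
    return replies


# hypothetical minutes-until-arrival per route
fake_schedule = {"2": 12, "4": 3, "6": 7}
fan_out("sid-1", ["2", "4", "6"])
replies = run_queue(fake_schedule.get)
print(replies["sid-1"])
```

Each lookup job is independent, so no single unit of work has to fetch more than one page. Only the job that brings the counter to zero triggers the aggregation step.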

Admittedly, this is not advanced Computer Science. But I hope other App Engine developers can find some use in the pattern.

1. Define my task queues

I defined two task queues in queue.yaml to manage the process. One, called aggregation, queries individual routes at a stop. The other, called aggregationSMS, pieces the results together for the return SMS message.

 
queue:
- name: aggregation
  rate: 20/s
  bucket_size: 1

- name: aggregationSMS
  rate: 10/s
  bucket_size: 1
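
For a back-of-the-envelope feel for what these settings imply, the sketch below models a queue as a simple token bucket: the bucket holds up to bucket_size tokens, refills at the configured rate, and each task spends one token to start. This is a simplified model written in plain Python 3, not App Engine's actual scheduler, and the function name is made up for the example.

```python
def dispatch_times(num_tasks, rate_per_sec, bucket_size):
    """Earliest start time, in seconds, for each task under a token bucket."""
    tokens = float(bucket_size)   # the bucket starts full
    now = 0.0
    times = []
    for _ in range(num_tasks):
        if tokens < 1.0:
            # wait just long enough for the bucket to refill to one token
            now += (1.0 - tokens) / rate_per_sec
            tokens = 1.0
        tokens -= 1.0
        times.append(now)
    return times


# the busiest stop: 34 routes on a queue with rate 20/s and bucket_size 1
times = dispatch_times(num_tasks=34, rate_per_sec=20, bucket_size=1)
print("last task starts after about %.2f s" % times[-1])
```

Under this model the 34th task starts 33/20 = 1.65 seconds after the first, so the rate limit alone adds well under two seconds at the busiest stop. Real dispatch times will also depend on how long each fetch takes and on how App Engine schedules the work.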

2. Spawn tasks

When an SMS request arrives, the request handler parses the input to determine the request parameters. If the request does not include a specific bus route, I'll query my route table for every route that passes through the respective stop. This table contains URLs for the real-time arrival estimates.

I loop over the result set and create new tasks for the aggregation task queue.

 
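    from google.appengine.api import memcache
    from google.appengine.api.taskqueue import Task  # labs.taskqueue in older SDKs
    from google.appengine.ext import db

    # sid, stopID and caller were set earlier in the request handler
    q = db.GqlQuery("SELECT * FROM RouteListing WHERE stopID = :1", stopID)
    routeQuery = q.fetch(100)
    if len(routeQuery) > 0:
        # create a counter, keyed by the unique ID for this caller's
        # request, holding the number of outstanding tasks. Setting the
        # full count before any task is spawned keeps a fast task from
        # decrementing it to zero while this loop is still adding tasks.
        memcache.set(sid, len(routeQuery))

        # loop over every route at this stop
        for r in routeQuery:
            # spawn a task for this stop/route tuple
            task = Task(url='/aggregationtask',
                        params={'sid': sid,
                                'stop': stopID,
                                'route': r.route,
                                'direction': r.direction,
                                'url': r.scheduleURL,
                                'caller': caller,
                                })
            task.add('aggregation')
    else:
        # do some error handling
        pass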
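
A shared counter only works as a completion signal if it cannot reach zero while tasks are still being created. The small simulation below (plain Python 3, not App Engine code, with a made-up function name) contrasts two ways of initializing the counter in the worst case, where every task finishes immediately after it is enqueued.

```python
def completions_signalled(total, preset):
    """Count how often the counter hits zero when each task finishes
    immediately after it is enqueued (the worst case for the spawner)."""
    counter = total if preset else 0
    signals = 0
    for _ in range(total):
        if not preset:
            counter += 1    # spawner increments as it enqueues each task
        counter -= 1        # the task runs and decrements right away
        if counter == 0:
            signals += 1    # a handler would fire the aggregation step here
    return signals


print(completions_signalled(34, preset=False))  # 34
print(completions_signalled(34, preset=True))   # 1
```

Incrementing as tasks are enqueued lets the counter touch zero after every task, so the aggregation step could fire early and repeatedly. Setting the counter to the full count first yields exactly one signal, after the last task.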

3. Define the task handlers

There are two task handlers. One tackles the smallest job: determining the schedule for an individual bus at a stop. The other pieces all of these results together when the system is ready to reply to the caller.

The task handler, AggregationHandler, does the specific work to scrape the scheduling information from the bus system's site. The handler does three things.

  • Scrape the web page to find the next stop time.
  • Store the results in the datastore.
  • Decrement the memcache counter for this transaction.

Many of the implementation details have been stripped out of the following code snippet...

 
 
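from google.appengine.api import memcache, urlfetch
from google.appengine.api.taskqueue import Task  # labs.taskqueue in older SDKs
from google.appengine.ext import webapp

class AggregationHandler(webapp.RequestHandler):

    def post(self):
        # extract the parameters for this task
        # (names match the params set when the task was spawned)
        sid = self.request.get('sid')
        stopID = self.request.get('stop')
        routeID = self.request.get('route')
        directionID = self.request.get('direction')
        scheduleURL = self.request.get('url')
        caller = self.request.get('caller')

        # 1. fetch the real-time data
        result = urlfetch.fetch(scheduleURL)

        # scrape the page (getNextTime stands in for the scraping
        # details that were stripped out of this snippet)
        textBody = result.getNextTime()

        # 2. store these results in the datastore
        stop = BusStopAggregation()
        stop.stopID = stopID
        stop.routeID = routeID
        stop.sid = sid  # the sid identifies the caller's transaction
        stop.text = textBody
        stop.put()

        # 3. decrement the counter
        counter = memcache.decr(sid)

        # if we've completed the scraping, create a task to
        # piece the results together.
        if counter == 0:
            task = Task(url='/aggregationSMStask',
                        params={'sid': sid, 'caller': caller})
            task.add('aggregationSMS')

            # delete the counter for this transaction
            memcache.delete(sid)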
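
The scraping step itself is omitted above. As a hedged illustration of what such a step might look like, here is a minimal regex-based sketch in plain Python 3. The HTML layout, the "arrival" class name, and the helper name are all invented for this example. The real Metro pages are not shown in this post and will certainly differ.

```python
import re

# Hypothetical markup, invented purely for illustration.
SAMPLE_PAGE = """
<table>
  <tr><td>Route 4</td><td class="arrival">10:42 PM</td></tr>
  <tr><td>Route 4</td><td class="arrival">11:12 PM</td></tr>
</table>
"""

# capture the text of the first cell marked as an arrival time
ARRIVAL_RE = re.compile(r'<td class="arrival">\s*([^<]+?)\s*</td>')


def get_next_time(html):
    """Return the first arrival time listed on the page, or None."""
    match = ARRIVAL_RE.search(html)
    return match.group(1) if match else None


print(get_next_time(SAMPLE_PAGE))
```

Returning None when nothing matches gives the caller a clear signal that a route has no upcoming arrival, instead of raising in the middle of a task.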
 

The task handler, AggregationSMSHandler, does the job of piecing the results together. It relies on the unique SID for a caller's transaction to query the datastore and find the scheduling details.

 
 
from google.appengine.ext import db, webapp

class AggregationSMSHandler(webapp.RequestHandler):

    def post(self):
        # extract the task's inputs
        sid = self.request.get('sid')
        phone = self.request.get('caller')

        # sort the results by time to find soonest upcoming stops
        q = db.GqlQuery("SELECT * FROM BusStopAggregation WHERE sid = :1 ORDER BY time", sid)

        # we'll only send the next four stops in the reply message
        routeQuery = q.fetch(4)
        if len(routeQuery) > 0:
            textBody = "Stop: %s\n" % routeQuery[0].stopID
            for r in routeQuery:
                textBody += "Route %s: %s\n" % (r.routeID, r.text)
        else:
            textBody = "Doesn't look good... Your bus isn't running right now!"

        # send off the result via the twilio API
        outboundSMS(phone, textBody)
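
The message-building logic can be exercised on its own, away from the datastore. The sketch below is a plain Python 3 function written for illustration, not the app's code. It assumes each result is a (route, minutes away) pair, which is a simplification of the stored rows.

```python
def build_reply(stop_id, rows, limit=4):
    """Format the soonest `limit` arrivals as an SMS body.

    `rows` is a list of (route_id, minutes_away) pairs, a simplified
    stand-in for the rows stored by the per-route tasks.
    """
    if not rows:
        return "Doesn't look good... Your bus isn't running right now!"
    soonest = sorted(rows, key=lambda row: row[1])[:limit]
    lines = ["Stop: %s" % stop_id]
    lines += ["Route %s: %d min" % (route, minutes) for route, minutes in soonest]
    return "\n".join(lines)


# made-up stop and arrival data
print(build_reply("1101", [("2", 12), ("4", 3), ("6", 7), ("11", 25), ("15", 40)]))
```

Keeping the formatting in a pure function like this makes the empty-result case and the four-stop cutoff easy to check without sending any messages.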
 

Results

This pattern allowed me to almost completely eliminate DeadlineExceededError exceptions on GAE. I've yet to see a timeout problem inside the app. It's always possible that a single task could take too long, but if a task fails, App Engine re-queues it and runs it again until it succeeds.

It's worth pointing out that I also used the Task Queue, in ways not shown in the code snippets, for other repeated, remote tasks. For example, when I interface with the Twilio API, I spawn a task to do the work. Likewise, I log Twilio events in the datastore on their own task queue.

Filed under  //   google   madison   programming   projects   software  

Software release humor

The Google App Engine team announced a pre-release version of a new datastore implementation for the Python SDK. The announcement included the following disclaimer, which is too funny not to share.

Also, please bear in mind that this is pre-release code. It may wipe out all your data. It may cause the spontaneous generation of a black hole which swallows your cat. It may even work as expected! Nearly anything could happen.

Filed under  //   google   programming  

Marathon training started this week

Ok, so I haven't actually laced up the sneakers and hit the pavement yet. But it's Wisconsin and it's cold! Besides, I've been nursing my bizarre back injury back into form.

In the cold and with a bum back, my training has consisted of consuming Born To Run by Christopher McDougall. I just finished it this week and am thoroughly inspired. I recommend it to anyone who is thinking about running a marathon or looking for motivation to become more active.

I made a pact with a friend of mine this past fall to run the Chicago Marathon. It's October 10th (221 days from today), which leaves plenty of opportunity for procrastination, injury, and excuses. I'm going to try to choose a different path, so if you don't hear from me on this subject, call me out on it!

Filed under  //   exercise  

Winter might actually end (no really - I have evidence!)

The sun has set...

Where will it rise next?

The evolution of the web (and standing on the shoulders of others)

Max Ventilla, co-founder of Aardvark, has some great insight into building new web businesses today.

He argues that not every web business needs to be a destination site, and that you're actually better off standing on the shoulders of existing incumbents first.

Aardvark was the sixth idea that we tried, following a string of failed prototypes. But all our ideas were subject to the restriction that they could not be a destination site. Any candidate idea had to be useful from within some other online application. Aardvark is designed to be a contact that is accessible from anywhere that contacts go (email, phone, IM…). It wasn’t until we were about eighteen months into the company that we finally built a full-fledged website. That seemed pretty remarkable for a *web* company but I think it will increasingly be the norm.

It's worth reading the entire post from Max.

As everyone has been flocking to the web to have "a presence" over the last twelve years, the paradigm has shifted. It's not about getting on the web now, it's about getting in front of the consumers where they are already congregating.

It's equivalent to the challenges of selling dog biscuits. Do you open the shop in the neighborhood strip mall, or do you put all your effort into getting inside Target?

Filed under  //   startups  

Mini CEOship - advice from Mark Pincus

There was a great Corner Office interview with Mark Pincus in today's New York Times.

In it, he talks about a technique he's used where he required everyone to become a mini-CEO for something important within the company.

I’d turn people into C.E.O.’s. One thing I did at my second company was to put white sticky sheets on the wall, and I put everyone’s name on one of the sheets, and I said, “By the end of the week, everybody needs to write what you’re C.E.O. of, and it needs to be something really meaningful.” And that way, everyone knows who’s C.E.O. of what and they know whom to ask instead of me. And it was really effective. People liked it. And there was nowhere to hide.

While his reasoning is primarily based on his own management bandwidth, I suspect it has an awesome effect on company culture. I love this idea and plan to steal it.

The full interview is here.

Vince Lombardi was wrong

I just finished reading Seth Godin's "The Dip". It's a fast and potentially inspiring read that helps you understand what is keeping you from becoming great at whatever you choose to do.

I had a couple of takeaways...

1. Vince Lombardi was wrong

One of Vince's famous quotes is, "Quitters never win and winners never quit." Godin does a nice job of squashing this mantra. Not only is quitting a perfectly viable option, but many times it is the smartest option. You can be a fool for not taking it.

The smartest people recognize when quitting is the right decision and when it's the wrong one (when it's just a matter of pushing through a dip). Ironically, by not quitting, you've set yourself up for failure. Smart people quit all the time and go on to win big.

2. Success happens in strange places

Godin argues that, "the dip is where success happens". I love this, and I think it's important to be aware of before venturing into anything new.

If you want to be the best, you are guaranteed to go through a dip. After all, if there were no dip, everyone would come out on top.

3. The Dip is a dip itself!

I began to find it ironic that I was ready to put the book down halfway through. It's short, but it's incredibly repetitive in its underlying messages.

Did Godin intentionally create a dip inside the book? :)

4. There needs to be a startup version of this book

One of the flaws with the book is that it comes from a marketer's perspective. There is an underlying assumption that your idea/skill/product has a market fit, and a dip falls into one of eight categories (manufacturing, sales, education, risk, relationship, conceptual, ego, and distribution).

Startups are never that easy if you have not already established a fit in the respective market. It would be nice to see an analysis of dips for startups while they strive to find a market/product/price fit.

Filed under  //   books   startups  

Facebook's identity crisis: privacy does matter

The Crunchies Awards has produced a small (but not big enough) debate over Facebook's privacy strategy. CEO Mark Zuckerberg was far too dismissive about user privacy online during an interview with TechCrunch editor, Michael Arrington (video link here).

Ryan Healy tweeted some nice coverage of the debate and agreed with Zuckerberg's view that privacy is over. I said I didn't. Here's why...

Blogging and Twitter are not the right data points for evaluating the state of online privacy, as Facebook seems to be advocating. That's not the right way to describe how "people are changing". Those platforms are designed for public consumption. They're equivalent to the cork board at the coffee shop and the telephone pole at the bus stop.

The right comparisons for Facebook privacy are the sidewalk chats in the neighborhood, the water cooler conversations, and even the phone. For as long as the human race has been living in the same areas, we've been socializing and sharing stories. And there has never been a doubt about the people involved in those experiences.

With the help of Facebook, we can enjoy these social experiences in brand new ways and without the requirement of being in the same physical location. It's remarkably efficient, but what makes the experience so rewarding is the fact that the underlying social mechanics are the same as in our physical world.

But people aren't changing. The communication tools are. Let's hope Facebook realizes this before it's too late.

Kids building software (@emmatracy)

I was smitten when my daughter told me she had an idea for a web application. We were at the book store and were talking about how expensive it can be to buy books. She thought it would be a great idea to create a website where her friends could list the books they each had and then set up book swaps. And poof! It was born...

Here are the mockups she drew today.

     
Click here to download:
Kids_building_software_emmatra.zip (7083 KB)

Filed under  //   kids   software  
