
Using App Engine's task queue to break problems down


Two weeks ago when Twilio announced their developer contest for their new SMS API, I decided to build a mobile application that let me query the Madison Metro bus system to determine when my bus would arrive.

Although the entry did not win the contest, it was named mashup of the day last week at Programmable Web! It's called SMS My Bus, and if you live in Madison and ride the bus I encourage you to take advantage of it! You can find details about it here...

http://www.smsmybus.com

The basic architecture of the application is straightforward. SMS messages are sent to my Twilio phone number and Twilio routes them to my server via HTTP POST requests. I do a schedule look-up based on the user's input and return the results.

The tricky parts stem primarily from the fact that:

1. The Madison Metro doesn't actually provide web services for this data. The consequence for an app like mine is that I need to do a bit of screen scraping to find the data I'm after.

2. I chose to deploy this app using Google App Engine, and URL scraping can become a show stopper since GAE limits the resources available to every request. GAE will only let a single request run for about 30 seconds.

Needless to say, I would love it if my fine city of Madison would join the Gov 2.0 movement and open up more of its rich data via standard web services.

In the meantime... I needed a solution that would allow me to gather disparate data across many, many URLs. As an example, the busiest stop in the Metro system has 34 buses passing through it. I may need to grab 34 different web pages to begin to piece together the schedule as it relates to the caller at that stop.

I took advantage of GAE's Task Queue API and memcache counters to tackle this problem by farming out autonomous jobs that find the next available bus per route per stop, and at the end aggregating the results.
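
Stripped of the App Engine specifics, the shape of the pattern looks roughly like the sketch below. Everything in it is a stand-in I made up for illustration: a dict plays the role of memcache, plain lists play the task queue and the datastore, and `lookup` is a hypothetical function that returns the minutes until the next bus on a route. Note that this sketch sets the counter to the number of jobs up front, before any job runs, which is one way to make sure the count cannot reach zero early.

```python
# Plain-Python stand-ins (assumptions, not the App Engine APIs):
counters = {}   # plays the role of the memcache counter, keyed by request id
results = []    # plays the role of the datastore
queue = []      # plays the role of the task queue


def spawn(sid, routes):
    """Fan out: one job per route, plus a counter of outstanding jobs."""
    counters[sid] = len(routes)
    for route in routes:
        queue.append((sid, route))


def aggregate(sid):
    """Fan in: gather this request's results, soonest first, at most four."""
    mine = [(minutes, route) for (s, route, minutes) in results if s == sid]
    return sorted(mine)[:4]


def run_job(sid, route, lookup):
    """Do one small piece of work, then check whether this was the last job."""
    results.append((sid, route, lookup(route)))
    counters[sid] -= 1
    if counters[sid] == 0:
        del counters[sid]
        return aggregate(sid)
    return None


def run_all(sid, routes, lookup):
    """Drive the fake queue until it is empty and return the final reply."""
    spawn(sid, routes)
    reply = None
    while queue:
        job_sid, route = queue.pop(0)
        out = run_job(job_sid, route, lookup)
        if out is not None:
            reply = out
    return reply
```

Calling `run_all('abc', ['2', '4', '6'], lookup)` runs three small jobs, and only the last one to finish triggers the aggregation step.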

Admittedly, this is not advanced Computer Science. But I hope other App Engine developers can find some use in the pattern.

1. Define my task queues

I used two different task queues to manage the process. One, called aggregation, queries individual routes at a stop. The other, called aggregationSMS, pieces the results together for the return SMS message.

 
queue:
- name: aggregation
  rate: 20/s
  bucket_size: 1

- name: aggregationSMS
  rate: 10/s
  bucket_size: 1

2. Spawn tasks

When an SMS request arrives, the request handler parses the input to determine the request parameters. If the request does not include a specific bus route, I'll query my route table for every route that passes through the respective stop. This table contains URLs for the real time arrival estimates.

I loop over the result set and create new tasks for the aggregation task queue.

 
    q = db.GqlQuery("SELECT * FROM RouteListing WHERE stopID = :1",stopID) 
    routeQuery = q.fetch(100) 
    if len(routeQuery) > 0: 
        # create a counter for a universally unique caller ID 
        memcache.add(sid, 0) 
 
        # loop over every route at this stop 
        for r in routeQuery: 
          # the unique counter for this caller's request 
          counter = memcache.incr(sid) 
 
          # spawn a task for this stop/route tuple 
          task = Task(url='/aggregationtask', 
                      params={'sid':sid, 
                              'stop':stopID, 
                              'route':r.route, 
                              'direction':r.direction, 
                              'url':r.scheduleURL, 
                              'caller':caller 
                              }) 
          task.add('aggregation') 
    else: 
        # do some error handling 
        pass

3. Define the task handlers

There are two task handlers. One to tackle the smallest job of determining the schedule for an individual bus at a stop. And one to piece all of these results together when the system is ready to reply to the caller.

The task handler, AggregationHandler, does the specific work to scrape the scheduling information from the bus system's site. The handler does three things.

  • Scrape the web page to find the next stop time.
  • Store the results in the datastore
  • Decrement the memcache counter for this transaction

Many of the implementation details have been stripped out of the following code snippet...

 
 
from google.appengine.api import memcache, urlfetch
# in later SDKs this lives at google.appengine.api.taskqueue
from google.appengine.api.labs.taskqueue import Task
from google.appengine.ext import webapp

class AggregationHandler(webapp.RequestHandler):

    def post(self):
        # extract the parameters for this task
        sid = self.request.get('sid')
        stopID = self.request.get('stop')
        routeID = self.request.get('route')
        directionID = self.request.get('direction')
        scheduleURL = self.request.get('url')
        caller = self.request.get('caller')

        # 1. fetch the real time data
        result = urlfetch.fetch(scheduleURL)

        # scrape the page (getNextTime stands in for the scraping
        # code that has been stripped out of this snippet)
        textBody = getNextTime(result.content)

        # 2. store these results in the datastore
        stop = BusStopAggregation()
        stop.stopID = stopID
        stop.routeID = routeID
        stop.sid = sid # the sid identifies the caller's transaction
        stop.text = textBody
        stop.put()

        # 3. decrement the counter
        counter = memcache.decr(sid)

        # if we've completed the scraping, create a task to
        # piece the results together.
        if counter == 0:
            task = Task(url='/aggregationSMStask',
                        params={'sid':sid,'caller':caller})
            task.add('aggregationSMS')

            # delete the counter for this transaction
            memcache.delete(sid)
 

The task handler, AggregationSMSHandler, does the job of piecing the results together. It relies on the unique SID for a caller's transaction to query the datastore and find the scheduling details.

 
 
class AggregationSMSHandler(webapp.RequestHandler):

    def post(self):
        # extract the task's inputs
        sid = self.request.get('sid')
        phone = self.request.get('caller')

        # sort the results by time to find soonest upcoming stops
        q = db.GqlQuery("SELECT * FROM BusStopAggregation WHERE sid = :1 ORDER BY time", sid)

        # we'll only send the next four stops in the reply message
        routeQuery = q.fetch(4)
        if len(routeQuery) > 0:
            textBody = "Stop: %s\n" % routeQuery[0].stopID
            for r in routeQuery:
                textBody += "Route %s: %s\n" % (r.routeID, r.text)
        else:
            # nothing was stored for this caller's transaction
            textBody = "Doesn't look good... Your bus isn't running right now!"

        # send off the result via the twilio API
        outboundSMS(phone, textBody)
 

Results

This pattern allowed me to almost completely mitigate the DeadlineExceededError exceptions on GAE. I've yet to see a timeout problem inside the app. It's always possible that a single task could take too long, but if a task fails, it will re-queue itself and run again until it succeeds.

It's worth pointing out that another use of the Task Queue that I relied on, but didn't show in the code snippets, was for other repeated, remote tasks. For example, when I interface with the Twilio API, I do that by spawning a task to do the work. Likewise, I log Twilio events in the datastore on their own task queue as well.

Filed under  //   google   madison   programming   projects   software  

Comments [9]

Kids building software (@emmatracy)


I was smitten when my daughter told me she had an idea for a web application. We were at the book store and were talking about how expensive it can be to buy books. She thought it would be a great idea to create a website where her friends could list the books they each had and then set up book swaps. And poof! It was born...

Here are the mockups she drew today.


Filed under  //   kids   software  

Comments [5]

Posterous API: A wish list


I've been building a number of different applications with the Posterous API over the last few months. I've written about most of these experiences here before.

I've used the reading API primarily within Sharendipity, but have also used the posting API with the Ringerous application that lets you post to your Posterous by phone.

The API is very easy to use and works as advertised. And while the simplicity still offers plenty of access to the Posterous platform, I think there are some really wonderful opportunities if the API continues to evolve and adds functionality.

Posterous is proving to be the Uber-Twitter platform, and one of the ways to accelerate the growth and diverse uses is through advanced applications that interface with Posterous content in new ways.

Based on my experience with the API, here's what I'd like to see be improved:

OAuth support for user authentication

This is by far the most important feature and is really a core requirement for any application using the posting API.

One of the big problems when I built Ringerous was that it required every user of the service to give me their password. I have to post on their behalf on the backend so there is no way to prompt for a password while the user is using the app. Giving up a password is a tall order.

OAuth makes this problem go away.

A search API

Content is king. But only if you can find it. Posterous is a constant stream of great new and timely posts. For applications that are not revolving around a single user, a search API is needed.

A public timeline feed

I'd like to see Posterous add an "explore" call that is equivalent to http://posterous.com/explore/ which is available via the web today.

This page is a fun way to explore the diverse body of content being shared every minute by the Posterous community.

A user subscription feed

Similar to the explore feed, it would be nice to provide an API call for the user subscription feed so it would be possible to present a user's Posterous network of blogs.

Enable granular control for autopost

Currently, the autopost feature is either on or off when posting through the API. Just like email, there are use cases where the user may want more control over the services being updated. It would be nice to provide this feature in the posting API.

Hook up Posterous notification emails

This is likely an easy one. For whatever reason, the email notification system for subscribers is not hooked up for posts that come via the API. This significantly limits the communication benefits of Posterous for group sites.

Provide access to user profile data

The API is missing a user profile call. Providing access to details about the user such as description, thumbnail, and favorites adds a personal touch, a sense of community, and a method for exploring when third-party applications need to provide content navigation tools.

The current API is a great start, but it is clearly geared toward mechanical tasks. It's no mystery that so many of the existing API implementations are utility tools for porting blog content from other vendors.

But there are great opportunities for Posterous and its developer community to add new functionality and experiences to the platform.


Filed under  //   posterous   software  

Comments [1]

Five uses for Ringerous


To celebrate being named one of the Best New Mashups over at Programmable Web this week, I'm going to list the top five use cases for Ringerous.

Family blogging


This was the original intention of the service. My extended family uses Posterous to share news, photos, and video with one another all over the country. I wanted to get the youngest and oldest in the family involved as well without the need to email.

Now my kids are calling in to the blog from their sporting events and from the backyard to announce their latest and greatest personal achievements. All in their own voices.

Mobile blogging (for the rest of us)


There are a couple billion people in the world with mobile phones and only a fraction of them are smart phones. Ringerous has proven to be a great mobile blogging tool for the rest of us.

Podcasting in the classroom


In classrooms, Posterous can be a great resource for collaboration projects. When those projects involve storytelling, interviews, or reporting, Ringerous is a good medium for recording and sharing them.

Combine this with the drop-dead simple podcasting you can do with iTunes, and you get a great distribution model for the students and teachers as well.

Bring emotion to your posts


Blogging is a great way to communicate and share stories, but text and pictures often don't tell the whole story. Sometimes, there's no better way to capture the excitement (or despair) of a moment than hearing the voice of friends and loved ones.

Public voicemail


I'm waiting for someone to create topical, public voicemail boxes with Ringerous. Perhaps an inbox for Santa so you can tell him what you want for Christmas!? :)

How are you using Ringerous?

http://www.ringerous.com

Filed under  //   posterous   ringerous   software  

Comments [0]

Mini YouTube TVs


Sharendipity + YouTube Data API = Creative Goodness

Sharendipity is an awesome way to tap into your favorite web services. I tapped the YouTube API to create this fun little TV showing the Muppets Studio videos.


Want your own TV? Go create your own and set the channel and skin and then embed it on your site. Interested in something else? Let me know - I'm always looking for fun projects to work on.

Filed under  //   google   programming   sharendipity   software  

Comments [0]

Google App Engine - a first timer's experience


I discovered Google App Engine by accident several months ago when I first looked into building a robot for Google Wave. It was very much a bookmark-and-move-on kind of an introduction.

I eventually did get back to the bookmark and explored GAE some more and have become a huge fan. For starters, it is very much in the spirit of our mission at Sharendipity - providing tools that make it easier for everyone to create custom web applications.

App Engine still requires app creators to know how to program, but it provides an awesome infrastructure for deploying and scaling applications on the web. Without spending a penny, developers get all sorts of goodies including...

  • A data store for easy database creation
  • Built-in user management using standard Google accounts
  • Built-in logging
  • Cron jobs to manage scheduled tasks
  • Task queues to schedule and manage autonomous jobs
  • An application dashboard for analytics and viewing of application data

With a (free) daily quota of 1.3M requests per application, App Engine is a great way to start a new product. As your product grows, you can move into billable services to increase your quotas.

My Experiment

I needed to find an application to build that met the following criteria...

  • Limited amount of new programming since my time is overbooked already.
  • Enough complexity that I could explore App Engine features beyond the "Hello World" tutorial.

So I decided to port an existing service that I had built in grad school - an email distribution list for the Astronomy Picture of the Day (APOD). Previously, this was being hosted using my alumni account at the University of Wisconsin, Madison.

The APOD email service proved to work great because it fit both criteria. There was very little new programming to do since I'd already built it once. And it let me explore several elements of programming within App Engine including...

  • The use of webapp - their web application framework for templating and handling requests.
  • The creation of taskqueue tasks to throttle outbound emails.
  • The use of the datastore to manage email subscribers.
  • The use of cron jobs to schedule the daily APOD emails.

The Hangups

The two challenges up front were learning Python plus the App Engine environment (including the APIs for the various services I needed). But the documentation for both is so thorough that it rarely held me up.

The quirks that actually caused friction were:

  1. The subtleties of the App Engine platform itself that are learned through trial and error.
  2. The non-deterministic nature of its performance.

This latter issue is the one thing that should give pause to anyone deciding to build out a business on top of the platform. However, I tend to be optimistic about this and assume it will improve as it matures.

In the meantime, however, I found myself managing bad performance in App Engine itself rather than optimizing my own code. The code is too simple to be slow! One of the overriding quotas for App Engine is the per minute CPU quota. You have somewhere less than 30 seconds to complete a request. And while you wouldn't want to take anything near that for a web request, it becomes a little constraining for non-web requests like cron jobs and taskqueue jobs.

All of the jobs in the APOD application are small and constrained. Parse HTML, send an email, or loop through a list of email addresses. Yet, the time it takes to execute these changes wildly from day to day.

When the execution time exceeds the quota, you need to be prepared to manage the exception everywhere. When it happens in a taskqueue job, it can be particularly annoying since the task will re-queue itself - even if the meat of the job had already been completed.
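
One way to take the sting out of a re-queued task is to make the job safe to run twice. The sketch below is a minimal illustration and not the code from the APOD app: the `completed` set stands in for a marker you would keep in the datastore or memcache, `sent` stands in for the real side effect of sending an email, and `send_once` and the job id format are names I invented for the example.

```python
completed = set()   # stand-in for a persistent "this job is done" marker
sent = []           # stand-in for the real side effect (sending an email)


def send_once(job_id, address, send=sent.append):
    """Run the side effect at most once per job_id, even if the task retries."""
    if job_id in completed:
        return False            # a retry of work that already finished
    send(address)
    completed.add(job_id)       # record success only after the work is done
    return True
```

The marker is written after the side effect, so a failure between those two lines can still produce a duplicate. This narrows the window rather than closing it.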

After I initially deployed the app, it felt a lot like I was patching holes for a boat that was already in the water. I added more instrumentation and caught more exceptions until I mitigated all of the problems.

The most glaring problem appears to be in the Mail package. Sending email will frequently lead to DeadlineExceededError exceptions. Remote calls in a throttled environment like this should always be asynchronous.

It appears that they've done just this with remote HTTP requests. However, one of the subtle problems I had was the intermittent failure of urlfetch() calls. I've seen as many as 20% of these calls failing with DownloadError exceptions. As a result, I've built in my own retry mechanism wherever urlfetch is used.
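
A retry wrapper along these lines does not need much code. This is a generic sketch rather than the exact mechanism in the app: `fetch_with_retry`, its parameters, and the default of three attempts are all assumptions for illustration, and the fetch function is passed in so the snippet runs outside App Engine.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=0.0, retry_on=(Exception,)):
    """Call fetch(url), retrying on the given exception types.

    Returns the first successful result, or re-raises the last error
    once all attempts have been used up.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on as err:
            last_error = err
            # pause before the next try, but not after the final failure
            if delay and attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

On App Engine the call would presumably look like `fetch_with_retry(urlfetch.fetch, scheduleURL, retry_on=(urlfetch.DownloadError,))`, which keeps the retries limited to the intermittent download failures and lets other errors surface immediately.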

What's Missing

App Engine is awesome in its overall breadth and ease of use. But if I had to come up with a wish list, it would be the following...

  • An asynchronous Mail package
  • Better SDK tools for testing and simulating cron jobs and Mail actions.
  • As high as some of the quotas are, the email rate quota is too low (only 8 emails/minute). There is likely a very real concern about spam bots, but perhaps there could be an authorization process so legitimate applications could get higher quotas.
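
Until the quota changes, the subscriber list has to be spread out over time. Here is a rough sketch of the arithmetic, assuming the 8 emails/minute figure above; `schedule_batches` is a hypothetical helper, not part of any App Engine API, and it only computes delays without sending anything.

```python
def schedule_batches(addresses, per_minute=8):
    """Split addresses into batches, each paired with a delay in seconds.

    The first batch goes out immediately and every later batch is
    pushed back by one more minute.
    """
    batches = []
    for start in range(0, len(addresses), per_minute):
        countdown = (start // per_minute) * 60
        batches.append((countdown, addresses[start:start + per_minute]))
    return batches
```

Each `(countdown, batch)` pair could then become one task, with the delay passed as the task's `countdown` so the queue holds it back until its minute comes up.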

The App Engine is a great way to quickly explore new web application ideas. With an easy to use SDK, push-button deployment, and a wide array of built-in services, there has never been a better time to be a programmer.

Interested in Astronomy? Sign yourself up to receive the APOD picture each day - http://apodemail.appspot.com!

Filed under  //   programming   projects   software  

Comments [2]

Did you save your company?


Apparently Evan Weaver did. That's one heck of a way to start a conversation if you're looking for a job.

Filed under  //   observations   software  

Comments [0]

Take your Posterous blog anywhere


What I'd really like to do (some day) with the Posterous API is build a Brizzlyesque client for surfing Posterous blogs. But given the complete lack of free time in my life, I don't see that happening anytime soon.

But... I can whip out applications using the API inside of Sharendipity with remarkable ease. I've shown this before with games, slideshows, and other general goodness.

This time I created a widget that lets you take your entire Posterous anywhere. This widget has a nice vertical scroll effect to navigate all of your posts. Each post can be seen in its entirety if you click on the entry's panel.

Want your own? Use the customizer and configure your own Posterous hostname. Then use the embed code to insert your Posterous content anywhere!
 

 
Thanks to Dale for creating the Sharendipity framework for this widget.

Filed under  //   posterous   sharendipity   software  

Comments [0]

Reflecting on Mint's $170M exit


Mint, a web-based solution for personal money management, was sold to Intuit for $170M this week. A remarkable feat for a company that is just three years old.
 
In a blog post discussing the acquisition, Mint CEO and founder Aaron Patzer reflects on how Mint was started, and poignantly highlights what might be considered the obvious in hindsight. That is, Mint set out to fix a very specific problem.

So that’s the Mint story. $0 to $170m in three years flat. While everyone else was doing social media, music, video or the startup de jour, we tried to ground ourselves in what any business should be doing: solve a real problem for people. Make something that is faster, more efficient, cheaper (in this case free), and innovate on technology or business model to make a healthy revenue stream doing it.

The trap for most is coming up with an idea first, and building the case for the problem being solved second. Order matters.
 
You can read the full post here.

Filed under  //   software   startups  

Comments [0]

Access should be easy


I was in Menards the other day and feeling lost. I normally shop at Home Depot. It was new for me, but it's a hardware store. Some things should be easy.

When you're looking for shower heads, you go to the bath aisle. When you're looking for fasteners, you go to the fastener aisle. You might need to noodle around for a while, but you never need to leave these aisles. Home Depot gets this.
 
Menards actually has multiple fastener aisles. Oh, you need a washer for that bolt? Well that's in the next aisle.
 
Access should be easy.

  • Departments (web pages, dialog boxes, features) are not places where you go and wait for help. They should be places where you go to get things done yourself.
  • Put the help where the people (users, customers, eyeballs) need help. Not at some hard-to-find base, web page, or dialog box.
  • Putting an elevated floor in a big-box store (hiding features in unknown menus) is neither an efficient use of space nor innovative. It's just confusing and impossible to get to with your cart. Don't change things in the name of innovation. Strive to meet expectations and solve problems. Sometimes this leads you to innovations.

Filed under  //   design   observations   software  

Comments [1]