
I've been exploring more of the Twilio API recently. Since creating Ringerous and SMSMyBus, I've begun looking into the transcription services. But I immediately ran into a challenge. The transcription service was never designed to be synchronous. So although the recording of a call is immediately available, the transcription appears to be intended for offline processing.

You can get a lot of mileage out of this, but there are applications where it would be nice to have access to the transcription immediately, especially when accuracy is important. I was able to use the following pattern to achieve a true synchronous voice transcription...

1. Inbound call handler

The main handler for the initial inbound phone calls produces simple TwiML that prompts the caller and records what they say. Note that transcription is enabled and I specify a separate callback (which is asynchronous).
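Here's a minimal sketch of the TwiML this step could return, built with Python's standard library. The prompt wording, the 30-second limit and the two callback URLs are my own placeholders; `action`, `transcribe` and `transcribeCallback` are the Record verb attributes I'm relying on.

```python
import xml.etree.ElementTree as ET


def inbound_call_twiml(recording_url, transcription_url):
    """Build TwiML that prompts the caller and records with transcription on."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = "Please say your message after the beep."  # placeholder prompt
    ET.SubElement(response, "Record", {
        "action": recording_url,                  # synchronous recording callback
        "transcribe": "true",                     # ask Twilio to transcribe
        "transcribeCallback": transcription_url,  # asynchronous callback
        "maxLength": "30",                        # assumed limit, in seconds
    })
    return ET.tostring(response, encoding="unicode")
```

Calling `inbound_call_twiml("/recording", "/transcription")` gives you a Response document you can return straight from the inbound handler.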

2. Recording handler

The recording handler processes the callback when the recording has completed. This is synchronous and thus occurs while the user is still on the original call. You can do whatever you'd like with the recording, but note that the transcription is not ready yet.

The most important thing to do during this step is to record the active CallSid so you can reference it in step three.

You can do whatever you'd like with the caller at this point, but the bottom line is they need to wait for the transcription to complete. So one thing to do is simply play them some pretty music. :)
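A rough sketch of what the recording handler could look like follows. The `active_calls` dictionary is a hypothetical stand-in for whatever storage you use (datastore, memcache, etc.), and the hold music URL is a placeholder. `CallSid` and `RecordingUrl` are request parameters Twilio posts to the Record action URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory store; a real app would use the datastore or memcache.
active_calls = {}


def handle_recording(params, hold_music_url):
    """Remember the active call, then park the caller on hold music."""
    call_sid = params["CallSid"]
    active_calls[call_sid] = {"recording_url": params.get("RecordingUrl")}

    response = ET.Element("Response")
    # loop="0" asks Twilio to keep repeating the audio until the call changes
    play = ET.SubElement(response, "Play", {"loop": "0"})
    play.text = hold_music_url
    return ET.tostring(response, encoding="unicode")
```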

3. Respond to the transcription

When Twilio's transcription engine completes, you'll get the callback you specified in the record verb in step one. Take a moment to store away the transcription text.

Now use the stored CallSid from step two along with the REST interface to interrupt the caller that is listening to that pretty music you're serving up for them. Specify yet another handler URL when you interrupt the call. You can serve up brand new TwiML that way.
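As a sketch of this step, the code below stores the transcription and builds (but does not send) the REST request that redirects the live call. The `transcriptions` dictionary and both function names are hypothetical. The endpoint shape and the `Url` and `Method` parameters reflect my understanding of the 2010-04-01 REST API, so check them against the version you're using. The request itself would be a POST authenticated with your account SID and auth token.

```python
from urllib.parse import urlencode

# Hypothetical store for finished transcriptions, keyed by CallSid.
transcriptions = {}


def handle_transcription(params):
    """Store the transcription text; return the CallSid, or None on failure."""
    if params.get("TranscriptionStatus") != "completed":
        return None
    call_sid = params["CallSid"]
    transcriptions[call_sid] = params.get("TranscriptionText", "")
    return call_sid


def build_redirect_request(account_sid, call_sid, interrupt_url):
    """Return the (endpoint, form body) pair for redirecting a live call."""
    endpoint = "https://api.twilio.com/2010-04-01/Accounts/%s/Calls/%s" % (
        account_sid, call_sid)
    body = urlencode({"Url": interrupt_url, "Method": "POST"})
    return endpoint, body
```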

4. Interrupt handler

The interrupt via the REST interface gives you an opportunity to treat the caller to any workflow you'd like. This is a great chance to ask them if the transcription is correct and gather their input - either via voice or keypad - and react accordingly. In this case, I use the Say verb to read back the transcription I've stored in step three.
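Here's one way the interrupt handler's TwiML could be sketched. It reads the transcription back with Say and uses Gather to collect a single keypress. The prompt wording and the `verify_url` parameter are assumptions of mine.

```python
import xml.etree.ElementTree as ET


def interrupt_twiml(transcription_text, verify_url):
    """Read the transcription back and ask the caller to confirm it."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "I heard: " + transcription_text
    # Gather posts the pressed digit to verify_url
    gather = ET.SubElement(response, "Gather",
                           {"action": verify_url, "numDigits": "1"})
    ET.SubElement(gather, "Say").text = (
        "Press 1 if that is correct, or 2 to try again.")
    return ET.tostring(response, encoding="unicode")
```

ElementTree escapes characters like `&` for you, which matters because transcription text is arbitrary.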

That's it. Synchronous voice transcription. Now all you need to do is add your favorite API to the verificationHandler so you can hook it up to another app like Twitter, Posterous, Google Search, etc.

Do you know of an easier way to do this? If so, please share...

Filed under  //   programming   twilio  

Comments [0]

What a great couple of weeks for the tech scene in Madison! The Forward Technology Festival, which included High Tech Happy Hour, the Forward Technology Conference, and a Capital Entrepreneurs open house, was an awesome display of Madison's technology community coming together to showcase itself.

The bookend for the festival was BarcampMadison, which I helped organize. It's the kind of event that never really comes together until it starts so I was never bubbling with confidence. But that's why it was such a proud moment for Madison. The folks that turned out self-organized, participated, taught and listened. They demonstrated that there is an active and vibrant tech community working on some very cool projects.

We estimated that there were around 100 participants. A great showing for a sunny Saturday in August. With seven simultaneous session tracks and seven time blocks, we had nearly 50 presentations. Most of which were driven by group dialog and discussion. Topics included "Bootstrapping your company", all-things Drupal, "Intellectual Property 101", "RFID green shed", "Making fortune cookies", "DIY Liqueurs", and "Building your own search engine".

My best memories from the event included the interest level in the community related sessions. The guys from Sector67 had overflowing session rooms when they talked about their emerging hacker/maker space in Madison. And there was a big turnout and active discussion around how to reboot the web608 group. These are both great indicators for the demand in Madison to continue to grow and evolve community organization among technology enthusiasts.

Another great memory was the lightning talk session at the end of the day. With the help of some beer - sponsored by our friends at Urban Land Interest - we had rapid fire demos of projects and businesses that folks were working on. Anyone could get up and talk for five minutes and it generated a lot of great discussion and laughs. A terrific way to end the day.

And it would be a colossal mistake to forget our sponsors. Through their support - which ranged from financial assistance to free access to office space - we were able to create an easy, walk-in environment where campers could simply focus on learning and sharing for the ultimate price of FREE. Thank you to...

So I'm not only proud of the participants for putting together a great event experience, I'm thrilled to see that the businesses in Madison see the importance of making events like this happen.

A personal thanks to the other BarcampMadison organizers... Phil Crawford, Greg Tarnoff, Brad Grzesiak, Andrew Shell, Preston Austin, Jonathan Yankovich. These events never happen without a lot of volunteer work.

If you're looking for more artifacts from the event, check out the Flickr group as well as our Milwaukee friend, Pete Prodoehl, who collected a lot of video and audio footage. If you missed the Madison event, make plans for BarcampMilwaukee, October 2nd and 3rd!

Comments [5]

With the demise of my marathon aspirations this year, I set my sights on a different test when I visited my parents in Bristol, Rhode Island last week. I have long wanted to swim around Hog Island, which you overlook every time you visit their house. It's 3.5 miles, but the tougher test is managing the currents that move around the island.

In the end, I decided to scale back and simply swim around the Hog Island lighthouse that sits off the shore. In a straight line, we measured it to be two miles round trip. And the theory was that if I went at or near low tide, I could finish most of it with a slack tide and avoid fighting any current.

My sister and eldest daughter were kind enough to escort me across the channel in the dinghy. As it turned out, I needed it! There was a boat towing that tried to skirt right in front of me (see pictures) and I'm not sure they would have ever seen me without them.

The swim felt great physically, but the mind plays ugly tricks on you when you can't see anything and you swim through slimy jellyfish. Seems kind of fun to stick your hand in a jar of jelly, but when you're going through pockets of hundreds and you feel them all over your body, you can't help but shiver. And it's impossible not to start thinking about the other creatures that could come up for a visit!

The only other challenge was getting around the lighthouse itself. In spite of our tidal planning, I still ran into a stiff current between the lighthouse and the island. At one point, I stopped and looked over at the dinghy where my sister greeted me with, "Yah. You aren't moving anywhere right now."

After a two hundred yard sprint to get around the lighthouse, it was a quick swim home. Now I'm eying a circumnavigation of Hog Island next year.

Thanks again to my sister and daughter for the escort. The map below is from the My Tracks app from the dinghy (which didn't stay next to me the entire time).

       

Filed under  //   exercise   travel  

Comments [0]

It's August... the bad news is that summer is almost over. The good news is that BarcampMadison is almost here!

I've been helping with planning and organizing of the event, but it dawned on me that I had yet to post something about the event here on my own blog. So here are the details...

When: August 28th, 2010
Where: 1 S. Pinckney St. 9th Floor (US Bank building on the square)
What: Barcamp is a spontaneously self-organized conference where presentations and sessions are determined based on the participants at the event. That means you get a chance to teach and learn about all kinds of interesting topics.

There is an idea board where people have been posting session ideas. It gives a nice snapshot of what to expect. You can check it out here - http://bit.ly/ddiYdI

If you're joining us, please sign up at eventbrite - http://barcampmadison.eventbrite.com

And to learn more, check out the main Barcamp event site - http://barcampmadison.com

Hope to see you there!

Comments [0]

Over the last couple of months I've been dabbling with a home-spun API for the Madison Metro transit system. It is based on the work that I did building SMSMyBus, the mobile app that lets you find bus arrival estimates in real time using Twilio's SMS API. When I built that app, I simply scraped data from the Metro's website. 

But when I was finished, it immediately became clear that there were lots of other applications that needed to be built using this (and other) transit data. A good example is the status monitor on display at the Mother Fool's coffee house. So I set out to build an API that would enable anyone to build new transit apps without having to implement the ugly screen scraping techniques. After documenting a draft of the API, I set off trying to implement the server.

Just as I did with SMSMyBus, I built the API server on Google App Engine. After grinding through the pains of scraping poorly formed data on the Metro site, I immediately ran into performance problems and started blowing through my quota. And I did this without even turning on all of the routes in the system. I was forced to actually study the GAE APIs in more detail.

This post is intended to share the experience of tuning the performance, how I measured bottlenecks, and what I did to fix them.

Problem definition

The heart of the API is the continual consumption of a fire hose of data at Madison Metro. This was accomplished using a list of cron jobs that scrape location information for every route in the city.

The prefetch handler parses a text file for an individual route that is read from a URL. Each entry is categorized by stop ID or vehicle ID and the arrival estimate and status models are created accordingly. In the case of the longer routes, this could be as many as 480 status entries.

Step one : Identifying the bottlenecks with the Quota API

The first version of the prefetch routine focused exclusively on just getting my hands on the useful information. I didn't pay any attention to the use of datastore calls. I just wanted to get the regular expressions right, create new model entities and shove them in the datastore.

This approach worked just fine on the local dev server, but I was consistently hitting DeadlineExceeded exceptions in the production environment. I needed to identify which elements of the prefetcher were expensive. Enter the Quota API...

    import logging

    from google.appengine.api import quota

    # CPU used so far by this request, in megacycles
    start = quota.get_request_cpu_usage()
    # do all the magic
    end = quota.get_request_cpu_usage()
    logging.warning("The magic took %s megacycles" % (end - start))

Assuming GAE was behaving correctly, I started with my own code, the parsing of the fire hose. That did not prove fruitful so I started to add up all the time I was spending in API calls. Although individual calls were small, they quickly added up over hundreds of calls. It became obvious that this type of serial access to the datastore was a contributing factor.

Step two : Batch puts

Previously, the apps I had been building did not require bulk datastore updates so I glossed over that section of the API. But it's easy to overlook. The db.put() function supports a list as an argument. So rather than storing new entities one at a time with status.put(), I collected a list of model instances in the main loop and followed it with a single db.put(statusList).

This had a dramatic impact on the overall performance. The time it took to loop through the route file went from 12,000 megacycles to 2,000 megacycles (!).
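To make the batching idea concrete, here's a small sketch that isn't tied to App Engine. The `put_in_batches` helper is hypothetical. On App Engine you would pass `db.put` as `put_fn`. The default batch size of 500 is my assumption about the per-call limit, so adjust it to whatever your environment allows.

```python
def put_in_batches(entities, put_fn, batch_size=500):
    """Store entities using as few calls to put_fn as possible.

    put_fn is any callable that accepts a list of entities
    (for example db.put on App Engine). Returns the number of calls made.
    """
    calls = 0
    for start in range(0, len(entities), batch_size):
        put_fn(entities[start:start + batch_size])
        calls += 1
    return calls
```

For a long route with 480 status entries, that means one datastore call instead of 480.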

Step three : Memcache tricks

Even after the improvement, I still noticed that I was spending just as much time parsing the file as I was storing results, and that didn't seem possible. After more quota probing, I discovered that I was fetching the same StopLocation entities repeatedly when I needed to find details about a particular stop.

However, when I started memcaching these entities I was disappointed by the overall performance improvement. And this led to perhaps the most revealing aspect of this entire process. The following memcache pattern is slow... 

    from google.appengine.api import memcache

    # loop over hundreds of entries in the fire hose
    for e in firehose:
        stopID = getStop(e)
        # one memcache round trip per entry, even on a cache hit
        stopEntity = memcache.get(stopID)
        if stopEntity is None:
            stopEntity = getItFromDatastore(stopID)
        # parse firehose entry and do other magic


In fact, very slow when you aggregate it over hundreds of accesses. Even if you get a 100% hit rate in the memcache.

The good news is that it led me down another useful API path for the memcache. Just like the batch puts for model entities, you can set and get lists of objects. Furthermore, you can get and set multiple key values with one call - set_multi() and get_multi(). 

This again led to dramatic performance improvements. The time it took to loop through the route file went from 2,000 megacycles (after the batch put optimization) to 600 megacycles. 
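Here's a sketch of the batched version of that loop. `FakeCache` is a local stand-in I wrote so the example runs anywhere. It mimics only the `get_multi()` and `set_multi()` calls, and on App Engine you would pass the `memcache` module instead. `load_stops` and `fetch_from_datastore` are hypothetical names.

```python
class FakeCache(object):
    """Local stand-in exposing get_multi/set_multi like App Engine's memcache."""

    def __init__(self):
        self.data = {}
        self.calls = 0  # count round trips so the savings are visible

    def get_multi(self, keys):
        self.calls += 1
        return dict((k, self.data[k]) for k in keys if k in self.data)

    def set_multi(self, mapping):
        self.calls += 1
        self.data.update(mapping)


def load_stops(stop_ids, cache, fetch_from_datastore):
    """Resolve every stop ID with one cache read and at most one cache write."""
    unique_ids = list(set(stop_ids))
    stops = cache.get_multi(unique_ids)
    missing = [s for s in unique_ids if s not in stops]
    if missing:
        # fetch_from_datastore takes a list of IDs and returns {id: entity}
        fetched = fetch_from_datastore(missing)
        cache.set_multi(fetched)
        stops.update(fetched)
    return stops
```

The idea is to collect the stop IDs from the whole fire hose first, call `load_stops` once, and then do the per-entry parsing against the returned dictionary.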

Step four : Install Appstats

I still wasn't happy with my performance so I went fishing for more resources and found Appstats. Truly a great resource, but it's something that should be used at the beginning of the process and not the end. :) It gives you a quick overview of how efficient you are (or aren't) being with the datastore. In my case, I had already optimized as best I could, but this tool is now in the front pocket for all future projects.

Lessons Learned

  1. Become familiar with the quota package. It's the single best way to get granular measurements about where your app is spending its time. I'd link to the API, but I can't!?! It seems to be the only GAE API that isn't documented. The best resource I've seen is the monitoring section found on the platform quota page.
  2. Install Appstats event recorders in every GAE application you create. It's the quickest way to learn - at a high level - how effective your memcaching strategy is.
  3. Familiarize yourself with the entire suite of functions in the APIs you are using. Even if you don't know the details of every call, it will help your engineering when you can recall every tool in your toolbox.
  4. Memcache is your friend. Your best friend actually. This might be stating the obvious, but stated nonetheless.
  5. Memcache access is still slow when you aggregate lots of accesses. Use batch processing - even for the memcache.
  6. The taskqueue is the best way to cheat the thirty second handler limit. I didn't talk a lot about this specifically, but if you chunk up problems into smaller ones, the task queue can help you overcome the inherent time restrictions within GAE.
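Since I didn't show the chunking from the last lesson, here's a rough sketch of the idea. The `/prefetch` URL, the chunk size and both function names are hypothetical. On App Engine, `add_task` would be `taskqueue.add`, which accepts `url` and `params` keyword arguments.

```python
def chunk_routes(route_ids, chunk_size):
    """Split the route list into lists of at most chunk_size routes."""
    return [route_ids[i:i + chunk_size]
            for i in range(0, len(route_ids), chunk_size)]


def enqueue_prefetch(route_ids, add_task, chunk_size=5):
    """Queue one task per chunk so no single handler does all the work."""
    chunks = chunk_routes(route_ids, chunk_size)
    for chunk in chunks:
        add_task(url="/prefetch", params={"routes": ",".join(chunk)})
    return len(chunks)
```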

The part I left out... even the best effort can't overcome a bad idea

Even after all of this analysis, tweaking and optimization, I wasn't quite at my goal. I've optimized the primary worker loop, cached all the data that wasn't changing, and optimized the expensive operation of storing new data. Things were better, but they were also more spread out. Instead of lots of small costly operations, I had a handful of still costly operations. And operations that were now outside my control.

Running this algorithm for 80+ routes still wasn't feasible without considerable hiking of my billable quota. I've resigned myself to believing that although App Engine is a terrific tool for building web apps, it was never intended to build apps like this. I remain skeptical that they've resolved their datastore performance problems as they've stated they have.

It's also worth pointing out that there is a huge knowledge void when it comes to the Quota API. It is not documented well - both in terms of its use as well as an understanding of how to interpret the results.

I'm going back to the drawing board to determine if there is a better way to access and cache this transit data. Who knows, maybe Madison Metro can save all of us the trouble and just build an API for the existing data! Wouldn't that be groovy?

Filed under  //   appengine   programming   projects   software  

Comments [9]

I opened the New York Times yesterday and said to myself, "I've seen this picture before." A tech leader hovering over a laptop with his child. Granted, I'm no Scott McNealy and Sharendipity is no Sun Microsystems. But I hope to see many more pictures like this in which technologists are using their craft to make education better.

   

"$200 Textbook vs. Free. You Do the Math" -- Ashlee Vance, New York Times. Photo credit: Peter DaSilva.

"Sharendipity aims to help Web creators of all ages" -- Kathleen Gallager, Milwaukee Journal Sentinel. Photo credit: Joe Koshollek.

 

 

Filed under  //   kids   sharendipity  

Comments [0]

Over the last few days, I've been camped out at a local pool for Madison's annual All City Swimming Championships. Twelve teams and 1,703 swimmers. A fun but exhausting couple of days. Managing your kids is a two-part challenge. The first part is actually getting them to the block on time for their race. The second part is trying to sort through the 100+ swimmers in the event to determine how well they did.

So I set out to solve the latter problem. I created an SMS notification system that sent text messages to parents letting them know what their swimmer's time was and more importantly, overall rank (note there are as many as 20+ heats for some events so you have no idea if they've qualified for the finals by simply watching). If a swimmer had a rank in the top sixteen, they'd have the opportunity to swim in the finals on Saturday.

As soon as the scorer's table has scored an event, they post the results to the web which was the trigger for the app to notify parents. It turned out to be hugely popular. So much so that I think there's a great opportunity to monetize it for next year's event. Parents at the meet and at work loved the little notes about their swimmers. Here's what one of my notifications looked like...

Tracy, Anna finished event 21 in 40.26, ranked 142

As fun and useful as the app was, the most geeky and interesting element was actually the debugging part. The problem I faced was that I had no good way to test the app ahead of the meet. I had a pretty good idea how the meet scorers would format the data posted to the web, but there was no certainty. I also didn't have a lot of confidence that the app could actually follow the flow of the meet since the events don't go in order during the preliminary heats.

Since I didn't have a laptop or access to the codebase, there wouldn't be a way to triage issues during the meet using traditional debug tools. The solution was to create SMS hooks that let me tweak the app as the meet unfolded. I combined this with the implementation of multiple regular expressions when parsing the results. I built the app on App Engine so I created multiple memcache'd variables that could be controlled via text messages from my phone. 

I coded in the following hooks...
  • The ability to set/reset the swim event being polled so I can re-run (or skip) events if there was a mistake
  • The ability to add new swimmers and phone numbers when parents wanted to be added to the app
  • The ability to disable the entire app. I was paranoid that I'd made a hideous mistake that continuously sent text messages out to users and I wanted a way to shunt the entire app.
  • The ability to query the app to determine the current event being monitored
  • The ability to modify the URL base variable used to find the results on the web
In addition to these hooks being controlled with inbound SMS, I also had a few app events that triggered outbound SMS to my phone so I knew it was behaving correctly. 
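To give a feel for how hooks like these can be wired up, here's a minimal sketch of a command dispatcher for inbound texts. It is not the app's actual handler. The command words (other than `disable`) and the reply strings are my own inventions, and the `state` dictionary stands in for the memcache'd variables.

```python
def handle_admin_sms(body, state):
    """Apply one admin command from an inbound text and return a reply."""
    parts = body.strip().split()
    command = parts[0].lower() if parts else ""
    args = parts[1:]

    if command == "event" and len(args) == 1 and args[0].isdigit():
        state["current_event"] = int(args[0])  # set/reset the polled event
        return "now polling event %d" % state["current_event"]
    if command == "status":
        return "polling event %s, enabled=%s" % (
            state.get("current_event"), state.get("enabled", True))
    if command in ("disable", "enable"):
        state["enabled"] = (command == "enable")  # the kill switch
        return "app %sd" % command
    if command == "add" and len(args) == 2:
        swimmer, phone = args
        # a set makes a repeated "add" harmless
        state.setdefault("subscribers", set()).add((swimmer.lower(), phone))
        return "added %s" % swimmer
    if command == "url" and len(args) == 1:
        state["url_base"] = args[0]
        return "url base updated"
    return "unknown command"
```

Storing subscribers in a set means the same text arriving twice doesn't sign a parent up twice.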

The hooks turned out to be invaluable on the first day. Both for keeping the app on the right event as well as adding parents to the app. I didn't actually have to use the emergency kill switch although it was nice to text 'disable' at the end of each day to make sure something didn't happen overnight.

The only downside to these hooks was actually a bug in the Google Voice app on my Droid. I was using a Google Voice number for the inbound events and one out of three texts I sent actually resulted in duplicate messages. The downside was when I added a new user, it added them twice (resulting in duplicate notifications when that swimmer swam!) and when I reset the event number, it reset it twice. 

Here's a look at the main handler for the inbound Twilio messages...

I forgot to take a picture in front of the results board at the meet. There's only one results board (for 1,700 swimmers), and each event is printed with a 10 point font. Most parents received their notification messages thirty minutes before the results board was updated!

Filed under  //   appengine   programming   projects   software   twilio  

Comments [4]

I just scrolled through some history of this blog and ran across a bunch of older posts on my marathon aspirations, and it dawned on me that I've yet to report on the lousy news that I've shut it down for the year.

Everything was moving along swimmingly until the end of May when I basically woke up one day to a bad knee. I have no explanation for it, but after taking six weeks off and trying to ramp again, I simply can't get it better. And I was fighting a shrinking calendar. So instead of getting carried off the course on a stretcher I simply decided to push this off a year and get some treatment on the knee.

If you look back at the last marathon update post, you'll be able to laugh at my own prediction for failing on this mission - "staying healthy".

The flip side of this, however, is that the friend that had originally challenged me to the marathon in the first place can't say he isn't ready. He'll now have over a year to prepare for it. :)

If anyone is looking for a bib to the sold-out Twin Cities Marathon on October 3rd, email me!

Filed under  //   exercise  

Comments [0]

You can find the original post here... http://www.techinmadison.com/greg-tracy/

Comments [0]

Especially of the flavor that isn't self-serving. The world needs more people like this guy. Passionate believers that are motivated by what feels right to them at the moment. Sometimes those ideas are right and sometimes they're wrong, but it shouldn't stop you from trying.

There are organizations and activities everywhere that need people to step in and lead a dance. Go find your music and don't worry about the naysayers. Because when you're right like this guy is, you make the world a better place.

UPDATE: There's a nice talk from Derek Sivers at TED that uses this same video to highlight the fact that sometimes it's just as important to be an early follower to make a movement happen. (Thanks for sharing, Gina)


Comments [1]