Bus routes mapping project for Hyderabad

Welcome! Please scroll to the END to catch the latest updates!

Notes dump from 9.Dec.18 outreach session

Meetup link: https://www.meetup.com/swechafsmi/events/257026328/

I’m copying here the links and notes gathered at the event on a live etherpad doc. Others please feel free to chip in with writeups etc.

Disclaimer: The herokuapp.com links shared on this page are for demonstration only, and use work-in-progress or flawed data. Please don’t treat anything there as authoritative or proper - it’s not. Also, the open-source versions of those apps will take more time to be made generic and published. When they do get posted, they’ll be put up here: https://github.com/WRI-Cities/

Intro

What we want to achieve with Hyderabad’s bus system:

https://tracker.geops.ch/?z=11&s=1&x=-8227002.0399&y=4972217.8438&l=transport - a visualization of gtfs data for public transport systems in New York City

GTFS open data standard for public transport systems: https://github.com/google/transit/tree/master/gtfs/spec/en

Telangana Open Data Portal

https://www.data.telangana.gov.in/

Runs on DKAN, has open APIs for querying datasets

Routes Data Gathering

https://polar-basin-33741.herokuapp.com/

Stop Names Data Cleaning

Local OpenRefine instance:

192.168.1.74:3333 (this was on local network at the session time; expired now)

Decision-making repo for deciding standardized stop names: https://github.com/answerquest/hyd-stop-names-cleaning/issues

Routes Mapping

https://fuzzymapper.herokuapp.com/

Geo fencing the locations with multiple stops

https://drive.google.com/open?id=1ac6VOmkzaRoXU6f98BfKrVYJmK5HR9nl&usp=sharing

Task: This map has all the depots of Hyd. Pls geofence them all!

GTFS Spec

GTFS Manager developed in my larger project with WRI; I hope to be able to manage the Hyd bus data there once it is made into GTFS:

https://static-gtfs-manager.herokuapp.com/

Loaded with partial, inaccurate GTFS data for Hyderabad

NYC Taxi Vis:

http://chriswhong.github.io/nyctaxi/

RunParticles : open source visualization creator, someone can do this with Delhi’s GTFS data right now.

http://renderfast.com/runparticles/

Delhi’s GTFS dataset:

http://otd.delhi.gov.in

GIS related:

datameet : an all-India mailing list with several GIS-related discussions going on. https://groups.google.com/forum/#!forum/datameet

More Visualisation links:

https://taxi.imagework.com/

https://www.behance.net/gallery/47411555/NYC-taxi-data-visualization-infographic-web-app

http://vgc.poly.edu/projects/taxivis/

Companies, NGOs in Hyderabad doing GIS work

Banyan Nation - http://www.banyannation.com/

Answering a question for the herokuapp links:

1. Routes Info Collector

https://polar-basin-33741.herokuapp.com/

  • For data entry of sequence of stops in each route with depot representatives.
  • And for capturing any other information regarding that route.
  • We had scraped some data from third-party websites, and that is what you see in the CSVs dropdown at the top. But that data wasn’t accurate.
  • So you load a route, then make changes to make it right, then you save it.
  • Then the saved route shows up in the second (jsons) dropdown.
  • This tool saves each route as a .json format file.
  • The second dropdown has depot-wise routes that we did data-entry of and saved.
  • Priority was to have a quick and easy mechanism to take in all the route’s info as it was verbally shared by the depot representative. Hence simple text editors etc.
  • We built support for capturing timings, but given the time crunch we were not able to capture all timings information this time around.

2. Route Mapper

https://fuzzymapper.herokuapp.com/

  • This tool lets you load a route from the dropdown again, but this time not as a textbox, but as a table and map that are interlinked.
  • As the name suggests, we “map” the routes here. We assign lat-long locations to each stop in the route.
  • There is a second table. When you load it, a databank of stops with lat-longs is loaded as yellow dots on the map (you can switch off the layer when you want). This works better when you use the filters.
  • This databank is good, but it has multiple entries for each stop, so disambiguation is needed.
  • This was initially called “fuzzymapper” and it was for mapping a large list of stops that had yet to be mapped. But that changed as we found that:
    a) simply typing into the filters was doing a better job; and
    b) we were able to map stops better when they were in a route than just an arbitrary list of stops.
  • So I changed the program to support route mapping instead.
  • One can edit a route here as well. But it’s a table, so it takes more clicks etc.
  • We had developed this tool first; later, when the session with depot reps was fixed, we created the Routes-Info-Collector program with an interface optimized for fast data entry.

Bulk mapping

  • While the data we’re getting from TSRTC is route-oriented, simply mapping each and every route will have lots of redundancy as there are common stops.
  • We have a large databank of stops collected from multiple sources.
  • I first tried automatically mapping all the stop names from the databank.
  • But that produced very buggy results. For many names there are multiple possible matches in different parts of the city.
  • Then, we mapped one route manually.
  • These manually mapped stops became a new lookup dataset : We could put the same lat-longs for all other occurrences of the same stop in the remaining routes.
  • So a multi-step bulk mapping program was created. First, match with manually mapped stops that are known to be more accurate. After that, for the remaining stops for which no match was found, match with the large databank.
  • A “sanity check” step was put at the end to un-map “jumper” stops : auto-mapped stops that have been placed too far away from their previous and next stop on the route. This removed a lot of the clearly inaccurate matches.
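
The steps above can be sketched roughly as follows. This is only an illustration and not the actual program: the function names, the 5 km threshold, and the simplification that the databank holds one lat-long per stop name are all assumptions made for the example. Here a stop counts as a “jumper” only if it is far from both its nearest mapped neighbours, so the first and last mapped stops of a route are never un-mapped.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def auto_map(route, manual, databank):
    """Step 1: look up manually mapped stops. Step 2: fall back to the databank.

    `manual` and `databank` are hypothetical dicts of stop name -> (lat, lon).
    Stops with no match in either are left as None.
    """
    return [(name, manual.get(name) or databank.get(name)) for name in route]

def unmap_jumpers(mapped, max_km=5.0):
    """Sanity check: un-map stops that sit too far from both mapped neighbours.

    `max_km` is an assumed threshold, not a value from the original program.
    """
    result = list(mapped)
    for i, (name, pos) in enumerate(mapped):
        if pos is None:
            continue
        # Nearest mapped stop before and after this one on the route.
        prev_pos = next((p for _, p in reversed(mapped[:i]) if p), None)
        next_pos = next((p for _, p in mapped[i + 1:] if p), None)
        if prev_pos is None or next_pos is None:
            continue  # endpoints cannot be judged with this simple rule
        if (haversine_km(pos, prev_pos) > max_km
                and haversine_km(pos, next_pos) > max_km):
            result[i] = (name, None)
    return result

# Made-up example: "C" is matched by the databank to a far-away location.
route = ["A", "B", "C", "D"]
manual = {"A": (17.40, 78.48), "B": (17.41, 78.49)}
databank = {"B": (17.80, 78.10), "C": (17.90, 78.90), "D": (17.42, 78.50)}
mapped = auto_map(route, manual, databank)
cleaned = unmap_jumpers(mapped)
```

Checking against the original `mapped` list (rather than the partly cleaned one) keeps the result independent of the order in which stops are visited. Two wrongly placed stops sitting next to each other would slip through this rule, so it is a first filter and not a guarantee.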

Here’s the program, featuring work done on data that we had scraped from 3rd party sources, prior to the recent work with TSRTC depot officials. So please note that the data you’ll see is inaccurate - I’m sharing the program for the code and understanding the flow, and for inviting more ideas.

We ran this in iterations : Manually mapped one route, analyzed the extent of mapping, then manually mapped another and so on. The successive data-visualizations show a clear improvement in completeness of mapping. Mapping one route affects many other routes. Here’s a page showing successive results of 11 routes mapped:

http://nikhilvj.co.in/files/tsrtc/routemapper/scatter-plot-all.html

Q: When is the expected date to complete this?

A: Can’t say for sure right now. When we started out in August, the expected date of completion was the end of October. We weren’t able to get our hands on proper data, but we learned several things along the way. I was able to write a wide range of scripts and generate many GTFS feeds and visualizations, which showed us where the data wasn’t proper yet and where work was needed. Then we got a boost in mid-November with the decision to bring in depot reps. So now we have a clear line to primary, “legit” information.

My parallel focus now (while we’re also doing massive data cleaning and cross-checking with agency officials) is on bringing together all our myriad programs. They currently sit as separate python notebooks and tornado-web apps, some of which I’ve shared above. The aim is to enable continuous incremental improvements to the data and automatic GTFS feed generation, so that there is an iterative cycle of output, feedback, correction and output again. I’ll peg December-end (or maybe the 2nd week of January) as the date for readiness of this programming setup. Jan to Feb 2019 could then be a phase of continuous iterative data enrichment. But I’m not in a position to say that’s when the data will be “ready”. For now,

It’ll be ready when it’s ready!

Of course, this is where the larger Hyderabad tech/GIS community’s involvement would be essential. We will need as many eyes as possible to look at the data and point out where all the changes are needed. I foresee a github repo for collecting feedback. Actually, I’m getting some ideas while writing this itself.

PS: Thanks to Harish for asking this, on OSM Asia Telegram group.

Last week: After collecting all the routes and some days of refining / data cleaning, we printed these depot-wise files of route listings and handed them over to TSRTC officials, so they can review them and give corrections if any, mark wrong routes, and add anything that was left out. (Digital copies were shared too, but hey, it’s a different feeling to have this in hand.)

This was an initial, immediate output we could give to TSRTC: “something tangible” from a 6-day exercise in which they had deputed one representative from each depot to come sit with us for a whole day and enter all their routes.

python-docx module has been used to generate the documents.
OpenRefine is being used to do data cleaning. We have hosted an instance on a DigitalOcean server so that multiple team members can work on the same data in their own time.

Dataset Release: Stop names unique list

17.12.18: After tons of data cleaning on the primary info gathered from every TSRTC depot, we have prepared a listing of unique bus stop names, with additional columns. You can download a copy or submit things there as comments:

Defining the columns:

stop_name_dataclean : stop name
count : number of times this stop is mentioned in the routes data
routes : list of routes the stop appears under
num_routes : count of previous column
depots : list of depots the stop appears under
num_depots : count of previous column

<fields where you can give suggestions>
general area : the broad area that this stop comes in.
precise lat-long : if you have exact lat-long co-ordinates for the stop (one side of the road or the other), share here. You can use tools like [latlong.net](http://latlong.net) to find it.
outside Outer Ring Road ? : put a "y" if the stop is located outside of Hyderabad's Outer ring road.
multiple locations? : put a "y" if the same stop name is repeated at >1 different locations.
remarks : your comments
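
For anyone wanting to reproduce the computed columns (count, routes, num_routes, depots, num_depots) from their own copy of the routes data, here is a minimal sketch. It assumes the routes data can be flattened into (depot, route, stop name) rows, one per stop mention; that input shape, the function name and the sample values are assumptions for illustration, not the format we actually use.

```python
from collections import defaultdict

def unique_stops(rows):
    """Build the unique stop listing from (depot, route, stop_name) rows.

    Each row is one mention of a stop in a route; the input shape is an
    assumption made for this sketch.
    """
    acc = defaultdict(lambda: {"count": 0, "routes": set(), "depots": set()})
    for depot, route, stop in rows:
        entry = acc[stop]
        entry["count"] += 1          # every mention counts, even repeats
        entry["routes"].add(route)   # sets keep routes and depots unique
        entry["depots"].add(depot)
    return [
        {
            "stop_name_dataclean": stop,
            "count": e["count"],
            "routes": sorted(e["routes"]),
            "num_routes": len(e["routes"]),
            "depots": sorted(e["depots"]),
            "num_depots": len(e["depots"]),
        }
        for stop, e in sorted(acc.items())
    ]

# Made-up rows, purely for illustration.
rows = [
    ("Depot A", "R1", "Stop X"),
    ("Depot A", "R1", "Stop Y"),
    ("Depot B", "R2", "Stop X"),
    ("Depot A", "R1", "Stop X"),  # same stop mentioned twice in one route
]
table = unique_stops(rows)
```

Note that count can be larger than num_routes, since a stop may be mentioned more than once within a single route.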

One major task now is mapping, but there’s a lot more that can happen too. We invite your involvement in analyzing this dataset. Below is a list of possible activities:

  1. Mapping
  2. Pointing out different names belonging to the same stop
  3. Pointing out if something is wrong
  4. Visualizations

The spreadsheet is open for comments /suggestions. If you want to participate in a bigger way then please let me know so I can add you as an editor there.

Notes on contributing on the spreadsheet:

  • Only provide lat-longs for a stop if you can say with certainty that’s the stop’s location. First-hand knowledge is preferred over web search. If what you have is the approximate area, write that under the “general area” column.
  • You’re welcome to try bulk geocoding etc but we’ve already tried some of that and the results weren’t too promising. If you get something, upload your own sheet on google drive and share the link. Please maintain the “sr” column to ensure cross-linkage - the names may change.
  • “Abcd” and “Abcd X Road” are assumed to be different, not the same. There are many such cases, but in some of them the two names may be meant to be one and the same stop.
  • If you know a stop is supposed to be named differently, place a comment on its name.
  • We want to identify stops that are in the outskirts of the city. If you know such a stop, mark it by putting a “y” in the “outside Outer Ring Road ?” column.
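
If you want to hunt for candidate name variants like the “Abcd” / “Abcd X Road” case, one simple approach is to list every pair where one stop name is a whole-word prefix of another, and then review those pairs by hand. The sketch below is just one possible way to do that; the function name is made up, and the pairs it returns are only candidates for a human to check, not confirmed duplicates.

```python
def prefix_variant_pairs(names):
    """Return (short, long) pairs where `short` is a whole-word prefix of `long`.

    Comparison ignores case and extra spaces. The result is a review list
    only: per the notes above, such names are assumed different by default.
    """
    cleaned = sorted({" ".join(n.split()) for n in names})
    pairs = []
    for short in cleaned:
        for long_name in cleaned:
            # The trailing space makes sure "Abcd" does not match "Abcdef".
            if long_name.lower().startswith(short.lower() + " "):
                pairs.append((short, long_name))
    return pairs

# Placeholder names in the style of the example above.
candidates = prefix_variant_pairs(["Abcd", "Abcd  X Road", "Abcdef", "Efgh"])
```

This compares every name with every other, which is fine for a few thousand stop names. If you find pairs that really are the same stop, please leave a comment on the name in the spreadsheet.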

Disclaimers:

  • The route names aren’t all properly hammered out yet. There is some repetition between depots, and there are route variations recorded as separate routes in some cases.
  • As primary stakeholders in the project, we reserve the right to decide which suggestions to adopt and which to ignore.
  • This data is of course not completely correct, which is why it’s being shared for taking corrections. Use in other activities at your own peril.

Please post a reply here on this forum if you are keen on participating in this. If you’re planning to do a visualization, give the general idea here in advance so someone else with a similar idea can take up something else.

Update: We’re doing mapping workshops on 5th, 6th Jan at JNTU, where we’ll map out the bus network.

Looking for volunteers, interns to conduct the event, guide the students.

Pls connect! Time commitment needed : 4th, 5th, 6th Jan 2019, full 3 days.

Please get in touch at nikhil.js[at]gmail.com .
#hyderabadBusMapping

Poster for 5th-6th event!

Hi Friends,

We had a great mapping session yesterday. Most valuable was the tacit local information of the participants.

We’re changing venue and timings today :

6th Jan : continuation of Hyderabad Bus Routes mapping and discussion about project

3rd Floor, Abhyaas (opp JNTU gate, behind ICICI bank)

1pm to 6pm (break around 3pm)