By Sim Shuzhen
SMU Office of Research & Tech Transfer – Today, cars, taxis, buses, trains and commuters don’t just leave physical traces in the form of tyre tracks or footprints. Thanks to technologies like portable global positioning system (GPS) units and electronic or ‘smart’ commuter cards, vehicles and people also leave behind digital traces – in the form of data about their whereabouts and movements.
This data has numerous applications in modern-day cities, making it much sought after by city planners looking to build more efficient, liveable urban areas; transport operators seeking to optimise routes and cut costs; and even companies eager to push personalised location-based services to commuters, just to name a few.
While data may be plentiful, the hard part is figuring out how to get the most valuable insights out of it, says Associate Professor Zheng Baihua of the Singapore Management University (SMU) School of Information Systems. Among other research topics, the computer scientist works on methods and tools for predicting transport routes based on urban mobility data.
“We try to find out the value of the data, and from there formulate different kinds of research questions and try to find the solution. Finding the solution is not that difficult, but finding a way to utilise the data, to me, is more challenging,” says Professor Zheng.
Honey, I shrunk the data
With GPS units constantly sampling vehicle locations at regular intervals, a taxi fleet can quickly generate enormous volumes of GPS trajectories. To make such datasets easier to store and work with, Professor Zheng and her colleagues developed a method – based on real data from a large taxi company in Shanghai, China – for compressing GPS trajectories while retaining the utility of the information.
Published in ACM Transactions on Database Systems and aptly named COMPRESS – short for Comprehensive Paralleled Road-Network-Based Trajectory Compression – the method maps GPS trajectories onto a road network; decomposes them into spatial and temporal components; and then uses specialised algorithms to compress each one separately.
For example, because the structure of the road network limits the number of possible trajectories that vehicles can move along, much of the spatial data may be redundant and can thus be compressed, says Professor Zheng. “GPS data is unlike other data, [as] movement is constrained by the underlying road network,” she explains.
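To make the decomposition idea concrete, here is a minimal sketch of splitting a map-matched trajectory into a spatial part and a temporal part. It is an illustration only, not the COMPRESS algorithm itself: the sample data, the edge IDs and the `decompose` function are all hypothetical, and the samples are assumed to be already matched to road segments.

```python
from itertools import groupby

# Hypothetical map-matched samples (assumed, not from the study):
# (road_edge_id, distance_travelled_m, timestamp_s)
samples = [
    ("e1", 0.0, 0),
    ("e1", 40.0, 10),
    ("e1", 85.0, 20),
    ("e2", 130.0, 30),
    ("e2", 170.0, 40),
    ("e3", 220.0, 50),
]


def decompose(samples):
    """Split samples into a spatial path and a temporal sequence."""
    # Spatial component: each road edge is listed once per traversal,
    # because consecutive samples on the same edge add no new path information.
    spatial = [edge for edge, _ in groupby(s[0] for s in samples)]
    # Temporal component: how far along the path the vehicle was, and when.
    temporal = [(dist, t) for _, dist, t in samples]
    return spatial, temporal


spatial, temporal = decompose(samples)
print(spatial)   # the edge sequence, with repeats collapsed
print(temporal)  # (distance, time) pairs, to be compressed separately
```

In this toy example six samples collapse to a three-edge path, which hints at why the road network makes so much of the spatial data redundant.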
While existing commercial software such as RAR or ZIP can compress a dataset to about a third of its original size, COMPRESS can go as low as one eighth, estimates Professor Zheng. If even higher compression ratios are needed, users can opt to ‘lose’ some data that is not relevant to them – for example, an exact timestamp (say 8:25 AM) for a vehicle’s location may be overkill for an application that only needs to know its whereabouts within a ten-minute window.
“[COMPRESS] gives the user some thresholds or parameters to tune, which decide how accurate the information is when you decompress the data,” says Professor Zheng, adding that depending on the boundaries set, the framework can achieve compression ratios as high as 20 to 30.
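The ten-minute-window example can be sketched as a simple lossy step: round each timestamp down to the start of its window and keep one record per road segment and window. This is only an illustration of a tunable accuracy threshold, not the error-bounded scheme used in COMPRESS; the function name, the `window_s` parameter and the data are assumptions.

```python
def bucket_timestamps(points, window_s=600):
    """Keep one (edge, window_start) record per edge and time window.

    points: iterable of (edge_id, seconds_since_midnight) pairs.
    window_s: tolerance in seconds (600 s = ten minutes), a tunable threshold.
    """
    seen = set()
    kept = []
    for edge, t in points:
        key = (edge, t // window_s * window_s)  # round down to window start
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Hypothetical samples; 30300 s after midnight is 8:25 AM.
points = [("e1", 30300), ("e1", 30420), ("e2", 30540),
          ("e2", 30660), ("e2", 31200)]

compact = bucket_timestamps(points)
print(compact)
print(f"{len(points)} records reduced to {len(compact)}")
```

A larger `window_s` discards more detail and shrinks the output further, which is the trade-off the tunable thresholds control.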
Reconstructing routes
Sometimes, instead of being bogged down by redundant data, researchers may run into the opposite problem: not having enough data to properly reconstruct vehicle routes, says Professor Zheng.
For example, if GPS units do not sample a vehicle’s location frequently enough, they could yield data points located far apart on the road network. “Between two points, there could be multiple routes available bringing the car from one point to the other. How do we recover the exact route taken by the car?” asks Professor Zheng.
By building models based on huge volumes of historical data, Professor Zheng can calculate the probability of drivers making a decision – whether to go straight or turn right at a junction, for example – on a given day and time. “We assume that the majority of drivers follow the same kind of decision-making process, and [so we can] try to recover the route taken by a car when the GPS sampling rate is very low,” she explains.
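As a rough sketch of this idea, the snippet below estimates how often drivers move from one junction to the next in historical routes, then scores candidate routes between two sparse GPS points by multiplying those probabilities. The junction names, data and function names are hypothetical, and the sketch ignores the day and time of travel, which the approach described above takes into account.

```python
from collections import Counter, defaultdict
from math import prod

# Hypothetical historical routes, each a sequence of junction IDs.
history = [
    ["A", "B", "D"],
    ["A", "B", "D"],
    ["A", "B", "D"],
    ["A", "C", "D"],
]


def transition_probabilities(routes):
    """Estimate P(next junction | current junction) from observed routes."""
    counts = defaultdict(Counter)
    for route in routes:
        for here, nxt in zip(route, route[1:]):
            counts[here][nxt] += 1
    return {
        here: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for here, c in counts.items()
    }


def route_probability(route, probs):
    """Multiply the probabilities of each decision along the route."""
    return prod(
        probs.get(here, {}).get(nxt, 0.0)
        for here, nxt in zip(route, route[1:])
    )


probs = transition_probabilities(history)

# Two sparse GPS points at A and D leave two candidate routes between them.
candidates = [["A", "B", "D"], ["A", "C", "D"]]
best = max(candidates, key=lambda r: route_probability(r, probs))
print(best, route_probability(best, probs))
```

Here three of the four historical drivers went via B, so the route through B is chosen as the most likely one.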
This same concept can also be used to identify what Professor Zheng calls outliers – vehicles that deviate from the route chosen by the majority of drivers. For taxi companies on the lookout for unscrupulous drivers taking the ‘scenic route’ to earn higher fares, outlier detection can potentially be a useful tool, says Professor Zheng.
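One simple way to picture outlier detection is to flag trips whose route is shared by only a small fraction of all trips between the same two points. The sketch below does exactly that; the 10 per cent threshold, the trip data and the `find_outliers` function are assumptions for illustration, not the method used in the research.

```python
from collections import Counter


def find_outliers(trips, min_share=0.1):
    """Return IDs of trips whose route is taken by fewer than min_share of trips.

    trips: list of (trip_id, route) pairs, all assumed to share the same
    origin and destination.
    """
    route_counts = Counter(tuple(route) for _, route in trips)
    total = len(trips)
    return [
        trip_id
        for trip_id, route in trips
        if route_counts[tuple(route)] / total < min_share
    ]


# Hypothetical data: 19 trips take the usual route, one takes a long detour.
trips = [(f"t{i}", ["A", "B", "D"]) for i in range(19)]
trips.append(("t19", ["A", "C", "E", "F", "D"]))

print(find_outliers(trips))
```

A flagged trip is only a candidate for review: a driver may have had a legitimate reason, such as a road closure, to leave the usual route.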
Real-world problems, real-world data
In addition to her work on vehicle GPS trajectories, Professor Zheng is also a lead principal investigator at SMU’s Living Analytics Research Centre (LARC), where she focuses on gleaning insights from public transport data.
For example, working with a dataset of EZ-Link commuter smartcard transactions provided by Singapore’s Ministry of Transport, Professor Zheng is developing improved methods to reconstruct routes taken by commuters on public transport – a very different challenge from recreating taxi routes, she says.
“EZ-Link transaction data is very different from taxi [GPS trajectory] data. With taxi data, you have very fine-grained information – for example, you could have a data point every ten seconds or half a minute. But with EZ-Link data, you just have a tap-in and tap-out… [but] there are so many interchanges and so many different routes a commuter can take,” she explains.
To solve real-world problems, it’s crucial for researchers to base their studies on real-world data, emphasises Professor Zheng. “Sometimes in academia, we don’t have direct access to business problems. So while it can be difficult, we need to make sure we understand the problems faced by businesses or society, and then use the data we have to solve those problems,” she says.
Back to Research@SMU February 2019 Issue