What is This Big Data Thing and Why Should Public Safety Care?
Web 2.0, social networking, SaaS, cloud… it seems like each year there is a new buzz word with which we are expected to be conversant. The latest is “Big Data”. While some of the buzz words du jour are business models or technology definitions, I would argue big data is much broader and its implications for public safety incredibly far reaching and impactful. Much has been written on the big data trend and how it can serve as the basis for innovation, differentiation and growth, but let’s take a look at why it is so transformative for public safety.
Big data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. Meaning, we are creating data of all types (text, images, video, audio) at an ever increasing rate. Often that data is not structured or easily categorized. One of our university clients alone has hundreds of video surveillance units running 24×7 covering their campus. While the video content is time stamped and they know the location of the unit, further analysis is manual due to the inherently unstructured nature of video (i.e. if they are searching for a specific individual or vehicle they would have to parse through hundreds of hours of footage – they can’t simply search on “white van”). While some advanced fusion centers or departments may have real-time video analytics, the ability to automatically analyze trends across a large number of disparate data sources in real-time is a long way from being commonplace.
In a 2001 research report, Gartner analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data – think of the amount of video captured in my university example), velocity (speed of data in and out – imagine 250 million 9-1-1 calls per year, combined with huge amounts of real-time dispatch and incident data), and variety (range of data types and sources). Later, Gartner further refined that definition to: “Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” (source: Laney, Douglas. “The Importance of ‘Big Data’: A Definition”.)
In 2012, the Obama administration announced the Big Data Research and Development Initiative, which explored how big data could be used across 84 different programs. The examples of big data in the federal government are numerous: http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final_1.pdf
So what are the implications of the big data trend for those of us in public safety?
First, let’s think of some relevant data type examples:
- Video feeds
- Arrest / Court documents
- Records Management system records (across jurisdictions)
- Health records, including images
- Sensor data
- Social media posts
- 9-1-1 calls
Let’s consider a hypothetical scenario based on a recent speech on Suspicious Activity Reporting I attended at 9-1-1 Goes to Washington. Let’s imagine two men rent a white van with cash and provide an international passport as proof of identification. Shortly thereafter, a 3-1-1 center takes a call about a large white van blocking a road near a garden center. The vehicle is not there when officers arrive. Next, a white van is cited for having an inoperable tail-light while driving nearby on I-95 toward Washington DC. The next morning the garden center reports 12 bags of fertilizer missing.
Taken independently, none of these events is significant, and they may never be correlated. Were the incidents to happen in different jurisdictions, it is highly unlikely that they would ever be put together, never mind the fact that the rental data actually came from a private source. Now let’s add in that across a 50-mile radius, over 100 bags of fertilizer had been reported missing. Suddenly the information taken as a whole signifies a potential terrorist threat, and the white van, for which we have license plates, driver descriptions and a direction of travel, becomes a prime suspect in a cross-jurisdictional search.
Big data and sophisticated analytics enable the rapid correlation of events across multiple sources and formats of data. Today this sort of correlation is done regularly, and in real time, in the commercial world. The banner ads you see on your favorite news site are probably driven by other sites you have recently visited and potentially by your offline purchasing behavior. In our white van scenario, the analysis could have flagged the white van during the routine traffic stop as warranting a deeper look. Fusion centers are designed to support this sort of analysis, but ultimately “big data” technology is what makes actionable analysis possible.
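To make the correlation idea concrete, here is a minimal sketch in Python of how the white van events might be linked across sources. Everything in it is an assumption for illustration: the `Event` record, the `correlate` function, the 48-hour window, the tags and the timestamps are all hypothetical, and a real system would first have to extract and normalize descriptors like “white van” from very different record formats.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Event:
    source: str        # hypothetical feed name, e.g. "rental" or "3-1-1"
    jurisdiction: str
    when: datetime
    tags: frozenset    # normalized descriptors pulled from the record


def correlate(events, tag, window=timedelta(hours=48)):
    """Return events carrying `tag` that occur within `window` of another
    tagged event from a *different* source, sorted by time."""
    tagged = sorted((e for e in events if tag in e.tags), key=lambda e: e.when)
    linked = set()
    for a, b in combinations(tagged, 2):
        if a.source != b.source and abs(b.when - a.when) <= window:
            linked.update((a, b))
    return sorted(linked, key=lambda e: e.when)


# Hypothetical records loosely following the scenario above.
events = [
    Event("rental", "private", datetime(2013, 5, 1, 9, 0),
          frozenset({"white van", "cash"})),
    Event("3-1-1", "County A", datetime(2013, 5, 1, 14, 0),
          frozenset({"white van", "garden center"})),
    Event("traffic", "State police", datetime(2013, 5, 1, 18, 30),
          frozenset({"white van", "I-95"})),
    Event("theft report", "County A", datetime(2013, 5, 2, 8, 0),
          frozenset({"fertilizer", "garden center"})),
    # An unrelated stop a month earlier; it falls outside the window.
    Event("traffic", "County B", datetime(2013, 4, 1, 12, 0),
          frozenset({"white van"})),
]

for event in correlate(events, "white van"):
    print(event.when, event.source, event.jurisdiction)
```

The sketch only links records that already share a clean tag. The hard parts in practice are the ones discussed in the rest of this post: getting the data into one place and turning unstructured content into something searchable.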
What has to happen to enable this sort of real-time analysis in public safety? Well, lots, but here’s my list of the big ones:
1) Sharing of data across regional and jurisdictional boundaries. This is partially technical but more so a legal/regulatory challenge. Where does the data come together and who has access? This issue becomes even more challenging when you consider data may also come from private data sources (e.g. rental car billing records).
2) Analysis of disparate data formats. Data required for effective analysis may reside in many formats, use different data dictionaries, and often be unstructured. Big data tools, and in particular technologies like NoSQL databases, allow effective storage and analysis of many data types. New technologies are emerging for high-speed, near-real-time tagging and categorizing of videos and imagery. A combination of these technologies (and some heavy hardware) will allow the scenario of searching on “white van” and seeing results from records management systems and video feeds.
3) Defining a responsible entity. Like many large projects, an “owner” is needed to drive the project successfully. Bringing together disparate data sources and analyzing them requires an infrastructure and tons of coordination (and probably more than a little bit of stepping on toes). Many fusion centers have suffered, often in high-profile ways, from turf wars and lack of effective cooperation.
4) Predictive modeling takes time, expertise and resources. Ultimately when dealing with vast amounts of seemingly unrelated data, you need predictive analytics to recognize trends that indicate an incident of interest. Think of how you set alert triggers like “tell me if there is a lot of activity around fertilizer”. You had to know that bringing together a lot of fertilizer is an event of interest. Predictive modeling takes that to another level by using sophisticated algorithms to recognize events that are much less obvious. For example, maybe there is a strong correlation between single males paying cash to rent a certain type of vehicle and crime. It takes a history of events to create these models. It’s simple for the more common crimes but difficult for significant (and less common) events like terrorist attacks. It also requires highly trained individuals using sophisticated and powerful tools.
5) Funding (of course). This isn’t free or cheap. Cloud technologies allow for a lot of cost sharing, and are especially attractive since much of the analysis really involves aggregating data across lots of jurisdictions and sources; however, storage, computing power, and the heavy-duty analytics tools and human analysts needed to cull out actionable nuggets are not free. Ultimately it’s difficult to make a “hard” cost-saving calculation, so it’s net new funding.
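The simple alert trigger mentioned in item 4 (“tell me if there is a lot of activity around fertilizer”) can be sketched in a few lines of Python. This is a hand-written rule, not a predictive model, and every name and number here is an assumption for illustration: the `TheftReport` record, the coordinates, and the defaults of 50 miles and 100 bags (borrowed from the scenario above) are all hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class TheftReport:
    item: str
    quantity: int
    lat: float
    lon: float


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def total_within(reports, item, center, radius_miles):
    """Sum the reported quantity of `item` within `radius_miles` of `center`."""
    lat, lon = center
    return sum(
        r.quantity for r in reports
        if r.item == item
        and miles_between(lat, lon, r.lat, r.lon) <= radius_miles
    )


def should_alert(reports, item, center, radius_miles=50, threshold=100):
    """Rule-based trigger: fire when the nearby total reaches the threshold."""
    return total_within(reports, item, center, radius_miles) >= threshold


# Hypothetical reports pooled from several jurisdictions.
reports = [
    TheftReport("fertilizer", 12, 38.90, -77.00),
    TheftReport("fertilizer", 50, 39.29, -76.61),
    TheftReport("fertilizer", 45, 38.80, -77.05),
    TheftReport("fertilizer", 200, 37.54, -77.44),  # well outside 50 miles
]

center = (38.90, -77.00)
print(total_within(reports, "fertilizer", center, 50))
print(should_alert(reports, "fertilizer", center))
```

Note that someone had to decide in advance that fertilizer matters and what counts as “a lot”. Predictive modeling is the attempt to learn rules like this from a history of events rather than writing each one by hand.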
There are lots of hurdles to fully taking advantage of the promise of big data, but the benefits can be immeasurable. I look forward to further advances in technology that bring down the costs and complexity and make this type of analysis commonplace in public safety.