Community Sense and Response Systems: Your Phone as Quake Detector

Matthew Faulkner, Robert Clayton, Thomas Heaton, K. Mani Chandy, Monica Kohler, Julian Bunn, Richard Guy, Annie Liu, Michael Olson, MingHei Cheng, Andreas Krause

The proliferation of smartphones and other powerful sensor-equipped consumer devices enables a new class of web applications: Community Sense and Response (CSR) systems. These applications are distinguished from standard web applications by their use of community-owned commercial sensor hardware. Just as social networks connect people and share human-generated content, CSR systems gather, share, and act on sensory data from people's internet-enabled devices. In this article, we discuss our work building the Caltech Community Seismic Network as a prototypical CSR system harnessing accelerometers in smartphones and consumer electronics. We describe the systems and algorithmic challenges of designing, building, and evaluating a scalable network for real-time awareness of dangerous earthquakes.

Nearly 2 million Android and iOS devices are activated every day, each carrying numerous sensors and a high-speed internet connection. Several recent sensing projects seek to partner with the owners of these and other consumer devices to collect, share, and act on sensor data about phenomena that impact the community. Coupled to cloud computing platforms, these networks can grow to a scale previously beyond the reach of sensor networks [6]. [5] provides an excellent overview of how the Social and Mobile Web facilitate crowdsourcing data from individuals and their sensor devices. Additional applications of community and participatory sensing include understanding traffic flows [14, 20, 16, 4], identifying sources of pollution [2, 1], monitoring public health [18], and responding to natural disasters like hurricanes, floods, and earthquakes [8, 9, 11, 15].
These systems are made possible by volunteer sensors and low-cost web solutions for data collection and storage. However, as these systems mature, they will undoubtedly extend beyond data collection and begin to take real-time action on the community's behalf. For example, traffic networks may reroute traffic around an accident, or a seismic network may automatically slow trains to prevent derailment.

From collection to action

Acting on community sensor data is fundamentally different from acting on data from standard web applications or scientific sensors. The potential scale of raw data is vast, even by the standards of large web applications. Data recorded by community sensors often include signals produced by the people who operate them. And many of the desired applications, while far-reaching, push the limits of our current understanding of physical phenomena.

Scale

The volume of raw data that a CSR network can produce is astounding by any standard. Smartphones and other consumer devices often have multiple sensors and can produce continuous streams of GPS position, acceleration, rotation, audio, and video data. While events of interest (e.g., traffic accidents, earthquakes, disease outbreaks) may be rare, devices must monitor continuously in order to detect them. Beyond obvious data heavyweights like video, rapidly sampling even a single accelerometer or microphone produces hundreds of megabytes per day. Community sensing makes possible networks containing tens of thousands or millions of devices. For example, equipping taxi cabs with GPS devices or air quality sensors could easily yield a network of 50,000 sensors in a city like Beijing [22]. At these scales, even collecting a small set of summary statistics becomes daunting: if 500,000 sensors each reported a brief status update once per minute, the total number of messages would rival the daily load of the Twitter network.

Non-traditional sensors
Community devices also differ from those used in traditional scientific and industrial applications. Beyond simply being lower in accuracy (and cost) than professional sensors, community sensors may be mobile, intermittently available, and affected by the unique environment of an individual's home or workplace. For example, the accelerometer in a smartphone can measure earthquakes, but it will also observe user motion.

Complex phenomena

By enabling sensor networks that densely cover cities, community sensors make it possible to measure and act on a range of important phenomena, including traffic patterns, pollution, and natural disasters. However, because fine-grained data about these phenomena were previously unavailable, CSR systems must simultaneously learn about the phenomena they are built to act upon. For example, a community seismic network may need models learned from frequent, smaller quakes in order to estimate damage during rare, larger quakes. These challenges are compounded by the need to make reliable decisions in real time, with performance guarantees. For example, choosing the best emergency response strategies after a natural disaster could be drastically aided by real-time sensor data. However, false alarms and inaccurate data can have high costs; rigorous performance estimates and system evaluations are prerequisites for automating real-world responses.

1. THE CALTECH COMMUNITY SEISMIC NETWORK

The Community Seismic Network (CSN) project at Caltech seeks to rapidly detect earthquakes and provide real-time estimates of their impact using community-operated sensors. Large earthquakes are among the few scenarios that can threaten an entire city. The CSN project is built upon a vision of people sharing accelerometer data from their personal devices to collectively produce the information needed for effective real-time and post-event responses to dangerous earthquakes.
To that end, CSN has partnered with more than a thousand volunteers in the Los Angeles area and in cities around the world who contribute real-time acceleration data from their Android smartphones and low-cost USB-connected sensors.

After an earthquake, firefighters, medical teams, and other first responders must build situational awareness before they can effectively deploy their resources. Due to variations in ground structure, two points only a kilometer apart can experience significantly different levels of shaking and damage, as illustrated in Figure 2. Similarly, different buildings may sustain differing amounts of damage due to the types of motion they experience. If communication has been lost in a city, it can take up to an hour for helicopter surveillance to provide the first complete picture of the damage the city has sustained. In contrast, a seismic network with fine spatial resolution could provide accurate measurements of shaking (and thus an estimate of damage) immediately. Because sensors can detect the moderate P-wave shaking that precedes the damaging S-wave shaking, they are expected to report data before network connectivity and power are lost, and before cellular networks are overloaded by human communication.

Another intriguing application of a community seismic network is providing early warning of strong shaking. Early warning operates on the principle that accelerometers near the origin of an earthquake can observe initial shaking before locations farther from the origin experience strong shaking. While the warning time a person receives depends on the speed of detection and on their distance from the origin, early warning systems in Japan, Mexico, and Taiwan have produced warning times of tens of seconds to a minute. Such warnings can be used to evacuate elevators, stop trains, or halt delicate processes such as semiconductor processing or medical surgery.
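The dependence of warning time on detection speed and distance can be made concrete with a back-of-the-envelope model. The sketch below is an illustration under stated assumptions, not the CSN detection algorithm: it assumes typical crustal wave speeds of about 6 km/s (P-wave) and 3.5 km/s (S-wave), straight-line distances from the origin, and a single hypothetical latency covering detection and alert delivery. The function name `warning_time_s` is hypothetical.

```python
def warning_time_s(user_dist_km, sensor_dist_km, latency_s, vp_km_s=6.0, vs_km_s=3.5):
    """Rough early-warning time: S-wave arrival at the user minus the alert time.

    Assumptions (illustrative only): the alert goes out once the P-wave reaches
    the nearest sensors, sensor_dist_km from the origin, plus a fixed latency.
    Wave speeds are typical crustal values, not CSN parameters.
    Returns 0.0 when the user is in the "blind zone" and gets no warning.
    """
    alert_time = sensor_dist_km / vp_km_s + latency_s
    s_wave_arrival = user_dist_km / vs_km_s
    return max(0.0, s_wave_arrival - alert_time)


# Hypothetical scenario: sensors 10 km from the origin, 5 s of processing latency.
for d in (20, 50, 100, 200):
    print(f"{d:>4} km from origin: ~{warning_time_s(d, 10, 5):.0f} s of warning")
```

In this toy model, users close to the origin fall in a blind zone where the alert arrives after the S-wave, while each second shaved off detection adds a second of warning everywhere else; this is why dense sensors near the origin matter.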
Additionally, aftershock warnings alerted emergency workers involved in debris clearing following the 1989 Loma Prieta earthquake.

Partnering with the community

Community participation is ideal for seismic sensing for several reasons. First, community participation makes possible the densely distributed sensors needed to accurately measure shaking throughout a city. For example, instrumenting the greater Los Angeles area at a spatial resolution of 1 sensor per square kilometer would require over 10,000 sensors. While traditional seismometer stations cost thousands of dollars per sensor to install and operate, the same number of sensors could be reached if 0.5% of the area's population volunteered data from their smartphones. In this way, community sensors can provide fine spatial coverage and complement existing networks of sparsely deployed, high-quality sensors.

Figure 1: CSN volunteers contribute data from low-cost accelerometers (above) and from Android smartphones via a CSN app (below).

Community sensors are also ideally situated to assist the population through an emergency. In addition to collecting accelerometer data, community sensing software on a smartphone could be used to report the last-known location of family members, or to give instructions on where to gather for help from emergency teams. In short, community sensing applications provide a new way for people to stay informed about the areas and people they care about.

CSN makes it easy for the community to participate by using low-cost accelerometers and the sensors already present in volunteers' Android phones. A free Android application on the Google Play app store, called CSN-Droid, makes volunteering data as easy as installing a new app. The CSN project also partners with LA-area schools and city infrastructure to freely distribute 3,000 low-cost accelerometers from Phidget, Inc. that interface via USB to a host PC, tablet, or other internet-connected device.
Phidget sensors have also been installed in several high-rise buildings to measure structural responses to earthquakes. Figure 1 displays these sensors.

Fundamental challenges

Reliable, real-time inference of spatial events is a core task of seismic monitoring, and also a prototypical challenge for any application that uses physical sensors. In the following, we outline a methodology developed to rapidly detect quakes from thousands of community sensors. As we will see, the computational power of community devices can be harnessed to overcome the cacophony of noise in community-operated hardware, and on-device learning yields a decentralized architecture that is scalable and heterogeneous while still providing rigorous performance guarantees.

2. DECENTRALIZED EVENT DETECTION

Suppose that a strong earthquake begins near a metropolitan area, and that 0.1% of the population contributes accelerometer data from a personally owned internet-enabled device. In Los Angeles County, this means data from 10,000 noisy sensors located on a coastal basin of rock and sediment, striped with fault lines, and cross-hatched with vibration-producing freeways. How could we detect the quake, and estimate its location and magnitude, as quickly as possible?

One direct approach from detection theory is to collect all data centrally and perform classification using a likelihood ratio test,

$$\frac{P[\text{all measurements} \mid \text{strong quake}]}{P[\text{all measurements} \mid \text{no quake}]} \geq \tau. \qquad (1)$$

This test declares a detection if the ratio exceeds a predetermined threshold $\tau$. Unsurprisingly, this involves transmitting a daunting amount of data; a global network of 1M phones would be transmitting 30TB of acceleration data per day! Additionally, the likelihood ratio test requires the distribution of all sensor data, conditioned on the occurrence or non-occurrence of a strong earthquake. Each community sensor is unique, so modeling these distributions requires modeling each sensor individually.

A natural next step is a decentralized approach. Suppose each device instead transmits only a finite summary of its current data, called a pick message. The central server again performs a hypothesis test, but now using the received pick messages instead of the entire raw data. Results from decentralized hypothesis testing theory state that if the sensors' measurements are independent conditional on whether or not there is an event, and if the probability of the measurements is known in each case, then the asymptotically optimal strategy is to perform a hierarchical hypothesis test [21]: each sensor individually performs a hypothesis test, for some threshold $\tau$, and picks only when

$$\frac{P[\text{one sensor's measurements} \mid \text{strong quake}]}{P[\text{one sensor's measurements} \mid \text{no quake}]} \geq \tau. \qquad (2)$$

Similarly, the cloud server performs a hypothesis test on the number of picks $S$ received at a given time, and declares a detection when a threshold $\tau'$ is exceeded:

$$\frac{\mathrm{Bin}(S; r_T, N)}{\mathrm{Bin}(S; r_F, N)} \geq \tau'. \qquad (3)$$

The parameters $r_T$ and $r_F$ are the true positive and false positive pick rates for a single sensor, $N$ is the number of sensors, and $\mathrm{Bin}(\cdot\,; p, N)$ is the probability mass function of the Binomial distribution. Asymptotically optimal decision performance can be obtained by using the decision rules (2) and (3) with proper choice of the thresholds $\tau$ and $\tau'$ [21]. Additionally, collecting picks instead of raw data may help preserve user privacy.

Challenges for the classical approach

Detecting rare events from community sensors presents three main challenges to this classical, decentralized detection approach:

1. How can we perform likelihood ratio tests on each sensor's data when we do not have enough data (e.g., measurements of large, rare quakes) to accurately model sensor behavior during an event?
2. How can we model each sensor? Server-side modeling scales poorly, while on-device learning involves computational and storage limits.
3. How can we overcome the (strong) assumption of conditionally independent sensors, and incorporate spatial dependencies?

Figure 2: Differences in soil conditions and subsurface structures cause large variations in ground shaking. Data recorded by the Long Beach, CA network.

Next, we will consider how the abundance of normal data can be leveraged to detect rare events for which we lack training data. Then, we will see that new tools from computational geometry make it possible to compute the needed probabilistic models on resource-constrained devices. Finally, learning on the server side adapts data aggregation according to spatial dependencies.

Leveraging normal data

The sensor-level hypothesis test in (2) requires two conditional probability distributions. The numerator models a particular device's acceleration during a strong quake and, due to the rarity of large quakes, is impractical to obtain. In contrast, the denominator can be estimated from abundantly available normal data. Can we still hope to produce reliable picks? It turns out that under mild conditions, a simple anomaly detection approach that uses only the probability of an acceleration time series in the absence of a quake can obtain the same asymptotically optimal performance. A given sensor now picks when

$$P[\text{one sensor's measurements} \mid \text{no quake}] \leq \tau. \qquad (4)$$

For an appropriate choice of threshold, this can be shown to produce the same picks as the full hypothesis test, without requiring a model of sensor data during future, unknown quakes. For details, see [11].

Learning on smartphones with coresets

The above anomaly detection scheme makes use of the abundant normal data, but leaves us the challenge of computing the conditional distribution. In principle, each sensor could maintain a history of its observations and periodically estimate a probabilistic model describing that data. On a mobile device, this means logging around 3GB of acceleration data per month.
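For a rough sense of where a figure like 3GB per month comes from, the arithmetic below sizes a log of raw tri-axial accelerometer samples. The 100 Hz sampling rate and 4-byte sample size are assumptions chosen for illustration, not parameters given in the source; logging derived features or compressed data would change the total. The function name `monthly_log_bytes` is hypothetical.

```python
def monthly_log_bytes(sample_rate_hz, axes=3, bytes_per_value=4, days=30):
    """Bytes needed to log raw accelerometer samples for `days` days.

    Assumed (illustrative) format: one value per axis per sample,
    stored as a fixed-size number of bytes_per_value bytes.
    """
    seconds = days * 24 * 60 * 60
    return sample_rate_hz * axes * bytes_per_value * seconds


gb = monthly_log_bytes(sample_rate_hz=100) / 1e9
print(f"{gb:.2f} GB per month")
```

The total grows linearly with logging time, so any scheme that stores the full history keeps growing; this motivates asking whether a model can be learned from a fixed-size summary instead.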
Storing and estimating models on this much data is a burden on volunteers' smartphone resources. Could we accurately model a sensor's data with (much) less storage?

In the CSN system, the local distribution is chosen to be a Gaussian Mixture Model (GMM) over a feature vector of acceleration statistics from short time windows (similar to phonemes in speech recognition). GMMs are flexible, multimodal distributions that can be practically estimated from data using the simple EM algorithm [3]. In contrast to a single Gaussian, which can be fit knowing only the mean and variance of the data, estimating a GMM requires access to all the data; formally, GMMs do not admit finite sufficient statistics. This precludes, for example, compressing the 3GB of monthly acceleration data and still recovering the same GMM that would have been learned from the full data. Fortunately, the picture is drastically different for approximating a GMM: a GMM can be fit to an arbitrary amount of data, with an arbitrary approximation guarantee, using a finite amount of storage!

A tool from computational geometry, called a coreset, makes such approximations possible. Roughly, a coreset for an algorithm is a (weighted) subset of the input such that running the algorithm on the coreset gives a constant-factor approximation to running the algorithm on the full input. Coresets have been used to obtain approximations for a variety of geometric problems, such as k-means and k-medians clustering. It turns out that many geometric coreset techniques can also provide approximations for statistical problems. Given an input dataset $D$, we would like to find the maximum likelihood estimate for the means and variances of a Gaussian mixture model, collectively denoted $\theta$.
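The core idea, scoring a model on a small weighted subset instead of the full data, can be sketched in plain Python. This is a simplified importance-sampling stand-in, not the coreset construction of [12]: each point is sampled with a probability that mixes a uniform term with its squared distance to the nearest of some rough centers, and each sample is weighted by the inverse of (sample size × probability), so the weighted log likelihood is an unbiased estimate of the full-data log likelihood. The 1-D data, the fixed mixture `theta`, the centers, and all function names are hypothetical, and the check covers a single $\theta$, whereas a true coreset guarantee must hold for every mixture simultaneously.

```python
import math
import random


def gauss_logpdf(x, mean, sd):
    """Log density of a 1-D normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def mixture_loglik(points, weights, theta):
    """Weighted log likelihood sum_i w_i * log p(x_i | theta) of a 1-D GMM.

    theta is a list of (mixing_weight, mean, sd) triples.
    """
    total = 0.0
    for x, w in zip(points, weights):
        logs = [math.log(mix_w) + gauss_logpdf(x, mu, sd) for mix_w, mu, sd in theta]
        top = max(logs)  # log-sum-exp for numerical stability
        total += w * (top + math.log(sum(math.exp(v - top) for v in logs)))
    return total


def importance_sample(data, centers, m, rng):
    """Sample m points (with replacement); p_i mixes uniform and squared
    distance to the nearest center. Each sample gets weight 1 / (m * p_i)."""
    n = len(data)
    d2 = [min((x - c) ** 2 for c in centers) for x in data]
    total_d2 = sum(d2)
    probs = [0.5 / n + 0.5 * d / total_d2 for d in d2]  # sums to 1
    idx = rng.choices(range(n), weights=probs, k=m)
    return [data[i] for i in idx], [1.0 / (m * probs[i]) for i in idx]


rng = random.Random(0)
# Hypothetical features: a large "normal" cluster plus a small, rare cluster.
data = [rng.gauss(0.0, 1.0) for _ in range(19000)]
data += [rng.gauss(8.0, 0.5) for _ in range(1000)]
theta = [(0.95, 0.0, 1.0), (0.05, 8.0, 0.5)]  # one fixed candidate mixture

full = mixture_loglik(data, [1.0] * len(data), theta)
sample, w = importance_sample(data, centers=[0.0, 8.0], m=1000, rng=rng)
approx = mixture_loglik(sample, w, theta)
rel_err = abs(approx - full) / abs(full)
print(f"L(D|theta)={full:.1f}  L(C|theta)={approx:.1f}  relative error={rel_err:.3f}")
```

Here 1,000 weighted points stand in for 20,000; the squared-distance term makes points far from every center (which uniform sampling tends to miss) more likely to be kept, and the weights correct for that bias.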
A weighted set $C$ is a $(k, \varepsilon)$-coreset for GMMs if, with high probability, the log likelihood on the coreset, $L(C \mid \theta)$, is an $\varepsilon$-approximation to the log likelihood on the full data, $L(D \mid \theta)$, for any mixture $\theta$ of $k$ Gaussians:

$$(1 - \varepsilon)\, L(D \mid \theta) \leq L(C \mid \theta) \leq (1 + \varepsilon)\, L(D \mid \theta).$$

[12] showed that, given input $D$, it is possible to sample such a coreset $C$ whose size is independent of the size of $D$ (i.e., it depends only polynomially on the dimension of the input, the number of Gaussians $k$, and the parameters $\varepsilon$ and $\delta$), with probability at least $1 - \delta$ for all (non-degenerate) mixtures $\theta$ of $k$ Gaussians. This implies that learning mixture model parameters $\theta$ from a constant-size coreset $C$ can obtain approximately the same likelihood as learning the model from the entire, arbitrarily large $D$.

But how do we find $C$? [12] showed that efficient algorithms to compute coresets for projective clustering problems (e.g., k-means and its generalizations) can provide coresets for GMMs. A key insight is that while uniformly subsampling the input may miss important regions of data, an adaptive sampling approach is likely to sample from enough regions to reliably estimate a mixture of $k$ Gaussians; weighting the samples accounts for the sampling bias. Previous work [13] also showed that coresets for many optimization problems can be computed efficiently in the parallel or streaming model, and several of those results apply here. In particular, a stream of input data can be buffered to some constant size and then compressed into a coreset. Careful merging and compressing of such coresets provides an approximation to the entire stream so far, while using space and update time polynomial in all the parameters and logarithmic in the number $n$ of points seen so far.

Learning spatial dependencies

Quake detection in community networks requires finding a complex spatio-temporal pattern in a large set of noisy sensor measurements.
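The server-side count test (3) fits in a few lines, because the binomial coefficients cancel in the ratio. The sketch below uses made-up pick rates and counts, and the function names are hypothetical; it also illustrates the concern raised here, namely that a burst of picks confined to a small region can be swamped when a test that assumes identical sensors is applied to the whole network.

```python
import math


def count_llr(picks, n_sensors, r_true, r_false):
    """Log of the binomial likelihood ratio in test (3).

    log Bin(S; r_T, N) - log Bin(S; r_F, N); the binomial coefficients cancel.
    """
    return (picks * math.log(r_true / r_false)
            + (n_sensors - picks) * math.log((1 - r_true) / (1 - r_false)))


def server_detects(picks, n_sensors, r_true, r_false, log_tau=0.0):
    """Declare a detection when the log likelihood ratio reaches log(tau')."""
    return count_llr(picks, n_sensors, r_true, r_false) >= log_tau


# Hypothetical single-sensor pick rates.
R_TRUE, R_FALSE = 0.5, 0.01

# Hypothetical snapshot: 20 picks among 40 sensors in one small region,
# plus 100 background picks from the other 9,960 sensors.
network_wide = server_detects(picks=120, n_sensors=10_000, r_true=R_TRUE, r_false=R_FALSE)
regional = server_detects(picks=20, n_sensors=40, r_true=R_TRUE, r_false=R_FALSE)
print(network_wide, regional)  # False True
```

Applied to all 10,000 sensors, the test strongly favors "no quake"; applied to the 40 sensors in the affected region, it detects the event. Deciding which sensors to group is what a learned model of spatial dependencies must do; here the region is given by hand.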
The start of a quake may only affect a small fraction of the network, so the event can easily be concealed in both single-sensor measurements and network-wide statistics. Data from recent high-density seismic studies (Figure 2) show that localized variations in ground structure significantly affect the magnitude of shaking at locations only a kilometer apart. Consequently, effective quake detection requires algorithms that can learn subtle dependencies among sensor data and detect changes within groups of dependent sensors.

The classical approach described at the start of this section assumes that the sensors provide independent, identically distributed measurements conditioned on the occurrence or non-occurrence of an event. In this case, the fusion center would declare a detection if a sufficiently large number of sensors report picks. However, in many p