AS.430.604.81 Spatial Data Analysis Johns Hopkins University Advanced Academic Programs, Geographic Information Systems
December 10, 2013
A Creative Investigation of Philadelphia Crime Trends
By Nancy Morris Hartley
Background
The extraordinarily high crime rate in the City of Philadelphia has resulted in allocation of massive resources to
crime prediction and control, and total commitment to Geographic Information Systems (GIS). The reality of
valid prediction through geostatistical analysis and the effectiveness of prevention through active intervention
are so ingrained in the culture that police officers there matter-of-factly make statements such as, “We
prevented one hundred crimes last week.” Philly may have a reputation for crime, but it’s truly a role model
to the world for embracing GIS. A sophisticated and very user-friendly online mapping site not only provides
needed tools and data, but displays and celebrates the best efforts of citizens and private organizations in
mapping and analysis.
Location of a sufficient volume of appropriate and usable data has been by far my greatest challenge in
learning to use GIS, so I went in search of a city with excellent data, and my search led me to Philadelphia. The
focus on crime caught my attention and I was reminded of our third homework assignment, in which we
performed a regression analysis on 911 calls in Portland, Oregon. I was able to make my single contribution to
a homework assignment in that analysis, by adding what I thought could be additional explanatory variables to
the list of variables provided in the exercise. My addition of three variables caused the R-Squared value to go
up from 0.831080 to 0.847385. I wondered if I could, by analyzing Philadelphia data, contribute anything to
the understanding of the city’s crime. That was the challenge I set for myself in this project.
Introduction
In addition to having worked with maps most of my life, I also have been an avid crime buff since about age
twelve. I have read literally thousands of case studies of crimes of all types. Being able to use GIS to discover
more about the hows and whys of crime appeals to me enormously. I consider crime prevention one of the
most vital uses of GIS for the public good.
Because I am not personally acquainted with Philadelphia, I reached a point in my research at which I felt I
needed input from someone with first-hand knowledge of the city’s crime. So I called the Philadelphia Police
Department and was bounced from one office to another for half an hour. I finally was connected with a
Public Relations officer who said he knew the best person to help me, but he was not authorized to give out
the person’s name. On the whispered condition that I not reveal where I got the information, he told me the
name of the police department’s chief Research and Information Analyst, Anthony D’Abruzzo. A quick Google
search yielded an ESRI PDF that included a photo of Mr. D’Abruzzo and a quote from him about “geographic
accountability.” I then got back on the phone and was able to make contact with him, making reference to the
ESRI PDF. Mr. D’Abruzzo was very friendly and open to discussing Philadelphia crime and GIS, so much so that
we talked for 38 minutes and he gave me his e-mail address and cell phone number and said to feel free to
contact him anytime. When I expressed my thoughts regarding high vacancy rates and high rental rates and
the effects of both on crime, he said, “You’re on the right track.” He also offered a tip: “If you want a high
correlation of crime to geographic locations, check out subway stations.” He then e-mailed me the zip file
containing the subway point data that I used in my analysis.
Data
I began with a very large and unwieldy crime dataset covering all recorded crime incidents for all types of
crimes from 2006 through 2012. I chose to study the year 2012 and created a layer of all reports of all types of
crime for that year. For some analyses, the full 2012 data was inconveniently large, so I also created a layer of
all reports of all types of crime for July 2012. I also created layers for each type of crime for July 2012,
including homicide, rape, robbery, theft, burglary and assault, so that I could observe the trends of
different types of crime and see where those layers overlapped. For one homicide analysis, I wanted to be able to
focus in on individual points, so I created a layer containing only homicides for only July 4 through 6, 2012.
The attributes I used most frequently were UCR_General, which identifies each crime point by type of crime
and also serves to provide total numbers, and various census data fields, such as occupancy status of buildings
and whether properties were occupied by owners or renters. I found it necessary to create a field in the
census data I merged with my crime data, in order to determine the vacancy rate as opposed to vacancy
numbers.
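The rate calculation described above can be sketched in a few lines of Python. This is only an illustration: the original field was created in ArcGIS, and the record layout and field names used here (`total_units`, `vacant_units`, `vacancy_rate`) are hypothetical, not the census schema.

```python
def vacancy_rate(vacant_units, total_units):
    """Return vacant units as a fraction of all housing units.

    Returns None when the inputs are missing or the tract has no housing
    units, so that empty tracts are not reported as 0% vacant.
    """
    if vacant_units is None or total_units is None or total_units <= 0:
        return None
    return vacant_units / total_units


# Made-up tract records for illustration only.
tracts = [
    {"tract": "A", "total_units": 1200, "vacant_units": 300},
    {"tract": "B", "total_units": 400, "vacant_units": 20},
    {"tract": "C", "total_units": 0, "vacant_units": 0},
]

for tract in tracts:
    tract["vacancy_rate"] = vacancy_rate(tract["vacant_units"], tract["total_units"])
```

Working with a rate rather than a count keeps large tracts from appearing to have high vacancy simply because they contain more buildings.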
There were two issues with my primary data which I became accustomed to resolving prior to applying any
tools. One was null values, which cropped up in a number of different fields, primarily X and Y values and
information fields such as police sectors. I began routinely sorting the data to locate the nulls and remove
them, because some tools gave me error messages or even ceased functioning as a result of their presence.
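A minimal sketch of that null-screening step follows, assuming the incident table has been exported to plain Python records. The field names (`x`, `y`, `sector`) and the sample values are hypothetical.

```python
def split_on_nulls(records, required_fields):
    """Separate records into complete ones and ones missing a required field."""
    complete, dropped = [], []
    for record in records:
        if any(record.get(field) is None for field in required_fields):
            dropped.append(record)
        else:
            complete.append(record)
    return complete, dropped


# Made-up incident records for illustration only.
incidents = [
    {"id": 1, "x": 2694000.0, "y": 236000.0, "sector": "A"},
    {"id": 2, "x": None, "y": 240500.0, "sector": "B"},
    {"id": 3, "x": 2701200.0, "y": 244100.0, "sector": None},
]

clean, dropped = split_on_nulls(incidents, ["x", "y", "sector"])
```

Returning the dropped records instead of deleting them makes it possible to report how many incidents were excluded and to revisit them later.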
Another problem, caused by working with two large merged datasets (crime and census), was multicollinearity.
This actually prevented my performing a full regression analysis. Ordinary Least Squares repeatedly failed to
run, and the volume of data was too great for me to be able to resolve the problem. This did not present a
problem for me, however: a full regression analysis was not quite in line with my study, as I was not
attempting to determine any and all reasons for crime. I specifically was searching for explanatory variables
that might not typically be considered.
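One common first screen for multicollinearity is to look for pairs of candidate variables that are almost perfectly correlated. The sketch below shows that idea only; it is not the procedure used in this study, the variable names and values are made up, and the 0.9 threshold is an assumption. Pairwise correlation also cannot detect a variable that is a combination of several others.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a constant column has no defined correlation
    return cov / sqrt(var_a * var_b)


def collinear_pairs(columns, threshold=0.9):
    """List variable pairs whose absolute correlation meets the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if r is not None and abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up tract-level values for illustration only.
columns = {
    "total_units": [100, 200, 300, 400, 500],
    "occupied_units": [90, 180, 270, 360, 450],
    "median_age": [50, 10, 40, 20, 30],
}

redundant = collinear_pairs(columns)
```

In this toy table `occupied_units` is an exact multiple of `total_units`, so that pair is flagged and one of the two would be dropped before attempting a regression.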
For almost all of the layers I created, I followed the workflow of a model I built in ModelBuilder, up through Hot Spot Analysis,
and I used Exploration and Interpolation tools on a large sampling of datasets throughout my study.
Once I established whether or not a trend existed for a layer, I examined that layer overlaid with the various
infrastructure, facility and demographic data to see if I could note what might be a causal pattern. When such
a pattern seemed evident, I used whatever tools seemed appropriate to test the data, such as buffering
subway stations and intersecting them with robbery point data and symbolizing vacancy rates to overlay with
crime hotspots.
I created so many layers of data that, halfway through the project, I created multiple additional data frames
and divided the layers into logical groups, including basemaps and the most essential datasets in each frame.
The first data frame contained results of interpolation of the various sets of crime data; the second held the
layers related to vacancies along with total crimes for 2012 and for July, 2012; the third contained the subway
and 2012 robbery datasets. The fourth and fifth frames, which I did not use, contained (4) land use, street
centerline, planning and zoning data, and (5) police districts, sectors and stations. I collected additional data
on public facilities of various types with which I experimented but did not ultimately use. It was such a
pleasure to have more data than I needed.
I obtained my data from the following sources:
Anthony D’Abruzzo, Philadelphia Police Department Research and Information Analyst: SEPTA transit station point data
Open Data Philly, http://www.opendataphilly.org/: crime point data, planning and zoning polygon data
The United States Census Bureau, http://www.census.gov/: census data
American FactFinder, http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml: census and demographic data
University of Pennsylvania Philadelphia NIS neighborhoodBase, http://cml.upenn.edu/nbase/: crime point data, school and recreational facility point data, park and zip codes polygons, and tables of demographic data not included in other census data
PASDA (Pennsylvania Spatial Data Access), http://www.pasda.psu.edu/: land use and topographic data, public facility point data, building data, street centerlines
Database Schema
I followed the method specified in one of our homework assignments and began my project by creating two
file geodatabases, PhillyCrime.gdb and PhillyCrimeScratch.gdb. I set the scratch database as the default and
imported all of my files into it. I saved all output, layers and feature classes to the scratch database, only
importing layers to PhillyCrime.gdb after establishing that they were to be permanent parts of my analysis. If I
were to conduct a similar analysis, I would create a third “Discard” database to hold discarded output and
layers. I performed three clean-up operations on my data, discarding data I felt certain at the time was no
longer useful. I failed to recognize the value of some of that data for comparison. For example, the early
Cluster and Outlier analysis image I have included for comparison to later output includes a basemap I later
ceased to use; had I maintained all created layers, I would have been able to create more parallel comparative
images.
Analysis Workflow Description
My workflow is clearly illustrated by the model I created in ModelBuilder:
I performed the workflow therein on most of the subsets of data I created. For the purpose of the model, I
used what had become my two primary datasets, one containing data on all types of crimes for the month of
July, 2012, and one containing census tract data for 2010. My first step was to perform a Spatial Join of those
two sets of data. An early error in my workflow enabled me to very clearly see the benefit of using the correct
tools in the appropriate order: the first couple of times I ran Collect Events and Hot Spot Analysis, I failed to
first run Integrate and Generate Spatial Weights. I went on to run Average Nearest Neighbor, Natural
Neighbor, Density, Spline, Kriging and Cluster and Outlier Analysis on several sets of data, using the early Hot
Spot analysis output as input, and saving screenshots of my output. Later, while researching the use of Hot
Spot analysis, I discovered the process and purpose of first running Integrate and Generate Spatial Weights,
and that is the workflow I used in my model. Subsequent results were much clearer, with far more definite
trends. The results of Cluster and Outlier Analysis were outstanding in this respect. The early output
exhibited clustering, but the clusters were small and somewhat loose, and outliers were scattered randomly
across the map; later output (of a larger but coincident set of data) produced after using Integrate and
Generate Spatial Weights shows large, dense clusters resembling full bunches of grapes, and no outliers.
Likewise, early Histogram results were skewed to the right, and later ones were skewed much more highly to
the right, toward the inner urban area.
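To show why the preparatory steps matter, here is a simplified sketch of the idea behind the workflow: nearby incidents are first merged into weighted locations, and a Getis-Ord Gi* z-score is then computed for each location. It is an illustration under stated assumptions, not a reproduction of the ArcGIS tools. Snapping to a grid stands in for Integrate and Collect Events, a binary fixed-distance band stands in for the spatial weights matrix, and the coordinates, tolerance and band width are all made up.

```python
from collections import Counter
from math import hypot, sqrt


def collect_events(points, tolerance):
    """Snap points to a grid of size `tolerance` and count incidents per location."""
    counts = Counter(
        (round(x / tolerance) * tolerance, round(y / tolerance) * tolerance)
        for x, y in points
    )
    return sorted((x, y, n) for (x, y), n in counts.items())


def getis_ord_gi_star(events, band):
    """Gi* z-score for each (x, y, count) event.

    Weights are 1 for every event within `band` of the target (the target
    itself included) and 0 otherwise.
    """
    n = len(events)
    values = [count for _, _, count in events]
    if n < 2:
        return [0.0] * n
    mean = sum(values) / n
    s = sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for xi, yi, _ in events:
        w = [1.0 if hypot(xi - xj, yi - yj) <= band else 0.0 for xj, yj, _ in events]
        sum_w = sum(w)
        sum_w2 = sum(wi * wi for wi in w)
        numerator = sum(wi * v for wi, v in zip(w, values)) - mean * sum_w
        denominator = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator else 0.0)
    return scores


# Made-up incident coordinates (planar units) for illustration only.
points = [
    (1000, 1000), (1010, 995), (990, 1005), (1005, 1010), (998, 1002),
    (1200, 1000), (1195, 1005), (1210, 990), (1204, 1003),
    (5000, 5000), (8000, 2000), (3000, 7000), (6500, 6500),
]

events = collect_events(points, tolerance=50)
scores = getis_ord_gi_star(events, band=500)
```

Without the merging step every incident carries the same value of one, so the statistic has little variation to work with. After merging, the two dense locations receive clearly positive z-scores and the isolated incidents receive negative ones.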
I did not keep track of the number of output layers and tables I created, but I would estimate that I created
well over one hundred layers and at least twenty-five tables, the majority of which I later discarded. I engaged
in a great deal of trial and error because of the nature of my analysis goal, to discover what was not obvious.
Kriging results demonstrate a skew to the right, with the selected low-value features lying just outside and to
the northwest of the area of highest crime.
A directional trend is further indicated by selected pairs of points in the Semivariogram/Covariance analysis
forming a pattern of lines extending outward from a central low-crime area.
The slightly U-shaped trend indicates the appropriateness of a second-order polynomial for a global trend
model.
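As a small illustration of what a second-order polynomial trend means, the sketch below fits z = b0 + b1*x + b2*x^2 by ordinary least squares along a single axis. This is a one-dimensional simplification with made-up values; the trend analysis in the study was carried out in ArcGIS on two-dimensional data.

```python
def fit_quadratic(xs, zs):
    """Least-squares coefficients (b0, b1, b2) for z = b0 + b1*x + b2*x**2."""
    # Build the 3x3 normal equations (A^T A) b = A^T z for columns [1, x, x^2].
    rows = [[1.0, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]

    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [ata[i] + [atz[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


# Made-up values along one axis: high at both ends, low in the middle.
xs = [-2, -1, 0, 1, 2]
zs = [9, 3, 1, 3, 9]

b0, b1, b2 = fit_quadratic(xs, zs)
```

A positive `b2` corresponds to a curve that opens upward, which is the U shape described above. A straight-line (first-order) model would miss that curvature entirely.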
Data Output
I am extremely pleased with the results of my data output. Despite my having written my proposal when I
was nearly finished with my project, the proposal does accurately represent my initial thoughts, and the
results are more dramatic than I had hoped. My initial theory regarding the correlation of high vacancy areas
to high crime was demonstrated to be valid: the bubbles of highest vacancies almost precisely overlay the
hottest crime hotspots. The other question that resulted from Tony D’Abruzzo’s kind tip and my own
observation of the data concerns the proximity of robberies to subway stations. That correlation has been
proven resoundingly, as my analysis indicates that just over 75% of all robberies that occurred in 2012 took
place within one-quarter mile of subway stations. A third topic brought up previously warrants further
research and analysis: the correlation of a high concentration of renters to crime.
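The quarter-mile proximity measure can be sketched without GIS software as follows. This is a simplified stand-in for the buffer-and-intersect step, assuming coordinates in a projected system measured in feet. The station and robbery coordinates are made up and do not reproduce the figure reported above.

```python
from math import hypot

QUARTER_MILE_FT = 1320.0  # 5,280 feet per mile divided by 4


def share_within(points, stations, radius):
    """Fraction of points within `radius` of at least one station (planar distance)."""
    if not points:
        return 0.0
    near = sum(
        1
        for px, py in points
        if any(hypot(px - sx, py - sy) <= radius for sx, sy in stations)
    )
    return near / len(points)


# Made-up coordinates in feet for illustration only.
stations = [(0.0, 0.0), (5000.0, 0.0)]
robberies = [
    (100.0, 200.0),
    (1000.0, 800.0),
    (4500.0, 300.0),
    (2500.0, 2500.0),
    (3000.0, 4000.0),
]

share = share_within(robberies, stations, QUARTER_MILE_FT)
```

A share like this is easier to interpret next to a baseline, such as the fraction of the study area, or of all crime types, that falls within the same buffers.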
I posted my vacancies map on Tony D’Abruzzo’s Facebook page, and it resulted in an exchange between Tony
and another analyst that seems likely to result in further investigation. Tony’s comment: “That vacancy thing
is killing me.”
Visualization of Output