A Creative Investigation of Philadelphia Crime Trends

AS.430.604.81 Spatial Data Analysis

Johns Hopkins University, Advanced Academic Programs, Geographic Information Systems

December 10, 2013

By Nancy Morris Hartley

Background

The extraordinarily high crime rate in the City of Philadelphia has resulted in allocation of massive resources to

crime prediction and control, and total commitment to Geographic Information Systems (GIS). Confidence in the

validity of prediction through geostatistical analysis and in the effectiveness of prevention through active intervention

is so ingrained in the culture that police officers there matter-of-factly make statements such as, “We

prevented one hundred crimes last week.” Philly may have a reputation for crime, but it’s truly a role model

to the world for embracing GIS. A sophisticated and very user-friendly online mapping site not only provides

needed tools and data, but displays and celebrates the best efforts of citizens and private organizations in

mapping and analysis.

Location of a sufficient volume of appropriate and usable data has been by far my greatest challenge in

learning to use GIS, so I went in search of a city with excellent data, and my search led me to Philadelphia. The

focus on crime caught my attention and I was reminded of our third homework assignment, in which we

performed a regression analysis on 911 calls in Portland, Oregon. I was able to make my single contribution to

a homework assignment in that analysis, by adding what I thought could be additional explanatory variables to

the list of variables provided in the exercise. My addition of three variables caused the R-Squared value to go

up from 0.831080 to 0.847385. I wondered if I could, by analyzing Philadelphia data, contribute anything to

the understanding of the city’s crime. That was the challenge I set for myself in this project.

Introduction

In addition to having worked with maps most of my life, I also have been an avid crime buff since about age

twelve. I have read literally thousands of case studies of crimes of all types. Being able to use GIS to discover

more about the hows and whys of crime appeals to me enormously. I consider crime prevention one of the

most vital uses of GIS for the public good.


Because I am not personally acquainted with Philadelphia, I reached a point in my research at which I felt I

needed input from someone with first-hand knowledge of the city’s crime. So I called the Philadelphia Police

Department and was bounced from one office to another for half an hour. I finally was connected with a

Public Relations officer who said he knew the best person to help me, but he was not authorized to give out

the person’s name. On the whispered condition that I not reveal where I got the information, he told me the

name of the police department’s chief Research and Information Analyst, Anthony D’Abruzzo. A quick Google

search yielded an ESRI PDF that included a photo of Mr. D’Abruzzo and a quote from him about “geographic

accountability.” I then got back on the phone and was able to make contact with him, making reference to the

ESRI PDF. Mr. D’Abruzzo was very friendly and open to discussing Philadelphia crime and GIS, so much so that

we talked for 38 minutes and he gave me his e-mail address and cell phone number and said to feel free to

contact him anytime. When I expressed my thoughts regarding high vacancy rates and high rental rates and

the effects of both on crime, he said, “You’re on the right track.” He also offered a tip: “If you want a high

correlation of crime to geographic locations, check out subway stations.” He then e-mailed me the zip file

containing the subway point data that I used in my analysis.

Data

I began with a very large and unwieldy crime dataset covering all recorded crime incidents for all types of

crimes from 2006 through 2012. I chose to study the year 2012 and created a layer of all reports of all types of

crime for that year. For some analyses, the full 2012 data was inconveniently large, so I also created a layer of

all reports of all types of crime for July, 2012. I also created layers for each type of crime for July, 2012—

including homicide, rape, robbery, theft, burglary and assault—in order that I might observe the trends of

different types of crime and also where those layers overlapped. For one homicide analysis, I wanted to be able to

focus in on individual points, so I created a layer containing only homicides for only July 4 through 6, 2012.

The attributes I used most frequently were UCR_General, which identifies each crime point by type of crime

and also serves to provide total numbers, and various census data fields, such as occupancy status of buildings

and whether properties were occupied by owners or renters. I found it necessary to create a field in the


census data I merged with my crime data, in order to determine the vacancy rate as opposed to vacancy

numbers.
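
The rate calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the field calculation actually run in ArcGIS; the field names (total_units, vacant_units) and the tract counts are hypothetical, and tracts with no housing units are assumed to receive no rate rather than a rate of zero.

```python
def vacancy_rate(vacant_units, total_units):
    """Return vacant units as a fraction of all housing units, or None if undefined."""
    if vacant_units is None or total_units is None or total_units <= 0:
        return None
    return vacant_units / total_units


# Hypothetical census tract records; names and counts are illustrative only.
tracts = [
    {"tract": "A", "total_units": 1200, "vacant_units": 300},
    {"tract": "B", "total_units": 400, "vacant_units": 20},
    {"tract": "C", "total_units": 0, "vacant_units": 0},
]

# Add the derived field to every record, mirroring a calculated attribute field.
for t in tracts:
    t["vacancy_rate"] = vacancy_rate(t["vacant_units"], t["total_units"])
```

Storing a rate rather than a count keeps large tracts from standing out simply because they contain more housing units.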

There were two issues with my primary data which I became accustomed to resolving prior to applying any

tools. One was null values, which cropped up in a number of different fields, primarily X and Y values and

information fields such as police sectors. I began routinely sorting the data to locate the nulls and remove

them, because some tools gave me error messages or even ceased functioning as a result of their presence.

Another problem, caused by working with two large merged datasets (crime and census), was multicollinearity.

This actually prevented my performing a full regression analysis. Ordinary Least Squares repeatedly failed to

run, and the volume of data was too great for me to be able to resolve the problem. This did not present a

problem for me, however: a full regression analysis was not quite in line with my study, as I was not

attempting to determine any and all reasons for crime. I specifically was searching for explanatory variables

that might not typically be considered.
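
The two clean-up steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in ArcGIS: the records and field names are hypothetical, and a pairwise Pearson correlation screen is used as a simple stand-in for a fuller multicollinearity diagnostic such as the variance inflation factor.

```python
import math


def drop_incomplete(records, required):
    """Keep only records in which every required field is present and not None."""
    return [r for r in records if all(r.get(f) is not None for f in required)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(records, fields, threshold=0.9):
    """List field pairs whose absolute correlation meets or exceeds the threshold."""
    flagged = []
    for i, a in enumerate(fields):
        for b in fields[i + 1:]:
            r = pearson([rec[a] for rec in records], [rec[b] for rec in records])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical merged crime/census rows; two of them lack a coordinate.
rows = [
    {"x": 1.0, "y": 2.0, "vacant_units": 10, "renter_units": 21, "median_age": 40},
    {"x": None, "y": 2.5, "vacant_units": 20, "renter_units": 39, "median_age": 35},
    {"x": 3.0, "y": 1.0, "vacant_units": 30, "renter_units": 61, "median_age": 38},
    {"x": 4.0, "y": 0.5, "vacant_units": 40, "renter_units": 80, "median_age": 31},
    {"x": 5.0, "y": None, "vacant_units": 50, "renter_units": 99, "median_age": 45},
    {"x": 6.0, "y": 3.0, "vacant_units": 60, "renter_units": 121, "median_age": 36},
]

clean = drop_incomplete(rows, required=["x", "y"])
flagged = collinear_pairs(clean, ["vacant_units", "renter_units", "median_age"])
```

Any pair that is flagged is a candidate for dropping one of its two variables before a regression is attempted.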

For almost all of the layers I created, I followed the workflow in a model I built in Model Builder, up through the Hot Spot

Analysis step, and I used Exploration and Interpolation tools on a large sampling of datasets throughout my study.

Once I established whether or not a trend existed for a layer, I examined that layer overlaid with the various

infrastructure, facility and demographic data to see if I could note what might be a causal pattern. When such

a pattern seemed evident, I used whatever tools seemed appropriate to test the data, such as buffering

subway stations and intersecting them with robbery point data and symbolizing vacancy rates to overlay with

crime hotspots.
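
The buffer-and-intersect test can be approximated outside ArcGIS with a plain distance check. The sketch below assumes that station and robbery coordinates are already in a projected coordinate system measured in feet; the coordinates themselves are made up for illustration, and a straight-line distance test stands in for the Buffer and Intersect tools.

```python
import math

QUARTER_MILE_FT = 1320.0  # 5,280 feet per mile divided by 4


def share_within(points, stations, radius):
    """Fraction of points lying within `radius` of at least one station."""
    if not points:
        return 0.0
    hits = sum(
        1
        for (px, py) in points
        if any(math.hypot(px - sx, py - sy) <= radius for (sx, sy) in stations)
    )
    return hits / len(points)


# Hypothetical projected coordinates in feet.
stations = [(0.0, 0.0), (5000.0, 0.0)]
robberies = [(100.0, 200.0), (5000.0, 1300.0), (2500.0, 0.0), (900.0, 900.0)]

share = share_within(robberies, stations, QUARTER_MILE_FT)
```

Each robbery is counted once even if it falls inside more than one station buffer, which is the behavior wanted when reporting a percentage of all robberies.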

I created so many layers of data that, halfway through the project, I created multiple additional data frames

and divided the layers into logical groups, including basemaps and the most essential datasets in each frame.

The first data frame contained results of interpolation of the various sets of crime data; the second held the

layers related to vacancies along with total crimes for 2012 and for July, 2012; the third contained the subway

and 2012 robbery datasets. The fourth and fifth frames, which I did not use, contained (4) land use, street


centerline, planning and zoning data, and (5) police districts, sectors and stations. I collected additional data

on public facilities of various types with which I experimented but did not ultimately use. It was such a

pleasure to have more data than I needed.

I obtained my data from the following sources:

Anthony D’Abruzzo, Philadelphia Police Department Research and Information Analyst: SEPTA transit

station point data

Open Data Philly, http://www.opendataphilly.org/: crime point data, planning and zoning polygon data

The United States Census Bureau, http://www.census.gov/: census data

American FactFinder, http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml: census and

demographic data

University of Pennsylvania Philadelphia NIS neighborhoodBase, http://cml.upenn.edu/nbase/: crime

point data, school and recreational facility point data, park and zip codes polygons, and tables of

demographic data not included in other census data.

PASDA (Pennsylvania Spatial Data Access), http://www.pasda.psu.edu/: land use and topographic data,

public facility point data, building data, street centerlines.

Database Schema

I followed the method specified in one of our homework assignments and began my project by creating two

file geodatabases, PhillyCrime.gdb and PhillyCrimeScratch.gdb. I set the scratch database as the default and

imported all of my files into it. I saved all output, layers and feature classes to the scratch database, only

importing layers to PhillyCrime.gdb after establishing that they were to be permanent parts of my analysis. If I

were to conduct a similar analysis, I would create a third “Discard” database to hold discarded output and

layers. I performed three clean-up operations on my data, discarding data I felt certain at the time was no

longer useful. I failed to recognize the value of some of that data for comparison. For example, the early

Cluster and Outlier analysis image I have included for comparison to later output includes a basemap I later

ceased to use; had I maintained all created layers, I would have been able to create more parallel comparative

images.


Analysis Workflow Description

My workflow is clearly illustrated by the model I created in Model Builder:

I performed the workflow therein on most of the subsets of data I created. For the purpose of the model, I

used what had become my two primary datasets, one containing data on all types of crimes for the month of

July, 2012, and one containing census tract data for 2010. My first step was to perform a Spatial Join of those

two sets of data. An early error in my workflow enabled me to very clearly see the benefit of using the correct

tools in the appropriate order: the first couple of times I ran Collect Events and Hot Spot Analysis, I failed to

first run Integrate and Generate Spatial Weights. I went on to run Average Nearest Neighbor, Natural

Neighbor, Density, Spline, Kriging and Cluster and Outlier Analysis on several sets of data, using the early Hot


Spot analysis output as input, and saving screenshots of my output. Later, while researching the use of Hot

Spot analysis, I discovered the process and purpose of first running Integrate and Generate Spatial Weights,

and that is the workflow I used in my model. Subsequent results were much clearer, with far more definite

trends. The results of Cluster and Outlier Analysis were outstanding in this respect. The early output

exhibited clustering, but the clusters were small and somewhat loose, and outliers were scattered randomly

across the map; later output (of a larger but coincident set of data) produced after using Integrate and

Generate Spatial Weights shows large, dense clusters resembling full bunches of grapes, and no outliers.

Likewise, early Histogram results were skewed to the right, and later ones were skewed much more highly to

the right, toward the inner urban area.
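
To make the role of each step concrete, here is a rough Python sketch of the same sequence: snapping nearby points together, counting incidents per location, and scoring each location with the Getis-Ord Gi* statistic. It is not ArcGIS's implementation. Grid snapping is a simplified stand-in for Integrate, the binary fixed-distance weights are one assumed choice among the options Generate Spatial Weights offers, and the points are invented.

```python
import math
from collections import Counter


def collect_events(points, tolerance):
    """Snap points to a grid of size `tolerance` and count points per snapped location."""
    counts = Counter(
        (round(x / tolerance) * tolerance, round(y / tolerance) * tolerance)
        for x, y in points
    )
    return sorted(counts.items())  # [((x, y), count), ...]


def getis_ord_gi_star(locations, values, band):
    """Gi* z-score per location, using binary weights within `band` (self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for (xi, yi) in locations:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for (xj, yj) in locations]
        sw = sum(w)
        sw2 = sum(wi * wi for wi in w)
        numerator = sum(wi * v for wi, v in zip(w, values)) - mean * sw
        denominator = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical incident points: a dense group near the origin, the rest scattered.
points = [
    (0.1, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.0, 0.2), (0.1, -0.2),
    (1.0, 0.1), (1.1, 0.0), (0.9, -0.1),
    (5.0, 5.0), (6.0, 5.1), (5.1, 6.0), (9.0, 0.0),
]

events = collect_events(points, tolerance=1.0)
locations = [loc for loc, _ in events]
values = [count for _, count in events]
scores = getis_ord_gi_star(locations, values, band=1.5)
```

Without the snapping and counting steps, every location would carry a value of one and the statistic would have no variation to work with, which is consistent with the weaker early results described above.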


I did not keep track of the number of output layers and tables I created, but I would estimate that I created

well over one hundred layers and at least twenty-five tables, the majority of which I later discarded. I engaged

in a great deal of trial and error because of the nature of my analysis goal, to discover what was not obvious.

Kriging results demonstrate a skew to the right, with the selected low-value features lying just outside and to

the northwest of the area of highest crime.


A directional trend is further indicated by selected pairs of points in the Semivariogram/Covariance analysis

forming a pattern of lines extending outward from a central low-crime area.
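
For readers unfamiliar with the statistic behind that plot, the sketch below computes an empirical semivariogram, in which each distance bin holds half the average squared difference between paired values. It is omnidirectional for brevity, and the sample points, lag width and function name are assumptions made for illustration.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, lag_width, max_lag):
    """Return {bin_index: semivariance} for (x, y, value) samples.

    Bin b covers pair distances from b * lag_width up to (b + 1) * lag_width.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            if h > max_lag:
                continue  # ignore pairs farther apart than the largest lag
            b = int(h // lag_width)
            sums[b] += (zi - zj) ** 2
            counts[b] += 1
    # Semivariance is half the mean squared difference within each bin.
    return {b: sums[b] / (2 * counts[b]) for b in sorted(counts)}


# Hypothetical samples along a line whose value rises steadily with x.
samples = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
gamma = empirical_semivariogram(samples, lag_width=1.0, max_lag=3.0)
```

Semivariance that keeps rising with distance, as in this toy data, is the signature of a trend rather than of purely local variation.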


The slightly U-shaped trend indicates the appropriateness of a second-order polynomial for a global trend

model.
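
A second-order fit of this kind can be illustrated with an ordinary least-squares quadratic along a single axis. The sketch below rests on several assumptions: the data are invented, the fit is one-dimensional rather than a full trend surface, and fit_quadratic is a hypothetical helper, not an ArcGIS function.

```python
def fit_quadratic(xs, zs):
    """Least-squares coefficients (a, b, c) of z = a + b*x + c*x**2."""
    # Normal equations (A^T A) beta = A^T z for the columns 1, x, x^2.
    rows = [[1.0, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]

    # Gaussian elimination with partial pivoting on the augmented 3x4 system.
    m = [ata[i] + [atz[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]

    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - known) / m[i][i]
    return tuple(beta)


# Hypothetical values that dip in the middle and rise toward both ends.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
zs = [3.0, 1.5, 1.0, 1.5, 3.0]
a, b, c = fit_quadratic(xs, zs)
```

A positive coefficient on the squared term corresponds to the U shape described above; a coefficient near zero would suggest that a first-order trend is sufficient.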

Data Output

I am extremely pleased with the results of my data output. Despite my having written my proposal when I

was nearly finished with my project, the proposal does accurately represent my initial thoughts, and the

results are more dramatic than I had hoped. My initial theory regarding the correlation of high vacancy areas

to high crime was demonstrated to be valid: the bubbles of highest vacancies almost precisely overlay the

hottest crime hotspots. The other question that resulted from Tony D’Abruzzo’s kind tip and my own

observation of the data concerns the proximity of robberies to subway stations. That association is strongly

supported: my analysis indicates that just over 75% of all robberies that occurred in 2012 took


place within one-quarter mile of subway stations. A third topic brought up previously warrants further

research and analysis: the correlation of a high concentration of renters to crime.

I posted my vacancies map on Tony D’Abruzzo’s Facebook page, and it resulted in an exchange between Tony

and another analyst that seems likely to result in further investigation. Tony’s comment: “That vacancy thing

is killing me.”

Visualization of Output

