
Predictive Modeling for Gentrification in Philadelphia
Public Policy Analytics in R

The project, led by Kamya Khandelwal, Revathi Machan, and Claudia Schreier, analyzes the impact of gentrification in Philadelphia by using data to track demographic changes, housing costs, and signs of neighborhood transformation. By developing a model that incorporates demographic data, and by potentially leveraging new technologies to identify visual signs of gentrification, the team aims to understand how gentrification affects different communities and to provide insight into the displacement and housing-affordability issues residents face.

Background

Philadelphia, often referred to as the nation’s poorest large city, has faced a cost-of-living crisis for decades and has the highest poverty rate among the ten largest cities in the US [1]. In 2018, approximately 231,000 Philadelphia households were cost-burdened, meaning they spent 30% or more of their income on housing [2]. The problem is most severe for renters with incomes below $30,000 per year: 88% of these renters are cost-burdened, and 68% are severely cost-burdened, spending over 50% of their income on housing.

 

The fear of displacement and gentrification is common in many neighborhoods, driven by historically high rents, rising mortgage rates, and unprecedented levels of new development. Gentrification, the process in which higher-income residents replace lower-income ones, raises housing costs and displaces long-term residents, and it has been a significant concern. Philadelphia's history of displacement, particularly in West Philadelphia, where university expansion plans pushed out residents in the latter half of the 20th century, continues to inform current gentrification trends.

 

In broader discussions, gentrification may be seen when breweries replace corner stores or luxury gyms replace single-family homes. Signs of gentrification vary and can be difficult to track through data alone. Researchers use demographic data to analyze changes in neighborhoods, including shifts in race, age, and income. New technologies, such as AI models trained on time-stamped imagery like Google Street View, are also being developed to identify these changes [5].

The code for this project is hosted on this GitHub page.

Data Exploration and Visualization

This project aims to develop a predictive model for gentrification to assist in urban planning by identifying areas at risk of or currently undergoing gentrification. The model, developed by the team at Marie Antoinette Predictions, relies on key indicators such as construction permits, vacant properties, affordable housing locations, green spaces, and demolitions. Data was sourced from Philadelphia’s open data portal and cleaned for clarity and usability.

 

Data visualization tools were used to map the variables, revealing that the areas from Center City through lower North Philadelphia and Kensington have the highest density of construction and demolition permits. West and North Philadelphia show a high density of vacant buildings. The maps show the distribution of construction and zoning permits across the city and compare them with demographic and socio-economic factors like educational attainment and unemployment rates. Higher concentrations of construction permits correlate with areas that have a higher percentage of residents with bachelor's degrees and higher unemployment rates.


mapTheme <- function(base_size = 12) {

  theme(

    text = element_text(color = "black"),

    plot.title = element_text(size = 16, colour = "black"),

    plot.caption = element_text(hjust = 0),

    axis.ticks = element_blank(),

    panel.background = element_blank(),

    axis.title = element_blank(),

    axis.text = element_blank(),

    panel.grid.minor = element_blank(),

    panel.border = element_rect(colour = "black", fill = NA, linewidth = 2),

    strip.text.x = element_text(size = 14)

  )

}

 

plotTheme <- function(base_size = 12) {

  theme(

    text = element_text(color = "black"),

    plot.title = element_text(color = "darkgreen", size = 15, face = "bold"),

    plot.subtitle = element_text(face = "italic"),

    plot.caption = element_text(hjust = 0),

    axis.ticks = element_blank(),

    panel.background = element_blank(),

    panel.grid.major = element_line(color = "grey80", linewidth = 0.1),

    panel.grid.minor = element_blank(),

    panel.border = element_rect(colour = "black", fill = NA, linewidth = 2),

    strip.background = element_rect(fill = "grey80", color = "white"),

    strip.text = element_text(size = 12),

    axis.title = element_text(size = 12),

    axis.text = element_text(size = 10),

    plot.background = element_blank(),

    legend.background = element_blank(),

    legend.title = element_text(colour = "black", face = "italic"),

    legend.text = element_text(colour = "black", face = "italic"),

    strip.text.x = element_text(size = 14)

  )

}

# Load necessary libraries
# (raster is loaded before dplyr so that dplyr::select() is not masked)
library(raster)       # raster(), mask(), rasterToPoints()
library(spatstat)     # as.ppp(), density.ppp()
library(sf)
library(spdep)        # poly2nb(), nb2listw(), localmoran(), moran.mc()
library(rsgeo)        # fast pairwise distances
library(gridExtra)    # grid.arrange()
library(grid)         # textGrob(), gpar()
library(knitr)        # kable()
library(kableExtra)   # kable_styling()
library(ggcorrplot)   # ggcorrplot(), cor_pmat()
library(stringr)      # str_detect()
library(tidyr)        # gather(), spread(), replace_na()
library(ggplot2)
library(viridis)      # viridis color scales
library(dplyr)
library(tidycensus)

 

# 1. Permits Data

phlPermits <- st_read("https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+permits+WHERE+permitissuedate+>=+'2013-01-01'+AND+permitissuedate+<+'2022-12-31'&filename=permits&format=geojson&skipfields=cartodb_id") %>%

  st_transform('ESRI:102728') %>%

  mutate(Legend = "Philadelphia Permits", Year = as.integer(format(permitissuedate, "%Y")))

 

# 2. Philadelphia Boundaries

phlBoundary <- st_read("http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson") %>%

  st_transform('ESRI:102728')

 

# 3. Fishnet Grid

phlFishnet <- st_make_grid(phlBoundary, cellsize = 500, square = TRUE) %>%

  .[phlBoundary] %>%

  st_sf() %>%

  mutate(uniqueID = rownames(.))

 

# 4. Philadelphia Neighborhoods

phillyNeighborhoods <- st_read("https://raw.githubusercontent.com/opendataphilly/open-geo-data/master/philadelphia-neighborhoods/philadelphia-neighborhoods.geojson") %>%

  st_transform('ESRI:102728') %>%

  st_transform(st_crs(phlFishnet))

 

# 5. Vacant Property

vacantBuilding <- st_read('https://opendata.arcgis.com/datasets/f7ed68293c5e40d58f1de9c8435c3e84_0.geojson') %>%

  na.omit() %>%

  st_transform('ESRI:102728') %>%

  st_transform(st_crs(phlFishnet)) %>%

  mutate(Legend = "Vacant Buildings")

 

# 6. Affordable Housing

affordableHousing <- st_read('https://opendata.arcgis.com/datasets/ca8944190b604b2aae7eff17c8dd9ef5_0.geojson') %>%

  filter(FISCAL_YEAR_COMPLETE >= "2012") %>%

  st_transform('ESRI:102728') %>%

  st_transform(st_crs(phlFishnet)) %>%

  mutate(Legend = "Affordable Housing")

 

# 7. Green Spaces

greenSpace <- st_read('https://opendata.arcgis.com/datasets/d52445160ab14380a673e5849203eb64_0.geojson') %>%

  st_transform('ESRI:102728') %>%

  st_transform(st_crs(phlFishnet)) %>%

  mutate(Legend = "Green Spaces")

 

# 8. Demolition Data

buildingDemolition <- st_read('https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+demolitions&filename=demolitions&format=geojson&skipfields=cartodb_id') %>%

  mutate(year = substr(start_date, 1, 4)) %>%

  filter(year == '2022') %>%

  st_transform('ESRI:102728') %>%

  st_transform(st_crs(phlFishnet)) %>%

  mutate(Legend = "Building Demolition")

 

# 9. Census Data - ACS 2022 five-year estimates

tracts22 <- get_acs(

  geography = "tract",

  variables = c(

    "B01003_001",   # Total Population

    "B19013_001",   # Median Household Income

    "B25008_002",   # Owner-Occupied Units

    "B25008_003",   # Renter-Occupied Units

    "B25008_001E",  # Total Population in Housing Units

    "B15003_022",   # Educational Attainment: Bachelor's Degree

    "B06012_002E",  # Population Below the Poverty Level

    "B02001_002",   # Race and Ethnicity: White Alone

    "B02001_003",   # Race and Ethnicity: Black or African American Alone

    "B27011_008E"   # Population Unemployed

  ),

  year = 2022,

  state = "PA",

  county = "Philadelphia",

  geometry = TRUE,

  output = "wide"

) %>%

  select(-NAME, -ends_with("M")) %>%

  rename(

    totalPop = B01003_001E,                  # Total Population

    medHHInc = B19013_001E,                  # Median Household Income

    totalUnit = B25008_001E,                 # Total Population in Housing Units

    ownerOccupied = B25008_002E,             # Owner-Occupied Units

    renterOccupied = B25008_003E,            # Renter-Occupied Units

    bachDegree = B15003_022E,                # Educational Attainment: Bachelor's Degree

    totalPov = B06012_002E,                  # Population Below the Poverty Level

    totalUnemploy = B27011_008E,             # Population Unemployed

    whiteAlone = B02001_002E,                # Race and Ethnicity: White Alone

    blackAlone = B02001_003E                 # Race and Ethnicity: Black or African American Alone

  ) %>%

  st_transform(st_crs(phlFishnet))

 

# 10. Create new variables

tracts22 <- tracts22 %>%

  mutate(

    pctWhite = ifelse(totalPop > 0, whiteAlone / totalPop * 100, 0),

    pctBlack = ifelse(totalPop > 0, blackAlone / totalPop * 100, 0),

    pctPov = ifelse(totalPop > 0, totalPov / totalPop * 100, 0),

    pctUnemploy = ifelse(totalPop > 0, totalUnemploy / totalPop * 100, 0),

    pctBach = ifelse(totalPop > 0, bachDegree / totalPop * 100, 0),

    pctOwnerOccupied = ifelse(totalUnit > 0, ownerOccupied / totalUnit * 100, 0),

    pctRenterOccupied = ifelse(totalUnit > 0, renterOccupied / totalUnit * 100, 0)

  ) %>%

  select(-whiteAlone, -blackAlone, -totalPov, -totalUnemploy, -ownerOccupied, -renterOccupied, -totalUnit, -bachDegree, -GEOID) %>%

  st_transform(st_crs(phlFishnet))

 

# 11. Organize into datasets

tracts22.medHHInc <- tracts22 %>%

  select(medHHInc) %>%

  rename(Legend = medHHInc)

 

tracts22.pctWhite <- tracts22 %>%

  select(pctWhite) %>%

  rename(Legend = pctWhite)

 

tracts22.pctBlack <- tracts22 %>%

  select(pctBlack) %>%

  rename(Legend = pctBlack)

 

tracts22.pctPov <- tracts22 %>%

  select(pctPov) %>%

  rename(Legend = pctPov)

 

tracts22.pctUnemploy <- tracts22 %>%

  select(pctUnemploy) %>%

  rename(Legend = pctUnemploy)

 

tracts22.pctBach <- tracts22 %>%

  select(pctBach) %>%

  rename(Legend = pctBach)

 

tracts22.pctOwnerOccupied <- tracts22 %>%

  select(pctOwnerOccupied) %>%

  rename(Legend = pctOwnerOccupied)

 

tracts22.pctRenterOccupied <- tracts22 %>%

  select(pctRenterOccupied) %>%

  rename(Legend = pctRenterOccupied)

# Categorizing the permits for construction and demolition

phlPermits <- phlPermits %>%

  mutate(

    newType = case_when(

      permittype %in% c("BUILDING", "BP_NEWCNST") ~ 'CONSTRUCTION PERMIT',

      permittype %in% c("DEMOLITION", "BP_DEMO") ~ 'DEMOLITION PERMIT'

    )

  )

 

# Filtering construction permits

cnstPermits <- phlPermits %>%

  filter(newType == 'CONSTRUCTION PERMIT')

 

# Filtering demolition permits

demoPermits <- phlPermits %>%

  filter(newType == 'DEMOLITION PERMIT')

 

# Filtering construction permits for the year 2022

cnstPermits_2022 <- cnstPermits %>%

  filter(Year == 2022)

 

# Filtering construction permits for the year 2013

cnstPermits_2013 <- cnstPermits %>%

  filter(Year == 2013)

# Plot 1: Map of all construction and demo permits issued between 2013 and 2022

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  geom_sf(data = phlPermits, aes(colour = newType), size = 0.5, show.legend = "point") +

  scale_color_manual(values = c("magenta", "lightgreen")) +

  labs(

    title = "Major Permits Issued, 2013-22 in Philadelphia",

    caption = "Figure 1"

  ) +

  theme_void()

# Plot 2 + 3: Mapped points and Density map of construction permits issued

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  geom_sf(data = cnstPermits, aes(color = "magenta"), size = 0.5) + 

  scale_color_identity(guide = "legend", name = NULL, labels = "Construction Permits") +

  labs(title = "Construction Permits Issued, 2013-22 in Philadelphia",

       caption = "Figure 2") +

  theme_void()

 

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  stat_density2d(data = data.frame(st_coordinates(cnstPermits)),  

                 aes(X, Y, fill = after_stat(level), alpha = after_stat(level)),  

                 size = 0.01, bins = 40, geom = 'polygon') +  

  scale_fill_viridis_c(option = "rocket") +  

  labs(title = "Density of Construction Permits") +  

  theme_void() + 

  theme(legend.position = "none")

 

# Plot 4 + 5: Mapped points and Density map of demolition permits issued

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  geom_sf(data = demoPermits, aes(colour = "lightgreen"), size = 0.5) +

  scale_color_identity(guide = "legend", name = NULL, labels = "Demolition Permits") +

  labs(title = "Demolition Permits Issued, 2013-22 in Philadelphia",

       caption = "Figure 4") +

  theme_void()

 

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  stat_density2d(data = data.frame(st_coordinates(demoPermits)),  

                 aes(X, Y, fill = after_stat(level), alpha = after_stat(level)),  

                 size = 0.01, bins = 40, geom = 'polygon') +  

  scale_fill_viridis_c(option = "rocket") +  

  scale_alpha(range = c(0.00, 0.35), guide = "none") +  

  labs(title = "Density of Demolition Permits") +  

  theme_void() + 

  theme(legend.position = "none")

 

# Plot 6: Affordable housing

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  geom_sf(data = affordableHousing, aes(color = "orange"), size = 0.5) + 

  scale_color_identity(guide = "legend", name = NULL, labels = "Affordable Housing Units") +

  labs(title = "Affordable Housing Developments, 2013-22 in Philadelphia",

       caption = "Figure 6") +

  theme_void()

 

# Plot 7: Green spaces

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  geom_sf(data = greenSpace, aes(color = "darkgreen"), size = 0.5) + 

  scale_color_identity(guide = "legend", name = NULL, labels = "Green Spaces") +

  labs(title = "Green Spaces, 2013-22 in Philadelphia",

       caption = "Figure 7") +

  theme_void()

 

# Plot 8 + 9: Vacant land point and density maps

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  geom_sf(data = vacantBuilding, aes(color = 'darkred'), size = 0.5, show.legend = "point") +

  scale_color_identity(guide = "legend", name = NULL, labels = "Vacant Buildings") +

  labs(title = "Suspected Vacant Buildings in Philadelphia",

       caption = "Figure 8") +

  theme_void()

 

ggplot() + 

  geom_sf(data = phlBoundary, fill = "grey89", color = "darkgrey") +  

  stat_density2d(data = data.frame(st_coordinates(vacantBuilding)),  

                 aes(X, Y, fill = after_stat(level), alpha = after_stat(level)),  

                 size = 0.01, bins = 40, geom = 'polygon') +  

  scale_fill_viridis_c(option = "rocket") +  

  scale_alpha(range = c(0.00, 0.35), guide = "none") + 

  labs(title = "Density of Vacant Buildings") +  

  theme_void() + 

  theme(legend.position = "none")

 

# Plot 10: Demographic

ggplot() +

  geom_sf(data = tracts22, aes(fill = pctUnemploy), color = NA) +

  scale_fill_viridis_c() +

  geom_sf(data = cnstPermits, aes(color = "yellow"), show.legend = "point", size = 0.1, alpha = 0.3) +

  scale_color_identity() +

  labs(title = "% Unemployment around Permits Issued") +

  mapTheme() +

  theme(plot.title = element_text(size = 10), legend.title = element_blank())

In this project, we construct a geospatial dataset to examine how the risk of new development is distributed throughout Philadelphia. To do so, we created a fishnet, a continuous grid of 500-by-500-foot cells. This grid lets us conceptualize development risk, represented by the presence of new construction permits.

 

By overlaying the fishnet onto the city, we can visualize areas with varying levels of development activity. The following plot depicts the count of new construction permits across the fishnet, with grid cells shown in yellow indicating the highest permit counts. This visualization provides a clear representation of where new developments are most concentrated, helping us understand and analyze the spatial patterns of urban growth and development risk in Philadelphia.


ggplot() +

  geom_sf(data = tracts22, aes(fill = pctBach), color = NA) +

  scale_fill_viridis_c() +

  geom_sf(data = cnstPermits, aes(color = "yellow"), show.legend = "point", size = 0.1, alpha = 0.3) +

  scale_color_identity() +

  labs(title = "% Population with Bachelor's Degree around Permits Issued")

Nearest Neighbor

This code segment creates a nearest neighbor feature for each category: vacant buildings, affordable housing, green spaces, and building demolitions. For each category, it first identifies the nearest feature in the reference dataset, converts both layers to rsgeo geometries, and calculates the Euclidean distance between each observation and its nearest neighbor. The calculated distances are then appended to the dataset as new variables: dist_vacantBuilding, dist_affordableHousing, dist_greenSpace, and dist_buildingDemolition. Essentially, this process treats each grid cell as a point and measures the distance from it to the nearest risk-factor point.


# Attaching datasets on spatial factors to Fishnet

 

## 1. Extracting geometry for spatial factors

affordableHousings <- affordableHousing %>%

  dplyr::select(geometry, Legend)

 

vacantBuildings <- vacantBuilding %>%

  dplyr::select(geometry, Legend)

 

greenSpaces <- greenSpace %>%

  dplyr::select(geometry, Legend)

 

buildingDemolitions <- buildingDemolition %>%

  dplyr::select(geometry, Legend)

 

## 2. Creating fishnet of spatial factor variables

vars_net <- rbind(

  affordableHousings, 

  vacantBuildings,

  greenSpaces, 

  buildingDemolitions

) %>%

  st_join(., phlFishnet, join = st_within) %>%

  st_drop_geometry() %>%

  group_by(uniqueID, Legend) %>%

  summarize(count = n()) %>%

  full_join(phlFishnet, by = "uniqueID") %>%

  spread(Legend, count, fill = 0) %>%

  st_sf() %>%

  na.omit() %>%

  dplyr::select(-`<NA>`) %>%

  ungroup()

 

cnstPermits <- st_transform(cnstPermits, st_crs(phlFishnet))

 

construction_net <- cnstPermits %>%

  dplyr::select() %>%   # drop all attributes, keeping only the geometry

  mutate(countPermits = 1) %>%

  aggregate(., phlFishnet, sum) %>%

  mutate(

    countPermits = replace_na(countPermits, 0),

    uniqueID = rownames(.),

    cvID = sample(round(nrow(phlFishnet) / 24), size = nrow(phlFishnet), replace = TRUE)

  )

 

# Visualizing permits joined to fishnet

ggplot() +

  geom_sf(data = construction_net, aes(fill = countPermits), color = NA) +

  geom_sf(data = st_boundary(phlFishnet), color = "darkred", lwd = 0.04, alpha = 0.2) + # Add boundaries in red

  scale_fill_viridis_c(option = "plasma", name = 'Construction Counts') +

  labs(

    title = "Construction Permits Joined to Fishnet",

    subtitle = 'Philadelphia'

  ) + 

  mapTheme() + 

  plotTheme()

## 1.2. Vacant Buildings

 

### Mapping nearest feature

nearest_vacantBuilding <- sf::st_nearest_feature(vars_net, vacantBuilding)

 

### Converting to rsgeo geometries

x <- rsgeo::as_rsgeo(vars_net)

y <- rsgeo::as_rsgeo(vacantBuilding)

 

### Calculating distance

vars_net$dist_vacantBuilding <- rsgeo::distance_euclidean_pairwise(x, y[nearest_vacantBuilding])

 

## 1.3. Affordable Housing

 

### Mapping nearest feature

nearest_affordableHousing <- sf::st_nearest_feature(vars_net, affordableHousing)

 

### Converting to rsgeo geometries

x <- rsgeo::as_rsgeo(vars_net)

y <- rsgeo::as_rsgeo(affordableHousing)

 

### Calculating distance

vars_net$dist_affordableHousing <- rsgeo::distance_euclidean_pairwise(x, y[nearest_affordableHousing])

 

## 1.4. Green Spaces

 

### Mapping nearest feature

nearest_greenSpace <- sf::st_nearest_feature(vars_net, greenSpace)

 

### Converting to rsgeo geometries

x <- rsgeo::as_rsgeo(vars_net)

y <- rsgeo::as_rsgeo(greenSpace)

 

### Calculating distance

vars_net$dist_greenSpace <- rsgeo::distance_euclidean_pairwise(x, y[nearest_greenSpace])

 

## 1.5. Building Demolitions

 

### Mapping nearest feature

nearest_buildingDemolition <- sf::st_nearest_feature(vars_net, buildingDemolition)

 

### Converting to rsgeo geometries

x <- rsgeo::as_rsgeo(vars_net)

y <- rsgeo::as_rsgeo(buildingDemolition)

 

### Calculating distance

vars_net$dist_buildingDemolition <- rsgeo::distance_euclidean_pairwise(x, y[nearest_buildingDemolition])

 

# 2. Visualizing nearest distance for spatial factors on Fishnet

 

## 2.1. Visualizing nearest neighbor distances

vars_net.long.nn <- dplyr::select(vars_net, starts_with("dist")) %>%

  gather(Variable, value, -geometry)

 

vars <- unique(vars_net.long.nn$Variable)

mapList <- list()

 

for (i in vars) {

  mapList[[i]] <- ggplot() +

    geom_sf(data = filter(vars_net.long.nn, Variable == i), aes(fill = value), colour = NA) +

    scale_fill_viridis_c(option = "plasma", name = " ") +

    labs(title = i) +

    mapTheme() +

    theme(plot.title = element_text(size = 12, color = "black"))

}

 

bottomCaption <- textGrob("Figure 8", gp = gpar(hjust = 0))

 

do.call(grid.arrange, c(list(grobs = mapList, ncol = 2, 

                             top = textGrob("Spatial Factors: Nearest Neighbor Distance for Permits Issued\n", 

                                            gp = gpar(fontsize = 15, fontface = "bold", col = "darkred")), 

                             bottom = bottomCaption)))

This spatial analysis is crucial for understanding the relationships between different urban features and their proximity to one another. By calculating the distances between variables like vacant buildings, affordable housing, green spaces, and building demolitions, it provides insights into the spatial distribution and potential interactions among these features.


# Joining Census Data to Fishnet

tracts22 <- tracts22 %>%

  filter(totalPop > 0)

 

vars_net <- vars_net %>%

  st_centroid() %>%

  st_join(tracts22)

 

vars_net <- vars_net %>%

  mutate_all(~replace(., is.na(.), 0))

 

# Perform Spatial Join of variables with permits

final_net <- left_join(construction_net, st_drop_geometry(vars_net), by = "uniqueID")

 

# Final Net

final_net <- final_net %>%

  st_centroid() %>%

  st_join(dplyr::select(phillyNeighborhoods, name)) %>%

  st_drop_geometry() %>%

  left_join(dplyr::select(final_net, geometry, uniqueID)) %>%

  st_sf() %>%

  na.omit()

 

## Generates warnings from PROJ issues

## Using {spdep} to make polygons to neighborhoods...

final_net.nb <- poly2nb(as_Spatial(final_net), queen = TRUE)

 

## ... and neighborhoods to list of weights

final_net.weights <- nb2listw(final_net.nb, style = "W", zero.policy = TRUE)

 

# print(final_net.weights, zero.policy = TRUE)

Modeling Process - Local Moran's I

Incorporating Local Moran’s I into the spatial modeling process is pivotal for understanding how construction permits are distributed across the city. The statistic measures the degree of spatial clustering or dispersion in permit issuance, enhancing the model’s ability to identify significant spatial patterns in urban development. By combining Local Moran’s I with other spatial factors, such as nearest neighbor distances and geographic features, the model captures the spatial dynamics underlying permit distributions, which in turn supports informed urban planning and policy decisions. In the context of this model, Local Moran’s I identifies areas with significantly high or low concentrations of permits.

Initially, the Local Moran’s I values are computed for the permit counts using the spatial weights. These results are then joined with the spatial data, and significant hotspots are identified using a significance threshold of 0.05. The output is reshaped into long format for plotting, where each variable’s Local Moran’s I statistics are visualized as spatial maps depicting areas of significant clustering or dispersion in permit issuance.
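To make the statistic concrete, here is a small illustrative sketch (not part of the original analysis) of roughly what localmoran() computes: each cell's standardized permit count multiplied by the spatially weighted average of its neighbors' standardized counts.

# Illustrative sketch: a hand-rolled quantity proportional to Local Moran's I.
# z-score the permit counts, then multiply each cell's z-value by the
# weighted mean (spatial lag) of its neighbors' z-values.
z <- as.vector(scale(final_net$countPermits))
z_lag <- lag.listw(final_net.weights, z, zero.policy = TRUE)
I_manual <- z * z_lag   # proportional to the Ii column returned by localmoran()
head(I_manual)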


## see ?localmoran

local_morans <- localmoran(final_net$countPermits, final_net.weights, zero.policy = TRUE) %>% 

  as.data.frame()

 

# Join local Moran's I results to fishnet

final_net.localMorans <- cbind(local_morans, as.data.frame(final_net)) %>% 

  st_sf() %>%

  dplyr::select(

    Permit_Count = countPermits, 

    Local_Morans_I = Ii, 

    P_Value = `Pr(z != E(Ii))`

  ) %>%

  mutate(Significant_Hotspots = ifelse((P_Value <= 0.05), 1, 0)) %>%

  gather(Variable, Value, -geometry)

 

## This is just for plotting

vars <- unique(final_net.localMorans$Variable)

varList <- list()

 

for (i in vars) {

  varList[[i]] <- ggplot() +

    geom_sf(data = filter(final_net.localMorans, Variable == i), aes(fill = Value), colour = NA) +

    scale_fill_viridis(name = "") +

    labs(title = i) +

    theme_void() + 

    theme(legend.position = "bottom")

}

 

do.call(grid.arrange, c(varList, ncol = 4, top = "Local Moran's I Statistics, Construction Permits"))

 

saveRDS(local_morans, file = "local_morans.rds")


Distance to Hotspots

Shorter distances to statistically significant permit hotspots indicate proximity to dense construction activity, which can be associated with rapid urban development or gentrification.
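One caveat: nn_function(), used in the chunk below, is not defined on this page; it comes from the course's helper functions. A minimal sketch of what it does, assuming the FNN package is available, is:

library(FNN)

# Minimal sketch of nn_function(): for each point in measureFrom, return the
# mean Euclidean distance to its k nearest neighbors in measureTo
nn_function <- function(measureFrom, measureTo, k) {
  nn.dist <- get.knnx(data = measureTo, query = measureFrom, k = k)$nn.dist
  rowMeans(nn.dist)
}

With k = 1, as used below, this is simply the distance from each grid-cell centroid to the nearest significant hotspot cell.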


local_morans <- readRDS("local_morans.rds")

 

final_net <- final_net %>%

  mutate(

    permit.isSig = ifelse(local_morans[, 5] <= 0.0000001, 1, 0),  # column 5 is the p-value, Pr(z != E(Ii))

    permit.isSig.dist = nn_function(

      st_coordinates(st_centroid(final_net)), 

      st_coordinates(st_centroid(filter(final_net, permit.isSig == 1))), 

      1

    )

  )

 

# Plotting NN hotspots

ggplot() +

  geom_sf(data = final_net, aes(fill = permit.isSig.dist), colour = NA) +

  scale_fill_viridis(name = "NN Distance") +

  labs(title = "Permit NN Distance", caption = "Figure xx") +

  mapTheme() +

  plotTheme()

 


Building and evaluating the model

The model used in this analysis is a Poisson regression model. The cleaned data feeding the model included demographic information (race, income, and employment status), housing information (percent renter- and owner-occupied), and regional characteristics (vacant buildings, construction permits, affordable housing, and demolitions).
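For reference, here is a standalone sketch of the core specification (the cross-validated version appears later); it fits a Poisson GLM of grid-cell permit counts on the risk factors selected below.

# Standalone sketch: Poisson regression of permit counts on the selected
# distance-based and demographic risk factors
permit_glm <- glm(
  countPermits ~ dist_vacantBuilding + dist_affordableHousing +
    dist_greenSpace + dist_buildingDemolition + medHHInc +
    pctBach + pctPov + pctUnemploy + pctOwnerOccupied,
  family = poisson(link = "log"),
  data = st_drop_geometry(final_net)
)
summary(permit_glm)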

Correlation

There is a relatively high positive correlation between median household income and the percentage of the population that is white. This suggests that areas with more white residents tend to be higher-earning neighborhoods, or that a neighborhood experiencing gentrification may be gaining higher-income, white residents. Conversely, there is a relatively high negative correlation between total population and the number of vacant buildings. This indicates that more populous areas have fewer vacant buildings, which could be linked to gentrification, as crowded, desirable neighborhoods are more likely to redevelop vacant properties.

 

Conducting a regression model in this context is essential for several reasons. Firstly, it quantifies the relationships between various factors and outcomes of interest, such as the number of permits issued or the likelihood of development. This helps identify significant predictors and their impact, aiding in understanding the dynamics of development risk. Additionally, regression modeling assesses the relative importance of different variables, helping prioritize interventions and inform decision-making to mitigate development-related challenges or promote sustainable urban growth.

 

Cross-validation is performed to evaluate the predictive performance of Poisson regression models on the dataset. This process involves splitting the dataset into distinct folds, where each fold serves as a training set for model development and a testing set for performance assessment. In each iteration, a Poisson regression model is trained on the training data and then applied to the testing data to make predictions. These predictions are compared against the actual values to assess the model’s accuracy and robustness. Cross-validation ensures that the models generalize well to unseen data and provide reliable predictions in real-world scenarios, enhancing the overall reliability of the analysis.

 

The provided code implements a cross-validation procedure to evaluate Poisson regression models on the final_net dataset. The function crossValidate() performs cross-validation by splitting the dataset into distinct folds (cvID) for training and testing. In each fold, a Poisson regression model is trained on the training data (fold.train) and applied to the testing data (fold.test) to make predictions. These predictions are aggregated across all folds to assess the model’s overall predictive performance. Cross-validation validates the model’s ability to generalize to unseen data, ensuring the reliability of the regression analysis.


correlation.long <- st_drop_geometry(final_net) %>%

  dplyr::select(-uniqueID, -cvID, -name) %>%

  gather(Variable, Value, -countPermits) %>%

  mutate(Value = as.numeric(Value))

 

correlation.cor <- correlation.long %>%

  group_by(Variable) %>%

  summarize(correlation = cor(Value, countPermits, use = "complete.obs"))

 

# Visualizing correlations through scatter plots

# Note: many demographic variables are included here; a smaller set of
# demographic predictors plus additional external variables may be preferable

 

ggplot(correlation.long, aes(Value, countPermits)) +

  geom_point(size = 0.1) +

  geom_text(data = correlation.cor, aes(label = paste("r =", round(correlation, 2))),

            x = -Inf, y = Inf, vjust = 1.5, hjust = -0.1) +

  geom_smooth(method = "lm", se = FALSE, colour = "orange") +

  facet_wrap(~Variable, ncol = 4, scales = "free") +

  labs(title = "Permit Count as a function of risk factors", caption = "Figure 12") +

  plotTheme()

 

numvars <- c("countPermits", "dist_vacantBuilding", "dist_affordableHousing", "dist_greenSpace", "dist_buildingDemolition", "totalPop", "medHHInc", "pctWhite", "pctBlack", "pctBach", "pctPov", "pctUnemploy", "pctOwnerOccupied", "pctRenterOccupied")

 

numeric <- final_net %>%

  st_drop_geometry() %>%

  dplyr::select(all_of(numvars)) %>%

  na.omit()

 

ggcorrplot(

  round(cor(numeric), 1), 

  p.mat = cor_pmat(numeric),

  colors = c('#d7191c', 'white', '#2c7bb6'),

  type = "lower",

  insig = "blank"

) +  

  labs(title = "Correlation across Variables\n", caption = "Figure 13") + 

  theme(plot.title = element_text(size = 11, face = "bold", color = "darkred")) +

  theme(axis.text.x = element_text(size = 8)) +

  theme(axis.text.y = element_text(size = 8))

Variable Selection

Based on the correlation analysis, variables such as ‘Vacant Building Distance’, ‘Affordable Housing Distance’, ‘Green Space Distance’, ‘Building Demolition Distance’, ‘Median Household Income’, ‘Percent with a Bachelor’s Degree’, ‘Percent Below the Poverty Line’, ‘Percent Unemployed’, and ‘Percent Owner Occupied’ are selected as model variables.

Cross Validation

This shows the process of conducting cross-validation on Poisson regression models using the crossValidate function, aimed at predicting the number of construction permits (countPermits) in Philadelphia neighborhoods. Different models are being evaluated: a non-spatial process model using a defined set of variables (reg.vars) and a spatial process model, which integrates spatial characteristics along with the same set of variables to understand how including spatial data influences the accuracy and predictiveness of the model outcomes. This cross-validation approach helps in assessing the model’s robustness and generalizability by testing it on multiple subsets of data.


## Define the variables we want

reg.vars <- c("dist_vacantBuilding", "dist_affordableHousing", "dist_greenSpace", "dist_buildingDemolition", "medHHInc", "pctBach", "pctPov", "pctUnemploy", "pctOwnerOccupied")

 

reg.ss.vars <- c("dist_vacantBuilding", "dist_affordableHousing", "dist_greenSpace", "dist_buildingDemolition", "medHHInc", "pctBach", "pctPov", "pctUnemploy", "pctOwnerOccupied", "permit.isSig")

 

## Creating functions for cross-validation

crossValidate <- function(dataset, id, dependentVariable, indVariables) {

  

  allPredictions <- data.frame()

  cvID_list <- unique(dataset[[id]])

  

  for (i in cvID_list) {

    

    thisFold <- i

    cat("This hold out fold is", thisFold, "\n")

    

    fold.train <- filter(dataset, dataset[[id]] != thisFold) %>% 

      as.data.frame() %>% 

      dplyr::select(all_of(id), geometry, all_of(indVariables), all_of(dependentVariable))

    

    fold.test <- filter(dataset, dataset[[id]] == thisFold) %>% 

      as.data.frame() %>% 

      dplyr::select(all_of(id), geometry, all_of(indVariables), all_of(dependentVariable))

    

    # Build the model formula from the supplied dependent variable (countPermits ~ .)
    regression <- glm(reformulate(".", response = dependentVariable), family = "poisson", 

                      data = fold.train %>% 

                        dplyr::select(-geometry, -all_of(id)))

    

    thisPrediction <- mutate(fold.test, Prediction = predict(regression, fold.test, type = "response"))

    

    allPredictions <- rbind(allPredictions, thisPrediction)

  }

  

  return(st_sf(allPredictions))

}

 

# Conducting cross-validation on Poisson regression models

reg.cv <- crossValidate(

  dataset = final_net,

  id = "cvID",

  dependentVariable = "countPermits",

  indVariables = reg.vars

) %>%

  dplyr::select(cvID = cvID, countPermits, Prediction, geometry)

 

reg.ss.cv <- crossValidate(

  dataset = final_net,

  id = "cvID",

  dependentVariable = "countPermits",

  indVariables = reg.ss.vars

) %>%

  dplyr::select(cvID = cvID, countPermits, Prediction, geometry)

 

final_net$name <- ifelse(is.na(final_net$name), "UNKNOWN", final_net$name)

 

reg.spatialCV <- crossValidate(

  dataset = final_net,

  id = "name",

  dependentVariable = "countPermits",

  indVariables = reg.vars

) %>%

  dplyr::select(cvID = name, countPermits, Prediction, geometry)

 

reg.ss.spatialCV <- crossValidate(

  dataset = final_net,

  id = "name",

  dependentVariable = "countPermits",

  indVariables = reg.ss.vars

) %>%

  dplyr::select(cvID = name, countPermits, Prediction, geometry)

Error Calculation

In each regression, the absolute error is the difference between the predicted and the actual observed counts of new construction permits.

The areas with some of the greatest mean absolute error are closer to Center City, such as Fishtown, University City, and Graduate Hospital. Areas in North, West, and South Philadelphia and along the Delaware River showed lower errors, for both the risk-factors-only and spatial-process models. The Spatial LOGO-CV process showed higher error than the other validation methods; the random k-fold processes resulted in lower mean absolute error.

The residual plots display the residuals of the different regression models used to predict new construction permits, segmented by model type. The distribution of residuals varies across the models, with some clustering tightly around the zero line (indicating better prediction accuracy) and others displaying more spread, suggesting less precision.


# Calculate errors

reg.summary <- rbind(

  mutate(reg.cv, Error = Prediction - countPermits, Regression = "Random k-fold CV: Just Risk Factors"),

  mutate(reg.ss.cv, Error = Prediction - countPermits, Regression = "Random k-fold CV: Spatial Process"),

  mutate(reg.spatialCV, Error = Prediction - countPermits, Regression = "Spatial LOGO-CV: Just Risk Factors"),

  mutate(reg.ss.spatialCV, Error = Prediction - countPermits, Regression = "Spatial LOGO-CV: Spatial Process")

) %>%

  st_sf()

 

# Calculate mean error and MAE for each fold and method
# (the cross-fold standard deviation of MAE is computed for Table 1 below)

error_by_reg_and_fold <- reg.summary %>% 

  group_by(Regression, cvID) %>% 

  summarize(

    Mean_Error = mean(Prediction - countPermits, na.rm = TRUE),

    MAE = mean(abs(Prediction - countPermits), na.rm = TRUE),

    .groups = 'drop'

  )

 

# Arrange by MAE for viewing

error_by_reg_and_fold %>% arrange(desc(MAE))

error_by_reg_and_fold %>% arrange(MAE)

 

# Plot histogram of OOF errors for each method

error_by_reg_and_fold %>%

  ggplot(aes(x = MAE)) + 

  geom_histogram(bins = 30, colour = "black", fill = "#FDE725FF") +

  facet_wrap(~ Regression, scales = "free") +

  scale_x_continuous(breaks = seq(0, 11, by = 1)) +

  labs(title = "Distribution of MAE", subtitle = "Random K-Fold and LOGO-CV", x = "Mean Absolute Error", y = "Count") +

  theme(plot.title = element_text(size = 15, face = "bold", color = "darkred"))

 

error_by_reg_and_fold %>%

  filter(str_detect(Regression, "LOGO")) %>%

  ggplot() +

  geom_sf(data = phlBoundary, fill = "white", color = "darkgrey") +

  geom_sf(aes(fill = MAE)) +

  facet_wrap(~ Regression) +

  scale_colour_viridis(option = "plasma") +

  scale_fill_viridis(option = "plasma") +

  labs(title = "Errors by LOGO-CV Regression", caption = "Figure 15") +

  theme(plot.title = element_text(size = 15, face = "bold", color = "darkred")) +

  theme(strip.text = element_text(size = 8))

 

# Table of MAE and Standard Deviation MAE

st_drop_geometry(error_by_reg_and_fold) %>%

  group_by(Regression) %>% 

  summarize(

    Mean_MAE = round(mean(MAE), 2),

    SD_MAE = round(sd(MAE), 2)

  ) %>%

  kable(caption = "Table 1: MAE and standard deviation MAE by regression") %>%

  kable_styling("striped", full_width = FALSE) %>%

  row_spec(2, color = "black", background = "#FDE725FF") %>%

  row_spec(4, color = "black", background = "#FDE725FF")

 

# Check for unique row names and reset if necessary

if (anyDuplicated(row.names(error_by_reg_and_fold))) {

  rownames(error_by_reg_and_fold) <- make.unique(as.character(row.names(error_by_reg_and_fold)))

}

 

# Create weights only for the selected regression type

neighborhood.weights <- error_by_reg_and_fold %>%

  filter(Regression == "Spatial LOGO-CV: Spatial Process") %>%

  st_as_sf() %>%

  poly2nb(., queen = TRUE) %>%

  nb2listw(., style = "W", zero.policy = TRUE)

 

filter(error_by_reg_and_fold, str_detect(Regression, "LOGO")) %>% 

  st_drop_geometry() %>%

  group_by(Regression) %>%

  summarize(

    Morans_I = moran.mc(abs(Mean_Error), neighborhood.weights, nsim = 999, zero.policy = TRUE, na.action = na.omit)[[1]],

    p_value = moran.mc(abs(Mean_Error), neighborhood.weights, nsim = 999, zero.policy = TRUE, na.action = na.omit)[[3]]

  ) %>%

  kable(caption = "Table 2: Moran's I on Errors by Regression") %>%

  kable_styling("striped", full_width = FALSE) %>%

  row_spec(1, color = "black", background = "#FDE725FF")

 

reg.summary <- reg.summary %>%

  mutate(Residuals = countPermits - Prediction)

 

# Residual plot

ggplot(reg.summary, aes(x = Prediction, y = Residuals)) +

  geom_point(alpha = 0.4) +  # Use alpha to adjust point transparency

  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +  # Add a horizontal line at y = 0

  labs(title = "Residual Plot", x = "Predicted Values", y = "Residuals") +

  theme_minimal()

 

ggplot(reg.summary, aes(x = Prediction, y = Residuals, color = Regression)) +

  geom_point(alpha = 0.4) +

  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +

  facet_wrap(~ Regression) +  # Separate plot for each regression type

  labs(title = "Residual Plot by Regression Type", x = "Predicted Values", y = "Residuals") +

  scale_color_viridis_d() +  # Corrected to use discrete color scale

  theme_minimal()

Kernel Density

The series of plots provided illustrate the spatial distribution of construction permits in Philadelphia using kernel density estimates with different search radii and a comparison of these estimates with spatial risk predictions. The first set of images shows kernel density maps for construction permits using three different search radii (1000 ft., 1500 ft., and 2000 ft.). As the search radius increases, the density map becomes smoother and broader, indicating a generalization in the concentration of construction activity across the city. This helps identify which areas are experiencing high volumes of construction relative to others, potentially pointing to hotspots of development or gentrification.

The plots below delve deeper by overlaying actual permit locations on the density estimates and comparing these against spatial risk predictions. These risk predictions categorize areas based on the predicted level of construction activity (from “Minimal” to “Very High” risk), allowing for a nuanced understanding of development intensity. These visualizations are crucial as they highlight areas where infrastructure and regulations might need to be reassessed due to rapid development. Specifically, areas classified as “Very High Risk” may require more attention to manage the impacts of gentrification, such as displacement and changes in the socio-economic fabric.

The chart below reveals significant differences in the distribution of predicted permit counts between the two models. Spatial Risk Predictions consistently estimate higher rates of permits across all risk categories when compared to Kernel Density Estimates, with particularly stark contrasts in the “Very High Risk” and “Low Risk” categories. This suggests that the Spatial Risk Prediction model may be more sensitive to factors that predict higher construction activity, potentially providing a more dynamic or nuanced insight into the spatial distribution of construction risk compared to the more generalized Kernel Density approach.


# Demo of kernel width

permits_ppp <- as.ppp(st_coordinates(cnstPermits), W = st_bbox(final_net))

permits_KD.1000 <- density.ppp(permits_ppp, 1000)

permits_KD.1500 <- density.ppp(permits_ppp, 1500)

permits_KD.2000 <- density.ppp(permits_ppp, 2000)

 

permits_KD.df <- rbind(

  mutate(data.frame(rasterToPoints(mask(raster(permits_KD.1000), as(phillyNeighborhoods, 'Spatial')))), Legend = "1000 Ft."),

  mutate(data.frame(rasterToPoints(mask(raster(permits_KD.1500), as(phillyNeighborhoods, 'Spatial')))), Legend = "1500 Ft."),

  mutate(data.frame(rasterToPoints(mask(raster(permits_KD.2000), as(phillyNeighborhoods, 'Spatial')))), Legend = "2000 Ft.")

)

 

permits_KD.df$Legend <- factor(permits_KD.df$Legend, levels = c("1000 Ft.", "1500 Ft.", "2000 Ft."))

 

ggplot(data = permits_KD.df, aes(x = x, y = y)) +

  geom_raster(aes(fill = layer)) + 

  facet_wrap(~Legend) +

  coord_sf(crs = st_crs(final_net)) + 

  scale_fill_viridis(name = "Density") +

  labs(title = "Kernel density with 3 different search radii") +

  theme_void()

 

# Kernel density (1000 ft. radius) aggregated to the fishnet, with a sample of permit points overlaid

as.data.frame(permits_KD.1000) %>%

  st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%

  aggregate(., final_net, mean) %>%

  ggplot() +

  geom_sf(aes(fill = value)) +

  geom_sf(data = sample_n(cnstPermits, 1500), size = 0.5) +

  scale_fill_viridis(name = "Density") +

  labs(title = "Kernel density of Construction Permits") +

  theme_void()

 

# Prediction by Kernel Density Model

permits_KD_sf_2022 <- as.data.frame(permits_KD.1500) %>%

  st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%

  aggregate(., final_net, mean) %>%

  mutate(

    label = "Kernel Density Estimates",

    Risk_Level = ntile(value, 5),  # Change granularity of risk levels

    Risk_Level = case_when(

      Risk_Level == 5 ~ "Very High Risk",

      Risk_Level == 4 ~ "High Risk",

      Risk_Level == 3 ~ "Moderate Risk",

      Risk_Level == 2 ~ "Low Risk",

      TRUE ~ "Minimal Risk"

    )

  ) %>%

  cbind(

    aggregate(

      dplyr::select(cnstPermits_2022) %>% mutate(permitCount = 1), ., sum

    ) %>%

    mutate(permitCount = replace_na(permitCount, 0))

  ) %>%

  dplyr::select(label, Risk_Level, permitCount)

 

# Prediction by Risk Prediction Model

permits_KD_risk_sf_2022 <- reg.ss.spatialCV %>%

  mutate(

    label = "Spatial Risk Predictions",

    Risk_Level = ntile(Prediction, 5),

    Risk_Level = case_when(

      Risk_Level == 5 ~ "Very High Risk",

      Risk_Level == 4 ~ "High Risk",

      Risk_Level == 3 ~ "Moderate Risk",

      Risk_Level == 2 ~ "Low Risk",

      TRUE ~ "Minimal Risk"

    )

  ) %>%

  cbind(

    aggregate(

      dplyr::select(cnstPermits_2022) %>% mutate(permitCount = 1), ., sum

    ) %>%

    mutate(permitCount = replace_na(permitCount, 0))

  ) %>%

  dplyr::select(label, Risk_Level, permitCount)

 

unique(permits_KD_sf_2022$Risk_Level)  # Check for the 'permits_KD_sf' dataset

unique(permits_KD_risk_sf_2022$Risk_Level)  # Check for the 'permits_KD_risk_sf' dataset

 

summary(permits_KD_sf_2022$Risk_Level)

summary(permits_KD_risk_sf_2022$Risk_Level)

 


 

# Plotting comparison between both models

rbind(permits_KD_sf_2022, permits_KD_risk_sf_2022) %>%

  na.omit() %>%

  gather(Variable, Value, -label, -Risk_Level, -geometry) %>%

  ggplot() +

  geom_sf(aes(fill = Risk_Level), colour = NA) +

  geom_sf(data = sample_n(cnstPermits_2022, 1130), size = 0.5, colour = "black") +

  facet_wrap(~label) +

  scale_fill_viridis_d(option = "plasma", name = 'Risk Level') + 

  labs(

    title = "Comparison of Kernel Density and Risk Predictions",

    subtitle = "Bottom Layer is 2022 Predicted Permit Counts.\nDots are observed 2022 Permit Counts.\n",

    caption = 'Figure 19'

  ) +

  mapTheme() +

  plotTheme()

 

# Comparison of predictions - Bar Plots

rbind(permits_KD_sf_2022, permits_KD_risk_sf_2022) %>%

  st_set_geometry(NULL) %>%

  na.omit() %>%

  gather(Variable, Value, -label, -Risk_Level) %>%

  group_by(label, Risk_Level) %>%

  summarize(permitCount = sum(Value)) %>%

  ungroup() %>%

  group_by(label) %>%

  mutate(Rate_of_permit_count = permitCount / sum(permitCount)) %>%

  ggplot(aes(Risk_Level, Rate_of_permit_count)) +

  geom_bar(aes(fill = label), position = "dodge", stat = "identity") +

  scale_fill_viridis_d(option = "plasma", name = 'Model') +

  labs(

    title = "Risk Prediction vs. Kernel Density",

    subtitle = 'Philadelphia, PA',

    caption = 'Figure 20',

    x = 'Risk Level',

    y = 'Rate of Permit Counts'

  ) +

  plotTheme()

Risk Mapping

Neighborhoods that may be at high risk of gentrification, using new construction permits as a proxy, are mostly located near Center City. The highest-risk neighborhoods are Point Breeze, Fishtown, Rittenhouse, and Northern Liberties.

In addition to these areas, large swathes of West Philadelphia and significant portions of North Philadelphia, as well as some parts of the Northwest, are depicted as moderate to high risk. This extensive distribution suggests a broad impact of development that may influence housing markets, demographic shifts, and local economies across a considerable part of the city.

The implications of such widespread development are complex, potentially leading to increased property values and living costs, which could displace long-standing residents and alter the cultural fabric of these neighborhoods. The maps help identify where proactive measures might be necessary to mitigate the adverse effects of gentrification, such as promoting affordable housing initiatives and supporting local businesses to preserve neighborhood character and inclusivity.


# Assigning Risk Categories for Predictions

permit_risk_sf_2022 <- filter(reg.summary, Regression == "Spatial LOGO-CV: Spatial Process") %>%

  mutate(

    label = "Spatial Risk Predictions",

    Risk_Level = ntile(Prediction, 5),

    Risk_Level = case_when(

      Risk_Level == 5 ~ "Very High Risk",

      Risk_Level == 4 ~ "High Risk",

      Risk_Level == 3 ~ "Moderate Risk",

      Risk_Level == 2 ~ "Low Risk",

      TRUE ~ "Minimal Risk"

    )

  )

 

permit_risk_sf_2022 %>%

  gather(Variable, Value, -label, -Risk_Level, -geometry) %>%

  ggplot() +

  geom_sf(aes(fill = Risk_Level), colour = NA) +

  geom_sf(data = phillyNeighborhoods, fill = "transparent") +

  scale_fill_viridis(discrete = TRUE) +

  labs(

    title = "Risk Predictions",

    subtitle = "New Construction Permit Predictions",

    caption = "Figure 20"

  ) +

  mapTheme() +

  plotTheme()

 

# Neighborhoods at highest risk of permitting construction

highest_risk_nhoods <- permit_risk_sf_2022 %>%

  dplyr::select(cvID, Risk_Level, countPermits) %>%

  dplyr::filter(Risk_Level == "Very High Risk") %>%  # Adjusted to match the new risk category

  st_drop_geometry() %>%

  group_by(cvID) %>%

  summarise(Predicted_Permits = sum(countPermits)) %>%

  dplyr::filter(Predicted_Permits > 52) %>%

  arrange(desc(Predicted_Permits)) %>%

  rename(Neighborhood = cvID)

 

# Plotting the highest risk neighborhoods

ggplot(highest_risk_nhoods) +

  geom_bar(aes(x = reorder(Neighborhood, -Predicted_Permits), y = Predicted_Permits), fill = "#FE9900", stat = "identity", width = 0.5) +

  labs(

    x = "Neighborhood", 

    y = "Predicted Permits", 

    title = "Neighborhoods with Very High Permitting Risk", 

    subtitle = 'Philadelphia, PA', 

    caption = "Figure 21"

  ) +

  theme(

    axis.text.x = element_text(size = 6, angle = 60, vjust = 0.7, hjust = 0.7),

    axis.text.y = element_text(size = 6),

    axis.title = element_text(size = 8), 

    plot.title = element_text(size = 9, face = "bold", color = "darkred"),

    plot.subtitle = element_text(size = 7, face = "italic", color = "black"),

    plot.margin = margin(0, 50, 8, 50, "pt")

  )

Comparing to a Different Time

Let’s see how our model performed relative to kernel density estimates on another year’s data. Comparing the spatial risk predictions and kernel density estimates from 2013 to 2022 in Philadelphia, there is a noticeable shift in risk distribution and in the concentration of predicted permit counts. In 2013, the very high-risk categories were predominantly concentrated in a few areas under the kernel density model. By 2022, the spatial risk predictions show risk distributed more broadly across the city, with a significant portion classified as very high risk, suggesting an expansion in areas targeted for new construction. This shift could imply a broadening of development focus, potentially spreading gentrification pressures more widely across Philadelphia. The comparison highlights the dynamic nature of urban development and the growing spread of high-risk areas over the years, which calls for updated and responsive planning and policy measures.


# Assigning Risk Categories for 2013 Predictions

permits_KD_sf_2013 <- as.data.frame(permits_KD.1500) %>%

  st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%

  aggregate(., final_net, mean) %>%

  mutate(

    label = "Kernel Density Estimates",

    Risk_Level = ntile(value, 5),  # Change granularity of risk levels

    Risk_Level = case_when(

      Risk_Level == 5 ~ "Very High Risk",

      Risk_Level == 4 ~ "High Risk",

      Risk_Level == 3 ~ "Moderate Risk",

      Risk_Level == 2 ~ "Low Risk",

      TRUE ~ "Minimal Risk"

    )

  ) %>%

  cbind(

    aggregate(

      dplyr::select(cnstPermits_2013) %>% mutate(permitCount = 1), ., sum

    ) %>%

    mutate(permitCount = replace_na(permitCount, 0))

  ) %>%

  dplyr::select(label, Risk_Level, permitCount)

 

# Prediction by Risk Prediction Model

permits_KD_risk_sf_2013 <- reg.ss.spatialCV %>%

  mutate(

    label = "Spatial Risk Predictions",

    Risk_Level = ntile(Prediction, 5),

    Risk_Level = case_when(

      Risk_Level == 5 ~ "Very High Risk",

      Risk_Level == 4 ~ "High Risk",

      Risk_Level == 3 ~ "Moderate Risk",

      Risk_Level == 2 ~ "Low Risk",

      TRUE ~ "Minimal Risk"

    )

  ) %>%

  cbind(

    aggregate(

      dplyr::select(cnstPermits_2013) %>% mutate(permitCount = 1), ., sum

    ) %>%

    mutate(permitCount = replace_na(permitCount, 0))

  ) %>%

  dplyr::select(label, Risk_Level, permitCount)

 

unique(permits_KD_sf_2013$Risk_Level)  # Check for the 'permits_KD_sf' dataset

unique(permits_KD_risk_sf_2013$Risk_Level)  # Check for the 'permits_KD_risk_sf' dataset

 

summary(permits_KD_sf_2013$Risk_Level)

summary(permits_KD_risk_sf_2013$Risk_Level)

 


 

# Plotting comparison between both models

rbind(permits_KD_sf_2013, permits_KD_risk_sf_2013) %>%

  na.omit() %>%

  gather(Variable, Value, -label, -Risk_Level, -geometry) %>%

  ggplot() +

  geom_sf(aes(fill = Risk_Level), colour = NA) +

  geom_sf(data = sample_n(cnstPermits_2013, 1130), size = 0.5, colour = "black") +

  facet_wrap(~label) +

  scale_fill_viridis_d(option = "plasma", name = 'Risk Level') + 

  labs(

    title = "Comparison of Kernel Density and Risk Predictions",

    subtitle = "Bottom Layer is 2013 Predicted Permit Counts.\nDots are observed 2013 Permit Counts.\n",

    caption = 'Figure __'

  ) +

  mapTheme() +

  plotTheme()

 

rbind(

  mutate(permits_KD_risk_sf_2013, label = "Spatial Risk Predictions, 2013"),

  mutate(permits_KD_risk_sf_2022, label = "Spatial Risk Predictions, 2022")

) %>%

  na.omit() %>%

  gather(Variable, Value, -label, -Risk_Level, -geometry) %>%

  ggplot() +

  geom_sf(aes(fill = Risk_Level), colour = NA) +

  geom_sf(data = sample_n(cnstPermits, 3000), size = 0.5, colour = "black") +

  facet_wrap(~label) +

  scale_fill_viridis(discrete = TRUE) +

  labs(

    title = "Comparison of Kernel Density and Risk Predictions",

    subtitle = "2013 Construction Permits Density; 2022 Construction Permits Density"

  ) +

  mapTheme()

 

rbind(permits_KD_sf_2013, permits_KD_risk_sf_2013) %>%

  st_drop_geometry() %>%

  na.omit() %>%

  gather(Variable, Value, -label, -Risk_Level) %>%

  group_by(label, Risk_Level) %>%

  summarize(countPermits = sum(Value)) %>%

  ungroup() %>%

  group_by(label) %>%

  mutate(Pcnt_cnst_permits = countPermits / sum(countPermits)) %>%

  ggplot(aes(Risk_Level, Pcnt_cnst_permits)) +

  geom_bar(aes(fill = label), position = "dodge", stat = "identity") +

  scale_fill_viridis(discrete = TRUE, name = "Model") +

  labs(

    title = "Risk prediction vs. Kernel density, 2013",

    y = "% of Test Set (per model)",

    x = "Risk Level"

  ) +

  theme_bw() +

  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

Conclusion

The model developed can be instrumental in planning future policies to protect legacy residents and small businesses in neighborhoods at risk of gentrification by identifying areas where such changes might occur. Built with data on new construction, permitting, and demolitions, among other predictors, the analysis highlights that many neighborhoods around Center City Philadelphia, such as Point Breeze, Rittenhouse, and river-adjacent areas like East Kensington, Fishtown, and Northern Liberties, are particularly vulnerable to gentrification. This tool, while useful, should be utilized alongside other predictive tools to ensure comprehensive planning by the Office of Innovation and Technology and the Smart Cities office. It’s important to note that the variables chosen by Marie Antoinette Predictions were potential indicators of gentrification, but they do not encompass all possible predictors. The presence or absence of civic assets like schools, libraries, green spaces, and public spaces could also be significant markers of gentrification not captured in this model.

 

Moreover, the model does not fully capture the nuances and various types of displacement that can occur. Predicting the order and timeline of displacement is a challenge that OIT and the Smart Cities office could address further. Direct displacement happens when residents are forced to leave due to rising living costs associated with new developments. Indirect displacement occurs when higher-income residents move into an area as lower-income residents move out, which can be tracked through changes in census data on median income and income brackets. Cultural displacement, on the other hand, involves the transformation of civic assets and shops to cater to a new demographic, altering the neighborhood's character. A comprehensive model that tracks these different phases using various data sources would be invaluable for the city to effectively identify and manage gentrification.
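As a sketch of how indirect displacement might be tracked (illustrative only; the variable and years are assumptions, not part of this analysis), tract-level median household income can be compared across two ACS vintages with tidycensus:

# Illustrative sketch: change in tract-level median household income between
# two ACS 5-year vintages as a rough indirect-displacement signal.
# Note: 2020 tract boundaries differ from 2010, so unmatched GEOIDs drop
# out of the inner join.
inc13 <- get_acs(geography = "tract", variables = c(medHHInc = "B19013_001"),
                 year = 2013, state = "PA", county = "Philadelphia")
inc22 <- get_acs(geography = "tract", variables = c(medHHInc = "B19013_001"),
                 year = 2022, state = "PA", county = "Philadelphia")

income_change <- inner_join(inc13, inc22, by = "GEOID",
                            suffix = c("_2013", "_2022")) %>%
  mutate(pctChange = 100 * (estimate_2022 - estimate_2013) / estimate_2013) %>%
  arrange(desc(pctChange))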

References

1. https://whyy.org/articles/philadelphia-americas-poorest-big-city-poverty/

2. https://www.pewtrusts.org/-/media/assets/2020/09/phillyhousingreport.pdf

3. https://whyy.org/articles/philadelphia-affordable-housing-bills-council-gauthier/

4. https://sites.utexas.edu/gentrificationproject/understanding-gentrification-and-displacement/

5. https://hai.stanford.edu/news/spotting-visual-signs-gentrification-scale

6. https://opendataphilly.org/
