Urban Form and Walkability, Kisumu, Kenya
Geospatial Data Science in Python

The project explores the relationship between urban form and walkability in Kisumu, Kenya, by analyzing Landsat 8 imagery with k-means clustering and assessing street networks through OSMnx. It leverages satellite imagery, machine learning, and urban mapping to understand the impact of city design on walkability, focusing on street density and access to amenities.

Background

Kisumu is a port city in western Kenya at 1,131 m (3,711 ft) elevation. The port was founded in 1901 as the main inland terminal of the Uganda Railway and named Port Florence. As Kenya's third-largest city, Kisumu is a microcosm of the country's development trajectory. It grapples with the demands of urbanization while striving to provide for its residents' health, well-being, and rights.

Kisumu's urban footprint today showcases a complex interplay of growth, history, and diverse land uses. The city, evolving from its historical origins, now balances modern development with its cultural and historical heritage. Its expansion reflects a range of functionalities, from residential and commercial spaces to industrial areas, all interwoven into the city's unique character.

The utilization of scikit-learn for k-means clustering is employed to categorize Landsat 8 imagery, providing insights into the urban landscapes of Kisumu. Following this, OSMnx is applied to analyze the street network of the city, evaluating factors that contribute to walkability, such as street density and access to amenities. This methodology facilitates an assessment of the correlation between urban structure and pedestrian accessibility in Kisumu.

Potential Data Sources

1. ESRI World Imagery Wayback Satellite Data

2. Google Earth Engine

3. OSM Street Maps Data

4. USGS Earth Explorer

The code for this repository is hosted on my GitHub page.

Analysing Urban Walkability

Intersection density reflects the compactness and connectivity of a street network, key indicators of walkability. A dense network minimizes unreachable areas, while high connectivity allows for multiple routing options.

Initially, the walkable network of the study area was downloaded using OSMnx, forming a graph of edges (paths) and nodes (intersections). To address overcomplexity, like minor discrepancies creating multiple nodes for a single intersection, the graph was simplified. Nodes within five meters were merged, and cul-de-sacs were removed, yielding a representation that more accurately reflects true intersections. The increase in nodes from the original to the simplified graph indicates a more complex network structure post-simplification.

Number of nodes:

Original graph: 5319

Simplified graph: 6153

View Code

#| echo: true

#| code-fold: true

import osmnx as ox

import matplotlib.pyplot as plt

import geopandas as gpd

import seaborn as sns

import os

import numpy as np

import rasterio as rio

from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score

from skimage.exposure import equalize_adapthist

import matplotlib.pyplot as plt

import pandana

%matplotlib inline

#| echo: true

#| code-fold: true

# Select city and crs

cityname = 'kisumu, kenya'

crs = 32636

#| echo: true

#| code-fold: true

# Get graph by geocoding

graph = ox.graph_from_place(cityname, network_type="walk")

# Project graph

graph = ox.projection.project_graph(graph, to_crs=crs)

#| echo: true

#| code-fold: true

# Simplify to get real intersections only

# (consolidate nodes within a distance from eachother)

graph_simplified = ox.simplification.consolidate_intersections(

# Graph to simplify

graph,

# buffer around each node (project the graph beforehand)

tolerance=5,

# Get result as graph (False to get nodes only as gdf)

rebuild_graph=True,

# no dead ends

dead_ends=False,

# Reconnect (False to get intersections only)

reconnect_edges=True

)

#| echo: true

#| code-fold: true

# everything to gdfs

nodes, edges = ox.graph_to_gdfs(graph)

nodes_s, edges_s = ox.graph_to_gdfs(graph_simplified)

#| echo: false

#| code-fold: true

# Setup plot

fig, ax = plt.subplots(figsize=(20,15))

ax.set_axis_off()

ax.set_aspect('equal')

fig.set_facecolor('black')

# Plot data

edges.plot(

ax=ax,

color='white',

linewidth=0.2

)

# Tight layout

plt.tight_layout()

# Save

plt.savefig('./data/graph_overview.png')

#| echo: false

#| code-fold: true

# Select city and crs

cityname = 'Kisumu, Kenya'

crs = 'epsg:32636' # Use the string format for CRS to avoid confusion

# Get graph by geocoding

graph = ox.graph_from_place(cityname, network_type="drive")

# Project graph

graph_projected = ox.project_graph(graph, to_crs=crs)

# Simplify the graph to get real intersections only

graph_simplified = ox.consolidate_intersections(

graph_projected,

tolerance=5, # buffer in meters

rebuild_graph=True,

dead_ends=False,

reconnect_edges=True

)

# Convert to geodataframes

nodes, edges = ox.graph_to_gdfs(graph_simplified)

# Setup plot

fig, ax = plt.subplots(figsize=(20, 15))

ax.set_axis_off()

fig.patch.set_facecolor('black')

# Plot drivable streets

edges.plot(

ax=ax,

color='white',

linewidth=0.2

)

# Tight layout for saving

plt.tight_layout()

# Save the figure

plt.savefig('kisumu_drive_network.png', facecolor=fig.get_facecolor())

# Display the plot

plt.show()

#| echo: true

#| code-fold: true

# Print info

print(

'number of nodes:\n\noriginal graph: '+str(len(nodes))

+'\nsimplified graph: '+str(len(nodes_s))

)

The increase in nodes from the original to the simplified graph indicates a more complex network structure post-simplification. Utilizing Matplotlib's hexbin and Seaborn’s KDE plots has allowed for a detailed visualization of spatial density variations, offering a deeper understanding of the network’s intricacies.

Screenshot 2024-02-27 at 11.34.41 PM.png

Screenshot 2024-02-27 at 11.34.59 PM.png

View Code

#| echo: false

#| code-fold: true

# Set all text color to white

COLOR = 'white'

plt.rcParams['text.color'] = COLOR

plt.rcParams['axes.labelcolor'] = COLOR

plt.rcParams['xtick.color'] = COLOR

plt.rcParams['ytick.color'] = COLOR

# Setup plot

fig, ax = plt.subplots(figsize=(20,15))

ax.set_axis_off()

ax.set_aspect('equal')

fig.set_facecolor('black')

# Plot as hexbins

hb = ax.hexbin(

x=nodes_s['x'],

y=nodes_s['y'],

gridsize=75,

cmap='viridis',

mincnt=1,

vmax=100,

)

# Colorbar

cb = plt.colorbar(hb, ax=ax, shrink=0.8, ticks=[1, 20, 40, 60, 80, 100])

cb.ax.tick_params(color='none', labelsize=20)

cb.ax.set_yticklabels(['1', '20', '40', '60', '80', '>= 100'])

cb.set_label('Intersections per hexagon', fontsize=20, fontweight='bold')

# Tight layout

plt.tight_layout()

# Save

plt.savefig('./data/intersection_hexbin.png')

#| echo: false

#| code-fold: true

# Setup plot

fig, ax = plt.subplots(figsize=(30,30))

ax.set_axis_off()

ax.set_aspect('equal')

fig.set_facecolor('black')

# Plot streets underneath

edges_s.plot(ax=ax, color=[1,1,1], linewidth=0.2, zorder= 0)

# Plot KDE on top

sns.kdeplot(

ax=ax,

data=nodes_s,

x='x', y='y',

hue=None,

fill=True,

cmap='mako',

thresh=0,

levels=10,

alpha=0.5,

zorder=10

)

# Tight layout

plt.tight_layout()

# Save

plt.savefig('./data/intersection_kde.png', bbox_inches='tight')

#| echo: true

#| code-fold: true

# Save nodes

nodes.to_file('./data/nodes.gpkg', driver='GPKG')

nodes_s.to_file('./data/nodes_simplified.gpkg', driver='GPKG')

The initial analysis equated dense urban fabric with walkability, but urban space encompasses more than just intersections. For my analysis I used the list of OSM features that indicate sociable places, or so called “third places“. With OSMnx and Pandana, a detailed network was created, positioning the POIs. Here, I used the intricate, original graph for accurate travel time computations. Pandana’s routing analysis calculated walking times from each node to ten proximal POIs, assuming a walking speed of 4.5 km/h and capping at a 15-minute walking radius.

Screenshot 2024-02-27 at 11.49.08 PM.png

View Code

#| echo: true

#| code-fold: true

# Select city and crs

cityname = 'kisumu, kenya'

crs = 32636

#| echo: true

#| code-fold: true

# Get graph by geocoding

graph = ox.graph_from_place(cityname, network_type="walk")

# Project graph

graph = ox.projection.project_graph(graph, to_crs=crs)

#| echo: true

#| code-fold: true

# Select points of interest based on osm tags

tags = {

'amenity':[

'cafe',

'bar',

'pub',

'restaurant'

],

'shop':[

'bakery',

'convenience',

'supermarket',

'mall',

'department_store',

'clothes',

'fashion',

'shoes'

],

'leisure':[

'fitness_centre'

]

}

# Get amentities from place

pois = ox.geometries.geometries_from_place(cityname, tags=tags)

# Project pois

pois = pois.to_crs(epsg=crs)

#| echo: true

#| code-fold: true

# Max time to walk in minutes (no routing to nodes further than this)

walk_time = 15

# Walking speed

walk_speed = 4.5

# Set a uniform walking speed on every edge

for u, v, data in graph.edges(data=True):

data['speed_kph'] = walk_speed

graph = ox.add_edge_travel_times(graph)

# Extract node/edge GeoDataFrames, retaining only necessary columns (for pandana)

nodes = ox.graph_to_gdfs(graph, edges=False)[['x', 'y']]

edges = ox.graph_to_gdfs(graph, nodes=False).reset_index()[['u', 'v', 'travel_time']]

#| echo: true

#| code-fold: true

# Construct the pandana network model

network = pandana.Network(

node_x=nodes['x'],

node_y=nodes['y'],

edge_from=edges['u'],

edge_to=edges['v'],

edge_weights=edges[['travel_time']]

)

# Extract centroids from the pois' geometries

centroids = pois.centroid

#| echo: true

#| code-fold: true

# Specify a max travel distance for analysis

# Minutes -> seconds

maxdist = walk_time * 60

# Set the pois' locations on the network

network.set_pois(

category='pois',

maxdist=maxdist,

maxitems=10,

x_col=centroids.x,

y_col=centroids.y

)

#| echo: true

#| code-fold: true

# calculate travel time to 10 nearest pois from each node in network

distances = network.nearest_pois(

distance=maxdist,

category='pois',

num_pois=10

)

distances.astype(int).head()

#| echo: false

#| code-fold: true

# Set text parameters

COLOR = 'white'

plt.rcParams['text.color'] = COLOR

plt.rcParams['axes.labelcolor'] = COLOR

plt.rcParams['xtick.color'] = COLOR

plt.rcParams['ytick.color'] = COLOR

# Setup plot

fig, ax = plt.subplots(figsize=(20,15))

ax.set_axis_off()

ax.set_aspect('equal')

fig.set_facecolor((0,0,0))

# Plot distance to nearest POI

sc = ax.scatter(

x=nodes['x'],

y=nodes['y'],

c=distances[1],

s=1,

cmap='viridis_r',

)

# Colorbar

cb = fig.colorbar(sc, ax=ax, shrink=0.8, ticks=[0, 300, 600, 900])

cb.ax.tick_params(color='none', labelsize=20)

cb.ax.set_yticklabels(['0', '5', '10', '>= 15'])

cb.set_label('Walking time to nearest POI (minutes)', fontsize=20, fontweight='bold')

# Remove empty space

plt.tight_layout()

# Save

plt.savefig('./data/walk_access.png')

The initial visualization was dense; clarity was improved using Matplotlib’s hexbins, this time mapping average walking times per hex. Notably, showing only the nearest POI can be misleading—a tight cluster of amenities could visually equate to an area with a single amenity. Adjusting the metric to, say, the 5th or 10th nearest POI, can provide a more balanced view, as seen in the comparative visualizations.

View Code

#| echo: false

#| code-fold: true

# Setup plot

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(30,30), constrained_layout=False)

ax[0][0].set_axis_off()

ax[0][1].set_axis_off()

ax[1][0].set_axis_off()

ax[1][1].set_axis_off()

ax[0][0].set_aspect('equal')

ax[0][1].set_aspect('equal')

ax[1][0].set_aspect('equal')

# Specify colors

fig.set_facecolor('black')

cmap = 'viridis_r'

# Plot distance to nearest POI

hb = ax[0][0].hexbin(

x=nodes['x'],

y=nodes['y'],

gridsize=75,

cmap=cmap,

C=distances[1],

alpha=1

)

# Plot distance to 5th nearest POI

hb = ax[0][1].hexbin(

x=nodes['x'],

y=nodes['y'],

gridsize=75,

cmap=cmap,

C=distances[5],

alpha=1

)

# Plot distance to 10th nearest POI

hb = ax[1][0].hexbin(

x=nodes['x'],

y=nodes['y'],

gridsize=75,

cmap=cmap,

C=distances[10],

alpha=1

)

# Add titles to maps

ax[0][0].set_title('Walking time to nearest POI', fontsize=30, fontweight='bold')

ax[0][1].set_title('Walking time to 5th nearest POI', fontsize=30, fontweight='bold')

ax[1][0].set_title('Walking time to 10th nearest POI', fontsize=30, fontweight='bold')

# Setup color bar axis (location of color bar)

cbar_ax = fig.add_axes([0.5, 0.01, 0.5, 0.5])

cbar_ax.set_axis_off()

# Color bar properties

cb = plt.colorbar(

hb,

ax=cbar_ax,

shrink=0.9,

ticks=[82, 300, 600, 900],

orientation='horizontal',

aspect=20

)

cb.outline.set_edgecolor('none')

cb.ax.tick_params(color='none', labelsize=30)

cb.ax.set_xticklabels(['<= 1', '5', '10', '>= 15'])

# Title for color bar

ax[1][1].text(0.5, 0.23, 'Walking time (minutes)', fontsize=30, ha='center')

# Tight layout

plt.tight_layout()

# Save

plt.savefig('./data/walk_access_comparison.png')

Unsupervised Classification of Landsat 8 Imagery

Unsupervised classification is valuable in satellite imagery for its efficiency in data compression and anomaly detection, such as cloud and ship identification. It's also useful in semi-supervised learning, where it quickly categorizes unlabeled data, complementing partially labeled datasets.

For my analysis of satellite image classification, I first need to set up my coding environment. I'll start by importing all the necessary libraries. These include numpy for handling arrays, rasterio for input/output operations with remotely sensed data, scikit-learn for unsupervised classification and model evaluation, and matplotlib for data visualization. I also use the os library to manage file paths. My focus was on obtaining satellite imagery of Kisumu. For this, I used OSMnx to download the data and ensured the data was in the correct format and projection (CRS) for accurate analysis.

Before diving into the analysis, I started by visualizing the scene to understand its characteristics. For this, I use bands 2, 3, and 4 of the Landsat 8 scene. When applying the k-means algorithm, I incorporate a broader range of data, including bands 1 to 11. Since all these bands have a resolution of 30 meters per pixel, there's no need for upsampling or downsampling in the analysis. The Kisumu area was densely packed with buildings and structures, making it challenging to discern vegetation or subtle details in the predominantly brownish images. To enhance visualization, particularly of man-made objects, I utilized a different band combination: 7-6-4. This selection significantly emphasizes these features. I adjusted the red, green, and blue arguments in the show_rgb function to accommodate this new combination.

View Code

#| echo: true

#| code-fold: true

import os

import numpy as np

import rasterio as rio

from rasterio.mask import mask

from shapely.geometry import Point, mapping

from shapely.geometry.polygon import Polygon

from shapely.geometry import Point

import geopandas as gpd

from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score

from sklearn.preprocessing import StandardScaler

from skimage.exposure import equalize_adapthist

import matplotlib.pyplot as plt

#| echo: true

#| code-fold: true

#loading data

path = 'data/landsat8'

complete_dataset = os.listdir(path)

complete_dataset = [path + x for x in complete_dataset]

print(complete_dataset)

#| echo: true

#| code-fold: true

path = 'data/landsat8'

bands_list = [os.path.join(path, filename) for filename in os.listdir(path) if filename.endswith('.tif')]

#| echo: true

#| code-fold: true

def show_rgb(bands_list, red=4, green=3, blue=2):

stack = []

for band_number in [red, green, blue]:

# Adjust the pattern to match the filename structure we see in the output

band_file = next((band for band in bands_list if f"T1_B{band_number}.tif" in band), None)

if band_file:

with rio.open(band_file) as src:

array = src.read(1)

# Normalize the array

array = array.astype('float32')

array_min, array_max = array.min(), array.max()

array = (array - array_min) / (array_max - array_min)

stack.append(array)

else:

print(f"No band file found for band number {band_number}")

if stack: # Check if stack is not empty

stack = np.dstack(stack)

# Display the image

plt.figure(figsize=(15, 15))

plt.axis('off')

plt.imshow(stack)

plt.show()

else:

print("No arrays were added to stack, check the band files.")

#| echo: true

#| code-fold: true

# Example of using bands_list

band_number = 4 # Replace with the correct band number you are looking for

band_file = next((band for band in bands_list if f"T1_B{band_number}.tif" in band), None)

#| echo: true

#| code-fold: true

band_file = next((band for band in bands_list if f"T1_B{band_number}.tif" in band), None)

#| echo: false

#| code-fold: true

path = 'data/landsat8' # Adjust the path as needed

bands_list = [os.path.join(path, filename) for filename in os.listdir(path) if filename.endswith('.tif')] # Make sure the extension matches

print(bands_list) # Check the band paths

show_rgb(bands_list, red=4, green=3, blue=2) # For natural color

#| echo: true

#| code-fold: true

show_rgb(bands_list, red=5, green=4, blue=3)

For the models development, I employed the KMeans package to classify pixel values in the satellite images. I experimented with different numbers of clusters to understand which best captures the terrain types in the scene. To determine the number of clusters that best represent the terrain types, I experimented with 3 and 6 clusters, stacking all bands for a comprehensive analysis. After comparing the resulting images with the original RGB scene, I conclude that six categories yield the most optimal results. Three categories fail to capture the spatial variability, with a significant presence of NaN values obscuring useful data.

Cluster 0: 29.80%

Cluster 1: 14.53%

Cluster 2: 4.30%

Cluster 3: 11.55%

Cluster 4: 28.88%

Cluster 5: 10.93%

Cluster 0: 42.46%

Cluster 1: 15.46%

Cluster 2: 42.09%

View Code

#| echo: true

#| code-fold: true

# Stack all bands

def stack_bands(bands_list):

stack = []

for band_path in bands_list:

with rio.open(band_path) as src:

stack.append(src.read(1).flatten()) # Flatten each band

return np.stack(stack, axis=1)

for band_path in bands_list:

with rio.open(band_path) as src:

array = src.read(1)

print(array.shape) # Print out the shape of each array

def stack_bands(bands_list):

stack = []

for band_path in bands_list:

with rio.open(band_path) as src:

array = src.read(1)

# Only append bands that match the desired shape (279, 280) for example

if array.shape == (279, 280):

stack.append(array.flatten()) # Flatten each band

else:

print(f"Excluded {band_path} due to shape mismatch: {array.shape}")

return np.vstack(stack).T # Use vstack and transpose to get the correct shape for KMeans

# Get the stacked data excluding any mismatched bands

data_stack = stack_bands(bands_list)

# Stack all bands

def stack_bands(bands_list):

stack = []

for band_path in bands_list:

with rio.open(band_path) as src:

stack.append(src.read(1).flatten()) # Flatten each band

return np.stack(stack, axis=1)

#| echo: true

#| code-fold: true

# Reshape the data

data_stack = stack_bands(bands_list)

# Standardize the data

scaler = StandardScaler()

data_standardized = scaler.fit_transform(data_stack)

# Determine the optimal number of clusters

# Use methods like the Elbow method or Silhouette score here

# Sample for silhouette score

range_n_clusters = [2, 3, 4, 5, 6]

silhouette_avg = []

for num_clusters in range_n_clusters:

kmeans = KMeans(n_clusters=num_clusters, random_state=42)

cluster_labels = kmeans.fit_predict(data_standardized)

silhouette_avg.append(silhouette_score(data_standardized, cluster_labels))

#| echo: true

#| code-fold: true

# Choose the optimal number of clusters

optimal_clusters = range_n_clusters[silhouette_avg.index(max(silhouette_avg))]

# Apply K-Means Clustering

kmeans6 = KMeans(n_clusters=6, random_state=42)

y_pred = kmeans.fit_predict(data_standardized)

# Reshape classified data back to the original image shape

with rio.open(bands_list[0]) as src:

meta = src.meta

original_shape = (src.height, src.width)

classified_image6 = y_pred.reshape(original_shape)

# Visualize the result

plt.figure(figsize=(10, 10))

plt.imshow(classified_image6, cmap='terrain')

plt.colorbar()

plt.show()

# Assuming y_pred is your flattened array of predictions from KMeans

# and classified_image is your 2D array of cluster labels reshaped into the image's dimensions

# Count the occurrences of each cluster label

unique, counts = np.unique(classified_image6, return_counts=True)

cluster_counts6 = dict(zip(unique, counts))

# Calculate the total number of pixels

total_pixels6 = classified_image6.size

# Calculate the percentage of each cluster

cluster_percentages6 = {k: (v / total_pixels6) * 100 for k, v in cluster_counts6.items()}

# Print the percentages

for cluster, percentage in cluster_percentages6.items():

print(f'Cluster {cluster}: {percentage:.2f}%')

# Apply K-Means Clustering

kmeans3 = KMeans(n_clusters=3, random_state=42)

y_pred = kmeans3.fit_predict(data_standardized)

# Reshape classified data back to the original image shape

with rio.open(bands_list[0]) as src:

meta = src.meta

original_shape = (src.height, src.width)

classified_image3 = y_pred.reshape(original_shape)

# Visualize the result

plt.figure(figsize=(10, 10))

plt.imshow(classified_image3, cmap='terrain')

plt.colorbar()

plt.show()

# Assuming y_pred is your flattened array of predictions from KMeans

# and classified_image is your 2D array of cluster labels reshaped into the image's dimensions

# Count the occurrences of each cluster label

unique, counts = np.unique(classified_image3, return_counts=True)

cluster_counts3 = dict(zip(unique, counts))

# Calculate the total number of pixels

total_pixels3 = classified_image3.size

# Calculate the percentage of each cluster

cluster_percentages3 = {k: (v / total_pixels3) * 100 for k, v in cluster_counts3.items()}

# Print the percentages

for cluster, percentage in cluster_percentages3.items():

print(f'Cluster {cluster}: {percentage:.2f}%')

clustered_models = ClusteredBands(complete_dataset)

clustered_models.set_raster_stack()

Conclusion

Initial visual assessment of the walkability map and the unsupervised classification map suggests a potential relationship between urban land use clusters and areas with higher walkability scores in Kisumu. Urban clusters, as identified by the classification algorithm, appear to correspond with regions that have shorter walking times to points of interest, which may indicate better walkability. Conversely, less walkable areas might align with land use clusters that are characteristic of rural or undeveloped regions. To confirm and quantify this relationship, a comprehensive spatial analysis is recommended.

Urban Form and Walkability, Kisumu, Kenya
Geospatial Data Science in Python

Back to Projects page

Back to top

Urban Form and Walkability, Kisumu, Kenya Geospatial Data Science in Python

Back to Projects page

Back to top

Urban Form and Walkability, Kisumu, Kenya
Geospatial Data Science in Python