Using Spatial Optimization to Expand the CTA

Mariela Perignon
Data Scientist

The Chicago Transit Authority (CTA) is the second-largest public transit system in the United States and provides 1.6 million rides on an average weekday. Public transit systems like the CTA provide immense benefits both to individuals, with riders saving $11,643 annually compared to driving, and to society, with communities that have accessible public transportation options substantially reducing their carbon emissions.

With such obvious benefits, it’s clear that cities should continue to work towards increasing access to public transportation. How can the city of Chicago enable more access to the “L” and distribute both the societal and individual benefits to more residents? 

The CTA currently operates eight train routes over 224 miles of track serving 145 stations. Although recent proposals to improve access have not yet materialized, as a part of this project we evaluated the impact of new extensions to the “L” with the goal of answering the question:

If it were possible to add a new line to the CTA rail, what would the best route be?

To do so, we use publicly available data and optimization techniques to form our own data-driven opinion on how to improve public transit in the city. Critically, answering this question in a brief analysis requires substantial simplification: a real solution would take a large interdisciplinary team a very long time, so this serves merely as a prototype.

The data

From the City of Chicago Open Data Portal we collected geospatial datasets of the paths of CTA train lines and the location of train stations, as well as daily ridership counts at each ‘L’ station since 2001. 

One of the largest sources of train ridership is weekday commuters. Census Transportation Planning Products (CTPP) are a set of special products derived from the American Community Survey that contain information about the movement of workers between their households and workplaces, including location, usual mode of transportation and travel time. We use CTPP data (2012-2016) for the 801 census tracts within the City of Chicago.

Simplifying the problem

To constrain the problem, we start by making some simplifying assumptions:

  • In the future, people (in aggregate) will continue to live and work in roughly the same areas they currently do (i.e., we don’t model any substantial population shift).
  • When selecting the path of a new train line, we only consider the needs of daily commuters as these constitute the majority of rides on CTA trains.
  • There are no limitations on right-of-way easements for new train lines (i.e., they can be placed anywhere).
  • Existing CTA stations can magically adjust to accommodate additional trains and riders using the proposed new route.

We also set minimum requirements for the proposed train line to limit the number of potential solutions: any new segment of train line must connect to at least one existing CTA station, and the total length of the proposed new line must be over 10 km for the investment to be worthwhile.

Commuter flow

The density (per unit area) of working adults in Chicago is highest in neighborhoods on the north and northwest side of the city. The highest concentration of workplaces, on the other hand, is in and around the Loop.

Density of workers
The neighborhoods with the highest density of working adults are on the north and northwest sides, while the highest density of workplaces is in the Loop.

Of all working adults who live within the Chicago city limits, 11% regularly commute to work on the ‘L’. As would be expected, people who live near CTA stations are more likely to take the train to work than people who live far from stations. The fraction of workers in a census tract who commute on the ‘L’ decreases rapidly with distance from a station, with less than 20% of workers choosing to take the train if they live more than 1 km from their nearest station.

Commuters who take the train tend to live close to a station. Less than 20% of individuals who live more than 1 km from a station take the ‘L’ to work.
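The relationship between distance and train use described above can be reproduced with a few lines of code. The sketch below is illustrative only and is not the code used for this analysis: it assumes each census tract is a dictionary with hypothetical keys `lat`, `lon`, `workers` and `train_commuters` (tract centroid and CTPP counts), and that stations are given as `(lat, lon)` pairs. It measures the great-circle distance from each tract to its nearest station and aggregates the train commute share in 0.5 km bins.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_station_km(tract, stations):
    """Distance from a tract centroid to the closest station."""
    return min(
        haversine_km(tract["lat"], tract["lon"], s_lat, s_lon)
        for s_lat, s_lon in stations
    )

def train_share_by_distance(tracts, stations, bin_km=0.5):
    """Map the lower edge of each distance bin to the fraction of workers
    in that bin who commute by train."""
    totals = {}  # bin index -> (workers, train commuters)
    for tract in tracts:
        bin_index = int(nearest_station_km(tract, stations) // bin_km)
        workers, riders = totals.get(bin_index, (0, 0))
        totals[bin_index] = (workers + tract["workers"], riders + tract["train_commuters"])
    return {
        bin_index * bin_km: riders / workers
        for bin_index, (workers, riders) in sorted(totals.items())
        if workers > 0
    }
```

Using tract centroids is a simplification: residents of a large tract can live much closer to, or farther from, a station than the centroid suggests.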

Thus, substantial opportunity exists to improve ridership by targeting potential weekday commuters without access to the train system. As a peripheral goal, we seek to improve the overall rider experience by identifying a new train line that captures a portion of commuters who currently travel long distances to existing stations, therefore reducing average congestion within the system.

Finding potential paths for a new CTA rail line

To identify the optimal location for a potential CTA line, we search for paths that increase accessibility to neighborhoods that are currently far away from the existing network. Starting from areas that are farthest away from any existing station, we use a breadth-first search algorithm on a 500-meter grid to find all paths that meet the following conditions:

  1. A path can extend from one cell to any of its 8 neighboring cells.
  2. A path must follow a line of decreasing distance to an existing station (but not necessarily the line with the steepest gradient). This condition favors paths that run in the direction of the Loop, where train lines are closer together, without forcing all paths to veer towards their nearest station.
  3. A path cannot enter a cell that is already part of a path with the same starting point. This condition limits extraneous curves along the path and minimizes 90-degree turns.
  4. A path ends once it has reached a local minimum of distance to stations (i.e., it reaches a cell that overlaps with the location of a station). The last link of the path is forced to connect to the nearest existing station.
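The four conditions above can be sketched as a breadth-first search over a distance grid. This is a minimal illustration under stated assumptions, not the implementation behind our results: `dist` is a hypothetical 2D list holding each cell's distance to its nearest existing station, condition 3 is interpreted as a single visited set shared by all paths from one starting cell, and the final forced link to the nearest station is omitted. The helper `path_length_km` applies the 500-meter cell size so that paths shorter than the 10 km minimum can be filtered out.

```python
import math
from collections import deque

# Condition 1: a path may step to any of the 8 neighboring cells.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def lower_neighbors(dist, cell):
    """Condition 2: neighbors that are strictly closer to an existing station."""
    rows, cols = len(dist), len(dist[0])
    r, c = cell
    result = []
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] < dist[r][c]:
            result.append((nr, nc))
    return result

def find_paths(dist, start):
    """Breadth-first search for all paths from `start` to a local minimum."""
    visited = {start}  # condition 3: cells already used from this start
    queue = deque([[start]])
    finished = []
    while queue:
        path = queue.popleft()
        candidates = lower_neighbors(dist, path[-1])
        if not candidates:
            finished.append(path)  # condition 4: local minimum reached
            continue
        for cell in candidates:
            if cell not in visited:
                visited.add(cell)
                queue.append(path + [cell])
    return finished

def path_length_km(path, cell_km=0.5):
    """Length of a path on the grid; diagonal steps are longer by sqrt(2)."""
    return sum(
        cell_km * math.hypot(r2 - r1, c2 - c1)
        for (r1, c1), (r2, c2) in zip(path, path[1:])
    )

def long_enough(paths, min_km=10.0):
    """Keep only paths that exceed the minimum length requirement."""
    return [p for p in paths if path_length_km(p) > min_km]
```

With a shared visited set, a partial path whose closer neighbors have all been claimed by earlier paths is simply dropped. A stricter or looser reading of condition 3 would change how many candidate paths survive.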

Searching for the optimal solution

Our goal is to improve access to the “L” for underserved regions of Chicago while simultaneously improving the overall experience by reducing congestion. To fit these requirements, a new rail line would need to both capture a population of workers that is currently commuting to the Loop by means other than CTA trains as well as re-route some of the population that uses existing stations to the new line. As such, an optimization algorithm needs to:

  1. Minimize the average distance between each census tract and an ‘L’ station to increase connectivity across the city.
  2. Maximize the employed population living within 1 Km of an ‘L’ station to increase the number of workers for whom commuting by train is a feasible alternative.
  3. Minimize the average fraction of the employed population that lives close to each ‘L’ station to reduce congestion across the system.

The optimal new CTA line starts in the Montclare neighborhood and serves Belmont Cragin, Austin, Hermosa, Humboldt Park and the Near West Side, joining the CTA Green Line at the existing Ashland station.

Out of the nearly 1,000 potential paths we identified, we found one route that best matches these criteria. This optimal path for a new CTA line would start in the Montclare neighborhood and serve Belmont Cragin, Austin, Hermosa, Humboldt Park and the Near West Side, joining the CTA Green Line at the existing Ashland station. For much of its path, this proposed rail line matches the route of the 65 bus, which follows Grand Ave and serves over 200,000 riders every month.
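To make the three criteria concrete, the sketch below computes one metric per objective for a candidate set of stations and ranks candidates against each other. Several pieces are assumptions made for illustration rather than a description of our method: tracts are `(x_km, y_km, workers)` tuples in planar coordinates, each worker is assigned to the single nearest station, and the three objectives are combined by summing each candidate's rank on every metric. Other ways of combining objectives, such as a weighted sum, would be equally reasonable.

```python
import math

def evaluate(tracts, stations, radius_km=1.0):
    """Compute one metric per objective for a given set of stations.

    tracts: list of (x_km, y_km, workers); stations: list of (x_km, y_km).
    """
    total_workers = sum(workers for _, _, workers in tracts)
    nearest_dists = []
    covered = 0
    load = [0] * len(stations)  # workers within the radius, per nearest station
    for x, y, workers in tracts:
        dists = [math.hypot(x - sx, y - sy) for sx, sy in stations]
        nearest = min(range(len(stations)), key=dists.__getitem__)
        nearest_dists.append(dists[nearest])
        if dists[nearest] <= radius_km:
            covered += workers
            load[nearest] += workers
    return {
        # Objective 1: average tract-to-station distance (lower is better).
        "mean_distance_km": sum(nearest_dists) / len(nearest_dists),
        # Objective 2: employed population near a station (higher is better).
        "workers_within_radius": covered,
        # Objective 3: average share of workers per station (lower is better).
        "mean_station_share": sum(load) / len(stations) / total_workers,
    }

def rank_candidates(tracts, existing, candidates):
    """Order candidate lines from best to worst by summed per-metric rank.

    candidates: dict mapping a label to the list of new station locations.
    """
    metrics = {name: evaluate(tracts, existing + new) for name, new in candidates.items()}
    names = list(metrics)
    score = dict.fromkeys(names, 0)
    objectives = [
        ("mean_distance_km", False),
        ("workers_within_radius", True),
        ("mean_station_share", False),
    ]
    for key, higher_is_better in objectives:
        ordered = sorted(names, key=lambda n: metrics[n][key], reverse=higher_is_better)
        for position, name in enumerate(ordered):
            score[name] += position
    return sorted(names, key=score.__getitem__)
```

Note that the second and third objectives pull in opposite directions: bringing more workers within reach of a station raises coverage but also raises station load, which is why a candidate has to be judged on all three metrics together.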

Looking forward

Finding the optimal path for a new rail line is a much more complex problem than the one we present here. Even our simple approach is very sensitive to choices in the methodology. For example, if potential lines are forced to follow paths of decreasing distance to existing train lines (as opposed to distance to existing train stations), our method finds that the optimal path for a new line starts at Lake Calumet, travels through the neighborhoods of Riverdale, West Pullman, Morgan Park and Beverly, and connects to the existing ‘L’ network at the southern end of the CTA Green Line.

Changing the constraints allows us to find a path similar to the CTA Red Line Extension Project.

This route resembles the proposed path for the CTA Red Line Extension Project, suggesting that our approach, while admittedly simplistic, still addresses some of the concerns dealt with by formal proposals.

Data-driven approaches to allocating public transit resources have immense potential to improve quality of life in urban environments. While the simple approach used here generates only a prototype solution, it demonstrates the value of integrating sophisticated analytical methods into the complex and multi-faceted urban planning process. Future extensions should take into account shifting population dynamics in addition to the social and economic effects of public transportation projects.

