Poverty Estimation
with Satellite Imagery
at Neighborhood Levels


Results and Lessons for Financial Inclusion from
Ghana and Uganda
By Soren Heitmann, Sinja Buri
     Table of Contents
    Executive Summary
    Introduction
    Data and Methods
       Ground-Truth Survey data
       Call Detail Record and Mobile Money data
       Satellite Images
       Spatial Segmentation
       Spatial Boosting
       Computer Vision Models
    Results
       Models
       Benchmarking
       Poverty Estimators
       Exploring and Interpreting Poverty Maps
    Discussion
       Lessons for Estimating Welfare with Satellite Imagery and Call Detail Records
       Application of Poverty Estimation Findings – Financial Inclusion and Beyond
       Layering heat maps of poverty, telephone usage and financial activity
       Increasing impact by identifying areas of biggest need and largest reach
       Improved understanding of the financial behavior and needs of the bottom of the pyramid
       Application beyond financial inclusion
    Conclusion
    References




Figures
Figure 1: Enumeration coverage areas in Northern Uganda
Figure 2: Enumeration coverage areas in Ghana
Figure 3: Nearest-neighbor spatial boosting
Figure 4: Pooled observations of transfer learning model and nightlights model by Jean et al. (2016)
Figure 5: Observed vs Predicted PPI Score Distributions in Ghana



Tables
Table 1: Comparison of Poverty Prediction Models
Table 2: Predicted Poverty Statistics, Telephone and Mobile Money Activity (per tower per month) in Chorkor



Image Tables
Image Table 1: Daytime satellite sampled images: 67m2 resolution from DigitalGlobe
Image Table 2: Nighttime satellite sampled images: 750 m resolution from VIIRS
Image Table 3: Comparing bottom and top wealth images in rural Uganda
Image Table 4: Aggregated spatial boosting in Uganda: comparing survey and predicted values
Image Table 5: Satellite image-based PPI Prediction Mapping – Ghana
Image Table 6: Empirical Observations Comparing Poverty Scores and Images – Urban Wealth
Image Table 7: Empirical Observations Comparing Poverty Scores and Images – Urban Poverty 1
Image Table 8: Empirical Observations Comparing Poverty Scores and Images – Urban Poverty 2
Image Table 9: Layering poverty predictions, telephone and mobile money activity in Ghana




    Executive Summary

    By successfully reaching historically underserved and vulnerable populations such as women, the poor, and people living in rural communities, Digital Financial Services (DFS) have contributed to unprecedented growth in financial inclusion in Sub-Saharan Africa during the past decade. The adoption and usage of DFS, and the financial inclusion that has resulted, have helped reduce poverty and increase prosperity throughout the region. Still, service providers and development practitioners often lack reliable, detailed, and low-cost poverty data that could help them accurately identify additional communities and individuals who would benefit the most from access to financial services. This lack of data hinders the deployment of services throughout the region and complicates efforts to monitor and evaluate the impact that interventions have on poverty.

    Relying on traditional household surveys for poverty data is time consuming and expensive. What's more, by the time the data are collected and analyzed, they are often out of date. But there are alternatives for estimating and mapping poverty that can help accelerate and expand financial inclusion and help DFS providers target the poorest. Machine learning algorithms can, for example, be trained to predict poverty from imagery captured by satellites and from call detail records, which document mobile phone usage. For this research study, the IFC-Mastercard Foundation Partnership for Financial Inclusion collaborated with the Stanford University Sustainability and Artificial Intelligence Lab to advance existing poverty prediction models to generate poverty estimates at neighborhood-level resolution, which is much more refined than the macro-level estimates produced by research to date. Satellite imagery and call detail records (CDR), validated by ground-truth surveys, were used to develop models that can predict poverty in Ghana and Uganda.

    The study finds that it is possible to make meaningful welfare estimates at the neighborhood level based on satellite imagery combined with geospatial boosting, when lower levels of precision are acceptable. The study makes estimates about poverty demographics in regions that are bounded by cell tower locations. Predicting poverty around cell tower locations allows small-area welfare estimation in urban environments, where cell tower density is high. Above all, daytime satellite imagery proves to be a good basis for poverty prediction, but significant caveats remain. Models may be improved by adding context-specific segmentation that can, for example, detect features such as cars or trees in satellite images of urban areas. This constitutes a future area of research for the community. In lieu of more advanced machine vision feature detection, this study employs spatial boosting techniques, which are found to improve models for estimating poverty in rural areas, where there is less welfare variation among neighbors than in urban centers. That said, with increasing urbanization over time, this is another area that satellite imaging could support. Changes in welfare over time, particularly those due to the availability of financial services, are likely to considerably outpace observable infrastructure changes, not least because of the time needed to construct new buildings. Here, changes in financial service usage patterns observed in provider data are expected to reveal more about financial inclusion and livelihood effects on a year-to-year basis.

    The study compared various statistical poverty estimation methods and identified the Poverty Probability Index (PPI) as the measure that yields better results with satellite imagery. Low activity levels and variation in transaction behavior can make it difficult to use phone and mobile money data for poverty prediction. Although remote sensing poverty estimation models can reduce the sample size required for surveys, a broad spectrum of representative ground-truth survey data is essential for developing and training well-performing poverty estimation models. In this respect, the research finds that remote sensing and geospatial boosting approaches can be used to make traditional household survey methods more efficient. However, significant work remains before remote sensing models can fully replace ground-based surveys.

    This paper also explores the interpretation of predicted poverty scores using PPI estimators, presenting them on heat maps for Ghana at neighborhood-level granularity and layering them with information about the telephone and mobile money activity of users in the same areas, to inform the targeting and monitoring of interventions for poverty reduction and financial inclusion. This visual layering is proposed as a conceptual strategy for how the techniques discussed in this study might be combined to better quantify financial access and financial inclusion reach, and to help providers better understand customer demographics and size their markets.




[Figure: histogram of PPI predictions; x-axis: PPI Predictions (approx. 10 to 89), y-axis: Proportion; Mean: 62.5, Median: 63.3, Selected: 55.1]


Executive Summary Visualization of Image Table 8: PPI prediction scores mapped using the study's mapping approach, compared against a satellite image. PPI is the Poverty Probability Index, a standard poverty estimation tool that can translate a PPI score into estimates against multiple benchmarks (e.g., $1.90/day or $5/day, or access to types of infrastructure).




    Introduction

    Financial inclusion empowers underserved individuals to participate in the formal economy, facilitates access to financial services that help businesses grow, and is critical to achieving economic development policies that aim to eliminate poverty. Digital Financial Services support these development interventions by increasing the breadth of delivery channels, the variety of services, and the affordability of financial access for consumers and companies. DFS are well suited to reaching segments that are historically underserved, such as women, rural individuals and the poor. This is especially evident in Sub-Saharan Africa, where cell phone penetration reached 44 percent as of 2018¹, meaning that nearly half of the one billion adults in the region now have the potential to access financial services through mobile phones. The growing prevalence of DFS on the continent has been a driving factor in enabling financial access for poor and underserved individuals, as mobile money usage has increased from near nil just seven years ago to 20.9 percent by 2018. Today, financial inclusion is at 43 percent in Sub-Saharan Africa². While this marks impressive reach, it is difficult to precisely quantify the extent to which the poorest segments are represented in this growth.

    Development strategies to accelerate financial inclusion, and commercial providers seeking to scale Digital Financial Services, lack access to reliable demographic data on poverty. Collecting data using traditional household surveys is time consuming and expensive, and the data are quickly outdated by economic changes and population movements. Remote sensing technology, call detail records and machine learning algorithms offer a way to close this gap.

    Call detail records have been successfully used to predict poverty in some countries, both in models that predict welfare from call activity alone and in combined models that include telephone data and remote sensing covariates³. However, relying on CDR data for regular poverty measurement may be complicated, as these data are privately managed by service providers. Unless data from all main service providers in a country are combined, poverty estimates are likely to be biased or incomplete.

    Other methods have also shown promise. Notably, nighttime satellite images have been used to measure ground-based light emissions, correlating the intensity and coverage area of those emissions with the economic activity and general well-being of residents within the coverage zone⁴. While results from nightlight images are promising, their level of granularity is low. As light diffuses over large areas, this approach alone often provides meaningful interpretation only at the city level, or even only for more roughly defined coverage areas such as larger administrative districts. Because demographic and wealth variations are far more granular, both within urban neighborhoods and in rural environments, satellite-based poverty estimation models must deliver much more granular estimates to give policy makers sufficient information to target underserved populations, and to help commercial DFS providers better segment their potential customer base and service coverage areas.

    Daytime satellite images offer an alternative approach to resolve these granularity issues and deliver results that are better aligned with the data required by policy makers and DFS providers. This approach was demonstrated by Jean et al. (2016), who used a convolutional neural network to identify visible features in high-resolution daytime satellite images that correlate with demographic data (e.g., roads, agricultural areas, urban environments, building types).

    This study expands the approach through a collaboration between Stanford University's Sustainability and Artificial Intelligence Lab and the IFC-Mastercard Foundation Partnership for Financial Inclusion. The study engages questions and areas of further exploration identified in existing literature, specifically looking at the use of daytime satellite imagery methods to predict poverty in the lowest income segments (e.g., below $1.90 or $5.00 per capita per day, using standard poverty threshold benchmarks).

    Here, different poverty estimation models are developed for two African countries. Multiple measures of poverty are employed to compare and understand their relevance for training models of this nature. The study compares modelling methods and poverty definitions across these two country contexts to learn about trade-offs and optimizations for developing models to predict poverty. The applied research goal is to support DFS providers and financial inclusion policy interventions with a strategy for enhanced information about markets, services and the characteristics of the people who use (or don't use) these services. The approach defines demographic segments geographically, to establish tangible micro-markets as a unit of analysis, and then explores these segments with respect to predicted wealth characteristics and access to and usage of digital financial services.




    1	   GSMA 2018
    2	   Demirgüç-Kunt et al. 2018
    3	   Steele et al. 2017
    4	   See, for example, Ghosh et al. 2013

Data and Methods
Ground-Truth Survey data

This study was implemented in Uganda and Ghana. In both countries, ground-truth poverty data were collected using household survey instruments. These instruments incorporated modules to assess household poverty and welfare levels. Instead of directly asking household respondents about their consumption levels, which are likely to be inaccurate due to seasonal fluctuations and recall bias, the study used poverty measurement tools that eliminate the need to collect detailed consumption data. The survey instrument for Uganda included a SWIFT (Survey of Well-being via Instant and Frequent Tracking) poverty estimation module; in Ghana, a PPI (Poverty Probability Index) estimation module was used. In addition, information about households' asset ownership was collected in both countries to calculate an asset-based wealth index, using an approach similar to the SustainLab index used in related research in this area⁵.

•	 PPI is a poverty measurement tool that computes the likelihood that a surveyed household is living below a given poverty line, based on answers to 10 country-specific multiple-choice questions about household characteristics and asset ownership. Questions can also cover visual, observable features such as house roofing material (e.g., is the roof tile, thatch or corrugated metal?) or whether there is an outdoor latrine. The PPI score is a value between zero and 100 and can be calculated for every household. The lower the score, the higher the likelihood that the household is poor. Look-up tables convert PPI scores into likelihoods of falling under different poverty lines in a country, so the same PPI score may be interpreted against multiple poverty threshold benchmarks.

•	 The SWIFT methodology was originally developed to monitor progress toward one of the World Bank Group's goals, ending extreme poverty. It estimates household expenditure and poverty rates in a simple and cost-effective manner based on answers to 10-15 general household-level survey questions (e.g., education levels, asset ownership and household size). SWIFT models for specific regions and countries are derived from existing household budget survey data (multiple rounds of LSMS surveys), which indicate which variables are poverty correlates and should be collected in the core SWIFT survey to then estimate consumption and poverty rates.

•	 The SustainLab Asset-based Wealth Index calculated for this study used principal component analysis on responses to a panel of seven asset ownership questions within a household survey. The first principal component (the one explaining the largest share of variance) was used as the index value. The hypothesis was that this index would potentially provide a better method of aggregating the contributions of different variables to derive poverty levels than a simple sum of scores that weights the answers to a list of questions, as the PPI methodology does. This method was previously employed by Jean et al. (2016) as a poverty predictor in remote sensing models and was therefore used for prediction models in both Ghana and Uganda for consistency.

Figure 1: Enumeration coverage areas in Northern Uganda

The Uganda survey focused strictly on Northern Uganda, one of the poorest areas of the country. In conjunction with another study investigating the adoption and impact of DFS to better scale financial inclusion, IFC collected data between November 2017 and January 2018 for 9,037 households within 926 enumeration areas covering the Ugandan administrative areas of Karamoja, Mid North and West Nile, and Adjumani (see Figure 1).

To make it possible to tune the satellite image-based models, the survey recorded robust, high-precision GPS data for each surveyed household⁶. This aimed to resolve an issue previously faced by Jean et al. (2016), who drew on third-party geolocated survey data whose precision had been reduced by the addition of up to 10 km of random noise. Here, coordinates were precise to within a few meters of the survey location.
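The contrast drawn above between the PPI's additive scorecard with look-up tables and a PCA-based asset index can be sketched as follows. This is a minimal illustration under stated assumptions, not the PPI's or SustainLab's actual implementation: the look-up table values, the points per answer and the asset answers are hypothetical, and the first principal component is found by simple power iteration on the correlation matrix of standardized 0/1 asset indicators.

```python
import math
from statistics import mean, pstdev

# HYPOTHETICAL look-up table: (lower bound of score band, % likelihood of being
# below a poverty line). Real PPI tables are country- and poverty-line-specific.
HYPOTHETICAL_LOOKUP = [(0, 85.0), (20, 60.0), (40, 35.0), (60, 15.0), (80, 3.0)]


def ppi_score(answer_points):
    """PPI-style scorecard: the score is the sum of the points earned per answer."""
    score = sum(answer_points)
    if not 0 <= score <= 100:
        raise ValueError("PPI scores range from 0 to 100")
    return score


def poverty_likelihood(score, table=HYPOTHETICAL_LOOKUP):
    """Return the likelihood for the highest band whose lower bound is <= score."""
    likelihood = table[0][1]
    for lower_bound, value in table:
        if score >= lower_bound:
            likelihood = value
    return likelihood


def asset_index(rows, iterations=200):
    """Project standardized asset indicators onto the first principal component,
    found by power iteration on their correlation matrix (stdlib only)."""
    k = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(k)]
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns carry no information
    z = [[(r[j] - mus[j]) / sds[j] for j in range(k)] for r in rows]
    n = len(z)
    corr = [[sum(zr[i] * zr[j] for zr in z) / n for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    if sum(v) < 0:  # eigenvector sign is arbitrary; make more assets -> higher index
        v = [-x for x in v]
    return [sum(zr[j] * v[j] for j in range(k)) for zr in z]


# Hypothetical 0/1 answers to seven asset questions for four households.
households = [
    [1, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
index = asset_index(households)
score = ppi_score([10, 5, 20])  # hypothetical points for three answers
print(score, poverty_likelihood(score), [round(x, 2) for x in index])
```

Because the sign of a principal component is arbitrary, the sketch orients it so that owning more assets raises the index. A real PPI application would instead use the published look-up table for the relevant country and poverty line.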


5	   Neal Jean, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, Stefano Ermon. “Combining satellite imagery and machine learning to
     predict poverty,” Science, 19 Aug 2016: Vol. 353, Issue 6301, pp. 790-794
6	   GPS data achieved high levels of precision overall, and the survey data collection methodology implemented robust cross-checking and validation to ensure accuracy and to correct GPS measurement errors. However, in very rural survey environments, often with farmers, respondents frequently answered the survey at communal village locations that were near, but not precisely at, the household for which information was being reported.

    Similarly, the Ghana survey collected precise geolocated survey data. The survey design covered a much larger area, spanning across Ghana, rather than the regional focus implemented in Uganda. Moreover, the enumeration areas were more focused on urban centers and densely populated areas (see Figure 2). The survey was implemented from December 2017 to February 2018 for 2,165 individuals within six enumeration areas, with coverage zones ranging between a one and three km radius, in seven Ghanaian cities and villages distributed across five principal administrative regions.




    Figure 2: Enumeration coverage areas in Ghana.
    Zones are described in terms of their principal regional cities. 1=Bolgatanga, 2=Tamale, 3=Yendi, 4=Kumasi,
    5=Tarkwa, 6=Accra + Tema




    Call Detail Record and Mobile Money Data

    Through IFC project operations, the study incorporated anonymized call detail records and mobile money transaction data from a mobile network operator (MNO) in each of Ghana and Uganda. Both operators have significant national coverage, and the datasets provided customer-level information on numbers of incoming and outgoing calls; SMS volumes; Cash-In transactions; Cash-Out transactions; and transfers between mobile money accounts. Information about the geolocation of activity through cell tower locations was also provided.

    The study expected to identify correspondence between the call detail record data and the household survey data from respondents by matching phone numbers across these data sets. The objective was to explore additional CDR-based models for predicting poverty. In both countries, the surveys were conducted using a randomized design, and in both cases, meaningful overlaps were not obtained between the survey respondents and the CDR data. In Uganda, of the 9,037 survey responses, only 222 matched the CDR data. For Ghana, of the 2,165 survey responses, 166 matched the CDR data. With mobile money adoption levels still below the levels of SIM card ownership, matching survey and mobile money transaction data was even more difficult: in Ghana, only 57 household records matched the respective mobile money dataset. Ultimately, too few observations could be matched directly to train meaningful prediction models. For the CDR models presented in this study, another approach was therefore used as an approximation. CDR models were trained with household information that was aggregated and matched with telephone activity data by cell tower catchment area, not by individual household. Results from these models are presented in Table 1 for the sake of completeness, but accuracy figures are unsurprisingly very low and hold little interpretive value due to the poor alignment between survey and provider data and the extremely small training sample underlying the model.
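The catchment-area aggregation described above can be sketched as follows. This is a minimal illustration, not the study's pipeline, and it assumes that a tower's catchment area is its nearest-tower (Voronoi) zone: each household is assigned to the closest tower by great-circle distance, a survey welfare value such as the PPI score is averaged per tower, and the result is joined to tower-level CDR aggregates. The tower IDs, coordinates, field names (`calls_out`, `mm_cash_in`) and values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    d_lat = p2 - p1
    d_lon = math.radians(lon2 - lon1)
    a = math.sin(d_lat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(d_lon / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_tower(lat, lon, towers):
    """Return the ID of the closest tower, i.e. the Voronoi cell containing the point."""
    return min(towers, key=lambda tid: haversine_km(lat, lon, *towers[tid]))


def tower_training_rows(households, towers, cdr_by_tower):
    """Aggregate survey values per tower catchment and join tower-level CDR features."""
    values_by_tower = defaultdict(list)
    for hh in households:
        values_by_tower[nearest_tower(hh["lat"], hh["lon"], towers)].append(hh["ppi"])
    rows = []
    for tower_id, values in sorted(values_by_tower.items()):
        if tower_id not in cdr_by_tower:
            continue  # no provider data for this catchment, so it cannot be matched
        row = {"tower": tower_id, "n_households": len(values), "mean_ppi": mean(values)}
        row.update(cdr_by_tower[tower_id])
        rows.append(row)
    return rows


# Hypothetical example data (coordinates, IDs and CDR fields are illustrative only).
towers = {"T1": (5.550, -0.200), "T2": (5.600, -0.250), "T3": (9.400, -0.850)}
households = [
    {"lat": 5.551, "lon": -0.201, "ppi": 40},
    {"lat": 5.549, "lon": -0.199, "ppi": 50},
    {"lat": 5.552, "lon": -0.202, "ppi": 60},
    {"lat": 5.601, "lon": -0.249, "ppi": 30},
    {"lat": 9.410, "lon": -0.840, "ppi": 20},
]
cdr_by_tower = {
    "T1": {"calls_out": 1200, "mm_cash_in": 80},
    "T2": {"calls_out": 450, "mm_cash_in": 15},
}

rows = tower_training_rows(households, towers, cdr_by_tower)
for row in rows:
    print(row)
```

Households in catchments without provider data (tower T3 here) are dropped, and each remaining tower contributes a single training row regardless of how many households it represents, which is why this kind of aggregation yields very small training samples.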



Multiple factors may explain the low overlap of survey and mobile network operator data. Different time periods of data collection for survey and MNO data are one factor: survey respondents may have joined the respective MNO networks after the CDR data were extracted, or may have churned before. It was moreover determined that survey enumeration areas overlapped poorly with areas where these operators had meaningful market share. Even though service was widely available, other providers dominated these markets, as confirmed by survey respondents who reported using other providers. Lastly, in Uganda, the rural-focused survey also found that only one-third of respondents had phones, which considerably limited the pool of potential correspondences between survey and CDR. This yields an important insight for future research: there may be trade-offs between the randomized representativeness of a population sample and the ability to meaningfully correlate demographic statistics with provider data. A stratified survey approach that deliberately over-samples individuals who are customers of the service provider should be considered.

Satellite Images

Models were developed using both daytime and nighttime satellite imagery. Daytime images were sourced from DigitalGlobe, with a high resolution of 67m2. Nighttime images were sourced from VIIRS, at 750 m resolution. Example images for Uganda and Ghana are shown below. Survey regions in Uganda were, by design, far more rural; in Ghana, by contrast, survey coverage included more urban and peri-urban environments across the country. The 'ruralness' of the Uganda survey area is more pronounced in the nightlight images, most of which show scant light signals. The more urbanized regions in Ghana, by comparison, show gradients from deep purple zones (dark, low-light emissions, correlating with low electrification) to bright yellow zones (bright, intense light emissions, correlating with people using artificial light in homes, offices and cars, and with general urbanization). These sample images also illustrate the difficulty of using nighttime light emission imagery for household-level poverty prediction. At 750 m resolution, entire neighborhoods can fit under a single pixel, and an entire city within a single image. Even though nighttime images were incorporated into the predictive modelling, the coarse resolution of the information yielded little predictive value for improving model accuracy or descriptive power at the granular neighborhood level sought.
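To make the resolution argument concrete, the following back-of-the-envelope calculation assumes each VIIRS pixel is roughly a 750 m × 750 m square and uses the one to three km radius enumeration zones of the Ghana survey. It is an approximation that ignores partial pixels at zone boundaries.

```latex
A_{\text{pixel}} \approx (0.75\,\text{km})^2 = 0.5625\,\text{km}^2

N_{r=1\,\text{km}} \approx \frac{\pi\,(1\,\text{km})^2}{0.5625\,\text{km}^2} \approx 5.6
\qquad
N_{r=3\,\text{km}} \approx \frac{\pi\,(3\,\text{km})^2}{0.5625\,\text{km}^2} \approx 50.3
```

Under these assumptions, even the smallest enumeration zone is covered by only about five to six nightlight pixels, each averaging light emissions over an area into which an entire neighborhood can fit.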




 Image Table 1: Daytime satellite sampled images: 67m2 resolution from DigitalGlobe




 Uganda Samples




 Ghana Samples




Uganda shows increased prevalence of rural features: farms and open space surrounding single-level buildings. Ghana shows
mixtures of features and increased urbanization, neighborhood housing configurations, paved roads and multi-story buildings.




      Image Table 2: Nighttime satellite sampled images: 750 m resolution from VIIRS




      Uganda Samples




      Ghana Samples




     The randomly sampled nighttime images are consistent with the daytime images: the uniformly dark purple images in Uganda indicate extremely little light emission, meaning that few people live in the coverage zone, or that those who do are not using lighting at night. In Ghana, by contrast, the bright yellow and intermediate shades depict increased urbanization and people's collective use of lighting within the coverage zone.




Spatial Segmentation

The MNO cell tower data was used to create geographical segments by constructing Voronoi7 polygons based on the tower locations. As both network operators had national coverage, this provided a useful method of creating spatial zones ranging from neighborhoods in densely populated urban areas (where cell tower density is high) up to much larger zones in rural or unpopulated areas (where cell tower density is low or nonexistent). Although approximate, the technique enables poverty estimates to be grouped by cell tower coverage zone and, by association, the demographic averages of people living within the coverage zone of their nearest "home" tower to be estimated. For service providers, this allows customer demographics to be interpreted in terms of the populations living near cell towers.

For developing poverty prediction models, imagery around the cell towers was used rather than the full geographic area covered by each Voronoi polygon. This was for the sake of computational simplicity and because of the cost of accessing and processing high-quality daytime satellite images (Ghana, for example, would be represented by approximately 53 million individual high-resolution image files). In this analysis, satellite images were downloaded from DigitalGlobe and VIIRS around the geographical areas corresponding to the GPS coordinates collected in the household surveys and to the mobile network operator tower sites.

In high-density urban areas, higher tower density results in much smaller polygons, so the satellite picture around the tower can suitably represent the demographics of the overall area, typically a neighborhood or an even smaller coverage zone. In rural areas, however, the Voronoi polygons are far larger due to low cell tower density. A shortcoming of the methodology is that the area seen on the satellite image around the tower may not be representative of the broader region: in rural areas, a tower could be placed centrally in a town while the environs beyond are entirely uninhabited. In that case, the poverty prediction model would likely over-estimate poverty averages for the associated polygon area. This acknowledged, poverty estimates may still reasonably estimate the average demographics of the individuals within a polygon area, since disproportionately more people are likely to live in a rural town than in the very sparsely populated surrounding area.

Spatial Boosting

Granular poverty prediction models based on satellite imagery are challenged by the relatively low density of signal-rich features in any given image tile. For example, an image of trimmed grass might show a green field recently "mowed" by livestock in a sparsely populated rural area; yet it might equally show the manicured lawn of an upscale residential neighborhood in an urban area. Image Table 3 below illustrates this point, comparing bottom and top wealth image examples in Northern Uganda. While the wealthier image is similarly rural and depicts agricultural features, it also shows nuances: more refined-looking fields, higher-quality thatch roofing on out-buildings, and, in the bottom-right corner, the cropped portion of a larger building with an angular blue roof. In this sense, it may be unclear whether trimmed grass or thatched roofs per se correlate with income (let alone whether machine learning algorithms can identify such features). The problem, then, is determining which visual patterns generate signals worth paying attention to.


     Image Table 3: Comparing bottom and top wealth images in rural Uganda




                            Bottom 1% of surveyed wealth                             Top 1% of surveyed wealth


7	A Voronoi decomposition segments space around a set of points so that each resulting polygon contains every location that is closer to its own point than to any other point. Each polygon border is therefore equidistant from the two adjacent points it separates.
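Because a location lies in a tower's Voronoi polygon exactly when that tower is its nearest tower, the cell tower segmentation can be sketched as a nearest-tower lookup rather than explicit polygon geometry. The minimal sketch below assumes hypothetical tower IDs, coordinates and scores, and uses great-circle (haversine) distance; the report does not state which distance measure or tooling was used to build its polygons.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def assign_to_tower(point, towers):
    """Return the id of the tower whose Voronoi cell contains `point`.

    Cell membership is equivalent to "this tower is the nearest tower",
    so a nearest-neighbour search is enough. Ties go to the first id seen.
    """
    lat, lon = point
    return min(towers, key=lambda tid: haversine_km(lat, lon, *towers[tid]))


def group_by_tower(records, towers):
    """Bucket hypothetical (lat, lon, score) records by tower coverage zone."""
    cells = {tid: [] for tid in towers}
    for lat, lon, score in records:
        cells[assign_to_tower((lat, lon), towers)].append(score)
    return cells


# Hypothetical towers near Accra ("T1") and Kumasi ("T2").
towers = {"T1": (5.60, -0.19), "T2": (6.69, -1.62)}
records = [(5.61, -0.20, 58.0), (6.70, -1.61, 66.0), (5.55, -0.21, 61.0)]
print(group_by_tower(records, towers))  # {'T1': [58.0, 61.0], 'T2': [66.0]}
```

Averaging each bucket would then give a per-zone estimate; with a projected coordinate system, planar Euclidean distance would serve equally well.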


The ideal goal of remote-sensing poverty prediction would be to look down on any spot on earth, accurately extract visual information, and derive estimates of the demographic norms of the people living there. The technology does not yet permit such ex-ante predictions: ground-truth data is needed to train predictive models on known true data points, so research of this nature requires relatively large survey samples. A challenge, also faced by this study, is that surveys must be representative not only of the demographic population but also of the visual space. That is, data must cover individuals across the income spectrum and, as an additional dimension, the spectrum of visual environments in which they live: for example, what a low-income house and field in a rural area look like compared with a high-income house and field in a rural area.

In Uganda, surveys expressly focused on rural low-income areas and generated GPS coordinates for images around low-income households. The result was thin survey data on wealthy comparators with which to train models to differentiate the visual cues associated with different levels of welfare and poverty. Somewhat conversely, survey data in Ghana focused on urban areas, which resulted in fewer samples of rural demographic variation. However, because the Ghana surveys had much broader geographic coverage, the surveys and images were far more diverse than in Uganda and generated a stronger set of features.

To ameliorate the limited visual variation in the survey data, a spatial boosting method was employed to estimate income demographics for visual areas that did not have an explicit survey data point. This was done by creating Voronoi polygons around known household survey GPS coordinates, a strategy similar to the spatial segmentation at the level of cell tower locations. Images not directly associated with household GPS coordinates were assigned poverty levels inferred as a weighted average of several closest neighbors, on the assumption that households near known survey locations were likely to show similar income demographics.

This assumption is stronger in sparsely populated rural areas, where more geographic distance is likely to be traveled before large changes in population income characteristics are observed. In urban areas, a household might also be expected to resemble its immediate neighbors, but the transition between a less-wealthy and a relatively more-wealthy neighborhood may be more abrupt. This is borne out in the results: spatial modelling in Uganda shows a higher R-squared value than in Ghana, where small-scale spatial variation is higher, leading to lower explanatory power.

The spatial boosting approach is depicted in Figure 3 below, where the asset-based wealth index score of the central satellite image is estimated from its nearest neighbors, whose scores are calculated from known survey data.

Figure 3: Nearest-neighbor spatial boosting




Aggregating the spatial boosting across the survey enumeration areas is depicted in Figure 3 for Uganda, where the strongest signal was detected when aggregating the 10 nearest neighbors. In Ghana, the 8 nearest neighbors were used. Image Table 4 below compares spatial boosting at the aggregate level of the entire survey coverage area in Uganda: the visual differences between the actual survey data and the predicted scores at the level of the Voronoi polygons are quite small.
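The nearest-neighbor boosting and the choice of neighbor count can be sketched together as follows. This is a minimal illustration, not the study's code: inverse-distance weighting and leave-one-out mean squared error as the "signal strength" criterion are both assumptions, since the report states only that a weighted average of several closest neighbors was used and that 10 (Uganda) and 8 (Ghana) neighbors were chosen. All function names and the toy data are hypothetical.

```python
import math


def idw_estimate(target, surveyed, k):
    """Weighted average of the k surveyed points nearest to `target`.

    target:   (x, y) of an image without survey data, in projected units.
    surveyed: list of (x, y, score) tuples with known wealth index scores.
    Inverse-distance weights are an assumption of this sketch.
    """
    tx, ty = target
    nearest = sorted(
        (math.hypot(x - tx, y - ty), score) for x, y, score in surveyed
    )[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # epsilon avoids division by zero
    return sum(w * s for w, (_, s) in zip(weights, nearest)) / sum(weights)


def loo_mse(surveyed, k):
    """Leave-one-out mean squared error of the k-neighbor estimate."""
    errors = []
    for i, (x, y, score) in enumerate(surveyed):
        others = surveyed[:i] + surveyed[i + 1:]  # hold out point i
        errors.append((idw_estimate((x, y), others, k) - score) ** 2)
    return sum(errors) / len(errors)


def best_k(surveyed, candidates):
    """Pick the neighbor count with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_mse(surveyed, k))


# Toy transect: ten survey points whose score rises linearly with x.
points = [(float(i), 0.0, float(i)) for i in range(10)]
print(best_k(points, [1, 2, 3]))  # 2
```

On this toy transect two symmetric neighbors reproduce the linear trend exactly for interior points, which is why k=2 wins; real survey coordinates would, as the report notes, favor larger neighborhoods.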




  Image Table 4: Aggregated spatial boosting in Uganda: comparing survey and predicted values




               Ground-Truth Survey Score Gradients                          Poverty Prediction Score Gradients



Computer Vision Models

The satellite models presented in this study predict poverty levels from features derived from satellite imagery through programmed visual pattern recognition. The models are convolutional neural networks, a class of machine learning algorithms: with appropriate training, the computer can effectively learn to "see" relevant features in the associated images.

Convolutional neural networks are neural networks with multiple mathematical layers (between the input and output layers) that can recognize visual patterns directly from pixel images with minimal preprocessing, since they filter pixel connections by proximity.

The satellite models presented below are so-called ResNet models. ResNet models are convolutional networks whose design, based on residual (shortcut) connections, addresses a common challenge in training networks with many layers: adding layers normally causes model performance to saturate or even decrease (the vanishing gradient problem). For this study, ResNet models pre-trained for pattern recognition on generic images were further fine-tuned with additional layers and simple extensions, using the relevant country datasets in Ghana and Uganda and the study's objective of seeing features that correlate with income demographics.
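The residual idea behind ResNet can be illustrated with a single block computing relu(F(x) + x). In this simplified sketch, F uses small dense weight matrices instead of convolutions and omits batch normalization, so it shows only the shortcut mechanism, not the study's fine-tuned models; the function names are hypothetical.

```python
def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, vi) for vi in v]


def residual_block(x, w1, w2):
    """Basic residual block: y = relu(F(x) + x), with F(x) = w2 @ relu(w1 @ x).

    The identity shortcut means that if F learns nothing (all-zero weights),
    a non-negative input passes through unchanged, so stacking more blocks
    does not have to make the network worse.
    """
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zeros, zeros))  # [1.0, 2.0]
```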




     Results
     A variety of prediction methods were explored in the Uganda and Ghana contexts, using different poverty score metrics as dependent variables. Table 1 below compares key models, specifying for each the category of features included (from satellite imagery, through spatial boosting, or derived from call detail records) and the poverty metric predicted. Models incorporating spatial boosting yielded the most explanatory power in both countries.

     Models

     In Uganda, a combined model approach yielded an R-squared value of 0.28, using an asset-based wealth index as the outcome poverty metric. In Ghana, the highest R-squared value observed was 0.2, also using an asset-based wealth index as the dependent variable. In basic terms, the predictive models explain 28 percent and 20 percent of the variation in poverty observed in Uganda and Ghana, respectively. Such figures are generally not considered strong indicators of explanatory power. However, in the context of explaining differences in welfare from one neighborhood to the next, even a small percentage may offer meaningful insight.
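The "percent of variation explained" reading of R-squared follows from its standard definition, one minus the ratio of residual to total sum of squares. A minimal sketch with made-up values (not the study's data):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)  # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained
    return 1.0 - ss_res / ss_tot


# Hypothetical scores: the predictions leave 10% of the variation unexplained.
print(r_squared([1, 2, 3, 4], [1.5, 2, 2.5, 4]))  # 0.9
```

Predicting the mean for every area gives 0, and on held-out data the value can even be negative, which is one reason modest figures such as 0.2 to 0.28 can still be informative.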




        Table 1: Comparison of Poverty Prediction Models


         MODEL                    POVERTY METRIC                  R²

         Ghana
         Satellite                Asset-based wealth index        0.01
         Spatial                  Asset-based wealth index        0.20
         Satellite + Spatial      PPI                             0.15
         CDR                      PPI                             0.07
         Uganda
         Satellite                Asset-based wealth index        0.14
         Spatial                  Asset-based wealth index        0.23
         Satellite + Spatial      Asset-based wealth index        0.28
         CDR                      SWIFT                           0.01




     Benchmarking

     Among the poorest demographics, these results are comparable to previous work by Jean et al. (2016). In that study, the pooled results of the daytime satellite image model yielded R-squared values of approximately 0.10 to 0.25 for the set of poorest clusters below the poverty line of $1.90 per capita per day (see Figure 4). Explanatory power dips between 1x and 2x the poverty line, suggesting difficulty in identifying visual signals that segment gradients of poverty. Note that the authors' approach yielded scores for larger geographic areas; across all ranges of income demographic clusters, their model achieved R-squared values of up to 0.6, with explanatory power increasing notably at levels above $5.00 per capita per day (i.e., approximately 3x the poverty line and above).




     Figure 4: Pooled observations of transfer learning model and nightlights model by Jean et al. (2016)
     [Chart: transfer learning and nightlights models (vertical axis 0.0 to 0.6) plotted against the poorest percent of clusters used (0 to 100), with the international poverty line, 2x poverty line and 3x poverty line marked.]




This research explored poverty estimates at more granular household and neighborhood levels. As previously noted, the CDR-based models were inconclusive due to the poor ability to match phone customers with survey responses and to acquire a statistically robust sample. The models therefore produced results at the relatively granular resolution of neighborhoods, defined as areas in proximity to cell tower locations at the variable resolution of the Voronoi polygons. Because the spatial resolution of the poverty estimates varied with cell tower density, the results are not directly comparable to the more constant resolution discussed in Jean et al. (2016). Nevertheless, models achieving R-squared values of 0.28 and 0.2 may be considered reasonable, given the nature of the input data, the granularity of the estimates sought, and the fact that the estimates specifically targeted the lowest clusters of observed income.

Poverty Estimators

Whereas other research has focused on income estimates in a more absolute range across populations (such as predicting a specific income value), this study incorporated different poverty estimation methods (PPI, SWIFT and an asset-based wealth index) to estimate poverty prevalence more generally. The PPI and SWIFT approaches achieve this by providing a statistical estimate of the likelihood that a household is above or below a given poverty line. Focusing on the more granular spatial level of urban neighborhoods lowers the models' power to explain the range of approximated household consumption levels and poverty incidence, but the models show a reasonable ability to impute the overall prevalence of poverty.

Below, predicted poverty scores, their interpretation, and comparison with actual images are explored in more detail using the example of the PPI predictions of the Ghana satellite model. Although the model results primarily incorporated the SustainLab asset-based index approach, which provides some comparability across the Uganda and Ghana contexts, the PPI was considered to offer more interpretive power because PPI scores can be resolved across multiple poverty benchmarks. Further, exploratory analysis in the course of this study suggested that the design of the PPI survey might correspond better to visual features that vision models can resolve. Future research might therefore focus specifically on identifying the features that tools like the PPI have established as statistically significant poverty estimators.

For Ghana, using the PPI poverty estimator, the predicted distribution compares favorably with the observed PPI results from the survey. Across the 1,262 Voronoi polygon coverage areas in Ghana, the model predicts a median PPI score of 63.3, consistent with the median PPI score of 63 observed in the household surveys.8 Figure 5 shows that the distributions of observed and predicted scores are very similar: both are centered around a score of 62-63, with most variation within ten points below and above this value, and slightly skewed toward higher (non-poor) scores. A score of 60-64 means that nine percent of the population is likely to fall below the $2.50/day poverty line, and about 52 percent are likely to fall below the higher $5.00/day poverty line.
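Converting a PPI score into a poverty likelihood is a table lookup, and the standard PPI practice for a group poverty rate is to average the likelihoods of its members. The sketch below encodes only the $5.00/day likelihoods quoted in this report (52 percent for scores of 60-64, 68 percent for the score of 53 in the Accra example, assumed here to sit in a five-point 50-54 band); the full Ghana table is published by Innovations for Poverty Action. Rounding fractional predicted scores to the nearest integer is an assumption, and all names are hypothetical.

```python
# Partial lookup: ONLY likelihoods quoted in this report, not the full IPA table.
PPI_LIKELIHOOD_BELOW_5_00 = {
    (50, 54): 0.68,  # assumed five-point band containing the score of 53
    (60, 64): 0.52,  # band quoted for scores of 60-64
}


def likelihood_below(score, table):
    """Poverty likelihood for a (possibly fractional) PPI score, or None.

    Predicted scores are rounded to the nearest integer before the band
    lookup (an assumption); None means the band is missing from `table`.
    """
    s = round(score)
    for (lo, hi), p in table.items():
        if lo <= s <= hi:
            return p
    return None


def poverty_rate(scores, table):
    """Estimated share of a group below the line: mean of member likelihoods."""
    likelihoods = [likelihood_below(s, table) for s in scores]
    if any(p is None for p in likelihoods):
        raise ValueError("score outside the bands in the lookup table")
    return sum(likelihoods) / len(likelihoods)


print(likelihood_below(63.3, PPI_LIKELIHOOD_BELOW_5_00))  # 0.52
```

A score such as 64.7 rounds to 65 and falls outside this partial table, which is why the lookup returns None rather than guessing a value.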




8	   Statistical lookup tables that convert PPI scores (here between 60 and 64) into the corresponding likelihoods of falling below different poverty lines in a
     country are produced by Innovations for Poverty Action and are available here: https://www.povertyindex.org/country/ghana.


     Figure 5: Observed vs Predicted PPI Score Distributions in Ghana
     Top panel: observed PPI distribution (percent of surveyed households by PPI score; mean 62.2, median 63).
     Bottom panel: predicted PPI distribution (proportion by predicted PPI score; mean 62.5, median 63.3).
     A high PPI score corresponds to a lower probability of being poor.




     Exploring and Interpreting Poverty Maps

     Using the results obtained in this study, poverty maps are presented at national, regional and localized scales using the cell tower geo-segmentation approach. Image Table 5 presents the mapping of predicted PPI poverty scores in Ghana at the country level. A total of 1,262 polygons are visualized nationally.

     Shaded polygons are established using mobile network provider cell tower locations: darker shades show estimates of low poverty incidence, and lighter shades a higher incidence of poverty. Because cell towers are denser where they serve densely populated urban areas, polygons become smaller there, and so do the coverage areas of the predicted scores.

     In this manner, poverty estimation is more granular, at a neighborhood level, in higher-density urban areas, whereas in rural areas polygons are far larger. Zooming in on the urban centers of Accra and Kumasi shows the granular nature of the polygons, whose geographic area shrinks as cell tower density increases. Many map areas have no predicted poverty levels (they are filled with a gray checked pattern) for three reasons: satellite images were not available country-wide at the resolution used; some areas faced processing errors that resulted in incomplete mapping; and, as already discussed, estimates were generated only for areas around cell towers, as processing several million images far exceeded the coverage of the network and survey data.
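The shading rule of the map (darker for a higher predicted PPI score, a checked "no data" fill where no prediction exists) can be sketched as a simple classification step. Equal-width bins over the 0 to 100 PPI range and five classes are assumptions of this sketch; the report does not say how scores were binned into shades, and the function name is hypothetical.

```python
def shade_class(score, n_classes=5, lo=0.0, hi=100.0):
    """Map a predicted PPI score to a shade index (0 = lightest, n-1 = darkest).

    Higher PPI (lower predicted poverty) gets a darker shade. Returns None for
    polygons without a prediction, to be drawn with a checked 'no data' fill.
    Equal-width bins are an assumption of this sketch.
    """
    if score is None:
        return None
    width = (hi - lo) / n_classes
    idx = int((score - lo) // width)
    return min(max(idx, 0), n_classes - 1)  # clamp scores at the range edges


print([shade_class(s) for s in (12.0, 53.0, 63.3, 100.0, None)])  # [0, 2, 3, 4, None]
```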




Image Table 5: Satellite image-based PPI Prediction Mapping – Ghana


                                                                                                              Zoom into Accra
                                                                                                              Municipality




                                                                                                               Zoom into
                                                                                                               Center of Kumasi




               Map of entire country



This Image Table visualizes the mapping tool developed for this project, providing shared coverage zones corresponding to the Voronoi polygon segmentation approach. The sequence of images illustrates the ability to zoom in from country-level to neighborhood-level coverage areas. Here, PPI scores are depicted: darker shades indicate a higher predicted PPI score, meaning higher wealth; lighter, non-checked areas show low scores and therefore an increased predicted prevalence of poverty. As discussed elsewhere, the tile-based mapping approach enables layering multiple indicators of interest.




In terms of satellite imaging, urban areas are also richer in features such as buildings and roads, while rural areas may contain more grassland or uninhabited land. Yet urban areas also have much more demographic diversity within smaller areas, meaning neighboring households may be less similar in terms of welfare despite sharing common visual features in a satellite image.

Visually exploring urban areas in Accra helps to make this point, while also illustrating the application (and challenges) of the poverty estimation models combined with maps segmented by the Voronoi estimation zones. Image Table 6 depicts one of Accra's wealthiest areas, serving as an empirical example of high-income visual features.




     Image Table 6: Empirical Observations Comparing Poverty Scores and Images – Urban Wealth
     Distribution of PPI predictions (proportion by predicted PPI score): mean 62.5, median 63.3, selected area 64.7.
     Map: https://earth.google.com/web/@5.65555391,-0.111776,33.4206017a,314.48224383d,35y,0h,0t,0r



     Trasacco Valley is recognized as one of Accra's wealthy neighborhoods9. Selecting this area on the predicted poverty map shows above-average predicted PPI scores, although only modestly so. This clearly shows the limitations of the model's accuracy: the predicted score of 64.7 is an improbably low prediction, corresponding to 43 percent of residents falling below $5.00/day. This sort of discrepancy may be an artifact of the spatial boosting approach combined with satellite imaging. Zooming in on the coverage area reveals multi-story single-family houses, lawns and swimming pools, which would be expected to correlate with near-zero poverty for the coverage area.

     One of the poorest zones predicted within the greater Accra area, depicted below (Image Table 7), includes the northwest section of Abelemkpe, a relatively wealthier neighborhood10, and the southern area of Achimota, which notably includes the slum area of Abofu (see the lightly shaded low-income estimated coverage area, highlighted with an orange-bordered polygon). A Google Maps satellite snapshot of the zone shows visual differences in housing density and building construction, particularly clustered around the crossing highways. As identified through the predictive satellite mapping, the PPI score predicted for this area is 53, on the lower end of the distribution of values across the country (see the orange line in the distribution chart). With a mix of more affluent housing stock and slum areas, this score indicates a 68 percent probability that residents of this area fall below the $5.00/day poverty line.


     9	https://www.africa.com/a-million-gets-you-in-ghana/
     10	https://en.wikipedia.org/wiki/Neighborhoods_of_Accra
Image Table 7: Empirical Observations Comparing Poverty Scores and Images – Urban Poverty 1




 [Chart: Distribution of PPI Predictions (x-axis: PPI Predictions; y-axis: Proportion). Mean: 62.5; Median: 63.3; Selected: 53]
        Map: https://earth.google.com/web/@5.61074734,-0.2240724,20.66296749a,1087.02950268d,35y,0h,0t,0r




Another example of a poorer neighborhood in Accra is depicted below in Image Table 8 at different zoom levels. Chorkor is a fishing village on the Accra coastline, struggling with poor sanitation, limited access to water and power infrastructure, and inadequate waste management. The corresponding satellite image shows a densely populated coastal neighborhood. The predicted PPI score for this area is 55.1, which corresponds to a 60.3 percent likelihood that the population living there falls below the $5.00/day poverty line.

Although the predicted PPI score for this area is at the lower end of the distribution of scores across the country (see distribution in Image Table 8), it is not as low as expected for a slum area known for its high levels of poverty and limited access to infrastructure. A closer look at the Google Maps satellite snapshot may explain this result. Apart from the slum area in the bottom half, the upper part of the image shows less dense housing structures surrounded by more greenery, suggesting higher levels of wealth. Indeed, this upper part of the image shows a university and a hospital campus.




     Image Table 8: Empirical Observations Comparing Poverty Scores and Images – Urban Poverty 2




     [Chart: Distribution of PPI Predictions (x-axis: PPI Predictions; y-axis: Proportion). Mean: 62.5; Median: 63.3; Selected: 55.1]

                   Map: https://earth.google.com/web/@5.5334772,-0.23087813,18.66830879a,2254.39682943d,35y,0h,0t,0r




     Both examples of neighborhoods presented above (Image Tables 7 and 8) are areas where welfare levels were predicted solely from the underlying satellite imagery. None of the survey data used for model training was collected in those areas. The fact that each polygon covers both wealthy and poor areas explains why the predicted poverty incidence is less extreme than expected for the slum areas they contain, and provides anecdotal evidence that the satellite model for PPI estimation aligns to some degree with observed characteristics.

     These two examples also illustrate the complexity of generating granular neighborhood-level estimates, especially in more urban environments, precisely because of the rapid changes that may occur between low- and high-income segments and the visual features that characterize them, and because of the general lack of clear boundaries between them (e.g., a political or administrative line).

     Ultimately, the goal of this research is to explore the interplay of poverty and Digital Financial Services. Image Table 9 shows how poverty heat maps can be meaningfully compared to layers of telephone and mobile money activity. Three metrics are selected for comparison:

     •	 Map A.1 depicts the satellite-based predicted PPI scores for Ghana. Darker shades visualize higher scores in the respective areas, which translate into lower predicted poverty incidence in those polygons.

     •	 Map B.1 visualizes call activity for users of a Ghanaian mobile network operator. The map shows gradients in the number of incoming telephone calls across smaller areas. Darker areas depict relatively more calls received.

     •	 Map C.1 shows mobile money transaction activity. The map shows gradients of the total value of transfers sent and received in each area. The higher the value of transfers per month, the darker the shade.
 Image Table 9: Layering poverty predictions, telephone and mobile money activity in Ghana

  [Map panels]
  Map A.1: Poverty Prediction (PPI score; lighter = poorer)
  Map B.1: Telephone Activity (number of incoming calls)
  Map C.1: Mobile Money Activity (total value of transfers)
  Maps A.2, B.2, C.2: Zoom into Accra




Across the maps, the same polygon areas are shaded,11 making it possible to layer transactional data directly atop poverty estimates. Moreover, because the polygons approximate mobile network operator service areas, the insights are equally valuable to providers seeking a better understanding of consumer segments within their service areas.

Regarding call behavior and mobile money transaction activity, darker-shaded urban areas nationally show increased activity, as would be expected. This is most pronounced for the mobile money activity layer, as adoption levels still lag well behind cell phone ownership. Zooming in on urban areas (see, for example, Map B.2 zoomed into Accra in Image Table 9) again shows the granular nature of the polygons, whose geographic areas become smaller as cell tower density increases. As a result, these urban zones also show significant gradients of calling and mobile money activity between them at this level of resolution.




11	 Polygons with missing data are again filled with a gray checked pattern. Missing data can have several causes: no available high-resolution satellite
    imagery, processing errors, or no recorded call or mobile money activity in the respective polygon during the given time period.


     To illustrate the feature layering approach with a concrete example, one area discussed before (capturing parts of the poor village of Chorkor in Accra) is again highlighted with orange borders (Image Table 9). Table 2 lists the corresponding poverty, telephone, and mobile money activity metrics for the selected area, comparing them to the median values across locations in Ghana. By definition, each Voronoi polygon contains exactly one cell tower and covers the area closer to that tower than to any other. Therefore, values may be interpreted in terms of volume of activity per tower per month.
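Because a Voronoi polygon is, by construction, the set of locations closer to its tower than to any other tower, assigning a location (for example, a survey household or a satellite image tile) to its polygon reduces to a nearest-tower lookup. The minimal Python sketch below illustrates that idea under stated assumptions: the tower IDs and coordinates are hypothetical (placed near the Chorkor, Abelemkpe and Trasacco Valley map links above), distances are great-circle distances, and the helper names `nearest_tower` and `group_by_cell` are illustrative, not part of the study's code.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_tower(point, towers):
    """Return the ID of the tower closest to `point`, i.e. the Voronoi cell containing it."""
    lat, lon = point
    return min(towers, key=lambda tid: haversine_km(lat, lon, *towers[tid]))

def group_by_cell(points, towers):
    """Group locations by the Voronoi cell (nearest tower) they fall into."""
    cells = defaultdict(list)
    for p in points:
        cells[nearest_tower(p, towers)].append(p)
    return dict(cells)

# Hypothetical towers (lat, lon), roughly at the three Accra map locations above
towers = {
    "T1": (5.5335, -0.2309),  # near Chorkor
    "T2": (5.6107, -0.2241),  # near Abelemkpe
    "T3": (5.6556, -0.1118),  # near Trasacco Valley
}
# Hypothetical survey or image-tile locations
points = [(5.535, -0.231), (5.600, -0.220), (5.650, -0.120), (5.540, -0.235)]

cells = group_by_cell(points, towers)
print({tid: len(pts) for tid, pts in cells.items()})
```

With real data, explicit polygon geometries (needed to draw maps like those in Image Table 9) could instead be built with a computational-geometry routine such as `scipy.spatial.Voronoi` on projected coordinates; the nearest-tower lookup gives essentially the same cell membership without constructing the polygons.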




        Table 2: Predicted Poverty Statistics, Telephone and Mobile Money Activity (per tower per month) in Chorkor

        METRIC                                                     EXAMPLE POLYGON ZONE      GHANA MEDIAN
                                                                   (PART OF CHORKOR)
        Poverty Statistics
          Predicted PPI Score                                      55.1                      63.3
          $5.00/day poverty rate (PPI interpretation)              60.3%                     52.1%
          $1.90/day poverty rate (PPI interpretation)              3.6%                      1.5%
        Telephone Activity
          Number of outgoing calls (per month and user)            104                       47.5
          Number of incoming calls (per month and user)            41                        15
          Outgoing call duration (total per tower per month)       129 hours                 69 hours
          Incoming call duration (total per tower per month)       61 hours                  29 hours
          Number of incoming SMS (total per tower per month)       26,600                    7,900
          Number of outgoing SMS (total per tower per month)       81,000                    39,000
        Mobile Money Activity
          Mobile money transfer (average amount per month)         $21                       $19.5
          Mobile money cash in (average amount per month)          $13                       $15
          Mobile money cash out (average amount per month)         $17                       $16




     In the selected polygon, an estimated 60.3 percent of the population is predicted to live below the $5.00/day poverty line, a poverty rate eight percentage points higher than the median value in Ghana. But despite low welfare levels, the area still shows high levels of telephone activity. Across the telephone activity metrics, the values for this polygon are roughly twice the median values or higher. In other words, the cell tower in this area hosts a highly active user base compared to tower and user communities elsewhere.

     Regarding mobile money activity, results differ depending on the metric. The average amounts of cash-outs and mobile money transfers in the area are higher than the countrywide median, whereas the average cash-in amount is lower than the median.

     Overall, this shows that in this specific area, a community with higher poverty prevalence also shows much higher telephone usage, and mobile money activity close to the national median (slightly biased toward cash-out, suggesting net inflows into the community).

     As previously observed, this neighborhood shows a mix of features expected to correlate with higher and lower poverty prevalence (e.g., slum areas adjacent to areas with single-family homes and lawns). With these aggregate data, it is not possible to identify wealth characteristics at the individual user level, and therefore to know whether the telephone and mobile money patterns are driven by wealthier or poorer demographics, or are evenly distributed across all users. However, what is known, and what matters from the perspective of both providers and policy makers, is this: the community depicted here shares a common infrastructure.
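The comparisons above are simple ratios and differences against the Ghana median. The short Python sketch below reproduces that arithmetic from the values in Table 2 (the dictionary keys are abbreviated labels chosen for illustration, not field names from the study's data). It shows, for instance, that the Chorkor polygon's telephone metrics range from about 1.9 times the median (outgoing call duration) to about 3.4 times (incoming SMS), and that the $5.00/day poverty rate gap is 8.2 percentage points.

```python
# Values from Table 2: (Chorkor polygon, Ghana median)
TABLE_2 = {
    "outgoing calls (per month and user)": (104, 47.5),
    "incoming calls (per month and user)": (41, 15),
    "outgoing call duration (hours per tower per month)": (129, 69),
    "incoming call duration (hours per tower per month)": (61, 29),
    "incoming SMS (per tower per month)": (26_600, 7_900),
    "outgoing SMS (per tower per month)": (81_000, 39_000),
    "mobile money transfer (average USD per month)": (21, 19.5),
    "mobile money cash in (average USD per month)": (13, 15),
    "mobile money cash out (average USD per month)": (17, 16),
}

def ratio_to_median(table):
    """Ratio of the polygon value to the national median, rounded to 2 decimals."""
    return {metric: round(local / median, 2) for metric, (local, median) in table.items()}

ratios = ratio_to_median(TABLE_2)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f} x median")

# Gap in the $5.00/day poverty rate, in percentage points (60.3% vs 52.1%)
poverty_gap_pp = round(60.3 - 52.1, 1)
print(f"poverty rate gap: {poverty_gap_pp} percentage points")
```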




Telephone statistics are reported in terms of the traffic served by the tower that defines the polygon; cash-in and cash-out statistics are similarly reported in terms of the tower that intermediated the agent float balances to facilitate the service transaction. Consequently, any commercial or developmental interventions designed to expand financial access will reach communities that access those services via this shared infrastructure. It is therefore meaningful to articulate the reach of financial services with respect to the demographic make-up of the communities that share the "home" network tower in their neighborhood.

The Chorkor neighborhood discussed here was selected simply because it has a low-scoring PPI prediction, to exemplify a layered analytic approach between poverty models, GSM and DFS activity. Overall, a refined computational approach that explores relationships among these types of data would identify "hot spots" according to the strategic interests of providers or policy-makers. Digital financial service providers and donors might use the layering approach to compare even static or slowly changing poverty baseline estimates with a variety of indicators that help to monitor and identify areas for targeted interventions to reduce poverty and increase financial inclusion. Remittance rates and other metrics of (net) financial inflows and outflows of neighborhoods, and especially population numbers that better estimate financial reach and micro-market sizing, are useful indicators for layering atop poverty rates.
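One way to sketch the layering approach computationally is to join per-polygon layers on a shared polygon (tower) ID, drop polygons with missing data in any layer, and then flag candidate hot spots. The Python sketch below is illustrative only: the polygon IDs and values are hypothetical, the median across polygons is used as the cutoff, and call volume stands in as a proxy for cell coverage when separating low-welfare polygons with high activity (possible engagement opportunities) from low-welfare polygons with low activity (possibly underserved). None of these choices is taken from the study's methodology.

```python
from statistics import median

def layer(**layers):
    """Join per-polygon layers keyed by polygon ID; drop polygons missing any value."""
    common = set.intersection(*(set(d) for d in layers.values()))
    joined = {}
    for pid in sorted(common):
        row = {name: values[pid] for name, values in layers.items()}
        if all(v is not None for v in row.values()):
            joined[pid] = row
    return joined

def flag_low_welfare(joined, welfare="ppi", activity="calls"):
    """Label below-median-welfare polygons by whether activity is above the median.

    Lower PPI scores mean higher predicted poverty.
    """
    welfare_med = median(row[welfare] for row in joined.values())
    activity_med = median(row[activity] for row in joined.values())
    labels = {}
    for pid, row in joined.items():
        if row[welfare] >= welfare_med:
            continue  # not in the lower-welfare half, so not flagged
        labels[pid] = ("engagement opportunity" if row[activity] > activity_med
                       else "underserved")
    return labels

# Hypothetical layers; "E" has no poverty estimate (e.g. no usable imagery)
ppi = {"A": 55.1, "B": 70.2, "C": 48.0, "D": 66.0, "E": None}
calls = {"A": 104, "B": 40, "C": 20, "D": 60, "E": 30}
momo_value = {"A": 21.0, "B": 19.0, "C": 9.0, "D": 25.0, "E": 12.0}

joined = layer(ppi=ppi, calls=calls, momo=momo_value)
print(flag_low_welfare(joined))
```

Using medians rather than absolute cutoffs reflects the point made later in this section that the relative rank of welfare estimates can be informative even when precision is low.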




     Discussion
     Lessons for Estimating Welfare with Satellite Imagery and Call Detail Records

     It is necessary to measure welfare and poverty levels regularly, at high spatial resolution, at high temporal frequency, and at low cost. The increasing availability of day- and night-time satellite images and powerful deep learning algorithms has introduced new methods to predict poverty and welfare levels. This research aimed to further these methods, specifically by increasing the granularity of analysis to smaller areas.

     Overall, the study finds that it is possible to produce meaningful welfare estimates at neighborhood-level resolution. However, these estimates are likely to lack precision. A joint spatial/satellite model, which combines interpolated geo-tagged survey data with machine vision feature identification, provides the highest explanatory power. This demonstrates that some components of wealth are detectable through satellite imagery. While ground surveys are still necessary to develop country-specific models, adding remote-sensing information can reduce the sample sizes needed for detailed poverty estimation.

     The following are key lessons learned from this study about different data sources, methods, and challenges depending on context and targeted levels of granularity:

     1.	 Evaluating welfare at neighborhood levels and with high spatial resolution may be valuable when lower levels of precision are acceptable. A rough understanding of income can meaningfully segment geographic areas that are below (or above) a specific poverty threshold, as in this case, where a poor versus not-poor distinction can characterize neighborhoods or provide estimators of service reach to demographic segments. Moreover, where wealth varies widely among neighbors in densely populated areas, rough estimates may still approximate the general prevalence of poverty within a neighborhood, rather than a specific per-household value.

     2.	 Poverty prediction at the level of cell tower-based Voronoi polygons allows small-area welfare estimation in urban environments. Predicting poverty below the regional level raises the question of how to segment space: simply put, where do boundaries exist? Political or map boundaries may or may not exist in a national context, especially for smaller towns. More importantly, the way political boundaries are drawn is unlikely to reliably characterize the demographic features of the people who live within a zone. Using cell towers to segment geographies is beneficial, as the Voronoi polygon approach groups populations by proximity to the nearest tower12. It is reasonable to assume that tower density is proportional to population density (or at least provider subscriber density). That is, providers are incentivized to put more towers where more service is needed. Doing so refines coverage polygons into smaller geographic spaces, importantly characterized by the people using the shared access point. The area defined by the Voronoi polygon therefore describes the DFS usage statistics of its residents, since the tower intermediates transactions performed by users and agents.

     3.	 Daytime satellite imagery improves poverty prediction, but caveats remain. Nightlight satellite imagery can provide baseline estimates for regional poverty, but it is less useful in rural areas, which show little variation in nightlight signals due to lower levels of electrification. Daytime satellite imagery provides a better alternative in many cases, although high-resolution imagery is not always available for all regions in a country, and individual high-resolution satellite images are unlikely to have uniformly distributed features that carry meaningful signals of poverty or wealth. Enough variation in wealth exists across the visual space to make wealth estimation with daytime satellite imagery difficult without a robust ability to detect and identify the features that characterize that space. There is ample room for improvement in granular poverty estimation, especially for urban neighborhoods. In this study, the higher R-squared values for rural Uganda are due to the relatively lower rate of change across adjacent satellite images. Urban Ghana's rapid feature changes across smaller geographic spaces result in fewer salient features (or conflicting features) in the visual space. Here, further research might focus on specialized feature detection, such as models that can detect cars, prominent urban characteristics and other indicators of wealth.

     In the course of this research, it was conjectured that additional feature detection research might prioritize features that correspond to the visually identifiable indicators in poverty survey tools. Specifically, the PPI methodology uses some statistical measures that have strong visual determinants (such as a building's roof material, or whether there is an outdoor latrine). While a challenging problem to solve, it is nevertheless reasonable to train a visual model to see a thatch, metal or tile roof, for example, and perhaps to recognize features like community washrooms or cisterns for potable water. By contrast, poverty survey tools driven by consumption, ownership or household expenditure data provide less direct opportunity to "see" these types of features in the visual space and interpret them accordingly.



     12	 Cell tower location data is publicly available, or may be purchased, from independent organizations that map infrastructure locations globally, such as
         OpenCellID (https://www.opencellid.org/).



An area of future research might therefore seek to train visual models specifically to recognize the observable features used in the PPI (or other survey methods) to improve prediction accuracy.

4.	 Spatial boosting is particularly helpful for improving models for rural poverty estimation. The ostensible goal of using remote sensing to estimate poverty is to reduce the time and expense of ground surveys. This study found that spatial boosting helps to address this, as meaningful estimates can be inferred for non-surveyed image locations by weighting surveyed observations from nearest neighbors. This approach was found to be more effective in rural areas, which are less likely to have substantial variations in welfare over short distances. In urban areas, by contrast, large disparities of wealth were observed among neighbors, posing a significant challenge for training machine vision models.

5.	 Representative sampling may not meaningfully overlap with provider data. This study also tried to predict poverty levels in neighborhoods with call activity data, expecting randomly selected survey respondents to be sufficiently represented in provider data to model CDR usage and wealth demographics. For both countries, the survey results showed that providers had relatively low market share in the enumeration zones. Although results are presented in Table 1 for the sake of completeness, they have little interpretive value due to very low coincidence between CDR users and survey respondents. Therefore, future similar research should conduct minimal baseline surveys to understand general market share when attempting to use provider data, and then design the full survey to over-sample in a statistically controllable manner to ensure adequate coincidence between data sets.

6.	 Broad-spectrum representative ground-truth survey data is essential for training poverty estimation models. Breadth here implies that ground-truth welfare data encompasses the range of economic well-being within the population. Data should come from households with sufficient geographic dispersion that they fall into enough distinct areas to train machine learning models; with too few areas, the variance in the training data may be too small. Breadth also implies that the selected images encompass the range of buildings, roads, fields, farms and other relevant features that are representatively associated with the spectrum of welfare. The variation should ensure sufficient feature capture to identify and differentiate a wealthy household's manicured lawn from a poorer household's adjacent pasture, for example; equally, in urban areas, it should ensure that the variety of visual features is captured along with the variety of wealth segments that may correlate with those features. This problem was evident in this study's rural Uganda sample, particularly since (by design) surveys focused on the poorest households, which resulted in a relative lack of wealthy households against which to compare and train models.

Application of Poverty Estimation Findings – Financial Inclusion and Beyond

Research shows that telephone usage13 and increased social network size14 are strong predictors of active uptake and use of Digital Financial Services. Moreover, Digital Financial Services boost financial inclusion and contribute to poverty reduction and improved livelihood indicators. DFS tend to be adopted first among higher-income demographics, particularly urban youth. They scale by diffusing from early adopters and are likely to grow along remittance corridors or social networks (such as urban laborers who send money home to families in rural areas).15 Identifying these corridors is key, and normalizing the use of DFS is a means of scaling financial inclusion. Tracking financial flows into (or out of) areas with low welfare estimates may help to monitor the reach of financial inclusion and to target areas of greatest need.

Given that DFS can play a significant role in reducing poverty, it is important to be able to accurately identify where the poor live in order to deploy targeted financial inclusion strategies, monitor the use of financial services, and observe their impact on the population. Current national survey methods are slow and costly, meaning that observing and measuring reach and change is likely to take place on multi-year timelines. Even perfectly executed remote-sensing techniques, while less expensive, may also be slow to detect poverty changes. However, indicators of financial empowerment represented in provider data change much faster, at the rate of usage and uptake.

Layering heat maps of poverty, telephone usage and financial activity

Financial inclusion insights can therefore be obtained by comparing different data layers of telephone usage, financial transaction activity and poverty levels to deepen the understanding of how they are interconnected. A base layer of poverty estimates is fundamental to drive these types of insights, which can benefit providers and policy makers alike.

The maps presented in this report depict single-variable layers to illustrate the approach. Further research is necessary to computationally aggregate population, income and financial activity estimates in order to quantify the reach and scale of financial inclusion meaningfully at the national level and at a more granular scale. Population layers are also critical to this approach and should be given equal consideration in future research.
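Lesson 4 above describes inferring estimates for non-surveyed image locations by weighting surveyed observations from nearest neighbors. The exact weighting used in this study is described in its Spatial Boosting section and is not reproduced here; as an assumption, the Python sketch below uses k-nearest-neighbor inverse-distance weighting (IDW), a common interpolation scheme, on projected planar coordinates (for example, kilometres). The function name `idw_knn` and all sample values are hypothetical.

```python
import math

def idw_knn(target, surveyed, k=3, power=2.0):
    """Inverse-distance-weighted estimate at `target` from the k nearest surveyed points.

    `surveyed` holds (x, y, value) tuples in planar coordinates; each neighbour's
    weight is 1 / distance**power, so closer surveys count more.
    """
    nearest = sorted(
        (math.dist(target, (x, y)), value) for x, y, value in surveyed
    )[:k]
    # A target that coincides with a surveyed point takes that point's value.
    if nearest[0][0] == 0:
        return nearest[0][1]
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Hypothetical surveyed locations (x_km, y_km, PPI score)
surveyed = [(0, 0, 40.0), (2, 0, 60.0), (0, 2, 50.0), (10, 10, 90.0)]

print(round(idw_knn((1, 0), surveyed), 1))  # unsurveyed tile between surveys
print(idw_knn((0, 0), surveyed))            # coincides with a survey point
```

The distant survey at (10, 10) does not influence the first estimate because only the three nearest neighbors are used. Where welfare changes sharply between neighbors, as lesson 4 notes for urban areas, this kind of nearest-neighbor weighting would be expected to be less reliable.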




13	 IFC 2016; Blumenstock et al. 2015.
14	 Mattson and Stuart 2018.
15	 IFC et al. 2017; Aga and Martinez Peria 2014.


Increasing impact by identifying areas of biggest need and largest reach

This research finds that satellite imaging can be used to meaningfully segment welfare levels at neighborhood-level granularity, although with relatively low precision. While these models offer only a modest ability to explain the variation in wealth at granular levels, the ability to segment and rank poverty estimates can identify key areas to focus on, potentially advancing both commercial and financial inclusion strategies. Further, layering poverty estimates with cell coverage data can identify financial inclusion engagement opportunities (i.e., high cell coverage, low welfare) or particularly underserved populations that donors may seek to target strategically (i.e., low cell coverage, low welfare). In this manner, poverty estimates such as those obtained through this study can provide useful insights despite the low precision of the results. The relative rank of welfare estimates is sufficient to provide directional information on financial inclusion targeting and reach. A categorical assessment of whether poverty prevalence lies above or below a given threshold is also sufficient.

Improved understanding of the financial behavior and needs of the bottom of the pyramid

Comparing poverty estimates with financial activity data helps to explore the scale of financial inclusion among the poorest income demographics. Providers seeking to better understand their own markets and customer demographics may gain insight into how services are used across geographies and income demographics. For example, they may learn whether money flows from areas of high predicted poverty to areas of low predicted poverty, or vice versa. They may also quantify the volume of such activity, either in total or per capita within coverage areas. Do wealthier and poorer segments make phone calls to each other? Do remittances flow from one to the other? If so, to what degree? If not, how can remittance and communication corridors be described in terms of the demographic characteristics of sender and receiver zones?

Application beyond financial inclusion

This study focused specifically on identifying poverty as a basis for assessing the prevalence of Digital Financial Services. However, the need for regular, granular poverty prediction using satellite imagery and call activity data of course extends beyond financial inclusion. For example, layering publicly available population information16 onto polygons based on cell-tower locations makes it possible to approximate how populations with different income levels access Digital Financial Services. Applications in other domains are also possible. One example is assessing how close populations are to different services. Another is assessing the coverage density those services provide within an area of interest. Other potential areas of application include agriculture and infrastructure.
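The targeting and corridor analyses discussed in this section can be illustrated with a small, self-contained sketch. It is not the pipeline used in this study, and every input and function name below is hypothetical:

- tower coordinates;
- a gridded population layer of the kind cited in footnote 16;
- a per-area welfare score from a poverty model, where higher means wealthier;
- a coverage index between 0 and 1;
- mobile money transfers recorded as sender tower, receiver tower and amount.

Cell-tower polygons are approximated by assigning each population grid cell to its nearest tower. This is equivalent to a Voronoi partition under planar distance, an approximation that only holds over small areas.

```python
"""Hypothetical sketch: layering population, welfare estimates, coverage and
transfer data onto cell-tower service areas. Inputs and names are illustrative."""
import math
from collections import defaultdict


def nearest_tower(point, towers):
    """Return the id of the tower closest to `point`.

    Assigning every point to its nearest tower reproduces the towers' Voronoi
    polygons. Planar distance on (lon, lat) is an assumption that is only
    reasonable over small areas.
    """
    x, y = point
    return min(towers, key=lambda t: math.hypot(towers[t][0] - x, towers[t][1] - y))


def population_by_tower(grid_cells, towers):
    """Sum gridded population counts ((x, y), people) into tower service areas."""
    totals = defaultdict(float)
    for point, people in grid_cells:
        totals[nearest_tower(point, towers)] += people
    return dict(totals)


def rank_by_welfare(welfare):
    """Tower ids ordered from lowest to highest estimated welfare (rank only)."""
    return sorted(welfare, key=welfare.get)


def classify_areas(welfare, coverage, welfare_cutoff, coverage_cutoff):
    """Cross a below/above welfare threshold with a coverage threshold."""
    labels = {}
    for tower, score in welfare.items():
        if score >= welfare_cutoff:
            labels[tower] = "higher welfare"
        elif coverage[tower] >= coverage_cutoff:
            labels[tower] = "engagement opportunity"  # high coverage, low welfare
        else:
            labels[tower] = "underserved"  # low coverage, low welfare
    return labels


def corridor_volumes(transfers, welfare, welfare_cutoff, population):
    """Total transfer value per (sender class, receiver class) corridor and the
    same value per 1,000 residents of the sending class's areas."""
    def poverty_class(tower):
        return "poor" if welfare[tower] < welfare_cutoff else "non-poor"

    class_population = defaultdict(float)
    for tower, people in population.items():
        class_population[poverty_class(tower)] += people

    totals = defaultdict(float)
    for sender, receiver, amount in transfers:
        totals[(poverty_class(sender), poverty_class(receiver))] += amount

    per_1000 = {
        corridor: 1000 * value / class_population[corridor[0]]
        for corridor, value in totals.items()
        if class_population[corridor[0]] > 0  # skip classes with no mapped population
    }
    return dict(totals), per_1000


# Illustrative toy inputs (not study data).
towers = {"T1": (0.0, 0.0), "T2": (1.0, 0.0), "T3": (0.0, 1.0)}
grid = [((0.1, 0.1), 500), ((0.9, 0.1), 300), ((0.1, 0.8), 200), ((0.2, 0.2), 100)]
welfare = {"T1": 0.8, "T2": 0.2, "T3": 0.3}   # model score, higher = wealthier
coverage = {"T1": 0.9, "T2": 0.7, "T3": 0.2}  # coverage index in [0, 1]
transfers = [("T1", "T2", 50.0), ("T1", "T3", 30.0), ("T2", "T1", 10.0)]

population = population_by_tower(grid, towers)
print(population)                # {'T1': 600.0, 'T2': 300.0, 'T3': 200.0}
print(rank_by_welfare(welfare))  # ['T2', 'T3', 'T1']
print(classify_areas(welfare, coverage, 0.5, 0.5))
print(corridor_volumes(transfers, welfare, 0.5, population))
```

Because the welfare estimates are imprecise, the sketch relies only on their rank order and on whether they fall below a cutoff. It does not use the point values themselves. The cutoffs of 0.5 in the example are arbitrary placeholders; an analyst would set them, for instance, from a poverty line or a target population share.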




16 Such as the Center for International Earth Science Information Network (https://www.ciesin.columbia.edu/data/hrsl/) or WorldPop (http://www.worldpop.org.uk/data/summary/?doi=10.5258/SOTON/WP00098).




Conclusion
Neighborhood-level poverty estimation with satellite imagery is possible, although it is aided significantly by spatial boosting techniques that draw on traditional survey data. Combining the two substantially increases the effective coverage of surveys, enabling smaller samples to yield more information. While the precision of poverty estimates is limited, the ability to segment and rank geospatial areas in terms of welfare is nevertheless insightful. Further work is needed to refine the models developed in this study and to turn research of this nature into insights for service providers. However, the basic building blocks are already in place to start using these methods. Even directional estimates of well-being can improve understanding of financial inclusion reach.

The ability to map small-area poverty estimates and to combine them with layered financial transaction data, as explored in this study, offers opportunities to development professionals and Digital Financial Services providers alike. They can identify and quantify engagement, particularly among the poorest individuals. They can also identify where high engagement on telephone channels, or other demographic characteristics, may signal opportunities to strategically engage underserved markets that are likely to adopt and benefit from improved services.




References
Aga, Gemechu Ayana; and Martinez Peria, Maria Soledad (2014). “International Remittances and Financial Inclusion in Sub-Saharan Africa.” Policy Research Working Paper No. 6991. World Bank Group, Washington, DC.

Blumenstock, Joshua (2016). “Fighting Poverty with Data.” Science 353 (6301), 753-756.

Blumenstock, Joshua; Cadamuro, Gabriel; On, Robert (2015). “Predicting Poverty and Wealth from Mobile Phone Metadata.” Science 350 (6264), 1073-1076.

Blumenstock, Joshua (2018). “Don’t forget people in the use of big data for development.” Nature 561 (7722), 170-172.

Blumenstock, Joshua (2018). “Estimating Economic Characteristics with Phone Data.” AER Papers and Proceedings 108, 72-76.

Demirgüç-Kunt, Asli; Klapper, Leora; Singer, Dorothe; Ansar, Saniya; and Hess, Jake (2018). The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution. World Bank, Washington, DC. Global Findex 2017: https://globalfindex.worldbank.org/

Donaldson, Dave; and Storeygard, Adam (2016). “The View from Above: Applications of Satellite Data in Economics.” Journal of Economic Perspectives 30 (4), 171-198.

Fung, Vincent (2017). “An Overview of ResNet and its Variants.” Medium, Towards Data Science. https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035

Ghosh, T.; Anderson, S.J.; Elvidge, C.D.; Sutton, P.C. (2013). “Using Nighttime Satellite Imagery as a Proxy Measure of Human Well-Being.” Sustainability 5, 4988-5019.

GSMA (2018). “The Mobile Economy.” https://www.gsma.com/mobileeconomy/wp-content/uploads/2018/05/The-Mobile-Economy-2018.pdf

IFC (2016). “Find the Gap: Can Big Data help to increase Digital Financial Services Adoption?” Partnership for Financial Inclusion.

IFC, Cignifi and Airtel Uganda (2017). “Mobile Wallet Usage Study – Apply CDR models to increase Mobile Wallet Adoption &
Activity.” Netmob. Book of Abstracts, p.11 http://www.netmob.org/www17/assets/img/bookofabstract_oral_2017.pdf

Jean, Neal; Burke, Marshall; Xie, Michael; Davis, W. Matthew; Lobell, David B.; Ermon, Stefano (2016). “Combining satellite
imagery and machine learning to predict.” Science 353 (6301), 790-794.

Mattson, Carolina; and Stuart, Guy (2018). “Understanding Key Mobile Money Users.” MFO.

PPI – Poverty Probability Index - https://www.povertyindex.org/

Steele, Jessica E.; Sundsøy, Pål Roe; Pezzulo, Carla; Alegana, Victor A.; Bird, Tomas J.; Blumenstock, Joshua; Bjelland, Johannes; Engø-Monsen, Kenth; de Montjoye, Yves-Alexandre; Iqbal, Asif M.; Hadiuzzaman, Khandakar N.; Lu, Xin; Wetter, Erik; Tatem, Andrew J.; and Bengtsson, Linus (2017). “Mapping poverty using mobile phone and satellite data.” Journal of the Royal Society Interface 14 (127). https://royalsocietypublishing.org/doi/10.1098/rsif.2016.0690

Suri, Tavneet; and Jack, William (2016). “The long-run poverty and gender impacts of mobile money.” Science 354 (6317), 1288-1292.

World Population Review Ghana - http://worldpopulationreview.com/countries/ghana-population/ - Estimate as of January
31st, 2018 (reference date during survey data collection)

Yoshida, N., R. Munoz, A. Skinner, C. Kyung-eun Lee, M. Brataj, W. Durbin and D. Sharma (2015). “SWIFT Data Collection
Guidelines Version 2”
             AUTHORS

              Soren Heitmann leads the applied research and learning program for the Partnership
              for Financial Inclusion. His background is in data science, development economics
              and cultural anthropology.

              Sinja Buri is a data operations analyst for the Partnership for Financial Inclusion. Her
              research focuses on digital financial service customer behavior and demographics
              and applying insights for product development and strategy.




             CONTRIBUTING AUTHORS

              Guanghua Chi, a doctoral student at the UC Berkeley School of Information, and
              Nikhil Desai, a software engineer at Google (formerly a researcher at the Stanford
              Sustainability and Artificial Intelligence Lab), also contributed to this report.




             ACKNOWLEDGMENTS

              IFC and the Mastercard Foundation Partnership for Financial Inclusion are grateful to
              Marshall Burke and Joshua Blumenstock for their scientific input and oversight of the
              model development for this research project. They also thank the Bill & Melinda Gates
              Foundation for supporting the research engagements whose data are included in the
              analyses. Additional thanks go to Shafique Jamal for his support in mapping and
              visualizing poverty predictions, and to Gary Seidman and Lesley Denyes from IFC for
              their editorial support.
April 2019