Mastering the Layered Grammar of Graphics with ggplot2: A Complete Guide Using Global Findex Data

 

1. Introduction: Why ggplot2 Dominates Data Visualization

The ggplot2 package, developed by Hadley Wickham, is one of the most widely used visualization tools in R.
Its power comes from the Layered Grammar of Graphics, a systematic way of building a plot from independent components such as data, aesthetic mappings, geometries, statistics, facets, scales, and themes.

Your Global Findex visualization shows how this grammar can be used to uncover insights about:

  • Education

  • Account ownership

  • Financial exclusion

  • Socioeconomic behavior


2. Understanding the Dataset Context

The Global Findex Database, maintained by the World Bank, measures:

  • Access to financial accounts

  • Barriers to financial inclusion

  • Borrowing and saving behaviours

  • Reasons for not having an account

The variables used in your sample:

  • educ → Education level

  • account → Has or does not have a financial account

  • fin11d → Whether lack of money is a barrier

  • count → Weighted number of adults

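Because educ, account, and fin11d are categorical and count is a weighted total, the data supports layered, categorical comparisons of account ownership.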
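To make the later examples reproducible, here is a minimal sketch of a data frame with the same four columns. The education, account, and fin11d labels and all counts are hypothetical placeholders, not Global Findex values; replace them with your own weighted aggregates.

```r
# Hypothetical stand-in for findex_data_sample: labels and counts are made up,
# not taken from the Global Findex Database.
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)  # expand.grid() returns factors whose levels keep this order

# One placeholder weighted count per row (educ varies fastest, then account, then fin11d)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

str(findex_data_sample)
```

Because expand.grid() creates factors, the bar order and legend order follow the levels listed here.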


3. Layered Grammar of Graphics Explained

This guide breaks the plot into six components (the full grammar also covers coordinate systems and position adjustments):

  1. Data

  2. Aesthetics

  3. Geometries

  4. Statistics

  5. Facets

  6. Scales & Themes

The chart touches all six components, making it a useful teaching example.
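Putting the six components together gives a plot along these lines. This sketch rebuilds a small placeholder version of findex_data_sample (hypothetical labels, made-up counts) so it runs on its own, and uses a shortened title because the original title is truncated.

```r
library(ggplot2)

# Placeholder data: hypothetical labels and made-up counts, not Findex estimates
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

p <- ggplot(data = findex_data_sample,                    # 1. Data
            aes(x = educ, y = count, fill = account)) +   # 2. Aesthetics
  geom_bar(stat = "identity", position = "fill") +        # 3. Geometry; 4. stat = "identity"
  facet_wrap(~ fin11d) +                                  # 5. Facets
  scale_fill_manual(values = c("tomato", "steelblue")) +  # 6. Scales
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +                                       # 6. Theme
  labs(title = "Account Ownership by Education Level",    # shortened title
       y = "Proportion of Adults", x = "Education Level", fill = "Account Status")

print(p)
```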


4. Layer 1 — Data

ggplot(data = findex_data_sample, …

Data is the foundation.
Your sample reproduces the key Findex variables listed above, and every later layer inherits this data frame unless a layer supplies its own.


5. Layer 2 — Aesthetic Mapping (aes)

aes(x = educ, y = count, fill = account)

Aesthetics define how data variables map to visual properties.

Here:

  • x → Education level

  • y → Proportion (after transformation)

  • Fill color → Account ownership

This mapping drives the rest of the plot: the geometry draws it, the scales translate it into axes and colors, and the legend explains it.


6. Layer 3 — Geometric Objects

geom_bar(stat = "identity", position = "fill")

You chose a stacked proportional (100%) bar chart, ideal for showing:

  • Differences across education levels

  • Account ownership distribution

  • Comparative ratios

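position = "fill" stacks the segments and rescales each bar to a total height of 1, so the y-axis shows proportions instead of raw counts; the percentage labels come from scale_y_continuous(labels = scales::percent) in Layer 6.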
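The rescaling that position = "fill" performs can be reproduced by hand: within each bar (one education level in one fin11d panel), every segment is divided by the bar's total. This sketch, on made-up placeholder data, computes those shares with base R's ave() and then uses layer_data() to check that every stacked bar in the ggplot2 version tops out at 1.

```r
library(ggplot2)

# Placeholder data: hypothetical labels and made-up counts
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

# Manual version: each count divided by the total of its (educ, fin11d) bar
findex_data_sample$prop <- with(findex_data_sample,
  count / ave(count, educ, fin11d, FUN = sum))

# ggplot2 version: position = "fill" does the same normalization
p <- ggplot(findex_data_sample, aes(x = educ, y = count, fill = account)) +
  geom_bar(stat = "identity", position = "fill") +
  facet_wrap(~ fin11d)

# Highest point of each bar, per panel (rows) and education level (columns)
ld <- layer_data(p)
bar_tops <- tapply(ld$ymax, list(ld$PANEL, as.numeric(ld$x)), max)
print(bar_tops)
```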


7. Layer 4 — Statistical Transformations

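No explicit statistical layer is added. With stat = "identity", ggplot2 uses the count values exactly as supplied and performs no aggregation. The remaining work is done elsewhere in the grammar:

  • The position adjustment (position = "fill") normalizes each bar to a total of 1

  • This turns the stacked counts into within-bar proportions

  • The fill mapping splits each bar into groups by account status

Strictly speaking, ggplot2 treats position adjustments as separate from statistics, but position = "fill" plays a statistical role here: it decides that the chart shows shares rather than totals.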
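One way to see what stat = "identity" does is to let ggplot2 do the aggregation instead. In this sketch (placeholder data, made-up counts), geom_col(), which is shorthand for geom_bar(stat = "identity"), uses count directly, while geom_bar() with its default stat_count() sums a weight aesthetic per bar. Because each row here is already one education, account, and fin11d cell, both routes should produce the same segment heights.

```r
library(ggplot2)

# Placeholder data: hypothetical labels and made-up counts
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

# stat = "identity": bar heights are taken from count as-is
p_identity <- ggplot(findex_data_sample, aes(x = educ, y = count, fill = account)) +
  geom_col(position = "fill") +
  facet_wrap(~ fin11d)

# stat_count (geom_bar's default): ggplot2 sums the weight aesthetic itself
p_count <- ggplot(findex_data_sample, aes(x = educ, fill = account, weight = count)) +
  geom_bar(position = "fill") +
  facet_wrap(~ fin11d)

# Compare the heights of all stacked segments (order-independent)
seg_identity <- with(layer_data(p_identity), sort(ymax - ymin))
seg_count    <- with(layer_data(p_count),    sort(ymax - ymin))
all.equal(seg_identity, seg_count)
```

With already-aggregated data, stat = "identity" is the simpler choice; the weight route is more natural when each row is an individual respondent.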


8. Layer 5 — Faceting to Compare Groups

facet_wrap(~ fin11d)

Faceting creates separate panels (small multiples), one per fin11d value:

  • Those who cite lack of money

  • Those who do not cite it

Faceting is powerful for:

  • Comparing demographic variations

  • Producing panelled reports

  • Showing interaction between variables

Your visualization clearly distinguishes financial inclusion patterns under different economic constraints.
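If the fin11d labels are long, facet_wrap() can wrap the strip titles. This sketch uses ggplot2's label_wrap_gen() labeller and nrow = 1 to keep the two panels side by side; the data and the 20-character wrap width are arbitrary placeholders. layer_data() is used only to confirm that one panel is drawn per fin11d value.

```r
library(ggplot2)

# Placeholder data: hypothetical labels and made-up counts
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

p <- ggplot(findex_data_sample, aes(x = educ, y = count, fill = account)) +
  geom_col(position = "fill") +
  # One panel per fin11d category, in a single row; strip titles wrap at ~20 characters
  facet_wrap(~ fin11d, nrow = 1, labeller = label_wrap_gen(width = 20))

n_panels <- length(unique(layer_data(p)$PANEL))
n_panels
```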


9. Layer 6 — Scales & Themes

Scales

scale_fill_manual(values = c("tomato", "steelblue")) +
scale_y_continuous(labels = scales::percent)

Custom colors:

  • Improve readability

  • Make legends intuitive

  • Help align with brand guidelines (for academic/industry reports)

Theme

theme_minimal()

Provides:

  • Clean white background

  • Simple grid

  • Professional aesthetic
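Two small refinements fit this layer, sketched here on placeholder data. Naming the entries of values in scale_fill_manual() ties each color to a level, so the mapping no longer depends on level order (the level names are hypothetical). theme() then adjusts individual elements on top of theme_minimal(); moving the legend below the plot and dropping vertical gridlines are stylistic choices, not requirements.

```r
library(ggplot2)

# Placeholder data: hypothetical labels and made-up counts
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

p <- ggplot(findex_data_sample, aes(x = educ, y = count, fill = account)) +
  geom_col(position = "fill") +
  facet_wrap(~ fin11d) +
  # Named values: each color is bound to a level name, not to a position
  scale_fill_manual(values = c("No account" = "tomato", "Has account" = "steelblue")) +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom",
        panel.grid.major.x = element_blank())  # vertical gridlines add little to bars

print(p)
```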


10. Labels as a Communication Tool

Your label block:

labs( title = "Account Ownership by Education Level…", y = "Proportion of Adults", x = "Education Level", fill = "Account Status" )

This block makes the plot self-contained: readers can interpret the axes and colors without seeing the code.




11. Insights Produced by This Visualization

This plot can yield multiple analytical insights:

A. Education is associated with account ownership

Higher education → Greater likelihood of having an account.

B. Financial barriers differ by education level

Adults with lower education levels more frequently cite “lack of money” as the reason for not having an account.

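C. Proportional representation matters

Converting weighted counts into proportions removes differences in group size, so education levels with very different numbers of adults can be compared on the same scale. The trade-off is that the raw counts are no longer visible in the chart.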
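To check insight A against the numbers behind the bars, the weighted shares can be tabulated directly. This is a sketch on made-up placeholder counts, so the printed shares illustrate the method only; it also pools both fin11d groups, whereas the chart shows them in separate panels.

```r
# Placeholder data: hypothetical labels and made-up counts
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

# Weighted adults per education level and account status, summed over fin11d
tab <- xtabs(count ~ educ + account, data = findex_data_sample)

# Row-wise proportions: share of each education level with / without an account
shares <- prop.table(tab, margin = 1)
round(shares, 2)
```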


12. Why This Plot Is an Excellent Analytics Project

The visualization demonstrates mastery of:

  • Factor manipulation

  • Layered graphics

  • Proportional bar charts

  • Faceting

  • Custom scales

  • Academic-quality output

This is an ideal inclusion in a data visualization portfolio, analytics coursework, or research report.


13. Extensions to Improve the Visualization

You can enhance this visualization by:

  • Adding confidence intervals

  • Setting the order of education levels explicitly (lowest to highest) through the factor levels

  • Using position = "dodge" for side-by-side comparison

  • Building an interactive version with plotly::ggplotly()
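The ordering, dodge, and plotly extensions can be sketched together as follows, again on placeholder data with hypothetical labels. Unlike "fill", position = "dodge" does not normalize anything, so the within-bar shares are computed first and mapped to y. The plotly step is optional and only runs if that package is installed.

```r
library(ggplot2)

# Placeholder data: hypothetical labels and made-up counts
findex_data_sample <- expand.grid(
  educ    = c("Primary or less", "Secondary", "Tertiary or more"),
  account = c("No account", "Has account"),
  fin11d  = c("Cites lack of money", "Does not cite lack of money")
)
findex_data_sample$count <- c(400, 250, 60, 200, 300, 140,
                              150, 100, 20, 250, 350, 180)

# Fix the education order explicitly, from lowest to highest
findex_data_sample$educ <- factor(findex_data_sample$educ,
  levels = c("Primary or less", "Secondary", "Tertiary or more"))

# Dodged bars show y as given, so compute within-bar shares first
findex_data_sample$prop <- with(findex_data_sample,
  count / ave(count, educ, fin11d, FUN = sum))

p_dodge <- ggplot(findex_data_sample, aes(x = educ, y = prop, fill = account)) +
  geom_col(position = "dodge") +
  facet_wrap(~ fin11d) +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal()

# Optional interactive version (requires the plotly package)
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p_dodge)
}
```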


14. Conclusion

This ggplot2 visualization is a clear, well-structured demonstration of the Layered Grammar of Graphics.
It transforms complex financial inclusion data into an intuitive story using:

  • Aesthetics

  • Statistical logic

  • Visual clarity

  • Clean design


This blog presents key insights from our project report for the ‘Data Visualization and Communication’ course (MBA 2024–26, 5th trimester) at Amrita School of Business, Coimbatore, under the guidance of Dr. Prashobhan Palakkel. 

