Potential causes of repetition and dropout in Primary Education, covering Primary 1 (P1) to Primary 6 (P6): a case study of rural and urban areas

Visualization
Statistics
R
Analysis
Published

March 27, 2024

The Task

This document presents the report from an analysis conducted on Primary Education for the Ministry of Education (MINEDUC) in Rwanda. The report mainly focuses on rural and urban areas.

source of the data: here

The code below shows how to load the packages needed for the analysis

#load package
library(tidyverse)
library(tidyr)
library(broom)
library(purrr)

We then read the data directly from the link, which avoids storing a local copy, using the tidyverse's read_csv for speed

data <- read_csv("https://raw.githubusercontent.com/vmandela99/laterite-interview/master/laterite_education_data.csv" )

Introduction

The first task is to clean the names, check for missing values, understand the column names, check the data types, and make sure that the data is in tidy format.
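These checks can be sketched in a few lines of base R. The example below is a minimal illustration on a toy data frame, not the survey file itself: the two columns and their values are made up for the example, and only the raw column names `s1q1` and `s1q3y` are borrowed from the data.

```r
# Toy data standing in for the survey file; the values are illustrative only.
toy <- data.frame(
  s1q1  = c("Male", "Female", "", "Female"),
  s1q3y = c(7, 9, NA, 12),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing values.
toy$s1q1[which(toy$s1q1 == "")] <- NA

# Count the missing values in each column.
missing_per_column <- colSums(is.na(toy))

# Record the data type of each column.
column_types <- vapply(toy, function(col) class(col)[1], character(1))

print(missing_per_column)
print(column_types)
```

Running the same two summaries on the full data gives a quick picture of which variables need attention before any modelling.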

The following code renames the columns so that they have meaningful names

## rename the variables
names(data)
r_data <- data %>% rename(Sex = s1q1, 
                          Age=s1q3y, 
                          region_class = ur2012,
                          Father_alive=s1q13,
                          Mother_alive = s1q14, 
                          health_prob=s3q4,
                          Grade_2012=s4aq6a ,
                          Grade_2013=s4aq6b , 
                          sch_attended_prev_yr=s4aq8, 
                          prob_in_sch=s4aq9,
                          edu_expenses=s4aq11h,
                          paid_edu_expenses_year_end=s4aq12 , 
                          sch_days_missed=s4aq14, 
                          why_not_attending_sch=s4aq15,
                          why_leave_sch=s4aq17,
                          can_read=s4bq3,
                          can_write=s4bq4,
                          can_calculate=s4bq5,
                          farm_work=s6aq2)

Next we define the missing values as NAs and remove the rows where either of the two grade variables is missing. This is only advisable after you have asked the relevant departments why the data is missing in the first place. Be aware that some missing data cannot simply be deleted; there are several imputation techniques for such cases, which we discuss in upcoming blogs. For now we illustrate how to delete them.

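# Start from the renamed data, convert empty strings to NA in character
# columns, then drop rows where either grade is missing
data <- r_data %>% 
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>% 
  filter(!is.na(Grade_2012), !is.na(Grade_2013))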

This report investigates the causes of repetition in primary education. A rural area here means that the school is located in a setting with a low standard of living and a low population-to-infrastructure ratio, while an urban area is the opposite.

Descriptive analysis

Provinces

The study was conducted in Rwanda, where 5 provinces were considered: Kigali City, Southern Province, Western Province, Northern Province and Eastern Province. The table below summarises the percentage distribution of students across the provinces. Kigali City had 22.15 percent of the students, the highest share of any province in this study. The remaining provinces had broadly similar shares, with Southern Province having the lowest at 17.1 percent.

This is the R code used to produce the table

# Percentage of students in each province
tabb <- table(r_data$province)
tabb1 <- prop.table(tabb) * 100
tabb1 %>% knitr::kable()
| Province | Percent of students |
|---|---|
| Kigali City | 22.15247 |
| Southern Province | 17.10015 |
| Western Province | 21.13602 |
| Northern Province | 18.83408 |
| Eastern Province | 20.77728 |

Districts

The report also looked at the following districts in Rwanda: Nyarugenge, Gasabo, Kicukiro, Nyanza, Gisagara, Nyaruguru, Huye, Nyamagabe, Ruhango, Muhanga, Kamonyi, Karongi, Rutsiro, Rubavu, Nyabihu, Ngororero, Rusizi, Nyamasheke, Rulindo, Gakenke, Musanze, Burera, Gicumbi, Rwamagana, Nyagatare, Gatsibo, Kayonza, Kirehe, Ngoma and Bugesera. The table below summarises the number of students from each district. In this study, Gasabo had the highest number of students, 357, and Huye had the lowest, 31.

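| District | Number of students |
|---|---|
| Nyarugenge | 133 |
| Gasabo | 357 |
| Kicukiro | 251 |
| Nyanza | 43 |
| Gisagara | 122 |
| Nyaruguru | 125 |
| Huye | 31 |
| Nyamagabe | 66 |
| Ruhango | 82 |
| Muhanga | 61 |
| Kamonyi | 42 |
| Karongi | 86 |
| Rutsiro | 62 |
| Rubavu | 72 |
| Nyabihu | 110 |
| Ngororero | 109 |
| Rusizi | 92 |
| Nyamasheke | 176 |
| Rulindo | 75 |
| Gakenke | 162 |
| Musanze | 104 |
| Burera | 179 |
| Gicumbi | 110 |
| Rwamagana | 58 |
| Nyagatare | 127 |
| Gatsibo | 88 |
| Kayonza | 114 |
| Kirehe | 106 |
| Ngoma | 145 |
| Bugesera | 57 |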
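The code behind the district table is not shown. A minimal sketch of how such counts, and the largest and smallest districts, could be obtained with base R follows. The `district` vector is a hypothetical stand-in for the district column of the data, and its values are illustrative only.

```r
# Toy stand-in for the district column; the values are illustrative only.
district <- c("Gasabo", "Huye", "Gasabo", "Kicukiro", "Gasabo", "Kicukiro")

# Count students per district and sort from largest to smallest.
district_counts <- sort(table(district), decreasing = TRUE)

# The first and last entries give the districts with the most and fewest students.
largest  <- names(district_counts)[1]
smallest <- names(district_counts)[length(district_counts)]

print(district_counts)
```

Sorting the table makes the extremes easy to read off, which guards against misreading a long unsorted list.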

Spread by region

The study divided the country into four region classes according to economic and development status: urban, rural, semi-urban and peri-urban. The table below shows that the largest share came from the rural class, with 75.5 percent of the students in this study. This suggests that the sample was drawn more heavily from rural settings, which might be suspected of having higher rates of grade repetition.

| Region class | Percent of students |
|---|---|
| Peri urban | 15.6950673 |
| Urban | 8.0119581 |
| Rural | 75.4559043 |
| Semi urban | 0.8370703 |

Gender Distribution

The study tried to sample an equal number of students of each gender. This is shown by the table below, where the ratio of females to males was almost one to one.

| Gender | Percent of students |
|---|---|
| Female | 50.9417 |
| Male | 49.0583 |

Analysis

Repetition within grades in Primary Education

This analysis shows how grade repetition varies across grades. At the time of the study, primary 1 not only had the highest number of students, but 39.5 percent of the 686 pupils in that class had repeated the same grade from 2012. The classes with the next highest repetition rates were primary 2 (23.6 percent of 470), primary 5 (22.3 percent of 260), primary 3 (16.8 percent of 392) and primary 4 (16.7 percent of 305). It is also worth noting that from post primary 1 to post primary 5 there were no cases of repetition from 2012.

The R code for producing this is

## how grade repetition varies by grade in Primary Education 
comparis <- data %>% filter(!(Grade_2012 %in% c("Not in class")))
comparison_repetition_in_classes_2012 <- table(comparis$Grade_2012, comparis$repeated)
# Row percentages: share of repeaters within each 2012 grade
tabwew <- prop.table(comparison_repetition_in_classes_2012, 1) * 100
tabwew %>% knitr::kable()

# Bar chart of repeaters and non-repeaters, one panel per grade
ggplot(comparis, aes(x = repeated)) + geom_bar(position = "dodge") + facet_wrap(~Grade_2012)
# Stacked bars of repeaters and non-repeaters by grade
ggplot(comparis, aes(x = Grade_2012, fill = repeated)) + geom_bar(position = "stack") + coord_flip()
| Grade in 2012 | Did not repeat (FALSE), % | Repeated (TRUE), % |
|---|---|---|
| Post primary 1 | 100.00000 | 0.000000 |
| Post primary 3 | 100.00000 | 0.000000 |
| Post primary 4 | 100.00000 | 0.000000 |
| Post primary 5 | 100.00000 | 0.000000 |
| Pre-primary | 95.88235 | 4.117647 |
| Primary 1 | 60.49563 | 39.504373 |
| Primary 2 | 76.38298 | 23.617021 |
| Primary 3 | 83.16327 | 16.836735 |
| Primary 4 | 83.27869 | 16.721311 |
| Primary 5 | 77.69231 | 22.307692 |
| Primary 6,7,8 | 93.19728 | 6.802721 |
| Secondary 1 | 97.67442 | 2.325581 |
| Secondary 2 | 90.56604 | 9.433962 |
| Secondary 3 | 94.11765 | 5.882353 |
| Secondary 4 | 100.00000 | 0.000000 |
| Secondary 5 | 100.00000 | 0.000000 |
| Secondary 6 | 100.00000 | 0.000000 |
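The repetition table relies on a `repeated` column whose construction is not shown in this post. One plausible reading, and it is only an assumption, is that a pupil counts as a repeater when the grade attended in 2013 is the same as in 2012. A minimal sketch on toy data under that assumption:

```r
# Toy data; grades are illustrative only.
toy <- data.frame(
  Grade_2012 = c("Primary 1", "Primary 1", "Primary 2", "Primary 2"),
  Grade_2013 = c("Primary 1", "Primary 2", "Primary 2", "Primary 3"),
  stringsAsFactors = FALSE
)

# Assumption: a pupil repeated if the grade did not change between the two years.
toy$repeated <- toy$Grade_2012 == toy$Grade_2013

# Row percentages: share of repeaters within each 2012 grade.
rate_by_grade <- prop.table(table(toy$Grade_2012, toy$repeated), margin = 1) * 100

print(rate_by_grade)
```

If the original flag was built differently, for example from a direct survey question, the rates in the table would not be reproduced by this rule.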

Males equally likely to drop out as females.

The research also checked which gender had the higher dropout rate. The difference between the two genders was not significant: a Welch two-sample t-test, which does not assume equal variances, gave a p-value of 0.9765, well above 0.05. There is therefore no evidence that the mean dropout rates differ, and the conclusion would be that male and female pupils had similar chances of dropping out of school.
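The test itself is not shown in the post. As a rough sketch, a Welch two-sample t-test can be run with base R's `t.test`, whose default `var.equal = FALSE` gives the Welch version. The data below are made up, and `dropped_out` is a hypothetical 0/1 indicator, not a column confirmed to exist in the survey data.

```r
# Toy data: 50 pupils per gender with a hypothetical 0/1 dropout indicator.
toy <- data.frame(
  Sex         = rep(c("Female", "Male"), each = 50),
  dropped_out = c(rep(c(1, 0), times = c(5, 45)),   # 5 of 50 females dropped out
                  rep(c(1, 0), times = c(6, 44)))   # 6 of 50 males dropped out
)

# Welch two-sample t-test comparing mean dropout between genders.
# var.equal = FALSE is the default, so equal variances are not assumed.
welch <- t.test(dropped_out ~ Sex, data = toy)

print(welch$method)
print(welch$p.value)
```

A p-value above 0.05 here means the test finds no evidence of a difference between the two group means; it does not prove that the rates are identical.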

Regression analysis

With the aim of investigating the determinants that contribute to an increase in the rate of repetition, the researcher considered the following predictor variables: weight, age, whether the father or mother was alive, health problems suffered in the last 4 weeks, grade attended in 2012 and 2013, who paid the student's expenses over the last 12 months, and the reason why pupils who missed school did not attend. The response variable was whether a pupil repeated a grade, coded as binary: 1 if the pupil repeated and 0 otherwise. A binary logistic regression model was used. Predictors with p-values below 0.05 were reported as significant.
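The modelling code is not included in the post. The sketch below shows one way a binary logistic regression could be fitted separately for each region class and summarised with the same columns as the result tables (term, estimate, std.error, p.value, p.adjusted). It uses simulated data and base R only; the predictors, the helper `fit_region`, and the use of `p.adjust`'s default Holm correction are assumptions for illustration, not the author's confirmed method.

```r
set.seed(42)
n <- 400

# Simulated data; every value here is illustrative only.
toy <- data.frame(
  region_class = rep(c("Rural", "Urban"), each = n / 2),
  Age          = sample(6:15, n, replace = TRUE),
  Sex          = sample(c("Female", "Male"), n, replace = TRUE)
)
# Simulated outcome: older pupils repeat slightly more often.
toy$repeated <- rbinom(n, size = 1, prob = plogis(-2 + 0.1 * toy$Age))

# Hypothetical helper: fit a logistic model and return a tidy coefficient table.
fit_region <- function(df) {
  fit <- glm(repeated ~ Age + Sex, data = df, family = binomial)
  est <- coef(summary(fit))
  data.frame(
    term      = rownames(est),
    estimate  = est[, "Estimate"],      # log-odds scale
    std.error = est[, "Std. Error"],
    p.value   = est[, "Pr(>|z|)"],
    row.names = NULL
  )
}

# Fit one model per region class and stack the results.
results <- do.call(rbind, lapply(split(toy, toy$region_class), function(df) {
  cbind(region_class = df$region_class[1], fit_region(df))
}))

# Adjust p-values for multiple comparisons (Holm is the p.adjust default).
results$p.adjusted <- p.adjust(results$p.value)

# Keep only the terms that remain significant after adjustment.
significant <- results[results$p.adjusted < 0.05, ]

print(results)
```

Estimates from `glm` with `family = binomial` are on the log-odds scale, so they need to be exponentiated before being read as odds ratios.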

| region_class | term | estimate | std.error | p.value | p.adjusted |
|---|---|---|---|---|---|
| Rural | Grade_2012Primary 5 | 2.6246517 | 0.1333536 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 6,7,8 | 2.3926222 | 0.1510041 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 4 | 2.2149811 | 0.1142086 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 3 | 1.8104449 | 0.0960966 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 2 | 1.3240467 | 0.0711579 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 1 | 0.7385963 | 0.0481840 | 0.0000000 | 0.0000000 |
| Rural | prob_in_schMediocre teaching | -0.3153225 | 0.0858360 | 0.0002598 | 0.0220845 |
| Rural | Grade_2013Pre-primary | -0.5500117 | 0.1059252 | 0.0000003 | 0.0000259 |
| Rural | Grade_2013Primary 1 | -0.7292078 | 0.0599060 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Secondary 4 | -0.8776014 | 0.1657405 | 0.0000002 | 0.0000154 |
| Rural | Grade_2013Primary 2 | -1.4516728 | 0.0739026 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Secondary 1 | -1.9269528 | 0.4238364 | 0.0000066 | 0.0005706 |
| Rural | Grade_2013Primary 3 | -2.0677477 | 0.0893855 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Primary 4 | -2.5562536 | 0.1099310 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Primary 5 | -2.9426288 | 0.1245526 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Secondary 1 | -3.1527569 | 0.1589982 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Primary 6,7,8 | -3.4075162 | 0.1413879 | 0.0000000 | 0.0000000 |

The table above shows the findings for rural areas in Rwanda, where only problems experienced in school and the grade attended in 2012 and 2013 were significant. Exponentiating the estimates gives odds ratios: pupils who were in primary 5, primary 6 to 8, primary 4, primary 3, primary 2 and primary 1 in 2012 had about 13.80, 10.94, 9.16, 6.11, 3.76 and 2.09 times the odds of repeating, in that order, compared with pupils in post primary 1. On the other hand, pupils in pre-primary in 2013 had about 42 percent lower odds of repeating than those in post primary 1 in 2013. Further, those in primary 6, 7, 8 in 2013 were the least likely to repeat, with about 96.7 percent lower odds than those in post primary 1 in 2013.
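The step from a logistic regression estimate to an odds ratio is a single exponentiation. As a small worked example, the sketch below converts three of the rural estimates from the table; the variable names are chosen for the example.

```r
# Estimates (log-odds) copied from the rural results table.
log_odds <- c(
  "Grade_2012Primary 5"     = 2.6246517,
  "Grade_2012Primary 1"     = 0.7385963,
  "Grade_2013Primary 6,7,8" = -3.4075162
)

# Odds ratio relative to the reference grade.
odds_ratio <- exp(log_odds)

# Percentage change in the odds; negative values mean lower odds.
pct_change_in_odds <- (odds_ratio - 1) * 100

print(round(odds_ratio, 2))
print(round(pct_change_in_odds, 1))
```

Odds ratios describe a change in odds rather than in probability, so "times more likely" is only a loose reading when the outcome is common.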

| region_class | term | estimate | std.error | p.value | p.adjusted |
|---|---|---|---|---|---|
| Urban | Grade_2012Primary 5 | 2.531815 | 0.3913698 | 0.0000000 | 0.0000019 |
| Urban | Grade_2012Primary 4 | 1.737260 | 0.3011179 | 0.0000003 | 0.0000270 |
| Urban | Grade_2012Primary 3 | 1.507446 | 0.2891581 | 0.0000024 | 0.0002122 |
| Urban | Grade_2012Primary 2 | 1.188648 | 0.2228084 | 0.0000015 | 0.0001380 |
| Urban | Grade_2012Primary 1 | 0.803433 | 0.1679609 | 0.0000116 | 0.0009979 |
| Urban | Grade_2013Secondary 1 | -1.497037 | 0.4055512 | 0.0004833 | 0.0401101 |
| Urban | Grade_2013Secondary 5 | -1.829716 | 0.4934564 | 0.0004582 | 0.0384918 |
| Urban | Grade_2013Primary 2 | -1.922012 | 0.3655888 | 0.0000020 | 0.0001822 |
| Urban | Grade_2013Primary 3 | -2.367172 | 0.3945070 | 0.0000001 | 0.0000115 |
| Urban | Grade_2013Primary 4 | -2.724553 | 0.4261606 | 0.0000000 | 0.0000025 |
| Urban | Grade_2013Primary 5 | -2.922021 | 0.4293049 | 0.0000000 | 0.0000005 |
| Urban | Grade_2013Primary 6,7,8 | -3.963184 | 0.5191857 | 0.0000000 | 0.0000000 |

The table above shows a snapshot of the analysis for urban areas in Rwanda. Compared with rural areas, urban pupils were less likely to repeat, as shown in the table.

Strengths and weakness of the data

The advantages of this data are that:

  1. The variables were specific, since the study focused mainly on rural areas.

  2. The data was reliable because the variables did not have many outliers.

  3. The data was observational and designed to control for gender as a confounding variable.

The disadvantages of this data were that:

  1. It had a lot of missing values.
  2. The variables were highly correlated, as shown by some of the tests conducted.
  3. The data as collected was not clean and needed some transformation.
  4. Some of the variables were not suitable for answering the main question of whether students repeat or pass.