Potential causes of repetition and dropout in Primary Education, Primary 1 (P1) to Primary 6 (P6). Case study: Rural and Urban areas

The Task

This document presents a report on an analysis of Primary Education carried out for the Ministry of Education (MINEDUC) in Rwanda. The report focuses mainly on rural and urban areas.

Source of the data: here

The code below loads the packages needed for the analysis.

# load packages
library(tidyverse)  # attaches readr, dplyr, tidyr, purrr and ggplot2, among others
library(broom)      # turns model output into tidy data frames (tidy())

Next, we read the data directly from its URL with read_csv() (from readr, loaded by the tidyverse), which avoids keeping a local copy and is fast.

data <- read_csv("https://raw.githubusercontent.com/vmandela99/laterite-interview/master/laterite_education_data.csv" )

Introduction

The first task is to clean the names, check for missing values, understand the column names, check the data types, and make sure that the data is in tidy format.

The following code renames the columns so that they have meaningful names.

## rename the variables
names(data)  # inspect the original survey codes
r_data <- data %>% rename(Sex = s1q1, 
                          Age = s1q3y, 
                          region_class = ur2012,
                          Father_alive = s1q13,
                          Mother_alive = s1q14, 
                          health_prob = s3q4,
                          Grade_2012 = s4aq6a,
                          Grade_2013 = s4aq6b, 
                          sch_attended_prev_yr = s4aq8, 
                          prob_in_sch = s4aq9,
                          edu_expenses = s4aq11h,
                          paid_edu_expenses_year_end = s4aq12, 
                          sch_days_missed = s4aq14, 
                          why_not_attending_sch = s4aq15,
                          why_leave_sch = s4aq17,
                          can_read = s4bq3,
                          can_write = s4bq4,
                          can_calculate = s4bq5,
                          farm_work = s6aq2)

Next, we treat empty values as NA and remove the rows where either of the two grade variables (Grade_2012 and Grade_2013) is missing. In practice, this should only be done after asking the teams who collected the data why it is missing in the first place. Be aware that some missing data cannot simply be deleted; there are several imputation techniques, which we will discuss in upcoming blogs. For now, we illustrate how to delete them.

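data <- r_data %>% 
  # empty strings -> NA (read_csv() already does this by default; kept as a safeguard)
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>% 
  # drop pupils whose 2012 or 2013 grade is missing
  filter(!is.na(Grade_2012), !is.na(Grade_2013))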
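Before deleting rows, it helps to see how many values are missing in each column. A minimal sketch on a hypothetical toy data frame (not the survey data); the names `toy`, `missing_per_column` and `toy_complete` are illustrative. It applies `na_if()` column by column with `across()`, because from dplyr 1.1.0 onwards `na_if()` is meant to be used on a vector rather than on a whole data frame:

```r
library(dplyr)

# Hypothetical toy data standing in for the renamed survey extract
toy <- data.frame(
  Grade_2012 = c("Primary 1", "", "Primary 3", NA, "Primary 2"),
  Grade_2013 = c("Primary 2", "Primary 1", "", "Primary 4", "Primary 2"),
  Age        = c(7, 8, NA, 10, 9),
  stringsAsFactors = FALSE
)

# Treat empty strings in character columns as missing
toy <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

# Count missing values per column before deciding what to drop
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Drop rows where either grade is missing (the rule used in the report)
toy_complete <- toy %>%
  filter(!is.na(Grade_2012), !is.na(Grade_2013))
print(nrow(toy_complete))
```

Counting first makes it easier to judge whether deletion is acceptable or whether imputation should be considered instead.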

This report investigates the causes of repetition in primary education. Here, a rural area means that the school is located in a setting with low living standards and a low population-to-infrastructure ratio, while an urban area is the opposite.

Descriptive analysis

Provinces

The study was carried out in Rwanda, and five provinces were considered: Kigali City, Southern Province, Western Province, Northern Province and Eastern Province. The table below summarises the percentage distribution of students by province. Kigali City had the highest share, at 22.15 percent. The remaining provinces had broadly similar shares, with Southern Province the lowest at 17.10 percent.

This is the R code used to produce the table:

# number of pupils per province
tabb <- table(r_data$province)
# convert the counts to percentages
tabb1 <- prop.table(tabb) * 100
tabb1 %>% knitr::kable()


| Province | Percent of students |
|---|---|
| Kigali City | 22.15247 |
| Southern Province | 17.10015 |
| Western Province | 21.13602 |
| Northern Province | 18.83408 |
| Eastern Province | 20.77728 |

Districts

The report also looked at the following districts in Rwanda: Nyarugenge, Gasabo, Kicukiro, Nyanza, Gisagara, Nyaruguru, Huye, Nyamagabe, Ruhango, Muhanga, Kamonyi, Karongi, Rutsiro, Rubavu, Nyabihu, Ngororero, Rusizi, Nyamasheke, Rulindo, Gakenke, Musanze, Burera, Gicumbi, Rwamagana, Nyagatare, Gatsibo, Kayonza, Kirehe, Ngoma and Bugesera. The table below shows the number of students from each district. Gasabo had the most students (357) and Huye the fewest (31).


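| District | Number of students |
|---|---|
| Nyarugenge | 133 |
| Gasabo | 357 |
| Kicukiro | 251 |
| Nyanza | 43 |
| Gisagara | 122 |
| Nyaruguru | 125 |
| Huye | 31 |
| Nyamagabe | 66 |
| Ruhango | 82 |
| Muhanga | 61 |
| Kamonyi | 42 |
| Karongi | 86 |
| Rutsiro | 62 |
| Rubavu | 72 |
| Nyabihu | 110 |
| Ngororero | 109 |
| Rusizi | 92 |
| Nyamasheke | 176 |
| Rulindo | 75 |
| Gakenke | 162 |
| Musanze | 104 |
| Burera | 179 |
| Gicumbi | 110 |
| Rwamagana | 58 |
| Nyagatare | 127 |
| Gatsibo | 88 |
| Kayonza | 114 |
| Kirehe | 106 |
| Ngoma | 145 |
| Bugesera | 57 |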
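No code is shown for the district table; presumably it was produced the same way as the province table. A minimal sketch on toy data follows, in which the data frame `r_data_toy` and its column name `district` are hypothetical. Since the district table reports counts rather than percentages, `prop.table()` is only needed if percentages are wanted:

```r
# Hypothetical stand-in for r_data; the column name `district` is an assumption
r_data_toy <- data.frame(
  district = c("Gasabo", "Gasabo", "Gasabo", "Huye", "Kicukiro", "Kicukiro"),
  stringsAsFactors = FALSE
)

# Counts per district (knitr::kable() shows these as Var1 / Freq columns)
district_counts <- table(r_data_toy$district)
# Optional percentages, computed like the province table
district_pct <- prop.table(district_counts) * 100

# Districts with the most and the fewest pupils
most_pupils   <- names(which.max(district_counts))
fewest_pupils <- names(which.min(district_counts))
print(district_counts)
print(c(most = most_pupils, fewest = fewest_pupils))
```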

Spread by region

The study divided the sample into four regions according to economic and development status: urban, rural, semi-urban and peri-urban. The table below shows that most students came from rural areas, at 75.5 percent of the sample. This suggests that the researcher sampled more heavily from rural settings, which might be suspected to have higher rates of grade repetition.


| Region | Percent of students |
|---|---|
| Peri urban | 15.6950673 |
| Urban | 8.0119581 |
| Rural | 75.4559043 |
| Semi urban | 0.8370703 |

Gender Distribution

The study aimed to sample roughly equal numbers of students of each gender. The table below confirms this: the ratio of females to males was almost one to one.


| Sex | Percent of students |
|---|---|
| Female | 50.9417 |
| Male | 49.0583 |

Analysis

Repetition within grades in Primary Education

This analysis shows how grade repetition varies across grades in primary school. At the time of the study, Primary 1 had the most pupils, and 39.5 percent of its 686 pupils had repeated the grade they were in during 2012. The next highest repetition rates were in Primary 2 (23.6 percent of 470), Primary 5 (22.3 percent of 260), Primary 3 (16.8 percent of 392) and Primary 4 (16.7 percent of 305). It is also worth noting that there were no cases of repetition from 2012 in the post-primary grades (Post primary 1 to Post primary 5).

The R code that produces this table and the plots is:

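## how grade repetition varies by grade in Primary Education
# keep only pupils who were in a class in 2012
# (assumes a logical column `repeated` created earlier; not shown in this post)
comparis <- data %>% filter(!Grade_2012 %in% "Not in class")

# row percentages: within each 2012 grade, share with repeated FALSE / TRUE
comparison_repetition_in_classes_2012 <- table(comparis$Grade_2012, comparis$repeated)
tabwew <- prop.table(comparison_repetition_in_classes_2012, 1) * 100
tabwew %>% knitr::kable()

# bar charts of repetition by 2012 grade
ggplot(comparis, aes(x = repeated)) + geom_bar() + facet_wrap(~Grade_2012)
ggplot(comparis, aes(x = Grade_2012, fill = repeated)) + geom_bar(position = "stack") + coord_flip()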


| Grade in 2012 | repeated = FALSE (%) | repeated = TRUE (%) |
|---|---|---|
| Post primary 1 | 100.00000 | 0.000000 |
| Post primary 3 | 100.00000 | 0.000000 |
| Post primary 4 | 100.00000 | 0.000000 |
| Post primary 5 | 100.00000 | 0.000000 |
| Pre-primary | 95.88235 | 4.117647 |
| Primary 1 | 60.49563 | 39.504373 |
| Primary 2 | 76.38298 | 23.617021 |
| Primary 3 | 83.16327 | 16.836735 |
| Primary 4 | 83.27869 | 16.721311 |
| Primary 5 | 77.69231 | 22.307692 |
| Primary 6,7,8 | 93.19728 | 6.802721 |
| Secondary 1 | 97.67442 | 2.325581 |
| Secondary 2 | 90.56604 | 9.433962 |
| Secondary 3 | 94.11765 | 5.882353 |
| Secondary 4 | 100.00000 | 0.000000 |
| Secondary 5 | 100.00000 | 0.000000 |
| Secondary 6 | 100.00000 | 0.000000 |
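The repetition table relies on a logical column `repeated`, but the post does not show how it was created. One plausible definition, assumed here and not confirmed by the source, is that a pupil repeated when their 2013 grade equals their 2012 grade. A minimal sketch on hypothetical toy data (`comparis_toy` and `rep_tab` are illustrative names):

```r
# Hypothetical toy data; in the report this role is played by `comparis`
comparis_toy <- data.frame(
  Grade_2012 = c("Primary 1", "Primary 1", "Primary 1", "Primary 2", "Primary 2"),
  Grade_2013 = c("Primary 1", "Primary 2", "Primary 2", "Primary 2", "Primary 3"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a pupil repeated if their 2013 grade equals their 2012 grade
comparis_toy$repeated <- comparis_toy$Grade_2012 == comparis_toy$Grade_2013

# Row percentages: within each 2012 grade, share with repeated FALSE / TRUE
rep_tab <- prop.table(table(comparis_toy$Grade_2012, comparis_toy$repeated), 1) * 100
print(round(rep_tab, 1))
```

Under this rule, a pupil recorded as "Primary 6,7,8" in both years would count as repeating even if they moved from P6 to P7, so grouped categories need care.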

No significant difference in dropout between males and females

The study also checked which gender had a higher dropout rate. The difference between the two genders was not statistically significant: a Welch two-sample t-test, which does not assume equal variances in the two groups, gave a p-value of 0.9765, well above 0.05. The mean dropout rates of the two genders were therefore almost equal, and the data give no evidence that male and female pupils differ in their chances of dropping out of school.
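A minimal sketch of this comparison in R, on hypothetical toy data: the 0/1 `dropout` column and its values are assumptions, since the post does not show how dropout was coded. `t.test()` with `var.equal = FALSE` (its default) is the Welch test; for a binary outcome, `prop.test()` is a common alternative that compares the two proportions directly:

```r
# Hypothetical 0/1 dropout indicator by sex (toy values, not the survey data)
dropout_toy <- data.frame(
  Sex     = rep(c("Female", "Male"), each = 50),
  dropout = c(rep(1, 10), rep(0, 40),   # 10 of 50 females dropped out
              rep(1, 9),  rep(0, 41))   # 9 of 50 males dropped out
)

# Welch two-sample t-test: var.equal = FALSE means the two groups
# are NOT assumed to have equal variances
welch <- t.test(dropout ~ Sex, data = dropout_toy, var.equal = FALSE)
print(welch$p.value)

# Alternative for a 0/1 outcome: two-sample test of proportions
dropouts <- as.vector(tapply(dropout_toy$dropout, dropout_toy$Sex, sum))
pupils   <- as.vector(tapply(dropout_toy$dropout, dropout_toy$Sex, length))
prop_p   <- prop.test(dropouts, pupils)$p.value
print(prop_p)
```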

Regression analysis

To investigate the determinants of grade repetition, the following predictor variables were considered: weight, age, whether the father or mother was alive, health problems suffered in the last 4 weeks, the grade attended in 2012 and in 2013, who paid the pupil's expenses in the last 12 months, and the reason why pupils who missed school did not attend. The response variable was whether the pupil repeated a grade, coded as binary: 1 if the pupil repeated and 0 otherwise. A binary logistic regression model was therefore used, and predictors with p-values below 0.05 were reported as significant.
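The columns of the regression tables below (term, estimate, std.error, p.value) match the output of broom's `tidy()`, which together with the `purrr` and `tidyr` packages loaded earlier suggests one model fitted per region. The sketch below shows one way to do that on simulated toy data; the predictors, the data and the Holm correction used for `p.adjusted` are assumptions, not the report's actual model:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical toy data: binary outcome `repeated` (1 = repeated a grade)
set.seed(42)
n   <- 400
toy <- data.frame(
  region_class = rep(c("Rural", "Urban"), each = n / 2),
  Age          = sample(6:14, n, replace = TRUE),
  Sex          = sample(c("Female", "Male"), n, replace = TRUE)
)
toy$repeated <- rbinom(n, size = 1, prob = plogis(-1 + 0.2 * (toy$Age - 10)))

# Fit one binary logistic regression per region and tidy the coefficients
region_models <- toy %>%
  group_by(region_class) %>%
  nest() %>%
  mutate(fit   = map(data, ~ glm(repeated ~ Age + Sex, family = binomial, data = .x)),
         coefs = map(fit, tidy)) %>%
  select(region_class, coefs) %>%
  unnest(coefs) %>%
  ungroup() %>%
  filter(term != "(Intercept)") %>%
  # ASSUMPTION: Holm correction; the report does not say which method it used
  mutate(p.adjusted = p.adjust(p.value, method = "holm")) %>%
  select(region_class, term, estimate, std.error, p.value, p.adjusted)

print(region_models)
```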


| region_class | term | estimate | std.error | p.value | p.adjusted |
|---|---|---|---|---|---|
| Rural | Grade_2012Primary 5 | 2.6246517 | 0.1333536 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 6,7,8 | 2.3926222 | 0.1510041 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 4 | 2.2149811 | 0.1142086 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 3 | 1.8104449 | 0.0960966 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 2 | 1.3240467 | 0.0711579 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Primary 1 | 0.7385963 | 0.0481840 | 0.0000000 | 0.0000000 |
| Rural | prob_in_schMediocre teaching | -0.3153225 | 0.0858360 | 0.0002598 | 0.0220845 |
| Rural | Grade_2013Pre-primary | -0.5500117 | 0.1059252 | 0.0000003 | 0.0000259 |
| Rural | Grade_2013Primary 1 | -0.7292078 | 0.0599060 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Secondary 4 | -0.8776014 | 0.1657405 | 0.0000002 | 0.0000154 |
| Rural | Grade_2013Primary 2 | -1.4516728 | 0.0739026 | 0.0000000 | 0.0000000 |
| Rural | Grade_2012Secondary 1 | -1.9269528 | 0.4238364 | 0.0000066 | 0.0005706 |
| Rural | Grade_2013Primary 3 | -2.0677477 | 0.0893855 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Primary 4 | -2.5562536 | 0.1099310 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Primary 5 | -2.9426288 | 0.1245526 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Secondary 1 | -3.1527569 | 0.1589982 | 0.0000000 | 0.0000000 |
| Rural | Grade_2013Primary 6,7,8 | -3.4075162 | 0.1413879 | 0.0000000 | 0.0000000 |

The table above shows the results for rural areas in Rwanda, where only problems experienced in school and the grade in 2012 and in 2013 were significant. The estimates are on the log-odds scale, so exponentiating them gives odds ratios. Pupils in Primary 5, Primary 6,7,8, Primary 4, Primary 3, Primary 2 and Primary 1 in 2012 had odds of repeating that were, respectively, 13.80, 10.94, 9.16, 6.11, 3.76 and 2.09 times those of pupils in Post primary 1. On the other hand, pupils in Pre-primary in 2013 had about 42 percent lower odds of repeating than those in Post primary 1 in 2013. Pupils in Primary 6,7,8 in 2013 were the least likely to repeat, with odds about 96.7 percent lower than those in Post primary 1 in 2013.
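Because the model is a binary logistic regression, each estimate is a log-odds coefficient: `exp(estimate)` gives the odds ratio, and for a negative estimate `(1 - exp(estimate)) * 100` gives the percentage reduction in the odds. A short check using values copied from the rural table (the vector names are illustrative):

```r
# Grade_2012 log-odds estimates copied from the rural table
rural_grade_2012 <- c(
  "Primary 5"     = 2.6246517,
  "Primary 6,7,8" = 2.3926222,
  "Primary 4"     = 2.2149811,
  "Primary 3"     = 1.8104449,
  "Primary 2"     = 1.3240467,
  "Primary 1"     = 0.7385963
)

# Odds ratios relative to the reference grade
# (approximately 13.80, 10.94, 9.16, 6.11, 3.76, 2.09)
odds_ratios <- exp(rural_grade_2012)
print(round(odds_ratios, 2))

# Negative Grade_2013 estimates: percentage reduction in the odds
pct_lower <- (1 - exp(c("Pre-primary (2013)"   = -0.5500117,
                        "Primary 6,7,8 (2013)" = -3.4075162))) * 100
print(round(pct_lower, 1))
```

Odds ratios are not the same as "times more likely" in the probability sense; the two only coincide approximately when the outcome is rare.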


| region_class | term | estimate | std.error | p.value | p.adjusted |
|---|---|---|---|---|---|
| Urban | Grade_2012Primary 5 | 2.531815 | 0.3913698 | 0.0000000 | 0.0000019 |
| Urban | Grade_2012Primary 4 | 1.737260 | 0.3011179 | 0.0000003 | 0.0000270 |
| Urban | Grade_2012Primary 3 | 1.507446 | 0.2891581 | 0.0000024 | 0.0002122 |
| Urban | Grade_2012Primary 2 | 1.188648 | 0.2228084 | 0.0000015 | 0.0001380 |
| Urban | Grade_2012Primary 1 | 0.803433 | 0.1679609 | 0.0000116 | 0.0009979 |
| Urban | Grade_2013Secondary 1 | -1.497037 | 0.4055512 | 0.0004833 | 0.0401101 |
| Urban | Grade_2013Secondary 5 | -1.829716 | 0.4934564 | 0.0004582 | 0.0384918 |
| Urban | Grade_2013Primary 2 | -1.922012 | 0.3655888 | 0.0000020 | 0.0001822 |
| Urban | Grade_2013Primary 3 | -2.367172 | 0.3945070 | 0.0000001 | 0.0000115 |
| Urban | Grade_2013Primary 4 | -2.724553 | 0.4261606 | 0.0000000 | 0.0000025 |
| Urban | Grade_2013Primary 5 | -2.922021 | 0.4293049 | 0.0000000 | 0.0000005 |
| Urban | Grade_2013Primary 6,7,8 | -3.963184 | 0.5191857 | 0.0000000 | 0.0000000 |

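The table above shows a snapshot of the analysis for urban areas in Rwanda. Compared with the rural model, most of the Grade_2012 coefficients are smaller in the urban model (for example 2.53 versus 2.62 for Primary 5 and 1.74 versus 2.21 for Primary 4, although Primary 1 is slightly higher at 0.80 versus 0.74), which suggests that urban pupils were less likely to repeat. Because the rural and urban models were fitted separately, this is an indicative comparison rather than a formal test.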

Strengths and weaknesses of the data

The strengths of the data are:

The variables were specific, since the study focused mainly on rural areas.
The data was reliable in the sense that the variables did not have many outliers.
The data was observational, and the sample was designed to control for gender as a confounding variable.

The weaknesses of the data are:

It had a lot of missing values.
The variables were highly correlated, as shown by some of the tests that were conducted.
The data as collected was not clean and needed some transformation.
Some of the variables were not suitable for answering the main question of why pupils repeat or pass.
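The post does not say which tests revealed the correlation between variables. For categorical predictors such as Grade_2012 and Grade_2013, one common measure is Cramér's V, computed from `chisq.test()`. A minimal sketch on toy data; the helper `cramers_v()` is hypothetical and not part of any package used in the report:

```r
# Cramér's V: strength of association between two categorical variables,
# from 0 (no association) to 1 (perfect association)
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  k    <- min(nrow(tab), ncol(tab))
  unname(sqrt(chi2 / (sum(tab) * (k - 1))))
}

# Toy grades (hypothetical): each 2012 grade maps to exactly one 2013 grade
grade_2012 <- c("P1", "P1", "P2", "P2", "P3", "P3")
grade_2013 <- c("P2", "P2", "P3", "P3", "P4", "P4")
v_strong <- cramers_v(grade_2012, grade_2013)

# Toy variables with no association at all
v_none <- cramers_v(c("a", "a", "b", "b"), c("u", "v", "u", "v"))

print(c(strong = v_strong, none = v_none))
```

Values close to 1 between two predictors indicate that they carry largely the same information, which can make individual regression coefficients unstable.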