The Task
This document presents the report from an analysis of primary education conducted for the Ministry of Education (MINEDUC) in Rwanda. The report mainly focuses on rural and urban areas.
source of the data: here
The code below loads the packages needed for the analysis.
#load package
library(tidyverse)
library(tidyr)
library(broom)
library(purrr)
Then we read in the data directly from the link with tidyverse's read_csv, which avoids keeping a local copy and is fast.
data <- read_csv("https://raw.githubusercontent.com/vmandela99/laterite-interview/master/laterite_education_data.csv" )
Introduction
The first task is to clean the names, check for missing values, understand the column names, check the data types, and make sure that the data is in tidy format.
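As a rough illustration of the missing-value and data-type checks, here is a minimal base R sketch. The data frame `toy` and its values are made up for the example; only the raw column names `s1q1` and `s1q3y` come from the survey data.

```r
# Toy data frame standing in for the survey data (values are invented).
toy <- data.frame(
  s1q1  = c("Male", "Female", NA, "Female"),
  s1q3y = c(7, 9, 8, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
missing_per_column <- colSums(is.na(toy))

# Storage class of each column (first class only, for a compact overview).
column_types <- vapply(toy, function(col) class(col)[1], character(1))

print(missing_per_column)
print(column_types)
```

The same two calls can be pointed at the real data frame once it has been read in.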
The following code renames the columns so that they have meaningful names.
## rename the variables
names(data)
r_data <- data %>% rename(Sex = s1q1,
Age=s1q3y,
region_class = ur2012,
Father_alive=s1q13,
Mother_alive = s1q14,
health_prob=s3q4,
Grade_2012=s4aq6a ,
Grade_2013=s4aq6b ,
sch_attended_prev_yr=s4aq8,
prob_in_sch=s4aq9,
edu_expenses=s4aq11h, paid_edu_expenses_year_end=s4aq12 ,
sch_days_missed=s4aq14,
why_not_attending_sch=s4aq15,
why_leave_sch=s4aq17,
can_read=s4bq3,
can_write=s4bq4,
can_calculate=s4bq5,
farm_work=s6aq2)
Next we define the missing values as NA and then remove the rows where either of the two grade variables is missing. It is always advisable to first ask the departments that collected the data why values are missing. Beware that missing data cannot always simply be deleted; there are several imputation techniques that we will discuss in upcoming blog posts. For now we illustrate how to delete them.
# start from the renamed data so that Grade_2012 and Grade_2013 exist
data <- r_data %>%
# turn empty strings into NA, one character column at a time (na_if() works on vectors)
mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
filter(!is.na(Grade_2012), !is.na(Grade_2013))
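The repetition analysis later in this report uses a `repeated` column that is not created in the code shown. One plausible construction is sketched below. It rests on the assumption that a pupil repeated when the grade recorded for 2013 is the same as the grade recorded for 2012; the data frame `toy` and its values are invented for the example.

```r
# Toy grades (invented values); column names follow the renamed data.
toy <- data.frame(
  Grade_2012 = c("Primary 1", "Primary 2", "Primary 3", "Not in class"),
  Grade_2013 = c("Primary 1", "Primary 3", "Primary 3", "Primary 1"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a pupil repeated when the 2013 grade equals the 2012 grade.
toy$repeated <- toy$Grade_2012 == toy$Grade_2013

print(toy)
```

If the original analysis defined repetition differently, only the comparison line would need to change.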
This report investigates the causes of repetition in primary education. A rural area here means a school location with a low standard of living and a low population to infrastructure ratio, while an urban area is the opposite.
Descriptive analysis
Provinces
The study covered five provinces of Rwanda: Kigali City, Southern Province, Western Province, Northern Province and Eastern Province. The table below summarises the percentage distribution of students by province. Kigali City had the largest share at 22.15 percent. The remaining provinces had broadly similar shares, with Southern Province having the lowest at 17.1 percent.
This is the R code used to produce the table:
# percentage of students in each province
tabb <- table(r_data$province)
tabb1 <- prop.table(tabb) * 100
tabb1 %>% knitr::kable()
Province | Percent
Kigali City | 22.15247
Southern Province | 17.10015
Western Province | 21.13602
Northern Province | 18.83408
Eastern Province | 20.77728
Districts
The report also covered the following districts in Rwanda: Nyarugenge, Gasabo, Kicukiro, Nyanza, Gisagara, Nyaruguru, Huye, Nyamagabe, Ruhango, Muhanga, Kamonyi, Karongi, Rutsiro, Rubavu, Nyabihu, Ngororero, Rusizi, Nyamasheke, Rulindo, Gakenke, Musanze, Burera, Gicumbi, Rwamagana, Nyagatare, Gatsibo, Kayonza, Kirehe, Ngoma and Bugesera. The table below summarises the number of students from each district. Gasabo had the highest number of students, 357, and Huye had the lowest, 31.
District | Number of students
Nyarugenge | 133
Gasabo | 357
Kicukiro | 251
Nyanza | 43
Gisagara | 122
Nyaruguru | 125
Huye | 31
Nyamagabe | 66
Ruhango | 82
Muhanga | 61
Kamonyi | 42
Karongi | 86
Rutsiro | 62
Rubavu | 72
Nyabihu | 110
Ngororero | 109
Rusizi | 92
Nyamasheke | 176
Rulindo | 75
Gakenke | 162
Musanze | 104
Burera | 179
Gicumbi | 110
Rwamagana | 58
Nyagatare | 127
Gatsibo | 88
Kayonza | 114
Kirehe | 106
Ngoma | 145
Bugesera | 57
Spread by region
The study classified locations into four region types according to economic and development status: urban, rural, semi-urban and peri-urban. The table below shows that the rural region accounted for the largest share, 75.5 percent of the students in this study. This suggests that the researcher drew a larger sample from rural settings, which might be suspected to have a high rate of grade repetition.
Region | Percent
Peri urban | 15.6950673
Urban | 8.0119581
Rural | 75.4559043
Semi urban | 0.8370703
Gender Distribution
The study tried to sample an equal number of students of each gender. This is shown in the table below, where the ratio of females to males was almost one to one.
Gender | Percent
Female | 50.9417
Male | 49.0583
Analysis
Repetition within grades in Primary Education
This analysis shows how grade repetition varies across grades. At the time of the study, Primary 1 had the highest number of pupils and also the highest repetition rate: 39.5 percent of the 686 pupils in that class had repeated the same grade from 2012. The classes with the next highest repetition rates were Primary 2 (23.6 percent of 470), Primary 5 (22.3 percent of 260), Primary 3 (16.8 percent of 392) and Primary 4 (16.7 percent of 305). It is also worth noting that there were no cases of repetition from 2012 in the post primary grades.
The R code for producing this is:
## how grade repetition varies by grade in Primary Education
# drop pupils who were not in class in 2012
comparis <- data %>% filter(!(Grade_2012 %in% c("Not in class")))
# row percentages: share of repeaters within each 2012 grade
comparison_repetition_in_classes_2012 <- table(comparis$Grade_2012, comparis$repeated)
tabwew <- prop.table(comparison_repetition_in_classes_2012, 1) * 100
tabwew %>% knitr::kable()
# bar charts of repetition by grade
ggplot(comparis, aes(x = repeated)) + geom_bar(position = "dodge") + facet_wrap(~Grade_2012)
ggplot(comparis, aes(x = Grade_2012, fill = repeated)) + geom_bar(position = "stack") + coord_flip()
Grade in 2012 | repeated = FALSE (%) | repeated = TRUE (%)
Post primary 1 | 100.00000 | 0.000000
Post primary 3 | 100.00000 | 0.000000
Post primary 4 | 100.00000 | 0.000000
Post primary 5 | 100.00000 | 0.000000
Pre-primary | 95.88235 | 4.117647
Primary 1 | 60.49563 | 39.504373
Primary 2 | 76.38298 | 23.617021
Primary 3 | 83.16327 | 16.836735
Primary 4 | 83.27869 | 16.721311
Primary 5 | 77.69231 | 22.307692
Primary 6,7,8 | 93.19728 | 6.802721
Secondary 1 | 97.67442 | 2.325581
Secondary 2 | 90.56604 | 9.433962
Secondary 3 | 94.11765 | 5.882353
Secondary 4 | 100.00000 | 0.000000
Secondary 5 | 100.00000 | 0.000000
Secondary 6 | 100.00000 | 0.000000
Males equally likely to drop out as females.
The research also checked which gender had a higher drop-out rate. The difference between the two genders was not statistically significant: a Welch two-sample t-test, which does not assume equal variances, gave a p-value of 0.9765, which is greater than 0.05. There is therefore no evidence that male and female pupils differ in their chances of dropping out of school.
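For readers who want to see what such a comparison looks like in R, here is a minimal sketch of a Welch two-sample t-test. The column name `dropped_out` and the counts are hypothetical; the source does not show how drop-out was coded, so this is an illustration of the test rather than a reproduction of the reported result.

```r
# Toy data: 100 pupils per gender with an invented 0/1 drop-out indicator.
toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 100),
  dropped_out = c(rep(c(1, 0), times = c(10, 90)),   # 10 of 100 females
                  rep(c(1, 0), times = c(11, 89)))   # 11 of 100 males
)

# var.equal = FALSE (the default) requests the Welch version of the test,
# which does not assume the two groups share a variance.
welch <- t.test(dropped_out ~ Sex, data = toy, var.equal = FALSE)

print(welch$method)
print(welch$p.value)
```

A p-value above 0.05 here would be read the same way as in the report: no evidence of a difference between the two group means.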
Regression analysis
With the aim of investigating the determinants of grade repetition, the researcher considered the following predictor variables: weight, age, whether the father or mother was alive, health problems suffered in the last 4 weeks, grade attended in 2012 and in 2013, who paid the student's expenses over the last 12 months, and the reason why pupils who missed school did not attend. The response variable was repeating a grade, coded as binary: 1 if the pupil repeated and 0 otherwise. A binary logistic regression model was used. Predictors with p-values less than 0.05 were reported as significant.
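The sketch below shows one way a per-region logistic regression with adjusted p-values could be fitted in base R. It is an illustration under stated assumptions, not the original code: the data are simulated, only two predictors are used, and the Holm adjustment applied within each region is an assumption, since the source does not say which adjustment method produced the p.adjusted column.

```r
set.seed(42)
n <- 400

# Simulated data (invented): region, age, sex and a 0/1 repetition outcome.
toy <- data.frame(
  region_class = rep(c("Rural", "Urban"), each = n / 2),
  Age = sample(6:16, n, replace = TRUE),
  Sex = sample(c("Female", "Male"), n, replace = TRUE)
)
toy$repeated <- rbinom(n, size = 1, prob = plogis(-2 + 0.1 * toy$Age))

# Fit a binary logistic regression for one region and return a tidy table.
fit_region <- function(df) {
  fit <- glm(repeated ~ Age + Sex, data = df, family = binomial)
  coefs <- summary(fit)$coefficients
  data.frame(
    region_class = df$region_class[1],
    term = rownames(coefs),
    estimate = unname(coefs[, "Estimate"]),
    std.error = unname(coefs[, "Std. Error"]),
    p.value = unname(coefs[, "Pr(>|z|)"]),
    row.names = NULL
  )
}

# One model per region, stacked into a single table.
results <- do.call(rbind, lapply(split(toy, toy$region_class), fit_region))
rownames(results) <- NULL

# ASSUMPTION: Holm adjustment within each region.
results$p.adjusted <- ave(results$p.value, results$region_class,
                          FUN = function(p) p.adjust(p, method = "holm"))

# Keep only the terms that remain significant after adjustment.
significant <- results[results$p.adjusted < 0.05, ]
print(results)
```

Fitting the regions separately keeps the coefficients of one region from being influenced by the other, at the cost of smaller samples per model.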
region_class | term | estimate | std.error | p.value | p.adjusted
Rural | Grade_2012Primary 5 | 2.6246517 | 0.1333536 | 0.0000000 | 0.0000000
Rural | Grade_2012Primary 6,7,8 | 2.3926222 | 0.1510041 | 0.0000000 | 0.0000000
Rural | Grade_2012Primary 4 | 2.2149811 | 0.1142086 | 0.0000000 | 0.0000000
Rural | Grade_2012Primary 3 | 1.8104449 | 0.0960966 | 0.0000000 | 0.0000000
Rural | Grade_2012Primary 2 | 1.3240467 | 0.0711579 | 0.0000000 | 0.0000000
Rural | Grade_2012Primary 1 | 0.7385963 | 0.0481840 | 0.0000000 | 0.0000000
Rural | prob_in_schMediocre teaching | -0.3153225 | 0.0858360 | 0.0002598 | 0.0220845
Rural | Grade_2013Pre-primary | -0.5500117 | 0.1059252 | 0.0000003 | 0.0000259
Rural | Grade_2013Primary 1 | -0.7292078 | 0.0599060 | 0.0000000 | 0.0000000
Rural | Grade_2013Secondary 4 | -0.8776014 | 0.1657405 | 0.0000002 | 0.0000154
Rural | Grade_2013Primary 2 | -1.4516728 | 0.0739026 | 0.0000000 | 0.0000000
Rural | Grade_2012Secondary 1 | -1.9269528 | 0.4238364 | 0.0000066 | 0.0005706
Rural | Grade_2013Primary 3 | -2.0677477 | 0.0893855 | 0.0000000 | 0.0000000
Rural | Grade_2013Primary 4 | -2.5562536 | 0.1099310 | 0.0000000 | 0.0000000
Rural | Grade_2013Primary 5 | -2.9426288 | 0.1245526 | 0.0000000 | 0.0000000
Rural | Grade_2013Secondary 1 | -3.1527569 | 0.1589982 | 0.0000000 | 0.0000000
Rural | Grade_2013Primary 6,7,8 | -3.4075162 | 0.1413879 | 0.0000000 | 0.0000000
The table above shows the findings for rural areas in Rwanda, where only problems experienced in school and the grades attended in 2012 and 2013 were significant. Because these are logistic regression coefficients, exponentiating an estimate gives an odds ratio. For 2012, the odds of repeating the same grade for pupils in Primary 5, Primary 6 to 8, Primary 4, Primary 3, Primary 2 and Primary 1 were about 13.80, 10.94, 9.16, 6.11, 3.76 and 2.09 times, in that order, the odds for students in post primary 1. On the other hand, pupils in pre-primary in 2013 had about 42 percent lower odds of repeating than those in post primary 1 in 2013 (estimate -0.55), and pupils who reported mediocre teaching had about 27 percent lower odds than the reference category of that variable (estimate -0.32). Further, those in Primary 6, 7, 8 in 2013 were the least likely to repeat, with 96.7 percent lower odds than post primary 1 in 2013.
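To make the link between the estimates and the "times more likely" and "percent less likely" statements explicit, the following sketch converts a few coefficients into odds ratios and percentage changes in odds. The four estimates are copied from the rural table; the variable names `estimates`, `odds_ratio` and `percent_change` are introduced only for this example.

```r
# Log-odds estimates copied from the rural regression table.
estimates <- c(
  "Grade_2012Primary 5"     =  2.6246517,
  "Grade_2012Primary 1"     =  0.7385963,
  "Grade_2013Pre-primary"   = -0.5500117,
  "Grade_2013Primary 6,7,8" = -3.4075162
)

# Exponentiating a logistic regression coefficient gives an odds ratio
# relative to the reference category.
odds_ratio <- exp(estimates)

# Percentage change in the odds; negative values mean lower odds.
percent_change <- (odds_ratio - 1) * 100

print(round(data.frame(estimate = estimates, odds_ratio, percent_change), 2))
```

Note that these are ratios of odds, not of probabilities, so "times more likely" is shorthand for "times the odds".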
region_class | term | estimate | std.error | p.value | p.adjusted
Urban | Grade_2012Primary 5 | 2.531815 | 0.3913698 | 0.0000000 | 0.0000019
Urban | Grade_2012Primary 4 | 1.737260 | 0.3011179 | 0.0000003 | 0.0000270
Urban | Grade_2012Primary 3 | 1.507446 | 0.2891581 | 0.0000024 | 0.0002122
Urban | Grade_2012Primary 2 | 1.188648 | 0.2228084 | 0.0000015 | 0.0001380
Urban | Grade_2012Primary 1 | 0.803433 | 0.1679609 | 0.0000116 | 0.0009979
Urban | Grade_2013Secondary 1 | -1.497037 | 0.4055512 | 0.0004833 | 0.0401101
Urban | Grade_2013Secondary 5 | -1.829716 | 0.4934564 | 0.0004582 | 0.0384918
Urban | Grade_2013Primary 2 | -1.922012 | 0.3655888 | 0.0000020 | 0.0001822
Urban | Grade_2013Primary 3 | -2.367172 | 0.3945070 | 0.0000001 | 0.0000115
Urban | Grade_2013Primary 4 | -2.724553 | 0.4261606 | 0.0000000 | 0.0000025
Urban | Grade_2013Primary 5 | -2.922021 | 0.4293049 | 0.0000000 | 0.0000005
Urban | Grade_2013Primary 6,7,8 | -3.963184 | 0.5191857 | 0.0000000 | 0.0000000
The table above shows a snapshot of the analysis for urban areas in Rwanda. Compared with pupils in rural areas, urban pupils were less likely to repeat: the Grade_2012 estimates for Primary 2 to Primary 5 are smaller in the urban table than in the rural one.
Strengths and weaknesses of the data
The advantages of this data are that:
- The variables were specific, since the study mainly focused on rural areas.
- The data was reliable because the variables did not have many outliers.
- The data was observational and designed to control for gender as a confounding variable.
The disadvantages of this data were that:
- It had a lot of missing values.
- The variables were highly correlated, as shown by some of the tests that were conducted.
- The data was not clean when collected and needed some transformation.
- Some of the variables were not suitable for answering the main objective of whether students repeat or pass.