The hypothesis is that grades decline the further into a term they are recorded. If that holds, the data could provide evidence for, and an estimate of, how grade trends might change with term length.

The dataset includes all of our warehoused daily grades.

SELECT gh.date_stored
, gh.state_id 
, gh.school_number 
, gh.course 
, gh."section" 
, gh.grade 
, gh.time_stored 
, gh.term 
, t.end_year
, t.calendar_name
, t.name AS term
, t.start_date
, t.end_date
, gh.date_stored - t.start_date AS daysIn
FROM gradesync.grade_history gh
INNER JOIN gradesync.term  t
ON gh.school_number = t.school_number
AND gh.term = t.name
AND gh.date_stored BETWEEN t.start_date AND t.end_date
WHERE gh.school_number = 1202056
ORDER BY time_stored DESC;

Load the data

gradeHistory <- read.csv("~/Downloads/gradeHistory.csv")

Summarize the data and check whether anything needs to be cleaned up.

summary(gradeHistory)
##  date_stored           state_id        school_number        course         
##  Length:1281910     Min.   :10010165   Min.   :1202056   Length:1281910    
##  Class :character   1st Qu.:30718317   1st Qu.:1202056   Class :character  
##  Mode  :character   Median :53867416   Median :1202056   Mode  :character  
##                     Mean   :54716676   Mean   :1202056                     
##                     3rd Qu.:78084032   3rd Qu.:1202056                     
##                     Max.   :99726475   Max.   :1202056                     
##     section            grade        time_stored            term          
##  Min.   :  1.000   Min.   :  0.00   Length:1281910     Length:1281910    
##  1st Qu.:  1.000   1st Qu.: 79.73   Class :character   Class :character  
##  Median :  2.000   Median : 90.85   Mode  :character   Mode  :character  
##  Mean   :  7.014   Mean   : 85.41                                        
##  3rd Qu.:  4.000   3rd Qu.: 96.96                                        
##  Max.   :500.000   Max.   :460.00                                        
##     end_year    calendar_name         term.1           start_date       
##  Min.   :2020   Length:1281910     Length:1281910     Length:1281910    
##  1st Qu.:2020   Class :character   Class :character   Class :character  
##  Median :2020   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2020                                                           
##  3rd Qu.:2021                                                           
##  Max.   :2021                                                           
##    end_date             daysin      
##  Length:1281910     Min.   :  4.00  
##  Class :character   1st Qu.: 43.00  
##  Mode  :character   Median : 77.00  
##                     Mean   : 77.11  
##                     3rd Qu.:109.00  
##                     Max.   :158.00
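
The summary shows that the date columns were loaded as character strings. As a sanity check, the daysin value can be recomputed in R from date_stored and start_date. The following is a minimal sketch, assuming the CSV stores dates as ISO `YYYY-MM-DD` strings (the export format is not shown in this report). The `toy` data frame and its values are made up for illustration.

```r
# Toy rows standing in for gradeHistory; the values are made up for illustration.
toy <- data.frame(
  date_stored = c("2020-09-14", "2020-10-01", "2020-12-18"),
  start_date  = c("2020-09-08", "2020-09-08", "2020-09-08"),
  stringsAsFactors = FALSE
)

# Assumes ISO "YYYY-MM-DD" strings; adjust `format` if the export differs.
toy$date_stored <- as.Date(toy$date_stored, format = "%Y-%m-%d")
toy$start_date  <- as.Date(toy$start_date,  format = "%Y-%m-%d")

# Same definition as the query: days elapsed since the term started.
toy$daysin <- as.integer(toy$date_stored - toy$start_date)
print(toy)
```

If the recomputed values disagree with the daysin column from the query, the date format assumption is the first thing to check.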

Clean up records that have irregular scores (scores > 100). We set those to 100 so the scale is uniform. We don’t want those grades to become outliers, and we can assume that each of them represents a grade of 100%.

gradeHistory <- within(gradeHistory, grade[grade > 100] <- 100)
summary(gradeHistory$grade)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   79.73   90.85   85.36   96.96  100.00
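
An equivalent way to apply the same cap is `pmin()`, which some readers may find easier to scan than the `within()` call. This is only a sketch on a made-up vector, not a replacement for the step above. It also shows that missing grades stay missing instead of being set to 100.

```r
# Made-up grades, including values above 100 and a missing value.
grade <- c(79.5, 100, 104.2, 460, NA)

# pmin() compares each element against 100 and keeps the smaller value;
# NA entries stay NA.
capped <- pmin(grade, 100)
print(capped)
```

On the real data this would read `gradeHistory$grade <- pmin(gradeHistory$grade, 100)`.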

Remove records whose daysin exceeds 120, the typical maximum.

library(dplyr)
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##     filter, lag
## The following objects are masked from 'package:base':
##     intersect, setdiff, setequal, union
gradeHistory <- filter(gradeHistory, daysin <= 120)
summary(gradeHistory$daysin)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00   38.00   67.00   66.48   95.00  120.00

Group the data by the number of days into a term.

mean_data <- group_by(gradeHistory, daysin)

Calculate the mean and standard deviation of grade by the days into a term.

summarizeddf <- summarize(mean_data
                          , mean = mean(grade, na.rm=TRUE)
                          , sd = sd(grade, na.rm=TRUE))
print(summarizeddf, n=120)
## # A tibble: 117 x 3
##     daysin  mean    sd
##   *  <int> <dbl> <dbl>
##   1      4  92.7  20.3
##   2      5  92.7  20.3
##   3      6  92.3  20.9
##   4      7  89.6  24.2
##   5      8  89.4  23.3
##   6      9  88.7  23.2
##   7     10  89.3  21.7
##   8     11  90.7  18.2
##   9     12  90.6  18.2
##  10     13  90.7  18.1
##  11     14  90.2  18.6
##  12     15  87.1  22.7
##  13     16  87.8  20.7
##  14     17  88.2  20.0
##  15     18  87.7  20.0
##  16     19  87.7  19.9
##  17     20  87.8  19.7
##  18     21  87.0  20.7
##  19     22  87.0  21.3
##  20     23  87.0  20.6
##  21     24  87.3  20.2
##  22     25  87.2  20.0
##  23     26  86.2  21.8
##  24     27  87.2  20.0
##  25     28  87.1  19.9
##  26     29  87.1  18.7
##  27     30  87.2  18.4
##  28     31  86.6  19.3
##  29     32  86.6  18.7
##  30     33  86.6  18.6
##  31     34  86.6  18.7
##  32     35  86.5  18.6
##  33     36  86.1  18.6
##  34     37  86.1  18.6
##  35     38  85.8  18.5
##  36     39  86.1  18.2
##  37     40  86.1  18.2
##  38     41  86.1  18.2
##  39     42  85.7  18.5
##  40     43  85.6  18.4
##  41     44  85.5  18.2
##  42     45  85.6  18.0
##  43     46  85.5  18.0
##  44     47  85.4  17.9
##  45     48  85.4  17.9
##  46     49  85.5  17.8
##  47     50  85.2  17.5
##  48     51  85.3  17.5
##  49     52  85.4  17.3
##  50     53  85.4  17.2
##  51     54  85.3  17.2
##  52     55  85.3  17.2
##  53     56  85.2  17.2
##  54     57  85.2  17.2
##  55     58  84.9  17.3
##  56     59  84.8  17.2
##  57     60  84.8  17.1
##  58     61  84.8  17.1
##  59     62  84.8  17.1
##  60     63  84.3  17.3
##  61     64  86.2  14.4
##  62     65  84.3  17.1
##  63     66  84.4  17.0
##  64     67  84.5  16.8
##  65     68  84.5  16.8
##  66     69  84.5  16.9
##  67     70  84.2  17.1
##  68     71  84.1  17.0
##  69     72  84.0  17.0
##  70     73  84.0  17.0
##  71     74  84.0  17.0
##  72     75  84.0  16.9
##  73     76  84.0  16.9
##  74     77  84.2  17.0
##  75     78  84.2  16.9
##  76     79  84.0  17.0
##  77     80  84.1  16.9
##  78     81  84.0  16.9
##  79     82  84.1  16.8
##  80     83  84.1  16.9
##  81     84  84.2  16.8
##  82     85  85.4  14.3
##  83     86  84.3  16.8
##  84     87  84.3  16.7
##  85     88  84.4  16.5
##  86     89  84.4  16.6
##  87     90  84.4  16.5
##  88     91  84.2  16.6
##  89     92  84.2  16.5
##  90     93  84.2  16.5
##  91     94  84.2  16.5
##  92     95  84.2  16.5
##  93     96  84.2  16.5
##  94     97  84.1  16.6
##  95     98  84.0  16.7
##  96     99  84.0  16.6
##  97    100  84.1  16.6
##  98    101  84.0  16.6
##  99    102  84.1  16.6
## 100    103  84.1  16.6
## 101    104  84.1  16.6
## 102    105  84.1  16.6
## 103    106  84.1  16.6
## 104    107  84.1  16.5
## 105    108  84.0  16.6
## 106    109  84.0  16.5
## 107    110  84.0  16.6
## 108    111  83.9  16.6
## 109    112  83.8  16.6
## 110    113  83.0  17.9
## 111    114  83.9  16.5
## 112    115  83.0  17.8
## 113    116  83.0  17.8
## 114    117  83.8  16.5
## 115    118  83.8  16.5
## 116    119  83.7  16.5
## 117    120  82.7  17.5
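
The rows for daysin 64 and 85 stand out: the mean is higher and the standard deviation lower than on the neighboring days. One possible explanation, not verified here, is that far fewer records were stored on those days. A record count per day would show whether that is the case. The sketch below uses base R's `aggregate()` instead of dplyr so it runs without extra packages. The `toy` data frame is made up, and `by_day` is a hypothetical name.

```r
# Made-up records: daysin and grade values are for illustration only.
toy <- data.frame(
  daysin = c(4, 4, 4, 5, 5, 6),
  grade  = c(90, 100, 80, 70, 90, 85)
)

# One row per daysin with mean, standard deviation, and record count.
stats <- aggregate(grade ~ daysin, data = toy,
                   FUN = function(g) c(mean = mean(g), sd = sd(g), n = length(g)))

# aggregate() returns the three statistics as one matrix column; flatten it
# into ordinary columns named mean, sd, and n.
by_day <- data.frame(daysin = stats$daysin, stats$grade)
print(by_day)
```

A day with a single record has an `sd` of `NA`, which is itself a useful flag for thin groups.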

Plot the data.

plot(summarizeddf$daysin
     , summarizeddf$mean
     , type="o"
     , xlab="Days from Term Start"
     , ylab="Average Course Grade")


The mean grade falls from 92.7 at day 4 to 82.7 at day 120. At this point, there is enough evidence to move forward with a prediction model for the term lengths being proposed. A working assumption would be that the shorter the term, the higher grades will be at the end of the term.
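
As a first pass at such a model, a straight-line fit of mean grade on days into the term gives a rough slope and a predicted end-of-term mean for each candidate length. The sketch below is only illustrative. It uses every tenth row of the summary table, assumes the trend is linear (the table flattens after about day 70, so this is a simplification), and fits daily means without weighting by the number of records. The candidate term lengths in `proposed` are hypothetical.

```r
# Every tenth row of the summary table above (daysin 10 through 120).
trend <- data.frame(
  daysin = seq(10, 120, by = 10),
  mean   = c(89.3, 87.8, 87.2, 86.1, 85.2, 84.8,
             84.2, 84.1, 84.4, 84.1, 84.0, 82.7)
)

# Straight-line fit of mean grade on days into the term.
# Linearity is an assumption, not something the data confirms.
fit <- lm(mean ~ daysin, data = trend)
print(coef(fit))

# Hypothetical candidate term lengths, in days; not taken from the source.
proposed <- data.frame(daysin = c(60, 90, 120))
proposed$predicted_mean <- predict(fit, newdata = proposed)
print(proposed)
```

With the full data, the same call would take `summarizeddf` in place of `trend`. A curve that levels off may fit better than a line, given how the means stabilize in the second half of the term.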