1 This definition covers the process of taking the pre-match files generated from ILR data and producing a set of fuzzy match files that can be combined to show how a learner's data progresses across years. These files improve the tracking of learners between years and so provide more accurate new measures success rates.
Purpose
2 Learner outcome data used by the LSC as part of Performance Review, and by ALI during inspections, focus on success rates extracted from the individualised student record (ISR) and the individualised learner record (ILR).
3 Through a thorough matching process, the WBL new measure success rates aim to provide a more reliable, accurate and consistent way of measuring the quality of WBL provision.
4 Learner numbers at each provider should be recorded consistently, and each learner at a provider should have only one learner number. However, for various reasons this is not always the case, so the matching process uses information such as National Insurance numbers and dates of birth to find cases where learner numbers have changed.
Relevant Collections
5 The method is run on data derived from the most recent freeze for each year, allowing success rates to be calculated up to the most recent month.
Source Data
6 The method uses various other files to help collate the information it needs to produce the fuzzy match.
Derived Variables and Output Datasets
7 The method produces the following datasets:
Detailed definition
8 This methodology follows on from preceding methodologies, which are carried out in the following order:
9 This methodology is followed by these methodologies:
10 The following intermediate matching approaches are taken in order to produce the output files:
Match 1
11 Each of the Pre-match Files is matched to the previous year using provider number (L01), National Insurance number (L26) and programme type (A15). Any matched cases where the learner number (L03) has changed are then kept.
12 Each output file contains the learner numbers from both years' files so that any change can be identified. For example, the 2001/02 to 2002/03 match contains the variables L03a and L03b, holding the learner numbers from 2001/02 and 2002/03 respectively.
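The Match 1 join can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the sample code referred to at the end of this definition: it assumes each pre-match file has been loaded as a list of dicts keyed by ILR field names, and the function name `match_changed_learner_numbers` and all sample values are hypothetical. Skipping records with a blank key field is also an assumption, made so that learners with no recorded NI number are not all matched to one another.

```python
# Hypothetical layout: each pre-match file is a list of dicts keyed by ILR field names.
MATCH1_KEYS = ("L01", "L26", "A15")  # provider number, NI number, programme type


def match_changed_learner_numbers(prev_year, curr_year, keys):
    """Join two consecutive years on `keys`; keep pairs whose learner number (L03) changed.

    Each output row holds the key fields plus L03a (earlier year) and L03b (later year).
    """
    # Index the earlier year's learner numbers by match key.
    earlier = {}
    for rec in prev_year:
        key = tuple(rec[k] for k in keys)
        if all(key):  # assumption: records with a blank key field are not matched
            earlier.setdefault(key, set()).add(rec["L03"])

    seen = set()
    matches = []
    for rec in curr_year:
        key = tuple(rec[k] for k in keys)
        for old_l03 in sorted(earlier.get(key, ())):
            pair = (key, old_l03, rec["L03"])
            # Keep only genuine changes of L03, and each pair only once.
            if old_l03 != rec["L03"] and pair not in seen:
                seen.add(pair)
                row = dict(zip(keys, key))
                row.update(L03a=old_l03, L03b=rec["L03"])
                matches.append(row)
    return matches


y0102 = [{"L01": "100001", "L03": "A1", "L26": "AB123456C", "A15": "2"}]
y0203 = [{"L01": "100001", "L03": "B7", "L26": "AB123456C", "A15": "2"}]
print(match_changed_learner_numbers(y0102, y0203, MATCH1_KEYS))
# [{'L01': '100001', 'L26': 'AB123456C', 'A15': '2', 'L03a': 'A1', 'L03b': 'B7'}]
```

Indexing the earlier year in a dictionary keeps the join linear in the number of records rather than comparing every pair.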
13 The Match 1 files are saved for later use in this methodology.
Match 2
14 As in Match 1, each Pre-match File is matched to the previous year's file, this time using L01, A15, matchv and date of birth (L11). Any matched cases where the learner number (L03) has changed are then kept.
15 Learner numbers are recorded in the same way as for Match 1.
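Match 2 applies the same kind of join with different keys, which lets it pick up learners whose NI number is missing or inconsistent between years. The sketch below assumes the same hypothetical list-of-dicts layout; `matchv` is treated as an opaque value taken from the pre-match files (its derivation is not described in this section), and the function name, date format and sample values are illustrative only.

```python
# Hypothetical layout: each pre-match file is a list of dicts keyed by ILR field names.
MATCH2_KEYS = ("L01", "A15", "matchv", "L11")  # provider, programme type, matchv, date of birth


def match2(prev_year, curr_year):
    """Pair records from consecutive years that agree on MATCH2_KEYS but differ in L03."""
    earlier = {}
    for rec in prev_year:
        key = tuple(rec[k] for k in MATCH2_KEYS)
        if all(key):  # assumption: blank key fields are excluded from matching
            earlier.setdefault(key, set()).add(rec["L03"])

    pairs = set()
    for rec in curr_year:
        key = tuple(rec[k] for k in MATCH2_KEYS)
        for old_l03 in earlier.get(key, ()):
            if old_l03 != rec["L03"]:
                pairs.add((rec["L01"], rec["A15"], old_l03, rec["L03"]))
    # Rows follow the Match 1 convention: L03a = earlier year, L03b = later year.
    return [
        {"L01": l01, "A15": a15, "L03a": a, "L03b": b}
        for l01, a15, a, b in sorted(pairs)
    ]


# The NI number is blank in 2002/03, so a join on L26 would miss this learner.
y0102 = [{"L01": "100001", "L03": "A1", "L26": "AB123456C", "A15": "2",
          "matchv": "M1", "L11": "1985-03-14"}]
y0203 = [{"L01": "100001", "L03": "B7", "L26": "", "A15": "2",
          "matchv": "M1", "L11": "1985-03-14"}]
print(match2(y0102, y0203))
# [{'L01': '100001', 'A15': '2', 'L03a': 'A1', 'L03b': 'B7'}]
```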
16 The Match 2 files are saved for later use in this methodology.
Merge Match 1 and Match 2
17 There is now one Match 1 file and one Match 2 file for each year gap (2001/02 to 2002/03, 2002/03 to 2003/04, etc.). For each year gap, the Match 1 and Match 2 files are combined; where both files have matched the same learner, the Match 1 result is preferred.
18 The resulting files, called the Match files, are saved for later use in this methodology.
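One way to apply the Match 1 preference is to key both files on the learner and let Match 1 rows overwrite Match 2 rows. The sketch assumes that "the same learner" means the same provider (L01) and earlier-year learner number (L03a); the actual tie-breaking key may differ, and the function name and data are hypothetical.

```python
def merge_match_files(match1_rows, match2_rows):
    """Combine one year gap's Match 1 and Match 2 rows, preferring Match 1.

    Assumption: a learner is identified by provider (L01) and earlier-year L03 (L03a).
    """
    merged = {}
    # Load Match 2 first so that Match 1 rows for the same learner replace them.
    for row in match2_rows:
        merged[(row["L01"], row["L03a"])] = row
    for row in match1_rows:
        merged[(row["L01"], row["L03a"])] = row
    return list(merged.values())


m1 = [{"L01": "100001", "A15": "2", "L03a": "A1", "L03b": "B7"}]
m2 = [{"L01": "100001", "A15": "2", "L03a": "A1", "L03b": "Z9"},
      {"L01": "100001", "A15": "2", "L03a": "C3", "L03b": "D4"}]
for row in merge_match_files(m1, m2):
    print(row["L03a"], "->", row["L03b"])
# A1 -> B7
# C3 -> D4
```

Here the Match 2 candidate A1 -> Z9 is discarded because Match 1 found A1 -> B7 for the same learner, while C3 -> D4, found only by Match 2, is kept.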
Clean Matches
19 The next step is to create lists of learners who match cleanly from year to year. This uses the same matching process as Match 1 and Match 2, but matches the files on L01, L03, A15 and matchv. All matches are kept and the file is saved.
20 The clean match files are then matched against the Match files created above. Removing any cases known to have matched cleanly leaves a lookup of learners whose L03 is known to have changed between years.
21 These lookups are saved as the set of Match Final Files.
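The clean-match step behaves like an anti-join: build the set of learner numbers that carry over unchanged, then drop Match rows that involve them. In this sketch, dropping a row when either L03a or L03b appears in a clean match is an assumption about how "cases that we know were cleanly matched" are identified; the field names follow this section, while the function names and data are hypothetical.

```python
CLEAN_KEYS = ("L01", "L03", "A15", "matchv")


def clean_matches(prev_year, curr_year):
    """Return (L01, L03) pairs that match unchanged across two years on CLEAN_KEYS."""
    earlier = {tuple(rec[k] for k in CLEAN_KEYS) for rec in prev_year}
    return {
        (rec["L01"], rec["L03"])
        for rec in curr_year
        if tuple(rec[k] for k in CLEAN_KEYS) in earlier
    }


def match_final(match_rows, clean):
    """Keep only Match rows whose learner numbers were not matched cleanly (assumption)."""
    return [
        row for row in match_rows
        if (row["L01"], row["L03a"]) not in clean
        and (row["L01"], row["L03b"]) not in clean
    ]


y0102 = [{"L01": "100001", "L03": "A1", "A15": "2", "matchv": "M1"},
         {"L01": "100001", "L03": "C3", "A15": "2", "matchv": "M2"}]
y0203 = [{"L01": "100001", "L03": "B7", "A15": "2", "matchv": "M1"},
         {"L01": "100001", "L03": "C3", "A15": "2", "matchv": "M2"}]
# C3 -> B7 is a spurious candidate: C3 is still present, unchanged, in 2002/03.
match_rows = [{"L01": "100001", "A15": "2", "L03a": "A1", "L03b": "B7"},
              {"L01": "100001", "A15": "2", "L03a": "C3", "L03b": "B7"}]
print(match_final(match_rows, clean_matches(y0102, y0203)))
# [{'L01': '100001', 'A15': '2', 'L03a': 'A1', 'L03b': 'B7'}]
```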
Match Final Files
22 The Match Final Files are combined to produce a single lookup that shows how a learner's number has changed between years.
23 This is achieved by matching the later learner number in each file with the earlier number in the next year's file. For example, the 2001/02 to 2002/03 and 2002/03 to 2003/04 files can be matched on L03b in the first file and L03a in the second, both of which hold the value of L03 in the 2002/03 ILR.
24 The L03_match variable is then created, defined as the most recent L03 found for that learner.
25 A lookup file is created such that all earlier L03 values for a learner are paired against the L03_match.
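The chaining described above can be sketched by walking the Match Final Files in chronological order and carrying each learner's earlier numbers forward whenever their latest L03 changes again. The sketch assumes each year gap has been reduced to (L01, L03a, L03b) tuples and that L03 is unique within a provider; the function name is hypothetical. If one earlier number led to two different chains, the later one would silently win here, so a production version would need an explicit rule for such conflicts.

```python
def build_l03_lookup(final_files):
    """Map (L01, earlier L03) -> L03_match, the most recent L03 found for the learner.

    `final_files` is a chronological list of year gaps, each a list of
    (L01, L03a, L03b) tuples (hypothetical layout).
    """
    chains = {}  # (L01, latest L03) -> set of earlier L03 values
    for gap in final_files:
        moved, consumed = {}, set()
        for l01, l03a, l03b in gap:
            # L03b of the previous gap is matched to L03a of this gap.
            earlier = chains.get((l01, l03a), set()) | {l03a}
            consumed.add((l01, l03a))
            moved.setdefault((l01, l03b), set()).update(earlier)
        for key in consumed:
            chains.pop(key, None)
        # Chains not continued in this gap keep their latest number.
        for key, earlier in moved.items():
            chains.setdefault(key, set()).update(earlier)

    lookup = {}
    for (l01, l03_match), earlier in chains.items():
        for old in earlier:
            if old != l03_match:  # never pair a number with itself
                lookup[(l01, old)] = l03_match
    return lookup


gaps = [
    [("100001", "A1", "B7")],  # 2001/02 -> 2002/03
    [("100001", "B7", "C3")],  # 2002/03 -> 2003/04
]
lookup = build_l03_lookup(gaps)
for (l01, old), l03_match in sorted(lookup.items()):
    print(l01, old, "->", l03_match)
# 100001 A1 -> C3
# 100001 B7 -> C3
```

Processing the gaps in year order, rather than following a flat old-to-new map, means a number that changes and later changes back still resolves to its most recent value.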
Fuzzy Matching
26 Each pre-match file is then loaded and compared against the lookup created in the previous paragraph. If a more recent, different L03 has been found, the original value in the pre-match file is replaced with the value of L03_match.
27 The files are saved as the Fuzzy Match Files. As a result of this process, L03 is recorded far more consistently for each learner across years. The Fuzzy Match Files can then be used to create the WBL Success Rates Master File.
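The replacement step then amounts to one dictionary lookup per record. This sketch assumes the lookup built from the Match Final Files maps (L01, earlier L03) to L03_match, and it returns new records rather than modifying its input; the function name and data are illustrative.

```python
def apply_fuzzy_match(records, lookup):
    """Replace each record's L03 with L03_match where the lookup holds a different value.

    Assumption: `lookup` maps (L01, earlier L03) -> L03_match.
    """
    result = []
    for rec in records:
        l03_match = lookup.get((rec["L01"], rec["L03"]))
        if l03_match is not None and l03_match != rec["L03"]:
            rec = {**rec, "L03": l03_match}  # copy, so the input record is unchanged
        result.append(rec)
    return result


lookup = {("100001", "A1"): "C3", ("100001", "B7"): "C3"}
pre_match_0102 = [{"L01": "100001", "L03": "A1", "A15": "2"},
                  {"L01": "100001", "L03": "E5", "A15": "2"}]
print([rec["L03"] for rec in apply_fuzzy_match(pre_match_0102, lookup)])
# ['C3', 'E5']
```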
Sample Code
28 The following sample code is available: