1 The following SPSS code is to illustrate the WBL Success Rates Methodology - Fuzzy Data Matching. 2 This code illustrates one between year fuzzy match. This process is repeated for every between year gap. MATCH FILES /FILE='QSR_WBL_0203_prematch - resort.sav' /FILE='QSR_WBL_0102_prematch - resort.sav' /BY l01 L26 A15.
SELECT IF (sfl_0102 = 1 AND sfl_0203 = 1). SELECT IF (l03a <> l03b). SELECT IF (l26 <> ''). SORT CASES BY l01 l03a.
SAVE OUTFILE='0102 - 0203 Match 1.sav' /KEEP l01 l03a l03b a15 a26a a_aola a26b a_aolb /COMPRESSED. |
3 The process here is similar to Match 1, the only differences being the variables used to do the fuzzy matching and the additional criteria applied about start dates. The pre-match files have also been resorted to the different order required for this fuzzy match. MATCH FILES /FILE='QSR _WBL_0203_prematch - resort.sav' /FILE='QSR_WBL_0102_prematch - resort.sav' /BY L01 L11 A15 Matchv.
SELECT IF (sfl_0102 = 1 AND sfl_0203 = 1). SELECT IF (l03a <> l03b). SELECT IF (XDATE.QUARTER(a27a) = XDATE.QUARTER(a27b)). SELECT IF (XDATE.YEAR(a27a) = XDATE.YEAR(a27b)). SELECT IF (a27a < DATE.DMY(01,08,2002) AND a27b < DATE.DMY(01,08,2002)). SORT CASES BY l01 l03a.
SAVE OUTFILE='0102 - 0203 Match 2.sav' /KEEP l01 l03a l03b a15 a26a a_aola a26b a_aolb /COMPRESSED. |
4 The match files created above are then merged together so that there is one fuzzy match between each year. The data from match 1 is preferred to that from match 2. GET FILE='0102 - 0203 Match 1.sav'. COMPUTE Match = 1. ADD FILES /FILE=* /FILE='0102 - 0203 Match 2.sav'. IF (Match <> 1) Match = 2. SORT CASES BY l01 l03a Match. COMPUTE dupe = 0. IF (l01 = lag(l01,1) AND l03a = lag(l03a,1)) dupe = 1. EXE. SELECT IF dupe = 0. SAVE OUTFILE='0102 - 0203 Match.sav' /COMPRESSED. |
5 Learners that match cleanly between years are then removed as we know that they have not changed their L03. This leaves us with a lookup of all non-clean fuzzy matches we have made. MATCH FILES /FILE='QSR_WBL_0203_prematch - resort.sav' /FILE='QSR_WBL_0102_prematch - resort.sav' /BY L01 L03 A15 Matchv. SELECT IF (sfl_0102 = 1 AND sfl_0203 = 1). COMPUTE Clean_Match = 1. SAVE OUTFILE='0102 - 0203 Clean Match.sav' /KEEP l01 l03a Clean_Match /COMPRESSED. MATCH FILES /FILE='0102 - 0203 Clean Match.sav' /FILE='0102 - 0203 Match.sav' /BY L01 L03a. RECODE Clean_Match (SYSMIS = 0) (else = copy). SELECT IF NOT (Clean_Match = 1). EXE. SELECT IF(NOT(l01 = lag(l01,1) AND l03a = lag(l03a,1))). SORT CASES BY l01 l03b Match. SELECT IF(NOT(l01 = lag(l01,1) AND l03b = lag(l03b,1))). SORT CASES BY l01 l03a. SAVE OUTFILE ='0102 - 0203 Match final.sav' /DROP Clean_Match Match /COMPRESSED. |
6 This is the process of combining the final match files into a single lookup that can then be used to update the L03 values in the original data. GET FILE = '0102 - 0203 Match final.sav'. SORT CASES BY l01 l03b. MATCH FILES /FILE=* /FILE='0203 - 0304 Match final.sav' /BY L01 L03b. SORT CASES BY l01 l03c. MATCH FILES /FILE=* /FILE='0304 - 0405 Match final.sav' /BY L01 L03c. SORT CASES BY l01 l03d. MATCH FILES /FILE=* /FILE='0405 - 0506 Match final.sav' /BY L01 L03d. STRING l03_match (A12). DO IF (l03e <> ''). COMPUTE l03_match = l03e. ELSE IF (l03d <> ''). COMPUTE l03_match = l03d. ELSE IF (l03c <> ''). COMPUTE l03_match = l03c. ELSE IF (l03b <> ''). COMPUTE l03_match = l03b. END IF. STRING l03 (A12). IF (l03d <> '' AND l03e <> '') l03 = l03e. IF (l03c <> '' AND l03d <> '') l03 = l03c. IF (l03b <> '' AND l03c <> '') l03 = l03b. IF (l03a <> '' AND l03b <> '') l03 = l03a. SAVE OUTFILE ='0102 - 0506 Match final.sav' /KEEP l01 l03 l03_match l03a l03b l03c l03d l03e /COMPRESSED. SELECT IF (l03b <> '' AND l03c <> '' and l03 <> l03b). COMPUTE l03 = l03b. ADD FILES /FILE = * /FILE = '0102 - 0506 Match final.sav'. SAVE OUTFILE ='0102 - 0506 Match final.sav'. SELECT IF (l03c <> '' AND l03d <> '' and l03 <> l03c). COMPUTE l03 = l03c. ADD FILES /FILE = * /FILE = '0102 - 0506 Match final.sav'. SAVE OUTFILE ='0102 - 0506 Match final.sav'. SELECT IF (l03d <> '' AND l03e <> '' and l03 <> l03d). COMPUTE l03 = l03d. ADD FILES /FILE = * /FILE = '0102 - 0506 Match final.sav'. SAVE OUTFILE ='0102 - 0506 Match final.sav' /KEEP l01 l03 l03_match l03a l03b l03c l03d l03e. GET FILE = '0102 - 0506 Match final.sav'. SORT CASES BY l01 l03. COMPUTE dupe = 0. IF (l01 = lag(l01,1) and l03 = lag(l03,1)) dupe = 1. EXE. SELECT IF dupe = 0. SAVE OUTFILE ='0102 - 0506 Match final.sav' /KEEP l01 l03 l03_match l03a l03b l03c l03d l03e. |
7 For each year’s pre-match data, we update the L03 values to the most recent value we found through the fuzzy matching process. This ensures the L03 values will be consistent between years. MATCH FILES /FILE='QSR_WBL_0102_prematch.sav' /TABLE='0102 - 0506 Match final.sav' /BY L01 L03. IF (l03a <> l03_match AND l03_match <> '') l03 = l03_match. SORT CASES BY l01 l03 a15 matchv. SAVE OUTFILE ='QSR_WBL_0102_fuzzyl03.sav' /DROP l03_match l03b l03c l03d l03e /COMPRESSED. |
|