java - Has anyone written a Fuzzy date match to catch data entry errors? -


i wrote routine in pl/sql try , match dates there might typographical/data entry errors.

it works, see if has other/better ideas. routine not need in pl/sql, read many languages.

  function fuzzy_date_match(in_date_1 date, in_date_2 date) return number     month_1 number(2);     month_2 number(2);     day_1 number(2);     day_2 number(2);     year_1 number(4);     year_2 number(4);     match_score number(3) := 0;   begin     if trunc(in_date_1) = trunc(in_date_2)           match_score := 100;     else       if abs(trunc(in_date_1) - trunc(in_date_2)) < 2               match_score :=50;       else         month_1 := to_number(to_char(in_date_1,'mm'));         month_2 := to_number(to_char(in_date_2,'mm'));         if month_1 = month_2                   match_score := match_score + 15;         else           if (abs(month_1 - month_2) < 2) or              (to_number(substr(lpad(month_1,2,'0'),2,1)||substr(lpad(month_1,2,'0'),1,1)) = month_2)                       match_score := match_score + 7;           end if;         end if;         day_1 := to_number(to_char(in_date_1,'dd'));         day_2 := to_number(to_char(in_date_2,'dd'));         if day_1 = day_2                   match_score := match_score + 10;         else           if (abs(day_1 - day_2) < 2) or              (to_number(substr(lpad(day_1,2,'0'),2,1)||substr(lpad(day_1,2,'0'),1,1)) = day_2)                       match_score := match_score + 5;           end if;         end if;         year_1 := to_number(to_char(in_date_1,'yyyy'));         year_2 := to_number(to_char(in_date_2,'yyyy'));         if year_1 = year_2                   match_score := match_score + 25;         else           if (abs(year_1 - year_2) < 2) or              (to_number(substr(lpad(year_1,2,'0'),4,1)||substr(lpad(year_1,2,'0'),3,1)) = to_number(substr(to_char(year_2),3)))                       match_score := match_score + 12;           end if;         end if;       end if;     end if;     return match_score;   end fuzzy_date_match; 

the basic concept compare 2 dates , return value between 0 , 100 100 exact match , 0 no match. types of errors looking single digit errors, , transposition errors. assumption years have more weight months, in turn have more weight days.

i tried googling fuzzy date matching, answers typically deal distance between dates opposed data entry errors.

aall appreciated.

paul

no need implement yourself. take @ utl_match package standard part of oracle. here's quick summary:

function edit_distance(s1 in varchar2, s2 in varchar2)                        return pls_integer;   -- computes levenshtein distance between s1 , s2.  function jaro_winkler(s1 in varchar2, s2 in varchar2)                       return binary_double;   -- similar levenshtein distance, tries account mis-typings,   -- character swaps, etc.  function edit_distance_similarity(s1 in varchar2, s2 in varchar2)                                   return pls_integer;   -- similar levenshtein distance, returns integer 0 100   -- 0 means no similarity , 100 means strings identical.  function jaro_winkler_similarity(s1 in varchar2, s2 in varchar2)                                  return pls_integer;   -- similar above, based on jaro-winkler. 

here's quick example:

select utl_match.edit_distance('potato', 'tomato') lev,        utl_match.edit_distance_similarity('potato', 'tomato') lev_sim,        to_number(utl_match.jaro_winkler('potato', 'tomato')) jw,        utl_match.jaro_winkler_similarity('potato', 'tomato') jw_sim   dual; 

sounds me might able use jaro_winkler_similarity. convert both dates standard string representation (e.g. to_char(adate, 'dd/mm/yyyy hh24:mi:ss')) , compare them.

(incidentally - to_number applied results of jaro_winkler because oracle throws ora-031115 : unsupported network datatype or representation when jaro_winkler called because returns binary_double, oracle interface routines on windows platform can't seem deal with. why have type if can't use type? ??? :-)

share , enjoy.


Comments

Popular posts from this blog

linux - Does gcc have any options to add version info in ELF binary file? -

android - send complex objects as post php java -

charts - What graph/dashboard product is facebook using in Dashboard: PUE & WUE -