performance - Quicker way of finding duplicates in SQL Server -


i'm trying find better way of finding duplicates in sql server. took on 20 minutes run on 300 million records before results started showing in results window within ssms. 22 minutes elapsed before crashed.

then ssms threw error after displaying 16,777,216 records:

an error occurred while executing batch. error message is: exception of type 'system.outofmemoryexception' thrown. 

schema:

encounter_num - numeric(22,0) concept_cd - varchar(50) provider_id - varchar(50) start_date - datetime modifier_cd - varchar(100) instance_num - numeric(18,0)   select     row_number() on (order f1.[encounter_num],f1.[concept_cd],f1.[provider_id],f1.[start_date],f1.[modifier_cd],f1.[instance_num]),     f1.[encounter_num],      f1.[concept_cd],      f1.[provider_id],      f1.[start_date],      f1.[modifier_cd],      f1.[instance_num]     [dbo].[i2b2_observation_fact] f1     inner join [dbo].[i2b2_observation_fact] f2 on         f1.[encounter_num] = f2.[encounter_num]          , f1.[concept_cd] = f2.[concept_cd]         , f1.[provider_id] = f2.[provider_id]         , f1.[start_date] = f2.[start_date]         , f1.[modifier_cd] = f2.[modifier_cd]         , f1.[instance_num] = f2.[instance_num] 

not sure how faster is, worth try.

select     count(*) dupes,     f1.[encounter_num],      f1.[concept_cd],      f1.[provider_id],      f1.[start_date],      f1.[modifier_cd],      f1.[instance_num]     [dbo].[i2b2_observation_fact] f1 group     f1.[encounter_num],      f1.[concept_cd],      f1.[provider_id],      f1.[start_date],      f1.[modifier_cd],      f1.[instance_num] having     count(*) > 1 

Comments

Popular posts from this blog

linux - Does gcc have any options to add version info in ELF binary file? -

javascript - Clean way to programmatically use CSS transitions from JS? -

android - send complex objects as post php java -