performance - Quicker way of finding duplicates in SQL Server -
i'm trying find better way of finding duplicates in sql server. took on 20 minutes run on 300 million records before results started showing in results window within ssms. 22 minutes elapsed before crashed.
then ssms threw error after displaying 16,777,216 records:
an error occurred while executing batch. error message is: exception of type 'system.outofmemoryexception' thrown.
schema:
encounter_num - numeric(22,0) concept_cd - varchar(50) provider_id - varchar(50) start_date - datetime modifier_cd - varchar(100) instance_num - numeric(18,0) select row_number() on (order f1.[encounter_num],f1.[concept_cd],f1.[provider_id],f1.[start_date],f1.[modifier_cd],f1.[instance_num]), f1.[encounter_num], f1.[concept_cd], f1.[provider_id], f1.[start_date], f1.[modifier_cd], f1.[instance_num] [dbo].[i2b2_observation_fact] f1 inner join [dbo].[i2b2_observation_fact] f2 on f1.[encounter_num] = f2.[encounter_num] , f1.[concept_cd] = f2.[concept_cd] , f1.[provider_id] = f2.[provider_id] , f1.[start_date] = f2.[start_date] , f1.[modifier_cd] = f2.[modifier_cd] , f1.[instance_num] = f2.[instance_num]
not sure how faster is, worth try.
select count(*) dupes, f1.[encounter_num], f1.[concept_cd], f1.[provider_id], f1.[start_date], f1.[modifier_cd], f1.[instance_num] [dbo].[i2b2_observation_fact] f1 group f1.[encounter_num], f1.[concept_cd], f1.[provider_id], f1.[start_date], f1.[modifier_cd], f1.[instance_num] having count(*) > 1
Comments
Post a Comment