[Scroll down for code, redacted to protect confidentiality.] Assistance with a SQL sampleDan.Haygood@wellsfargo.com Thu, Mar 18, 2010 at 2:08 PM To: JZ Cc: Dan@danielbrockman.com Dan Brockman said you were looking for some sample code--This is the easiest picking I had at hand, but I have more available. Dan’s SQL mastery is excellent, and he understands relational data extremely well, from both a mining and database creation point of view. The samples below are separated by “--==” comments, and I think they show his grasp of the appropriateness of join calculus, inner and outer joins, subqueries, and when one would want to use one over the other, either for performance, code structure, or readability, as well as basics like functional transforms, DDL, procedural elements, etc. Note that for our work, in many cases Dan chose to implement scripts that worked from template files he developed after approaching a solution with straight SQL. This gave us repeatable development of multiple queries for different sections of datasets, different source schema, and so forth. His creativity in even this aspect of our development saved us dozens, hundreds of hours in redundant effort. If you need more information, feel free to contact me, - Dan ________________________________ /* pkg-susp-[ac]-1 v:090817 From cust & acct info, get [ab] info */ select package_id, rsch_set, [J]_[O], [J], [J]_on_event, [R]_addr_code_missing_error * 2 + [R]_addr_code_missing_error as [R]_error, event_subj_[ab]_cd, [R]_[ae]_cd, [R]_[ab]_cd_2, [R]_[ab]_cd_5 , rs_[ac]_subj , rs_[ac]_[af] , rs_[ac]_cd_2 , rs_[ac]_cd_5 /* intermediate values in summary calc */ , coalesce(rs_[ac]_subj,0) rscs , coalesce(rs_[ac]_[af],0) rscn , coalesce(rs_[ac]_cd_2,0) rsc2 , coalesce(rs_[ac]_cd_5,0) rsc5 , (case when rscs > rscn then rscs else rscn end ) maxsn , (case when maxsn > rsc2 then maxsn else rsc2 end ) maxsn2 /* Summary [ab] risk */ , (case when maxsn2 > rsc5 then maxsn2 else rsc5 end ) [ac]_rs_a from ( select u.PACKAGE_ID as package_id, u.rsch_set as rsch_set, c.CUST_[O] as [J]_[O], -- es.CUSTOMER[F] c.CUST_NBR as [J], -- es.[i] 1 as [J]_on_event, es.[ab]_OF_ORIGIN_CODE as event_subj_[ab]_cd, c.[ae]_CD as [R]_[ae]_cd, -- c.INTL_TRN as [R]_intl_trn -- Seems like garbage, may not be INTernationaL, anyway. - DPH ca.ADDR_USE_CD as [R]_addr_use_cd, ca.ADDR_USE_OCC_NBR as [R]_addr_use_occ_nbr, ca.ADDR_USE_TYPE_IND as [R]_addr_use_type_ind, a.ADDR_[ad]_NBR as [R]_[ab]_nbr, -- Not important, NOT the ISO number. - DPH a.[ad]_CD_2 as [R]_[ab]_cd_2, a.[ad]_CD_5 as [R]_[ab]_cd_5, case when a.ADDR_[ad]_NBR <> 1 and (a.[ad]_CD_2 is null or a.[ad]_CD_2 = '' or a.[ad]_CD_2 = 'US') then 1 else 0 end as [R]_addr_code_missing_error, case when coalesce(a.[ad]_CD_2,'') = '' and coalesce(a.[ad]_CD_5,'') <> '' then 1 else 0 end as [R]_addr_code_5_no_2_error , ctrs.risk_score_with_override rs_[ac]_subj , ctrn.risk_score_with_override rs_[ac]_[af] , ctr2.risk_score_with_override rs_[ac]_cd_2 , ctr5.risk_score_with_override rs_[ac]_cd_5 from [A].sco_pkg_univ u left join ( ( [A].[D]_Event e inner join [A].[D]_Event_Subject es on es.EVENT_ID = e.EVENT_ID -- and -- es.CUSTOMER[F] is not null -- and -- es.[i] is not null ) left join [R].CUSTOMER_TB c on c.CUST_[O] = es.CUSTOMER[F] and c.CUST_NBR = es.[i] -- and -- c.CUST_[O] is not null -- and -- c.CUST_NBR is not null left join [R].CUST_X_ADDR_TB ca on ca.CUST_[O] = c.CUST_[O] and ca.CUST_ID = c.CUST_ID and ca.CUST_[Q] = c.CUST_[Q] left join [R].ADDRESS_TB a on a.ADDR_[O] = ca.ADDR_[O] and a.ADDR_[ad]_NBR = ca.ADDR_[ad]_NBR and a.ADDR_ZIP_CD_HASH = ca.ADDR_ZIP_CD_HASH and a.ADDR_STREET_ID = ca.ADDR_STREET_ID and a.ADDR_BLDG_NBR_HASH = ca.ADDR_BLDG_NBR_HASH and a.ADDR_APT_NBR_HASH = ca.ADDR_APT_NBR_HASH and a.ADDR_[Q] = ca.ADDR_[Q] ) on e.PACKAGE_ID = u.PACKAGE_ID and u.rsch_set = 8 left outer join [A].[ab]_Risk ctrs on 1=1 and es.[ab]_OF_ORIGIN_CODE is not null and es.[ab]_OF_ORIGIN_CODE = ctrs.hogan_[ab]_code left outer join [A].[ab]_Risk ctrn on 1=1 and c.[ae]_CD is not null and c.[ae]_CD = ctrn.hogan_[ab]_code left outer join [A].[ab]_Risk ctr2 on 1=1 and a.[ad]_CD_2 is not null and a.[ad]_CD_2 = ctr2.hogan_[ab]_code left outer join [A].[ab]_Risk ctr5 on 1=1 and a.[ad]_CD_5 is not null and a.[ad]_CD_5 = ctr5.hogan_[ab]_code where coalesce(es.[ab]_OF_ORIGIN_CODE, '') <> '' or coalesce(c.[ae]_CD,'') not in ('','US') or ( a.ADDR_[ad]_NBR is not null and a.ADDR_[ad]_NBR <> 1 -- Seems to be join default meanining 'US' ) or coalesce(a.[ad]_CD_2,'') not in ('','US') or coalesce(a.[ad]_CD_5,'') <> '' ) x ; --==--==--==--== /* ../Scripts/sco-elem-2-Agg.sql EM Scoring Rsch - Scoring elements 20090810-140718 */ /* months 200901 thru 200812 */ /* This file written by sco-elem-2.sh */ /* drop table [b]_USERWORK1.sco_agg_5_tmp ; /* */ /* pDates-1.sqk 090415 */ /* Note: Prefer using .../HRCS/SQL/sh/pdatz.pl to modify pDates-1.sqk */ /* psco01 v:090701 Parms for EM Scoring research */ /* trans-agg-1 v:090810 Extract aggregate txns for packages */ /* Create temp table of agg txns for research set 5 */ create table [b]_USERWORK1.sco_agg_5_tmp as ( sel u.package_id , u.created_date /* pkg create date */ , sum(case when dx.tran_id is not null then 1 else 0 end ) csh_in_ct , sum(case when dx.tran_id is not null then dx.tran_am else 0 end ) csh_in_amt , sum(case when dx.tran_id is not null then 1 else 0 end ) csh_out_ct , sum(case when dx.tran_id is not null then dx.tran_am else 0 end ) csh_out_amt , sum(case when dx.tran_id is not null then 1 else 0 end ) chk_in_ct , sum(case when dx.tran_id is not null then dx.tran_am else 0 end ) chk_in_amt , sum(case when dx.tran_id is not null then 1 else 0 end ) chk_out_ct , sum(case when dx.tran_id is not null then dx.tran_am else 0 end ) chk_out_amt , sum(case when dx.tran_id is not null then 1 else 0 end ) wir_in_ct , sum(case when dx.tran_id is not null then dx.tran_am else 0 end ) wir_in_amt , sum(case when dx.tran_id is not null then 1 else 0 end ) wir_out_ct , sum(case when dx.tran_id is not null then dx.tran_am else 0 end ) wir_out_amt , sum(case when dx.tran_id is not null then 1 else 0 end ) oth_ct , sum(case when dx.tran_id is not null then dx.tran_am else 0 end ) oth_amt , (csh_in_ct + csh_out_ct + chk_in_ct + chk_out_ct + wir_in_ct + wir_out_ct + oth_ct) tot_ct , (csh_in_amt + csh_out_amt + chk_in_amt + chk_out_amt + wir_in_amt + wir_out_amt + oth_amt) tot_amt -- , di.tran_id -- , pa.[N][F] [H] -- , pa.[N]_product_code [G] -- , pa.adw_[N]_number ACCT_ID -- , di.tran_dt -- , di.tran_am -- , substr(di.tran_type_nm,1,1) FLG_DRCR /* '+' = credit */ -- , di.tran_type_nm -- , di.posted_tran_cd -- , di.high_risk_cd -- , ( case when [U] -- then 1 else 0 end ) FLG_STR -- /* log() is logarithm base 10 */ -- , ( case when cast (di.tran_am as integer) = di.tran_am -- then 1 else 0 end ) FLG_RND_DLR -- , ( case when di.Rule_Nr = 1 then 'CSH IN' -- when di.Rule_Nr = 2 then 'CSH OUT' -- when di.Rule_Nr = 3 then 'CHK IN' -- when di.Rule_Nr = 4 then 'CHK OUT' -- when di.Rule_Nr = 5 then 'WIR IN' -- when di.Rule_Nr = 6 then 'WIR OUT' -- else 'OTHER' end ) FLG_CCW /* CSH, CHK, WIR, OTH */ from [A].sco_pkg_univ u /* package universe */ inner join [b]_USERWORK1.[D]_PACKAGE_[N] pa on 1=1 /* List of all accts in pkg */ and u.package_id = pa.package_id and u.rsch_set = 5 /* limit packages to research set */ and NOT EXISTS ( sel cold.[H], cold.[G], cold.ACCT_ID from [A].FCA_COLD_ACCTS cold where 1=1 and pa.[N][F] = cold.[H] and pa.[N]_product_code = cold.[G] and pa.adw_[N]_number = cold.ACCT_ID ) left outer join (sel d1.tran_id , d1.[H] , d1.[G] , d1.ACCT_ID , d1.rule_nr , d1.tran_am , d1.tran_dt from ADWDM_[D]_v1.DLY_TRAN_ACCT_ITEM d1 where 1=1 and ( [T] ) ) dx on 1=1 and pa.[N][F] = dx.[H] and pa.[N]_product_code = dx.[G] and pa.adw_[N]_number = dx.ACCT_ID and (u.created_date - 120 ) <= dx.tran_dt and dx.tran_dt <= u.created_date --where 1=1 --and FLG_STR=1 --and FLG_RND_DLR=1 --and FLG_CCW in ('CSH IN', 'CSH OUT','CHK IN','CHK OUT') /* */ group by 1, 2/*, 3, 4, 5 , 6, 7, 8, 9, 10 , 11, 12, 13, 14, 15 /* */ ) with data primary index (package_id) ; collect statistics on [b]_USERWORK1.sco_agg_5_tmp index (package_id) ; grant all on [b]_USERWORK1.sco_agg_5_tmp to [aa] ; /* ------------------- */ select count(*), '[b]_USERWORK1.sco_agg_5_tmp' tbl from [b]_USERWORK1.sco_agg_5_tmp group by 2; select a.*, '[b]_USERWORK1.sco_agg_5_tmp' tbl from [b]_USERWORK1.sco_agg_5_tmp a sample 20; select Top 20 count(*) , a.package_id , a.created_date , '[b]_USERWORK1.sco_agg_5_tmp' tbl from [b]_USERWORK1.sco_agg_5_tmp a group by 2, 3, 4 order by 1 desc ; select * from ( sel count(distinct u.package_id) ct_pkg_agg_txn from [b]_USERWORK1.sco_agg_5_tmp s inner join [A].sco_pkg_univ u on 1=1 and u.rsch_set = 5 /* limit packages to research set */ and s.package_id = u.package_id ) su0 , ( sel count(distinct u.package_id) ct_pkg_[C]_agg_txn from [b]_USERWORK1.sco_agg_5_tmp s inner join [A].sco_pkg_univ u on 1=1 and u.rsch_set = 5 /* limit packages to research set */ and s.package_id = u.package_id and u.[C] = 1 ) su1 , ( sel count(distinct u.package_id) ct_pkg_rsch_set_5 from [A].sco_pkg_univ u where 1=1 and u.rsch_set = 5 /* limit packages to research set */ ) ua , ( sel count(distinct u.package_id) ct_pkg_[C]_rsch_set_5 from [A].sco_pkg_univ u where 1=1 and u.rsch_set = 5 /* limit packages to research set */ and u.[C] = 1 ) u1 ; /* ------------------- */ /* END ../Scripts/sco-elem-2-Agg.sql EM Scoring Rsch - Scoring elements 20090810-140725 */ --==--==--==--== /* ../Scripts/sco-elem-1-Agg.sql EM Scoring Rsch - Txns 20090708-104933 */ /* months 200901 thru 200812 */ /* This file written by sco-elem-1.sh */ drop table [b]_USERWORK1.sco_txn_8_tmp ; create table [b]_USERWORK1.sco_txn_8_tmp as ( sel u.package_id , u.created_date /* pkg create date */ , di.tran_id , pa.[N][F] [H] , pa.[N]_product_code [G] , pa.adw_[N]_number ACCT_ID , di.ORGNT_ACCT_NM , di.tran_dt , di.tran_am , substr(di.tran_type_nm,1,1) FLG_DRCR /* '+' = credit */ , di.tran_type_nm , di.posted_tran_cd , di.high_risk_cd , ( case when [U] then 1 else 0 end ) FLG_STR /* log() is logarithm base 10 */ , ( case when cast (di.tran_am as integer) = di.tran_am then 1 else 0 end ) FLG_RND_DLR , ( case when Rule_Nr = 1 then 'CSH IN' when Rule_Nr = 2 then 'CSH OUT' when Rule_Nr = 3 then 'CHK IN' when Rule_Nr = 4 then 'CHK OUT' when Rule_Nr = 5 then 'WIR IN' when Rule_Nr = 6 then 'WIR OUT' else 'OTHER' end ) FLG_CCW /* CSH, CHK, WIR, OTH */ from [A].sco_pkg_univ u /* package universe */ inner join [b]_USERWORK1.[D]_PACKAGE_[N] pa on 1=1 /* List of all accts in pkg */ and u.package_id = pa.package_id and u.rsch_set = 8 /* limit packages to research set */ inner join ADWDM_[D]_v1.DLY_TRAN_ACCT_ITEM di on 1=1 and pa.[N][F] = di.[H] and pa.[N]_product_code = di.[G] and pa.adw_[N]_number = di.ACCT_ID and pa.[N][F] is not null and pa.[N]_product_code is not null and pa.adw_[N]_number is not null where 1=1 and (u.created_date - 120 ) <= di.tran_dt and di.tran_dt <= u.created_date and FLG_STR=1 and FLG_RND_DLR=1 and FLG_CCW in ('CSH IN', 'CSH OUT','CHK IN','CHK OUT') /* */ group by 1, 2, 3, 4, 5 , 6, 7, 8, 9, 10 , 11, 12, 13, 14, 15 , 16 ) with data primary index (tran_id) ; collect statistics on [b]_USERWORK1.sco_txn_8_tmp index (tran_id) ; grant all on [b]_USERWORK1.sco_txn_8_tmp to [aa] /* ------------------- */ select count(*), '[b]_USERWORK1.sco_txn_8_tmp' tbl from [b]_USERWORK1.sco_txn_8_tmp group by 2; select a.*, '[b]_USERWORK1.sco_txn_8_tmp' tbl from [b]_USERWORK1.sco_txn_8_tmp a sample 20; select Top 20 count(*) , a.[H] , a.[G] , a.ACCT_ID , a.ORGNT_ACCT_NM , '[b]_USERWORK1.sco_txn_8_tmp' tbl from [b]_USERWORK1.sco_txn_8_tmp a group by 2, 3, 4, 5 order by 1 desc ; select * from ( sel count(distinct u.package_id) ct_pkg_rnd_dlr from [b]_USERWORK1.sco_txn_8_tmp s inner join [A].sco_pkg_univ u on 1=1 and u.rsch_set = 8 /* limit packages to research set */ and s.package_id = u.package_id ) su0 , ( sel count(distinct u.package_id) ct_pkg_[C]_rnd_dlr from [b]_USERWORK1.sco_txn_8_tmp s inner join [A].sco_pkg_univ u on 1=1 and u.rsch_set = 8 /* limit packages to research set */ and s.package_id = u.package_id and u.[C] = 1 ) su1 , ( sel count(distinct u.package_id) ct_pkg_univ from [A].sco_pkg_univ u where 1=1 and u.rsch_set = 8 /* limit packages to research set */ ) ua , ( sel count(distinct u.package_id) ct_pkg_univ_[C] from [A].sco_pkg_univ u where 1=1 and u.rsch_set = 8 /* limit packages to research set */ and u.[C] = 1 ) u1 , ( sel count(distinct s.tran_id) ct_txn_rnd_dlr from [b]_USERWORK1.sco_txn_8_tmp s ) s1 ; /* ------------------- */ /* END ../Scripts/sco-elem-1-Agg.sql EM Scoring Rsch - Txns 20090708-104933 */ --==--==--==--==--==--== /* ------------------- */ /* prior-[C]-1 v:090709 packages with cust or acct in prior [C] */ /* use parm file psco01.sqk */ /* to work around spool space constraints, /* we avoid outer joins by creating our /* table as the union of two inner-join queries. /* Note that some duplications can occur. /* */ create table [b]_USERWORK1.sco_prior[C]_8_tmp as ( select combo.* from ( /* in this subset, we find [N]s that appeared in previous [C]s */ sel p.package_id , p.case_id , c1.FILED_DATE , s2.case_id PRIOR_[C] , c2.FILED_DATE PRIOR_[C]_DT , (case when add_months(c1.FILED_DATE,-12) <= c2.FILED_DATE and c2.FILED_DATE < c1.FILED_DATE then 1 else 0 end ) AGO1YR , (case when add_months(c1.FILED_DATE,-24) <= c2.FILED_DATE and c2.FILED_DATE < add_months(c1.FILED_DATE,-12) then 1 else 0 end ) AGO2YR , s2.COMPANY_ID [H] /* cast ( as decimal(18,0)) */ , s2.PRODUCT_CODE [G] /* cast ( as varchar(3)) */ , s2.[N]_NUMBER ACCT_ID /* cast ( as varchar(23)) */ , cast ( NULL as decimal(15,0)) [J] /* - - - */ from [A].sco_pkg_univ p /* research package universe */ /* - - - */ join [b]_USERWORK1.[D]_CASE c1 on 1=1 /* cases table */ and p.case_id = c1.case_id and p.rsch_set = 8 /* - - - */ join [b]_USERWORK1.[D]_CASE_SUSACCT s1 on 1=1 /* susacct table */ and p.case_id = s1.case_id and s1.[C]_TOP_FOUR = 'Y' /* Y=[M] */ join [b]_USERWORK1.[D]_CASE_SUSACCT s2 on 1=1 /* susacct table */ and s2.[C]_TOP_FOUR = 'Y' /* Y=[M] */ and s2.COMPANY_ID=s1.COMPANY_ID /* same acct */ and s2.PRODUCT_CODE=s1.PRODUCT_CODE and s2.[N]_NUMBER=s1.[N]_NUMBER /* - - - */ join [b]_USERWORK1.[D]_CASE c2 on 1=1 and ( s2.case_id = c2.case_id ) and c2.CURR_STATE_ID = [E] /* [E] = [C] Closed */ and c2.FILED_DATE < c1.FILED_DATE /* PRIOR [C] */ and add_months(c1.FILED_DATE,-24) <= c2.FILED_DATE /* within 2 yrs */ ----------------------------------------------------------------- UNION ----------------------------------------------------------------- /* in this subset, we find Customers that appeared in previous [C]s */ sel p.package_id , p.case_id , c1.FILED_DATE , s4.case_id PRIOR_[C] , c2.FILED_DATE PRIOR_[C]_DT , (case when add_months(c1.FILED_DATE,-12) <= c2.FILED_DATE and c2.FILED_DATE < c1.FILED_DATE then 1 else 0 end ) AGO1YR , (case when add_months(c1.FILED_DATE,-24) <= c2.FILED_DATE and c2.FILED_DATE < add_months(c1.FILED_DATE,-12) then 1 else 0 end ) AGO2YR , cast ( NULL as decimal(18,0)) [H] , cast ( NULL as varchar(3)) [G] , cast ( NULL as varchar(23)) ACCT_ID , s4.[K] [J] /* cast ( as decimal(15,0)) */ /* - - - */ from [A].sco_pkg_univ p /* research package universe */ /* - - - */ join [b]_USERWORK1.[D]_CASE c1 on 1=1 /* cases table */ and p.case_id = c1.case_id and p.rsch_set = 8 /* - - - */ join [b]_USERWORK1.[D]_[L] s3 on 1=1 /* [M] table */ and p.case_id = s3.case_id and s3.[M]_TYPE_ID = 1 /* 1=[M] */ join [b]_USERWORK1.[D]_[L] s4 on 1=1 /* [M] table */ and s4.[M]_TYPE_ID = 1 /* 1=[M] */ and s4.[K]=s3.[K] /* same cust */ /* - - - */ join [b]_USERWORK1.[D]_CASE c2 on 1=1 and ( s4.case_id = c2.case_id ) and c2.CURR_STATE_ID = [E] /* [E] = [C] Closed */ and c2.FILED_DATE < c1.FILED_DATE /* PRIOR [C] */ and add_months(c1.FILED_DATE,-24) <= c2.FILED_DATE /* within 2 yrs */ ) combo group by 1, 2, 3, 4, 5 , 6, 7, 8, 9, 10 , 11 ) with data primary index (package_id,case_id,PRIOR_[C]) ; collect statistics on [b]_USERWORK1.sco_prior[C]_8_tmp index (package_id,case_id,PRIOR_[C]) ; grant all on [b]_USERWORK1.sco_prior[C]_8_tmp to [aa] /* - - - - - - - - */ sel count(*), '[b]_USERWORK1.sco_prior[C]_8_tmp' tbl from [b]_USERWORK1.sco_prior[C]_8_tmp group by 2 ; sel count(*) Unq_pairs from ( sel distinct package_id , case_id , FILED_DATE , PRIOR_[C] , PRIOR_[C]_DT from [b]_USERWORK1.sco_prior[C]_8_tmp ) pr ; sel count(*) , package_id , case_id , FILED_DATE , PRIOR_[C] , PRIOR_[C]_DT , '[b]_USERWORK1.sco_prior[C]_8_tmp' tbl from [b]_USERWORK1.sco_prior[C]_8_tmp group by 2, 3, 4, 5, 6, 7 order by 3 desc, 5 desc, 4, 6, 2 ; sel count(*) nnn , AGO1YR , AGO2YR from ( sel distinct * from [b]_USERWORK1.sco_prior[C]_8_tmp ) pr group by 2, 3 order by 2, 3 ; sel distinct * from [b]_USERWORK1.sco_prior[C]_8_tmp order by 2, 1, 4, 3, 5 ; /* ---------------------------------------------------------- */ /* END ../Scripts/sco-elem-1-Agg.sql EM Scoring Rsch - Txns 20090709-164037 */ Scheduled PTO: None Dan Haygood Financial Crime Analytics Corporate BSA Office | Wells Fargo & Co. Office: 480-575-3730 | Cell: 480-421-8468 | Fax: 480-575-3520 | MAC S4063-014 P Please consider the environment before printing this e-mail. "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation." From: danielbrockman@gmail.com [mailto:danielbrockman@gmail.com] On Behalf Of Daniel Brockman Sent: Tuesday, March 16, 2010 10:29 PM To: Haygood, Dan Subject: Assistance with a SQL sample Hi Dan... I hope your hand is healing nicely. What percent capacity is restored now, do you guess? […snip…] I interviewed with JZ, manager of the Targeting and Analytics Credit Card Customer Marketing group at Wells Fargo, and he asked for a sample of SQL code. I looked through my dusty files and found none. Have you perhaps retained some SQL that I produced? If so, would you send 4 or 5 pages (something showing a complicated join would be nice) to him at [...] and copy me? […snip…] DANIEL BROCKMAN 1-707-543-6814