2012-04-02

Keep first

Green is good. In database console it is saying that the database is doing work with CPU. It is not waiting. It might be thou that the end user is waiting for the query to finnish. These problems are not to be dealt with buying more CPU. Maybe faster CPU. But more performance may be gained by not doing stuff.

drop table a cascade constraints;

create table a as 
select mod(level,5) b,level c,'foo'||level d from dual connect by level < 100000;

exec dbms_stats.gather_table_stats(user,'A');

select * from a where rownum <= 10;

1    1    foo1
2    2    foo2
3    3    foo3
4    4    foo4
0    5    foo5
1    6    foo6
2    7    foo7
3    8    foo8
4    9    foo9
0    10    foo10

A simple aggregation from the data:

select b,min(c) c
  from a
 group by b
;

Plan
SELECT STATEMENT  ALL_ROWSCost: 92  Bytes: 40  Cardinality: 5          
    2 HASH GROUP BY  Cost: 92  Bytes: 40  Cardinality: 5      
        1 TABLE ACCESS FULL TABLE RAFU.A Cost: 88  Bytes: 799.992  Cardinality: 99.999  

1    1
2    2
4    4
3    3
0    5

But when we want to have also d columns included in the result of aggregation we need to write sql a bit differently.

0    5    foo5
1    1    foo1
2    2    foo2
3    3    foo3
4    4    foo4

Here is a analytic approach to get the desired rows.
select b,c,d 
  from (
 select b,c,d, row_number()over(partition by b order by c) rn
   from a
)
where rn=1
;

Plan
SELECT STATEMENT  ALL_ROWSCost: 643  Bytes: 12.699.873  Cardinality: 99.999              
    3 VIEW RAFU. Filter Predicates: "RN"=1  Cost: 643  Bytes: 12.699.873  Cardinality: 99.999          
        2 WINDOW SORT PUSHED RANK  Filter Predicates: ROW_NUMBER() OVER ( PARTITION BY "B" ORDER BY "C")<=1  Cost: 643  Bytes: 1.699.983  Cardinality: 99.999      
            1 TABLE ACCESS FULL TABLE RAFU.A Cost: 88  Bytes: 1.699.983  Cardinality: 99.999  

;
The analytic way is sorting all rows and restricting rows after. How about collecting the needed information while aggregating. An alternative using keep and first reserved words in SQL.
select b,min(c) c,min(d) keep (dense_rank first order by c) dd
  from a
 group by b
;

Plan
SELECT STATEMENT  ALL_ROWSCost: 93  Bytes: 85  Cardinality: 5          
    2 SORT GROUP BY  Cost: 93  Bytes: 85  Cardinality: 5      
        1 TABLE ACCESS FULL TABLE RAFU.A Cost: 88  Bytes: 1.699.983  Cardinality: 99.999  
Plans are taken from Toad. The cardinality information there is more accurate in the aggregate version. This may have huge influence if such code is a part of a bigger query. Comparing analytic to aggregate keep first approach we get 77% improvement in execution time. It is all CPU time. Using Tom Kyte runstats.

DECLARE
  CURSOR c_rn
  IS
select b,c,d 
  from (
 select b,c,d, row_number()over(partition by b order by c) rn
   from a
)
where rn=1
;
--
  CURSOR c_keepfirst
  IS
select b,min(c) c,min(d) keep (dense_rank first order by c) dd
  from a
 group by b
;
--
BEGIN
   runstats_pkg.rs_start;
  FOR i IN 1 .. 100 LOOP
     FOR rec IN c_rn LOOP
        NULL;
     END LOOP;
  END LOOP;
   runstats_pkg.rs_middle;
  FOR i IN 1 .. 100 LOOP
     FOR rec IN c_keepfirst LOOP
        NULL;
     END LOOP;
  END LOOP;
   runstats_pkg.rs_stop;
END;
/

STAT...CPU used by this sessio       1,872         426      -1,446
STAT...recursive cpu usage           1,872         424      -1,448
STAT...Elapsed Time                  1,873         424      -1,449


No comments:

Post a Comment

About Me

My photo
I am Timo Raitalaakso. I have been working since 2001 at Solita Oy as a Senior Database Specialist. My main focus is on projects involving Oracle database. Oracle ACE alumni 2012-2018. In this Rafu on db blog I write some interesting issues that evolves from my interaction with databases. Mainly Oracle.