Here's the scenario: I have an item x (item_id, customer_number, cost) that can be submitted to another system multiple times over a period of months. That system may or may not reject item x before finally accepting it at some point.
Reporting requirement: for an arbitrary date range (start date and end date), return the number of unique items that were rejected in that range. (Note that if item x was rejected three times in the range, we would want to count it only once.)
Realistic example data:
Item X Rejected 01/02/2014
Item X Rejected 01/03/2014
Item X Rejected 02/15/2014
If I run the report for 01/01/2014 to 02/01/2014, I should get a count of only 1 for item x. It gets weird when I run the report for January and then also run it for February: the same item should show up as a count of 1 in each month, but if I run it for the first 3 months of the year, it should still show up as a count of only 1.
The problem: I am dealing with billions of records in the database. Normally we would just pre-calculate totals for the data and bucket them by month. We can't do that in this case: for an arbitrary date range, the January and February buckets from the prior example can't simply be added together, because that would give a count of 2 for item x instead of the unique total of 1.
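To make the non-additivity concrete, here is a minimal Python sketch using the three example rejections above. The function name `distinct_rejected` and the half-open range convention (start inclusive, end exclusive) are assumptions made for illustration, not part of the real system.

```python
from datetime import date

# Rejection log from the example above: (item_id, rejection_date).
rejections = [
    ("X", date(2014, 1, 2)),
    ("X", date(2014, 1, 3)),
    ("X", date(2014, 2, 15)),
]

def distinct_rejected(rows, start, end):
    """Count distinct items rejected in the half-open range [start, end)."""
    return len({item for item, day in rows if start <= day < end})

# Per-month distinct counts, as a monthly bucket table would store them.
january = distinct_rejected(rejections, date(2014, 1, 1), date(2014, 2, 1))
february = distinct_rejected(rejections, date(2014, 2, 1), date(2014, 3, 1))

# Distinct count over the whole first quarter, computed from the raw rows.
first_quarter = distinct_rejected(rejections, date(2014, 1, 1), date(2014, 4, 1))

print(january, february, january + february, first_quarter)
```

Adding the two monthly buckets double-counts item X, while the count over the raw rows does not. Any pre-calculated structure therefore has to keep item identities (or a sketch of them), not just totals.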
Question: Is there a way to pre-calculate data for unique counts like this for arbitrary date ranges? Does anyone have any suggestions for optimization of reporting here (not involving throwing more hardware at it)?
We are using Oracle Database 11g.
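CREATE MATERIALIZED VIEW
will be your friend here. (Oracle has no OR REPLACE form for materialized views; to change one, drop it and create it again.) See "Materialized View Concepts" in the Oracle documentation. Share and enjoy.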
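One way a materialized view might help, sketched here under assumptions: pre-reduce the raw rejections to one row per item per rejection day, then run the distinct count over that smaller set. The sketch below uses Python with SQLite purely so that it runs anywhere; the summary table `item_reject_days` stands in for the materialized view, and the table names, column names, the duplicate rejection on 2014-01-02, and item Y are all hypothetical. It does not use Oracle syntax and says nothing about how well this performs at billions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rejections (item_id TEXT, rejected_on TEXT);
    INSERT INTO rejections VALUES
        ('X', '2014-01-02'), ('X', '2014-01-02'), ('X', '2014-01-03'),
        ('X', '2014-02-15'), ('Y', '2014-02-20');
    -- Stand-in for the materialized view: one row per item per rejection day.
    CREATE TABLE item_reject_days AS
        SELECT DISTINCT item_id, rejected_on FROM rejections;
    CREATE INDEX item_reject_days_ix ON item_reject_days (rejected_on, item_id);
""")

def unique_rejected(start, end):
    """Distinct items rejected in [start, end), read from the summary table."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT item_id) FROM item_reject_days "
        "WHERE rejected_on >= ? AND rejected_on < ?",
        (start, end),
    ).fetchone()
    return row[0]

print(unique_rejected("2014-01-01", "2014-02-01"))  # January
print(unique_rejected("2014-02-01", "2014-03-01"))  # February
print(unique_rejected("2014-01-01", "2014-04-01"))  # first quarter
```

The distinct count still happens at query time, so this only pays off to the extent that the summary is much smaller than the raw data, for example when items are often rejected several times on the same day.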