7

I'm having trouble creating a MySQL-compatible algorithm for this.

Background

App with MySQL, Perl and JS. It's a booking system where each booking consists of a start, an end and a qty. Start and end are timestamps.

Schema simplified to a single table:

```
| bookings |-----------|
| id       | pkey      |
| start    | timestamp |
| end      | timestamp |
| qty      | int       |
```

Question

In SQL, how do you check how many resources are booked at once during a given timeRange? Code with an explanation, or an algorithm that is SQL-compatible, both work.

So, for the following schedule:

```
09:00 -----             <-|
09:30 |   |               |  A maximum of 12 are booked at once during this range
10:00 |x7 |               |
10:30 ----- ----- -----   |
11:00       |   | |   |   |
11:30       |x2 | |x10| <-|
12:00       |   | |   |
12:30       ----- -----
```

I should get 12 since the x2 and x10 bookings don't overlap with the x7 booking, so there are never more than 12 items booked at once between 9:00 and 11:30.

Progress

```sql
-- It's been heavily shrunk to show just the relevant part,
-- so it might have some errors
SELECT coalesce(max(qtyOverlap.sum), 0) booked
FROM (
    SELECT coalesce(sum(b2.qty), 0) sum
    FROM booking b1
    LEFT JOIN (
        SELECT b.qty, b.tStart, b.tEnd
        FROM booking b
    ) b2
        ON  b1.tStart < b2.tEnd
        AND b1.tEnd > b2.tStart
        AND b2.tStart < '2015-02-19 16:30:00'
        AND b2.tEnd > '2015-02-19 06:00:00'
    WHERE b1.tStart < '2015-02-19 16:30:00'
      AND b1.tEnd > '2015-02-19 06:00:00'
    GROUP BY b1.id
) qtyOverlap
GROUP BY qtyOverlap.itemId
```

Which is this algorithm:

```
Max of
    For each booking that overlaps given timeRange
        return sum of each booking that overlaps this booking and given timeRange
```

In the schedule above this would be:

max([7],[2+10],[10+2]) = 12 

But given a schedule like:

```
09:00 -----             <-|
09:30 |   |               |  A maximum of 17 are booked at once during this range, not 19
10:00 |x7 |               |
10:30 |   |       -----   |
11:00 -----       |   |   |
11:30 -----       |x10| <-|
12:00 |x2 |       |   |
12:30 -----       -----
```

This gives:

max([7+10],[2+10],[10+7+2]) = 19 

Which is wrong.
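To see the flaw concretely, here is a minimal JavaScript sketch of that algorithm (timestamps reduced to minutes since midnight, an assumption purely for illustration) run against the second schedule:

```javascript
// Sketch of the flawed algorithm: for each booking, sum the quantities
// of every booking that overlaps it (a booking always overlaps itself),
// then take the max. Times are minutes since midnight.
function maxPairwiseSum(bookings) {
  return Math.max.apply(null, bookings.map(function (b1) {
    return bookings
      .filter(function (b2) { return b1.start < b2.end && b1.end > b2.start; })
      .reduce(function (sum, b2) { return sum + b2.qty; }, 0);
  }));
}

// Second schedule: x7 (09:00-11:00), x10 (10:30-12:30), x2 (11:30-12:30)
var flawedMax = maxPairwiseSum([
  { start: 540, end: 660, qty: 7 },
  { start: 630, end: 750, qty: 10 },
  { start: 690, end: 750, qty: 2 }
]);
// flawedMax is 19: the x10 row pulls in both x7 and x2,
// even though x7 and x2 never coincide
```

On the first schedule the same sketch happens to return the correct 12, which is why the flaw is easy to miss.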

The only way I can think of to fix this is to use recursion (which isn't MySQL-compatible, AFAIK).

It would look something like this (in working JS code):

```javascript
function getOverlaps(bookings, range) {
  return bookings.filter(function (booking) {
    return isOverLapping(booking, range);
  });
}

function isOverLapping(a, b) {
  return (a.start < b.end && a.end > b.start);
}

function maxSum(booking, overlaps) { // main recursive function
  var currentMax = 0;
  var filteredOverlaps = getOverlaps(overlaps, booking);
  for (var i = 0; i < filteredOverlaps.length; i++) {
    currentMax = Math.max(
      maxSum(filteredOverlaps[i], removeElement(filteredOverlaps, i)),
      currentMax
    );
  }
  return currentMax + booking.qty;
}

function removeElement(array, i) {
  var clone = array.slice(0);
  clone.splice(i, 1);
  return clone;
}

var maxBooked = maxSum(timeRange, getOverlaps(bookings, timeRange));
```

Visual JSFiddle demo

Any way to do this in SQL? (any reasonable way, that is)

Update

I just tried a stored procedure recursion emulation method as documented here. But part way through implementing it, I tried it with the demo data and decided the performance was far too terrible. Actually, it just needed indexing. Now it's just 'kinda' bad.

2
  • Convert your time-ranges to time-slots and your problem becomes a lot easier to solve.
    – Pieter B
    Commented Feb 26, 2015 at 9:16
  • @PieterB He could even use one-minute time-slots as worst case scenario.
    Commented Feb 26, 2015 at 17:53

3 Answers

6

This is tricky, because you've modeled your bookings as time intervals with granularity as fine as your DB allows. Perfectly natural to do, but as you've found out it makes some comparisons difficult.

```
Max of
    For each booking that overlaps given timeRange
        return sum of each booking that overlaps this booking and given timeRange
```

The problem with this algorithm is that it checks each other booking interval against the one currently being examined (the foreach iteration), but it doesn't check overlapping bookings against each other to see if they line up. A run-through of your second example goes like this:

  • Select 7x booking
    • 7x does not overlap with 2x; +0
    • 7x overlaps with 10x; +10
    • Total 17
  • Select 2x booking
    • 2x does not overlap with 7x; +0
    • 2x overlaps with 10x; +10
    • Total 12
  • Select 10x booking
    • 10x overlaps with 7x; +7
    • 10x overlaps with 2x; +2
    • [Missing step: check if 7x and 2x overlap]
    • Total 19
  • Max 19

Is it reasonably possible to map your data to nice, neat discrete chunks of some size? For example, do your bookings generally begin and end on the 15s (12:00, 12:15, 12:30, 12:45)? If so, you can change your algorithm to compare bookings against a static time interval, rather than each other, and drastically reduce the required number of comparisons:

```
Max of
    For each 15 minute chunk in timeRange
        Sum quantities of all bookings overlapping this chunk
```
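Assuming minute-granularity timestamps (plain numbers stand in for DATETIMEs here; a sketch of the idea, not the final SQL), the chunked algorithm looks like:

```javascript
// Sketch of the chunked algorithm: compare bookings against fixed
// time slices instead of against each other. Times are minutes
// since midnight, purely for illustration.
function maxBookedChunked(bookings, rangeStart, rangeEnd, chunkMinutes) {
  var max = 0;
  for (var t = rangeStart; t < rangeEnd; t += chunkMinutes) {
    var chunkEnd = t + chunkMinutes;
    var sum = bookings
      .filter(function (b) { return b.start < chunkEnd && b.end > t; })
      .reduce(function (acc, b) { return acc + b.qty; }, 0);
    max = Math.max(max, sum);
  }
  return max;
}

// Second schedule: x7 (09:00-11:00), x10 (10:30-12:30), x2 (11:30-12:30)
var result = maxBookedChunked([
  { start: 540, end: 660, qty: 7 },
  { start: 630, end: 750, qty: 10 },
  { start: 690, end: 750, qty: 2 }
], 540, 750, 15);
// result is 17: each chunk only counts bookings that actually
// cover that chunk at the same time, so 19 can never appear
```

Shrinking the chunk size (e.g. to one minute) changes only the number of iterations, not the answer for data aligned to the chunk boundaries.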

In terms of SQL implementation, choose an interval size and use a numbers or tally table to generate an inline query to create the chunks:

```sql
select @startTime + interval (15 * numbers.value) minute as start
     , @startTime + interval (15 * (numbers.value + 1)) minute as end
from numbers
where (@startTime + interval (15 * numbers.value) minute) < @endTime
```

(Off the cuff, may contain minor syntax or math errors)

This is a relatively sane way of performing this query in SQL without recursion. It has the obvious drawback that it will never perfectly align with your current schema, but do you really need absolute perfection?

I've used 15 minutes as an example size. You can easily make this as finely grained as you like: 5 minutes, 1 minute, 1 second, etc. There must be some point at which the granularity is too fine, because MySQL's timestamp type does not possess arbitrary precision. "Booking" to me implies something involving humans actually showing up. If this is true, I can't imagine that an interval size smaller than one minute would be appropriate.

In the comments you expressed some concern about performance because of a large number of comparisons. Complexity for this algorithm is O(n*m), where n is the number of chunks (time range / interval size) and m is the number of booking rows within the given time range. I'll hazard that in practice, n >> m, meaning that what really matters to the computation time is the number of intervals. This should be a non-issue, as long as you use sane timeframes and your DB is indexed and maintained correctly. For example, using an interval size of one second for the time range in your question (9:00 - 11:30) is only 9000 intervals to inspect. 9000 rows is paltry to an SQL server. I trust this to be performant a lot more than I do using dynamic SQL to emulate recursion.

If the interval size is 50 million times smaller than the time range, then yes, this will take a significant amount of time to run (notice I didn't say perform poorly), because you'll be running a query against 50 million rows. But is querying the maximum bookings for every millisecond in a twelve hour span (43.2 million ms) reasonable and necessary? There are only 604800 seconds in a week. Performing a query on a set that size, while not trivial, shouldn't give an SQL server any difficulty.

What does your data look like? How fine of an inspection period do you need? If there's a two minute (or second, decasecond, millisecond...) interval where there are 105 bookings instead of 100 because someone entered an "unusual" end time, will that destroy the integrity of your report, or can that data be discarded as noise? I can't answer these questions, but some simple data and requirements analysis on your part can.

7
  • +1 for converting timerange to chunks. That way you can easily add them together and get your answer.
    – Pieter B
    Commented Feb 26, 2015 at 9:15
  • This query will mostly be used to tell the user how many items are available while they are making a booking. So mis-data would turn into overbookings or would block the user from booking valid quantities. I could impose a minimum granularity but the application will be used by a large variety of clients and some of them do use odd time intervals. I really hate to limit the data for this one type of query. Maybe I'll just client-side the whole thing. With a version in perl for server-side validation. But thank you, I didn't realise you could map timestamps to intervals on the fly.
    – DanielST
    Commented Feb 26, 2015 at 14:49
  • 1
    @slicedtoad Set the granularity to one minute and use this solution.
    Commented Feb 26, 2015 at 17:48
  • @user61852 I'll run some test cases but that sounds like it would perform really bad for any long selection to test against. It will not be uncommon to have long and short bookings together. But let me give that a go.
    – DanielST
    Commented Feb 26, 2015 at 18:39
  • @slicedtoad Just as user61852 says, use a one minute interval. The computational expense of this algorithm grows with the number of chunks; everything else is negligible. As long as you aren't using many millions of chunks at once (e.g. 1ms chunk, 1 day interval), there shouldn't be any performance issue, assuming your DB is properly indexed and maintained. Operating on large sets is what SQL is designed for. Pushing all the data client side to perform the computation is a poor solution: bandwidth usage, transfer time, data leaking, precludes cached reports...
    Commented Feb 26, 2015 at 18:48
1

So Esoteric's solution worked, but it still bothered me since it feels kinda brute-force-ish. I knew there had to be a solution that only looked at the relevant data (start, end and qty) and didn't need to translate it into a different form.

Then I remembered order by and a solution hit me.

Ordered edge tally

  1. Create a list of edges and their quantities (start U end). End edges get negated qtys.
  2. Order them by date (for duplicate dates, end goes first).
  3. Make a running total and combine any duplicate dates.

```
+---------------------+-----------+-------+
| edgedate            | qtyChange | tally |
+---------------------+-----------+-------+
| 2015-02-19 09:00:00 |         7 |     7 |
| 2015-02-19 10:30:00 |        10 |    17 |
| 2015-02-19 11:00:00 |        -7 |    10 |
| 2015-02-19 11:30:00 |         2 |    12 |
| 2015-02-19 12:30:00 |       -12 |     0 |
+---------------------+-----------+-------+
```

  4. Return max tally.

Actual SQL:

```sql
SET @i = 0;
SELECT max(edge.tally)
FROM (
    SELECT sum(@i := b1.qty + @i) AS tally /* Cumulative sum and combine any duplicate dates */
    FROM (
        /* Get every edge (start U end) */
        SELECT tstart, qty, 1 AS ord
        FROM booking b
        WHERE b.tstart < '2015-02-19 12:30:00'
          AND b.tend > '2015-02-19 08:00:00'
        UNION
        SELECT tend AS tstart, (qty * -1) AS qty, 0 AS ord /* End edges have negative qtys */
        FROM booking b
        WHERE b.tstart < '2015-02-19 12:30:00'
          AND b.tend > '2015-02-19 08:00:00'
        ORDER BY tstart, ord
    ) b1
    GROUP BY b1.tstart
) edge;
```

Perfect precision, no joins, minimum complexity (my big O notation skills are lacking, maybe O(2*b) where b is the number of bookings?)
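The same ordered edge tally can be sketched in JavaScript (minutes since midnight stand in for the timestamps; an illustration of the algorithm, not the production code):

```javascript
// Ordered edge tally: one +qty edge per start, one -qty edge per end,
// sorted by time (ends before starts on ties), then a running total.
// Times are minutes since midnight for illustration.
function maxTally(bookings) {
  var edges = [];
  bookings.forEach(function (b) {
    edges.push({ t: b.start, change: +b.qty, ord: 1 });
    edges.push({ t: b.end, change: -b.qty, ord: 0 });
  });
  edges.sort(function (a, b) { return a.t - b.t || a.ord - b.ord; });
  var tally = 0, max = 0;
  edges.forEach(function (e) {
    tally += e.change;
    max = Math.max(max, tally);
  });
  return max;
}

// Second schedule: x7 (09:00-11:00), x10 (10:30-12:30), x2 (11:30-12:30)
var peak = maxTally([
  { start: 540, end: 660, qty: 7 },
  { start: 630, end: 750, qty: 10 },
  { start: 690, end: 750, qty: 2 }
]);
// peak is 17
```

Processing end edges before start edges at the same timestamp is what keeps back-to-back bookings from being counted as concurrent.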

EXPLAIN output:

```
+------+--------------+------------+-------+---------------+--------+---------+------+------+---------------------------------+
| id   | select_type  | table      | type  | possible_keys | key    | key_len | ref  | rows | Extra                           |
+------+--------------+------------+-------+---------------+--------+---------+------+------+---------------------------------+
|    1 | PRIMARY      | <derived2> | ALL   | NULL          | NULL   | NULL    | NULL |    5 |                                 |
|    2 | DERIVED      | <derived3> | ALL   | NULL          | NULL   | NULL    | NULL |    6 | Using temporary; Using filesort |
|    3 | DERIVED      | b          | range | tstart,tend   | tstart | 9       | NULL |    2 | Using where                     |
|    4 | UNION        | b          | range | tstart,tend   | tstart | 9       | NULL |    2 | Using where                     |
| NULL | UNION RESULT | <union3,4> | ALL   | NULL          | NULL   | NULL    | NULL | NULL | Using filesort                  |
+------+--------------+------------+-------+---------------+--------+---------+------+------+---------------------------------+
```
1
  • Nice. This is O(n log n) though, because the data must be sorted before performing the bookings summation. But using the same assumptions about the data from my answer, O(n log n) ought to be nontrivially smaller than my O(m) (m = time slices) solution.
    Commented Feb 28, 2015 at 1:39
0

This is a possibility with SQL; however, it needs to generate a sequence of numbers. Since the SQL Server instance I was testing on does not support sequence generation directly, I fetched the sequence from sys.all_objects with the ROW_NUMBER() function:

```sql
SELECT n = ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects
```

The approach is to generate a view with time intervals small enough for your booking system's smallest allowed time slot (in this case I used 5 minutes, and you can change it as you want):

```sql
select DATEADD(MINUTE, 5 * n, '2015-02-19 08:00:00') t_start,
       DATEADD(MINUTE, 5 * (n + 1), '2015-02-19 08:00:00') t_end
from bookings b, (
    SELECT n = ROW_NUMBER() OVER (ORDER BY [object_id])
    FROM sys.all_objects
) numbers
where DATEADD(MINUTE, 5 * (n + 1), '2015-02-19 08:00:00') < '2015-02-19 17:00:00'
```

The date value passed to the DATEADD function is the range's start time, and the literal used in the WHERE clause is the end time. This will generate a result set like this:

```
t_start                 | t_end
--------------------------------------------------
2015-02-19 08:05:00.000 | 2015-02-19 08:10:00.000
2015-02-19 08:10:00.000 | 2015-02-19 08:15:00.000
2015-02-19 08:15:00.000 | 2015-02-19 08:20:00.000
.................
```

Going a bit beyond the question, you can see the sum for each period with this query:

```sql
select tInt.t_start,
       tInt.t_end,
       (select sum(b.qty)
        from bookings b
        where b.tstart <= tInt.t_start
          and b.tend >= tInt.t_end) as total
from (
    select DATEADD(MINUTE, 5 * n, '2015-02-19 08:00:00') t_start,
           DATEADD(MINUTE, 5 * (n + 1), '2015-02-19 08:00:00') t_end
    from bookings b, (
        SELECT n = ROW_NUMBER() OVER (ORDER BY [object_id])
        FROM sys.all_objects
    ) numbers
    where DATEADD(MINUTE, 5 * (n + 1), '2015-02-19 08:00:00') < '2015-02-19 17:00:00'
) tInt
```

will result in,

```
t_start                 | t_end                   | total
-----------------------------------------------------------
2015-02-19 08:55:00.000 | 2015-02-19 09:00:00.000 | NULL
same repeating.....     |                         |
2015-02-19 09:00:00.000 | 2015-02-19 09:05:00.000 | 7
same repeating.....     |                         |
2015-02-19 10:30:00.000 | 2015-02-19 10:35:00.000 | NULL
same repeating.....     |                         |
2015-02-19 11:00:00.000 | 2015-02-19 11:05:00.000 | 10
same repeating.....     |                         |
2015-02-19 12:00:00.000 | 2015-02-19 12:05:00.000 | 12
same repeating.....     |                         |
2015-02-19 12:30:00.000 | 2015-02-19 12:35:00.000 | NULL
same repeating.....
```

Now all you have to do is get the max value.
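That last step can be sketched in JavaScript (a sketch only: minutes since midnight stand in for the DATETIMEs, and the filter mirrors this answer's containment condition, booking start <= interval start and booking end >= interval end):

```javascript
// Sketch of this answer's approach: 5-minute intervals over 08:00-17:00,
// per-interval totals using the query's containment condition, then the max.
function maxIntervalTotal(bookings, rangeStart, rangeEnd, stepMinutes) {
  var max = 0;
  for (var n = 0; rangeStart + stepMinutes * (n + 1) < rangeEnd; n++) {
    var tStart = rangeStart + stepMinutes * n;
    var tEnd = rangeStart + stepMinutes * (n + 1);
    var total = bookings
      .filter(function (b) { return b.start <= tStart && b.end >= tEnd; })
      .reduce(function (sum, b) { return sum + b.qty; }, 0);
    max = Math.max(max, total);
  }
  return max;
}

// Second schedule: x7 (09:00-11:00), x10 (10:30-12:30), x2 (11:30-12:30)
var maxBooked = maxIntervalTotal([
  { start: 540, end: 660, qty: 7 },
  { start: 630, end: 750, qty: 10 },
  { start: 690, end: 750, qty: 2 }
], 480, 1020, 5);
// maxBooked is 17
```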
