- this seems pretty straightforward, though there might be a slightly more optimal way than what I'm thinking of. My plan is to:
  - use a `collections.Counter()` to get the number of occurrences of each string in `arr`. This takes $O(n)$ time and space.
  - I then want to find the items that have a count of 1, and get the $k$th one of those. There are a few ways I could do this:
    - initialize a variable `n_dist` to 0, then loop over `arr` and check if each item's count is 1. If yes, increment `n_dist` by 1. If `n_dist == k`, return the item. If I exhaust the list, return `""`. This would take $O(n)$ time.
    - `Counter`s preserve insertion order, so I could instead loop over its items and increment `n_dist` by 1 each time I encounter a key whose value is 1, and return the key when `n_dist == k` (or `""` if I exhaust the `Counter`). This would take $O(m)$ time, where $m$ is the number of unique items in `arr`; that could be all of them, so it'd be $O(n)$ in the worst case.
    - use `Counter.most_common()` to get a list of `(item, count)` tuples in descending order of count, with tied counts appearing in insertion order. Index this with `-k` to get the $k$th least common item and its count. If its count is 1, return the item; otherwise return `""`. I initially thought this might be the best approach, but I think it's actually the worst one, because `.most_common()` has to sort the items by count internally, which could take up to $O(n \log n)$ time if all items in `arr` are unique.
  - given this, I'll go with the second option... though I might also try the third one to compare, because my intuition is that for small $n$ (`arr` contains at most 1,000 items), an $O(n \log n)$ operation in C might end up being faster than an $O(n)$ operation in Python.
  - actually, never mind -- I can't index it with `-k`. I'd have to iterate over the `most_common()` list in reverse to find the total number of elements with a count of 1 (`n_distincts`), then index it with `-(n_distincts - k + 1)` to get the $k$th item with a count of 1 (see the quick demo below). That makes it not worth it, I think.
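To make that concrete, here's a quick demo of how `most_common()` orders ties (the example array is mine, not from the problem):

```python
from collections import Counter

arr = ["d", "b", "c", "b", "c", "a"]
counts = Counter(arr)
print(counts.most_common())
# [('b', 2), ('c', 2), ('d', 1), ('a', 1)]
# The count-1 items sit at the end in insertion order, so the kth
# distinct item is at index -(n_distincts - k + 1), not -k: the 1st
# distinct item here is 'd' at index -2, while [-1] gives 'a'.
```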
- This won't change the asymptotic complexity, but I thought of a way to do this in one pass ($O(n)$) instead of two ($O(2n)$), at the cost of some additional (less than $O(n)$) space. I could:
  - initialize a dict `distincts` to store distinct items as keys with values of `None`
    - this basically acts as a set that provides $O(1)$ lookup, insertion, and deletion, but also preserves insertion order
  - initialize a set `non_distincts` to store items that are not distinct
  - for each item in `arr`:
    - if it's in `non_distincts`, continue on. If it's not, then check whether it's in `distincts`:
      - if so, remove it from `distincts` and add it to `non_distincts`
      - otherwise, add it to `distincts`
  - if the length of `distincts` is $< k$, return `""`. Otherwise, use `itertools.islice` to return the $k$th key.
- I might try this as well, just to compare.
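As a quick sanity check of the dict-as-ordered-set trick before the full solutions (the example keys are mine):

```python
# A dict with None values works as an insertion-ordered set: membership
# tests, inserts, and deletes are all O(1) on average, and iteration
# yields keys in insertion order (guaranteed since Python 3.7).
distincts = {}
for key in ["b", "a", "c"]:
    distincts[key] = None
del distincts["a"]
print(list(distincts))   # ['b', 'c'] -- insertion order preserved
print("b" in distincts)  # True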
```python
from collections import Counter
from typing import List


class Solution:
    def kthDistinct(self, arr: List[str], k: int) -> str:
        counts = Counter(arr)
        n_distincts = 0
        for item, count in counts.items():
            if count == 1:
                n_distincts += 1
                if n_distincts == k:
                    return item
        return ""
```
```python
from itertools import islice
from typing import List


class Solution:
    def kthDistinct(self, arr: List[str], k: int) -> str:
        distincts = {}
        non_distincts = set()
        for item in arr:
            if item not in non_distincts:
                if item in distincts:
                    # Second occurrence: no longer distinct.
                    del distincts[item]
                    non_distincts.add(item)
                else:
                    distincts[item] = None
        if len(distincts) < k:
            return ""
        return next(islice(distincts.keys(), k - 1, None))
```
Huh, I'm surprised both that this one was slower and that it used less memory. I think with a sufficiently large `arr`, the result would be the opposite.
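If I wanted to test that hunch, a rough harness along these lines could work (the standalone function names and input sizes here are my own choices; this is a sketch, not a rigorous benchmark):

```python
import random
import string
import timeit
from collections import Counter
from itertools import islice


def kth_distinct_counter(arr, k):
    # Same logic as the first Solution above.
    counts = Counter(arr)
    n_distincts = 0
    for item, count in counts.items():
        if count == 1:
            n_distincts += 1
            if n_distincts == k:
                return item
    return ""


def kth_distinct_single_pass(arr, k):
    # Same logic as the second Solution above.
    distincts = {}
    non_distincts = set()
    for item in arr:
        if item in non_distincts:
            continue
        if item in distincts:
            del distincts[item]
            non_distincts.add(item)
        else:
            distincts[item] = None
    if len(distincts) < k:
        return ""
    return next(islice(distincts, k - 1, None))


# Build an input far larger than LeetCode's 1,000-item cap; the pool size
# is arbitrary and controls how many strings end up distinct.
random.seed(0)
pool = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(60_000)]
arr = random.choices(pool, k=100_000)

for fn in (kth_distinct_counter, kth_distinct_single_pass):
    print(fn.__name__, timeit.timeit(lambda: fn(arr, 10), number=10))
```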