
I am a data science beginner looking for some direction on a problem. I hope it is OK to ask my question here.

My data set consists of the overall memory usage of a JVM, sampled every 30 s, together with the IDs of the tasks that were running at each sample. Each task runs in its own thread, and at most 5 threads can run simultaneously. Tasks also run on different schedules: some run several times a day, others only once a week.
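
To make the setup concrete, here is a minimal sketch of how the data could be represented: one record per 30-second sample, plus a 0/1 matrix marking which tasks were active in each sample. The names `Sample`, `memory_bytes`, and `indicator_matrix` are hypothetical and only for illustration; the limit of 5 concurrent tasks is the one stated above.

```python
from dataclasses import dataclass

MAX_CONCURRENT_TASKS = 5  # at most 5 task threads run at the same time

@dataclass(frozen=True)
class Sample:
    """One 30-second observation (hypothetical field names)."""
    timestamp_s: int          # seconds since the start of recording
    memory_bytes: int         # overall JVM memory usage at that moment
    active_tasks: frozenset   # IDs of the tasks running at that moment

    def __post_init__(self):
        # Reject samples that claim more running tasks than there are threads
        if len(self.active_tasks) > MAX_CONCURRENT_TASKS:
            raise ValueError("more active tasks than available threads")

def indicator_matrix(samples, task_ids):
    """Rows = samples, columns = task_ids; 1 if the task was active in that sample."""
    return [[1 if t in s.active_tasks else 0 for t in task_ids] for s in samples]

samples = [
    Sample(0, 512_000_000, frozenset({"A", "B"})),
    Sample(30, 300_000_000, frozenset({"B"})),
]
print(indicator_matrix(samples, ["A", "B", "C"]))  # [[1, 1, 0], [0, 1, 0]]
```

Each row of the matrix lines up with one memory reading, so any per-task analysis can work from the pair (memory reading, row of 0/1 flags).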

My job is to estimate how much memory each task uses, or at least to tell which tasks are "heavy" in memory usage and which are "lightweight". (I know there are other ways to measure memory usage in Java, but my employer does not want to change the source code.)

I am not sure how to approach this, since it does not seem to be a standard classification, regression, or clustering problem. A simple idea would be to count how often each task is active when memory usage is high. A more complicated idea would be to use masked learning: train a model that predicts the active tasks from the memory usage and, in doing so, learns how much memory each thread uses. But I am not sure whether this would work.
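
The simple counting idea could look roughly like the sketch below. It is only an illustration under assumptions: each sample is a hypothetical `(memory_bytes, active_task_ids)` pair, and "high" means at or above a threshold that I would have to choose myself (for example, a high percentile of all observed readings). For each task it reports the fraction of that task's active samples in which memory was high.

```python
from collections import Counter

def high_memory_share(samples, threshold):
    """For each task ID, the fraction of its active samples with memory >= threshold.

    samples: iterable of (memory_bytes, active_task_ids) pairs (hypothetical layout).
    """
    active = Counter()  # number of samples in which each task was running
    high = Counter()    # number of those samples where memory was "high"
    for memory, tasks in samples:
        for t in tasks:
            active[t] += 1
            if memory >= threshold:
                high[t] += 1
    return {t: high[t] / active[t] for t in active}

samples = [(900, {"A", "B"}), (850, {"A"}), (200, {"B", "C"}), (300, {"C"})]
shares = high_memory_share(samples, threshold=800)
print(sorted(shares.items()))  # [('A', 1.0), ('B', 0.5), ('C', 0.0)]
```

One weakness of this heuristic: a lightweight task that usually runs alongside a heavy one will also score high, and tasks that run only once a week contribute few samples, so their scores are noisy.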

If anyone knows of similar problems or can point me in a direction for solving this, I would be very grateful.

Thanks for your time.