Questions tagged [data]
Questions mostly concerned with managing data, without focus on pre-processing or modelling.
857 questions
5 votes
1 answer
36 views
Analyzing if my email notifications increase or decrease total subscriptions
I am hoping to reach someone who knows how to interpret data; if not, someone with better logic than me would still help :) I had around 9000 users paying for monthly subscriptions for a service on ...
5 votes
1 answer
49 views
How to do Exploratory Data Analysis when my response variable is binary?
I am doing a multilevel regression, and my response variable is binary (presence of females on a tech board). All the EDA methods I know are about plotting correlation, but as this is a binary I ...
2 votes
0 answers
13 views
Question about preprocessing two time-series datasets from different measurement devices
I have a question regarding the preprocessing step in a project I'm working on. I have two different measurement devices that both collect time-series data. My goal is to analyze the similarity ...
2 votes
0 answers
21 views
Pipeline Orchestration Tool with a large Number of Nodes
We are currently looking for a pipeline orchestration tool to refactor a complex biodata pipeline. However, since we are dealing with biodata, the orchestration tool would have to manage an ...
2 votes
2 answers
40 views
Best application to convert snapshot to data
For various data science projects, I frequently need to convert snapshots from Excel files to proper data frames, which I later use in Python or R. One such example snapshot is below. I would like to ...
0 votes
0 answers
59 views
What do the edges represent in the "Gnutella Peer-to-Peer Network, August 8, 2002" dataset?
I was tasked with analysing the "Gnutella peer-to-peer network, August 8 2002" dataset and given only very limited information about it. In particular, I want to figure out what exactly the ...
0 votes
0 answers
11 views
Cochran-Mantel-Haenszel Test Using Weighted Data
I'm doing an analysis on birth control methods used postpartum and pregnancy ambivalence. For the analysis, I'm using PRAMS data which is weighted and stratified by race/ethnicity. Everything that I'...
0 votes
0 answers
10 views
How to Represent Structured Inputs in a Neural Network for Multi-Entity Prediction?
I'm building a neural network model to predict which student in a class will achieve the highest score on an upcoming exam (this is not the actual task, I actually modified the task to maintain ...
5 votes
1 answer
55 views
Community-driven Open Data Platforms
Does anybody know of community-driven open data platforms? For example, consider the object detection task. The platforms that come to mind are Kaggle and Roboflow. However, in my opinion, both have a ...
0 votes
1 answer
16 views
Looking for PDF samples in different categories to train a data classification model
I'm looking for some 50-70 documents in each of the categories below to train a custom classification model that can identify the document category: Business card, Booklet, Post card, Calendar, Letter ...
1 vote
1 answer
40 views
Handling negative/near-zero EPS in financial time series analysis - ratio metrics vs raw data approach?
I'm working with financial time series data on a large global universe of companies. Specifically using fundamentals from FactSet right now, and my question concerns earnings per share (EPS), and I'm ...
1 vote
0 answers
38 views
Find and filter break times in data set (Python)
I have a data record that contains the hourly output of a machine and the corresponding time stamp. Now I have the problem that there are break times during which the hourly performance naturally ...
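One way to approach this kind of filtering is sketched below. It is only an illustration under stated assumptions: the break windows are taken to be known in advance, and the `BREAKS` list, the `in_break` helper, and the sample records are all hypothetical, not taken from the question.

```python
from datetime import datetime, time

# Assumed break windows as (start, end) times of day; the end is exclusive.
BREAKS = [(time(12, 0), time(13, 0))]

def in_break(ts, breaks=BREAKS):
    """Return True if the timestamp's time of day falls inside any break window."""
    return any(start <= ts.time() < end for start, end in breaks)

# Hypothetical hourly records: (timestamp, machine output for that hour).
records = [
    (datetime(2024, 1, 8, 11, 0), 118),
    (datetime(2024, 1, 8, 12, 0), 35),
    (datetime(2024, 1, 8, 13, 0), 121),
]

# Keep only the records that lie outside the break windows.
working = [(ts, output) for ts, output in records if not in_break(ts)]
print(working)
```

If the break times are not known beforehand, they would have to be detected from the data first, which this sketch does not attempt.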
0 votes
0 answers
11 views
What evaluation method is suitable if the detected data size is different from the actual (expected) one?
I want to evaluate a sequential tone detection system. Although the results are similar to what is expected, the problem is that the data size is different between the predicted data and the actual ...
0 votes
1 answer
96 views
Permutation importance question - all zero for the features
I have the following code: ...
0 votes
0 answers
12 views
How to find a list of categorical columns that have same values?
I would like to do something like what we do with loc() for rows, but now I want the columns that have the same values, or to apply some filter on the unique values to find the columns. Things that I ...
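A minimal sketch of one reading of this question, assuming "same values" means that two columns share an identical set of distinct values. The table is modelled as a plain dict of lists standing in for a data frame, and `group_columns_by_values` and the sample data are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

def group_columns_by_values(table):
    """Group column names whose sets of distinct values are identical."""
    groups = defaultdict(list)
    for name, values in table.items():
        # A frozenset ignores order and repeats, so it captures the unique values.
        groups[frozenset(values)].append(name)
    # Keep only the value sets shared by two or more columns.
    return [names for names in groups.values() if len(names) > 1]

# Hypothetical toy data: column name -> list of categorical values.
table = {
    "size_a": ["S", "M", "L", "M"],
    "size_b": ["L", "S", "M", "S"],
    "colour": ["red", "blue", "red", "red"],
}

shared = group_columns_by_values(table)
print(shared)
```

With pandas, the same idea could presumably be applied by iterating over the columns of a data frame and hashing each column's unique values, but that variant is not shown here.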