
Questions tagged [data]

Questions mostly concerned with managing data, without focus on pre-processing or modelling.

5 votes
1 answer
36 views

Analyzing if my email notifications increase or decrease total subscriptions

I am hoping to reach someone who knows how to interpret data; if not, someone with better logic than me would still help :) I had around 9,000 users paying for monthly subscriptions for a service on ...
adrianTNT
5 votes
1 answer
49 views

How to do Exploratory Data Analysis when my response variable is binary?

I am doing a multilevel regression, and my response variable is binary (presence of females on a tech board). All the EDA methods I know are about plotting correlations, but as this is a binary I ...
Anya
2 votes
0 answers
13 views

Question about preprocessing two time-series datasets from different measurement devices

I have a question regarding the preprocessing step in a project I'm working on. I have two different measurement devices that both collect time-series data. My goal is to analyze the similarity ...
TTC
2 votes
0 answers
21 views

Pipeline Orchestration Tool with a Large Number of Nodes

We are currently looking for a pipeline orchestration tool to refactor a complex biodata pipeline. However, since we are dealing with biodata, the orchestration tool would have to manage an ...
LiKao
2 votes
2 answers
40 views

Best application to convert snapshot to data

For various data science projects, I frequently need to convert snapshots of Excel files into proper data frames, which I later use in Python or R. One such example snapshot is below. I would like to ...
Bogaso
0 votes
0 answers
59 views

What do the edges represent in the "Gnutella Peer-to-Peer Network, August 8, 2002" dataset?

I was tasked with analysing the "Gnutella peer-to-peer network, August 8 2002" dataset and given only very limited information about it. In particular, I want to figure out what exactly the ...
jonupp
0 votes
0 answers
11 views

Cochran-Mantel-Haenszel Test Using Weighted Data

I'm doing an analysis on birth control methods used postpartum and pregnancy ambivalence. For the analysis, I'm using PRAMS data which is weighted and stratified by race/ethnicity. Everything that I'...
April Lopez
0 votes
0 answers
10 views

How to Represent Structured Inputs in a Neural Network for Multi-Entity Prediction?

I'm building a neural network model to predict which student in a class will achieve the highest score on an upcoming exam (this is not the actual task; I modified the task to maintain ...
Saffy
5 votes
1 answer
55 views

Community-driven Open Data Platforms

Does anybody know of community-driven open data platforms? For example, for an object detection task, the following platforms come to mind: Kaggle and Roboflow. However, in my opinion, both have a ...
Leon Useinov
0 votes
1 answer
16 views

Looking for PDF samples in different categories to train a data classification model

I'm looking for some 50-70 documents in each of the categories below to train a custom classification model that can identify the document category: Business card, Booklet, Postcard, Calendar, Letter ...
pradeep
1 vote
1 answer
40 views

Handling negative/near-zero EPS in financial time series analysis - ratio metrics vs raw data approach?

I'm working with financial time series data on a large global universe of companies. Specifically, I'm using fundamentals from FactSet right now; my question concerns earnings per share (EPS), and I'm ...
torkestativ
1 vote
0 answers
38 views

Find and filter break times in a data set (Python)

I have a dataset that contains the hourly output of a machine and the corresponding timestamps. The problem is that there are break times during which the hourly performance naturally ...
Lorenz Meng
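One possible approach, sketched here under assumptions the truncated question does not confirm: treat any hour whose output falls below a fraction of the median hourly output as a break, and separate those rows from the working hours. The function name `split_breaks`, the `fraction=0.2` default and the `(timestamp, output)` tuple layout are all hypothetical; this is not the asker's code or an accepted answer.

```python
from datetime import datetime
from statistics import median


def split_breaks(records, fraction=0.2):
    """Split (timestamp, output) pairs into working hours and suspected breaks.

    Hypothetical rule: an hour counts as a break when its output is below
    `fraction` times the median hourly output of the whole record.
    """
    cutoff = fraction * median(output for _, output in records)
    working = [(ts, out) for ts, out in records if out >= cutoff]
    breaks = [(ts, out) for ts, out in records if out < cutoff]
    return working, breaks


# Made-up illustrative data, not taken from the question.
records = [
    (datetime(2024, 1, 8, 9), 120.0),
    (datetime(2024, 1, 8, 10), 118.5),
    (datetime(2024, 1, 8, 11), 3.0),  # near-zero output, e.g. a lunch break
    (datetime(2024, 1, 8, 12), 121.0),
]
working, breaks = split_breaks(records)
print([ts.hour for ts, _ in breaks])  # [11]
```

A cutoff relative to the median adapts to machines with different typical output; if the breaks follow a fixed schedule, filtering on clock time may be more reliable. In pandas, the same rule would be a boolean mask such as `df[df["output"] >= cutoff]`, where the column name `output` is hypothetical.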
0 votes
0 answers
11 views

What evaluation method is suitable if the detected data size is different from the actual (expected) one?

I want to evaluate a sequential tone detection system. Although the results are similar to what is expected, the problem is that the data size differs between the predicted data and the actual ...
user176504
0 votes
1 answer
96 views

Permutation importance question: all zero for the features

I have the following code: ...
Victorsmoreschi
0 votes
0 answers
12 views

How to find a list of categorical columns that have the same values?

I would like to do something similar to what we do with loc() for rows, but now I want the columns that have the same values, or to apply some filter on the unique values to find those columns. Things that I ...
Apollo
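One possible approach, sketched in plain Python under the assumption that the data is available column-wise as a dict of lists (a pandas DataFrame can be converted to that shape with `df.to_dict("list")`). The function name `group_columns` and its `by` parameter are hypothetical, and this is not code from the question or its answers. It covers both readings of the question: columns that match row for row, and columns that share the same set of unique values.

```python
from collections import defaultdict


def group_columns(table, by="values"):
    """Group column names of a column-oriented table (dict of name -> list).

    by="values": columns whose values match row for row.
    by="unique": columns that share the same set of distinct values.
    Only groups containing two or more columns are returned.
    """
    if by not in ("values", "unique"):
        raise ValueError("by must be 'values' or 'unique'")
    groups = defaultdict(list)
    for name, values in table.items():
        # Values must be hashable (strings, numbers, etc.) to be used as keys.
        key = tuple(values) if by == "values" else frozenset(values)
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical example table with categorical columns.
table = {
    "color": ["red", "blue", "red"],
    "color_copy": ["red", "blue", "red"],
    "shade": ["blue", "red", "red"],
    "size": ["S", "M", "L"],
}
print(group_columns(table))  # [['color', 'color_copy']]
print(group_columns(table, by="unique"))  # [['color', 'color_copy', 'shade']]
```

Staying in pandas, `df.columns[df.T.duplicated(keep=False).to_numpy()]` should list the columns that have an exact duplicate elsewhere in the frame, though transposing a wide or mixed-type frame can be slow.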
