Newest 'sampling' Questions

2 votes

0 answers

56 views

How do I train a model on data where there should be a statistical difference but it can't find it?

I'm trying to create a predictive model for a dataset with continuous input variables and a binary/probability output. The inputs are sensors (up to 400 columns, but some of them very irrelevant) which are ...

user46124

21

asked Apr 3 at 8:53

0 votes

0 answers

8 views

Importance of resampling when establishing a cutoff for categorical data

I am reading Feature Engineering and Selection by Max Kuhn and Kjell Johnson, and on page 97, section 5.2, it has the following (my question refers to the last sentence): 'Although near-zero variance ...

horned-sphere

101

asked Sep 15, 2024 at 17:26

1 vote

0 answers

29 views

Sampling multiple masked tokens through Metropolis–Hastings

I'm trying to replicate the finding of the publication "Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis-Hastings" for obtaining the joint distribution ...

Chris

11

asked Aug 4, 2024 at 16:21

1 vote

0 answers

7 views

Optimizing Sampling Strategy to Enhance Uniformity Under Conditional Constraints

I am facing a challenge in a project that involves sampling from a design space defined by 10 variables. I use Latin Hypercube Sampling (LHS) and/or Sobol sequences, and initially, the samples are ...

Chris

11

asked Apr 23, 2024 at 14:16
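For the Latin Hypercube Sampling question above, a minimal sketch of plain LHS on the unit cube may serve as a reference point. It assumes NumPy; the function name `latin_hypercube` is hypothetical, and the sketch does not handle the conditional constraints the asker describes. Each dimension is split into `n_samples` equal strata, and every stratum receives exactly one point.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, rng=None):
    """Hypothetical helper: n_samples points in [0, 1)^n_dims, one per stratum per dimension."""
    rng = np.random.default_rng(rng)
    # For each dimension, shuffle the stratum indices 0..n_samples-1
    strata = np.array([rng.permutation(n_samples) for _ in range(n_dims)]).T
    # Place each point uniformly at random inside its stratum, then rescale to [0, 1)
    return (strata + rng.random((n_samples, n_dims))) / n_samples


# 8 samples in a 10-variable design space, as in the question
samples = latin_hypercube(8, 10, rng=0)
print(samples.shape)  # → (8, 10)
```

To map onto real variable ranges, one would scale each column as `lower + samples * (upper - lower)`. SciPy also provides `scipy.stats.qmc.LatinHypercube` and `scipy.stats.qmc.Sobol`, which are likely preferable in practice.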

4 votes

1 answer

49 views

Algorithm for picking N random, uniformly distributed samples in an irregular polygon?

Say I want to pick a fixed number of samples from a large 2D dataset, such that they are relatively evenly distributed over the whole sample area. Imagine places in a country: the border of the data is ...

barryhunter

171

asked Mar 5, 2024 at 17:02
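One common approach to the polygon question is rejection sampling: draw uniform points in the polygon's bounding box and keep only those that fall inside, using a ray-casting point-in-polygon test. A minimal plain-Python sketch follows; the helper names and the L-shaped example polygon are hypothetical. Note that this yields uniformly random points, which can still cluster by chance; if more even spacing is the goal, a technique such as Poisson-disk sampling may fit better.

```python
import random


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon given as (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_in_polygon(poly, n, seed=None):
    """Rejection sampling: draw uniform points in the bounding box, keep those inside."""
    rng = random.Random(seed)
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n:
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if point_in_polygon(x, y, poly):
            points.append((x, y))
    return points


# Hypothetical non-convex (L-shaped) polygon
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
pts = sample_in_polygon(l_shape, 100, seed=42)
```

The acceptance rate equals the polygon's area divided by the bounding box's area, so very thin or sparse shapes will waste many draws.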

1 vote

1 answer

227 views

top_p parameter in LangChain

I am trying to understand the top_p parameter in LangChain (nucleus sampling), but I can't seem to grasp it. Based on this, we sort the probabilities and select a ...

Labyrinthian

13

asked Mar 1, 2024 at 16:16
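The excerpt above describes the core of nucleus (top-p) sampling: sort the token probabilities, keep the smallest set whose cumulative probability reaches `top_p`, renormalise, and sample from that set. A minimal NumPy sketch of the idea follows. It is a generic illustration with hypothetical function names, not LangChain's code; in LangChain, `top_p` is typically forwarded to the underlying model or provider API, where the sampling actually happens.

```python
import numpy as np


def nucleus_set(probs, top_p):
    """Indices of the smallest highest-probability set whose total mass reaches top_p."""
    probs = np.asarray(probs, dtype=float)
    # Token indices sorted by probability, highest first
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # First position where the cumulative probability is >= top_p; keep up to and including it
    cutoff = np.searchsorted(cumulative, top_p) + 1
    return order[:cutoff]


def nucleus_sample(probs, top_p, rng=None):
    """Sample one token index from the renormalised nucleus."""
    rng = np.random.default_rng(rng)
    probs = np.asarray(probs, dtype=float)
    kept = nucleus_set(probs, top_p)
    kept_probs = probs[kept] / probs[kept].sum()
    return rng.choice(kept, p=kept_probs)


probs = [0.05, 0.5, 0.15, 0.3]
print(nucleus_set(probs, 0.9))  # → [1 3 2]
token = nucleus_sample(probs, 0.9, rng=0)
```

With `top_p = 0.9`, the lowest-probability token (index 0) is never sampled, while the three kept tokens are sampled in proportion to their renormalised probabilities.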

1 vote

1 answer

160 views

Correct way to take a subset of a dataset?

I am attempting a binary classification problem (using Weka). My dataset has 100,000 rows and 14 attributes (1 output variable). It already takes too long just to open the dataset in Excel, so I just know ...

FlexMcMurphy

113

asked Dec 17, 2023 at 23:53
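For the subset question, one common approach with a binary classification dataset is a stratified random subset, which keeps each class's proportion roughly the same as in the full data. A minimal plain-Python sketch, assuming each row is a tuple and the label can be read off by a function; `stratified_subset` and the synthetic `rows` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subset(rows, label_of, fraction, seed=None):
    """Random subset that keeps (approximately) each class's share of the rows."""
    rng = random.Random(seed)
    # Group rows by class label
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    subset = []
    for members in by_class.values():
        # Sample the same fraction from every class, without replacement
        k = round(len(members) * fraction)
        subset.extend(rng.sample(members, k))
    rng.shuffle(subset)
    return subset


# Hypothetical data: 100,000 rows, 10% positive class
rows = [(i, 1 if i % 10 == 0 else 0) for i in range(100_000)]
subset = stratified_subset(rows, lambda r: r[1], 0.1, seed=0)
```

Within Weka itself, the supervised `Resample` filter may be worth checking, as it can subsample while respecting the class distribution.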

1 vote

1 answer

3k views

Why is 0.7, in general, the default value of temperature for LLMs?

I have recently read through a lot of documentation and articles about Large Language Models (LLMs), and I have come to the conclusion that 0.7 is, most of the time, the default value for the ...

jmpion

11

asked Nov 14, 2023 at 15:47
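Temperature works by dividing the model's logits by T before the softmax: T < 1 sharpens the distribution toward the most likely token, and T > 1 flattens it. The small NumPy sketch below shows that effect on made-up logits; the function name is hypothetical, and it illustrates what the parameter does, not why 0.7 is a common default.

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()


# Hypothetical logits for three tokens
logits = [2.0, 1.0, 0.0]
for t in (0.2, 0.7, 1.0, 1.5):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
```

As T grows, the top token's probability shrinks and the others gain mass; the ranking of tokens never changes.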

0 votes

1 answer

49 views

How to evaluate a model on our data when the model is imported from a library and thus not trained by us?

The company I work for has deployed a trained rule-based sentiment analyzer model, VADER, to predict customers' attitudes. We import the model directly from the nltk library, so we didn't train ...

Shelby

3

asked Sep 22, 2023 at 13:28

1 vote

0 answers

28 views

Calculating an integral with as few grid points as possible

Suppose I have a function $f\colon [0,1] \to \mathbb{R}$ which is maybe continuous (it's at least in $L^1$). I have a sample of $N$ points $\{x_i\}$ taken from the domain $[0,1]$ randomly from some ...

math_guy

111

asked Jul 27, 2023 at 23:17
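If the density $p$ from which the $x_i$ were drawn is known, one standard estimator is importance sampling, $\hat{I} = \frac{1}{N}\sum_{i=1}^{N} f(x_i)/p(x_i)$, which is unbiased for $\int_0^1 f(x)\,dx$ provided $p > 0$ wherever $f \neq 0$. A minimal NumPy sketch, assuming $p(x) = 2x$ and $f(x) = x^2$ purely as an example (exact integral $1/3$); the function name is hypothetical.

```python
import numpy as np


def importance_estimate(f, density, samples):
    """Monte Carlo estimate of the integral of f over [0, 1] from samples with the given density."""
    x = np.asarray(samples, dtype=float)
    return np.mean(f(x) / density(x))


rng = np.random.default_rng(0)
# Draw from p(x) = 2x on (0, 1] via the inverse CDF: x = sqrt(u)
u = 1.0 - rng.random(100_000)  # u in (0, 1], so x is never 0
x = np.sqrt(u)
estimate = importance_estimate(lambda t: t**2, lambda t: 2 * t, x)
```

If $p$ is unknown, this weighting is not available; sorting the points and applying a quadrature rule such as the trapezoidal rule is one alternative, at the cost of relying on $f$ being reasonably smooth.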

0 votes

1 answer

58 views

Question about collapsing a variable and oversampling minority classes

I have imbalanced data consisting of nine classes, and I am planning to collapse them into two classes. I performed stratified (proportionate) sampling between test, validation, and training sets ...

RyRy the Fly Guy

151

asked Feb 8, 2023 at 15:32

1 vote

0 answers

13 views

Group or find associations and orderings for elements that appear in different samples (analyzing examples of input files for undocumented code)

I'm trying to understand and use a physics simulation code that was written decades ago. Its input files have their origins in stacks of punch cards. In other words each line is a ...

uhoh

121

asked Jan 6, 2023 at 10:19

0 votes

1 answer

168 views

Is Logistic Regression possible using a Convenience Sample?

I've collected some survey data on homeless individuals, surveying their drug use, education level, age, gender etc. I hope to run a logistic regression to see how impactful homelessness (+other ...

JS Holding

1

asked Dec 12, 2022 at 18:36

0 votes

1 answer

243 views

Understanding bootstrapping in bias-variance decomposition

I was going through a bias-variance tradeoff article, and it makes use of the bias_variance_decomp function from the mlxtend library. ...

Mahesha999

299

asked Nov 27, 2022 at 20:18
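The bootstrap idea behind a decomposition like mlxtend's `bias_variance_decomp` can be sketched as follows: refit the model on many bootstrap resamples of the training set, then compare each round's test predictions with their average. This NumPy-only sketch for squared loss is an illustration under that assumption, not mlxtend's code; `bootstrap_bias_variance`, `poly_fit_predict`, and the synthetic data are hypothetical.

```python
import numpy as np


def bootstrap_bias_variance(fit_predict, X_train, y_train, X_test, y_test,
                            num_rounds=200, seed=None):
    """Squared-loss bias-variance estimate using bootstrap resamples of the training set."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.empty((num_rounds, len(X_test)))
    for r in range(num_rounds):
        # Bootstrap sample: n training indices drawn with replacement
        idx = rng.integers(0, n, size=n)
        preds[r] = fit_predict(X_train[idx], y_train[idx], X_test)
    main_pred = preds.mean(axis=0)  # average prediction per test point
    expected_loss = np.mean((preds - y_test) ** 2)
    bias_sq = np.mean((main_pred - y_test) ** 2)
    variance = np.mean((preds - main_pred) ** 2)
    return expected_loss, bias_sq, variance


def poly_fit_predict(x_tr, y_tr, x_te, degree=3):
    """Hypothetical example model: least-squares polynomial fit."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.polyval(coeffs, x_te)


# Hypothetical noisy data: y = sin(3x) + noise
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)
loss, bias_sq, var = bootstrap_bias_variance(
    poly_fit_predict, x[:150], y[:150], x[150:], y[150:], seed=0)
```

Because the bias term is measured against the noisy `y_test`, it also absorbs the irreducible noise. For squared loss, the expected loss here equals bias plus variance exactly, since each test point's deviations from the average prediction sum to zero across rounds.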

0 votes

1 answer

73 views

Determining the information loss due to undersampling

I have an image dataset that I need to split into directories (train, validation and test) using ImageDataGenerator in TensorFlow/Keras. The dataset is highly imbalanced, so I have decided to ...

Harsh Khare

87

asked Nov 24, 2022 at 9:44
