
Questions tagged [deep-learning]

For questions related to deep learning, which refers to a subset of machine learning methods based on artificial neural networks (ANNs) with multiple hidden layers. The adjective deep thus refers to the number of layers of the ANNs. The expression deep learning was apparently introduced (although not in the context of machine learning or ANNs) in 1986 by Rina Dechter in the paper "Learning while searching in constraint-satisfaction-problems".

0 votes · 0 answers · 11 views

Convolutional Kernels in CNN learning to find different patterns

Suppose we have an input image of dimensions $w \times h$ and the first hidden layer has dimension $(w-1) \times (h-1) \times 3$. We have $3$ separate $3 \times 3$ kernels with no padding. I ...
asked by Stan
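For reference, the spatial size of a convolution's output follows a standard formula; a minimal sketch (the helper name `conv2d_output_shape` is illustrative, not from the question). Note that a $3 \times 3$ kernel with no padding yields a $(w-2) \times (h-2)$ map, which may be relevant to the dimensions in the question:

```python
def conv2d_output_shape(w, h, kernel=3, padding=0, stride=1, n_kernels=3):
    """Spatial size of a 2-D convolution's output feature map."""
    out_w = (w + 2 * padding - kernel) // stride + 1
    out_h = (h + 2 * padding - kernel) // stride + 1
    return out_w, out_h, n_kernels  # each kernel produces one output channel

print(conv2d_output_shape(32, 32))  # (30, 30, 3): 3x3 kernel, no padding
```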
0 votes · 0 answers · 15 views

Why do my DNN convergence graphs behave differently on linear vs. dB scales?

I'm working on a deep neural network (DNN) and using the Adam optimizer to train it by learning parameters through backpropagation. My goal is to minimize the objective function. I’ve plotted the ...
asked by Alee
0 votes · 0 answers · 10 views

Are there other ways than negative log-likelihood or KL divergence to construct a loss function?

I've read that the two common ways to express a loss function in ML problems are to start either from the likelihood, then use the negative log-likelihood to find a good expression of the loss, or to ...
asked by Tristan Beruard
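The likelihood-to-loss route mentioned in this question can be made concrete with a standard example: for a fixed-variance Gaussian likelihood, the negative log-likelihood reduces, up to constants, to the mean squared error. A small NumPy sketch with illustrative values:

```python
import numpy as np

# Illustrative values only.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])

sigma = 1.0
# Gaussian NLL = 0.5 * sum((y - yhat)^2 / sigma^2) + 0.5 * n * log(2*pi*sigma^2)
nll = 0.5 * np.sum((y_true - y_pred) ** 2 / sigma**2) \
    + 0.5 * len(y_true) * np.log(2 * np.pi * sigma**2)
sse_part = 0.5 * np.sum((y_true - y_pred) ** 2)

# The data-dependent part of the NLL is exactly half the sum of squared errors;
# the remainder is a constant that does not affect the argmin.
print(np.isclose(nll - sse_part, 0.5 * 3 * np.log(2 * np.pi)))  # True
```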
0 votes · 0 answers · 17 views

Intuition behind Load-Balancing Loss in the paper OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER

I'm trying to implement the paper "OUTRAGEOUSLY LARGE NEURAL NETWORKS: THE SPARSELY-GATED MIXTURE-OF-EXPERTS LAYER", but I got stuck while implementing the load-balancing loss. Could someone ...
asked by qmzp
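The intuition behind the importance (load-balancing) loss in that paper is to penalize the squared coefficient of variation of per-expert gate mass, pushing the gating network to spread traffic across experts. A simplified NumPy sketch of the importance term (the paper additionally defines a second "load" term with a smooth estimator, omitted here):

```python
import numpy as np

def importance_loss(gates, w_importance=0.01):
    """Squared coefficient of variation of per-expert importance,
    scaled by w_importance. gates: (batch, n_experts) gating weights."""
    importance = gates.sum(axis=0)             # total gate mass per expert
    cv = importance.std() / importance.mean()  # coefficient of variation
    return w_importance * cv ** 2

# Perfectly balanced gates give zero loss; imbalance increases it.
balanced = np.full((4, 2), 0.5)
skewed = np.array([[0.9, 0.1]] * 4)
print(importance_loss(balanced))     # 0.0
print(importance_loss(skewed) > 0)   # True
```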
1 vote · 0 answers · 30 views

Applying the RTD task to a model trained with MLM leads to a decrease in performance as training progresses

We are developing a new LLM based on the CodeBERT architecture. As part of this effort, we initially trained our model using the Masked Language Modeling (MLM) objective with the HuggingFace API. To ...
asked by One Bad Student
0 votes · 0 answers · 38 views

How to Improve Levenshtein Distance in CNN-BiLSTM Morse Decoder?

Problem Context: I'm building a Morse code audio decoder using CNN-BiLSTM with CTC loss. My current 4-layer model achieves Levenshtein distance ≈0.6, but attempts to improve performance by adding a ...
asked by alexander
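For context, the metric in this question is the classic edit distance; a minimal dynamic-programming reference implementation (decoders are often scored with the normalized variant, the distance divided by target length):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with insert/delete/substitute, each costing 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

print(levenshtein("MORSE", "MORZE"))  # 1 (one substitution)
```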
1 vote · 0 answers · 35 views

How to make a variational autoencoder work on a time-frequency matrix?

I want to use a complex-valued variational autoencoder for unsupervised blind source separation. As an input to the network, I am giving the time-frequency matrix of the spectrogram instead of the ...
asked by ananya
0 votes · 0 answers · 19 views

CLIPSeg: no change in performance metrics with a better convolutional decoder

I am training CLIPSeg on the Oxford IIIT pet dataset for semantic segmentation (3 classes: background, cat, dog). In short, what I do is I stick a decoder on the CLIP encoder. The encoder outputs: ...
asked by Stan
3 votes · 2 answers · 44 views

Required background for thorough understanding of Causal ML research papers?

I'm interested in pursuing research in the intersection of causal inference and machine learning, particularly on causal discovery and causal representation learning. Through my exploration so far, I ...
asked by Harsh Shrivastava
0 votes · 0 answers · 21 views

Why are there NaN values for the forecast and total loss?

I am training a graph-attention-based model on a time series dataset (SWaT); while evaluating it, the dataset function for it is ...
asked by Priyanshu Singh
1 vote · 0 answers · 19 views

Implementation of TSMAE model in Keras

I’m currently implementing the TSMAE model described in the paper “TSMAE: A Novel Anomaly Detection Approach for Internet of Things Time Series Data Using Memory-Augmented Autoencoder” (https://pxl.to/...
asked by Nguyễn Hoàng Hà
2 votes · 1 answer · 84 views

How to deal with actions that complete in multiple steps (delayed reward) in reinforcement learning?

I have been exploring RL and using DQN to train an agent for a problem where I have two possible actions, but one of the actions is supposed to complete over multiple steps while the other is ...
asked by m101
5 votes · 2 answers · 128 views

Is there a conflict between NFL theorem and multimodal learning?

The definition of multimodal learning and the NFL theorem are clear to me. My question is: if a model good at one specific field might perform badly in another field, is there any need to find a multimodal ...
asked by Heartache_Doctor
0 votes · 0 answers · 12 views

Need Guidance on Gameplay Video Analysis for Storyline Graph Extraction

I'm a college student working on a project related to storyline graph extraction from gameplay videos and new player position identification in the graph. However, I'm completely clueless about how to ...
asked by 22I218 - GAYATHRI R
2 votes · 1 answer · 76 views

Why doesn't deep learning use modular arithmetic like cryptography, even though both deal with non-linear functions?

So, deep learning models are great at learning complex, non-linear patterns and seem to handle noise just fine. But under the hood, they rely on IEEE 754 floating-point numbers, which can lose ...
asked by Muhammad Ikhwan Perwira
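The contrast this question draws can be seen in a few lines: floating-point accumulation rounds, while modular integer arithmetic is exact (the modulus `2**31 - 1` below is just an example Mersenne prime, not anything the question specifies):

```python
# IEEE 754 floats accumulate rounding error.
fp_sum = sum([0.1] * 10)
print(fp_sum == 1.0)  # False: ten copies of 0.1 do not sum to exactly 1.0

# Modular integer arithmetic is exact: multiplying by the modular
# inverse recovers 1 with no precision loss (Python 3.8+ three-arg pow).
mod = 2**31 - 1
exact = (pow(7, 100, mod) * pow(7, -100, mod)) % mod
print(exact)  # 1
```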
