Questions tagged [version-control]
The version-control tag has no summary.
15 questions
3 votes
1 answer
258 views
ValueError when loading sklearn DecisionTreeClassifier pickle in Python 3.10
I'm encountering an issue while transitioning from Python 3.7.3 to Python 3.10 due to the deprecation of the older version. The problem arises when attempting to load a pickled sklearn ...
0 votes
1 answer
29 views
I cannot run the MNIST MWE (hello world for DL)
I have installed Anaconda and want to run an MWE for MNIST, but I'm getting this error: D:\STAZENE_last\Anaconda2\Lib\site-packages\torch\cuda\__init__.py:107: ...
4 votes
2 answers
73 views
Merging data approach in Data Science projects
This is more of an infrastructural question about data science. How would you manage data merging in your GitHub repository? As an example, as a data scientist I might be working on my branch and ...
0 votes
0 answers
21 views
Version control for code and output models
I have a question about version control for both code and the models it generates. We are developing ML models that often involve hyperparameters and so we might do many runs with different ...
1 vote
0 answers
18 views
Suggestions on practices for model and dataset version documentation
I want to steer my question towards the practical side of ML. As a practitioner, I feel keeping different versions of models and datasets is difficult. From time to time I need to revisit my data and ...
2 votes
1 answer
121 views
What is the difference between Pachyderm and Git?
I learned that tools like Pachyderm version-control data, but I cannot see any difference between that tool and Git. I learned from this post that: It holds all your data in a central accessible ...
0 votes
1 answer
48 views
How to version data science projects with large files
I am working on a project with large data files (~300MB). I want to version my work along with the data files so that it is always available online. I tried using git-lfs but it has a 1GB/month ...
2 votes
1 answer
39 views
What is the right way to store datasets for a CNN project [closed]
Our image classification project has thousands of raw photos, masks and reshaped images. We store source code in git. But datasets don't belong to source code version control. How should we store thee ...
7 votes
1 answer
417 views
At the end of a big DS project, should I make trained models available on GitHub?
I have almost completed two big personal Data Science projects based on Deep Learning. They are the fanciest models I've implemented up to now, and I'm pushing all my code to GitHub. Do you advise to ...
1 vote
0 answers
32 views
Embedding git commit into the resulting data
Our pipeline works something like this: collect a bunch of raw data (10-100 GB) from a microscope; process the data using MATLAB scripts; change a few parameters based on the raw data, as well as add new features to ...
1 vote
0 answers
440 views
Keras trained model exported with an older version of Keras (< 2.2.0)
Is it possible to update a trained model saved in a file without retraining it? I found the model on the web and I would like to use it, but it uses Merge layers ...
2 votes
1 answer
958 views
Dataset management: What are some strategies/solutions for efficiently storing datasets with their versions?
The problem: I have N independent classification models, and for each of these N models I have different versions (e.g., V0, V1, ..., Vfinal_production, Vexperimental). I'm looking for a way to store my ...
-3 votes
1 answer
63 views
Extract all releases from a Git repository [closed]
I would like to examine an existing Git repository and extract all defined releases into a subfolder. For example, if application A had 26 releases, my bash script would extract all 26 versions into ...
63 votes
11 answers
31k views
How to deal with version control of large amounts of (binary) data
I am a PhD student in Geophysics and work with large amounts of image data (hundreds of GB, tens of thousands of files). I know svn and ...
63 votes
9 answers
10k views
Tools and protocol for reproducible data science using Python
I am working on a data science project using Python. The project has several stages. Each stage consists of taking a data set, using Python scripts, auxiliary data, configuration and parameters, and ...