
I have a compiled function cfun that produces a very large array of data. I learned that compiled functions automatically use PackedArrays. I just wonder whether there is a more efficient way in Mathematica to store large arrays of numerical data.

Is there a table somewhere that shows how much memory a packed array of machine-precision numerical data should consume?

For an output array of length 10^8, ByteCount returns a memory consumption of 1.6 GB on my machine. It would be great if somebody could tell me whether this is a normal value and whether there is a more efficient way to store the array (for the purposes of my program I need to store a list of length 10^9-10^10, which would correspond to 16-160 GB of RAM).
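
For reference, a minimal sketch of this kind of measurement, with placeholder random reals standing in for the actual cfun output:

    data = RandomReal[1., 10^8];     (* placeholder for the cfun result *)
    Developer`PackedArrayQ[data]     (* True: RandomReal returns a packed array *)
    ByteCount[data]                  (* about 8*10^8 bytes, i.e. roughly 0.75 GiB *)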

  • Not an answer, but could you use a stream to write the data to a file, using OpenWrite, Write, etc.? Or create an enormous (~100 GB) swap file, like I did today?
    Commented Feb 9, 2018 at 16:50
  • 10^8 elements will take up about 0.75 GB if the elements are real, or 1.5 GB if they are complex. Note that a gigabyte is 1024^3 bytes, not 1000^3.
    – Szabolcs
    Commented Feb 9, 2018 at 16:54
  • Ruud3: My problem is that I need to do computations with the data I create; I'm afraid writing it to a file and then importing it again won't help here. Szabolcs: You are right, of course, thank you for the reminder. Perhaps the metadata makes it 1.6 GB; I need to check this again.
    – NeverMind
    Commented Feb 9, 2018 at 17:03
  • Maybe your data has certain redundancies or some "smoothness" that allow a (nearly) lossless compression...
    Commented Feb 9, 2018 at 17:40
  • Btw.: For pseudorandom data, you may use SeedRandom to generate the data and retrieve it later without storing the actual pseudorandom numbers.
    Commented Feb 9, 2018 at 18:43
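
A minimal sketch of the SeedRandom idea from the comment above; the seed and the array size are arbitrary placeholders:

    (* regenerate identical pseudorandom data from a fixed seed instead of storing it *)
    SeedRandom[1234];
    a = RandomReal[1., 10^6];
    SeedRandom[1234];
    b = RandomReal[1., 10^6];
    a === b   (* True: the data can be recomputed on demand rather than kept in memory *)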

1 Answer


Real-type packed arrays are stored as a contiguous block of double-precision values, plus a small bit of metadata.

This means that:

  • If you have n elements, it takes n*8 bytes
  • There is no reasonable way to store data of the same precision more efficiently

An integer takes up 8 bytes on a 64-bit system or 4 bytes on a 32-bit system in a packed array. A complex number takes up 16 bytes (both the real and imaginary parts are double-precision).
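
A quick way to check these numbers; the array contents are arbitrary and the byte counts below are for a 64-bit machine:

    real    = RandomReal[1., 10^6];        (* packed array of machine reals *)
    int     = RandomInteger[100, 10^6];    (* packed array of machine integers *)
    complex = RandomComplex[1 + I, 10^6];  (* packed array of machine complex numbers *)

    ByteCount /@ {real, int, complex}
    (* roughly {8, 8, 16} * 10^6 bytes, plus a small header each *)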

  • Thank you for your explanation. In fact, my numbers are complex, which explains why my array occupies 1.6 GB. Is your statement true for any programming language? Or do other languages have smarter ways of storing data? I am afraid that my local memory will be the bottleneck for my computation then, if there is no other reasonable way. :(
    – NeverMind
    Commented Feb 9, 2018 at 16:57
  • @DisplayName It is true for any language. If you use a low-level language like C, you can use single-precision numbers (~7 digits of precision), which take up half the space. 10^10 is almost never a reasonable size for an array. Consider carefully whether you really need to store this much data simultaneously. Maybe you can generate and process the data piece by piece.
    – Szabolcs
    Commented Feb 9, 2018 at 17:00
  • OK, interesting, I wasn't aware of this. I guess I cannot reduce the precision that easily with Mathematica? In principle, generating and processing the data piece by piece is a possibility. I have already thought about this, but first wanted to know whether there is a more efficient way of storing the data. Essentially, it is stochastic data, and I need to compute some quantities like moments etc. These averaged quantities seem to converge very slowly. This is just background information on why I am aiming for such a large array.
    – NeverMind
    Commented Feb 9, 2018 at 17:07
  • That is true, of course. Though I was more concerned about whether the total size of the list is sufficient to get a nice convergence of these averages, and I think there is no way to circumvent this problem. The maximum size of the data array (in one or many pieces) is limited by my computer's memory. OK, I could buy more memory, but maybe I should also look again into the generation of the data. Perhaps using many arrays and changing the initial conditions for every one of them leads to a faster convergence of the averages, though there should be no difference compared to just one array, I suppose.
    – NeverMind
    Commented Feb 9, 2018 at 17:17
  • @DisplayName Mathematica encourages working with entire arrays. You generate the whole array, then you do operations on the whole array. In performance-oriented languages you would not do this. You would process those random numbers one by one as they are generated. I assume you are generating them and not reading from a file. This way memory is not an issue. Example (see the sketch below).
    – Szabolcs
    Commented Feb 9, 2018 at 18:22
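
A minimal sketch of the block-wise approach suggested in the last comment, accumulating moments without ever holding the full data set in memory. The block size, the number of blocks, and the normal distribution are placeholders for however cfun actually produces its output:

    blockSize = 10^6; nBlocks = 10^3;    (* 10^9 samples in total *)
    {n, sum, sumSq} = {0, 0., 0.};
    Do[
      block = RandomVariate[NormalDistribution[], blockSize];  (* stand-in for one piece of cfun's output *)
      n += blockSize;
      sum += Total[block];
      sumSq += Total[block^2],
      {nBlocks}
    ];
    mean = sum/n
    variance = sumSq/n - mean^2   (* population variance accumulated block by block *)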
