Slow copying of memmap array to numpy array

Question

I have multiple binary (structured) files, each 2 GB in size, which I currently read in pairs using memmap in order to cross-correlate them. I want to minimise the time this I/O takes in the code.

I am implementing this as a Cython function. Copying the memmap array to a numpy array is quite fast (~3 sec) when the same set of files is processed a second time, but it takes much longer (~71 sec) when new files are read, possibly because of the memory cache. The same happens with numpy fromfile.

What is the most efficient and fastest way of copying the memmap to a numpy array?

Any suggestions on the same are appreciated.

Code used:

    comf = np.memmap(file_name, dtype=dt, mode='c')
    comf1 = np.memmap(file_name1, dtype=dt, mode='c')
    cdef np.ndarray tempcomf = np.zeros((templen, 1024), dtype=np.int8)
    cdef np.ndarray tempcomf1 = np.zeros((templen1, 1024), dtype=np.int8)
    tempcomf = comf['data']
    tempcomf1 = comf1['data']

EDIT:

Here is the function used:

    cpdef tuple decrypt_file(file_name, file_name1):
        cdef long long int templen = 0
        cdef long long int templen1 = 0
        cdef np.ndarray tempcomf = np.zeros((templen, 1024), dtype=np.int8)
        cdef np.ndarray tempcomf1 = np.zeros((templen1, 1024), dtype=np.int8)

        dt = np.dtype([('header', 'S8'), ('Source', 'S10'),
                       ('header_rest', 'S10'), ('Packet', '>u4'),
                       ('data', '>i1', 1024)])
        comf = np.memmap(file_name, dtype=dt, mode='c')
        comf1 = np.memmap(file_name1, dtype=dt, mode='c')

        templen = comf['Packet'][-1] - comf['Packet'][0]
        templen1 = comf1['Packet'][-1] - comf1['Packet'][0]

        t_1 = time.time()
        tempcomf = comf['data']
        tempcomf1 = comf1['data']
        print('Time take for memarray copy...' + str(time.time() - t_1))

        tempcomf = tempcomf.ravel()
        tempcomf1 = tempcomf1.ravel()

        tempcomf_X = np.array(tempcomf[1::2], order='F')
        tempcomf_Y = np.array(tempcomf[0::2], order='F')
        tempcomf1_X = np.array(tempcomf1[1::2], order='F')
        tempcomf1_Y = np.array(tempcomf1[0::2], order='F')

        return tempcomf_X, tempcomf_Y, tempcomf1_X, tempcomf1_Y

Input data structure: each record in the binary file consists of a 32-byte header followed by 1024 bytes of data. The focus is on getting the latter from the memmap array into a numpy array.
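To make the record layout concrete, here is a minimal sketch that checks the byte offsets of the dtype used in the function and parses two synthetic records from an in-memory buffer. The record contents (packet numbers 7 and 8, a row of fives) are made up purely for illustration.

```python
import numpy as np

# Same record dtype as in the function above.
dt = np.dtype([('header', 'S8'), ('Source', 'S10'), ('header_rest', 'S10'),
               ('Packet', '>u4'), ('data', '>i1', 1024)])

header_bytes = dt.fields['data'][1]   # byte offset at which 'data' starts
record_bytes = dt.itemsize            # header + data, no padding

# Two synthetic records (illustrative values only, not real file contents).
rec = np.zeros(2, dtype=dt)
rec['Packet'] = [7, 8]
rec['data'][1] = 5

raw = rec.tobytes()                     # what the records look like on disk
parsed = np.frombuffer(raw, dtype=dt)   # parse them back without copying

print(header_bytes, record_bytes, parsed['data'].shape)
```

The 8 + 10 + 10 + 4 header bytes add up to the 32-byte header, so each record occupies 1056 bytes, and the payload rows sit 1056 bytes apart rather than back to back.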

This is the function where the files are read and the data is separated from the header. If the same file set is given twice, the memory copy takes ~2 sec, but when a different set of files is given, the copy takes ~72 sec.
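When deciding where to place the timer, it may help to know which statement actually moves bytes. As far as I understand numpy, indexing a structured memmap by field name gives a view onto the mapping, and the file is only read once that view is copied into an ordinary array. The sketch below checks this on a small synthetic file; the file name `synthetic.bin` and the record count are stand-ins, not the real data.

```python
import os
import tempfile
import numpy as np

dt = np.dtype([('header', 'S8'), ('Source', 'S10'), ('header_rest', 'S10'),
               ('Packet', '>u4'), ('data', '>i1', 1024)])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'synthetic.bin')   # hypothetical stand-in file
    np.zeros(8, dtype=dt).tofile(path)

    comf = np.memmap(path, dtype=dt, mode='c')
    view = comf['data']       # field access: a strided view onto the mapping
    copy = np.array(view)     # explicit copy into ordinary process memory

    view_shares = np.shares_memory(comf, view)
    copy_shares = np.shares_memory(comf, copy)
    copy_shape = copy.shape
    copy_contiguous = copy.flags['C_CONTIGUOUS']

    # Release the mapping before the temporary directory is removed.
    del view, comf

print(view_shares, copy_shares, copy_shape, copy_contiguous)
```

If this holds for the real files too, the expensive page faults would happen at the first operation that touches the data (for example `np.array(...)`), not at the field access itself.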

EDIT - MORE INFORMATION

After further investigation, I found that this problem does indeed stem from memory caching. As a test I cleared the cache (echo 3 > /proc/sys/vm/drop_caches), after which the copy of the memmap array to a numpy array (that is, into volatile memory) is slow again.

To confirm the issue, I pre-cached the binary files into memory using vmtouch; the copy (memmap to numpy array) then takes ~3 sec.

So the cause of the problem is related to memory caching, but a solution has not yet been found: the pre-caching itself takes ~52 sec when done with vmtouch.
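Since the time seems to be dominated by getting the file off disk with a cold cache, one variation that might be worth timing is to read the file with large sequential reads into a preallocated array instead of faulting pages in through the memmap. The following is only a sketch under assumptions: `read_records` is a hypothetical helper, the 64 MiB chunk size is an arbitrary guess, the `posix_fadvise` hint is only applied where the platform offers it, and I have not measured whether any of this beats the ~52 sec on the actual storage.

```python
import os
import tempfile
import numpy as np

dt = np.dtype([('header', 'S8'), ('Source', 'S10'), ('header_rest', 'S10'),
               ('Packet', '>u4'), ('data', '>i1', 1024)])

def read_records(path, dtype, chunk_bytes=64 * 1024 * 1024):
    """Hypothetical helper: read all whole records with sequential reads."""
    n_records = os.path.getsize(path) // dtype.itemsize
    out = np.empty(n_records, dtype=dtype)
    raw = out.view(np.uint8)            # flat byte view of the same memory
    with open(path, 'rb', buffering=0) as f:
        if hasattr(os, 'posix_fadvise'):
            # Hint to the kernel that the file will be read front to back.
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        pos = 0
        while pos < raw.size:
            n = f.readinto(raw[pos:pos + chunk_bytes])
            if not n:
                raise EOFError('file ended before all records were read')
            pos += n
    return out

# Small synthetic file standing in for a real 2 GB input.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'synthetic.bin')
    src = np.zeros(16, dtype=dt)
    src['Packet'] = np.arange(16)
    src['data'] = np.arange(16, dtype=np.int8)[:, None]
    src.tofile(path)

    records = read_records(path, dt, chunk_bytes=4096)
    data = np.ascontiguousarray(records['data'])   # header stripped, contiguous

print(data.shape)
```

If the disk itself is the limit, this will not help much either; in that case the two files of a pair could at least be read concurrently, or the next pair fetched while the current one is being correlated.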

vmtouch OUTPUT:

    vmtouch -vt /data/ch01_SOURCE_Binary_20201011_110101.bin
    [OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 522720/522720

               Files: 1
         Directories: 0
       Touched Pages: 522720 (1G)
             Elapsed: 52.403 seconds

Read this article that may help you: pythonspeed.com/articles/reduce-memory-array-copies — camp0, Commented Nov 5, 2020 at 20:19
I feel that this question is more suitable for stackoverflow.com — user228914, Commented Nov 5, 2020 at 20:32
@AryanParekh I disagree in this case: issues of performance are expressly permitted on CR. This question needs work, but for reasons of missing context, not due to its subject. — Reinderien, Commented Nov 5, 2020 at 20:53
And I agree with @Reinderien that the question is missing context; there is no problem with asking performance-related questions on CR. — pacmaninbw, Commented Nov 5, 2020 at 20:55
