
I have the following numpy array:

    numpy_x.shape

    (9982, 26)

So numpy_x has 9982 records/observations (rows) and 26 columns. Is that right?

    numpy_x[:]

    array([[0.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
            1.20000000e+00, 6.90000000e-01, 1.17000000e+00],
           [1.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
            1.20000000e+00, 7.00000000e-01, 1.17000000e+00],
           [2.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
            1.20000000e+00, 7.00000000e-01, 1.17000000e+00],
           ...,
           [9.97900000e+03, 6.10920994e-01, 7.58135980e-01, ...,
            1.08704204e+00, 7.88187535e-01, 1.23021669e+00],
           [9.98000000e+03, 6.10920994e-01, 7.58135980e-01, ...,
            1.08704204e+00, 7.88187535e-01, 1.23021669e+00],
           [9.98100000e+03, 6.10920994e-01, 7.58135980e-01, ...,
            1.08704204e+00, 7.88187535e-01, 1.23021669e+00]])

I want to generate a DataFrame from the numpy_x data, with an index and columns (are the index and the columns really the same thing?), so I do the following:

    import pandas as pd

    pd.DataFrame(data=numpy_x[:],  # I want to pass the entire numpy array content
                 index=numpy_x[1:26],
                 columns=numpy_x[9982:26])

But I get the following error:

    /.conda/envs/x/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
       4606         raise ValueError("Empty data passed with indices specified.")
       4607     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
    -> 4608         passed, implied))
       4609
       4610

    ValueError: Shape of passed values is (26, 9982), indices imply (0, 25)

How can I understand what I should pass to the index and columns parameters?

  • columns are the labels of the columns you want in your table; the index holds the row labels that the table will be indexed by.
    Commented May 6, 2018 at 6:08

1 Answer


Use:

    import numpy as np
    import pandas as pd
    numpy_x = np.random.random((100, 10))
    df = pd.DataFrame(numpy_x)

Output

              0         1         2         3         4         5         6  \
    0  0.204839  0.837503  0.696896  0.235414  0.594766  0.521302  0.841167
    1  0.041490  0.679537  0.657314  0.656672  0.524983  0.936918  0.482802
    2  0.318928  0.423196  0.218037  0.515017  0.107851  0.564404  0.218297
    3  0.644913  0.433771  0.297033  0.011239  0.346021  0.353749  0.587631
    4  0.127949  0.517230  0.969399  0.743442  0.268566  0.415327  0.567572

              7         8         9
    0  0.882685  0.211414  0.659820
    1  0.752496  0.047198  0.775250
    2  0.521580  0.655942  0.178753
    3  0.123761  0.483601  0.157191
    4  0.849218  0.098588  0.754402

I want generate a dataframe with numpy_x data, index and columns (index and columns are the same really?)

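Yes and no. In pandas, an Index is simply the axis-labeling information. Depending on the axis, an Index holds either the row labels (what you pass as index) or the column labels (what you pass as columns); both are Index objects, but they label different axes.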
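A minimal sketch of this on a small stand-in array (the 3x2 data is illustrative, not the question's numpy_x): both df.index and df.columns come back as pandas Index objects, one per axis.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)  # illustrative: 3 rows x 2 columns
df = pd.DataFrame(arr)

# Row labels and column labels are both pandas Index objects.
print(isinstance(df.index, pd.Index), list(df.index))      # True [0, 1, 2]
print(isinstance(df.columns, pd.Index), list(df.columns))  # True [0, 1]

# df.axes holds them in axis order: [row index, column index].
print(df.axes[0].equals(df.index), df.axes[1].equals(df.columns))  # True True
```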

The axis labeling information in pandas objects serves many purposes:

  • Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display
  • Enables automatic and explicit data alignment
  • Allows intuitive getting and setting of subsets of the data set
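Automatic versus explicit alignment can be sketched with two small Series; the labels and values are made up for illustration. Arithmetic matches values by label rather than by position, while reindex aligns one object to a chosen set of labels on request.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["y", "z", "w"])

# Automatic: values are matched by label; labels found in only one
# Series ("x" and "w") produce NaN in the result.
total = a + b
print(total)

# Explicit: conform b to a's labels; "x" has no match in b, so it is NaN.
b_on_a = b.reindex(a.index)
print(b_on_a)
```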

An Index can be a simple, single-level index (for example the default integer labels), or it can be a MultiIndex (hierarchical labels with several levels).
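A minimal sketch contrasting the default single-level index with a MultiIndex; the level names group and item and the column names c0 and c1 are hypothetical.

```python
import numpy as np
import pandas as pd

# Default: a single-level integer index (0, 1, 2).
simple = pd.DataFrame(np.arange(6).reshape(3, 2))
print(simple.index.nlevels)  # 1

# Hypothetical two-level row index built from (group, item) pairs.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["group", "item"]
)
df = pd.DataFrame(np.arange(6).reshape(3, 2), index=idx, columns=["c0", "c1"])
print(df.index.nlevels)  # 2

# Selecting on the first level returns every row in group "a".
print(df.loc["a"])
```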

Index and Columns Parameters

The columns parameter holds the column labels you want to give your dataset; in this case you want to pass 26 names for the 26 columns in your numpy array. If no column labels are provided, it defaults to np.arange(n), where n is the number of columns.
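A sketch of why the call in the question fails, using a zero-filled placeholder of the same shape since the real data isn't available. The slices select rows of the array rather than producing labels: numpy_x[1:26] is 25 rows and numpy_x[9982:26] is empty. That appears to match "indices imply (0, 25)" in the error, which this pandas version seems to report as (number of column labels, number of row labels), just as it reports the data as (26, 9982). The col_i names are hypothetical.

```python
import numpy as np

# Placeholder with the same shape as the question's numpy_x;
# the actual values don't matter for this check.
numpy_x = np.zeros((9982, 26))

n_rows, n_cols = numpy_x.shape
print(n_rows, n_cols)          # 9982 26: rows (observations), columns

# The slices from the question select *rows*, not labels:
print(numpy_x[1:26].shape)     # (25, 26): rows 1..25
print(numpy_x[9982:26].shape)  # (0, 26): start > stop, so nothing selected

# What columns= needs instead: one label per column (hypothetical names).
column_labels = [f"col_{i}" for i in range(n_cols)]
print(len(column_labels))      # 26
```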

The index parameter is simply the Index to use for the resulting frame. It defaults to np.arange(n), where n is the number of rows (a RangeIndex in current pandas), if the input data carries no index information and no index is provided, which is the case in my example.
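Putting both parameters together, a minimal sketch that assumes random stand-in data of the question's shape and hypothetical column names col_0 to col_25. Each label list must be as long as the corresponding axis; a mismatch raises a ValueError like the one in the question.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's numpy_x: 9982 rows x 26 columns.
rng = np.random.default_rng(0)
numpy_x = rng.random((9982, 26))

# Hypothetical labels: one per column, one per row.
columns = [f"col_{i}" for i in range(numpy_x.shape[1])]
index = np.arange(numpy_x.shape[0])  # same as the default, made explicit

df = pd.DataFrame(data=numpy_x, index=index, columns=columns)
print(df.shape)  # (9982, 26)

# Too few column labels (25 for 26 columns) is rejected.
try:
    pd.DataFrame(data=numpy_x, columns=columns[:25])
except ValueError as exc:
    print("ValueError:", exc)
```

If the first column of the real numpy_x is only a row counter, as the printed values 0, 1, 2, ..., 9981 suggest (an assumption the question doesn't confirm), one option would be index=numpy_x[:, 0].astype(int) together with data=numpy_x[:, 1:] and 25 column labels.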

  • So this means the pd function receives a numpy array, takes its content plus a default index and columns, and is able to transform it into a DataFrame ...
    – bgarcial
    Commented May 6, 2018 at 6:22
  • Yes. pandas is largely built on top of numpy, so out of the box it supports many numpy operations, including consuming numpy arrays.
    Commented May 6, 2018 at 6:24
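To illustrate the point that pandas accepts numpy arrays directly, a small sketch (the column names a and b are hypothetical): a numpy ufunc applied to a DataFrame returns a labelled DataFrame, and DataFrame.to_numpy() converts back to a plain array without labels.

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 4.0], [9.0, 16.0]])
df = pd.DataFrame(arr, columns=["a", "b"])  # hypothetical column names

# NumPy ufuncs apply element-wise and keep the pandas labels.
roots = np.sqrt(df)
print(type(roots).__name__)  # DataFrame

# Going back to a plain ndarray drops the labels.
back = df.to_numpy()
print(np.array_equal(back, arr))  # True
```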
