I have the following data stored in HDFS: each row has three columns, id, date, and item, meaning that the person with that id bought that item on that date. The dataset has billions of rows, all of them distinct, and I can query the table via Hive.
Now I want to read this table into memory and transform it into an RDD so that I can process it with Spark.
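This is roughly how I read the table so far. It is only a minimal sketch: I am assuming a Hive-enabled SparkSession and using the placeholder table name `purchases`, which is not the real name.

```python
from pyspark.sql import SparkSession

# Assumes Spark is configured to talk to the Hive metastore;
# "purchases" is a placeholder for the actual table name.
spark = (SparkSession.builder
         .appName("purchases-to-matrices")
         .enableHiveSupport()
         .getOrCreate())

purchases_df = spark.sql("SELECT id, date, item FROM purchases")
purchases_rdd = purchases_df.rdd  # RDD of Row(id, date, item)
```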
Specifically, I want to reshape the table into something like a 3-dimensional ndarray in Python's NumPy: $X_{ijk} = 1$ if person $i$ bought item $k$ on date $j$, otherwise $X_{ijk} = 0$. In other words, I want an RDD in which each record is the date-by-item matrix of 0s and 1s for one person, as illustrated by the toy example below.
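To make the target layout concrete, here is a small NumPy example of the matrix I would like each record to hold for a single person. The dates, items, and purchases are made up for illustration only; rows index dates ($j$) and columns index items ($k$).

```python
import numpy as np

dates = ["2017-01-01", "2017-01-02"]   # j axis (example values)
items = ["apple", "milk", "bread"]     # k axis (example values)
bought = {("2017-01-01", "milk"), ("2017-01-02", "apple")}  # this person's rows

# X_i[j, k] == 1 iff this person bought item k on date j
X_i = np.zeros((len(dates), len(items)), dtype=np.int8)
for j, d in enumerate(dates):
    for k, it in enumerate(items):
        if (d, it) in bought:
            X_i[j, k] = 1

print(X_i)
# [[0 1 0]
#  [1 0 0]]
```

Each record of the RDD should be one such matrix (one per person).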
How can I achieve this in python?