
Insufficient RAM for co-occurrence matrix

Hello,
We tried creating the co-occurrence matrix for the text classification project with all the tweets, but we ran out of RAM. We tried it on Google Colab, but even 25GB of RAM was not enough to create the matrix, and the kernel crashes when executing this line:
cooc = coo_matrix((data, (row, col)))

How are we supposed to create it? Do you know of any resources we could use to do it?

Thank you!

Hi,

The matrix should comfortably fit in memory, as long as it is stored in a sparse format (coo_matrix).

coo_matrix will not store more data than "data", "row" and "col".
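To make this concrete, here is a rough sketch that compares the footprint of the three COO arrays with the equivalent dense matrix. The helper name coo_nbytes and the toy values are only illustrative and are not part of the provided code.

```python
import numpy as np
from scipy.sparse import coo_matrix


def coo_nbytes(m):
    """Bytes held by the three arrays backing a COO matrix (hypothetical helper)."""
    return m.data.nbytes + m.row.nbytes + m.col.nbytes


# Toy example: 3 non-zero counts in a 1000 x 1000 matrix.
data = np.array([2.0, 1.0, 5.0])
row = np.array([0, 10, 999])
col = np.array([1, 20, 0])
m = coo_matrix((data, (row, col)), shape=(1000, 1000))

sparse_bytes = coo_nbytes(m)
dense_bytes = m.shape[0] * m.shape[1] * m.data.dtype.itemsize
print(sparse_bytes, dense_bytes)
```

The sparse size grows with the number of stored entries, not with the vocabulary size squared, which is why the length of "data" is the number to watch.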

What are you passing into coo_matrix? data should be a list of counts (not a matrix), and row and col should be lists of the same length as data, containing the indices of the words that co-occur.
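For reference, a minimal sketch of the expected input shape, using made-up toy values. Converting the lists to NumPy arrays with compact dtypes before the call is an assumption on my side about what might reduce peak memory; I have not checked it against the provided code. vocab_size is a placeholder for your vocabulary size.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy co-occurrence triples (hypothetical values, for illustration only).
# Entry k says: word row[k] and word col[k] co-occurred data[k] times.
row = [0, 0, 1, 2, 2]
col = [1, 2, 2, 0, 0]
data = [1, 1, 1, 1, 1]

# Python lists of ints are much larger than the equivalent NumPy arrays,
# so convert once with explicit, compact dtypes.
row_arr = np.asarray(row, dtype=np.int32)
col_arr = np.asarray(col, dtype=np.int32)
data_arr = np.asarray(data, dtype=np.float32)

vocab_size = 3  # placeholder: number of distinct words
cooc = coo_matrix((data_arr, (row_arr, col_arr)), shape=(vocab_size, vocab_size))

# Repeated (row, col) pairs are kept as separate entries until merged.
cooc.sum_duplicates()
print(cooc.nnz)
```

If the lists contain one entry per observed pair rather than one per distinct pair, merging duplicates (or counting pairs in a dictionary before building the lists) should shrink the matrix considerably.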

Hope this helps.

If you still struggle, please provide a minimal breaking example on Colab, so we can take a look.

Thank you for the quick reply.
It is so weird: I can build the three lists just fine (thanks to having 25GB of RAM), but then the kernel crashes after a few minutes when trying to build the matrix, apparently because it runs out of memory.
I am using the provided code without modification and can't figure out why it crashes, so that's the minimal breaking example.
Any clue where it could come from?
