
Can't Export Cassandra Table Using Python

I am trying to export a Cassandra table to CSV format using Python, but I haven't been able to. I can, however, execute a 'select' statement from Python. I have used the following code

Solution 1:

This is not working for you because COPY is not a part of CQL.

COPY is a cqlsh-only tool.

You can invoke this via command line or script by using the -e flag:

cqlsh 127.0.0.1 -u username -p password -e "copy chandan.emp (id,name) to 'E:\HANA\emp.csv' with HEADER = true"
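Since the question is about Python, here is a minimal sketch of driving that same command from a script with the standard library's subprocess module. The function names are made up for illustration, and it assumes cqlsh is installed and on your PATH; the host, credentials, and file path are simply the placeholders from the command above.

```python
import shutil
import subprocess

# The same COPY statement as in the command above.
COPY_STATEMENT = (
    "COPY chandan.emp (id, name) TO 'E:\\HANA\\emp.csv' WITH HEADER = true"
)


def build_cqlsh_command(host, username, password, statement, executable="cqlsh"):
    """Return the argument list for running one statement through `cqlsh -e`."""
    return [executable, host, "-u", username, "-p", password, "-e", statement]


def export_with_cqlsh(host, username, password, statement):
    """Run the statement through cqlsh; raises CalledProcessError if cqlsh fails."""
    executable = shutil.which("cqlsh")
    if executable is None:
        raise RuntimeError("cqlsh was not found on PATH")
    command = build_cqlsh_command(host, username, password, statement, executable)
    # Passing a list (no shell) avoids quoting problems with the COPY statement.
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Example call (needs a reachable cluster and cqlsh on PATH):
# result = export_with_cqlsh("127.0.0.1", "username", "password", COPY_STATEMENT)
# print(result.stdout)
```

With check=True, a non-zero exit from cqlsh surfaces as an exception instead of being silently ignored.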

Edit 20170106:

"export Cassandra table to CSV format using Python"

Essentially... How do I export an entire Cassandra table?

I get asked this a lot. The short answer...is DON'T.

Cassandra is best used to store millions or even billions of rows. It can do this because it distributes its load (both operational and size) over multiple nodes. What it's not good at are things like deletes, in-place updates, and unbound queries. I tell people not to do things like full exports (unbound queries) for a couple of reasons.

First of all, running an unbound query on a large table in a distributed environment is usually a very bad idea (introducing LOTS of network time and traffic into your query). Secondly, you're taking a large result set that is stored on multiple nodes, and condensing all of that data into a single file...probably also not a good idea.

Bottom line: Cassandra is not a relational database, so why would you treat it like one?

That being said, there are tools out there designed to handle things like this, Apache Spark being one of them.

"Please help me to execute the query with session.execute() statement."

If you insist on using Python, then you'll need to do a few things. For a large table, you'll want to query by token range. You'll also want to do that in small batches/pages, so that you don't tip over your coordinator node. But to keep you from re-inventing the wheel, I'll tell you that there already is a tool (written in Python) that does exactly this: cqlsh COPY.
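For illustration only, the token-range-plus-paging idea could be sketched roughly like this with the DataStax Python driver (cassandra-driver). It rests on several assumptions that are not in the question: the cluster uses Murmur3Partitioner, id is the table's only partition key column, and the number of splits and the fetch size are arbitrary values you would need to tune. The function names are hypothetical.

```python
import csv

# Token bounds of Murmur3Partitioner (assumed to be the cluster's partitioner).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ranges(splits):
    """Divide the full token ring into `splits` contiguous (start, end] ranges."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    step = (MAX_TOKEN - MIN_TOKEN) // splits
    ranges = []
    start = MIN_TOKEN
    for i in range(splits):
        # The last range always ends at MAX_TOKEN so nothing is left uncovered.
        end = MAX_TOKEN if i == splits - 1 else start + step
        ranges.append((start, end))
        start = end
    return ranges


def export_table(csv_path, username, password,
                 hosts=("127.0.0.1",), splits=256, fetch_size=1000):
    """Write chandan.emp (id, name) to csv_path, one token range at a time."""
    # Imported here so split_token_ranges works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    auth = PlainTextAuthProvider(username=username, password=password)
    cluster = Cluster(list(hosts), auth_provider=auth)
    session = cluster.connect()
    # fetch_size caps the rows per page; the driver fetches further pages lazily.
    statement = SimpleStatement(
        "SELECT id, name FROM chandan.emp "
        "WHERE token(id) > %s AND token(id) <= %s",
        fetch_size=fetch_size,
    )
    try:
        with open(csv_path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["id", "name"])
            for start, end in split_token_ranges(splits):
                for row in session.execute(statement, (start, end)):
                    writer.writerow([row.id, row.name])
    finally:
        cluster.shutdown()
```

Each query touches only a slice of the ring, and iterating the result set pulls one page at a time, so no single request asks the coordinator for the whole table. It is still a single-threaded loop with no retries, which is part of why the existing tool is the better choice.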

In fact, the newer versions of cqlsh COPY have features (PAGESIZE and PAGETIMEOUT) that allow it to avoid timeouts on large data sets. I have used the new cqlsh to successfully export 370 million rows before, so I know it can be done.
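As a rough sketch, a script could assemble a COPY statement that sets those two options before handing it to cqlsh -e. The helper name is made up, and the 500 and 60 below are placeholder values for illustration, not recommendations; check which options your cqlsh version supports.

```python
def build_copy_to(table, columns, csv_path, *, page_size, page_timeout, header=True):
    """Build a cqlsh COPY ... TO statement with HEADER, PAGESIZE and PAGETIMEOUT."""
    column_list = ", ".join(columns)
    options = [
        f"HEADER = {'true' if header else 'false'}",
        f"PAGESIZE = {int(page_size)}",        # rows fetched per page
        f"PAGETIMEOUT = {int(page_timeout)}",  # seconds to wait for each page
    ]
    return (
        f"COPY {table} ({column_list}) TO '{csv_path}' WITH "
        + " AND ".join(options)
    )


# Illustrative values only; tune them for your own cluster and table.
statement = build_copy_to(
    "chandan.emp", ["id", "name"], r"E:\HANA\emp.csv",
    page_size=500, page_timeout=60,
)
print(statement)
```

The printed statement is what you would pass as the argument to cqlsh -e.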

Summary: Don't re-invent the wheel. Write a script that uses cqlsh COPY, and leverages all of those things I just talked about.
