
Showing posts from October, 2018

Modin dataframes and IBM Cloud Object Storage

Modin is a Python framework capable of efficiently scaling pandas dataframes. To achieve this, Modin uses Ray, a high-performance distributed execution framework. This short post explains how to use Modin to read data objects from IBM Cloud Object Storage.

Requirements

IBM Cloud Object Storage account

If you don't have one already, navigate to IBM Cloud and choose IBM Cloud Object Storage. Using the dashboard, create a new bucket and upload some CSV objects to it. You will also need HMAC credentials for the bucket; just follow the simple steps described here.

Python and dependencies

I used Python 3.6, but I assume other versions will work as well. Install the following packages: IBM COS SDK for Python, smart_open (we will use smart_open to access IBM Cloud Object Storage) and modin.

Example

import modin.pandas as pd
import ibm_boto3
import smart_open

if __name__ == '__main__':
    access_key = 'ACCESS KEY'
    secret_key =
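As a rough sketch of how these pieces could fit together (not the exact code of this post), the helpers below build an S3-compatible client from the HMAC credentials, open a CSV object through smart_open and hand the stream to Modin's read_csv. It assumes smart_open 5.x or newer, where smart_open.open accepts a pre-built client through transport_params={'client': ...}, and it assumes smart_open works with an ibm_boto3 client because ibm_boto3 mirrors the boto3 client interface. The helper names cos_uri, make_cos_client and read_cos_csv, and the endpoint placeholder, are hypothetical; use the endpoint that matches your bucket's region.

```python
from typing import Any, Callable, Optional


def cos_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI that smart_open uses for an object in a COS bucket."""
    if not bucket or not key:
        raise ValueError("bucket and key must be non-empty")
    return f"s3://{bucket}/{key.lstrip('/')}"


def make_cos_client(access_key: str, secret_key: str, endpoint_url: str) -> Any:
    """Create an S3-compatible IBM COS client from HMAC credentials."""
    import ibm_boto3  # IBM COS SDK for Python (pip install ibm-cos-sdk)

    return ibm_boto3.client(
        "s3",
        aws_access_key_id=access_key,      # HMAC access key
        aws_secret_access_key=secret_key,  # HMAC secret key
        endpoint_url=endpoint_url,         # region-specific COS endpoint
    )


def read_cos_csv(
    client: Any,
    bucket: str,
    key: str,
    open_fn: Optional[Callable[..., Any]] = None,
    read_csv: Optional[Callable[..., Any]] = None,
) -> Any:
    """Stream a CSV object from COS and parse it into a Modin DataFrame.

    open_fn and read_csv default to smart_open.open and modin.pandas.read_csv;
    they are parameters only so the function can be exercised without network access.
    """
    if open_fn is None:
        import smart_open  # assumes smart_open >= 5.0 (client passed via transport_params)
        open_fn = smart_open.open
    if read_csv is None:
        import modin.pandas as pd  # Modin's drop-in pandas API
        read_csv = pd.read_csv
    # Assumption: smart_open accepts an ibm_boto3 client, since it mirrors boto3's client API.
    with open_fn(cos_uri(bucket, key), "rb", transport_params={"client": client}) as fh:
        return read_csv(fh)
```

For example, read_cos_csv(make_cos_client('ACCESS KEY', 'SECRET KEY', 'COS ENDPOINT URL'), 'BUCKET NAME', 'data.csv') would return a Modin dataframe. Because Modin receives a file object rather than a path here, it may fall back to its pandas implementation for the read itself. If smart_open does not accept the IBM client, a plain boto3 client created with the same HMAC keys and endpoint_url is a common alternative.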