0% found this document useful (0 votes)
121 views

Python Hana - ML - Define HANA Table and Upload (Create - Dataframe - From - Pandas) - SAP Blogs

Uploaded by

Lohith
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
121 views

Python Hana - ML - Define HANA Table and Upload (Create - Dataframe - From - Pandas) - SAP Blogs

Uploaded by

Lohith
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

9/6/22, 3:07 PM Python hana_ml: define HANA table and upload(create_dataframe_from_pandas) | SAP Blogs

Community Topics Groups Answers Blogs Events Programs Resour

Ask a Question Write a Blog Post Login

Technical Articles

Yohei Fukuhara
September 6, 2022
| 3 minute read

Python hana_ml: define HANA table


and
upload(create_dataframe_from_pan
Follow
das)
 Like  0  1  65

 RSS Feed I am writing this blog to define HANA tables and upload data to the table using
python package hana_ml.

Environment
Environment is as below.

Python: 3.7.13(Google Colaboratory)


HANA: Cloud Edition 2022.16

Python packages and their versions.

hana_ml: 2.13.22072200
pandas: 1.3.5
scikit-learn: 1.0.2

As for HANA Cloud, I activated scriptserver and created my users.  Though I don’t
recognize other special configurations, I may miss something since our HANA
Cloud was created long time before.

https://blogs.sap.com/2022/09/06/python-hana_ml-define-hana-table-and-uploadcreate_dataframe_from_pandas/ 1/7
9/6/22, 3:07 PM Python hana_ml: define HANA table and upload(create_dataframe_from_pandas) | SAP Blogs

I didn’t use HDI here to make environment simple.

Python Script
1. Install Python packages
Install python package hana_ml, which is not pre-installed on Google
Colaboratory.

As for pandas and scikit-learn, I used pre-installed ones.

!pip install hana_ml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg


Collecting hana_ml

Downloading hana_ml-2.13.22072200-py3-none-any.whl (5.1 MB)

|████████████████████████████████| 5.1 MB 4.2 MB/s

Collecting hdbcli==2.13.13

Downloading hdbcli-2.13.13-cp34-abi3-manylinux1_x86_64.whl (11.6


|████████████████████████████████| 11.6 MB 15.5 MB/s

Requirement already satisfied: matplotlib in /usr/local/lib/python


Collecting Deprecated

Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)

Requirement already satisfied: pandas>=0.24.2 in /usr/local/lib/pyt


Requirement already satisfied: tqdm>=4.43.0 in /usr/local/lib/pytho
Requirement already satisfied: numpy>=1.16.4 in /usr/local/lib/pyth
Requirement already satisfied: pydot in /usr/local/lib/python3.7/d
Collecting schedule

Downloading schedule-1.1.0-py2.py3-none-any.whl (10 kB)

Collecting htmlmin>=0.1.12

Downloading htmlmin-0.1.12.tar.gz (19 kB)

Requirement already satisfied: python-dateutil>=2.7.3 in /usr/loca


Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/pytho
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7
Requirement already satisfied: wrapt<2,>=1.10 in /usr/local/lib/pyt
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/pytho
Requirement already satisfied: typing-extensions in /usr/local/lib/
Building wheels for collected packages: htmlmin

Building wheel for htmlmin (setup.py) ... done

Created wheel for htmlmin: filename=htmlmin-0.1.12-py3-none-any.w


Stored in directory: /root/.cache/pip/wheels/70/e1/52/5b14d250ba8
Successfully built htmlmin

Installing collected packages: schedule, htmlmin, hdbcli, Deprecate


Successfully installed Deprecated-1.2.13 hana-ml-2.13.22072200 hdb

https://blogs.sap.com/2022/09/06/python-hana_ml-define-hana-table-and-uploadcreate_dataframe_from_pandas/ 2/7
9/6/22, 3:07 PM Python hana_ml: define HANA table and upload(create_dataframe_from_pandas) | SAP Blogs

2. Import modules
Import python package modules.

from hana_ml.dataframe import ConnectionContext, create_dataframe_f


import pandas as pd

from sklearn.datasets import make_classification

3. Connect to HANA Cloud


Connect to HANA Cloud and check its version.

ConnectionContext class is for connection to HANA.

HOST = '<HANA HOST NAME>'

SCHEMA = USER = '<USER NAME>'

PASS = '<PASSWORD>'

conn = ConnectionContext(address=HOST, port=443, user=USER,

password=PASS, schema=SCHEMA)

print(conn.hana_version())

4.00.000.00.1660640318 (fa/CE2022.16)

4. Create test data


Create test data using scikit-learn.

There are 3 features and 1 target variable.

def make_df():

X, y = make_classification(n_samples=1000,

n_features=3, n_redundant=0)

df = pd.DataFrame(X, columns=['X1', 'X2', 'X3'])

df['CLASS'] = y

return df

df = make_df()

print(df)

df.info()

Here is  dataframe overview.

X1 X2 X3 CLASS

0 0.964229 1.995667 0.244143 1

https://blogs.sap.com/2022/09/06/python-hana_ml-define-hana-table-and-uploadcreate_dataframe_from_pandas/ 3/7
9/6/22, 3:07 PM Python hana_ml: define HANA table and upload(create_dataframe_from_pandas) | SAP Blogs

1 -1.358062 -0.254956 0.502890 0

2 1.732057 0.261251 -2.214177 1

3 -1.519878 1.023710 -0.262691 0

4 4.020262 1.381454 -1.582143 1

.. ... ... ... ...

995 -0.247950 0.500666 -0.219276 1

996 -1.918810 0.183850 -1.448264 0

997 -0.605083 -0.491902 1.889303 0

998 -0.742692 0.265878 -0.792163 0

999 2.189423 0.742682 -2.075825 1

[1000 rows x 4 columns]

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 1000 entries, 0 to 999

Data columns (total 4 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 X1 1000 non-null float64

1 X2 1000 non-null float64

2 X3 1000 non-null float64

3 CLASS 1000 non-null int64

dtypes: float64(3), int64(1)

memory usage: 31.4 KB

5. define table and upload data


Define HANA Table and upload data using function
“create_dataframe_from_pandas”.

The function is very useful, since it automatically define table and upload at the
same time.  Please check options for further detail.

TRAIN_TABLE = 'PAL_TRAIN'

dfh = create_dataframe_from_pandas(conn, df, TRAIN_TABLE,

schema=SCHEMA,

force=True, # True: truncate and inse


replace=True, # True: Null is replaced
chunk_size=100_000)

6. Check upload result


check the result of HANA table definition and upload result.  With HANA
dataframe, python has connection to HANA table, so need to collect when
getting data.

def show_hana_df(dfh):

print(dfh.collect())

https://blogs.sap.com/2022/09/06/python-hana_ml-define-hana-table-and-uploadcreate_dataframe_from_pandas/ 4/7
9/6/22, 3:07 PM Python hana_ml: define HANA table and upload(create_dataframe_from_pandas) | SAP Blogs

print(f'Table Structure: {dfh.get_table_structure()}')

print(dfh.describe().collect())

show_hana_df(dfh)

X1 X2 X3 CLASS

0 0.964229 1.995667 0.244143 1

1 -1.358062 -0.254956 0.502890 0

2 1.732057 0.261251 -2.214177 1

3 -1.519878 1.023710 -0.262691 0

4 4.020262 1.381454 -1.582143 1

.. ... ... ... ...

995 -0.247950 0.500666 -0.219276 1

996 -1.918810 0.183850 -1.448264 0

997 -0.605083 -0.491902 1.889303 0

998 -0.742692 0.265878 -0.792163 0

999 2.189423 0.742682 -2.075825 1

[1000 rows x 4 columns]

Table Structure: {'X1': 'DOUBLE', 'X2': 'DOUBLE', 'X3': 'DOUBLE',


column count unique nulls mean std min
0 X1 1000 1000 0 -0.022545 1.403956 -4.543441 4.020
1 X2 1000 1000 0 0.006131 0.987222 -3.019512 3.926
2 X3 1000 1000 0 -0.048433 1.322137 -3.836929 3.994
3 CLASS 1000 2 0 0.498000 0.500246 0.000000 1.000

median 25_percent_cont 25_percent_disc 50_percent_cont \

0 -0.197133 -1.054538 -1.056985 -0.197133

1 -0.000251 -0.648000 -0.649373 -0.000251

2 -0.224366 -1.021008 -1.021405 -0.224366

3 0.000000 0.000000 0.000000 0.000000

50_percent_disc 75_percent_cont 75_percent_disc

0 -0.197984 0.990549 0.990513

1 -0.000709 0.666021 0.665924

2 -0.229456 0.969732 0.967854

3 0.000000 1.000000 1.000000

7. Close connection
Last but not least, closing connection explicitly is preferable.

conn.close()

Alert Moderator
https://blogs.sap.com/2022/09/06/python-hana_ml-define-hana-table-and-uploadcreate_dataframe_from_pandas/ 5/7
9/6/22, 3:07 PM Python hana_ml: define HANA table and upload(create_dataframe_from_pandas) | SAP Blogs
Alert Moderator

Assigned Tags

SAP HANA

Python

hana_ml

hana-ml

Similar Blog Posts 


Import multiple excel files into a single SAP HANA table
By
Raymond Yao Dec 17, 2020

Multi-model in hana_ml 2.6 for Python (part 00): Setup


By
Witalij Rudnicki Nov 05, 2020

Multi-model in hana_ml 2.6 for Python (part 01): Dataframes


By
Witalij Rudnicki Nov 10, 2020

Related Questions 
Error while calling procedure from python sqlalchemy
By
Arunkumar KP Mar 11, 2020

batch insert of tuples into HANA using HANA Client in SAP Data Intelligence
By
Rajarshi Muhuri Jul 16, 2021

Datatype guessing
By
Marvin Kennel Sep 01, 2022

Be the first to leave a comment

You must be Logged on to comment or reply to a post.

https://blogs.sap.com/2022/09/06/python-hana_ml-define-hana-table-and-uploadcreate_dataframe_from_pandas/ 6/7
9/6/22, 3:07 PM Python hana_ml: define HANA table and upload(create_dataframe_from_pandas) | SAP Blogs

Find us on

Privacy Terms of Use

Legal Disclosure Copyright

Trademark Cookie Preferences

Newsletter Support

https://blogs.sap.com/2022/09/06/python-hana_ml-define-hana-table-and-uploadcreate_dataframe_from_pandas/ 7/7

You might also like