Hello everyone! Today we are going to discuss
how to manage datasets and tables in BigQuery using Python.
Before starting the discussion, you might have a question: Google Cloud Platform
already provides an awesome console for the BigQuery service, so why do the same
things with Python? I was thinking exactly that while I was dealing
with small volumes of data, but when I got a large amount of data, the console
became difficult to work with.
I will try to cover all the common operations
and compare the schema and data types of BigQuery with other databases.
So let’s start…
1) Convert a relational database schema (DDL) into a BigQuery schema:
Let’s understand this with the help of an example. In this illustration I am
converting an Oracle DDL into a BigQuery schema. If you are familiar with
BigQuery, then you already know which data types it supports; I have listed a
few of them below, and you can get more details on the official GCP website.
· Numeric types: (a) INTEGER (b) NUMERIC (c) FLOAT
· Boolean type
· String type
· Bytes type
· Date types
Below is a schema as written for an Oracle database. You can see that it uses
some data types (CLOB, VARCHAR2, etc.) that are not supported in BigQuery,
which is why we need to convert the schema before we can create the table in a
dataset.
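To make the conversion concrete before reaching for a library, here is a minimal, hand-rolled lookup for the Oracle types used in this example. The dictionary and helper function are my own illustration, not part of any library; a real converter such as ddlparse (used later in this post) handles many more cases, including precision and scale.

```python
# Illustrative mapping of the Oracle data types in this example to their
# closest BigQuery equivalents (hypothetical helper, not a library API).
ORACLE_TO_BIGQUERY = {
    "NUMBER": "NUMERIC",    # NUMBER with no scale could also map to INTEGER
    "VARCHAR2": "STRING",
    "CLOB": "STRING",       # CLOB has no direct equivalent; STRING is the usual choice
    "DATE": "DATETIME",     # Oracle DATE also carries a time component
}

def to_bigquery_type(oracle_type: str) -> str:
    """Return the BigQuery type for an Oracle type, ignoring length/precision."""
    base = oracle_type.split("(")[0].strip().upper()
    return ORACLE_TO_BIGQUERY.get(base, "STRING")

print(to_bigquery_type("VARCHAR2(64)"))  # STRING
print(to_bigquery_type("NUMBER(15)"))    # NUMERIC
```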
CREATE TABLE Test.OracleDB
(
RECORDID NUMBER(15) NOT NULL,
ACTLINKID NUMBER(15) NOT NULL,
ACTIONINDEX NUMBER(15) NOT NULL,
SERVERNAME VARCHAR2(64) NOT NULL,
GUIDENAME VARCHAR2(254) NOT NULL,
GUIDEMODE NUMBER(15) NOT NULL,
GUIDETABLEID NUMBER(15) NOT NULL,
ASSIGNSHORT VARCHAR2(255),
ASSIGNLONG CLOB,
SAMPLESERVER VARCHAR2(64),
SAMPLEGUIDE VARCHAR2(254),
OVERLAYGROUP VARCHAR2(254) NOT NULL
);
We can convert an Oracle schema into a BigQuery-supported schema using the
Python script below. First you need to install the ddlparse Python module,
then use the code below, replacing the sample_ddl value with your desired
Oracle schema.
from ddlparse import DdlParse
# Change the schema according to your requirement:
sample_ddl = """
CREATE TABLE ARADMIN.Actlink_call
(
RECORDID NUMBER(15) NOT NULL,
ACTLINKID NUMBER(15) NOT NULL,
ACTIONINDEX NUMBER(15) NOT NULL,
SERVERNAME VARCHAR2(64) NOT NULL,
GUIDENAME VARCHAR2(254) NOT NULL,
GUIDEMODE NUMBER(15) NOT NULL,
GUIDETABLEID NUMBER(15) NOT NULL,
ASSIGNSHORT VARCHAR2(255),
ASSIGNLONG CLOB,
SAMPLESERVER VARCHAR2(64),
SAMPLEGUIDE VARCHAR2(254),
OVERLAYGROUP VARCHAR2(254) NOT NULL
);
"""
table = DdlParse().parse(sample_ddl)
print("* BigQuery Schema *")
print(table.to_bigquery_fields())  # prints the columns as BigQuery JSON schema fields
2) Create a dataset using Python:
First you need to install the “google-cloud-bigquery” library, then use the
Python code below.
from google.cloud import bigquery
client = bigquery.Client()
# Set dataset_id to the ID of the dataset to create; here I have used "demodata".
dataset_id = "{}.demodata".format(client.project)
dataset = bigquery.Dataset(dataset_id)
dataset.location = "US"
dataset = client.create_dataset(dataset) # Make an API request.
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))
3) Create a table in the dataset:
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
# Set table_id to the ID of the table to create, for example:
table_id = "your-project.your_dataset.your_table_name"
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]
table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table) # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)
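If you generated the schema with ddlparse in step 1, you can feed its output into the table creation above instead of hand-writing SchemaField entries. The sketch below assumes to_bigquery_fields() returns a JSON array of {name, type, mode} objects (check the output of your ddlparse version); it parses the JSON with the standard library, and the final bigquery.SchemaField line is shown as a comment because it requires the client library.

```python
import json

# Example of the JSON that ddlparse's to_bigquery_fields() is assumed to emit:
# a list of {"name", "type", "mode"} objects.
fields_json = """
[
  {"name": "RECORDID", "type": "NUMERIC", "mode": "REQUIRED"},
  {"name": "SERVERNAME", "type": "STRING", "mode": "REQUIRED"},
  {"name": "ASSIGNSHORT", "type": "STRING", "mode": "NULLABLE"}
]
"""

fields = json.loads(fields_json)
# With google-cloud-bigquery installed, this list becomes the schema argument:
# schema = [bigquery.SchemaField(f["name"], f["type"], mode=f["mode"]) for f in fields]
for f in fields:
    print(f["name"], f["type"], f["mode"])
```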
4) List the tables in a dataset with the list_tables() method:
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
# Set dataset_id to the ID of the dataset that contains
# the tables you are listing, for example:
dataset_id = "your-project.your_dataset"
tables = client.list_tables(dataset_id) # Make an API request.
print("Tables contained in '{}':".format(dataset_id))
for table in tables:
    print("{}.{}.{}".format(table.project, table.dataset_id, table.table_id))