anpandu/i1m: Exploring the Go language and its features to write high-performance programs.

Inserting 1 Million data into BigQuery Table

Introduction

This is the source code for my blog post "Inserting 1 Million data into BigQuery Table". The post demonstrates how to insert a JSON text file into a Google BigQuery table using Go. Go is known as one of the best languages for writing high-performance programs, because its built-in concurrency features make concurrent and parallel programming easier. The post also demonstrates how to use several of those features: channels, goroutines, and sync.WaitGroup.
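To make the three primitives concrete, here is a minimal sketch that is not taken from this repository. The function name sumSquares is hypothetical. Each goroutine does one small piece of work, a channel carries the results back, and a sync.WaitGroup tells the caller when every goroutine has finished.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares squares each input in its own goroutine and adds up the results.
func sumSquares(nums []int) int {
	results := make(chan int, len(nums)) // buffered, so senders never block
	var wg sync.WaitGroup

	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n * n
		}(n)
	}

	wg.Wait()      // every goroutine has sent its result
	close(results) // safe to close: there are no senders left

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

Starting one goroutine per item is fine for a toy example; for a large input, a fixed number of workers reading from a shared channel keeps resource usage bounded.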

Usage

Installation

# Install Go (tested with 1.12.1), then check the version
go version
# go version go1.12.1 linux/amd64

# Clone this project
go get github.com/anpandu/go-json-to-bq

# Install dependencies
cd $GOPATH/src/github.com/anpandu/go-json-to-bq
go get

# Generate dataset
python3 gen-txt.py
# you will see files like this
# -rw-rw-r-- 1 pandu pandu  220 Mar  5 18:31 students-10.json.txt
# -rw-rw-r-- 1 pandu pandu 2.3K Mar  5 18:31 students-100.json.txt
# -rw-rw-r-- 1 pandu pandu  24K Mar  5 18:31 students-1000.json.txt
# -rw-rw-r-- 1 pandu pandu 244K Mar  5 18:31 students-10000.json.txt
# -rw-rw-r-- 1 pandu pandu 2.5M Mar  5 18:31 students-100000.json.txt
# -rw-rw-r-- 1 pandu pandu  26M Mar  5 18:31 students-1000000.json.txt

# Point to your service account key file
export GOOGLE_APPLICATION_CREDENTIALS=~/mykey.json
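The generated students-*.json.txt files are the input for every program in this project. Their exact layout is defined by gen-txt.py and is not shown here, so the sketch below assumes newline-delimited JSON (one object per line), which is consistent with the file sizes listed above but not confirmed. The names readRows and rowsFromString and the sample fields are hypothetical.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// readRows decodes one JSON object per non-empty line.
// Assumption: the input is newline-delimited JSON.
func readRows(r io.Reader) ([]map[string]interface{}, error) {
	var rows []map[string]interface{}
	scanner := bufio.NewScanner(r)
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // skip blank lines
		}
		var row map[string]interface{}
		if err := json.Unmarshal([]byte(line), &row); err != nil {
			return nil, fmt.Errorf("line %d: %v", lineNo, err)
		}
		rows = append(rows, row)
	}
	return rows, scanner.Err()
}

// rowsFromString is a convenience wrapper for in-memory input.
func rowsFromString(s string) ([]map[string]interface{}, error) {
	return readRows(strings.NewReader(s))
}

func main() {
	// Hypothetical sample rows; the real field names come from gen-txt.py.
	sample := `{"id": 1, "name": "alice"}
{"id": 2, "name": "bob"}`
	rows, err := rowsFromString(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rows)) // prints 2
}
```

To read a real file, pass the result of os.Open to readRows. Scanning line by line avoids loading the whole 26M file into memory at once.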

How to use

# Run Part One: Simple Approach
go run cmd/main1/main1.go \
  --project=myproject \
  --dataset=mydataset \
  --table=mytable \
  --filepath=./students-10.json.txt

# Run Part Two: Multiple Rows Insertion
go run cmd/main2/main2.go \
  --project=myproject \
  --dataset=mydataset \
  --table=mytable \
  --buffer-length=100 \
  --filepath=./students-10.json.txt

# Run Part Three: Multiple Workers and Multiple Rows Insertion
go run cmd/main3/main3.go \
  --project=myproject \
  --dataset=mydataset \
  --table=mytable \
  --buffer-length=100 \
  --worker=4 \
  --filepath=./students-10.json.txt

Benchmark

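All benchmarks were run on the same machine type, n1-standard-1 (1 vCPU, 3.75 GB memory). Using JSON text files generated at different sizes, we benchmarked the three approaches and measured the time taken. In the Parameter column, w is the number of workers (--worker) and n is the buffer length (--buffer-length).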

| File | Parameter | 1,000 rows | 10,000 rows | 100,000 rows | 1,000,000 rows |
|---|---|---|---|---|---|
| main.go | w1 - n1 | 312.164s | 3242.766s | n/a | n/a |
| main2.go | w1 - n100 | 4.599s | 35.735s | 381.251s | 3738.669s |
| main3.go | w4 - n100 | 1.453s | 9.137s | 95.666s | 939.175s |
| main3.go | w16 - n100 | 0.808s | 2.938s | 24.754s | 224.630s |
| main3.go | w64 - n100 | 0.848s | 1.643s | 11.667s | 62.624s |
| main3.go | w300 - n500 | 0.934s | 1.296s | 4.081s | 14.787s |

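As the table shows, higher values of w and n generally make insertion faster, with the largest gains on the larger files; for the 1,000-row file, times stop improving beyond w16. With the highest configuration (w300, n500), we can insert one million rows in under 15 seconds (14.787s)!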
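To put the one-million-row timings in perspective, the short sketch below converts them into throughput and a speedup factor. The timings are copied from the benchmark table; the helper names rowsPerSecond and speedup are hypothetical.

```go
package main

import "fmt"

// rowsPerSecond returns the average insertion throughput.
func rowsPerSecond(rows int, seconds float64) float64 {
	return float64(rows) / seconds
}

// speedup returns how many times faster the `fast` run is than the `slow` run.
func speedup(slow, fast float64) float64 {
	return slow / fast
}

func main() {
	const rows = 1000000
	// Timings for the 1,000,000-row file, copied from the benchmark table.
	const main2Seconds = 3738.669 // main2.go, w1 - n100
	const main3Seconds = 14.787   // main3.go, w300 - n500

	fmt.Printf("main2.go w1 - n100:   %.0f rows/s\n", rowsPerSecond(rows, main2Seconds))
	fmt.Printf("main3.go w300 - n500: %.0f rows/s\n", rowsPerSecond(rows, main3Seconds))
	fmt.Printf("speedup: %.0fx\n", speedup(main2Seconds, main3Seconds))
}
```

By this arithmetic the fastest configuration handles roughly 67,600 rows per second, about 250 times the throughput of the single-worker batched run.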

License

MIT © Ananta Pandu
