Measuring Databases

Measuring the first V of big data, Volume, is essential. Here are some examples from the technologies I've worked with.

SQL Server

sp_helpdb 'database_name' – returns the size of the data and log files of a database

sp_spaceused 'table_name' – returns the row count of a table and the disk space used by its data and indexes

Greenplum / PostgreSQL

# SELECT sodddatname, sodddatsize FROM gp_toolkit.gp_size_of_database; – returns the size in bytes of all databases, using the Greenplum-only gp_toolkit schema

# SELECT pg_database_size('database_name'); – returns the size of a database in bytes

# SELECT pg_size_pretty(pg_database_size('database_name')); – returns the size of a database in human-readable units (bytes, kB, MB, GB, ...)

# SELECT pg_size_pretty(pg_relation_size('schema.tablename')); – returns the size of a non-partitioned table, excluding its indexes
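pg_database_size() returns a raw byte count. When a fixed unit is easier to compare across databases than the varying units of pg_size_pretty(), the number can be converted in the shell. The following is a minimal sketch, assuming a POSIX shell with awk; bytes_to_gb is a hypothetical helper name, and it uses the same 1024-based units as pg_size_pretty() (1 GB = 1024^3 bytes).

```shell
# Hypothetical helper: convert a byte count (e.g. the output of
# pg_database_size) to gigabytes, where 1 GB = 1024^3 bytes.
bytes_to_gb() {
    # awk does the floating-point division; '^' is POSIX awk exponentiation.
    awk -v b="$1" 'BEGIN { printf "%.2f GB\n", b / (1024 ^ 3) }'
}
```

For example, bytes_to_gb "$(psql -At -c "SELECT pg_database_size('database_name');")" would print the size of database_name with two decimals, assuming psql can connect with its default settings (-A and -t make psql print only the bare number).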

Hive / HDFS

sudo -u hdfs hadoop fs -du /user/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024^3)) " [GB]\t" $NF }' – returns the size in GB of each directory under the Hive warehouse
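Depending on the Hadoop version, hadoop fs -du prints either two columns (size, path) or three (size, disk space consumed including all replicas, path). The sketch below therefore picks the path with $NF (the last field) rather than a fixed column number. It wraps the awk program in a hypothetical function, du_to_gb, and feeds it made-up sample lines so it can be tried without a cluster.

```shell
# Hypothetical helper: read `hadoop fs -du` output on stdin and print
# each entry's size in whole GB (1 GB = 1024^3 bytes) followed by its path.
du_to_gb() {
    # /^[0-9]+/ skips any line that does not start with a number, such as a header;
    # $NF is the last field, i.e. the path, in both the 2- and 3-column formats.
    awk '/^[0-9]+/ { print int($1 / (1024 ^ 3)) " [GB]\t" $NF }'
}

# Made-up sample input: a header line, one 2-column line and one 3-column line.
printf '%s\n' \
    'Found 2 items' \
    '5368709120  /user/hive/warehouse/sales.db' \
    '2147483648  6442450944  /user/hive/warehouse/logs.db' \
    | du_to_gb
```

On a real cluster, the sample printf would be replaced by the command itself: sudo -u hdfs hadoop fs -du /user/hive/warehouse/ | du_to_gb. Note that int() truncates, so directories smaller than 1 GB show up as 0 [GB].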

This entry was posted in Code Snippet.
