Measuring the first V of big data, Volume, is essential. Here are some sample commands for the technologies I've worked with.
SQL Server
sp_helpdb 'database_name' – returns the size of the data and log files of a database
sp_spaceused 'table_name' – returns the row count, reserved space, data size, and index size of a table
Greenplum / PostgreSQL
# SELECT sodddatname, sodddatsize FROM gp_toolkit.gp_size_of_database; – returns the size of all databases using gp_toolkit
# SELECT pg_database_size('database_name'); – returns the size of a database in bytes
# SELECT pg_size_pretty(pg_database_size('database_name')); – returns the size of a database in a human-readable unit (kB, MB, GB, ...)
# SELECT pg_size_pretty(pg_relation_size('schema.tablename')); – returns the size of a non-partitioned table, excluding indexes
Hive / HDFS
sudo -u hdfs hadoop fs -du /user/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
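The awk one-liner above can also be sketched in Python, which helps when the numbers need further processing. This is only a minimal sketch: the function name parse_du_output and the sample paths are made up for illustration. It assumes the byte count is the first column and the path is the last one, because some Hadoop releases print an extra disk-usage column in between. Paths containing spaces are not handled.

```python
def parse_du_output(text):
    """Turn `hadoop fs -du` output into (size_in_gb, path) tuples.

    Assumption: the first column is the size in bytes and the last
    column is the path; any columns in between are ignored.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and lines that do not start with a byte count,
        # mirroring the /^[0-9]+/ filter in the awk version.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        size_bytes = int(fields[0])
        path = fields[-1]
        # Integer division truncates, like int($1/(1024**3)) in awk.
        rows.append((size_bytes // 1024**3, path))
    return rows


if __name__ == "__main__":
    # Hypothetical sample output, in both two-column and three-column form.
    sample = (
        "5368709120  /user/hive/warehouse/sales.db\n"
        "1073741824  3221225472  /user/hive/warehouse/logs.db\n"
    )
    for size_gb, path in parse_du_output(sample):
        print(f"{size_gb} [GB]\t{path}")
```

To run it against a real cluster, the output of the hadoop fs -du command could be piped into a small script that passes sys.stdin.read() to this function.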