database - How Row Key is designed in HBase
I am writing a program that converts an RDBMS to HBase. I selected a sequential entity, the employee ID (1, 2, 3, ...), as the row key, but I read somewhere that the row key shouldn't be a sequential entity. My question is: why is selecting a sequential row key not recommended? What design considerations are associated with doing so?
Although sequential row keys allow faster scans, they become a problem after a certain point because they cause undesirable RegionServer hotspotting at read/write time. By default, HBase stores rows with similar keys in the same region, which is what allows faster range scans. But if the row keys are sequential, all of the data starts going to the same machine, causing an uneven load on that machine. This is called RegionServer hotspotting, and it is the main motivation behind not using sequential keys. I'll take writes to explain the problem here.
When records with sequential keys are written to HBase, all writes hit one region. This would not be a problem if a region were served by multiple RegionServers, but that is not the case: each region lives on only one RegionServer. Each region has a predefined maximum size, and once a region reaches that size it is split into two smaller regions. Following that, one of these new regions takes all the new records, and then this region and the RegionServer that serves it become the new hotspot victim. Obviously, this uneven write load distribution is highly undesirable because it limits write throughput to the capacity of a single server instead of making use of multiple/all nodes in the HBase cluster.
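To make the write path above concrete, here is a small simulation, not HBase code. It assumes a table already split into four regions at the made-up split points in `SPLITS`, and zero-padded employee IDs as row keys so that byte order matches numeric order. The names `SPLITS`, `region_for` and `sequential_key` are only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points: region i holds keys in [SPLITS[i-1], SPLITS[i]).
# Region 0 is everything below the first split, region 3 everything from the last.
SPLITS = [b"00001000", b"00002000", b"00003000"]


def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(SPLITS, row_key)


def sequential_key(employee_id: int) -> bytes:
    """Zero-pad the id so lexicographic byte order matches numeric order."""
    return f"{employee_id:08d}".encode()


# The 500 newest employee ids, written one after another.
writes = [sequential_key(i) for i in range(3001, 3501)]

# Count how many of those writes each region receives.
load = Counter(region_for(key) for key in writes)
print(dict(load))  # {3: 500}
```

Every new write lands in the last region, so in this toy model the server hosting region 3 does all the work while the other three sit idle.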
You can find an explanation of the problem, along with a solution, here.
You might also find this page helpful; it shows how to design row keys efficiently.
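One commonly used way around hotspotting is to put a short hash-derived prefix (a "salt") in front of the sequential ID, so that consecutive IDs fall into different key ranges. The sketch below is only an illustration of that idea and may not be what the linked pages recommend; `NUM_BUCKETS` and `salted_key` are names I made up, and the bucket count is an arbitrary assumption.

```python
import hashlib
from collections import Counter

# Hypothetical bucket count; in practice you would tune this to your cluster.
NUM_BUCKETS = 4


def salted_key(employee_id: int) -> bytes:
    """Build a row key of the form b'<bucket>-<zero-padded id>'."""
    raw = f"{employee_id:08d}".encode()
    # Derive the bucket from a hash of the id, so the same id always
    # maps to the same key and a point lookup can recompute it.
    bucket = hashlib.sha256(raw).digest()[0] % NUM_BUCKETS
    return f"{bucket:02d}-".encode() + raw


# The same 500 sequential ids, now grouped by their two-byte prefix.
keys = [salted_key(i) for i in range(3001, 3501)]
per_bucket = Counter(key[:2] for key in keys)
print(sorted(per_bucket.items()))
```

With the table pre-split on the bucket prefixes, writes spread across several regions instead of one. The trade-off is the one mentioned at the top: a range scan over consecutive employee IDs is no longer a single scan, because you have to scan each bucket and merge the results.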
Hope this answers your question.