Technology Sharing

Go Language Introduction: Map Detailed Explanation

2024-07-12

한어Русский языкEnglishFrançaisIndonesianSanskrit日本語DeutschPortuguêsΕλληνικάespañolItalianoSuomalainenLatina

Go Language Introduction: Map Detailed Explanation

1. Basic Definition

A map is an unordered collection of key-value pairs.

(1) Characteristics

  • Type characteristics: map is a reference type, so updating the value in the function will change it permanently
  • Sequential characteristics: Map traversal is unordered because the underlying layer is a hash table, which is unordered
  • Initialization: 0 value or uninitialized value is nil. Uninitialized values ​​cannot be assigned, otherwise panic will occur.
  • Key-value pair: key and value always appear in pairs, key is unique and can be any comparable type
  • Dynamicity: Key-value pairs can be added or deleted dynamically at runtime without the need to declare the size in advance
  • Fast search: map provides fast search, insertion and deletion operations, with an average time complexity of O(1)
  • Concurrency: Not thread-safe, locking is required to ensure safety

(2) Definition statement

var name map[key_type]value_type
  • 1
  • name: variable name
  • key_type: key type
  • value_type: the type of value
// 方式一
var m map[int]string = map[int]string{}

// 方式二
m := map[int]string{
    1 : "老一",
    2 : "老二",
    3 : "老三",
}

// 方式三:5代表容量,也就是在内存中占用多大的空间,可以省略
m := make(map[int]string,5)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

2. Basic use

(1) Adding elements

  • 1. The most common way is to add it when declaring the map by literal value, as shown in method 2 above
  • 2. The second is to directly set the corresponding value for the specified key
mapName[key] = value

// 假设map名为m,key为int,value为string
m[5] = "老五"
  • 1
  • 2
  • 3
  • 4

(2) Deleting an element

Delete elements according to the key, and no error will be reported when deleting a non-existent key

delete(mapName, key)  

// 假设map名为m,key为int,value为string
delete(m, 5)
  • 1
  • 2
  • 3
  • 4

(3) Modify elements

Just modify the value corresponding to the specified key.

mapName[key] = newValue 

// 假设map名为m,key为int,value为string
m[5] = "五"
  • 1
  • 2
  • 3
  • 4

(4) Get elements

Get the value according to the key. OK is the flag bit of whether it is found. The type is Boolean.

If the value is not found, no error is reported and a null value of the corresponding type is returned

value, ok := mapName[key]
if !ok {
		fmt.Println(ok)
	}
  • 1
  • 2
  • 3
  • 4

(5) Traverse all elements

Note: Map traversal is unordered

for key, value := range myMap {
   // 处理每对键值
}

// 例子
for i, s := range m {
		fmt.Println(i, s)
	}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8

3. Underlying principles

The underlying essence of the map in golang language is to usehashmapTo implement, so map is essentially哈希表

哈希表It is a use哈希函数Organize data into data structures that support fast insertion and searching.

哈希函数, also known as hash function, is a function that transforms an input of any length (such as a string) into an output of fixed length through a specific hash algorithm. Hash values ​​are usually stored in an array-like form to ensure the access performance of hash values.

If the input range exceeds the range of the mapped output, it may result in different inputs getting the same output.哈希冲突

There are usually two ways to solve this problem:开放地址法and拉链法

(1) Implementation of map

Open address method:

Arrays are often used to implement data structures

  • 1. First, the array is composed of different hash values, called a hash table
  • 2. Then many keys are processed and the hash function is performed to confirm the address and put it into the corresponding position. If this slot in the hash table is already occupied, a probing sequence (such as linear probing, quadratic probing, or double hashing) is used to find the next available slot and store the conflicting key there.

shortcoming:

This approach requires more space to resolve collisions because not only does the data have to be stored, but additional space is required to resolve the collision.

Zipper method (map in go language uses this method):

Arrays and linked lists are often used as underlying data structures

  • 1. First, the array is composed of different hash values, called a hash table. Each slot in the hash table will store a linked list.
  • 2. Then many keys will come in. After hashing different keys, the hash value is calculated modulo and linked to the array. If the hash value is the same (hash collision), the new key will be hung behind the existing key list.
  • 3. When a specific key needs to be found, first use the hash function to determine its position, and then perform a linear search on the linked list at that position until a matching key is found or the end of the linked list is reached.

The linked list at different indexes of the array is also called a bucket.

(2) The underlying structure of the map

hmap
type hmap struct {
 count     int // 当前哈希表中的元素数量,即键值对数量,可用内置函数len()获取
 flags     uint8  // 标志位,标记map状态和属性的字段,如正在迭代等状态
 B         uint8  // 表示哈希表桶(buckets)的数量为2的B次方
 noverflow uint16 // 溢出桶的大致数量,扩容时会用到
 hash0     uint32 // 哈希种子,对key做哈希是加入种子计算哈希值,确保map安全性

 buckets    unsafe.Pointer // 存储桶数组的指针
 oldbuckets unsafe.Pointer // 扩容时用于保存旧桶数组的指针 , 大小为新桶数组的一半
 nevacuate  uintptr        // 扩容时的迁移进度器,迁移桶下标小于此值说明完成迁移

 extra *mapextra // 溢出桶的指针,指向mapextra结构体,用于存储一些额外的字段和信息
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
// mapextra 处理桶溢出的结构体
type mapextra struct {
 overflow    *[]*bmap    // 溢出桶数组指针,仅当key和elem非指针时才使用
 oldoverflow *[]*bmap    // 旧的溢出桶数组指针,仅当key和elem非指针时才使用

 nextOverflow *bmap      // 下一个可用的溢出桶地址
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
bmap

In the source code, the bmap type has only one tophash field. However, during compilation, the Go compiler will automatically inject the corresponding key, value, and other structures according to the user code.

Surface bmap

type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

The actual bmap

// 编译期间会动态地创建一个新的结构:
type bmap struct {
   topbits  [8]uint8   // 这里存储哈希值的高八位,用于在确定key的时候快速试错,加快增删改查寻址效率,有时候也叫tophash
   keys     [8]keytype   // 存储key的数组,这里bmap最多存储8个键值对
   elems   [8]valuetype    // 存储value的数组,这里bmap也最多存储8个键值对
   ...
   overflow uintptr     // 溢出桶指针
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
Map bottom layer diagram

insert image description here

Map expansion

In the Go language, map expansion is automatic to maintain map performance.

First, the map is written throughruntime.mapassignDetermine whether capacity expansion is needed

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
...
 // If we hit the max load factor or we have too many overflow buckets,
 // and we're not already in the middle of growing, start growing.
 if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
  hashGrow(t, h)
  goto again // Growing the table invalidates everything, so try again
 }
...
}

// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
 return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
 if B > 15 {
  B = 15
 }
 return noverflow >= uint16(1)<<(B&15)
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23

According to the above code, there are two conditions for capacity expansion:

  • Load factor exceeds the threshold of 6.5:overLoadFactor(h.count+1, h.B) , Load factor = number of elements ÷ number of buckets
  • Too many overflow buckets used (exceeded 32768):tooManyOverflowBuckets(h.noverflow, h.B))

Expansion method:

  • Incremental expansion:

When the load factor is too large, a new bucket is created, the length of the new bucket is twice that of the original bucket, and then the data in the old bucket is moved to the new bucket.

  • Equal expansion

There is not much data, but there are too many overflow buckets. When expanding, the number of buckets remains unchanged. Repeat the migration action similar to incremental expansion to rearrange the loose key-value pairs to make the bucket usage more efficient and ensure faster access.

Expansion steps:

  • 1. New bucket array: Create a new one with the same size as the original onedoubleThe new bucket array, map will be marked as expanded.
func hashGrow(t *maptype, h *hmap) {
 ...
 // 原有桶设置给oldbuckets
 oldbuckets := h.buckets   
 // 创建新桶
 newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

 flags := h.flags &^ (iterator | oldIterator)
 if h.flags&iterator != 0 {
  flags |= oldIterator
 }
 // commit the grow (atomic wrt gc)
 h.B += bigger
 h.flags = flags
 h.oldbuckets = oldbuckets
 h.buckets = newbuckets
 h.nevacuate = 0
 h.noverflow = 0

 ...
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 2. Re-hash: Use oldbuckets to point to the original bucket array, buckets to point to the new bucket array, traverse all key-value pairs in the old bucket array, and use the hash function to recalculate the position of each key and insert them into the new bucket array.
// 这个是mapdelete函数中的处理迁移的位置
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
...
 if h.growing() {
  // 
  growWork(t, h, bucket)
 }
...
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 3. Gradual Migration: In order to avoid pausing the entire program when expanding capacity, Go's Map implementation may choose渐进式驱逐Migrate key-value pairs. This means that during the expansion period, the old bucket array and the new bucket array will exist at the same time, the newly inserted key-value pairs will be directly placed in the new bucket, and access to the old bucket will trigger the migration operation.
// 进入后是一个简单的判断,之后的evacuate是核心逻辑处理,特别多,感兴趣自己看源码
func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the oldbucket corresponding
	// to the bucket we're about to use
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more oldbucket to make progress on growing
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 4. Update internal state: When all key-value pairs in oldbuckets are moved, the internal state of the Map will be updated and oldbuckets will be deleted.

4. Usage scenarios

  • 1.quick search:When you need to quickly find a value based on a key, Map provides a search performance with an average time complexity of O(1).
  • 2.Deduplication: When it is necessary to store unique keys, Map keys are not allowed to be repeated, so the deduplication function can naturally be achieved.
  • 3.Dynamic Collection: Map provides flexible operations when key-value pairs need to be added or deleted dynamically.
  • 4.Linked Data: When data exists in the form of key-value pairs and needs to be updated or queried frequently, Map is a good choice.

5. Usage suggestions

  1. Preallocation: Try to use the make function to allocate capacity for maps of known size
  2. Data type selection: Use larger data types, such asintorint64
  3. Pointer storage value: For structures or variables with large amounts of data, try to use pointers to pass values ​​and store values.
  4. Concurrency Control: For concurrent access, usesync.MapOr implement your own concurrent safe Map.

6. References