2024-07-12
A map is an unordered collection of key-value pairs.
var name map[key_type]value_type
// Method 1
var m map[int]string = map[int]string{}
// Method 2
m := map[int]string{
    1: "one",
    2: "two",
    3: "three",
}
// Method 3: 5 is an initial capacity hint telling the runtime how much space to preallocate; it may be omitted
m := make(map[int]string, 5)
mapName[key] = value
// assuming a map named m with int keys and string values
m[5] = "five"
Delete an element by key; deleting a key that does not exist is a no-op and reports no error.
delete(mapName, key)
// assuming a map named m with int keys and string values
delete(m, 5)
To modify an entry, simply assign a new value to the existing key.
mapName[key] = newValue
// assuming a map named m with int keys and string values
m[5] = "FIVE"
Look up a value by key. ok is a boolean flag indicating whether the key was found.
If the key is absent, no error occurs; the zero value of the value type is returned instead.
value, ok := mapName[key]
if !ok {
    fmt.Println(ok)
}
Note: Map traversal is unordered
for key, value := range myMap {
    // handle each key-value pair
}
// example
for i, s := range m {
    fmt.Println(i, s)
}
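Putting the operations above together, a complete runnable example:
package main

import "fmt"

func main() {
    m := make(map[int]string, 5) // create with a capacity hint
    m[1] = "one"                 // add
    m[2] = "two"
    m[2] = "TWO" // modify: assign a new value to an existing key
    delete(m, 1) // delete; deleting a missing key is a no-op

    if v, ok := m[3]; !ok { // comma-ok lookup
        fmt.Println("key 3 not found, zero value is:", v)
    }

    for k, v := range m { // iteration order is not guaranteed
        fmt.Println(k, v)
    }
}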
A map in Go is implemented on top of a hashmap, so a map is essentially a hash table.
A hash table is a data structure that uses a hash function to organize data so that it supports fast insertion and lookup.
A hash function, also known as a hashing algorithm, is a function that transforms an input of any length (such as a string) into an output of fixed length. Hash values are usually stored in an array-like structure to keep access fast.
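For a concrete feel, here is a quick demo using the standard library's hash/fnv (an ordinary hash function, not the one Go's map uses internally): inputs of any length produce a fixed-length 32-bit output.
package main

import (
    "fmt"
    "hash/fnv"
)

func main() {
    for _, s := range []string{"a", "hello", "a much longer input string"} {
        h := fnv.New32a()  // FNV-1a produces a fixed 32-bit hash
        h.Write([]byte(s)) // the input may be any length
        fmt.Printf("%-30q -> %08x\n", s, h.Sum32())
    }
}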
If the input space is larger than the output space, different inputs may produce the same output. This is called a hash collision. There are usually two ways to resolve it: open addressing and the zipper method (separate chaining).
Open addressing:
Usually implemented with a plain array as the underlying data structure.
Drawback:
This approach needs extra space to resolve collisions, because besides the data itself, spare slots must be kept available for colliding entries to probe into.
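A minimal sketch of open addressing with linear probing (illustrative only: slot and probeInsert are made-up names, keys are assumed non-negative, and deletion and resizing are omitted):
// A slot in a fixed-size open-addressing table.
type slot struct {
    used bool
    key  int
    val  string
}

// probeInsert stores a key-value pair using linear probing: on a collision
// it simply tries the next index, wrapping around the array.
func probeInsert(table []slot, key int, val string) bool {
    start := key % len(table) // trivial hash for illustration
    for i := 0; i < len(table); i++ {
        j := (start + i) % len(table)
        if !table[j].used || table[j].key == key {
            table[j] = slot{used: true, key: key, val: val}
            return true
        }
    }
    return false // every slot is occupied: the table is full
}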
Zipper method (the map in Go uses this method):
Usually implemented with an array plus linked lists as the underlying data structures.
The linked list hanging off each index of the array is also called a bucket.
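A minimal sketch of the zipper method (a toy model: entry and chainedMap are made-up names, and the real Go map uses fixed-size buckets of 8 slots plus overflow buckets rather than a linked list per entry):
type entry struct {
    key  int
    val  string
    next *entry // entries hashing to the same index are chained together
}

type chainedMap struct {
    buckets []*entry // each index holds a linked list: the "bucket"
}

func (m *chainedMap) put(key int, val string) {
    i := key % len(m.buckets) // trivial hash for illustration
    for e := m.buckets[i]; e != nil; e = e.next {
        if e.key == key { // key already present: update in place
            e.val = val
            return
        }
    }
    // key not found: prepend a new entry to the chain at this index
    m.buckets[i] = &entry{key: key, val: val, next: m.buckets[i]}
}

func (m *chainedMap) get(key int) (string, bool) {
    for e := m.buckets[key%len(m.buckets)]; e != nil; e = e.next {
        if e.key == key {
            return e.val, true
        }
    }
    return "", false
}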
type hmap struct {
    count      int            // number of elements (key-value pairs) in the map; this is what the built-in len() returns
    flags      uint8          // flag bits recording the map's state and properties, e.g. whether an iteration is in progress
    B          uint8          // the bucket array holds 2^B buckets
    noverflow  uint16         // approximate number of overflow buckets; consulted when deciding to grow
    hash0      uint32         // hash seed, mixed in when hashing keys to keep the map's hashing unpredictable
    buckets    unsafe.Pointer // pointer to the bucket array
    oldbuckets unsafe.Pointer // pointer to the old bucket array during growth; half the size of the new array
    nevacuate  uintptr        // evacuation progress counter: buckets with an index below this value have been migrated
    extra      *mapextra      // pointer to a mapextra struct holding overflow buckets and other extra fields
}
// mapextra holds fields for handling bucket overflow
type mapextra struct {
    overflow     *[]*bmap // pointer to the slice of current overflow buckets; used only when keys and elems contain no pointers
    oldoverflow  *[]*bmap // pointer to the slice of old overflow buckets; used only when keys and elems contain no pointers
    nextOverflow *bmap    // address of the next free overflow bucket
}
In the source code, the bmap type has only a single tophash field; during compilation, the Go compiler automatically injects the corresponding key, value, and other fields based on the user's code.
The bmap as written in the source:
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}
The actual bmap:
// During compilation a new structure is created dynamically:
type bmap struct {
    topbits  [8]uint8     // the top eight bits of each key's hash, used to fail fast when locating a key and speed up insert/delete/update/lookup addressing; also called tophash
    keys     [8]keytype   // array of keys; a bmap stores at most 8 key-value pairs
    elems    [8]valuetype // array of values; again at most 8 key-value pairs
    ...
    overflow uintptr      // pointer to the overflow bucket
}
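To see why topbits speeds up addressing, here is a simplified sketch of scanning one bucket during a lookup (a toy model with concrete types; bucket and lookupInBucket are made-up names, not the runtime's code):
// A simplified bucket with fixed key and value types for illustration.
type bucket struct {
    topbits [8]uint8
    keys    [8]int64
    elems   [8]int64
}

// lookupInBucket scans one bucket for key. Comparing the cheap one-byte
// tophash first rejects most non-matching slots without running the full
// key comparison.
func lookupInBucket(b *bucket, key int64, hash uint64) (int64, bool) {
    top := uint8(hash >> 56) // the top eight bits of the hash value
    for i := 0; i < 8; i++ {
        if b.topbits[i] != top {
            continue // fast rejection: tophash differs, this slot cannot match
        }
        if b.keys[i] == key { // full key comparison only on a tophash hit
            return b.elems[i], true
        }
    }
    return 0, false // not here; the real code would follow the overflow pointer
}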
In Go, map growth is automatic, to keep map performance steady.
Writes to a map go through runtime.mapassign, which decides whether the map needs to grow:
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ...
    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }
    ...
}
// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
    return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}
// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
    if B > 15 {
        B = 15
    }
    return noverflow >= uint16(1)<<(B&15)
}
According to the code above, there are two conditions that trigger growth:
1. overLoadFactor(h.count+1, h.B): the load factor (number of elements ÷ number of buckets) is too high.
2. tooManyOverflowBuckets(h.noverflow, h.B): there are too many overflow buckets.
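In the runtime, loadFactorNum/loadFactorDen is 13/2, i.e. a load-factor threshold of 6.5, and bucketCnt is 8. Below is a standalone sketch of the same check with those constants plugged in (a re-implementation for illustration, not the runtime code itself):
package main

import "fmt"

const (
    bucketCnt     = 8  // maximum key-value pairs per bucket
    loadFactorNum = 13 // 13/2 = 6.5 average pairs per bucket
    loadFactorDen = 2
)

func overLoadFactor(count int, B uint8) bool {
    return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
}

func main() {
    // With B = 5 there are 32 buckets and the threshold is 13*(32/2) = 208,
    // so inserting the 209th element would trigger a doubling grow.
    fmt.Println(overLoadFactor(208, 5)) // false
    fmt.Println(overLoadFactor(209, 5)) // true
}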
Growth strategies:
When the load factor is too high, a new bucket array twice the length of the old one is allocated, and the data in the old buckets is migrated over (doubling growth; a sketch of how keys split follows this list).
When there is not much data but too many overflow buckets, the number of buckets stays the same and the same migration steps are repeated to repack the scattered key-value pairs, making bucket usage denser and access faster (same-size growth).
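During doubling growth, each old bucket's contents split between two new buckets: the bucket index is hash & (2^B - 1), so the extra mask bit sends every key either to its old index or to old index + oldBucketCount. A small sketch with made-up hash values:
package main

import "fmt"

func main() {
    const oldB = 3                      // 8 old buckets
    oldMask := uintptr(1)<<oldB - 1     // 0b0111
    newMask := uintptr(1)<<(oldB+1) - 1 // 0b1111 after doubling

    // Both hashes land in old bucket 5; after doubling, the extra mask bit
    // splits them into new buckets 5 and 5+8 = 13.
    for _, hash := range []uintptr{0b0101, 0b1101} {
        fmt.Printf("hash %04b: old bucket %d -> new bucket %d\n",
            hash, hash&oldMask, hash&newMask)
    }
}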
Expansion steps:
func hashGrow(t *maptype, h *hmap) {
    ...
    // hand the existing bucket array over to oldbuckets
    oldbuckets := h.buckets
    // allocate the new bucket array
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0
    ...
}
// This is where mapassign triggers migration while the map is growing (mapdelete contains the same check)
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    ...
    if h.growing() {
        growWork(t, h, bucket)
    }
    ...
}
Key-value pairs are migrated by incremental evacuation. This means that while the map is growing, the old and new bucket arrays exist side by side: newly inserted key-value pairs go straight into the new buckets, and accesses that touch an old bucket trigger its migration.
// growWork itself is just a simple dispatch; the evacuate call after it holds the core logic, which is lengthy, so read the source if you are interested
func growWork(t *maptype, h *hmap, bucket uintptr) {
    // make sure we evacuate the oldbucket corresponding
    // to the bucket we're about to use
    evacuate(t, h, bucket&h.oldbucketmask())
    // evacuate one more oldbucket to make progress on growing
    if h.growing() {
        evacuate(t, h, h.nevacuate)
    }
}
For performance, prefer simple key types such as int or int64. Go's built-in map is not safe for concurrent use; in concurrent scenarios, use sync.Map or implement your own concurrency-safe map.
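For reference, a minimal sync.Map example showing the concurrency-safe operations Store, Load, and Range:
package main

import (
    "fmt"
    "sync"
)

func main() {
    var m sync.Map // safe for concurrent use without extra locking

    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            m.Store(i, fmt.Sprintf("value-%d", i)) // concurrent writes are safe
        }(i)
    }
    wg.Wait()

    if v, ok := m.Load(2); ok { // comma-ok style lookup
        fmt.Println("key 2 ->", v)
    }
    m.Range(func(k, v any) bool { // iterate; return false to stop early
        fmt.Println(k, v)
        return true
    })
}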