In an Ali P7 job interview, the interviewer asked: why is HashMap's treeify threshold 8?

I am Xue Mou 2020-11-13 01:12:17


Preface

Let me say this up front: the title is a bit of clickbait. How would a scrub like me land an Ali P7 interview? Still, this really is a question from an interview for an Ali P7-level position. Of course, it wasn't me who interviewed; it was a senior colleague in my department. He shared his interview experience with me, which let me experience the difficulty of an Ali-level interview secondhand. And oddly enough, having talked with someone who interviewed for an Ali P7 post, I suddenly felt my confidence soar.

Common interview questions

As for HashMap, we couldn't be more familiar with it: it is the Java collection class used most in daily development, and HashMap knowledge is all but guaranteed to come up in interviews. Taking my own past interviews as a sample, these are the questions asked most often:

1. What is the underlying storage structure of HashMap?

2. Is it thread-safe? Why is it not thread-safe?

3. What are the differences between the JDK 1.7 and 1.8 versions of HashMap? What hidden dangers does 1.7 have, and what causes them?

4. Is a hashCode unique? How are elements compared when inserting?

5. What are the differences from Hashtable and ConcurrentHashMap?

If you have read a few blog posts, or at least skimmed the source code, you can basically answer all of these. I have been through plenty of interviews and have rarely been tripped up on HashMap.

The facts proved I was still a bit green (well, I am a post-90s kid after all). Sometimes the problem is not how much you know but how deeply you know it: without a deep reading of, and thinking about, the source code, a question from a slightly different angle can leave you stuck.

Which brings us to the question in the title: why is HashMap's treeify threshold 8? To be honest, although I already knew the threshold was 8, I had never seriously thought about why it is that particular number. Taking this opportunity, I combed through the HashMap source code once more, and this article summarizes what I found.

HashMap basics

HashMap is arguably the most commonly used collection class in Java projects. As a typical K-V storage structure, its underlying layer is an array of "buckets", each holding a linked list. When a new element is added, its hash value is used to locate the corresponding bucket, that is, a slot of Node<K, V> entries in the HashMap source, and the element is inserted into the linked list at that position. If a bucket's list grows too long, it is converted into a red-black tree (since JDK 1.8).
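For reference, here is the entry type, abridged from the JDK 8 HashMap source (accessor methods omitted): each table slot holds a singly linked list of these nodes, and treeified bins use a TreeNode subclass.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;      // cached, spread hash of the key
    final K key;
    V value;
    Node<K,V> next;      // next node in the same bucket's list

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    // equals, hashCode, getKey, getValue, setValue, toString omitted
}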

Why convert to a red-black tree?

We all know that finding an element in a linked list means walking from the head node until the target is reached, an O(N) process, while a red-black tree is a balanced binary search tree in which lookup is O(logN). So once a bucket holds many elements, storing them in a red-black tree makes searches faster: for example, as the sketch below shows, a bin of 64 entries needs about 32 probes on average as a list but only about 6 as a balanced tree.
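A back-of-the-envelope comparison (my own sketch, purely illustrative): average probes for a list (about n/2) versus a balanced tree (about log2 n) at a few bin sizes.

public class ProbeCountSketch {
    public static void main(String[] args) {
        for (int n : new int[]{8, 64, 512}) {
            double listProbes = n / 2.0;                   // average position in an unsorted list
            double treeProbes = Math.log(n) / Math.log(2); // height of a balanced binary tree
            System.out.printf("n=%d: list ~%.1f probes, tree ~%.1f probes%n",
                    n, listProbes, treeProbes);
        }
    }
}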

Since red-black trees are so efficient, why not use one from the very beginning?

This comes down to a balance between space and time, and the JDK source comments explain the choice:

* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins.

Look at the first few lines of the comment: a single TreeNode needs roughly twice the space of a regular Node, so a bin is converted to TreeNodes only once it contains enough nodes to warrant the cost, where "enough" is defined by TREEIFY_THRESHOLD (default 8). When the number of nodes in a bucket drops again because of removal or a resize (capacity expansion), the red-black tree is converted back into a plain linked list; that threshold is UNTREEIFY_THRESHOLD (default 6).

/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;

It is easy to see that although a red-black tree queries faster than a linked list, its nodes take considerably more space; treeification only pays off once a bin reaches a certain size. That is the balance between time and space.

Why the treeify threshold is 8

As for why the treeify threshold is exactly 8: in the source, right after the comment quoted above, there is another long comment that holds the answer. The original text:

* In usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million

In plain words: if hashCodes are well distributed, tree bins are rarely used, because evenly dispersed values rarely produce a long list. Ideally, under random hashCodes, bin sizes follow a Poisson distribution, with the probability of each size dropping steeply. The comment lists the probabilities for sizes 0 through 8: at size 8 the probability is just 0.00000006, roughly six in a hundred million. With odds that small, a HashMap bucket almost never turns into a red-black tree in everyday use; after all, who stores tens of millions of entries in a single HashMap?
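The table is easy to reproduce. A minimal sketch (my own, not JDK code) that computes P(k) = exp(-0.5) * pow(0.5, k) / factorial(k), the Poisson(0.5) probabilities quoted above:

public class PoissonBinSizes {
    public static void main(String[] args) {
        double lambda = 0.5;          // average bin load at the default 0.75 resize threshold
        double p = Math.exp(-lambda); // P(0) = e^-0.5
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p = p * lambda / (k + 1); // P(k+1) = P(k) * lambda / (k + 1)
        }
    }
}

Running it prints the same values as the comment, from 0: 0.60653066 down to 8: 0.00000006.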

Of course, that is the ideal case. Some usage patterns give HashMap hashCodes with poor dispersion, and there converting to a red-black tree is a good fallback strategy.

As to what kind of usage leads to that, you could work it out yourself or look it up online; I won't go over it, to save some energy.
Fine, fine, let's keep going; of course I can keep writing. Do you think this is easy for me? Letting you read for free and still getting branded a deadbeat, what would I be doing this for?

Let's start from the basics: in HashMap, which "bucket" an object falls into is decided by that object's hashCode. The JDK cannot stop users from rolling their own hash algorithm, so if a user overrides hashCode with a poor implementation, HashMap's linked lists can easily grow very long. Like this:

import java.util.HashMap;
import java.util.Map;

public class HashMapTest {
    public static void main(String[] args) {
        Map<User, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            // every key hashes to 1, so all 1000 entries pile into one bucket
            map.put(new User("I'm Xue " + i), i);
        }
    }

    static class User {
        private final String name;

        public User(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 1; // a deliberately terrible hash: the same value for every instance
        }
    }
}

We designed a User class whose hashCode is always 1, so every User object stored in this HashMap lands in the same "bucket", and query efficiency is predictably awful. This is exactly why HashMap converts long lists into red-black trees: it effectively limits the damage when a user supplies a bad hash algorithm that produces very long lists. (Strictly speaking, treeification also requires the table capacity to have reached MIN_TREEIFY_CAPACITY, 64 by default; smaller tables are simply resized. With 1,000 entries that bar is easily cleared.)
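For contrast, a corrected version (my own sketch, not from the JDK or the demo above): pairing a well-distributed hashCode with a consistent equals lets keys spread across buckets instead of piling up.

import java.util.Objects;

class GoodUser {
    private final String name;

    GoodUser(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof GoodUser && Objects.equals(name, ((GoodUser) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name); // delegates to String.hashCode, which mixes every character
    }
}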

The hash() method

Speaking of hash algorithms, let's expand on another knowledge point, what I consider one of the cleverest designs in HashMap.

In the HashMap source, the hash used to place an object is computed by the hash() method, one of HashMap's core functions. When data is stored, the key is fed into it to obtain the key's hash value, and a further operation on that hash determines which position in the "bucket" array the key belongs to. Here is the method's source:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

As the code shows, hash() takes the key's hashCode, shifts it right by 16 bits unsigned, XORs the result with the original value, and returns that as the key's hash. The point of this mixing is to reduce collisions: the bucket index is computed from only the low bits of the hash, so two hashCodes that differ only in their high 16 bits would always collide if the high bits were not folded in.
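A small demo (my own, not JDK code) of the effect: the two hashCodes below share the same low 16 bits, so without the spread they would collide in any table of up to 65,536 buckets.

public class HashSpreadDemo {
    static int spread(int h) {
        return h ^ (h >>> 16); // the same mixing step HashMap.hash() applies to non-null keys
    }

    public static void main(String[] args) {
        int h1 = 0x0001_1234;
        int h2 = 0x0002_1234;  // differs from h1 only in the high 16 bits
        int mask = 16 - 1;     // index mask for a default table of 16 buckets
        System.out.println((h1 & mask) + " vs " + (h2 & mask));                 // 4 vs 4: collision
        System.out.println((spread(h1) & mask) + " vs " + (spread(h2) & mask)); // 5 vs 6: spread out
    }
}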

Besides the 16-bit shift, when adding an element HashMap also ANDs the hash with table.length - 1, where table.length is the size of the "bucket" array; the result is the element's index in that array, locating the linked list it belongs to. In the source:

// n is table.length
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

When nothing is found at the computed index, a new node is created as the head of that bucket's list. So why use i = (n - 1) & hash to compute the index?

This is actually an optimization trick. Because the array size is always a power of 2, after an expansion an element's new index is either its original position or its original position plus the old capacity. The cleverness lies in the & operation: as noted, & only looks at the bits covered by n - 1 (n being the array length). After an expansion, n has one more significant bit than before (n doubles, which is why keeping the array length a power of 2 matters), so to get the new index it is enough to check whether the hash's bit at that new position is 0 or 1: if 0, the index is unchanged; if 1, the index is the old index plus the old capacity.

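A minimal sketch (my own, not JDK code) of that rule: when the table doubles from 16 to 32 buckets, an entry either keeps its index or moves by exactly the old capacity, depending on one extra hash bit.

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int hashA = 0b0_0101; // the new significant bit (value 16) is 0
        int hashB = 0b1_0101; // the new significant bit (value 16) is 1
        System.out.println((hashA & (oldCap - 1)) + " -> " + (hashA & (newCap - 1))); // 5 -> 5
        System.out.println((hashB & (oldCap - 1)) + " -> " + (hashB & (newCap - 1))); // 5 -> 21 (5 + 16)
    }
}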
Thanks to this bit trick, hashes don't have to be recomputed on every expansion, which saves a lot of time; and since the new significant bit is effectively random, entries that previously collided may be spread evenly across the two candidate positions during a resize, improving dispersion. I can't help but marvel: the design is genuinely brilliant, a few seemingly simple lines of code packing in this much knowledge.

Why the threshold for degenerating back into a linked list is 6

As stated above, when a list reaches the threshold of 8 it is converted into a red-black tree; but the threshold for a red-black tree to degenerate back into a linked list is 6. Why not degenerate as soon as the count drops below 8, say at 7, rather than waiting until it is 6 or fewer?

It is mainly a buffer zone, to avoid frequent conversion between list and tree. If the degeneration threshold were 7, then deleting one element would collapse the tree into a list and adding one back would treeify it again; flip-flopping between structures like that would clearly hurt performance. Keeping the two thresholds apart makes the structure far less twitchy.

Finally

There is much more to know about HashMap, and I strongly urge you to read its source code a few times, not just for interviews but to understand more clearly how to use HashMap well; it is so common that using it badly easily produces bugs. Beyond that, I think the JDK source has a great deal for us developers to study. Take HashMap: it is not a lot of code, yet it is very efficient, and most importantly every version keeps being optimized, with every line carefully crafted. Reading the source I kept sighing to myself: if I could write code that good, would getting into Ali even be a question?



I am Xue, an Internet person not limited to technology. For more interesting articles, follow my official account; it is not only about tech, there is fun banter too.



Creating original content is not easy; your likes, bookmarks, and shares are the biggest motivation for my writing. Thank you for your support!

Copyright notice: this article was created by [I am Xue Mou]. Please include a link to the original when reposting. Thanks!
