Issue #14989 has been updated by shyouhei (Shyouhei Urabe).


Hello, I took a look at add-hash-support-for-transient-heap.patch.

It seems lines are copy & pasted from st.c to hash.c.  This is in fact
a wise idea to avoid common pitfalls.  I have almost no comments on
that part.

> --- a/array.c
> +++ b/array.c
> @@ -4379,11 +4379,12 @@ static inline void
>  ary_recycle_hash(VALUE hash)
>  {
>      assert(RBASIC_CLASS(hash) == 0);
> -    if (RHASH(hash)->ntbl) {
> -     st_table *tbl = RHASH(hash)->ntbl;
> +    if (RHASH_TABLE_P(hash)) {
> +     st_table *tbl = RHASH(hash)->as.ntbl;
>       st_free_table(tbl);
> +     RHASH(hash)->as.ntbl = NULL;
>      }
> -    rb_gc_force_recycle(hash);
> +    //rb_gc_force_recycle(hash);
>  }

1. This project don't use // -style comments.  Please follow our
   style.

2. I guess this part is incomplete?  Otherwise tell us why you
   comment out the call of rb_gc_force_recycle.

3. What if RHASH_ARRAY_P(hash) is true? Does that never happen
   here?

> @@ -4158,6 +4159,20 @@ mark_hash(rb_objspace_t *objspace, st_table *tbl)
>      st_foreach(tbl, mark_keyvalue, (st_data_t)objspace);
>  }
>  
> +static void
> +mark_hash_linear(rb_objspace_t *objspace, VALUE hash)
> +{
> +    if (RHASH_ARRAY_P(hash)) {
> +        linear_foreach(hash, mark_keyvalue, (st_data_t)objspace);
> +       if (objspace->mark_func_data == NULL && RHASH_TRANSIENT_P(hash)) {
> +            rb_transient_heap_mark(hash, RHASH(hash)->as.ltbl);
> +        }
> +    }
> +    else if (RHASH_TABLE_P(hash))
> +        st_foreach(RHASH(hash)->as.ntbl, mark_keyvalue, (st_data_t)objspace);
> +    gc_mark(objspace, RHASH(hash)->ifnone);
> +}
> +
>  void
>  rb_mark_hash(st_table *tbl)
>  {

1. This indentation is broken.  Seems you mix tabs and spaces.  You
   might want to use a text editor that help you indent things.

2. I guess you also have to modify rb_mark_hash() to use this new
   routine.

> +#define SET_KEY(entry, _key) (entry)->key = (_key)
> +#define SET_HASH(entry, _hash) (entry)->hash = (_hash)
> +#define SET_RECORD(entry, _value) (entry)->record = (_value)

Are these SET_* macros actually necessary?  I see no
reason why you don't directly assign them.

> +static li_table*
> +linear_init_table(VALUE hash, const struct st_hash_type *type)
> +{
> +    li_table *tab;
> +    uint8_t i;
> +    tab = (li_table*)rb_transient_heap_alloc(hash, sizeof(li_table));
> +    if (tab != NULL) {
> +       RHASH_SET_TRANSIENT_FLAG(hash);
> +    }
> +    else {
> +       RHASH_UNSET_TRANSIENT_FLAG(hash);
> +       tab = (li_table*)malloc(sizeof(li_table));
> +    }

So from this part it seems you are implementing linear table that
is NOT transient.  Your patch have following types of hash
tables:

- Size 0; that allocates nothing.
- Linear table allocated in transient heap.
- Linear table allocated using malloc.
- Good old st_table backended hash.

I guess it makes rb_hash_heap_free() overly complicated.  Is
there any reason to support this much variants?

> +static void
> +try_convert_table(VALUE hash)
> +{

I don't like the function name.  It describes nothing.  What does
it do? Try converting from what to what?

> +    new_tab = st_init_table_with_size(tab->type, size * 2);

Can I ask you the reason behind your choice of this magic number?

> +    for (i = 0; i < LINEAR_TABLE_BOUND; i++) {
> +       HASH_ASSERT(entries[i].hash != 0);
> +       st_add_direct(new_tab, entries[i].key, entries[i].record);
> +    }

It seems you forgot to treat empty entries.

> +static st_table *
> +force_convert_table(VALUE hash)
> +{

I don't like the idea that try_convert_table(x) and
force_convert_table(x) return different things.

> +static int
> +add_direct_with_hash(VALUE hash, st_data_t key, st_data_t val, st_hash_t hash_value)
> +{

I could not understand what this return value means.

> +static void
> +linear_clear(VALUE hash)
> +{
> +    li_table *tab = RHASH(hash)->as.ltbl;
> +    RHASH_SET_ARRAY_LEN(hash, 0);
> +    RHASH_SET_ARRAY_BOUND(hash, 0);
> +    memset(tab->entries, 0, LINEAR_TABLE_MAX_SIZE * sizeof(li_table_entry));
> +}

This is different from what you do in linear_init_table().  You should
take either way and use in both functions.

> @@ -330,7 +952,7 @@ st_foreach_safe(st_table *table, int (*func)(ANYARGS), st_data_t a)
>      arg.func = (st_foreach_func *)func;
>      arg.arg = a;
>      if (st_foreach_check(table, foreach_safe_i, (st_data_t)&arg, 0)) {
> -       rb_raise(rb_eRuntimeError, "hash modified during iteration");
> +       rb_raise(rb_eRuntimeError, "1hash modified during iteration");
>      }
>  }

I don't understand this part.

> +void rb_hash_free(VALUE);

I see this identical declaration duplicated in mutiple header files.
Pick one place and delete others.



----------------------------------------
Feature #14989: Add Hash support for transient heap
https://bugs.ruby-lang.org/issues/14989#change-73685

* Author: tacinight (Yimin Zhao)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
Hi, there
I am a GSoC student following @ko1 to do some work for Ruby, here I want to make a brief introduction for one of my work in this period.

## Transient Heap
The first discussion of transient heap was opened by @ko1 at [Introduce 2nd GC heap named Transient heap](https://bugs.ruby-lang.org/issues/14858). In his prototype, he completed the implement for arrays, and the future work should also support the type String, Hash as well as Extend(shrink) policy for transient heap.

My work here is adding support for Hash so that transient heap can allocate space for Hash. I encounted some difficults that had to be handled.
1. Hash type objects use st_table to store elements, the correlation between those two is weak. For transient heap, it has to know the ruby type of the objects.
2. Transient heap is based on the hypothesis that most of objects die at a young age. To understand the issue with first need to survey the distribution of hash size.

### The distribution of hash size
#### Implement
The capacity of st_table is always a power of 2, we recorded the exponent into a global array before gc or functions explicitly frees st_table. Look at the [patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/st_table_size_statistics.patch) for more details.
#### Results
We measured 6 applications in order to find a real distribution of the hash size. Respectively, those applications are:
1. `make check` command under ruby source code.
2. `make gcbench_rdoc` command under ruby source code.
3. `gem uninstall --all` to uninstall over 300 gems
4. `bundle install` commmand under project [discourse](https://github.com/discourse/discourse)
5. `discourse` manually explore the web pages and simulate the different events for about 20 minutes (development environment)
6. `rails_ruby_bench` project [rails_ruby_bench](https://github.com/noahgibbs/rails_ruby_bench) with parameters : 8 workers, 1500 iterations
7. `rails_ruby_bench 2` 8 workers, 12000 iterations
8. `rails_ruby_bench 3` 20 workders, 20000 iterations

The results show that objects under size 8(area of exponent 2 to 3) account for more than 80% of the total data in most situations. The raw data can be found at [st_table size statistics](https://docs.google.com/spreadsheets/d/1xAjO_qb5K49aLnvk8SypGwO5Avtbm2X12cYb1d-n6Xs/edit?usp=sharing).

### Add Hash support
According to the previous survey result, we use a array-like fixed-size table to store the elements. We name it as LinearTable.
In this way, we have two benefits:

1. We don't have to create st_table for small hash(but it will when necessary)
2. the code is written into hash.c (while st_table code is in st.c). its easy to bind small hash with transient heap, and by default, when using st_table, it uses the malloced space.

So the basic data structure look like this.
```c
#define LINEAR_TABLE_MAX_SIZE 8

typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
} li_table;

struct RHash {
    struct RBasic basic;
    union {
	struct st_table *ntbl;      /* possibly 0 */
	struct LinearTable *ltbl;
    } as;
    int iter_lev;
    const VALUE ifnone;
};

```

#### The layout of li_table_entry
Considering the size of 8 hashes/keys/records is just the size of a cache line, I also tried different data layouts.

```c
#define LINEAR_TABLE_MAX_SIZE 8

// the original layout
typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
} li_table;
```

```c
#define LINEAR_TABLE_MAX_SIZE 8

// the original layout
typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
} li_table;
```

```c
#define LINEAR_TABLE_MAX_SIZE 8

// The Variant 1
typedef struct li_table_entry {
    VALUE key;
    VALUE record;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    VALUE hashes[LINEAR_TABLE_MAX_SIZE];
    li_table_entry pairs[LINEAR_TABLE_MAX_SIZE];
} li_table;
```

```c
#define LINEAR_TABLE_MAX_SIZE 8

// The Variant 2
typedef struct li_table_entry {
    VALUE hash;
    VALUE key;
} li_table_entry;

typedef struct LinearTable {
    const struct st_hash_type *type;
    li_table_entry entries[LINEAR_TABLE_MAX_SIZE];
    VALUE records[LINEAR_TABLE_MAX_SIZE];
} li_table;
```

```c
#define LINEAR_TABLE_MAX_SIZE 8

// The Variant 3
typedef struct LinearTable {
    const struct st_hash_type *type;
    VALUE hashes[LINEAR_TABLE_MAX_SIZE];
    VALUE keys[LINEAR_TABLE_MAX_SIZE];
    VALUE records[LINEAR_TABLE_MAX_SIZE];
} li_table;
```

The [microbench results](https://docs.google.com/spreadsheets/d/1Ag6DoAsmTNJkt3nmHyfXRQgy6fLhEnZJQF2G5GbykLc/edit?usp=sharing) show the original edition and variant 1 have better performance than the other two. The patches can be found in the repository as [linear_table_v1.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v1.patch), [linear_table_v2.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v2.patch), [linear_table_v3.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v3.patch), [linear_table_v4.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v4.patch);

#### Integrate data into flag bits
Considering the maximum size of entries is 8. We can store the num_entries and num_bound into RBasic(hash)->flag;
Related patch is [here](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/0003-integrate-data-to-hash-flag.patch) which based on [linear_table_v1.patch](https://github.com/Tacinight/ruby-gsoc-2018/blob/master/patch/linear_table_v1.patch)

### Benchmark results
I execuated comprehensive benchmark test based on official benchmark tool - [benchmark-driver](https://github.com/benchmark-driver/benchmark-driver).
1. [linear_table_benchmark](https://docs.google.com/spreadsheets/d/18JzY-q-boOZFu_GTwraRXSTJJVbt9Wmg7M2P4-8g41o/edit?usp=sharing), includes the `make benchmark` results of trunk vs. v1(the original version) vs. v2(the variant 1).
2. [transient hash benchmark](https://docs.google.com/spreadsheets/d/19074A0H0nwBQoumTb0aF-k9_a3GBBLA9NnFIpPGraPU/edit?usp=sharing), includes the `make benchmark` results of trunk vs. transient_heap(based on v1) vs. integrated flag
3. [linear_table_variant_benchmark](https://docs.google.com/spreadsheets/d/1Ag6DoAsmTNJkt3nmHyfXRQgy6fLhEnZJQF2G5GbykLc/edit?usp=sharing), includes the micro-benchmark results of trunk vs. v1 vs. v2 vs. v3 vs. v4. The used benchrmark scripts are located at microbench directory.
4.  [linear_table real app bench](https://docs.google.com/spreadsheets/d/1WoT2zRxm16DA-0fOf55HZKmRN9v2YJK4HoaMi6yq_0k/edit?usp=sharing), includes the test results of the six applications noted above.
5.  [transient heap microbench](https://docs.google.com/spreadsheets/d/1guaP_93ds1eSbdSg87gBrtiuJ8JzSFHJb_xcQsxn8eg/edit?usp=sharing), include the micro-benchrmark resutls of trunk vs. v1 vs. transient-hash vs integrated-flag. This time I also add memory usage comparison.
6. [linear_table_vary_size_microbench](https://docs.google.com/spreadsheets/d/1I7NFc893H7cUh0CAFgBupT259pVMqRFjC9CQwlE8Kyw/edit?usp=sharing), we once worried the size of linear table will impact on the benchmark results because of the linear search. So we test the linear table with different size and the results show the table size make no relevant difference.

### Future work
1. In [transient hash benchmark](https://docs.google.com/spreadsheets/d/19074A0H0nwBQoumTb0aF-k9_a3GBBLA9NnFIpPGraPU/edit?usp=sharing), we saw a preformance degradation compare to the preformance improvement in [linear_table_benchmark](https://docs.google.com/spreadsheets/d/18JzY-q-boOZFu_GTwraRXSTJJVbt9Wmg7M2P4-8g41o/edit?usp=sharing) in same items. It needs to be analyzed and to be solved.

### My GitHub Repository
I will update my status at a GitHub repository: https://github.com/Tacinight/ruby-gsoc-2018.
More explanatory images and other of my work also can be found at this repository.
Welcome to review the patches and leave your valuable comments.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>