[ovs-dev] [PATCH v4 0/3] Use improved dp_hash select group by default
Jan Scheurich
2018-05-24 15:27:58 UTC
The current default OpenFlow select group implementation sends every new L4 flow
to the slow path for the balancing decision and installs a 5-tuple microflow
in the datapath to forward subsequent packets of the connection accordingly.
This clearly scales poorly with many parallel L4 flows and high
connection setup rates.

The dp_hash selection method for the OpenFlow select group was added to OVS
as an alternative. It avoids the scalability issues at the price of an
additional recirculation in the datapath. The dp_hash method is only available
to OF1.5 SDN controllers that speak the Netronome Group Mod extension to
configure the selection mechanism. This has severely limited the applicability
of the dp_hash select group in the past.

Furthermore, testing revealed that the implemented dp_hash selection often
generated a very uneven distribution of flows over group buckets and didn't
consider bucket weights at all.

As a first step, this patch set improves the dp_hash selection method so that
it distributes flows much more accurately over weighted group buckets and
applies a symmetric dp_hash function to maintain the symmetry property of the
legacy hash function. As a second step, it makes the improved dp_hash method
the default in OVS for select groups that dp_hash can handle accurately,
which should be the vast majority of cases. Otherwise OVS falls back to the
legacy slow-path selection method.
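
To illustrate what "handled accurately" could mean in practice, here is a
minimal C sketch. It assumes a sizing rule inferred from the debug output of
the unit tests in this series (16 hash values for weights 1:1, 32 for weights
5:10:25:60:0), not from code shown in this message: the table size is the
smallest power of two that gives the lightest non-zero bucket at least one
hash value, with a floor of 16, and dp_hash is used only if that size stays
within the limit (64 for the default case). All function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounds 'x' up to the next power of two (x >= 1). */
static uint64_t
round_up_pow2(uint64_t x)
{
    uint64_t p = 1;

    assert(x >= 1);
    while (p < x) {
        p <<= 1;
    }
    return p;
}

/* Returns the number of dp_hash values assumed to be needed to represent
 * 'weights', or 0 if no bucket has a non-zero weight.  The rule (power of
 * two, at least 16) is an assumption based on the test logs. */
static uint64_t
dp_hash_values_needed(const uint32_t *weights, size_t n)
{
    uint64_t total = 0;
    uint32_t min = UINT32_MAX;

    for (size_t i = 0; i < n; i++) {
        if (weights[i] > 0) {
            total += weights[i];
            if (weights[i] < min) {
                min = weights[i];
            }
        }
    }
    if (total == 0) {
        return 0;
    }

    /* Enough slots that the lightest bucket gets at least one. */
    uint64_t min_slots = (total + min - 1) / min;
    uint64_t n_hash = round_up_pow2(min_slots);

    return n_hash < 16 ? 16 : n_hash;
}

/* True if the bucket configuration fits into 'max_hash' hash values. */
static bool
can_use_dp_hash(const uint32_t *weights, size_t n, uint64_t max_hash)
{
    uint64_t need = dp_hash_values_needed(weights, n);

    return need != 0 && need <= max_hash;
}
```

Under this assumed rule, two buckets with weights 1 and 100 would need 128
hash values and would therefore fall back to the legacy method when the limit
is 64.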

The Netronome extension can still be used to override the default decision and
require the legacy slow-path or the dp_hash selection method.

v3 -> v4:
- Rebased to master (commit 82d5b337cd).
- Implemented Ben's improvement suggestions for patch 2/3.
- Fixed machine dependency of one select group test case.

v2 -> v3:
- Fixed another corner case crash reported by Chen Yuefang.
- Fixed several sparse and clang warnings reported by Ben.
- Rewrote the select group unit tests to abstract the checks from
the behavior of the system-specific hash function implementation.
- Added dpif_backer_support field for dp_hash algorithms to prevent
using the new OVS_HASH_ALG_SYM_L4 algorithm if it is not
supported by the datapath.

v1 -> v2:
- Fixed crashes for corner cases reported by Chen Yuefang.
- Fixed group ref leakage with dp_hash reported by Chen Yuefang.
- Changed all xlation logging from INFO to DBG.
- Revised, completed and detailed select group unit test cases in ofproto-dpif.
- Updated selection_method documentation in ovs-ofctl man page.
- Added NEWS item.


Jan Scheurich (3):
userspace datapath: Add OVS_HASH_L4_SYMMETRIC dp_hash algorithm
ofproto-dpif: Improve dp_hash selection method for select groups
ofproto-dpif: Use dp_hash as default selection method

NEWS | 2 +
datapath/linux/compat/include/linux/openvswitch.h | 4 +
lib/flow.c | 43 ++-
lib/flow.h | 1 +
lib/odp-execute.c | 23 +-
lib/odp-util.c | 4 +-
lib/ofp-group.c | 15 +-
ofproto/ofproto-dpif-xlate.c | 66 +++--
ofproto/ofproto-dpif.c | 211 ++++++++++++++-
ofproto/ofproto-dpif.h | 19 +-
ofproto/ofproto-provider.h | 2 +-
tests/mpls-xlate.at | 26 +-
tests/ofproto-dpif.at | 315 +++++++++++++++++-----
tests/ofproto-macros.at | 7 +-
utilities/ovs-ofctl.8.in | 47 ++--
15 files changed, 651 insertions(+), 134 deletions(-)
--
1.9.1
Jan Scheurich
2018-05-24 15:27:59 UTC
This commit implements a new dp_hash algorithm OVS_HASH_ALG_SYM_L4 in
the netdev datapath. It will be used as the default hash algorithm for the
dp_hash-based select groups in a subsequent commit to maintain
compatibility with the symmetry property of the current default hash
selection method.
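
As a rough illustration of the symmetry property, the following C sketch
hashes a simplified IPv4 5-tuple so that both directions of a connection
produce the same value. It is not the OVS implementation: struct tuple5,
mix32() and symmetric_hash() are hypothetical names, the mixer is a generic
32-bit finalizer standing in for the OVS hash helpers, and unlike the patch
(which calls flow_hash_symmetric_l3l4() with inc_udp_ports set to false) the
sketch includes ports for every protocol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 5-tuple; not the OVS 'struct flow'. */
struct tuple5 {
    uint32_t ip_src;
    uint32_t ip_dst;
    uint16_t port_src;
    uint16_t port_dst;
    uint8_t proto;
};

/* Generic 32-bit mixing step; a stand-in for the OVS hash helpers. */
static uint32_t
mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* XORing source with destination makes the input identical for both
 * directions of a connection, so the hash is symmetric by construction. */
static uint32_t
symmetric_hash(const struct tuple5 *t, uint32_t basis)
{
    uint32_t h = basis;

    h = mix32(h ^ (t->ip_src ^ t->ip_dst));
    h = mix32(h ^ (uint32_t) (t->port_src ^ t->port_dst));
    h = mix32(h ^ t->proto);
    return h;
}
```

Swapping ip_src with ip_dst and port_src with port_dst leaves the result
unchanged, which is what lets both directions of a connection land in the
same bucket.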

A new dpif_backer_support field 'max_hash_alg' is introduced to reflect
the highest hash algorithm a datapath supports in the dp_hash action.
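
The way 'max_hash_alg' is meant to be consumed can be sketched as below. This
mirrors the check added to compose_output_action__() in this patch, but the
enum and function names here are hypothetical stand-ins so the fragment
compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-in for enum ovs_hash_alg. */
enum hash_alg {
    HASH_ALG_L4,        /* Assumed to be supported by every datapath. */
    HASH_ALG_SYM_L4,    /* New symmetric variant. */
    HASH_ALG_MAX
};

/* Returns the algorithm to put into the datapath hash action: the requested
 * one if the datapath supports it, otherwise the baseline L4 hash. */
static enum hash_alg
effective_hash_alg(enum hash_alg requested, enum hash_alg max_supported)
{
    assert(requested < HASH_ALG_MAX);
    return requested > max_supported ? HASH_ALG_L4 : requested;
}
```

With a datapath that only reports algorithm 0, a request for the symmetric
variant silently degrades to the plain L4 hash instead of failing.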

Signed-off-by: Jan Scheurich <***@ericsson.com>
Signed-off-by: Nitin Katiyar <***@ericsson.com>
Co-authored-by: Nitin Katiyar <***@ericsson.com>
---
datapath/linux/compat/include/linux/openvswitch.h | 4 ++
lib/flow.c | 43 +++++++++++++++++++++-
lib/flow.h | 1 +
lib/odp-execute.c | 23 ++++++++++--
ofproto/ofproto-dpif-xlate.c | 7 +++-
ofproto/ofproto-dpif.c | 45 +++++++++++++++++++++++
ofproto/ofproto-dpif.h | 5 ++-
7 files changed, 121 insertions(+), 7 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h
index 6f4fa01..5c1e238 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -724,6 +724,10 @@ struct ovs_action_push_vlan {
*/
enum ovs_hash_alg {
OVS_HASH_ALG_L4,
+#ifndef __KERNEL__
+ OVS_HASH_ALG_SYM_L4,
+#endif
+ __OVS_HASH_MAX
};

/*
diff --git a/lib/flow.c b/lib/flow.c
index 136f060..75ca456 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -2124,6 +2124,45 @@ flow_hash_symmetric_l4(const struct flow *flow, uint32_t basis)
return jhash_bytes(&fields, sizeof fields, basis);
}

+/* Symmetrically Hashes non-IP 'flow' based on its L2 headers. */
+uint32_t
+flow_hash_symmetric_l2(const struct flow *flow, uint32_t basis)
+{
+ union {
+ struct {
+ ovs_be16 eth_type;
+ ovs_be16 vlan_tci;
+ struct eth_addr eth_addr;
+ ovs_be16 pad;
+ };
+ uint32_t word[3];
+ } fields;
+
+ uint32_t hash = basis;
+ int i;
+
+ if (flow->packet_type != htonl(PT_ETH)) {
+ /* Cannot hash non-Ethernet flows */
+ return 0;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(fields.eth_addr.be16); i++) {
+ fields.eth_addr.be16[i] =
+ flow->dl_src.be16[i] ^ flow->dl_dst.be16[i];
+ }
+ fields.vlan_tci = 0;
+ for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+ fields.vlan_tci ^= flow->vlans[i].tci & htons(VLAN_VID_MASK);
+ }
+ fields.eth_type = flow->dl_type;
+ fields.pad = 0;
+
+ hash = hash_add(hash, fields.word[0]);
+ hash = hash_add(hash, fields.word[1]);
+ hash = hash_add(hash, fields.word[2]);
+ return hash_finish(hash, basis);
+}
+
/* Hashes 'flow' based on its L3 through L4 protocol information */
uint32_t
flow_hash_symmetric_l3l4(const struct flow *flow, uint32_t basis,
@@ -2144,8 +2183,8 @@ flow_hash_symmetric_l3l4(const struct flow *flow, uint32_t basis,
hash = hash_add64(hash, a[i] ^ b[i]);
}
} else {
- /* Cannot hash non-IP flows */
- return 0;
+ /* Revert to hashing L2 headers */
+ return flow_hash_symmetric_l2(flow, basis);
}

hash = hash_add(hash, flow->nw_proto);
diff --git a/lib/flow.h b/lib/flow.h
index 7a9e7d0..9de94b2 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -236,6 +236,7 @@ hash_odp_port(odp_port_t odp_port)

uint32_t flow_hash_5tuple(const struct flow *flow, uint32_t basis);
uint32_t flow_hash_symmetric_l4(const struct flow *flow, uint32_t basis);
+uint32_t flow_hash_symmetric_l2(const struct flow *flow, uint32_t basis);
uint32_t flow_hash_symmetric_l3l4(const struct flow *flow, uint32_t basis,
bool inc_udp_ports );

diff --git a/lib/odp-execute.c b/lib/odp-execute.c
index c5080ea..5831d1f 100644
--- a/lib/odp-execute.c
+++ b/lib/odp-execute.c
@@ -730,14 +730,16 @@ odp_execute_actions(void *dp, struct dp_packet_batch *batch, bool steal,
}

switch ((enum ovs_action_attr) type) {
+
case OVS_ACTION_ATTR_HASH: {
const struct ovs_action_hash *hash_act = nl_attr_get(a);

- /* Calculate a hash value directly. This might not match the
+ /* Calculate a hash value directly. This might not match the
* value computed by the datapath, but it is much less expensive,
* and the current use case (bonding) does not require a strict
* match to work properly. */
- if (hash_act->hash_alg == OVS_HASH_ALG_L4) {
+ switch (hash_act->hash_alg) {
+ case OVS_HASH_ALG_L4: {
struct flow flow;
uint32_t hash;

@@ -753,7 +755,22 @@ odp_execute_actions(void *dp, struct dp_packet_batch *batch, bool steal,
}
packet->md.dp_hash = hash;
}
- } else {
+ break;
+ }
+ case OVS_HASH_ALG_SYM_L4: {
+ struct flow flow;
+ uint32_t hash;
+
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ flow_extract(packet, &flow);
+ hash = flow_hash_symmetric_l3l4(&flow,
+ hash_act->hash_basis,
+ false);
+ packet->md.dp_hash = hash;
+ }
+ break;
+ }
+ default:
/* Assert on unknown hash algorithm. */
OVS_NOT_REACHED();
}
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index c7c9df5..9f7fca7 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4005,10 +4005,15 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
struct ovs_action_hash *act_hash;

/* Hash action. */
+ enum ovs_hash_alg hash_alg = xr->hash_alg;
+ if (hash_alg > ctx->xbridge->support.max_hash_alg) {
+ /* Algorithm supported by all datapaths. */
+ hash_alg = OVS_HASH_ALG_L4;
+ }
act_hash = nl_msg_put_unspec_uninit(ctx->odp_actions,
OVS_ACTION_ATTR_HASH,
sizeof *act_hash);
- act_hash->hash_alg = xr->hash_alg;
+ act_hash->hash_alg = hash_alg;
act_hash->hash_basis = xr->hash_basis;

/* Recirc action. */
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 1ed82d0..7162811 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -1291,6 +1291,50 @@ check_ct_clear(struct dpif_backer *backer)
return supported;
}

+/* Probe the highest dp_hash algorithm supported by the datapath. */
+static size_t
+check_max_dp_hash_alg(struct dpif_backer *backer)
+{
+ struct odputil_keybuf keybuf;
+ struct ofpbuf key;
+ struct flow flow;
+ struct ovs_action_hash *hash;
+ int max_alg = 0;
+
+ struct odp_flow_key_parms odp_parms = {
+ .flow = &flow,
+ .probe = true,
+ };
+
+ memset(&flow, 0, sizeof flow);
+ ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
+ odp_flow_key_from_flow(&odp_parms, &key);
+
+ /* All datapaths support algorithm 0 (OVS_HASH_ALG_L4). */
+ for (int alg = 1; alg < __OVS_HASH_MAX; alg++) {
+ struct ofpbuf actions;
+ bool ok;
+
+ ofpbuf_init(&actions, 300);
+ hash = nl_msg_put_unspec_uninit(&actions,
+ OVS_ACTION_ATTR_HASH, sizeof *hash);
+ hash->hash_basis = 0;
+ hash->hash_alg = alg;
+ ok = dpif_probe_feature(backer->dpif, "Max dp_hash algorithm", &key,
+ &actions, NULL);
+ ofpbuf_uninit(&actions);
+ if (ok) {
+ max_alg = alg;
+ } else {
+ break;
+ }
+ }
+
+ VLOG_INFO("%s: Max dp_hash algorithm probed to be %d",
+ dpif_name(backer->dpif), max_alg);
+ return max_alg;
+}
+
#define CHECK_FEATURE__(NAME, SUPPORT, FIELD, VALUE, ETHTYPE) \
static bool \
check_##NAME(struct dpif_backer *backer) \
@@ -1353,6 +1397,7 @@ check_support(struct dpif_backer *backer)
backer->rt_support.sample_nesting = check_max_sample_nesting(backer);
backer->rt_support.ct_eventmask = check_ct_eventmask(backer);
backer->rt_support.ct_clear = check_ct_clear(backer);
+ backer->rt_support.max_hash_alg = check_max_dp_hash_alg(backer);

/* Flow fields. */
backer->rt_support.odp.ct_state = check_ct_state(backer);
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index 47bf7f9..d654947 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -175,7 +175,10 @@ struct group_dpif *group_dpif_lookup(struct ofproto_dpif *,
DPIF_SUPPORT_FIELD(bool, ct_eventmask, "Conntrack eventmask") \
\
/* True if the datapath supports OVS_ACTION_ATTR_CT_CLEAR action. */ \
- DPIF_SUPPORT_FIELD(bool, ct_clear, "Conntrack clear")
+ DPIF_SUPPORT_FIELD(bool, ct_clear, "Conntrack clear") \
+ \
+ /* Highest supported dp_hash algorithm. */ \
+ DPIF_SUPPORT_FIELD(size_t, max_hash_alg, "Max dp_hash algorithm")

/* Stores the various features which the corresponding backer supports. */
struct dpif_backer_support {
--
1.9.1
Jan Scheurich
2018-05-24 15:28:01 UTC
The dp_hash selection method for select groups overcomes the scalability
problems of the current default selection method which, due to L2-L4
hashing during xlation and un-wildcarding of the hashed fields,
basically requires an upcall to the slow path to load-balance every
L4 connection. The consequences are an explosion of datapath flows
(megaflows degenerate to microflows) and a limit on the connection
setup rate OVS can handle.

This commit changes the default selection method to dp_hash, provided the
bucket configuration is such that the dp_hash method can accurately
represent the bucket weights with up to 64 hash values. Otherwise we
stick to the original default hash method.
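
How hash values might be apportioned to weighted buckets can be sketched with
Webster's (Sainte-Lague) method, which the 'webster' variable in
group_setup_dp_hash_table() hints at. The code below is an illustrative
sketch with a hypothetical function name, not the code from this series: each
hash value in turn goes to the bucket with the largest quotient
weight / (2 * hits + 1). It assumes at least one bucket has a non-zero weight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distributes 'n_hash' hash values over 'n' buckets in proportion to
 * 'weights' and stores the per-bucket counts in 'hits'.
 * Hypothetical helper; the caller must ensure a non-zero total weight. */
static void
webster_allocate(const uint32_t *weights, uint32_t *hits, size_t n,
                 uint32_t n_hash)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        hits[i] = 0;
    }

    for (uint32_t v = 0; v < n_hash; v++) {
        size_t best = 0;

        for (size_t i = 1; i < n; i++) {
            /* Compare weights[i] / (2 * hits[i] + 1) with
             * weights[best] / (2 * hits[best] + 1) by cross-multiplying,
             * so that no floating point is needed.  Ties keep the
             * lower-numbered bucket. */
            uint64_t lhs = (uint64_t) weights[i]
                           * (2 * (uint64_t) hits[best] + 1);
            uint64_t rhs = (uint64_t) weights[best]
                           * (2 * (uint64_t) hits[i] + 1);
            if (lhs > rhs) {
                best = i;
            }
        }
        hits[best]++;
    }
}
```

For weights 5, 10, 25, 60 and 0 with 32 hash values this yields 2, 3, 8, 19
and 0 hits, which matches the "hits" values in the debug output checked by the
"select group with weights" unit test.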

We use the new dp_hash algorithm OVS_HASH_ALG_SYM_L4 to maintain the
symmetry property of the old default hash method.

A controller can explicitly request the old default hash selection method
by specifying selection method "hash" with an empty list of fields in the
Group properties of the OpenFlow 1.5 Group Mod message.

Update the documentation about the selection method in the ovs-ofctl man page.

Revise and complete the ofproto-dpif unit test cases for select groups.

Signed-off-by: Jan Scheurich <***@ericsson.com>
Signed-off-by: Nitin Katiyar <***@ericsson.com>
Co-authored-by: Nitin Katiyar <***@ericsson.com>
---
NEWS | 2 +
lib/ofp-group.c | 15 ++-
ofproto/ofproto-dpif.c | 30 +++--
ofproto/ofproto-dpif.h | 1 +
ofproto/ofproto-provider.h | 2 +-
tests/mpls-xlate.at | 26 ++--
tests/ofproto-dpif.at | 316 +++++++++++++++++++++++++++++++++++----------
tests/ofproto-macros.at | 7 +-
utilities/ovs-ofctl.8.in | 47 ++++---
9 files changed, 334 insertions(+), 112 deletions(-)

diff --git a/NEWS b/NEWS
index ec548b0..2b2be1e 100644
--- a/NEWS
+++ b/NEWS
@@ -17,6 +17,8 @@ Post-v2.9.0
* OFPT_ROLE_STATUS is now available in OpenFlow 1.3.
* OpenFlow 1.5 extensible statistics (OXS) now implemented.
* New OpenFlow 1.0 extensions for group support.
+ * Default selection method for select groups is now dp_hash with improved
+ accuracy.
- Linux kernel 4.14
* Add support for compiling OVS with the latest Linux 4.14 kernel
- ovn:
diff --git a/lib/ofp-group.c b/lib/ofp-group.c
index f5b0af8..697208f 100644
--- a/lib/ofp-group.c
+++ b/lib/ofp-group.c
@@ -1600,12 +1600,17 @@ parse_group_prop_ntr_selection_method(struct ofpbuf *payload,
return OFPERR_OFPBPC_BAD_VALUE;
}

- error = oxm_pull_field_array(payload->data, fields_len,
- &gp->fields);
- if (error) {
- OFPPROP_LOG(&rl, false,
+ if (fields_len > 0) {
+ error = oxm_pull_field_array(payload->data, fields_len,
+ &gp->fields);
+ if (error) {
+ OFPPROP_LOG(&rl, false,
"ntr selection method fields are invalid");
- return error;
+ return error;
+ }
+ } else {
+ /* Selection_method "hash" w/o fields means default hash method. */
+ gp->fields.values_size = 0;
}

return 0;
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index c9c2e51..a45d6ea 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -1,5 +1,4 @@
/*
- * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -4787,7 +4786,7 @@ group_setup_dp_hash_table(struct group_dpif *group, size_t max_hash)
} *webster;

if (n_buckets == 0) {
- VLOG_DBG(" Don't apply dp_hash method without buckets");
+ VLOG_DBG(" Don't apply dp_hash method without buckets.");
return false;
}

@@ -4862,9 +4861,24 @@ group_set_selection_method(struct group_dpif *group)
const struct ofputil_group_props *props = &group->up.props;
const char *selection_method = props->selection_method;

+ VLOG_DBG("Constructing select group %"PRIu32, group->up.group_id);
if (selection_method[0] == '\0') {
- VLOG_DBG("No selection method specified.");
- group->selection_method = SEL_METHOD_DEFAULT;
+ VLOG_DBG("No selection method specified. Trying dp_hash.");
+ /* If the controller has not specified a selection method, check if
+ * the dp_hash selection method with max 64 hash values is appropriate
+ * for the given bucket configuration. */
+ if (group_setup_dp_hash_table(group, 64)) {
+ /* Use dp_hash selection method with symmetric L4 hash. */
+ group->selection_method = SEL_METHOD_DP_HASH;
+ group->hash_alg = OVS_HASH_ALG_SYM_L4;
+ group->hash_basis = 0;
+ VLOG_DBG("Use dp_hash with %d hash values using algorithm %d.",
+ group->hash_mask + 1, group->hash_alg);
+ } else {
+ /* Fall back to original default hashing in slow path. */
+ VLOG_DBG("Falling back to default hash method.");
+ group->selection_method = SEL_METHOD_DEFAULT;
+ }
} else if (!strcmp(selection_method, "dp_hash")) {
VLOG_DBG("Selection method specified: dp_hash.");
/* Try to use dp_hash if possible at all. */
@@ -4872,7 +4886,7 @@ group_set_selection_method(struct group_dpif *group)
group->selection_method = SEL_METHOD_DP_HASH;
group->hash_alg = props->selection_method_param >> 32;
if (group->hash_alg >= __OVS_HASH_MAX) {
- VLOG_DBG(" Invalid dp_hash algorithm %d. "
+ VLOG_DBG("Invalid dp_hash algorithm %d. "
"Defaulting to OVS_HASH_ALG_L4", group->hash_alg);
group->hash_alg = OVS_HASH_ALG_L4;
}
@@ -4881,7 +4895,7 @@ group_set_selection_method(struct group_dpif *group)
group->hash_mask + 1, group->hash_alg);
} else {
/* Fall back to original default hashing in slow path. */
- VLOG_DBG(" Falling back to default hash method.");
+ VLOG_DBG("Falling back to default hash method.");
group->selection_method = SEL_METHOD_DEFAULT;
}
} else if (!strcmp(selection_method, "hash")) {
@@ -4890,12 +4904,12 @@ group_set_selection_method(struct group_dpif *group)
/* Controller has specified hash fields. */
struct ds s = DS_EMPTY_INITIALIZER;
oxm_format_field_array(&s, &props->fields);
- VLOG_DBG(" Hash fields: %s", ds_cstr(&s));
+ VLOG_DBG("Hash fields: %s", ds_cstr(&s));
ds_destroy(&s);
group->selection_method = SEL_METHOD_HASH;
} else {
/* No hash fields. Fall back to original default hashing. */
- VLOG_DBG(" No hash fields. Falling back to default hash method.");
+ VLOG_DBG("No hash fields. Falling back to default hash method.");
group->selection_method = SEL_METHOD_DEFAULT;
}
} else {
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index e95fead..1a404c8 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -61,6 +61,7 @@ struct ofproto_async_msg;
struct ofproto_dpif;
struct uuid;
struct xlate_cache;
+struct xlate_ctx;

/* Number of implemented OpenFlow tables. */
enum { N_TABLES = 255 };
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index d636fb3..2b77b89 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -572,7 +572,7 @@ struct ofgroup {
const struct ovs_list buckets; /* Contains "struct ofputil_bucket"s. */
const uint32_t n_buckets;

- const struct ofputil_group_props props;
+ struct ofputil_group_props props;

struct rule_collection rules OVS_GUARDED; /* Referring rules. */
};
diff --git a/tests/mpls-xlate.at b/tests/mpls-xlate.at
index 9bbf22a..34d82a3 100644
--- a/tests/mpls-xlate.at
+++ b/tests/mpls-xlate.at
@@ -25,7 +25,7 @@ ***@ovs-dummy: hit:0 missed:0
])

dnl Setup single MPLS tags.
-AT_CHECK([ovs-ofctl -O OpenFlow13 add-group br0 group_id=1232,type=select,bucket=output:LOCAL])
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 group_id=1232,type=select,selection_method=hash,bucket=output:LOCAL])
AT_CHECK([ovs-ofctl -O OpenFlow13 add-group br0 group_id=1233,type=all,bucket=output:LOCAL])
AT_CHECK([ovs-ofctl -O OpenFlow13 add-group br0 group_id=1234,type=all,bucket=dec_ttl,output:LOCAL])
AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 in_port=local,dl_type=0x0800,action=push_mpls:0x8847,set_field:10-\>mpls_label,output:1])
@@ -71,9 +71,15 @@ AT_CHECK([tail -1 stdout], [0],
[Datapath actions: pop_mpls(eth_type=0x800),recirc(0x2)
])

-AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'recirc_id(2),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'], [0], [stdout])
-AT_CHECK([tail -1 stdout], [0],
- [Datapath actions: 100
+for d in 0 1 2 3; do
+ pkt="in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x8847),mpls(label=22,tc=0,ttl=64,bos=1)"
+ AT_CHECK([ovs-appctl netdev-dummy/receive p0 $pkt])
+done
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/packets.*actions:1/actions:1/' | strip_used | strip_ufid | sort], [0], [dnl
+flow-dump from non-dpdk interfaces:
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x8847),mpls(label=22/0xfffff,tc=0/0,ttl=64/0x0,bos=1/1), packets:3, bytes:54, used:0.0s, actions:pop_mpls(eth_type=0x800),recirc(0x3)
+recirc_id(0x3),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:100
])

dnl Test MPLS pop then all group output (bucket actions do not trigger recirculation)
@@ -85,10 +91,10 @@ AT_CHECK([tail -1 stdout], [0],
dnl Test MPLS pop then all group output (bucket actions trigger recirculation)
AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x8847),mpls(label=24,tc=0,ttl=64,bos=1)'], [0], [stdout])
AT_CHECK([tail -1 stdout], [0],
- [Datapath actions: pop_mpls(eth_type=0x800),recirc(0x3)
+ [Datapath actions: pop_mpls(eth_type=0x800),recirc(0x4)
])

-AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'recirc_id(3),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'], [0], [stdout])
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'recirc_id(4),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'], [0], [stdout])
AT_CHECK([tail -1 stdout], [0],
[Datapath actions: set(ipv4(ttl=63)),100
])
@@ -96,10 +102,10 @@ AT_CHECK([tail -1 stdout], [0],
dnl Test MPLS pop then all output to patch port
AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x8847),mpls(label=25,tc=0,ttl=64,bos=1)'], [0], [stdout])
AT_CHECK([tail -1 stdout], [0],
- [Datapath actions: pop_mpls(eth_type=0x800),recirc(0x4)
+ [Datapath actions: pop_mpls(eth_type=0x800),recirc(0x5)
])

-AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'recirc_id(4),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'], [0], [stdout])
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'recirc_id(5),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'], [0], [stdout])
AT_CHECK([tail -1 stdout], [0],
[Datapath actions: 101
])
@@ -124,10 +130,10 @@ AT_CHECK([tail -1 stdout], [0],
dnl Double MPLS pop
AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x8847),mpls(label=60,tc=0,ttl=64,bos=0,label=50,tc=0,ttl=64,bos=1)'], [0], [stdout])
AT_CHECK([tail -1 stdout], [0],
- [Datapath actions: pop_mpls(eth_type=0x8847),pop_mpls(eth_type=0x800),recirc(0x5)
+ [Datapath actions: pop_mpls(eth_type=0x8847),pop_mpls(eth_type=0x800),recirc(0x7)
])

-AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'recirc_id(5),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'], [0], [stdout])
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'recirc_id(7),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'], [0], [stdout])
AT_CHECK([tail -1 stdout], [0],
[Datapath actions: set(ipv4(ttl=10)),100
])
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 6d87951..00ab97b 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -337,10 +337,18 @@ OVS_VSWITCHD_START
add_of_ports br0 1 10
AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 'group_id=1234,type=select,bucket=set_field:192.168.3.90->ip_src,output:10'])
AT_CHECK([ovs-ofctl -O OpenFlow12 add-flow br0 'ip actions=group:1234,output:10'])
-AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0'], [0], [stdout])
-AT_CHECK([tail -1 stdout], [0],
- [Datapath actions: set(ipv4(src=192.168.3.90,dst=192.168.0.2)),10,set(ipv4(src=192.168.0.1,dst=192.168.0.2)),10
+
+for d in 0 1 2 3; do
+ pkt="in_port(1),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:1),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.1.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
+ AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
+done
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/dp_hash(.*\/0xf)/dp_hash(0xXXXX\/0xf)/' | sed 's/packets.*actions:/actions:/' | strip_ufid | strip_used | sort], [0], [dnl
+flow-dump from non-dpdk interfaces:
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:hash(sym_l4(0)),recirc(0x1)
+recirc_id(0x1),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.0.1,frag=no), actions:set(ipv4(src=192.168.3.90)),10,set(ipv4(src=192.168.0.1)),10
])
+
OVS_VSWITCHD_STOP
AT_CLEANUP

@@ -397,81 +405,265 @@ AT_CLEANUP


AT_SETUP([ofproto-dpif - select group])
+
+# Helper function to check the spread of dp_hash flows over buckets in the datapath
+check_dpflow_stats () {
+ min_flows=$1
+ min_buckets=$2
+ read -d '' dpflows
+ hash_flow=`echo "$dpflows" | grep "actions:hash"`
+ n_flows=`echo "$dpflows" | grep -c dp_hash`
+ n_buckets=`echo "$dpflows" | grep dp_hash | grep -o "actions:[[0-9]]*" | sort | uniq -c | wc -l`
+ if [[ $n_flows -ge $min_flows ]]; then flows=ok; else flows=nok; fi
+ if [[ $n_buckets -ge $min_buckets ]]; then buckets=ok; else buckets=nok; fi
+ echo $hash_flow
+ echo "n_flows=$flows n_buckets=$buckets"
+}
+
OVS_VSWITCHD_START
add_of_ports br0 1 10 11
+
+ovs-appctl vlog/set ofproto_dpif:file:dbg
AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 'group_id=1234,type=select,bucket=output:10,bucket=output:11'])
+AT_CHECK([grep -A6 "Constructing select group 1234" ovs-vswitchd.log | sed 's/^.*ofproto_dpif/ofproto_dpif/'], [0], [dnl
+ofproto_dpif|DBG|Constructing select group 1234
+ofproto_dpif|DBG|No selection method specified. Trying dp_hash.
+ofproto_dpif|DBG| Minimum weight: 1, total weight: 2
+ofproto_dpif|DBG| Using 16 hash values:
+ofproto_dpif|DBG| Bucket 0: weight=1, target=8.00 hits=8
+ofproto_dpif|DBG| Bucket 1: weight=1, target=8.00 hits=8
+ofproto_dpif|DBG|Use dp_hash with 16 hash values using algorithm 1.
+])
AT_CHECK([ovs-ofctl -O OpenFlow12 add-flow br0 'ip actions=write_actions(group:1234)'])

# Try a bunch of different flows and make sure that they get distributed
-# at least somewhat.
-for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
- AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_src=50:54:00:00:00:07,dl_dst=50:54:00:00:00:0$d,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0"], [0], [stdout])
- tail -1 stdout >> results
+# at least somewhat.
+for d in 0 1 2 3; do
+ for s in 1 2 3 4 ; do
+ pkt="in_port(1),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:1),eth_type(0x0800),ipv4(src=192.168.0.$s,dst=192.168.1.$d,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
+ AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
+ done
done
-sort results | uniq -c
-AT_CHECK([sort results | uniq], [0],
- [Datapath actions: 10
-Datapath actions: 11
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sort | strip_ufid | strip_used | check_dpflow_stats 5 2], [0], [dnl
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:15, bytes:1590, used:0.0s, actions:hash(sym_l4(0)),recirc(0x1)
+n_flows=ok n_buckets=ok
])
+
OVS_VSWITCHD_STOP
AT_CLEANUP

AT_SETUP([ofproto-dpif - select group with watch port])
+
OVS_VSWITCHD_START
add_of_ports br0 1 10 11
AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 'group_id=1234,type=select,bucket=watch_port:10,output:10,bucket=output:11'])
AT_CHECK([ovs-ofctl -O OpenFlow12 add-flow br0 'ip actions=write_actions(group:1234)'])
-AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1,dl_src=50:54:00:00:00:07,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0'], [0], [stdout])
-AT_CHECK([tail -1 stdout], [0],
- [Datapath actions: 11
+
+for d in 0 1 2 3; do
+ pkt="in_port(1),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:1),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
+ AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
+done
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sort| sed 's/dp_hash(.*\/0xf)/dp_hash(0xXXXX\/0xf)/' | strip_ufid | strip_used], [0], [dnl
+flow-dump from non-dpdk interfaces:
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:3, bytes:318, used:0.0s, actions:hash(sym_l4(0)),recirc(0x1)
+recirc_id(0x1),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:3, bytes:318, used:0.0s, actions:11
])
+
OVS_VSWITCHD_STOP
AT_CLEANUP

-AT_SETUP([ofproto-dpif - select group with weight])
+AT_SETUP([ofproto-dpif - select group with weights])
+
+# Helper function to check the spread of dp_hash flows over buckets in the datapath
+check_dpflow_stats () {
+ min_flows=$1
+ min_buckets=$2
+ read -d '' dpflows
+ hash_flow=`echo "$dpflows" | grep "actions:hash"`
+ n_flows=`echo "$dpflows" | grep -c dp_hash`
+ n_buckets=`echo "$dpflows" | grep dp_hash | grep -o "actions:[[0-9]]*" | sort | uniq -c | wc -l`
+ if [[ $n_flows -ge $min_flows ]]; then flows=ok; else flows=nok; fi
+ if [[ $n_buckets -ge $min_buckets ]]; then buckets=ok; else buckets=nok; fi
+ echo $hash_flow
+ echo "n_flows=$flows n_buckets=$buckets"
+}
+
+# Helper function to check the accuracy of distribution of packets over buckets
+check_group_stats () {
+ min=($1 $2 $3 $4)
+ buckets=`grep -o 'packet_count=[[0-9]]*' | cut -d'=' -f2 | tail -n +2`
+ i=0
+ for bucket in $buckets; do
+ if [[ $bucket -ge ${min[i]} ]]; then
+ echo "bucket$i >= ${min[[$i]]}"
+ else
+ echo "bucket$i < ${min[[$i]]}"
+ fi
+ (( i++ ))
+ if [[ $i -ge 4 ]]; then break; fi
+ done
+}
+
OVS_VSWITCHD_START
-add_of_ports br0 1 10 11 12
-AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 'group_id=1234,type=select,bucket=output:10,bucket=output:11,weight=2000,bucket=output:12,weight=0'])
+add_of_ports br0 1 10 11 12 13 14
+
+ovs-appctl vlog/set ofproto_dpif:file:dbg
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-group br0 'group_id=1234,type=select,bucket=weight:5,output:10,bucket=weight:10,output:11,bucket=weight:25,output:12,bucket=weight:60,output:13,bucket=weight:0,output:14'])
+AT_CHECK([grep -A9 "Constructing select group 1234" ovs-vswitchd.log | sed 's/^.*ofproto_dpif/ofproto_dpif/'], [0], [dnl
+ofproto_dpif|DBG|Constructing select group 1234
+ofproto_dpif|DBG|No selection method specified. Trying dp_hash.
+ofproto_dpif|DBG| Minimum weight: 5, total weight: 100
+ofproto_dpif|DBG| Using 32 hash values:
+ofproto_dpif|DBG| Bucket 0: weight=5, target=1.60 hits=2
+ofproto_dpif|DBG| Bucket 1: weight=10, target=3.20 hits=3
+ofproto_dpif|DBG| Bucket 2: weight=25, target=8.00 hits=8
+ofproto_dpif|DBG| Bucket 3: weight=60, target=19.20 hits=19
+ofproto_dpif|DBG| Bucket 4: weight=0, target=0.00 hits=0
+ofproto_dpif|DBG|Use dp_hash with 32 hash values using algorithm 1.
+])
AT_CHECK([ovs-ofctl -O OpenFlow12 add-flow br0 'ip actions=write_actions(group:1234)'])
-AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1,dl_src=50:54:00:00:00:07,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0'], [0], [stdout])
-AT_CHECK([tail -1 stdout], [0],
- [Datapath actions: 11
+
+# Try 1000 different flows and make sure that they get distributed according to weights
+for d1 in 0 1 2 3 4 5 6 7 8 9 ; do
+ for d2 in 0 1 2 3 4 5 6 7 8 9 ; do
+ for s in 0 1 2 3 4 5 6 7 8 9 ; do
+ pkt="in_port(1),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:1),eth_type(0x0800),ipv4(src=192.168.1.$s,dst=192.168.$d1.$d2,proto=6,tos=0,ttl=128,frag=no),tcp(src=1000$s,dst=1000)"
+ AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
+ done
+ done
+done
+
+# Check balanced distribution over 32 dp_hash values
+AT_CHECK([ovs-appctl dpctl/dump-flows | sort | strip_ufid | strip_used | check_dpflow_stats 32 4 ], [0], [dnl
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:999, bytes:117882, used:0.0s, actions:hash(sym_l4(0)),recirc(0x1)
+n_flows=ok n_buckets=ok
+])
+
+# Check that actual distribution over the buckets is reasonably accurate:
+#               ideal weights        dp_hash values
+#   bucket0:    5%*1000 =  50     2/32*1000 =  63
+#   bucket1:   10%*1000 = 100     3/32*1000 =  94
+#   bucket2:   25%*1000 = 250     8/32*1000 = 250
+#   bucket3:   60%*1000 = 600    19/32*1000 = 594
+#   bucket4:                0                    0
+
+ovs-appctl time/warp 1000
+AT_CHECK([ovs-ofctl -O OpenFlow13 dump-group-stats br0 | sed 's/duration=[[0-9]]\.[[0-9]]*s,//' | check_group_stats 40 80 200 500],
+[0], [dnl
+bucket0 >= 40
+bucket1 >= 80
+bucket2 >= 200
+bucket3 >= 500
])
+
OVS_VSWITCHD_STOP
AT_CLEANUP

-AT_SETUP([ofproto-dpif - select group with hash selection method])
+AT_SETUP([ofproto-dpif - select group with explicit dp_hash selection method])
+
OVS_VSWITCHD_START
add_of_ports br0 1 10 11
-# Check that parse failures after 'fields' parsing work
-AT_CHECK([ovs-ofctl -O OpenFlow10 add-group br0 'group_id=1,type=select,fields(eth_dst),bukket=output:10'], [1], ,[dnl
-ovs-ofctl: unknown keyword bukket
+
+ovs-appctl vlog/set ofproto_dpif:file:dbg
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1234,type=select,selection_method=dp_hash,bucket=output:10,bucket=output:11'])
+AT_CHECK([grep -A6 "Constructing select group 1234" ovs-vswitchd.log | sed 's/^.*ofproto_dpif/ofproto_dpif/'], [0], [dnl
+ofproto_dpif|DBG|Constructing select group 1234
+ofproto_dpif|DBG|Selection method specified: dp_hash.
+ofproto_dpif|DBG| Minimum weight: 1, total weight: 2
+ofproto_dpif|DBG| Using 16 hash values:
+ofproto_dpif|DBG| Bucket 0: weight=1, target=8.00 hits=8
+ofproto_dpif|DBG| Bucket 1: weight=1, target=8.00 hits=8
+ofproto_dpif|DBG|Use dp_hash with 16 hash values using algorithm 0.
+])
+
+# Fall back to legacy hash with zero buckets
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1235,type=select,selection_method=dp_hash'])
+AT_CHECK([grep -A3 "Constructing select group 1235" ovs-vswitchd.log | sed 's/^.*ofproto_dpif/ofproto_dpif/'], [0], [dnl
+ofproto_dpif|DBG|Constructing select group 1235
+ofproto_dpif|DBG|Selection method specified: dp_hash.
+ofproto_dpif|DBG| Don't apply dp_hash method without buckets.
+ofproto_dpif|DBG|Falling back to default hash method.
+])
+
+# Fall back to legacy hash when too many hash values would be required
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1236,type=select,selection_method=dp_hash,bucket=weight=1,output:10,bucket=weight=1000,output:11'])
+AT_CHECK([grep -A4 "Constructing select group 1236" ovs-vswitchd.log | sed 's/^.*ofproto_dpif/ofproto_dpif/'], [0], [dnl
+ofproto_dpif|DBG|Constructing select group 1236
+ofproto_dpif|DBG|Selection method specified: dp_hash.
+ofproto_dpif|DBG| Minimum weight: 1, total weight: 1001
+ofproto_dpif|DBG| Too many hash values required: 1024
+ofproto_dpif|DBG|Falling back to default hash method.
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto-dpif - select group with legacy hash selection method])
+
+# Helper function to check the spread of dp_hash flows over buckets in the datapath
+check_dpflow_stats () {
+ min_flows=$1
+ min_buckets=$2
+ read -d '' dpflows
+ n_flows=`echo "$dpflows" | wc -l`
+ n_buckets=`echo "$dpflows" | grep -o "actions:[[0-9]]*" | sort | uniq -c | wc -l`
+ if [[ $n_flows -ge $min_flows ]]; then flows=ok; else flows=nok; fi
+ if [[ $n_buckets -ge $min_buckets ]]; then buckets=ok; else buckets=nok; fi
+ echo "n_flows=$flows n_buckets=$buckets"
+}
+
+OVS_VSWITCHD_START
+add_of_ports br0 1 10 11
+
+ovs-appctl vlog/set ofproto_dpif:file:dbg
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1234,type=select,selection_method=hash,bucket=output:10,bucket=output:11'])
+AT_CHECK([grep -A2 "Constructing select group 1234" ovs-vswitchd.log | sed 's/^.*ofproto_dpif/ofproto_dpif/'], [0], [dnl
+ofproto_dpif|DBG|Constructing select group 1234
+ofproto_dpif|DBG|Selection method specified: hash.
+ofproto_dpif|DBG|No hash fields. Falling back to default hash method.
])
-AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1234,type=select,selection_method=hash,fields(eth_dst,ip_dst,tcp_dst),bucket=output:10,bucket=output:11'])
+
AT_CHECK([ovs-ofctl -O OpenFlow15 add-flow br0 'ip actions=write_actions(group:1234)'])

-# Try a bunch of different flows and make sure that they get distributed
-# at least somewhat.
-for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
- AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_src=50:54:00:00:00:07,dl_dst=50:54:00:00:00:0$d,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0"], [0], [stdout])
- tail -1 stdout >> results
+# Try 16 flows with differing default hash values.
+for d in 0 1 2 3; do
+ for s in 1 2 3 4 ; do
+ pkt="in_port(1),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:1),eth_type(0x0800),ipv4(src=192.168.0.$s,dst=192.168.1.$d,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
+ AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
+ done
done
-sort results | uniq -c
-AT_CHECK([sort results | uniq], [0],
- [Datapath actions: 10
-Datapath actions: 11
+
+# Check that the packets installed 16 data path flows and each of the two
+# buckets is hit at least once.
+AT_CHECK([ovs-appctl dpctl/dump-flows | strip_ufid | strip_used | sort | check_dpflow_stats 16 2], [0], [dnl
+n_flows=ok n_buckets=ok
])

-> results
-# Try a bunch of different flows and make sure that they are not distributed
-# as they only vary a field that is not hashed
-for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
- AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_src=50:54:00:00:00:0$d,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0"], [0], [stdout])
- tail -1 stdout >> results
-done
-sort results | uniq -c
-AT_CHECK([sort results | uniq | sed 's/1[[01]]/1?/'], [0],
- [Datapath actions: 1?
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([ofproto-dpif - select group with custom hash selection method])
+
+# Helper function to check the spread of dp_hash flows over buckets in the datapath
+check_dpflow_stats () {
+ min_flows=$1
+ min_buckets=$2
+ read -d '' dpflows
+ n_flows=`echo "$dpflows" | wc -l`
+ n_buckets=`echo "$dpflows" | grep -o "actions:[[0-9]]*" | sort | uniq -c | wc -l`
+ if [[ $n_flows -ge $min_flows ]]; then flows=ok; else flows=nok; fi
+ if [[ $n_buckets -ge $min_buckets ]]; then buckets=ok; else buckets=nok; fi
+ echo "n_flows=$flows n_buckets=$buckets"
+}
+
+OVS_VSWITCHD_START
+add_of_ports br0 1 10 11
+
+# Check that parse failures after 'fields' parsing work
+AT_CHECK([ovs-ofctl -O OpenFlow10 add-group br0 'group_id=1,type=select,fields(eth_dst),bukket=output:10'], [1], ,[dnl
+ovs-ofctl: unknown keyword bukket
])

# Check that fields are rejected without "selection_method=hash".
@@ -484,43 +676,31 @@ AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1235,type=select,selec
ovs-ofctl: selection_method_param is only allowed with "selection_method"
])

-OVS_VSWITCHD_STOP
-AT_CLEANUP
-
-AT_SETUP([ofproto-dpif - select group with dp_hash selection method])
-OVS_VSWITCHD_START
-add_of_ports br0 1 10 11
-AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1234,type=select,selection_method=dp_hash,bucket=output:10,bucket=output:11'])
-AT_CHECK([ovs-ofctl -O OpenFlow15 add-flow br0 'ip,nw_src=192.168.0.1 actions=group:1234'])
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 'group_id=1234,type=select,selection_method=hash,fields(eth_dst,ip_dst,tcp_dst),bucket=output:10,bucket=output:11'])
+AT_CHECK([ovs-ofctl -O OpenFlow15 add-flow br0 'ip actions=write_actions(group:1234)'])

-# Try a bunch of different flows and make sure that they get distributed
-# at least somewhat.
-for d in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
- pkt="in_port(1),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:01),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.1.$d,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
+# Try 16 flows with differing custom hash and check that they give rise to
+# 16 data path flows and each of the two buckets is hit at least once
+for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
+ pkt="in_port(1),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:$d),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
done

-AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/dp_hash(.*\/0xf)/dp_hash(0xXXXX\/0xf)/' | sed 's/packets.*actions:1/actions:1/' | \
- strip_ufid | strip_used | sort | uniq], [0], [dnl
-flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.0.1,frag=no), packets:15, bytes:1590, used:0.0s, actions:hash(l4(0)),recirc(0x1)
-recirc_id(0x1),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:10
-recirc_id(0x1),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:11
+AT_CHECK([ovs-appctl dpctl/dump-flows | strip_ufid | strip_used | sort | check_dpflow_stats 16 2], [0], [dnl
+n_flows=ok n_buckets=ok
])

AT_CHECK([ovs-appctl revalidator/purge], [0])

-# Try a bunch of different flows and make sure that they are not distributed
-# as they only vary a field that is not hashed
+# Try 16 flows that differ only in fields that are not part of the custom
+# hash and check that there is only a single datapath flow
for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
pkt="in_port(1),eth(src=50:54:00:00:00:$d,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
done

-AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/dp_hash(.*\/0xf)/dp_hash(0xXXXX\/0xf)/' | sed 's/\(actions:1\)[[01]]/\1X/' | strip_ufid | strip_used | sort], [0], [dnl
-flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.0.1,frag=no), packets:15, bytes:1590, used:0.0s, actions:hash(l4(0)),recirc(0x2)
-recirc_id(0x2),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:15, bytes:1590, used:0.0s, actions:1X
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep -c recirc_id], [0], [dnl
+1
])

OVS_VSWITCHD_STOP
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 9a37464..8923ce0 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -300,6 +300,11 @@ strip_used () {
sed 's/used:[[0-9]]\.[[0-9]]*/used:0.0/'
}

+# Removes all 'duration=...' to make output easier to compare.
+strip_duration () {
+ sed 's/duration=[[0-9]]*\.[[0-9]]*s,//'
+}
+
# Strips 'ufid:...' from output, to make it easier to compare.
# (ufids are random.)
strip_ufid () {
@@ -318,7 +323,7 @@ m4_define([_OVS_VSWITCHD_START],
[dnl Create database.
touch .conf.db.~lock~
AT_CHECK([ovsdb-tool create conf.db $abs_top_srcdir/vswitchd/vswitch.ovsschema])
-
+
dnl Start ovsdb-server.
AT_CHECK([ovsdb-server --detach --no-chdir --pidfile --log-file --remote=punix:$OVS_RUNDIR/db.sock], [0], [], [stderr])
on_exit "kill `cat ovsdb-server.pid`"
diff --git a/utilities/ovs-ofctl.8.in b/utilities/ovs-ofctl.8.in
index 2e2f696..4f8555a 100644
--- a/utilities/ovs-ofctl.8.in
+++ b/utilities/ovs-ofctl.8.in
@@ -2120,28 +2120,23 @@ The selection method used to select a bucket for a select group.
This is a string of 1 to 15 bytes in length known to lower layers.
This field is optional for \fBadd\-group\fR, \fBadd\-groups\fR and
\fBmod\-group\fR commands on groups of type \fBselect\fR. Prohibited
-otherwise. The default value is the empty string.
+otherwise. If no selection method is specified, Open vSwitch up to
+release 2.9 applies the \fBhash\fR method with default fields. From
+2.10 onwards Open vSwitch defaults to the \fBdp_hash\fR method with symmetric
+L3/L4 hash algorithm, unless the weighted group buckets cannot be mapped to
+a maximum of 64 dp_hash values with sufficient accuracy.
+In those rare cases Open vSwitch 2.10 and later fall back to the \fBhash\fR
+method with the default set of hash fields.
.RS
-.IP \fBhash\fR
-Use a hash computed over the fields specified with the \fBfields\fR
-option, see below. \fBhash\fR uses the \fBselection_method_param\fR
-as the hash basis.
-.IP
-Note that the hashed fields become exact matched by the datapath
-flows. For example, if the TCP source port is hashed, the created
-datapath flows will match the specific TCP source port value present
-in the packet received. Since each TCP connection generally has a
-different source port value, a separate datapath flow will be need to
-be inserted for each TCP connection thus hashed to a select group
-bucket.
.IP \fBdp_hash\fR
Use a datapath computed hash value. The hash algorithm varies accross
different datapath implementations. \fBdp_hash\fR uses the upper 32
bits of the \fBselection_method_param\fR as the datapath hash
-algorithm selector, which currently must always be 0, corresponding to
-hash computation over the IP 5-tuple (selecting specific fields with
-the \fBfields\fR option is not allowed with \fBdp_hash\fR). The lower
-32 bits are used as the hash basis.
+algorithm selector. The supported values are \fB0\fR (corresponding to
+hash computation over the IP 5-tuple) and \fB1\fR (corresponding to a
+\fIsymmetric\fR hash computation over the IP 5-tuple). Selecting specific
+fields with the \fBfields\fR option is not supported with \fBdp_hash\fR.
+The lower 32 bits are used as the hash basis.
.IP
Using \fBdp_hash\fR has the advantage that it does not require the
generated datapath flows to exact match any additional packet header
@@ -2155,9 +2150,23 @@ when needed, and a second match is required to match some bits of its
value. This double-matching incurs a small additional latency cost
for each packet, but this latency is orders of magnitude less than the
latency of creating new datapath flows for new TCP connections.
+.IP \fBhash\fR
+Use a hash computed over the fields specified with the \fBfields\fR
+option, see below. If no hash fields are specified, \fBhash\fR defaults
+to a symmetric hash over the combination of MAC addresses, VLAN tags,
+Ether type, IP addresses and L4 port numbers. \fBhash\fR uses the
+\fBselection_method_param\fR as the hash basis.
+.IP
+Note that the hashed fields become exact matched by the datapath
+flows. For example, if the TCP source port is hashed, the created
+datapath flows will match the specific TCP source port value present
+in the packet received. Since each TCP connection generally has a
+different source port value, a separate datapath flow will need to
+be inserted for each TCP connection thus hashed to a select group
+bucket.
.RE
.IP
-This option will use a Netronome OpenFlow extension which is only supported
+This option uses a Netronome OpenFlow extension which is only supported
when using Open vSwitch 2.4 and later with OpenFlow 1.5 and later.

.IP \fBselection_method_param\fR=\fIparam\fR
@@ -2167,7 +2176,7 @@ lower-layer that implements the \fBselection_method\fR. It is optional if
the \fBselection_method\fR field is specified as a non-empty string.
Prohibited otherwise. The default value is zero.
.IP
-This option will use a Netronome OpenFlow extension which is only supported
+This option uses a Netronome OpenFlow extension which is only supported
when using Open vSwitch 2.4 and later with OpenFlow 1.5 and later.

.IP \fBfields\fR=\fIfield\fR
--
1.9.1
Jan Scheurich
2018-05-24 15:28:00 UTC
The current implementation of the "dp_hash" selection method suffers
from two deficiencies: 1. The hash mask and hence the number of dp_hash
values is just large enough to cover the number of group buckets, but
does not consider the case that buckets have different weights. 2. The
xlate-time selection of best bucket from the masked dp_hash value often
results in bucket load distributions that are quite different from the
bucket weights because the number of available masked dp_hash values
is too small (2-6 bits compared to 32 bits of a full hash in the default
hash selection method).

This commit provides a more accurate implementation of the dp_hash
select group by applying the well known Webster method for distributing
a small number of "seats" fairly over the weighted "parties"
(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
The dp_hash mask is automatically chosen large enough to provide good
enough accuracy even with widely differing weights.
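
For readers who want to see the arithmetic, here is a minimal standalone
sketch of the sizing rule and the Webster distribution described above. It
is not the code from the patch: the names sketch_n_hash and sketch_webster
are made up for illustration, plain arrays replace the OVS bucket list, and
the limits of 16 and 256 hash values are taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIN_HASH_VALUES 16    /* Lower limit used in this patch. */
#define SKETCH_MAX_HASH_VALUES 256   /* Upper limit used in this patch. */

/* Returns the number of hash values (a power of two) needed so that the
 * lightest non-zero bucket gets at least one value.  Returns 0 if all
 * weights are zero or more than SKETCH_MAX_HASH_VALUES would be needed. */
static size_t
sketch_n_hash(const uint16_t *weights, size_t n)
{
    uint64_t total = 0;
    uint16_t min = UINT16_MAX;

    for (size_t i = 0; i < n; i++) {
        if (weights[i] > 0 && weights[i] < min) {
            min = weights[i];
        }
        total += weights[i];
    }
    if (total == 0) {
        return 0;
    }

    uint64_t slots = (total + min - 1) / min;    /* Round up. */
    uint64_t n_hash = SKETCH_MIN_HASH_VALUES;
    while (n_hash < slots && n_hash <= SKETCH_MAX_HASH_VALUES) {
        n_hash *= 2;                             /* Next power of two. */
    }
    return n_hash <= SKETCH_MAX_HASH_VALUES ? (size_t) n_hash : 0;
}

/* Assigns each of the 'n_hash' hash values to one of 'n' buckets with the
 * Webster (Sainte-Lague) method: every hash value goes to the bucket with
 * the largest quotient weight / (2 * hits + 1).  Stores the bucket index
 * per hash value in 'map' and the number of hash values per bucket in
 * 'hits'.  Ties go to the lowest bucket index. */
static void
sketch_webster(const uint16_t *weights, size_t n,
               size_t *map, size_t n_hash, unsigned int *hits)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        hits[i] = 0;
    }
    for (size_t h = 0; h < n_hash; h++) {
        size_t winner = 0;
        double best = (double) weights[0] / (2 * hits[0] + 1);

        for (size_t i = 1; i < n; i++) {
            double q = (double) weights[i] / (2 * hits[i] + 1);
            if (q > best) {
                best = q;
                winner = i;
            }
        }
        hits[winner]++;
        map[h] = winner;
    }
}
```

With weights 5, 10, 25, 60 and 0 the sketch asks for 32 hash values and
hands out 2, 3, 8, 19 and 0 of them, which is the split shown in the debug
log that the select group test case in this series expects.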

This distribution happens at group modification time and the resulting
table is stored in struct group_dpif. At xlation time, we use the
masked dp_hash value as an index to look up the assigned bucket.

If that bucket is not live, we do a circular search over the
mapping table until we find the first live bucket. As the buckets in
the table are by construction in pseudo-random order with a frequency
according to their weight, this method maintains correct distribution
even if one or more buckets are non-live.
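
The lookup with its circular search can be sketched as follows. Again this
is an illustration under assumptions rather than the patch code: the
function name sketch_pick_bucket is hypothetical, the mapping table holds
plain bucket indexes, and liveness is a bool array instead of a call to
bucket_is_alive().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first live bucket found when probing 'map'
 * circularly, starting at the slot selected by the masked 'dp_hash'.
 * Returns -1 if no bucket in the table is live.  'map' must have
 * hash_mask + 1 entries, where hash_mask is 2^N - 1. */
static int
sketch_pick_bucket(const int *map, uint32_t hash_mask,
                   const bool *live, uint32_t dp_hash)
{
    assert(map != NULL && live != NULL);

    for (uint32_t i = 0; i <= hash_mask; i++) {
        /* Unsigned wrap-around of dp_hash + i is harmless here because
         * the sum is masked before it is used as an index. */
        int bucket = map[(dp_hash + i) & hash_mask];

        if (live[bucket]) {
            return bucket;
        }
    }
    return -1;
}
```

Because a failed bucket's slots are taken over by whichever bucket follows
them in the table, the extra load is spread over the surviving buckets
roughly in proportion to their share of the table.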

Xlation is further simplified by storing some derived select group state
at group construction in struct group_dpif in a form better suited for
xlation purposes.

Adapted the unit test case for dp_hash select group accordingly.

Signed-off-by: Jan Scheurich <***@ericsson.com>
Signed-off-by: Nitin Katiyar <***@ericsson.com>
Co-authored-by: Nitin Katiyar <***@ericsson.com>
---
lib/odp-util.c | 4 +-
ofproto/ofproto-dpif-xlate.c | 59 ++++++++++-------
ofproto/ofproto-dpif.c | 150 +++++++++++++++++++++++++++++++++++++++++++
ofproto/ofproto-dpif.h | 13 ++++
tests/ofproto-dpif.at | 15 +++--
5 files changed, 211 insertions(+), 30 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 105ac80..8d4afa0 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -595,7 +595,9 @@ format_odp_hash_action(struct ds *ds, const struct ovs_action_hash *hash_act)
ds_put_format(ds, "hash(");

if (hash_act->hash_alg == OVS_HASH_ALG_L4) {
- ds_put_format(ds, "hash_l4(%"PRIu32")", hash_act->hash_basis);
+ ds_put_format(ds, "l4(%"PRIu32")", hash_act->hash_basis);
+ } else if (hash_act->hash_alg == OVS_HASH_ALG_SYM_L4) {
+ ds_put_format(ds, "sym_l4(%"PRIu32")", hash_act->hash_basis);
} else {
ds_put_format(ds, "Unknown hash algorithm(%"PRIu32")",
hash_act->hash_alg);
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 9f7fca7..c990d8a 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4392,27 +4392,37 @@ pick_hash_fields_select_group(struct xlate_ctx *ctx, struct group_dpif *group)
static struct ofputil_bucket *
pick_dp_hash_select_group(struct xlate_ctx *ctx, struct group_dpif *group)
{
+ uint32_t dp_hash = ctx->xin->flow.dp_hash;
+
/* dp_hash value 0 is special since it means that the dp_hash has not been
* computed, as all computed dp_hash values are non-zero. Therefore
* compare to zero can be used to decide if the dp_hash value is valid
* without masking the dp_hash field. */
- if (!ctx->xin->flow.dp_hash) {
- uint64_t param = group->up.props.selection_method_param;
-
- ctx_trigger_recirculate_with_hash(ctx, param >> 32, (uint32_t)param);
+ if (!dp_hash) {
+ enum ovs_hash_alg hash_alg = group->hash_alg;
+ if (hash_alg > ctx->xbridge->support.max_hash_alg) {
+ /* Algorithm supported by all datapaths. */
+ hash_alg = OVS_HASH_ALG_L4;
+ }
+ ctx_trigger_recirculate_with_hash(ctx, hash_alg, group->hash_basis);
return NULL;
} else {
- uint32_t n_buckets = group->up.n_buckets;
- if (n_buckets) {
- /* Minimal mask to cover the number of buckets. */
- uint32_t mask = (1 << log_2_ceil(n_buckets)) - 1;
- /* Multiplier chosen to make the trivial 1 bit case to
- * actually distribute amongst two equal weight buckets. */
- uint32_t basis = 0xc2b73583 * (ctx->xin->flow.dp_hash & mask);
-
- ctx->wc->masks.dp_hash |= mask;
- return group_best_live_bucket(ctx, group, basis);
+ uint32_t hash_mask = group->hash_mask;
+ ctx->wc->masks.dp_hash |= hash_mask;
+
+ /* Starting from the original masked dp_hash value iterate over the
+ * hash mapping table to find the first live bucket. As the buckets
+ * are quasi-randomly spread over the hash values, this maintains
+ * a distribution according to bucket weights even when some buckets
+ * are non-live. */
+ for (int i = 0; i <= hash_mask; i++) {
+ struct ofputil_bucket *b =
+ group->hash_map[(dp_hash + i) & hash_mask];
+ if (bucket_is_alive(ctx, b, 0)) {
+ return b;
+ }
}
+
return NULL;
}
}
@@ -4427,17 +4437,22 @@ pick_select_group(struct xlate_ctx *ctx, struct group_dpif *group)
ctx_trigger_freeze(ctx);
}

- const char *selection_method = group->up.props.selection_method;
- if (selection_method[0] == '\0') {
+ switch (group->selection_method) {
+ case SEL_METHOD_DEFAULT:
return pick_default_select_group(ctx, group);
- } else if (!strcasecmp("hash", selection_method)) {
+ break;
+ case SEL_METHOD_HASH:
return pick_hash_fields_select_group(ctx, group);
- } else if (!strcasecmp("dp_hash", selection_method)) {
+ break;
+ case SEL_METHOD_DP_HASH:
return pick_dp_hash_select_group(ctx, group);
- } else {
- /* Parsing of groups should ensure this never happens */
+ break;
+ default:
+ /* Parsing of groups ensures this never happens */
OVS_NOT_REACHED();
}
+
+ return NULL;
}

static void
@@ -4731,8 +4746,8 @@ finish_freezing__(struct xlate_ctx *ctx, uint8_t table)
act_hash = nl_msg_put_unspec_uninit(ctx->odp_actions,
OVS_ACTION_ATTR_HASH,
sizeof *act_hash);
- act_hash->hash_alg = OVS_HASH_ALG_L4; /* Make configurable. */
- act_hash->hash_basis = 0; /* Make configurable. */
+ act_hash->hash_alg = ctx->dp_hash_alg;
+ act_hash->hash_basis = ctx->dp_hash_basis;
}
nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, recirc_id);
}
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 7162811..c9c2e51 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -32,6 +32,7 @@
#include "lacp.h"
#include "learn.h"
#include "mac-learning.h"
+#include "math.h"
#include "mcast-snooping.h"
#include "multipath.h"
#include "netdev-vport.h"
@@ -4762,6 +4763,147 @@ group_dpif_credit_stats(struct group_dpif *group,
ovs_mutex_unlock(&group->stats_mutex);
}

+/* Calculate the dp_hash mask needed to provide the least weighted bucket
+ * with at least one hash value and construct a mapping table from masked
+ * dp_hash value to group bucket using the Webster method.
+ * If the caller specifies a non-zero max_hash value, abort and return false
+ * if more hash values would be required. The absolute maximum number of
+ * hash values supported is 256. */
+
+#define MAX_SELECT_GROUP_HASH_VALUES 256
+
+static bool
+group_setup_dp_hash_table(struct group_dpif *group, size_t max_hash)
+{
+ struct ofputil_bucket *bucket;
+ uint32_t n_buckets = group->up.n_buckets;
+ uint64_t total_weight = 0;
+ uint16_t min_weight = UINT16_MAX;
+ struct webster {
+ struct ofputil_bucket *bucket;
+ uint32_t divisor;
+ double value;
+ int hits;
+ } *webster;
+
+ if (n_buckets == 0) {
+ VLOG_DBG(" Don't apply dp_hash method without buckets");
+ return false;
+ }
+
+ webster = xcalloc(n_buckets, sizeof(struct webster));
+ int i = 0;
+ LIST_FOR_EACH (bucket, list_node, &group->up.buckets) {
+ if (bucket->weight > 0 && bucket->weight < min_weight) {
+ min_weight = bucket->weight;
+ }
+ total_weight += bucket->weight;
+ webster[i].bucket = bucket;
+ webster[i].divisor = 1;
+ webster[i].value = bucket->weight;
+ webster[i].hits = 0;
+ i++;
+ }
+
+ if (total_weight == 0) {
+ VLOG_DBG(" Total weight is zero. No active buckets.");
+ free(webster);
+ return false;
+ }
+ VLOG_DBG(" Minimum weight: %d, total weight: %"PRIu64,
+ min_weight, total_weight);
+
+ uint64_t min_slots = DIV_ROUND_UP(total_weight, min_weight);
+ uint64_t min_slots2 = ROUND_UP_POW2(min_slots);
+ uint64_t n_hash = MAX(16, min_slots2);
+ if (n_hash > MAX_SELECT_GROUP_HASH_VALUES ||
+ (max_hash != 0 && n_hash > max_hash)) {
+ VLOG_DBG(" Too many hash values required: %"PRIu64, n_hash);
+ return false;
+ }
+
+ VLOG_DBG(" Using %"PRIu64" hash values:", n_hash);
+ group->hash_mask = n_hash - 1;
+ if (group->hash_map) {
+ free(group->hash_map);
+ }
+ group->hash_map = xcalloc(n_hash, sizeof(struct ofputil_bucket *));
+
+ /* Use Webster method to distribute hash values over buckets. */
+ for (int hash = 0; hash < n_hash; hash++) {
+ struct webster *winner = &webster[0];
+ for (i = 1; i < n_buckets; i++) {
+ if (webster[i].value > winner->value) {
+ winner = &webster[i];
+ }
+ }
+ winner->hits++;
+ winner->divisor += 2;
+ winner->value = (double) winner->bucket->weight / winner->divisor;
+ group->hash_map[hash] = winner->bucket;
+ }
+
+ i = 0;
+ LIST_FOR_EACH (bucket, list_node, &group->up.buckets) {
+ double target = (n_hash * bucket->weight) / (double) total_weight;
+ VLOG_DBG(" Bucket %d: weight=%d, target=%.2f hits=%d",
+ bucket->bucket_id, bucket->weight,
+ target, webster[i].hits);
+ i++;
+ }
+
+ free(webster);
+ return true;
+}
+
+static void
+group_set_selection_method(struct group_dpif *group)
+{
+ const struct ofputil_group_props *props = &group->up.props;
+ const char *selection_method = props->selection_method;
+
+ if (selection_method[0] == '\0') {
+ VLOG_DBG("No selection method specified.");
+ group->selection_method = SEL_METHOD_DEFAULT;
+ } else if (!strcmp(selection_method, "dp_hash")) {
+ VLOG_DBG("Selection method specified: dp_hash.");
+ /* Try to use dp_hash if possible at all. */
+ if (group_setup_dp_hash_table(group, 0)) {
+ group->selection_method = SEL_METHOD_DP_HASH;
+ group->hash_alg = props->selection_method_param >> 32;
+ if (group->hash_alg >= __OVS_HASH_MAX) {
+ VLOG_DBG(" Invalid dp_hash algorithm %d. "
+ "Defaulting to OVS_HASH_ALG_L4", group->hash_alg);
+ group->hash_alg = OVS_HASH_ALG_L4;
+ }
+ group->hash_basis = (uint32_t) props->selection_method_param;
+ VLOG_DBG("Use dp_hash with %d hash values using algorithm %d.",
+ group->hash_mask + 1, group->hash_alg);
+ } else {
+ /* Fall back to original default hashing in slow path. */
+ VLOG_DBG(" Falling back to default hash method.");
+ group->selection_method = SEL_METHOD_DEFAULT;
+ }
+ } else if (!strcmp(selection_method, "hash")) {
+ VLOG_DBG("Selection method specified: hash.");
+ if (props->fields.values_size > 0) {
+ /* Controller has specified hash fields. */
+ struct ds s = DS_EMPTY_INITIALIZER;
+ oxm_format_field_array(&s, &props->fields);
+ VLOG_DBG(" Hash fields: %s", ds_cstr(&s));
+ ds_destroy(&s);
+ group->selection_method = SEL_METHOD_HASH;
+ } else {
+ /* No hash fields. Fall back to original default hashing. */
+ VLOG_DBG(" No hash fields. Falling back to default hash method.");
+ group->selection_method = SEL_METHOD_DEFAULT;
+ }
+ } else {
+ /* Parsing of groups should ensure this never happens */
+ OVS_NOT_REACHED();
+ }
+}
+
static enum ofperr
group_construct(struct ofgroup *group_)
{
@@ -4770,6 +4912,10 @@ group_construct(struct ofgroup *group_)
ovs_mutex_init_adaptive(&group->stats_mutex);
ovs_mutex_lock(&group->stats_mutex);
group_construct_stats(group);
+ group->hash_map = NULL;
+ if (group->up.type == OFPGT11_SELECT) {
+ group_set_selection_method(group);
+ }
ovs_mutex_unlock(&group->stats_mutex);
return 0;
}
@@ -4779,6 +4925,10 @@ group_destruct(struct ofgroup *group_)
{
struct group_dpif *group = group_dpif_cast(group_);
ovs_mutex_destroy(&group->stats_mutex);
+ if (group->hash_map) {
+ free(group->hash_map);
+ group->hash_map = NULL;
+ }
}

static enum ofperr
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index d654947..e95fead 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -119,6 +119,12 @@ rule_dpif_is_internal(const struct rule_dpif *rule)

/* Groups. */

+enum group_selection_method {
+ SEL_METHOD_DEFAULT,
+ SEL_METHOD_DP_HASH,
+ SEL_METHOD_HASH,
+};
+
struct group_dpif {
struct ofgroup up;

@@ -129,6 +135,12 @@ struct group_dpif {
struct ovs_mutex stats_mutex;
uint64_t packet_count OVS_GUARDED; /* Number of packets received. */
uint64_t byte_count OVS_GUARDED; /* Number of bytes received. */
+
+ enum group_selection_method selection_method;
+ enum ovs_hash_alg hash_alg; /* dp_hash algorithm to be applied. */
+ uint32_t hash_basis; /* Basis for dp_hash. */
+ uint32_t hash_mask; /* Used to mask dp_hash (2^N - 1).*/
+ struct ofputil_bucket **hash_map; /* Map hash values to buckets. */
};

void group_dpif_credit_stats(struct group_dpif *,
@@ -137,6 +149,7 @@ void group_dpif_credit_stats(struct group_dpif *,
struct group_dpif *group_dpif_lookup(struct ofproto_dpif *,
uint32_t group_id, ovs_version_t version,
bool take_ref);
+

/* Backers.
*
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index fe42a57..6d87951 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -500,11 +500,12 @@ for d in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
done

-AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/dp_hash(.*\/0x1)/dp_hash(0xXXXX\/0x1)/' | sed 's/packets.*actions:1/actions:1/' | strip_ufid | strip_used | sort], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/dp_hash(.*\/0xf)/dp_hash(0xXXXX\/0xf)/' | sed 's/packets.*actions:1/actions:1/' | \
+ strip_ufid | strip_used | sort | uniq], [0], [dnl
flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.0.1,frag=no), packets:15, bytes:1590, used:0.0s, actions:hash(hash_l4(0)),recirc(0x1)
-recirc_id(0x1),dp_hash(0xXXXX/0x1),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:10
-recirc_id(0x1),dp_hash(0xXXXX/0x1),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:11
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.0.1,frag=no), packets:15, bytes:1590, used:0.0s, actions:hash(l4(0)),recirc(0x1)
+recirc_id(0x1),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:10
+recirc_id(0x1),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), actions:11
])

AT_CHECK([ovs-appctl revalidator/purge], [0])
@@ -516,10 +517,10 @@ for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
AT_CHECK([ovs-appctl netdev-dummy/receive p1 $pkt])
done

-AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/dp_hash(.*\/0x1)/dp_hash(0xXXXX\/0x1)/' | sed 's/\(actions:1\)[[01]]/\1X/' | strip_ufid | strip_used | sort], [0], [dnl
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/dp_hash(.*\/0xf)/dp_hash(0xXXXX\/0xf)/' | sed 's/\(actions:1\)[[01]]/\1X/' | strip_ufid | strip_used | sort], [0], [dnl
flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.0.1,frag=no), packets:15, bytes:1590, used:0.0s, actions:hash(hash_l4(0)),recirc(0x2)
-recirc_id(0x2),dp_hash(0xXXXX/0x1),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:15, bytes:1590, used:0.0s, actions:1X
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.0.1,frag=no), packets:15, bytes:1590, used:0.0s, actions:hash(l4(0)),recirc(0x2)
+recirc_id(0x2),dp_hash(0xXXXX/0xf),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:15, bytes:1590, used:0.0s, actions:1X
])

OVS_VSWITCHD_STOP
--
1.9.1
Ben Pfaff
2018-05-25 22:10:19 UTC
Post by Jan Scheurich
The current default OpenFlow select group implementation sends every new L4 flow
to the slow path for the balancing decision and installs a 5-tuple "miniflow"
in the datapath to forward subsequent packets of the connection accordingly.
Clearly this has major scalability issues with many parallel L4 flows and high
connection setup rates.
The dp_hash selection method for the OpenFlow select group was added to OVS
as an alternative. It avoids the scalability issues for the price of an
additional recirculation in the datapath. The dp_hash method is only available
to OF1.5 SDN controllers speaking the Netronome Group Mod extension to
configure the selection mechanism. This severely limited the applicability of
the dp_hash select group in the past.
Furthermore, testing revealed that the implemented dp_hash selection often
generated a very uneven distribution of flows over group buckets and didn't
consider bucket weights at all.
The present patch set in a first step improves the dp_hash selection method to
much more accurately distribute flows over weighted group buckets and to
apply a symmetric dp_hash function to maintain the symmetry property of the
legacy hash function. In a second step it makes the improved dp_hash method
the default in OVS for select groups that can be accurately handled by dp_hash.
That should be the vast majority of cases. Otherwise we fall back to the
legacy slow-path selection method.
The Netronome extension can still be used to override the default decision and
require the legacy slow-path or the dp_hash selection method.
Thanks a lot. I applied this series to master.
