get_metrics: rewrite in Python, refactor output a bit

The script has gotten slow because of the test authors check.
It feels like the number of test files (i.e. the path arguments to
git shortlog) causes the time to explode.

In commit 82a75278c4c6 ("get_metrics.sh: use git-grep to cut runtime by
>70%"), I reduced the overall script runtime to 9s. Now it's more
like 20s :(

We'll probably need to do some more clever tricks to make this more
bearable as KUnit grows, so let's use a real programming language.

This change also:
* changes the output order so the slow test authors result prints last
* excludes internal contributors from the contributor count, not just
  from the patch count
* no longer assumes the script itself lives inside the Linux tree
  (you're probably running it from your checkout of this repo)
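
The contributor-count fix can be sketched as below. This is only an
illustration, not the code in the patch: `count_outside`, the shortened
`TEAM_AUTHORS` set, and the sample author names are made up for the example.
The input mimics what `git log --format=%an` prints, one author per commit.

```python
from typing import Iterable, Tuple

# Hypothetical, shortened stand-in for the script's team list.
TEAM_AUTHORS = frozenset(['Brendan Higgins', 'David Gow'])

def count_outside(authors: Iterable[str]) -> Tuple[int, int]:
    """Returns (unique outside contributors, patches from outside contributors)."""
    contributors = set()
    patches = 0
    for author in authors:
        author = author.strip()
        if author in TEAM_AUTHORS:
            continue  # team members count toward neither number
        contributors.add(author)
        patches += 1
    return len(contributors), patches

# One line per commit, newline-terminated like lines read from a pipe.
sample = ['David Gow\n', 'Alice Example\n', 'Bob Example\n', 'Alice Example\n']
print(count_outside(sample))  # (2, 3)
```

Filtering once and deriving both numbers from the same loop keeps the two
metrics consistent with each other.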

Before:
$ <path>/get_metrics.sh | ts -s '%M:%.S'
00:17.747701 Unique contributors:       35
00:17.747838 Number of patches: 73
00:17.747866 Number of total tests:     55
00:17.747879 Number of test authors:    93
00:17.747891 Number of test cases:      494

After:
$ python3 -u <path>/get_metrics.py | ts -s '%M:%.S'
00:04.278519 Number of test cases: 494
00:04.278704 Number of patches: 73
00:04.278722 Unique contributors: 30
00:04.278733 Number of total tests: 55
00:15.614618 Number of test authors: 93

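I.e. we get most of the results within ~4s, and the total runtime drops
from ~17.7s to ~15.6s, a small speed benefit from doing our output
processing in Python. The unique contributor count drops from 35 to 30
because internal contributors are now excluded.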
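
The timestamps above rely on `python3 -u`, which makes stdout unbuffered so
that `ts` sees each line as soon as it is printed. By default Python buffers
stdout in blocks when it is a pipe. A rough alternative, assuming one would
rather not depend on `-u`, is to flush after each result. The `report` helper
and the constant-returning lambdas below are hypothetical placeholders, not
part of the patch.

```python
import io
from typing import Callable, Iterable, TextIO, Tuple

def report(steps: Iterable[Tuple[str, Callable[[], int]]], out: TextIO) -> None:
    """Computes each metric in order and prints it as soon as it is ready."""
    for label, compute in steps:
        value = compute()
        # flush=True pushes the line out immediately, even when `out` is a pipe.
        print(f'{label}: {value}', file=out, flush=True)

# Placeholder metrics: cheap ones first, the slow one last.
steps = [
    ('Number of test cases', lambda: 494),
    ('Number of test authors', lambda: 93),
]
buf = io.StringIO()
report(steps, buf)
```

In the real script `out` would be `sys.stdout`; a `StringIO` is used here only
so the sketch runs without a git checkout.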

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Change-Id: Iffb2baf2294bc4f56e3739b50e1ac4ea45a13e6b
diff --git a/get_metrics.py b/get_metrics.py
new file mode 100644
index 0000000..4f6fbb2
--- /dev/null
+++ b/get_metrics.py
@@ -0,0 +1,69 @@
+import argparse
+import subprocess
+from typing import Iterable, List, Tuple
+
+_TEAM_AUTHORS = frozenset([
+    'Brendan Higgins', 'David Gow', 'Heidi Fahim', 'Felix Guo',
+    'Avinash Kondareddy', 'Daniel Latypov'
+])
+
+# Manual list of files to exclude in `_test_names`.
+_NOT_TESTS = frozenset(['lib/kunit/test.c'])
+
+def _outside_contributors_and_patches() -> Tuple[int, int]:
+  cmd = ['git', 'log', '--format=%an', '--', 'include/kunit', 'lib/kunit', 'tools/testing/kunit']
+  contributors = set()
+  patches = 0
+  with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as p:
+    for author in map(str.strip, p.stdout):  # drop the trailing newline once
+      if author not in _TEAM_AUTHORS:
+        contributors.add(author)
+        patches += 1
+  return len(contributors), patches
+
+def _is_test_file(filename: str) -> bool:
+  if filename.startswith('Documentation/') or filename in _NOT_TESTS:
+    return False
+  return 'test' in filename or filename.endswith('_kunit.c')
+
+def _test_names() -> List[str]:
+  cmd = ['git', 'grep', '-l', '-e', '#include <kunit/test.h>']
+  with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as p:
+    return list(filter(_is_test_file, (line.strip() for line in p.stdout)))
+
+def _test_authors(test_names: List[str]) -> List[str]:
+  cmd = ['git', '--no-pager', 'shortlog', '--no-merges', '-n', '-s', '--'] + test_names
+  return subprocess.check_output(cmd, text=True).strip().split('\n')
+
+def _num_test_cases() -> int:
+  cmd = ['git', 'grep', 'KUNIT_CASE']
+  count = 0
+  with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as p:
+    for line in p.stdout:
+      line = line.strip()
+      if line.startswith('Documentation/') or line.startswith('include/kunit/test.h'):
+        continue
+      count += 1
+    return count
+
+
+parser = argparse.ArgumentParser(description='Print KUnit usage metrics; run from inside a Linux kernel checkout.')
+parser.add_argument('-v', '--verbose', action='store_true')
+
+args = parser.parse_args()
+
+num_test_cases = _num_test_cases()
+test_names = _test_names()
+contributors, patches = _outside_contributors_and_patches()
+
+print(f'Number of test cases: {num_test_cases}')
+print(f'Number of patches: {patches}')
+print(f'Unique contributors: {contributors}')
+print(f'Number of total tests: {len(test_names)}')
+
+test_authors = _test_authors(test_names)  # this step takes ~15s as of Aug 2022
+print(f'Number of test authors: {len(test_authors)}')
+
+if args.verbose:
+  print('All tests: ' + '\n'.join(test_names))
+  print('Test authors: ' + '\n'.join(test_authors))
diff --git a/get_metrics.sh b/get_metrics.sh
deleted file mode 100755
index 6a75d0c..0000000
--- a/get_metrics.sh
+++ /dev/null
@@ -1,36 +0,0 @@
-#!/bin/bash
-
-VERBOSE=
-[[ $1 == "-v" ]] && VERBOSE=y
-
-unique_contributors_num=$(git --no-pager shortlog --no-merges -n -s -- include/kunit lib/kunit tools/testing/kunit/ | wc -l)
-patches_not_from_team_num=$(git log --pretty=oneline --perl-regexp \
-  --author='^((?!Brendan Higgins|David Gow|Heidi Fahim|Felix Guo|Avinash Kondareddy|Daniel Latypov).*)$' \
-  -- include/kunit lib/kunit tools/testing/kunit/ | wc -l)
-
-# Manual list of files to exclude in `all_tests`.
-not_tests='lib/kunit/test.c
-'
-
-all_tests=$(for file_name in $(git grep -l -e '#include <kunit/test.h>' | grep -E 'test|_kunit\.c' | grep -v '^Documentation')
-do
-  if [[ ! $not_tests =~ (^|[[:space:]])"$file_name"($|[[:space:]]) ]] ; then
-    printf "$file_name\n"
-  fi
-done)
-
-tests_total_num=$(echo "$all_tests" | wc -l)
-test_authors=$(git --no-pager shortlog --no-merges -n -s -- $all_tests)
-test_author_num=$(echo "$test_authors" | wc -l)
-test_case_num=$(git grep 'KUNIT_CASE' | grep -Ev '^Documentation/|get_metrics.sh|include/kunit/test.h' | wc -l)
-
-
-if [[ -n $VERBOSE ]]; then
-  printf "All tests:\t$all_tests\n"
-  printf "Test authors:\t$test_authors\n"
-fi
-printf "Unique contributors:\t$unique_contributors_num\n"
-printf "Number of patches:\t$patches_not_from_team_num\n"
-printf "Number of total tests:\t$tests_total_num\n"
-printf "Number of test authors:\t$test_author_num\n"
-printf "Number of test cases:\t$test_case_num\n"