# EAHub Matchmaking API specs

# APIs

General helpers. It is not necessary to call them manually. Some of them are used internally by other APIs:

- `ping_server() -> bool`
    - Checks the server uptime status.
    - The API operates synchronously (the signature is async for Python coroutine purposes, but the API itself is fast from the caller's perspective).
- `poll_job(model_name: str, job_id: str)`
    - Gets the job status.
    - The API operates synchronously (the signature is async for Python coroutine purposes, but the API itself is fast from the caller's perspective).
- `get_job_result(model_name: str, job_id: str)`
    - Waits for the job to complete and returns the result.
    - The API operates asynchronously and might take time to compute.

## Main APIs

#### indexing - configuration, initial indexing, and profile updates

- `configure_profile_field_relations(str_profiles_fields: List[str], tag_profiles_fields: List[str], relations_matrix: List[List[bool]])`
    - Provides a boolean matrix of profile field relations: which profile fields in one profile can be matched to which profile fields in the other profile. The matrix must be symmetric. In the future we might extend this API to accept weights in the range 0-1, but currently it accepts only boolean relations.
    - Calling this API is not mandatory, since a preconfigured default profile field relations matrix is present in the matchmaking service.
    - This API must be called if there are new profile fields that were not present before. If profile fields are added, then after calling this API and waiting for it to complete, the `reindex_all_profiles()` API needs to be called next.
    - The API operates asynchronously and might take time to compute.
```python=
from typing import List

def configure_profile_field_relations(
    str_profiles_fields: List[str],  # "id" field not listed here
    tag_profiles_fields: List[str],
    relations_matrix: List[List[bool]],  # must be symmetric
):
    pass

# Example relations_matrix value
relations_matrix = [
    [True, False, True],
    [False, False, False],
    [True, False, True],
]

{
    "offering": [],  # counts as inclusion and is mandatory for fields with no excluded pairs
    "looking_for": ["looking_for", "organisational_affiliations"],
    "summary": ["looking_for", "organisational_affiliations"]
}
```

- `configure_profile_field_relation_inclusions(str_profiles_fields: List[str], tag_profiles_fields: List[str], relations: Dict[str, List[str]])`
    - Includes the listed field pairs in matching; every field pair not mentioned is automatically excluded from matching.
    - Calling this API is not mandatory, since a preconfigured default profile field relations matrix is present in the matchmaking service.
    - This API must be called if there are new profile fields that were not present before. If profile fields are added, then after calling this API and waiting for it to complete, the `reindex_all_profiles()` API needs to be called next.
    - The API operates asynchronously and might take time to compute.

```python=
from typing import Dict, List

def configure_profile_field_relation_inclusions(
    str_profiles_fields: List[str],  # "id" field not listed here
    tag_profiles_fields: List[str],
    relations: Dict[str, List[str]],
):
    pass

# Example relations value
relations = {
    "offering": [],  # counts as exclusion from all matching
    "looking_for": ["offering", ...],
    "summary": ["summary", "offering", ...],
}
```

- `configure_profile_field_relation_exclusions(str_profiles_fields: List[str], tag_profiles_fields: List[str], relations: Dict[str, List[str]])`
    - Excludes the listed field pairs from matching; every field pair not mentioned is automatically included in matching.
    - Calling this API is not mandatory, since a preconfigured default profile field relations matrix is present in the matchmaking service.
    - This API must be called if there are new profile fields that were not present before.
      If profile fields are added, then after calling this API and waiting for it to complete, the `reindex_all_profiles()` API needs to be called next.
    - The API operates asynchronously and might take time to compute.

```python=
from typing import Dict, List

# `api` is assumed to be the service's API router object, defined elsewhere.
@api.post("configure-profile-field-relation-exclusions")
def configure_profile_field_relation_exclusions(
    str_profiles_fields: List[str],  # "id" field not listed here
    tag_profiles_fields: List[str],
    relations: Dict[str, List[str]],
):
    pass

# Example relations value
relations = {
    "offering": ["offering", ...],  # counts as exclusion
    "looking_for": ["looking_for", ...],
    "summary": ["giving_pledges", "organisational_affiliations", ...],
}
```

- `reindex_all_profiles(profiles_fields: List[Profile])`
    - Reindexes all profiles.
    - All published profiles must be sent to this API, including the ones that have opted out of matchmaking. This is because profiles that have opted out of matchmaking should still appear in the search results.
    - Each profile must contain a field with key "id", which must correspond to the profile id.
    - The API operates asynchronously and takes time to compute.

```python=
from typing import List, Optional

# `Schema` is the schema base class of the service's web framework (import not shown in the original).
class Profile(Schema):
    offering: Optional[str]
    looking_for: Optional[str]
    summary: Optional[str]
    tags_organisational_affiliations: List[str]
    # include all fields

profiles_fields = [
    {
        "id": "abc",
        "offering": "help",
        "summary": "i am an ea fan",
        "organisational_affiliations": [],  # list of strings: human readable tags, not slugs
        "x": ""  # ignored if not declared using configure_profile_field_relation*
    },
    {
        "id": "xyz",
        "offering": "help",
        "summary": "i am an ea fan",
        "organisational_affiliations": []  # list of strings
    }
    # keys should be contained in the configure_profile_field_relation* *_profiles_fields arguments
]
```

- `add_or_update_profile(profile_fields_dict: Profile)`
    - Updates one profile without the need to reindex all profiles.
    - In the initial implementation this operation might still take as long as reindexing all profiles, but it will be significantly faster later.
    - All published profiles must be sent to this API, including the ones that have opted out of matchmaking. This is because profiles that have opted out of matchmaking should still appear in the search results.
    - The profile_fields_dict must contain a field with key "id", which must correspond to the profile id.
    - The API operates synchronously, but when called in bulk it should not be called in too tight a loop, since it is still relatively compute intensive and would otherwise slow down any concurrent search queries.
- `delete_profile(profile_id: str)`
    - Deletes a profile from the list of indexed profiles without the need to reindex all profiles.
    - In the initial implementation this operation might still take as long as reindexing all profiles, but it will be significantly faster later.
    - The API operates synchronously, but when called in bulk it should not be called in too tight a loop, since it is still relatively compute intensive and would otherwise slow down any concurrent search queries.

#### search (conceptually using a "greedy optimisation" approach):

- `find_similar_profiles_by_text(search_text: str, num_results = 10, diversify_top_n: int = None, randomise_equal_results = True) -> Tuple[List[str], List[float], List[float]]`
    - Finds similar profiles using a semantic search, given a single text field, which is matched against all profile fields of the profiles in the index.
    - During this search, the profile field relations matrix IS NOT applied, since the input text is assumed to be a generic text field, not a particular type of profile field.
    - `diversify_top_n` (default = `num_results // 2`, maximum allowed value currently 10): tries to reduce the redundancy among the top_n results while maintaining the query relevance of these results. It selects the final results list according to a combined criterion of query relevance and novelty of information.
    - `randomise_equal_results`: some profiles can theoretically be equal in terms of query matching (especially if the value of `diversify_top_n` is small). If this parameter is `True`, each search may return the equal entries in a random order; the randomisation is done by the service.
    - The API operates synchronously (the signature is async for Python coroutine purposes, but the API itself is fast from the user's perspective).
    - Cannot be called until `reindex_all_profiles()` has completed.

##### Randomised (randomise_equal_results = True):

Try 1:
Search: AI Safety
- Profile A ("AI Safety")
- Profile B ("AI Safety")

Try 2:
Search: AI Safety
- Profile B ("AI Safety")
- Profile A ("AI Safety")

##### Non Randomised (randomise_equal_results = False):

Try 1:
Search: AI Safety
- Profile A ("AI Safety")
- Profile B ("AI Safety")

Try 2:
Search: AI Safety
- Profile A ("AI Safety")
- Profile B ("AI Safety")

---

- `find_similar_profiles_by_profile(profile_fields_dict: Dict[FieldName, str], num_results = 10, diversify_top_n: int = None, randomise_equal_results = True) -> Tuple[List[str], List[float], List[float]]`
    - Finds similar profiles using a semantic search, given the field data of a profile.
    - The profile whose field data is provided to this search DOES NOT NEED to be previously indexed. This function helps to search for similar profiles when a new profile has just been added/updated and the indexing of this new profile has not yet completed.
    - Therefore using this API is preferred / recommended over the `find_similar_profiles_by_id()` API below.
    - The profile_fields_dict must still contain a field with key "id", which must correspond to the profile id. This is necessary in order to avoid the profile itself appearing in the search results in case the profile already existed in the index.
    - During this search, the profile field relations matrix IS applied.
    - `diversify_top_n` (default = `num_results // 2`, maximum allowed value currently 10): tries to reduce the redundancy among the top_n results while maintaining the query relevance of these results. It selects the final results list according to a combined criterion of query relevance and novelty of information.
    - `randomise_equal_results`: some profiles can theoretically be equal in terms of query matching (especially if the value of `diversify_top_n` is small). If this parameter is `True`, each search may return the equal entries in a random order; the randomisation is done by the service.
    - The API operates synchronously (the signature is async for Python coroutine purposes, but the API itself is fast from the user's perspective).
    - Cannot be called until `reindex_all_profiles()` has completed.
- `find_similar_profiles_by_id(profile_id: str, num_results = 10, diversify_top_n: int = None, randomise_equal_results = True) -> Tuple[List[str], List[float], List[float]]`
    - Finds similar profiles using a semantic search, given the id of a previously indexed profile. The indexing of the given profile must be complete by the time this function is called, otherwise the results will be stale.
    - `find_similar_profiles_by_profile()` is preferred / recommended over the `find_similar_profiles_by_id()` API.
    - During this search, the profile field relations matrix IS applied.
    - `diversify_top_n` (default = `num_results // 2`, maximum allowed value currently 10): tries to reduce the redundancy among the top_n results while maintaining the query relevance of these results. It selects the final results list according to a combined criterion of query relevance and novelty of information.
    - `randomise_equal_results`: some profiles can theoretically be equal in terms of query matching (especially if the value of `diversify_top_n` is small). If this parameter is `True`, each search may return the equal entries in a random order; the randomisation is done by the service.
    - The API operates synchronously (the signature is async for Python coroutine purposes, but the API itself is fast from the user's perspective).
    - Cannot be called until `reindex_all_profiles()` has completed.

#### matchmaking (conceptually using a "global optimisation" approach)

- `compute_matches_for_date_range(max_matches_per_profile_dict: Dict[str, int], date_from: datetime, exclusive_date_to: datetime)`
    - Executes a global optimisation algorithm that generates a matchmaking plan for a planning period (for example, a month).
    - For each profile an integer parameter must be provided, which specifies how many matches this profile is willing to receive during the planning period.
    - If a profile has opted out of matchmaking, then the parameter value for this profile must be zero.
    - The planning period should be a longer time period (for example, a month) in order to get a more optimal schedule. This is because different people might prefer different numbers of introductions, and taking that into account requires the planning algorithm to have a broader view as well as the possibility to arrange the schedule over that longer time period.
    - Running the planning algorithm too frequently would result in less optimal matches or less fairness when considered over a longer time span.
    - During matchmaking, the profile field relations matrix IS applied.
    - The API operates asynchronously and takes time to compute.
    - Cannot be called until `reindex_all_profiles()` has completed.
- `get_scheduled_matches_for_date(date: datetime) -> List[Tuple[str, str, float, float]]`
    - Reads the generated matchmaking schedule for a particular date and returns the matches that were planned for this date.
    - The API operates synchronously (the signature is async for Python coroutine purposes, but the API itself is fast from the caller's perspective).
    - Cannot be called until `compute_matches_for_date_range()` has completed.

## Design proposal

See https://docs.google.com/spreadsheets/d/17fHdFypIaQmlUeiXPcMkOjH9VKvM38pr1ZBfHEMUIHI/edit?usp=sharing for a graphical view of the proposal.

- Personal profile view:
    - [Profile fields list]
    - Toggle button: Opt in to matchmaking service
    - Integer field: Number of matchmaking introductions I am willing to receive per month: [1-10]
    - Button: "Find other profiles semantically similar to my profile"
    - Results view: Top N similar profiles
    - Optional button: "Show more similar profiles"
- Profile view of some other user:
    - [Profile fields list]
    - Button: "Find other profiles semantically similar to this profile"
    - Results view: Top N similar profiles
    - Optional button: "Show more similar profiles"
- Profiles list:
    - [Profiles list]
    - Text search field: "Search text"
    - Button: "Find profiles semantically relating to given keywords..."
    - Results view: Top N found profiles
    - Optional button: "Show more search results"
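
#### Appendix: relations matrix sketch

To make the symmetry requirement of `configure_profile_field_relations` more concrete, here is a minimal sketch of how a caller might build a symmetric boolean matrix from an inclusions-style dict and check it before sending. The helper names `inclusions_to_matrix` and `is_symmetric` are hypothetical. The single combined `fields` list that fixes the row and column order, and the choice to mirror every listed pair, are also assumptions made for illustration; the service may order fields and treat one-sided entries differently.

```python
from typing import Dict, List


def inclusions_to_matrix(
    fields: List[str], relations: Dict[str, List[str]]
) -> List[List[bool]]:
    """Build a boolean relations matrix from an inclusions dict (hypothetical helper)."""
    index = {name: i for i, name in enumerate(fields)}
    # Every pair starts out excluded; only listed pairs are switched on.
    matrix = [[False] * len(fields) for _ in fields]
    for field, partners in relations.items():
        for partner in partners:
            i, j = index[field], index[partner]
            # Assumption: set both directions so the matrix stays symmetric.
            matrix[i][j] = True
            matrix[j][i] = True
    return matrix


def is_symmetric(matrix: List[List[bool]]) -> bool:
    """Return True if the matrix is square and equal to its transpose."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(i))


fields = ["offering", "looking_for", "summary"]
relations = {
    "offering": ["looking_for"],  # listed one way only; mirrored by the helper
    "looking_for": [],
    "summary": ["summary"],
}
matrix = inclusions_to_matrix(fields, relations)
```

A caller could run `is_symmetric(matrix)` as a local sanity check before passing the matrix as `relations_matrix`.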