# Location Search Engine Optimization for Ride-Sharing Services: Database Expansion, Fuzzy Search, and Self-Improving Systems

Hi there! Welcome to my first technical article! I'm Aaron, currently a backend development intern at Ridey, where we're building a systematic ride-sharing platform in Taiwan. I write technical articles to document my thought process and implementation logic as I complete tasks. Learning by sharing!

## Chapter 1: System Architecture Overview

### 1.1 Product Requirements and Expectations

Searching for pickup and drop-off locations is a crucial feature for a ride-sharing company: a smooth UX here can even raise the conversion rate for new orders. A capable location search engine is therefore a direct showcase of what Ridey's ride-sharing service can do.

We expect this location search engine to:

- Predict user input to display available locations for our service
- Prioritize transportation hubs (subway or train stations) as popular pickup/drop-off locations
- Support synonyms and alternative terms, such as "臺北車站", "台北車站", or "北車" (Taipei Station variations)
- Allow word order variations, such as "中山捷運" or "捷運中山" (Zhongshan MRT variations)
- Enable the system to gather feedback from actual user operations and gradually optimize

### 1.2 Current System Overview

For location search, Google Maps already provides a [Text Search API](https://developers.google.com/maps/documentation/places/web-service/text-search?hl=zh-tw#text-search-requests) that can be used directly. However, relying directly on a third-party service like this raises several issues:

1. **Cost**: As our user base grows, we would likely spend a considerable amount on this API every month.
2. **Order Management**: If location data doesn't live in our own database, developing and maintaining order-related features becomes more complex.
3. **System Performance**: Making a network request to a third-party service on every search is slow and hurts the UX.

Building our own database addresses all three issues, so it is the more reasonable solution. Ridey's backend uses MongoDB with the Mongoose ODM for this self-maintained database.

To build the database, we still rely on Google Maps' [Text Search API](https://developers.google.com/maps/documentation/places/web-service/text-search?hl=zh-tw#text-search-requests) and [Nearby Search API](https://developers.google.com/maps/documentation/places/web-service/nearby-search?hl=zh-tw#SearchNearbyRequests). This incurs some cost, but a one-time expense is clearly more reasonable than sending every user request to a third-party service. The database currently holds over 4,000 location records covering all of Taiwan.

As for searching the database, we simply run two regular expressions, one against location names and one against addresses. Here's a simplified version of the code:

```javascript=
// Escape regex metacharacters so input such as "(" is matched literally
// instead of producing an invalid pattern
const escapeRegex = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const placeSearch = async (place) => {
  const pattern = escapeRegex(place);

  // Search location names
  const resultsFromDisplayedName = await Place.find({
    name: { $regex: pattern, $options: 'i' },
  }).limit(10);

  // Search addresses
  const resultsFromAddress = await Place.find({
    address: { $regex: pattern, $options: 'i' },
  }).limit(10);

  // Merge into one array and deduplicate by _id: each query returns its own
  // document objects, so a reference check such as indexOf misses duplicates
  const results = [...resultsFromDisplayedName, ...resultsFromAddress];
  const seen = new Set();
  return results.filter((result) => {
    const id = result._id.toString();
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });
};
```

This already meets part of our expectations for a location search engine, and it has been running in production for some time.
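Deduplicating with `indexOf` compares object references. Each `Place.find()` call builds its own document objects, so a place returned by both the name query and the address query shows up as two different objects and survives an `indexOf` check. The database-free sketch below shows the difference; the plain objects stand in for Mongoose documents, and `dedupeById` is a hypothetical helper name.

```javascript=
// Plain objects standing in for documents returned by two separate queries.
// The same place ('a1') appears in both lists, but as two distinct objects.
const fromName = [{ _id: 'a1', name: '臺北車站' }, { _id: 'b2', name: '北門' }];
const fromAddress = [{ _id: 'a1', name: '臺北車站' }];

const merged = [...fromName, ...fromAddress];

// Reference-based deduplication: keeps both copies of 'a1',
// because indexOf compares object identity, not content.
const byReference = merged.filter((item, index) => merged.indexOf(item) === index);

// Key-based deduplication: keeps the first occurrence of each _id.
// String() also normalizes Mongoose ObjectIds, which are objects themselves.
const dedupeById = (items) => {
  const seen = new Set();
  return items.filter((item) => {
    const key = String(item._id);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
};

const byId = dedupeById(merged);

console.log(byReference.length); // 3
console.log(byId.length); // 2
```

Keying on `_id` also preserves the original order, so name matches still come before address matches.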
What still needs work is the UX, and that is the main focus of my work this time. So far, we have achieved:

- [x] Predict user input to display available locations for our service
- [ ] Prioritize transportation hubs (subway or train stations) as popular pickup/drop-off locations
- [ ] Support synonyms and alternative terms, such as "臺北車站", "台北車站", or "北車"
- [ ] Allow word order variations, such as "中山捷運" or "捷運中山"
- [ ] Enable the system to gather feedback from actual user operations and gradually optimize

## Chapter 2: Dataset Expansion

### 2.1 The Story Behind 20,000 Records

Because ride-sharing uses cars rather than large-scale public transit, pickup and drop-off points are much more flexible and don't have to be transportation hubs. The 4,000+ location records currently in the database can therefore be expanded further. In this project, we plan to expand the database to 20,000 records in one go, giving users more diverse and flexible pickup/drop-off options while ensuring convenient communication and accurate fare calculation.

The expansion method is worth a closer look. We first defined a `taiwanLocations.js` file listing Taiwan's cities and districts along with their coordinates.

##### `taiwanLocations.js`

```javascript=
module.exports = [
  // North - Taipei City (region '北' = North)
  { region: '北', city: 'TPE', district: '中正區', lat: 25.0324, lng: 121.5201 },
  { region: '北', city: 'TPE', district: '南港區', lat: 25.0550, lng: 121.6070 },
  { region: '北', city: 'TPE', district: '內湖區', lat: 25.0835, lng: 121.5923 },
  // ...
  // North - New Taipei City
  // North - Keelung City
  // More counties and cities...
];
```

When fetching locations from the Google Maps API, we also tailored the place types to ride-sharing, defining them in `placeTypes.js`.
##### `placeTypes.js`

```javascript=
module.exports = {
  batches: [
    // Batch 1: Transportation-related
    ['subway_station', 'train_station', 'bus_station', 'airport', 'parking', 'gas_station', 'taxi_stand', 'light_rail_station'],
    // Batch 2: Educational institutions
    ['university', 'secondary_school', 'primary_school', 'school', 'library'],
    // Batch 3: Dining and accommodation
    // More batches...
  ],
  // ...
};
```

With these in place, we could start on the main expansion logic.

##### `expandDatabase` Main Function

```javascript=
/**
 * @param {number} targetNewPlaces - Target number of new places to add
 * @returns {Promise<object>} Expansion results
 */
const expandDatabase = async (targetNewPlaces) => {
  console.log(`Starting database expansion, target new additions: ${targetNewPlaces} records`);

  const allProcessedPlaceIds = new Set();
  const processedLocations = [];
  let totalAdded = 0;

  // Collect existing placeIds to avoid adding duplicates
  const existingPlaces = await Place.find({}, { placeId: 1 });
  const existingPlaceIds = new Set(existingPlaces.map((place) => place.placeId));
  console.log(`${existingPlaceIds.size} records already exist in database`);

  // Process each district
  for (const location of taiwanLocations) {
    if (totalAdded >= targetNewPlaces) {
      console.log(`Target reached: ${totalAdded}/${targetNewPlaces}`);
      break;
    }

    console.log(`Processing: ${location.city} ${location.district}`);
    const locationStartCount = allProcessedPlaceIds.size;

    try {
      // Search with every combination of type batch and radius
      for (const typeBatch of placeTypes.batches) {
        for (const radius of [2000, 4000, 6000]) {
          if (totalAdded >= targetNewPlaces) break;

          // Nearby Search API
          const places = await fetchPlaceByCoordinatesWithRadius(
            location.lat,
            location.lng,
            radius,
            typeBatch,
            20 // Number of search results per request
          );

          if (!places || places.length === 0) continue;

          // Format data to match the DB schema, then keep only unseen places
          const formattedPlaces = formatPlaceData(places);
          const newPlaces = formattedPlaces.filter((place) =>
            !existingPlaceIds.has(place.placeId) &&
            !allProcessedPlaceIds.has(place.placeId)
          );

          if (newPlaces.length > 0) {
            // Insert into the database in batches of 100 (processed in parallel)
            await createPlaceDataInBatches(newPlaces, 100);

            // Update the known placeId sets
            newPlaces.forEach((place) => {
              allProcessedPlaceIds.add(place.placeId);
              existingPlaceIds.add(place.placeId);
            });

            totalAdded += newPlaces.length;
            console.log(`${location.city} ${location.district}: Added ${newPlaces.length} records (radius: ${radius}m, types: ${typeBatch.join(',')})`);
          }

          // Throttle requests to stay under the API rate limit
          await new Promise((resolve) => setTimeout(resolve, 1000));
        }
      }

      const locationAddedCount = allProcessedPlaceIds.size - locationStartCount;
      processedLocations.push({ ...location, addedCount: locationAddedCount });

      console.log(`Completed ${location.city} ${location.district}: Added ${locationAddedCount} records total`);
      console.log(`Overall progress: ${totalAdded}/${targetNewPlaces} (${(totalAdded / targetNewPlaces * 100).toFixed(2)}%)`);

      // Pause between districts as an extra buffer
      await new Promise((resolve) => setTimeout(resolve, 2000));
    } catch (error) {
      console.error(`Error occurred while processing ${location.city} ${location.district}:`, error);
      // Continue with the next district
    }
  }

  console.log(`Database expansion completed: Added ${totalAdded} records total`);
  return { addedCount: totalAdded, processedLocations };
};
```

So far, we've only been improving existing functionality (displaying available service locations). But the expansion phase is also a chance to go further and prioritize transportation hubs (subway or train stations) as popular pickup/drop-off locations.

### 2.2 Optimizing the Sorting Algorithm

There are many ways to achieve this, such as caching or data analysis.
However, I chose a relatively straightforward approach: adding a `weight` field to the database schema that represents how important each location is to our service. At query time, results are sorted by `weight` in descending order, so this single metric is enough for the search algorithm to prioritize transportation hubs. For example, when users search for "Ximending," the results should lead with "Ximending MRT Station" or "Ximending" itself rather than an irrelevant place like "Ximending Professional Nail Salon."

The weight rules are also defined in `placeTypes.js`.

##### `placeTypes.js`

```javascript=
module.exports = {
  // ...
  weights: {
    // Transportation hubs (9-10 points)
    'subway_station': 10,
    'train_station': 10,
    'airport': 10,
    // ...
    // Commercial & Shopping (7-9 points)
    // Educational institutions (5-8 points)
    // ...
  },
  // Default weight for types not listed above
  defaultWeight: 2,
};
```

##### `updateExistingPlacesWeight` Main Function

```javascript=
// A place's weight is the highest weight among its types (or the default)
const assignWeightByPlaceTypes = (types) => {
  if (!types || types.length === 0) return placeTypes.defaultWeight;

  let maxWeight = placeTypes.defaultWeight;
  for (const type of types) {
    const typeWeight = placeTypes.weights[type] ?? placeTypes.defaultWeight;
    if (typeWeight > maxWeight) maxWeight = typeWeight;
  }
  return maxWeight;
};

const updateExistingPlacesWeight = async () => {
  console.log('Starting to update existing place weights');

  const batchSize = 500;
  let processedCount = 0;
  const totalCount = await Place.countDocuments();
  let batch = [];

  // Stream documents through a cursor instead of loading them all at once
  for await (const place of Place.find().cursor()) {
    batch.push({
      updateOne: {
        filter: { _id: place._id },
        update: { $set: { weight: assignWeightByPlaceTypes(place.types) } },
      },
    });

    if (batch.length >= batchSize) {
      await Place.bulkWrite(batch);
      processedCount += batch.length;
      console.log(`Updated ${processedCount}/${totalCount} records (${(processedCount / totalCount * 100).toFixed(2)}%)`);
      batch = [];

      // Brief pause so the database isn't hit too frequently
      await new Promise((resolve) => setTimeout(resolve, 100));
    }
  }

  // Write any remaining updates
  if (batch.length > 0) {
    await Place.bulkWrite(batch);
    processedCount += batch.length;
  }

  console.log(`Weight update completed: Processed ${processedCount} records total`);
  return { updatedCount: processedCount };
};
```

With that, the first two items on our to-do list are done:

- [x] Predict user input to display available locations for our service
- [x] Prioritize transportation hubs (subway or train stations) as popular pickup/drop-off locations
- [ ] Support synonyms and alternative terms, such as "臺北車站", "台北車站", or "北車"
- [ ] Allow word order variations, such as "中山捷運" or "捷運中山"
- [ ] Enable the system to gather feedback from actual user operations and gradually optimize

## Chapter 3: Fuzzy Search

This chapter tackles the third and fourth items on the to-do list:

- [ ] Support synonyms and alternative terms, such as "臺北車站", "台北車站", or "北車"
- [ ] Allow word order variations, such as "中山捷運" or "捷運中山"

Together, these features are usually called fuzzy search. The most intuitive solution would be Elasticsearch, but its support for Traditional Chinese falls short. I recommend the article "[GuGu's Backend Notes - What is Elasticsearch? Understanding the World's Most Powerful Full-Text Search Tool!](https://kucw.io/blog/elasticsearch-intro/#%E8%A3%9C%E5%85%85%E5%8F%8D%E5%90%91%E7%B4%A2%E5%BC%95%E7%9A%84%E5%AF%A6%E6%88%B0%E7%B6%93%E9%A9%97%E5%88%86%E4%BA%AB)"; its hands-on experience matches my own findings from experimenting with MongoDB geoSearch, so I decided against this intuitive but, for our needs, ineffective option.

Instead, I took a rather unconventional approach: using Google Gemini AI to generate a "synonyms" field for places in the location database.
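To make the idea concrete, here is a minimal in-memory sketch of why a synonyms array helps. A case-insensitive regex is tested against the name and against every synonym, and a place matches if any of them contains the query. The two sample documents and their synonyms are hypothetical, written by hand for illustration rather than generated by Gemini, and the function names are illustrative only.

```javascript=
// Hypothetical place documents with hand-written synonyms; in production,
// the synonyms field would be filled in by Gemini.
const places = [
  { name: '臺北車站', synonyms: ['台北車站', '北車', '台北火車站'] },
  { name: '中山捷運站', synonyms: ['捷運中山站', '中山站'] },
];

// Escape regex metacharacters so the query is matched literally.
const escapeRegex = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Name-only matching, like the original search (address omitted here).
const searchByName = (query) => {
  const re = new RegExp(escapeRegex(query), 'i');
  return places.filter((place) => re.test(place.name)).map((place) => place.name);
};

// Name-or-synonym matching. MongoDB applies $regex to each element of an
// array field, so { synonyms: { $regex: ... } } matches if any synonym does.
const searchWithSynonyms = (query) => {
  const re = new RegExp(escapeRegex(query), 'i');
  return places
    .filter((place) => re.test(place.name) || place.synonyms.some((s) => re.test(s)))
    .map((place) => place.name);
};

console.log(searchByName('台北車站'));       // []
console.log(searchWithSynonyms('台北車站')); // [ '臺北車站' ]
console.log(searchByName('捷運中山'));       // []
console.log(searchWithSynonyms('捷運中山')); // [ '中山捷運站' ]
```

Note that "台" and "臺" are different characters, so without the synonyms field even a one-character variant misses entirely.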
With a synonyms field in place, word order, segmentation, synonyms, alternative terms, and the abbreviations Taiwanese users commonly type can all be handled at once. As long as the database holds enough high-quality data to anticipate user input, plain regular expressions are enough to fetch the matching locations!

##### `addSynonymsToExistingPlaces` Main Function

```javascript=
const { GoogleGenAI } = require('@google/genai');

const addSynonymsToExistingPlaces = async () => {
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  console.log('Starting to add synonyms field to existing places...');

  try {
    // Initialize the synonyms field
    await Place.updateMany(
      { synonyms: { $exists: false } },
      { $set: { synonyms: [] } }
    );

    // Not all places need synonyms
    const places = await Place.find({ types: { $in: placeTypes.typesWithSynonyms } });
    console.log('Total', places.length, 'records need processing');

    const BATCH_SIZE = 10;
    let processedCount = 0;

    // Use Gemini AI to generate synonyms, BATCH_SIZE places at a time in parallel
    for (let i = 0; i < places.length; i += BATCH_SIZE) {
      const batch = places.slice(i, i + BATCH_SIZE);

      await Promise.all(batch.map(async (place) => {
        try {
          const response = await ai.models.generateContent({
            model: 'gemini-2.0-flash-lite',
            contents: `List the synonyms for ${place.name} in #zh-TW and use the JSON Schema: { synonyms: [string] }. Do not respond with anything except the JSON.`,
          });

          // Strip any Markdown code fences the model wraps around the JSON
          const cleanedText = response.text.replace(/```(?:json)?/g, '').trim();
          const { synonyms } = JSON.parse(cleanedText);

          place.synonyms = Array.isArray(synonyms) ? synonyms : [];
          await place.save();
        } catch (error) {
          console.error(`Error occurred while processing place ${place.name}:`, error);
        }
      }));

      processedCount += batch.length;
      const percentage = (processedCount / places.length * 100).toFixed(1);
      console.log(`Processed ${percentage}% of data (${processedCount}/${places.length})`);
    }

    return { success: true, message: 'Adding synonyms field completed', places: places.length };
  } catch (error) {
    console.error('Error occurred while adding synonyms field:', error);
    return { success: false, message: 'Adding synonyms field failed', error: error.message };
  }
};
```

Now only the last item on our to-do list remains:

- [x] Predict user input to display available locations for our service
- [x] Prioritize transportation hubs (subway or train stations) as popular pickup/drop-off locations
- [x] Support synonyms and alternative terms, such as "臺北車站", "台北車站", or "北車"
- [x] Allow word order variations, such as "中山捷運" or "捷運中山"
- [ ] Enable the system to gather feedback from actual user operations and gradually optimize

## Chapter 4: User-Driven Self-Improving System

Although we always aim to design systems as thoroughly as possible, user behavior is ultimately unpredictable. When users search for relatively obscure locations or rare names for places, there will inevitably be cases where the database lacks the data or the regular expressions find nothing.

To avoid the terrible experience of returning an empty result, we fall back to the Text Search API as a last resort. Crucially, we also save what it returns, so as users keep using the system, the database grows larger and more complete, forming a user-driven self-improving system.
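Stripped of MongoDB and the Google client, this mechanism is essentially a read-through cache that persists whatever the fallback returns. Below is a minimal in-memory sketch of that pattern under stated assumptions: a hypothetical `PlaceStore` class stands in for the `Place` model, and a fake function stands in for the Text Search API. `searchWithFallback` and `makeFakeTextSearch` are illustrative names, not part of Ridey's codebase.

```javascript=
// Hypothetical in-memory stand-in for the Place collection.
class PlaceStore {
  constructor() {
    this.places = new Map(); // placeId -> place
  }

  // Case-insensitive substring match on the name, standing in for the regex queries.
  find(query) {
    const q = query.toLowerCase();
    return [...this.places.values()].filter((place) => place.name.toLowerCase().includes(q));
  }

  save(place) {
    this.places.set(place.placeId, place);
  }
}

// Read-through lookup: try the local store first; on a miss, ask the remote
// source, persist what it returns, and serve it. Later identical queries are
// answered locally without another remote call.
async function searchWithFallback(store, query, fetchRemote) {
  const local = store.find(query);
  if (local.length > 0) return local;

  try {
    const remote = await fetchRemote(query);
    remote.forEach((place) => store.save(place));
    return remote;
  } catch (error) {
    console.error(error);
    return [];
  }
}

// Hypothetical fake for the Text Search API: counts its calls and returns
// one place named after the query.
const makeFakeTextSearch = () => {
  const fake = async (query) => {
    fake.calls += 1;
    return [{ placeId: `demo-${fake.calls}`, name: query }];
  };
  fake.calls = 0;
  return fake;
};

// Usage: the first search misses and calls the fake API; the second is a local hit.
(async () => {
  const store = new PlaceStore();
  const textSearch = makeFakeTextSearch();
  await searchWithFallback(store, '國父紀念館', textSearch);
  await searchWithFallback(store, '國父紀念館', textSearch);
  console.log(textSearch.calls); // 1
})();
```

The remote call only happens on a miss, and every miss leaves the store a little more complete, which is what lets the database improve with usage.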
The updated `placeSearch` puts all of this together: it queries names, addresses, and synonyms in parallel, sorts each query's matches by weight, and falls back to the Text Search API when nothing matches.

```javascript=
// Escape regex metacharacters so user input is matched literally
const escapeRegex = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const placeSearch = async (place) => {
  const pattern = escapeRegex(place);
  const searchField = (field) =>
    Place.find({ [field]: { $regex: pattern, $options: 'i' } })
      .sort({ weight: -1 })
      .limit(10);

  // Query names, addresses, and synonyms in parallel
  const [resultsFromDisplayedName, resultsFromAddress, resultsFromSynonyms] = await Promise.all([
    searchField('name'),
    searchField('address'),
    searchField('synonyms'),
  ]);

  const results = [...resultsFromDisplayedName, ...resultsFromAddress, ...resultsFromSynonyms];

  // The regexes found nothing, or the database lacks this place
  if (results.length === 0) {
    try {
      // Request the Text Search API and save the result, so the next
      // search for this place is served from our own database
      const responsePlace = await fetchPlaceByQuery(place);
      const formattedPlace = await formatPlaceDataByQuery(responsePlace);
      await savePlaceByQuery(formattedPlace);
      return formattedPlace;
    } catch (error) {
      console.error(error);
      return [];
    }
  }

  // Deduplicate by _id (each query returns its own document objects)
  const seen = new Set();
  return results.filter((result) => {
    const id = result._id.toString();
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });
};
```

With that, the to-do list is complete:

- [x] Predict user input to display available locations for our service
- [x] Prioritize transportation hubs (subway or train stations) as popular pickup/drop-off locations
- [x] Support synonyms and alternative terms, such as "臺北車站", "台北車站", or "北車"
- [x] Allow word order variations, such as "中山捷運" or "捷運中山"
- [x] Enable the system to gather feedback from actual user operations and gradually optimize

## Chapter 5: Conclusions and Results

For example, if you now search for "北市大博愛", the system matches "臺==北市==立==大==學==博愛==校區" (University of Taipei, Boai Campus) and ranks it as the first result. Quite a nice result!

### 5.1 System Optimization Results

This round of optimization brought significant improvements to Ridey's location search engine:

1. **Database Expansion**: Location data grew from the original 4,000+ records to 20,000, dramatically widening coverage. Users can now search for far more diverse pickup and drop-off locations.
2. **Location Weight System**: The weight mechanism lets the system rank important locations such as transportation hubs first, so results better match what users actually need.
3. **Fuzzy Search Capability**: Synonyms generated with Google Gemini AI resolve synonyms, alternative terms, and word order variations. Users can now type "北車" to find "臺北車站" (Taipei Station) or "捷運中山" to find "中山捷運站" (Zhongshan MRT Station), a significant UX improvement.
4. **Self-Improving Mechanism**: When a search finds nothing locally, the system fetches the place from the Google Maps API and stores it, so the database becomes more complete as usage grows.

### 5.2 Actual Search Effect Testing

| Search keyword | Before optimization | After optimization |
| -------------- | ------------------- | ------------------ |
| 松菸 | No results | 松山文創園區服務中心 (Songshan Cultural and Creative Park Service Center) |
| 臺大 | 新竹臺大分院 (NTU Hospital Hsinchu Branch) | 國立臺灣大學 (National Taiwan University) |
| 國館 | No results | 國父紀念館 (Sun Yat-sen Memorial Hall) |

### 5.3 Future Optimization Directions

Although the current system already performs well, there are still a few directions worth exploring:

1. **Geographic Location Priority**: Use the user's current location to rank nearby places higher, further improving search relevance.
2. **Automatic Synonym Updates**: Periodically use AI to refresh location synonyms so they keep up with emerging naming conventions.

### 5.4 Summary

This location search engine optimization was an important project I was responsible for during my internship at Ridey. Working through database expansion, the weight system, fuzzy search, and the self-improving mechanism gave me a deep understanding of how search engines work and how to optimize them.
What I find particularly noteworthy is that we combined traditional database technology with modern AI tools. This hybrid approach keeps the system fast while making search much smarter. Compared with relying purely on Elasticsearch or other full-text search solutions, it has proven better suited to location search in Traditional Chinese, at least for our use case.

As Ridey's user base grows, this self-improving search system will keep getting smarter, giving users a smoother, more intuitive ride-sharing experience and, in turn, raising platform usage and order conversion rates.

Finally, I want to sincerely thank Ridey's CTO [Yi-Tse Shih](https://www.linkedin.com/in/yitse-shih/) for guiding me throughout the development process. I hope this article helps other developers facing similar challenges. If you have any questions or suggestions, feel free to leave a comment!

Thank you for reading, see you next time!