Do you have an ever-growing product catalog whose products need to be searched as free text, filtered, and sorted quickly, so that customers visiting your site get a seamless and easy search experience?
Well, when customers shop online, they want to zero in on the product they need as quickly as possible.
Time is valuable!
If they do not find the right product quickly enough, they do not hesitate to move on to the next best eCommerce store; switching is that easy. Making the buying process simple, holding onto visitors, and converting them into customers is always a challenge.
One such customer came to us to develop an eCommerce web application with a scalable search solution, and this is how we solved their problem.
Here is the step-by-step process we followed:
Stick to an RDBMS?
Our initial thought was to use a relational database to store the product data and build queries to filter it. But with relational databases, writing dynamic queries and getting optimum performance out of them is difficult and time-consuming. Also, with the product catalog growing regularly, foreseeing the ever-growing list of properties being added to each product type was always going to be a herculean task.
Is NoSQL the right alternative for an ever-growing product catalog?
Yes, for sure. With no schema restrictions, a NoSQL alternative such as MongoDB seems to be the right solution. But the performance of filtered results depends on how the query is formed, and a number of factors such as shards, replica sets, and data nodes have to be taken into consideration to deliver that performance. As scalability requirements go up, the database has to be scaled along all of these parameters. And free text search still has to be implemented; usually a search platform based on Lucene is ideal for that.
Is there an easier search solution based on Lucene?
Yes, there is a very good alternative: ElasticSearch, which is based on Lucene. ElasticSearch provides a distributed, multitenant-capable full-text search engine that works with schema-free JSON documents.
But the same question remains: does it scale on its own?
No. Indices have to be created and divided into shards, with each shard having one or more replicas, and the infrastructure has to be managed.
Solution
Is there something available online that I can just use without worrying about infrastructure?
Yes indeed – Azure Search. As the documentation describes it, Azure Search is a cloud search-as-a-service solution that delegates server and infrastructure management to Microsoft, leaving you with a ready-to-use service that you can populate with your data and then use to add search to your web or mobile application.
This exactly fits the requirement: we can upload the product catalog to Azure Search. More importantly, the eCommerce web application we created was in ASP.NET MVC and deployed in Azure Web Apps, which means integration and management from the Azure Portal becomes easier.
How do I get started?
Provision the Search service in the Azure Portal like any other service. Choosing the right region for the data center was important; we primarily took into consideration the proximity to the provisioned eCommerce Web App and assigned the same region.
What next?
Create an index for the data, i.e. the product catalog, with all the possible fields. What is an index? Per the Azure documentation, an index is a persistent store of documents and other constructs used by an Azure Search service, and a document is a single unit of searchable data in your index, i.e. JSON. As in a database or a JSON document, there are properties/fields, and the field datatypes should follow the Azure Search specifications. Refer to this URL for more details on indexes – //azure.microsoft.com/en-us/documentation/articles/search-what-is-an-index/
There are alternative techniques to create an index; we used the Azure Search .NET SDK. The steps are pretty simple:
1. Download the Azure Search .NET SDK NuGet package using “Manage NuGet Packages” in Visual Studio. Just search for the package name Microsoft.Azure.Search on NuGet.org.
2. Initialize a SearchServiceClient object, providing the search service name given in the Azure Portal and the API key:
SearchServiceClient serviceClient = new SearchServiceClient(searchServiceName, new SearchCredentials(apiKey));
3. Create an index using the service client and a field definition, as in the example below:
var definition = new Index()
{
    Name = "hotels",
    Fields = new[]
    {
        new Field("hotelId", DataType.String) { IsKey = true },
        new Field("hotelName", DataType.String) { IsSearchable = true, IsFilterable = true },
        new Field("baseRate", DataType.Double) { IsFilterable = true, IsSortable = true },
        new Field("category", DataType.String) { IsSearchable = true, IsFilterable = true, IsSortable = true, IsFacetable = true },
        new Field("city", DataType.String) { IsSearchable = true, IsFilterable = true, IsSortable = true, IsFacetable = true },
        new Field("tags", DataType.Collection(DataType.String)) { IsSearchable = true, IsFilterable = true, IsFacetable = true },
        new Field("parkingIncluded", DataType.Boolean) { IsFilterable = true, IsFacetable = true },
        new Field("lastRenovationDate", DataType.DateTimeOffset) { IsFilterable = true, IsSortable = true, IsFacetable = true },
        new Field("rating", DataType.Int32) { IsFilterable = true, IsSortable = true, IsFacetable = true },
        new Field("location", DataType.GeographyPoint) { IsFilterable = true, IsSortable = true }
    }
};
serviceClient.Indexes.Create(definition);
4. Populate the index with data – this step requires uploading documents that match the index fields specified.
As per the example below:
Specify a class as per the index specifications:
[SerializePropertyNamesAsCamelCase]
public class Hotel
{
    public string HotelId { get; set; }
    public string HotelName { get; set; }
    public string City { get; set; }
    public double? BaseRate { get; set; }
    public string Category { get; set; }
    public string[] Tags { get; set; }
    public bool? ParkingIncluded { get; set; }
    public DateTimeOffset? LastRenovationDate { get; set; }
    public int? Rating { get; set; }
    public GeographyPoint Location { get; set; }
}
Create a method to add the documents to the index:
private void AddDocuments(SearchServiceClient serviceClient)
{
    SearchIndexClient indexClient = serviceClient.Indexes.GetClient("hotels");
    var documents = new Hotel[]
    {
        new Hotel()
        {
            HotelId = "1058-441",
            HotelName = "Fancy Stay",
            City = "Hyderabad",
            BaseRate = 199.0,
            Category = "Luxury",
            Tags = new[] { "pool", "view", "concierge" },
            ParkingIncluded = false,
            LastRenovationDate = new DateTimeOffset(2010, 6, 27, 0, 0, 0, TimeSpan.Zero),
            Rating = 5,
            Location = GeographyPoint.Create(47.678581, -122.131577)
        },
        new Hotel()
        {
            HotelId = "666-437",
            HotelName = "Roach Motel",
            City = "Mumbai",
            BaseRate = 79.99,
            Category = "Budget",
            Tags = new[] { "motel", "budget" },
            ParkingIncluded = true,
            LastRenovationDate = new DateTimeOffset(1982, 4, 28, 0, 0, 0, TimeSpan.Zero),
            Rating = 1,
            Location = GeographyPoint.Create(49.678581, -122.131577)
        },
        new Hotel()
        {
            HotelId = "970-501",
            HotelName = "Econo-Stay",
            City = "Hyderabad",
            BaseRate = 129.99,
            Category = "Budget",
            Tags = new[] { "pool", "budget" },
            ParkingIncluded = true,
            LastRenovationDate = new DateTimeOffset(1995, 7, 1, 0, 0, 0, TimeSpan.Zero),
            Rating = 4,
            Location = GeographyPoint.Create(46.678581, -122.131577)
        },
        new Hotel()
        {
            HotelId = "956-532",
            HotelName = "Express Rooms",
            City = "Delhi",
            BaseRate = 129.99,
            Category = "Budget",
            Tags = new[] { "wifi", "budget" },
            ParkingIncluded = true,
            LastRenovationDate = new DateTimeOffset(1995, 7, 1, 0, 0, 0, TimeSpan.Zero),
            Rating = 4,
            Location = GeographyPoint.Create(48.678581, -122.131577)
        },
        new Hotel()
        {
            HotelId = "566-518",
            HotelName = "Surprisingly Expensive Suites",
            City = "Mumbai",
            BaseRate = 279.99,
            Category = "Luxury",
            ParkingIncluded = false
        }
    };

    try
    {
        var batch = IndexBatch.Upload(documents);
        indexClient.Documents.Index(batch);
    }
    catch (IndexBatchException e)
    {
        // Sometimes when your Search service is under load, indexing will fail for some of the
        // documents in the batch. Depending on your application, you can take compensating actions
        // like delaying and retrying. For this simple demo, we just log the failed document keys
        // and continue.
        Console.WriteLine(
            "Failed to index some of the documents: {0}",
            String.Join(", ", e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
    }

    // Wait a while for indexing to complete.
    Thread.Sleep(2000);
}
At a high level, the following steps are performed in the above operation:
- Create an array of documents as per the index structure
- Instantiate the SearchIndexClient with a reference to the index created for search
- Call the Upload method of the IndexBatch class with the input documents to create a batch
- Send the batch to the index using the SearchIndexClient
Search documents
What kind of search is needed depends on the search use case to be accomplished.
UseCase 1: Suggestions – where the user types some text on the site and suggestions are provided based on that input for quick navigation. The Suggest operation can be used here; a sketch follows the reference link below.
Please refer to this link for the REST API reference – //msdn.microsoft.com/en-us/library/azure/dn798936.aspx
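For illustration, here is a minimal sketch of suggestions with the .NET SDK. It assumes the index definition also includes a suggester; the suggester name "sg" and its source field are hypothetical, not part of the index shown above.

// Assumed addition to the index definition (hypothetical suggester named "sg"
// over the hotelName field):
// Suggesters = new[] { new Suggester("sg", "hotelName") }

private DocumentSuggestResult<Hotel> SuggestHotels(SearchIndexClient indexClient, string partialText)
{
    var parameters = new SuggestParameters
    {
        UseFuzzyMatching = true, // tolerate small typos in the user's input
        Top = 5                  // return at most five suggestions
    };
    // Query the hypothetical "sg" suggester defined on the index
    return indexClient.Documents.Suggest<Hotel>(partialText, "sg", parameters);
}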
UseCase 2: Generic Search – the user types in text, searches, and then wants to filter the results.
In such cases, the Search method can be used with a filter:
private DocumentSearchResult<Hotel> SearchData(SearchIndexClient indexClient, string searchText, string filter = null)
{
    // Execute search based on search text and optional filter
    var sp = new SearchParameters();
    if (!String.IsNullOrEmpty(filter))
    {
        sp.Filter = filter;
    }
    DocumentSearchResult<Hotel> response = indexClient.Documents.Search<Hotel>(searchText, sp);
    return response;
}
Note: the filter uses OData syntax. For example, if the user wants to search for any hotel with an added filter on the city name “Mumbai”, the call would be as follows:
SearchData(indexClient, searchText: "*", filter: "city eq 'Mumbai'");
As in the above example, we built a comprehensive library to accommodate all the possible filters and the corresponding queries in our application; a simplified sketch of the idea follows.
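To give an idea of the shape of that library, here is a simplified, hypothetical sketch of composing individual OData clauses into one filter expression. FilterBuilder is our own illustrative helper, not part of the SDK.

// Hypothetical helper that joins non-empty OData clauses with "and"
public static class FilterBuilder
{
    public static string Combine(params string[] clauses)
    {
        return String.Join(" and ", clauses.Where(c => !String.IsNullOrEmpty(c)));
    }
}

// Usage: budget hotels in Mumbai rated 4 or above
string filter = FilterBuilder.Combine(
    "city eq 'Mumbai'",
    "category eq 'Budget'",
    "rating ge 4");
SearchData(indexClient, searchText: "*", filter: filter);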
Challenges
Developing an application is never accomplished without challenges.
1. Re-Populating/Updating data regularly
Writing code to repopulate the data becomes tedious every time data changes or gets added, and syncing the data to keep it up to date was a challenge. With Azure, we can integrate directly with some common data sources, removing the need to write code to index the data. To set this up, you call the Azure Search API to create and manage indexers and data sources.
Setting up automatic indexing is typically a four-step process:
- Identify the data source that contains the data that needs to be indexed. Keep in mind that Azure Search may not support all of the data types present in your data source. See Supported data types (Azure Search) for the list.
- Create an Azure Search index whose schema is compatible with your data source.
- Create an Azure Search data source as described in Create Data Source (Azure Search Service REST API).
- Create an Azure Search indexer as described in Create Indexer (Azure Search Service REST API).
Following these steps, we created a view that returns the list of products with all possible properties for each product, and set up an indexer that runs every 12 hours to make sure the index data stays up to date. A rough sketch of this setup follows.
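As a rough sketch of that setup with the .NET SDK (the data source, view, index, and indexer names plus the connection string are placeholders, and the exact API shape depends on the SDK version):

// Placeholder names and connection string, for illustration only.
// Point a data source at the SQL view that returns the flattened product catalog.
DataSource dataSource = DataSource.AzureSql(
    name: "products-datasource",
    sqlConnectionString: "<your-sql-connection-string>",
    tableOrViewName: "ProductCatalogView");
serviceClient.DataSources.Create(dataSource);

// Create an indexer that maps the data source to the target index
// and runs on a 12-hour schedule.
Indexer indexer = new Indexer()
{
    Name = "products-indexer",
    DataSourceName = "products-datasource",
    TargetIndexName = "products",
    Schedule = new IndexingSchedule(TimeSpan.FromHours(12))
};
serviceClient.Indexers.Create(indexer);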
2. Query Performance
When we ran multiple performance tests, we found that query performance, especially response time, degraded as we increased the load. The Azure documentation spells out the search limits, which specify 15 queries per second per replica for an S1 instance.
//azure.microsoft.com/en-in/documentation/articles/search-limits-quotas-capacity/
Increasing the search units and adding replicas helped us improve query performance under higher user workloads.
The following link gives more insight:
//azure.microsoft.com/en-in/documentation/articles/search-manage/
We also implemented the Cache-Aside pattern using Azure Redis Cache to reduce calls to Azure Search for repeated search filters; a sketch follows the reference link below.
//en.blog.gbellmann.technology/2015/05/11/improving-performance-with-azure-search-and-redis-cache/
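Here is a minimal sketch of the pattern, assuming StackExchange.Redis for the cache and Newtonsoft.Json for serialization; the cache key scheme and the 10-minute expiry are illustrative choices.

// IDatabase cache = ConnectionMultiplexer.Connect("<redis-connection-string>").GetDatabase();
private IList<Hotel> SearchWithCache(IDatabase cache, SearchIndexClient indexClient, string searchText, string filter)
{
    // Derive a cache key from the query (illustrative scheme)
    string key = $"search:{searchText}:{filter}";

    // 1. Try the cache first
    string cached = cache.StringGet(key);
    if (cached != null)
    {
        return JsonConvert.DeserializeObject<List<Hotel>>(cached);
    }

    // 2. On a miss, query Azure Search
    DocumentSearchResult<Hotel> response = SearchData(indexClient, searchText, filter);
    List<Hotel> hotels = response.Results.Select(r => r.Document).ToList();

    // 3. Populate the cache with a short expiry so repeated filters are served from Redis
    cache.StringSet(key, JsonConvert.SerializeObject(hotels), TimeSpan.FromMinutes(10));
    return hotels;
}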
Benefits of Managed Search
- Fully Managed Service – an end-to-end search solution that requires zero infrastructure management. Integrated into the Azure Portal, Azure Search can be provisioned pretty easily.
- Availability – Microsoft guarantees a 99.9% SLA, which means the service is highly reliable and resilient.
- Scalability – can be scaled easily, both horizontally and vertically, to handle more document storage, higher query loads, or both, based on the search needs.
- Sorting and Paging – sorting can be done on multiple fields and specified at query time. Paging can be used to get a limited set of results per search query (see the sketch after this list).
- Scoring – get search results ranked by a relevance that the marketing team can define. For example, you might want newer products or discounted products to appear higher in the search results.
- Faceted Search – get category-wise counts for a search, enabling the user to drill down and filter search results (also shown in the sketch below).
- Search Explorer – test queries against the indexes and refine them before using them in the application.
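To illustrate the sorting, paging, and faceting options mentioned above, here is a sketch using SearchParameters from the .NET SDK; the field names follow the hotels example earlier in this post.

var sp = new SearchParameters
{
    // Sorting: multiple fields, specified at query time
    OrderBy = new List<string> { "rating desc", "baseRate asc" },

    // Paging: second page of 10 results
    Top = 10,
    Skip = 10,

    // Faceted search: category-wise counts for drill-down
    Facets = new List<string> { "category", "city" },

    IncludeTotalResultCount = true
};
DocumentSearchResult<Hotel> page = indexClient.Documents.Search<Hotel>("*", sp);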
Very successful companies realize how important search is to engaging their users and driving sales. They continuously invest in search technology to optimize the user experience and promote their products at the same time.
But what about you? Perhaps you think that as a small, medium, or even large online merchant, you have neither the funds nor the people to maintain an Amazon-like search engine on your website. Implementing Azure Search helps you build a scalable search solution quickly, lets you scale search independently as your needs grow, and improves the performance of the site.
With this addition, you can improve your site search, engage your visitors, and increase conversion rates on your site.