How to create an Azure Search index with Cognitive Services using C#

For a project I’m working on, I needed to create an Azure Search index from code because indexes cannot be rebuilt via the Azure Portal. After a lot of research into examples of creating an index from code, I couldn’t find one that documented how to do it with Cognitive Services. The steps of the process are:

  1. Create a data source (I’ll be using CosmosDB)
  2. Create a suggester (optional)
  3. Create input field mapping entries for Cognitive Services skill sets
  4. Create output field mapping entries for Cognitive Services skill sets
  5. Create entity categories
  6. Create skills for Cognitive Services
  7. Set up CognitiveServicesByKey
  8. Create skill sets
  9. Create output mappings for the Cognitive Services skill sets
  10. Create the index
  11. Create and run the indexer

Create a data source. I’m indexing data that is stored in CosmosDB. The variables have been assigned in the constructor. Since they are specific to my use case, you will supply your own configuration data from your Azure Portal CosmosDB account.

DataSource cosmosDbDataSource = DataSource.CosmosDb(
    name: dataSourceName,
    cosmosDbConnectionString: connectionString,
    collectionName: collection,
    useChangeDetection: true);
await svcClient.DataSources.CreateOrUpdateAsync(cosmosDbDataSource);
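
The svcClient used in these snippets is never constructed in this post. A minimal sketch of how it might be set up, assuming the Microsoft.Azure.Search NuGet package; the class name SearchIndexBuilder and the parameters searchServiceName and adminApiKey are hypothetical placeholders, not names from my project:

```csharp
using Microsoft.Azure.Search;

// Hypothetical wrapper class; the name and constructor parameters are assumptions.
public class SearchIndexBuilder
{
    private readonly SearchServiceClient svcClient;
    private readonly string dataSourceName;
    private readonly string connectionString;
    private readonly string collection;

    public SearchIndexBuilder(
        string searchServiceName,   // name of the search service, not its full URL
        string adminApiKey,         // an admin key is required to create indexes and indexers
        string dataSourceName,
        string connectionString,
        string collection)
    {
        // The service client is the entry point for data sources, skillsets, indexes and indexers.
        svcClient = new SearchServiceClient(searchServiceName, new SearchCredentials(adminApiKey));

        this.dataSourceName = dataSourceName;
        this.connectionString = connectionString;
        this.collection = collection;
    }
}
```

A query key is not sufficient here; the calls in this post all modify the service, so the admin key is needed.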

Create a suggester (optional). I created a suggester named ktssuggester that uses title and summary as its source fields.

List<Suggester> suggesters = new List<Suggester>();
suggesters.Add(new Suggester(name: "ktssuggester", sourceFields: new string[] { "title", "summary" }));

Create input field mapping entries for Cognitive Services skill sets. In my use case, I only needed one input mapping because both skills read from the same source, the summary field. The two skills used are entity recognition and key phrase extraction.

List<InputFieldMappingEntry> inputMappings = new List<InputFieldMappingEntry>();
inputMappings.Add(new InputFieldMappingEntry(name: "text", source: "/document/summary"));

Create output field mapping entries for Cognitive Services skill sets. The name is the output produced by the skill, and targetName is the name it is stored under.

List<OutputFieldMappingEntry> erSkillOutputMappings = new List<OutputFieldMappingEntry>();
erSkillOutputMappings.Add(new OutputFieldMappingEntry(name: "persons", targetName: "People"));
erSkillOutputMappings.Add(new OutputFieldMappingEntry(name: "organizations", targetName: "Organizations"));
erSkillOutputMappings.Add(new OutputFieldMappingEntry(name: "locations", targetName: "Locations"));
erSkillOutputMappings.Add(new OutputFieldMappingEntry(name: "entities", targetName: "entities"));

List<OutputFieldMappingEntry> kpeSkillOutputMappings = new List<OutputFieldMappingEntry>();
kpeSkillOutputMappings.Add(new OutputFieldMappingEntry(name: "keyPhrases", targetName: "Keyphrases"));

Create entity categories.

List<EntityCategory> entityCategory = new List<EntityCategory>();
entityCategory.Add(EntityCategory.Person);
entityCategory.Add(EntityCategory.Quantity);
entityCategory.Add(EntityCategory.Organization);
entityCategory.Add(EntityCategory.Url);
entityCategory.Add(EntityCategory.Email);
entityCategory.Add(EntityCategory.Location);
entityCategory.Add(EntityCategory.Datetime);

Create skills for Cognitive Services.

EntityRecognitionSkill entityRecognitionSkill = new EntityRecognitionSkill(
    name: "#1", description: null, context: "/document/summary",
    inputs: inputMappings, outputs: erSkillOutputMappings,
    categories: entityCategory, defaultLanguageCode: EntityRecognitionSkillLanguage.En);

KeyPhraseExtractionSkill keyPhraseExtractionSkill = new KeyPhraseExtractionSkill(
    name: "#2", description: null, context: "/document/summary",
    defaultLanguageCode: KeyPhraseExtractionSkillLanguage.En,
    inputs: inputMappings, outputs: kpeSkillOutputMappings);

Set up CognitiveServicesByKey. You will find these values under Skillsets in the Azure Portal for your Azure Search service.

CognitiveServicesByKey cogServices = new CognitiveServicesByKey();
cogServices.Description = cognitiveServicesDescription;
cogServices.Key = cognitiveServicesKey;

Create skill sets.

List<Skill> skills = new List<Skill>();
skills.Add(entityRecognitionSkill);
skills.Add(keyPhraseExtractionSkill);

Skillset skillset = new Skillset(
    name: "ktsskillset",
    description: "KTS cognitive skillset",
    cognitiveServices: cogServices,
    skills: skills);

if (svcClient.Skillsets.Exists("ktsskillset"))
{
    await svcClient.Skillsets.DeleteAsync("ktsskillset");
}
await svcClient.Skillsets.CreateOrUpdateAsync(skillset);

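Create output mappings for the Cognitive Services skill sets. Each source path is the skill context (/document/summary) followed by the targetName of the corresponding skill output, and each target is a field in the index.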

List<FieldMapping> outputMappings = new List<FieldMapping>();
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/Organizations",
    targetFieldName: "Organizations"));
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/Keyphrases",
    targetFieldName: "Keyphrases"));
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/Locations",
    targetFieldName: "Locations"));
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/People",
    targetFieldName: "People"));

Create the index.

Microsoft.Azure.Search.Models.Index index = new Microsoft.Azure.Search.Models.Index()
{
    Name = searchIndexName,
    Fields = FieldBuilder.BuildForType<RssItem>(),
    Suggesters = suggesters
};

if (svcClient.Indexes.Exists(searchIndexName))
{
    await svcClient.Indexes.DeleteAsync(searchIndexName);
}

await svcClient.Indexes.CreateOrUpdateAsync(index);
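
FieldBuilder derives the index fields from the RssItem model class, so that class needs a property for every field the suggester and the output mappings refer to. My actual class isn't shown here; the following is only a sketch of what such a model could look like, assuming the Microsoft.Azure.Search attributes and Newtonsoft.Json. The choice of attributes and the Id property are assumptions:

```csharp
using System.ComponentModel.DataAnnotations;
using Microsoft.Azure.Search;
using Newtonsoft.Json;

// Illustrative model only; the real RssItem may have more or different properties.
public class RssItem
{
    // Every index needs exactly one key field.
    [Key]
    [JsonProperty("id")]
    public string Id { get; set; }

    // Source fields for the suggester; JsonProperty keeps the lowercase field names.
    [IsSearchable]
    [JsonProperty("title")]
    public string Title { get; set; }

    [IsSearchable]
    [JsonProperty("summary")]
    public string Summary { get; set; }

    // Targets of the output field mappings. Each skill output is a list of strings,
    // so these are arrays (Collection(Edm.String) in the index).
    [IsSearchable, IsFilterable, IsFacetable]
    public string[] People { get; set; }

    [IsSearchable, IsFilterable, IsFacetable]
    public string[] Organizations { get; set; }

    [IsSearchable, IsFilterable, IsFacetable]
    public string[] Locations { get; set; }

    [IsSearchable, IsFilterable, IsFacetable]
    public string[] Keyphrases { get; set; }
}
```

If a target field named in an output mapping is missing from the index, the indexer has nowhere to write the enrichment, so it is worth checking the field names against the mappings before running it.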

Create and run the indexer.

Indexer cosmosDbIndexer = new Indexer(
    name: searchIndexer,
    dataSourceName: cosmosDbDataSource.Name,
    targetIndexName: index.Name,
    skillsetName: skillset.Name,
    parameters: new IndexingParameters(maxFailedItems: -1, maxFailedItemsPerBatch: -1),
    schedule: new IndexingSchedule(TimeSpan.FromDays(1)),
    outputFieldMappings: outputMappings);

if (svcClient.Indexers.Exists(searchIndexer))
{
    await svcClient.Indexers.DeleteAsync(searchIndexer);
}
await svcClient.Indexers.CreateOrUpdateAsync(cosmosDbIndexer);
await svcClient.Indexers.RunAsync(searchIndexer);
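
Because maxFailedItems and maxFailedItemsPerBatch are set to -1, the indexer keeps going no matter how many documents fail, and RunAsync only starts the run without waiting for it to finish. A sketch of how the result could be checked afterwards, assuming the same SDK; the IndexerStatusReporter class is a hypothetical helper that is not part of my project:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

// Hypothetical helper for inspecting the most recent indexer run.
public static class IndexerStatusReporter
{
    public static async Task ReportAsync(SearchServiceClient svcClient, string indexerName)
    {
        IndexerExecutionInfo info = await svcClient.Indexers.GetStatusAsync(indexerName);
        IndexerExecutionResult last = info.LastResult;

        // LastResult is null until the indexer has recorded at least one run.
        if (last == null)
        {
            Console.WriteLine("The indexer has not recorded a run yet.");
            return;
        }

        Console.WriteLine($"Status: {last.Status}");
        Console.WriteLine($"Items processed: {last.ItemCount}, failed: {last.FailedItemCount}");

        if (!string.IsNullOrEmpty(last.ErrorMessage))
        {
            Console.WriteLine($"Error: {last.ErrorMessage}");
        }
    }
}
```

It could be called as await IndexerStatusReporter.ReportAsync(svcClient, searchIndexer); some time after the run has been started.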
