How to create an Azure Cognitive Search index in C#

This article demonstrates how to create a search index with a suggester, a skillset, and field mappings over data stored in Azure Cosmos DB, where each stored document has the following layout:

public class RssItem
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }
    [JsonProperty(PropertyName = "title")]
    public string Title { get; set; }
    [JsonProperty(PropertyName = "summary")]
    public string Summary { get; set; }
    [JsonProperty(PropertyName = "publishedDate")]
    public string PublishedDate { get; set; }
    [JsonProperty(PropertyName = "source")]
    public string Source { get; set; }
    [JsonProperty(PropertyName = "link")]
    public string Link { get; set; }
}

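The first step is to define the index fields and their attributes. This class is the search-side model, separate from the Cosmos DB document class above, and it also declares the fields that the skillset will populate (People, Organizations, Locations, Keyphrases):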

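// [Key] comes from System.ComponentModel.DataAnnotations; the other
// attributes come from Microsoft.Azure.Search.
using System.ComponentModel.DataAnnotations;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

public class RssItem
{
    [Key]
    public string id { get; set; }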

    [IsRetrievable(true)]
    public string title { get; set; }

    [IsSearchable, IsRetrievable(true)]
    [Analyzer(AnalyzerName.AsString.EnMicrosoft)]
    public string summary { get; set; }

    [IsRetrievable(true)]
    public string publishedDate { get; set; }

    [IsRetrievable(true)]
    public string source { get; set; }

    [IsRetrievable(true)]
    public string link { get; set; }

    [IsRetrievable(false)]
    public string rid { get; set; }

    [IsRetrievable(true)]
    public string[] People { get; set; }

    [IsRetrievable(true)]
    public string[] Organizations { get; set; }

    [IsRetrievable(true)]
    public string[] Locations { get; set; }

    [IsRetrievable(true)]
    public string[] Keyphrases { get; set; }
}
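
FieldBuilder, which is used further down when the index is created, derives one index field per property from these attributes. As a rough sketch of what that corresponds to, a few of the fields could also be declared by hand with the SDK's Field class. The names IndexFieldsSketch and Build are made up for illustration, and only a subset of the fields is shown:

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Search.Models;

// Hypothetical helper, not part of the article's code.
public static class IndexFieldsSketch
{
    public static IList<Field> Build()
    {
        return new List<Field>
        {
            // [Key] string id
            new Field("id", DataType.String) { IsKey = true },

            // [IsRetrievable(true)] string title
            new Field("title", DataType.String) { IsRetrievable = true },

            // Searchable string field analyzed with en.microsoft
            new Field("summary", AnalyzerName.EnMicrosoft)
            {
                IsSearchable = true,
                IsRetrievable = true
            },

            // string[] properties map to collections of strings
            new Field("People", DataType.Collection(DataType.String))
            {
                IsRetrievable = true
            }
        };
    }
}
```

The attribute-based class is shorter and keeps the schema next to the model, which is presumably why the article uses it.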

A suggester provides a list of fields that undergo extra tokenization, generating prefix sequences to support matches on partial terms. For example, a suggester that includes a City field with a value for “Seattle” will have prefix combinations of “sea”, “seat”, “seatt”, and “seattl” to support typeahead. (Source: https://learn.microsoft.com/en-us/azure/search/index-add-suggesters) The suggester will be created over the summary field:

List<Suggester> suggesters = new List<Suggester>();
suggesters.Add(new Suggester(
   name: "ktssuggester",
   sourceFields: new string[] { "summary" }));
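
To make the prefix idea concrete, the sketch below reproduces the "Seattle" example from the quoted documentation. It is purely illustrative: the service does its own tokenization internally, and the class name, method name, and minimum prefix length of 3 are assumptions taken from that example rather than documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only; this is not how the service is implemented.
public static class PrefixSketch
{
    // Returns the lowercase prefixes of a term, from minLength characters
    // up to one character short of the full term.
    public static List<string> Prefixes(string term, int minLength = 3)
    {
        var result = new List<string>();
        string lower = term.ToLowerInvariant();
        for (int length = minLength; length < lower.Length; length++)
        {
            result.Add(lower.Substring(0, length));
        }
        return result;
    }
}
```

Calling PrefixSketch.Prefixes("Seattle") yields sea, seat, seatt, and seattl, which matches the combinations listed above.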

Set up the connection to Cosmos DB:

DataSource cosmosDbDataSource = DataSource.CosmosDb(
    name: dataSourceName, 
    cosmosDbConnectionString: connectionString,
    collectionName: collection,
    useChangeDetection: true);
await svcClient.DataSources.CreateOrUpdateAsync(cosmosDbDataSource);
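
The snippet above assumes that svcClient and connectionString already exist. A minimal sketch of how they might be built follows. The helper names and parameters are hypothetical. The connection string format, including the Database segment, is the one the Azure Cognitive Search documentation describes for Cosmos DB data sources:

```csharp
using Microsoft.Azure.Search;

// Hypothetical helpers; names and parameters are assumptions.
public static class SearchSetup
{
    // Creates the service-level client used for data sources, skillsets,
    // indexes, and indexers. Requires an admin API key, not a query key.
    public static SearchServiceClient CreateServiceClient(
        string searchServiceName, string adminApiKey)
    {
        return new SearchServiceClient(
            searchServiceName, new SearchCredentials(adminApiKey));
    }

    // Builds the connection string for the data source. The Database
    // segment names the Cosmos DB database that holds the collection.
    public static string BuildCosmosDbConnectionString(
        string accountEndpoint, string accountKey, string databaseName)
    {
        return $"AccountEndpoint={accountEndpoint};" +
               $"AccountKey={accountKey};" +
               $"Database={databaseName}";
    }
}
```

In a real application the keys would normally come from configuration or a secret store rather than from source code.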

A skillset is a reusable resource in Azure Cognitive Search that’s attached to an indexer. It contains one or more skills that call built-in AI or external custom processing over documents retrieved from an external data source.

From the onset of skillset processing to its conclusion, skills read from and write to an enriched document. Initially, an enriched document is just the raw content extracted from a data source (articulated as the "/document" root node). With each skill execution, the enriched document gains structure and substance as each skill writes its output as nodes in the graph.

After skillset execution is done, the output of an enriched document finds its way into an index through output field mappings. Any raw content that you want transferred intact, from source to an index, is defined through field mappings.

To configure enrichment, you’ll specify settings in a skillset and indexer.

(Source: https://learn.microsoft.com/en-us/azure/search/cognitive-search-working-with-skillsets)

Define the skills, the skillset, and the output field mappings. Both skills read the summary field and write their results beneath /document/summary:

List<InputFieldMappingEntry> inputMappings = new List<InputFieldMappingEntry>();
inputMappings.Add(new InputFieldMappingEntry(
    name: "text",
    source: "/document/summary"));

List<OutputFieldMappingEntry> erSkillOutputMappings = new List<OutputFieldMappingEntry>();
erSkillOutputMappings.Add(new OutputFieldMappingEntry(
    name: "persons",
    targetName: "People"));
erSkillOutputMappings.Add(new OutputFieldMappingEntry(
    name: "organizations",
    targetName: "Organizations"));
erSkillOutputMappings.Add(new OutputFieldMappingEntry(
    name: "locations",
    targetName: "Locations"));
erSkillOutputMappings.Add(new OutputFieldMappingEntry(
    name: "entities",
    targetName: "entities"));

List<EntityCategory> entityCategory = new List<EntityCategory>();
entityCategory.Add(EntityCategory.Person);
entityCategory.Add(EntityCategory.Quantity);
entityCategory.Add(EntityCategory.Organization);
entityCategory.Add(EntityCategory.Url);
entityCategory.Add(EntityCategory.Email);
entityCategory.Add(EntityCategory.Location);
entityCategory.Add(EntityCategory.Datetime);

EntityRecognitionSkill entityRecognitionSkill = new EntityRecognitionSkill(
    name: "#1",
    description: null,
    context: "/document/summary",
    inputs: inputMappings,
    outputs: erSkillOutputMappings,
    categories: entityCategory,
    defaultLanguageCode: EntityRecognitionSkillLanguage.En);

List<OutputFieldMappingEntry> kpeSkillOutputMappings = new List<OutputFieldMappingEntry>();
kpeSkillOutputMappings.Add(new OutputFieldMappingEntry(
    name: "keyPhrases",
    targetName: "Keyphrases"));

KeyPhraseExtractionSkill keyPhraseExtractionSkill = new KeyPhraseExtractionSkill(
    name: "#2",
    description: null,
    context: "/document/summary",
    defaultLanguageCode: KeyPhraseExtractionSkillLanguage.En,
    inputs: inputMappings,
    outputs: kpeSkillOutputMappings);

List<Skill> skills = new List<Skill>();
skills.Add(entityRecognitionSkill);
skills.Add(keyPhraseExtractionSkill);

CognitiveServicesByKey cogServices = new CognitiveServicesByKey();
cogServices.Description = cognitiveServicesDescription;
cogServices.Key = cognitiveServicesKey;
Skillset skillset = new Skillset(
    name: "ktsskillset",
    description: "KTS cognitive skillset",
    cognitiveServices: cogServices,
    skills: skills);

// Delete any existing skillset so that it is recreated from scratch.
if (await svcClient.Skillsets.ExistsAsync("ktsskillset"))
{
    await svcClient.Skillsets.DeleteAsync("ktsskillset");
}

await svcClient.Skillsets.CreateOrUpdateAsync(skillset);

List<FieldMapping> outputMappings = new List<FieldMapping>();
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/Organizations",
    targetFieldName: "Organizations"));
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/Keyphrases",
    targetFieldName: "Keyphrases"));
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/Locations",
    targetFieldName: "Locations"));
outputMappings.Add(new FieldMapping(
    sourceFieldName: "/document/summary/People",
    targetFieldName: "People"));
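
The source paths in these mappings follow from how the skills were configured: each skill output is written into the enriched document under the skill's context, using the output's targetName as the node name. The small sketch below spells out that rule. The class and method names are hypothetical and nothing here calls the service:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of how enriched-document paths are formed.
public static class EnrichmentPaths
{
    // Joins a skill's context with an output's targetName.
    public static string OutputPath(string context, string targetName)
    {
        return context.TrimEnd('/') + "/" + targetName;
    }

    // Lists the enriched-document path for each index field that the
    // skills in this article populate.
    public static Dictionary<string, string> ForSummarySkills()
    {
        const string context = "/document/summary";
        var paths = new Dictionary<string, string>();
        foreach (string target in
                 new[] { "People", "Organizations", "Locations", "Keyphrases" })
        {
            paths[target] = OutputPath(context, target);
        }
        return paths;
    }
}
```

For example, OutputPath("/document/summary", "People") returns /document/summary/People, which is the sourceFieldName mapped to the People index field above. The "entities" output has no mapping, so it stays in the enriched document and never reaches the index.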

Define and create the index:

Microsoft.Azure.Search.Models.Index index = new Microsoft.Azure.Search.Models.Index()
{
    Name = searchIndexName,
    Fields = FieldBuilder.BuildForType<RssItem>(),
    Suggesters = suggesters
};

// Delete any existing index so that schema changes take effect.
if (await svcClient.Indexes.ExistsAsync(searchIndexName))
{
    await svcClient.Indexes.DeleteAsync(searchIndexName);
}
await svcClient.Indexes.CreateOrUpdateAsync(index);

Create and run the indexer:

Indexer cosmosDbIndexer = new Indexer(
    name: searchIndexer,
    dataSourceName: cosmosDbDataSource.Name,
    targetIndexName: index.Name,
    skillsetName: skillset.Name,
    // -1 means the indexer never stops because of failed documents.
    parameters: new IndexingParameters(
        maxFailedItems: -1,
        maxFailedItemsPerBatch: -1),
    // Run once a day to pick up changes in Cosmos DB.
    schedule: new IndexingSchedule(TimeSpan.FromDays(1)),
    outputFieldMappings: outputMappings);

// Delete any existing indexer so that it starts with a clean state.
if (await svcClient.Indexers.ExistsAsync(searchIndexer))
{
    await svcClient.Indexers.DeleteAsync(searchIndexer);
}

await svcClient.Indexers.CreateOrUpdateAsync(cosmosDbIndexer);
await svcClient.Indexers.RunAsync(searchIndexer);
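
Because maxFailedItems is set to -1, failed documents do not stop the indexer, so it can be worth checking the outcome of a run. A minimal sketch, assuming the same svcClient as above; the IndexerMonitor class and its method are hypothetical names, and the run may still be in progress when the status is read:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

// Hypothetical helper for inspecting the most recent indexer run.
public static class IndexerMonitor
{
    public static async Task ReportLastRunAsync(
        SearchServiceClient svcClient, string indexerName)
    {
        IndexerExecutionInfo info =
            await svcClient.Indexers.GetStatusAsync(indexerName);

        // LastResult is null until the indexer has run at least once.
        IndexerExecutionResult last = info.LastResult;
        if (last == null)
        {
            Console.WriteLine("The indexer has not run yet.");
            return;
        }

        Console.WriteLine(
            $"Status: {last.Status}, items: {last.ItemCount}, " +
            $"failed: {last.FailedItemCount}");

        if (!string.IsNullOrEmpty(last.ErrorMessage))
        {
            Console.WriteLine($"Error: {last.ErrorMessage}");
        }
    }
}
```

A non-zero failed count with a Success status is what the -1 settings produce when individual documents cannot be enriched or indexed.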
