Problem Statement: Given text obtained from a product image on an e-commerce site, we want to extract the product's attributes from that text.
Background: Product attributes are essential for online shoppers, as they provide valuable information for search, recommendation, and product comparison. The massive, constantly changing product catalog brings an open-world challenge. Products we already know may gain new attributes over time – “HDR compatibility” barely came to mind when people bought a TV ten years ago, but it is relevant for many shoppers nowadays – and entirely new types of products with distinctively new sets of attributes may emerge in the future. Moreover, attribute mining models must operate with limited supervision, as human annotation can never keep up with the forever-expanding product catalog.
Framework Searching: Existing research offers many frameworks that try to solve the product attribute mining problem, such as BiLSTM-Tag, OpenTag, SU-OpenTag, and DeepAlign+. These are sequence-tagging models, the de facto approach for closed-world product attribute mining. OAMine is the recent state-of-the-art framework and the only one that solves the open-world attribute mining problem.
Why OAMine?: OAMine is a framework for attribute extraction in an open-world setting, where the set of attribute types and values is not known beforehand. Unlike standard NER techniques, which require the set of attributes as input, OAMine can extract even unseen attribute types from the product text using weak supervision. Instead of providing comprehensive training data, all we need to do is supply a few example values for a few known attribute types.
OAMine Input:
Product data: product text + product type
Weak supervision: seed attribute values for a few known types
OAMine output:
New attribute types and values
OAMine assumes the types of the products are known a priori, as each product type holds a distinct set of applicable attributes (e.g., “resolution” applies to TVs but not shoes), and the attribute mining process should be done with respect to each product type. An illustrative sketch of the seed input follows.
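The seed format below is a hypothetical illustration (the attribute names, values, and structure are assumptions for illustration, not OAMine's actual input file format); it shows how little supervision is needed per known attribute type:

```python
# Hypothetical seed supervision for one product type. A few example values for
# a few known attribute types is all OAMine needs; unseen attribute types
# (e.g., country of manufacture) are left for the model to discover.
seeds = {
    "snack chip and crisp": {
        "brand":  ["herrs", "popchips", "pringles"],
        "flavor": ["crisp 'n tasty", "crazy hot", "barbecue"],
        "size":   ["3.5 oz", "4.04 kg", "142.54 oz"],
    }
}
```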
OAMine overview:
OAMine is a two-step framework: candidate value generation followed by attribute value grouping.
The first part aims to obtain candidate attribute values from the product text with high recall. Candidate value generation uses title segmentation based on perturbed masking, whose idea is that a pre-trained language model captures word-to-word impact: the more strongly two adjacent words influence each other's representations, the more likely they belong to the same phrase.
Steps:
We first fine-tune a BERT language model with the Masked Language Model (MLM) objective on the product text. This step is necessary because product text differs from the general-domain natural language text on which BERT was pre-trained (a fine-tuning sketch follows this list).
Given a product title as a sequence of words W = w1 w2 ... wn, we compute the likelihood of two adjacent words belonging to the same phrase using the fine-tuned BERT model.
Each pair of adjacent words with a likelihood score above a threshold is merged into the same phrase, and pairs below the threshold are segmented into different phrases (a phrase-scoring sketch also follows this list).
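A minimal sketch of the in-domain MLM fine-tuning step using the Hugging Face transformers library is shown below. The file name titles.txt (one product title per line), the output directory, and the hyperparameters are assumptions for illustration, not OAMine's actual training setup:

```python
# In-domain MLM fine-tuning of BERT on product text (illustrative sketch).
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# One product title per line; tokenize and drop the raw text column.
ds = load_dataset("text", data_files={"train": "titles.txt"})["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=64),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-product", num_train_epochs=3),
    train_dataset=ds,
    # Randomly masks 15% of tokens so the model adapts to product-domain text.
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()
```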
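And a minimal sketch of the perturbed-masking phrase score itself, assuming the in-domain model from above, one WordPiece token per word, and an arbitrary threshold (a real implementation would operate on subword tokens and use a tuned threshold):

```python
# Perturbed-masking phrase scoring for title segmentation (illustrative sketch).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-product")       # in-domain model above
model = AutoModel.from_pretrained("bert-product").eval()

def impact(words, i, j):
    """Impact of words[j] on words[i]: mask w_i alone, then mask both w_i and
    w_j, and measure how much the representation at position i changes."""
    def rep(masked):
        toks = [tok.mask_token if k in masked else w for k, w in enumerate(words)]
        enc = tok(" ".join(toks), return_tensors="pt")
        with torch.no_grad():
            h = model(**enc).last_hidden_state[0]
        return h[i + 1]          # +1 skips [CLS]; assumes one WordPiece per word
    return torch.dist(rep({i}), rep({i, j})).item()

def segment(words, threshold=4.0):            # threshold value is an assumption
    phrases, cur = [], [words[0]]
    for i in range(len(words) - 1):
        # Treat high mutual impact as evidence the adjacent pair shares a phrase.
        score = (impact(words, i, i + 1) + impact(words, i + 1, i)) / 2
        if score > threshold:
            cur.append(words[i + 1])
        else:
            phrases.append(" ".join(cur))
            cur = [words[i + 1]]
    phrases.append(" ".join(cur))
    return phrases
```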
The second part aims to group candidate values into attribute clusters with the seeds as guidance. Each generated cluster represents one attribute type. In order to cluster these values, we need embeddings of the candidate values, but the distance between two embedding vectors obtained from pre-trained BERT does not fully capture attribute information. So we perform attribute-aware fine-tuning using three objective functions (binary meta-classification, contrastive learning, and multi-class classification).
Steps:
Data Generation - The first step takes the segmented product titles and the user-given seed sets and generates fine-tuning data for three objectives: binary meta-classification, contrastive learning, and multi-class classification.
Attribute-aware multitask fine-tuning - The second step fine-tunes the BERT representation using the following objectives (a combined sketch of the three losses follows these steps):
Binary Meta-Classification - The ultimate goal of fine-tuning is to embed values from the same attribute close to each other and values from different attributes far apart. We build a binary classifier that takes a pair of value embeddings and predicts whether they belong to the same attribute, which directly optimizes for this goal.
Contrastive Learning - In addition to directly optimizing the binary meta-classification loss, it is beneficial to pull the positive training pairs together and push the negative pairs apart with a contrastive learning loss.
Multi-class Classification - Neither of those two loss functions enforces the class distinctiveness of attribute values. Adding a multi-class classification loss pushes the embeddings of different attributes further apart.
Inference - The third step is inference. First, we obtain an embedding for each candidate value. Then we run DBSCAN on the embeddings to generate attribute clusters. After that, we run classifier inference. Finally, we combine the results from DBSCAN and the classifier (a clustering sketch follows below).
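A minimal PyTorch sketch of the three fine-tuning objectives is shown below. It assumes the training pairs and labels have already been generated; the head shapes, contrastive margin, and equal loss weighting are illustrative assumptions, not OAMine's exact formulation:

```python
# Attribute-aware multitask fine-tuning heads (illustrative sketch).
import torch
import torch.nn.functional as F
from torch import nn

class AttributeAwareHeads(nn.Module):
    def __init__(self, dim=768, n_seed_types=10):
        super().__init__()
        # Binary meta-classifier: does this pair of values share an attribute?
        self.pair_clf = nn.Linear(2 * dim, 2)
        # Multi-class classifier over the known (seed) attribute types.
        self.type_clf = nn.Linear(dim, n_seed_types)

    def forward(self, a, b, same, type_labels, margin=1.0):
        # a, b: (batch, dim) BERT embeddings of a candidate-value pair.
        # same: (batch,) long tensor, 1 if the pair shares an attribute else 0.
        # 1) Binary meta-classification on the concatenated pair embedding.
        l_pair = F.cross_entropy(self.pair_clf(torch.cat([a, b], dim=-1)), same)
        # 2) Contrastive loss: pull positive pairs together, push negative
        #    pairs at least `margin` apart.
        d = F.pairwise_distance(a, b)
        s = same.float()
        l_con = (s * d.pow(2) + (1 - s) * F.relu(margin - d).pow(2)).mean()
        # 3) Multi-class classification for class distinctiveness (defined
        #    only for values whose seed attribute type is known).
        l_cls = F.cross_entropy(self.type_clf(a), type_labels)
        return l_pair + l_con + l_cls    # equal weighting is an assumption
```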
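And a minimal sketch of the clustering part of inference using scikit-learn's DBSCAN. Here embed() stands in for the fine-tuned encoder, and eps/min_samples are assumed values that would need tuning:

```python
# Clustering candidate values into attribute groups (illustrative sketch).
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_values(candidates, embed, eps=0.5, min_samples=3):
    X = np.stack([embed(v) for v in candidates])        # (n_values, dim)
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="cosine").fit_predict(X)
    clusters = {}
    for value, lab in zip(candidates, labels):
        # DBSCAN labels outliers -1; these are reported as 'noise'.
        key = "noise" if lab == -1 else f"C_{lab}"
        clusters.setdefault(key, []).append(value)
    return clusters
```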
Observations:
We used an NVIDIA T4 XLARGE machine (16 CPUs, 64 GB RAM, 16 GB T4 GPU) to run and train the model. We ran OAMine to extract attributes from our product text. We used the BERT model fine-tuned on product text (checkpoint - BERT Checkpoint) in the first part of the framework to calculate the phrase score (the likelihood of two adjacent words belonging to the same phrase). We then did the attribute-aware fine-tuning using the Amazon dataset of 100 product categories (dataset - Amazon Dataset). The model performed well on the Amazon product titles in the dataset, but its performance on our data is not as expected.
The reason is that our product text, which is extracted from images, is very different from the Amazon product titles dataset. Since the in-domain fine-tuned BERT model has not been trained on this kind of data, the candidate value generation part does not perform well on our data. Likewise, since the attribute-aware fine-tuning was done on Amazon product titles, the embeddings may not be accurate representations of the phrases in our product text. Most importantly, our texts are very long and violate one of the model's assumptions: that a product title is a bag of attribute values.
Difference between an Amazon product title and our product text:
Amazon product title - Popchips Potato Chips, Crazy Hot Potato Chips, (3.5 oz Bags), Gluten-Free Potato Chips, Low Fat, No Artificial Flavoring, Kosher (Pack of 12)
Our product text - HERRS, POTATO CHIPS, CRISP 'N TASTY, CHOICE POTATOES COOKED IN VEGETABLE OIL (CONTAINS ONE OR MORE OF THE FOLLOWING: CORN, COTTONSEED, SOYBEAN, SUNFLOWER), WITH SALT ADDED., Serving Size 13.0 CHIPS, Nt Wt. 4.04 kg(142.54 OZ), Protein 7.14 g, Total lipid (fat) 28.57 g, Carbohydrate, by difference 57.14 g, Energy 500.0 kcal, Sugars, total 0.0 g, Fiber, total dietary 3.6 g, Calcium, Ca 0.0 mg, Iron, Fe 1.29 mg, Sodium, Na 643.0 mg, Vitamin A, IU 0.0 IU, Vitamin C, total ascorbic acid 21.4 mg, Cholesterol 0.0 mg, Fatty acids, total trans 0.0 g, Fatty acids, total saturated 8.93 g, Herr Foods Inc., Product of Argentina, Manufactured in Italy
Both product texts are of the same product type, but ours contains additional information such as ingredients, nutritional facts, and country of manufacture that is not present in the Amazon product titles.
Some example results of our data:
Product Type - vegetable
PICKLED VEGETABLE LEMON CUCUMBERS, ORGANIC LEMON CUCUMBERS, WATER, ORGANIC DISTILLED WHITE VINEGAR, SEA SALT, ORGANIC SUGAR, FRESH ORGANIC LEMONGRASS, ORGANIC SPICES, Serving quantity 1.0 SPEAR, NT WT. 6.21 OZ(176 g), Protein 0.0 g, Total lipid (fat) 0.0 g, Carbohydrate, by difference 3.57 g, Energy 18.0 kcal, Sugars, total 3.57 g, Sodium, Na 357.0 mg, Vitamin A, IU 0.0 IU, Vitamin C, total ascorbic acid 0.0 mg, Manufactured By Faithful To Foods, Inc., Product of Singapore, Manufactured in India
↓ Candidate value generation
["pickled vegetable", "lemon cucumbers", "organic", "lemon cucumbers", "water", "organic distilled", "white vinegar", "sea salt", "organic sugar", "fresh organic", "lemongrass", "organic spices", "serving quantity", "1.0 spear", "nt wt", "6.21 oz", "176 g", "protein 0.0 g", "total", "lipid", "fat", "0.0 g", "carbohydrate", "difference 3.57 g", "energy", "18.0 kcal", "sugars", "total 3.57 g", "sodium", "na 357.0 mg", "vitamin a", "iu 0.0 iu", "vitamin c", "total", "ascorbic acid", "0.0 mg", "manufactured by", "faithful to foods", "inc", "product of", "singapore", "manufactured in india"] |
↓ Attribute value grouping
'C_2' -> ['pickled vegetable', 'water', 'white vinegar', 'sea salt']
'C_3' -> ['organic', 'organic distilled', 'organic sugar', 'fresh organic', 'manufactured in india']
'C_5' -> ['lemon cucumbers', 'lemon cucumbers', 'lemongrass']
'C_7' -> ['faithful to foods']
'C_9' -> ['organic spices', 'ascorbic acid', 'singapore']
'C_10' -> ['iu 0.0 iu']
'noise' -> ['serving quantity', '1.0 spear', 'nt wt', '6.21 oz', '176 g', 'protein 0.0 g', 'total', 'lipid', 'fat', '0.0 g', 'carbohydrate', 'difference 3.57 g', 'energy', '18.0 kcal', 'sugars', 'total 3.57 g', 'sodium', 'na 357.0 mg', 'vitamin a', 'vitamin c', 'total', '0.0 mg', 'manufactured by', 'inc', 'product of']
Product Type - Snack chip and crisp
HERRS, POTATO CHIPS, CRISP 'N TASTY, CHOICE POTATOES COOKED IN VEGETABLE OIL (CONTAINS ONE OR MORE OF THE FOLLOWING: CORN, COTTONSEED, SOYBEAN, SUNFLOWER), WITH SALT ADDED., Serving Size 13.0 CHIPS, Nt Wt. 4.04 kg(142.54 OZ), Protein 7.14 g, Total lipid (fat) 28.57 g, Carbohydrate, by difference 57.14 g, Energy 500.0 kcal, Sugars, total 0.0 g, Fiber, total dietary 3.6 g, Calcium, Ca 0.0 mg, Iron, Fe 1.29 mg, Sodium, Na 643.0 mg, Vitamin A, IU 0.0 IU, Vitamin C, total ascorbic acid 21.4 mg, Cholesterol 0.0 mg, Fatty acids, total trans 0.0 g, Fatty acids, total saturated 8.93 g, Herr Foods Inc., Product of Argentina, Manufactured in Italy
↓ Candidate value generation
["herrs", "potato chips", "crisp", "n", "tasty", "choice potatoes", "cooked in", "vegetable oil", "contains", "one", "or more", "of the following", "corn", "cottonseed", "soybean", "sunflower", "salt", "added", "serving size", "13.0 chips", "nt wt", "4.04 kg", "142.54 oz", "protein 7.14 g", "total", "lipid", "fat", "28.57 g", "carbohydrate", "by difference", "57.14 g", "energy", "500.0 kcal", "sugars", "total 0.0 g", "fiber", "total", "dietary 3.6 g", "calcium", "ca 0.0 mg", "iron", "fe 1.29 mg", "sodium", "na 643.0 mg", "vitamin a", "iu 0.0 iu", "vitamin c", "total", "ascorbic acid", "21.4 mg", "cholesterol 0.0 mg", "fatty acids total", "trans 0.0 g", "fatty acids total", "saturated 8.93 g", "herr foods", "inc", "product of", "argentina", "manufactured in", "italy"] |
↓ Attribute value grouping
'C_0' -> ['n', 'total', 'by difference', 'total', 'total', 'inc']
'C_1' -> ['contains', 'added', 'serving size', 'italy']
'C_2' -> ['manufactured in']
'C_4' -> ['or more']
'C_5' -> ['potato chips', 'crisp']
'C_6' -> ['tasty', 'cooked in', 'sunflower', 'salt', 'carbohydrate', 'calcium', 'sodium', 'cholesterol 0.0 mg']
'C_7' -> ['13.0 chips', '4.04 kg', '142.54 oz', 'protein 7.14 g', '28.57 g', '57.14 g', 'total 0.0 g', 'dietary 3.6 g', 'ca 0.0 mg', 'fe 1.29 mg', 'na 643.0 mg', 'ascorbic acid', '21.4 mg', 'saturated 8.93 g']
'C_11' -> ['herrs', 'herr foods']
'C_12' -> ['choice potatoes', 'vegetable oil', 'corn', 'cottonseed', 'soybean', 'energy', 'fiber']
'noise' -> ['one', 'of the following', 'nt wt', 'lipid', 'fat', '500.0 kcal', 'sugars', 'iron', 'vitamin a', 'iu 0.0 iu', 'vitamin c', 'fatty acids total', 'trans 0.0 g', 'fatty acids total', 'product of', 'argentina']
From the above examples we can observe that some clusters contain values of different attribute types (e.g., in Example 1, cluster C_9 contains 'organic spices', 'ascorbic acid', and 'singapore', which are unrelated), that phrase boundaries are not accurate (e.g., 'nt wt' and '4.04 kg' should be in the same phrase), and, most importantly, that a lot of useful information (mostly nutritional facts) is treated as noise.
Conclusion: Although OAMine performs well on Amazon product titles, the model pre-trained and fine-tuned on that data does not perform well on ours. The reason is that the BERT model in the first part (candidate value generation) was pre-trained only on Amazon product titles, which leads to poor phrasing on our data, and the fine-tuning of the BERT model in the second part was also done using only Amazon product titles, which leads to poor embeddings of the candidate phrases. A possible way to improve OAMine's performance on our data is to pre-train and fine-tune the BERT models in the first and second parts, respectively, on our own product text.