Invented by Zhongfei Zhang and Shuangfei Zhai; assigned to the Research Foundation for the State University of New York

The Market for Semisupervised Autoencoder for Sentiment Analysis

In today’s customer-oriented culture, understanding consumers’ overall opinion and sentiment about a product or service is paramount for business success (Rambocas and Pacheco 2018).

Sentiment analysis is the computational study of people’s opinions, emotions and attitudes toward entities. It offers actionable knowledge that businesses can use to refine their strategy and gain insights into customer feedback.

Market Segmentation

Market segmentation is the practice of dividing your target market into distinct groups based on shared characteristics. This strategy assists marketers in crafting more tailored marketing campaigns that speak directly to each group’s needs and desires. Furthermore, it serves as a great way to determine which customers are most likely to purchase your products or services.

Market segmentation is the key to success when creating a new product or marketing an existing one. It allows you to understand what drives certain groups of customers and what they value most, helping determine which marketing messages work best and saving both time and money in the process.

Segmenting your target audience can be done in several ways, such as behavioral, psychographic, geographic and technographic. While these four types are the foundations of market segmentation, there are numerous other strategies you can employ for maximum effectiveness.

Demographic segmentation is an effective marketing technique that divides your target audience into groups based on shared characteristics like age, gender, nationality, education level, income level, credit rating and buying habits. This strategy allows you to understand how customers value your products and services as well as which are the most profitable to sell.

Psychographic segmentation is a market research method that utilizes methods such as surveys, focus groups and case studies to collect data on your target audience. By learning more about their lifestyles, attitudes and interests, you can develop better product offerings and more successful marketing campaigns.

Behavioral market segmentation is another popular type of segmentation, targeting specific steps in your ideal customer’s buying process. This includes what they need from a product, why they need it, and how they go about getting it.

With these data, you can design a product that caters to their needs and values. Furthermore, tailor your marketing and advertising campaigns towards those most likely to purchase the item – saving both time and money while providing a more personalized experience.

Market Size

The Semisupervised Autoencoder for Sentiment Analysis market is expected to experience rapid growth in the near future due to its expanding applications across a variety of industries, particularly business, where companies seek feedback to refine strategies and gain insight into customer opinions about products. Social media companies likewise use sentiment analysis techniques for customer support, better product design, and new customer acquisition.

Sentiment analysis employs both unsupervised learning and supervised learning techniques, the latter when there is a large amount of labeled data available for analysis. Machine learning algorithms such as LSTM, SVM, and CNN are commonly employed in this process.
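As a toy illustration of the supervised side (a minimal sketch, not the patent's method and far simpler than an LSTM or CNN), a bag-of-words perceptron can be trained on a handful of labeled reviews:

```python
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def train_perceptron(labeled_docs, epochs=10):
    """Perceptron over bag-of-words features.
    labeled_docs: list of (text, label) pairs with label in {+1, -1}."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, y in labeled_docs:
            score = sum(w[t] for t in tokenize(text))
            if y * score <= 0:               # misclassified: nudge weights
                for t in tokenize(text):
                    w[t] += y
    return w

def predict(w, text):
    return 1 if sum(w[t] for t in tokenize(text)) > 0 else -1

train = [("great movie loved it", 1), ("terrible plot hated it", -1),
         ("loved the acting", 1), ("hated the ending", -1)]
w = train_perceptron(train)
print(predict(w, "loved this great film"))   # -> 1
```

Real systems replace the perceptron with stronger learners (SVM, LSTM, CNN), but the pipeline shape, features from text plus labels, is the same.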

Sentiment analysis has become an invaluable asset to businesses and research projects, as it helps identify users’ emotions and attitudes toward events or entities. As such, sentiment analysis requires collaboration from specialists across disciplines like psychology and linguistics.

While deep learning algorithms have become more widely employed in sentiment analysis over the years, they have yet to be fully embraced by researchers. Recently, however, some work has attempted to use these techniques to increase model accuracy.

Researchers have also proposed hybrid approaches that combine lexicon-based and machine learning-based techniques. This strategy offers several advantages, such as the capacity to learn from varied inputs and feature sets, leading to a more accurate model.

Another advantage of this method is that it can be applied to various data sets, allowing the system to customize its training process for each application. The resulting performance gains are why hybrid approaches have become increasingly popular in sentiment analysis (Pozzi et al. 2017).

Sentiment analysis can be invaluable when it comes to recommender systems, helping improve and validate recommendations by combining multiple types of information such as user reviews and other sources. The outcomes of these analyses could then be used for creating a recommendation engine that more efficiently suggests relevant content to users.

Market Share

The Semisupervised Autoencoder for Sentiment Analysis market is expanding at an impressive rate and is expected to remain dominant in the near future. This trend can be attributed to rising demand for automatic sentiment classification, particularly on social networks and online reviews. The technology allows automated text reconstruction with minimal human involvement.

Semi-supervised methods are data-driven techniques that require only a small amount of labeled data, making them ideal for leveraging large volumes of unlabeled Big Data. Typically, such models employ hidden layers or manifolds to learn complex nonlinear representations of words, making them well suited to sentiment analysis.
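A minimal sketch of the representation-learning step, assuming NumPy and using a plain linear autoencoder with a small hidden layer (a deliberate simplification of the manifold-based models described above; the data and hyperparameters are illustrative):

```python
import numpy as np

# toy data lying in a 2-d subspace of a 6-d space
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6))

d, h = X.shape[1], 2
W1 = rng.normal(scale=0.1, size=(d, h))   # encoder weights
W2 = rng.normal(scale=0.1, size=(h, d))   # decoder weights
lr = 0.01

def loss(X, W1, W2):
    return ((X @ W1 @ W2 - X) ** 2).mean()

initial = loss(X, W1, W2)
for _ in range(500):
    H = X @ W1                        # hidden layer: the learned representation
    R = H @ W2 - X                    # reconstruction residual
    W2 -= lr * H.T @ R / len(X)       # gradient steps (up to a constant factor)
    W1 -= lr * X.T @ (R @ W2.T) / len(X)
print(loss(X, W1, W2) < initial)
```

In a semi-supervised setting, the unlabeled data shape `W1` (and hence the hidden representation `H`), while the small labeled set trains a classifier on top of `H`.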

Many approaches have been explored in this area, such as co-training and self-training techniques, transductive SVM and graph-based methods. Nonetheless, certain factors may hinder their success.

First, the accuracy of a classifier depends on the representations it learns from the input text. Because frequent words account for most of the probability mass of word occurrences, the latent representation must nonetheless capture the full semantic space carried by the words.

Solving this problem is no small feat. Popular reconstruction loss functions, such as squared Euclidean distance and element-wise KL divergence, reconstruct every dimension of the input independently and without discrimination.
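The two losses named above can be sketched as follows (function names and toy vectors are illustrative; the element-wise KL form assumes values in (0, 1), as with sigmoid-output autoencoders):

```python
import math

def squared_euclidean(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def elementwise_kl(x, x_hat, eps=1e-9):
    # treats each dimension as an independent Bernoulli; assumes values in (0, 1)
    return sum(a * math.log((a + eps) / (b + eps))
               + (1 - a) * math.log((1 - a + eps) / (1 - b + eps))
               for a, b in zip(x, x_hat))

x, x_hat = [0.9, 0.1, 0.5], [0.8, 0.2, 0.5]
print(squared_euclidean(x, x_hat), elementwise_kl(x, x_hat))
```

Note that both losses weight every dimension identically: an error on a rare but polarity-bearing word costs no more than an error on a stopword, which is exactly the indiscriminateness the text criticizes.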

Second, this approach can introduce bias, since the autoencoder will typically devote its capacity to reconstructing frequent words and ignore rarer ones. This is especially harmful when the words most predictive of the class labels (e.g., sentiment words) are themselves infrequent.

Natural language tends to follow a power-law distribution of word occurrences: a small number of frequent word types accounts for the majority of all occurrences, while the bulk of the vocabulary consists of rare words. For sentiment classification this is especially critical, since words with clear polarities make up only a small fraction of the vocabulary.
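The skew this paragraph describes can be made concrete with an idealized Zipfian vocabulary, where the r-th most frequent word type has frequency proportional to 1/r (the vocabulary size and cutoff below are arbitrary choices, not figures from the source):

```python
# Idealized Zipfian vocabulary: the r-th most frequent word type has
# frequency proportional to 1/r (vocabulary size is an arbitrary choice).
vocab_size = 10000
freqs = [1.0 / r for r in range(1, vocab_size + 1)]
total = sum(freqs)

# share of all word occurrences covered by the 100 most frequent types
top100 = sum(freqs[:100]) / total
print(round(top100, 2))   # 1% of the vocabulary covers over half of the text
```

A reconstruction loss dominated by raw frequency will therefore spend most of its effort on this small head of the distribution.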

Market Forecast

The market for Semisupervised Autoencoder for Sentiment Analysis is anticipated to reach $800 million by 2021, growing at a compound annual growth rate (CAGR) of 16% during the forecast period. The primary drivers of this market include the growing popularity of social media and an influx of companies offering sentiment analytics solutions. The market also benefits from the massive amounts of unstructured data produced daily, for which sentiment analysis is a crucial processing step.

As the market for sentiment analysis expands, so too will demand for advanced machine learning algorithms that can extract meaningful information from unstructured text, driven by an uptick in demand for big data analytics solutions. Notable players include Amazon, Google, Microsoft and Facebook. The most significant challenge facing this space is scaling existing machine learning algorithms to handle big data volumes without sacrificing accuracy or speed, though many firms have already developed solutions that meet both requirements.

The Research Foundation for the State University of New York invention works as follows

A method for modeling data. It involves: training an objective function of a linear classifier on a set of labeled data to derive a set of classifier weights; defining a posterior probability distribution on the set of classifier weights of the linear classifier; approximating an autoencoder's marginalized loss function as a Bregman divergence, based on the posterior probability distribution of the classifier weights learned from the linear classifier; and classifying unlabeled data using the autoencoder according to the marginalized loss function.
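The central construction in the claim is the Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. A generic sketch follows (function names are mine; the squared-norm case shown is only the simplest instance, not the patent's specific divergence):

```python
def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# phi(v) = ||v||^2 recovers the squared Euclidean distance
phi = lambda v: sum(a * a for a in v)
grad_phi = lambda v: [2.0 * a for a in v]

x, y = [1.0, 2.0], [0.0, 1.0]
print(bregman(phi, grad_phi, x, y))   # equals ||x - y||^2 = 2.0
```

Choosing phi differently reweights the reconstruction, which is how a Bregman divergence can emphasize label-predictive dimensions instead of treating all of them uniformly.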

Background for Semisupervised Autoencoder for Sentiment Analysis

Machine learning commonly uses the Bag of Words (BoW) representation for documents, which reduces text of arbitrary length to a fixed-length vector. Despite its simplicity, BoW remains the most popular representation for text classification. Much research has been devoted to learning useful representations of textual data (Turney and Pantel 2010; Blei, Ng, and Jordan 2003; Deerwester et al. 1990; Mikolov et al. 2013; Glorot, Bordes, and Bengio 2011). Co-occurrence patterns can be used to create a low-dimensional vector that represents a document in a compact, meaningful way, and this representation can then be used for other tasks such as topic visualization or information retrieval. Among the most widely used methods for learning task-dependent representations of textual data are autoencoders (Bengio 2009), which can incorporate label information in the objective function so that the learned representation is directly linked to the task of interest.
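A BoW vectorizer that maps variable-length text to fixed-length count vectors can be sketched in a few lines (a minimal illustration, not the patent's implementation):

```python
from collections import Counter

def bow_vectors(docs):
    """Map variable-length documents to fixed-length count vectors."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        v = [0] * len(vocab)
        for tok, n in Counter(doc.lower().split()).items():
            v[index[tok]] = n
        vectors.append(v)
    return vocab, vectors

vocab, vecs = bow_vectors(["the movie was great", "the plot was bad"])
print(vocab)
print(vecs)
```

The fixed vector length equals the vocabulary size, which is exactly why the hidden layer of an autoencoder over BoW input must grow with the vocabulary, as discussed below.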

See, U.S. Pat. Nos. 5,116,061 and 6,507,829; and U.S. Patent Publication Nos. 20130216228; 20120233779; 20120242783; 20120233647; 20120245647; 20120258736; 20120232728; 20120243637; 20120244848; 20120255657; among others.

Sentiment Analysis (SA) is a specific type of task in text-mining. It’s an example of how the technology can be applied. One special case of SA is a binary classification problem. This is where text can be classified as either positive or negative. This is largely due in part to the rise of social media, which allows people to voice their opinions on certain topics. It is also easy to find cleanly labeled data on SA by crawling reviews from sites like Amazon or IMDB. SA is a great benchmark for evaluating features and text classification models. The technology isn’t limited to this example.

See, U.S. Pat. Nos. 7,523,085 and 7,363,239; and U.S. Patent Publication Nos. 20120215645; 20120220542; 20120216175; 20130227287; 20130292879; among numerous others.

Autoencoders are a key building block of Deep Learning (Bengio 2009). They learn features by reconstructing their inputs according to a loss function; in a neural network implementation, the hidden layer is taken as the learned feature. Although it is easy to obtain good reconstructions with plain autoencoders (Bengio 2009; Vincent et al. 2008; Rifai et al. 2011b), little attention has been paid to the choice of loss function for modeling textual data. Common loss functions, such as squared Euclidean distance and element-wise KL divergence, attempt to reconstruct every dimension of the input independently and indiscriminately. This is not the best approach when text classification is the goal, for two reasons. First, it is well known that word occurrences in natural language follow a power law, meaning that a few words account for the majority of word occurrences. As a direct result, the autoencoder concentrates its effort on the most common words and ignores rarer ones, which can lead to poor performance when the class distribution cannot be captured by frequent words alone. This is a problem for sentiment analysis, where only a fraction of the vocabulary carries useful polarity. Reconstructing irrelevant words such as 'actor' or 'movie' is unlikely to help learn representations useful for classifying the sentiment of movie reviews. Second, it is expensive to explicitly reconstruct every word in the input text, because the latent representation must then capture every aspect, even irrelevant ones, of the semantic space carried by the words. The vocabulary size can easily reach tens of thousands even for a small dataset, so the hidden layer must be very large to achieve a reasonable reconstruction. This wastes much of the model's capacity and makes it difficult to scale to larger problems.

In fact, the reasoning above applies to unsupervised learning methods in general, and addressing it is one of the most important problems in learning task-specific representations.


A bias can be introduced by the labelling process in any labelled data set. This bias may be a priori (labels created across the data set with an intrinsic bias) or ex post facto (data with a particular bias selected from a larger data set, which may itself be biased or objective).

For instance, subjective user feedback about a datum would generally lead to an a priori biased label dataset, representing that user's subjective responses, which may differ from those of others. The biases need not be specific to one individual; they can be representative of a group, family, community, demographic group or sex. Sometimes the user characteristics or labels are known beforehand, and the data are labelled according to the characteristics of their source. This is an example of predetermined classification: the data can be separated or labelled with the classification, and then selected based on the original classification or characteristics.

Alternately, the data may not be classified or grouped according to any predetermined bias or source; the information about the user/source/bias can instead be stored as additional parameters of an unsegregated data set. In this case a larger pool of data is available for analysis, and a subsequent process is employed to select or prepare the data for use.

In a multiparametric user/source/bias space, the data may be clustered using a statistical clustering algorithm in order to automatically classify the user/source/bias, and perhaps the data content itself, either according to an automatically optimized classification/segmentation or according to an arbitrary, not-predetermined classification applied at any time, including after collection of the labelled data. Data from other users/sources/biases can be used to improve statistical reliability and distinctive power. If the user/source/bias is antithetical, that data may be weighted accordingly and combined with the biased data to improve decision-making. The weighting need not be limited to opposites: in a multiparametric classification, each axis can have its own independent variation.

In some cases, the active learning process is not limited to preprocessing data for later use with an autoencoder. It can be integrated with the user/source/bias classification and the potentially rich classification data carried through the analysis, for example as additional dimensions or degrees of freedom.

Data clustering refers to the process of grouping data points with similar characteristics. Automated processes define a cost function and a distance function, and data are then classified according to their relationship to defined or automatically discovered clusters. Clustering is therefore an automated decision-making problem, and there are many different approaches; the science of clustering is well developed. Once the distance or cost function has been defined, it serves as the clustering criterion, and clustering becomes an optimization problem. The result may be imperfect, and different optimizers may produce different optimized results. For large data sets, completely evaluating a single optimum state can be difficult, which makes the optimization process susceptible to bias, error, ambiguity and other artifacts.
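The optimization view of clustering described above can be illustrated with a minimal k-means sketch, which alternates assignment and centroid update to reduce total squared distance to the nearest centroid (the toy data and parameters are mine, for illustration only):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: alternate assignment and centroid update,
    reducing total squared distance to the nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
cents = sorted(kmeans(pts, 2))
print(cents)
```

The distance function here (squared Euclidean) is the clustering criterion; swapping it for another distance or cost function changes which optimum the procedure converges to, which is the sensitivity the paragraph above describes.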

In some cases the data distribution is continuous, and the boundaries between clusters are sensitive to subjective considerations or to specific characteristics of the clustering technique used; in other cases, membership of a data point in a specific cluster is relatively insensitive to the clustering method. When the clustering results hinge on the marginal data, however, the quality of the clustering is critical to the success of the system.

Clustering can reduce a data set's dimensionality by treating each cluster as a degree of freedom and each point as a distance from the cluster's centroid or other characteristic exemplar. In a simple scheme this distance is a scalar; in more flexible but more complex systems, it may be a vector. A data set of 10,000 points could have up to 10,000 degrees of freedom, with each point the centroid of its own cluster; if the set is divided into 100 clusters of 100 points each, the degrees of freedom are reduced to 100, with the remaining differences expressed as distances from the cluster definitions. Cluster analysis groups data objects based only on information found in, or about, the data: objects within a group should be related to each other and distinct from, or unrelated to, the objects in other groups. The greater the homogeneity within groups and the greater the differences between them, the better and more distinct the clustering.

It is important to note that in a text or semantic application these degrees of freedom are typically linked to words, phrases, and other information. A labelled-data application adds external and/or explicit labels to the data set, which may also include information about the source or origin. Although labelled data is usually static, if the source, origin, or other information changes after labelling, that information can be added to the labelled dataset.

In certain cases, the dimensionality may be reduced to one: all of the dimensional variation of the data set is reduced to a single distance via a distance function, amounting to a binary classification. Such a distance function can be very useful, since it permits dimensionless comparison across the whole data set and can be tuned to suit different constraints. In some types of clustering, a distance function is defined per cluster and applied to the whole data set; in others, a single distance function is defined for all the data and cannot easily be modified per cluster. Clustering algorithms intended for large data sets should avoid interactive distance functions, which change depending on the data. Iterative clustering processes produce a preliminary clustering and then seek to improve it where possible. Complex data sets can have relationships among data points that impose a penalty, reward, or cost when the points are clustered in certain ways: the algorithm may have to split data points that have an affinity, or group together points with a negative affinity, which makes optimization more difficult.

A semantic database can be described as a collection of documents containing words or phrases. Words can be ambiguous: for example, "apple" could represent a fruit, a record company, or a musician. To make the database useful, the multiple meanings and contexts must be resolved. An automated process may extract the relevant information to separate the meanings, grouping documents according to their context. As the data set grows, this automated process may become difficult, and in some cases the available information is insufficient for precise automated clustering. A human, however, can often identify a context by making an inference which, while subject to error and bias, may still be very useful.

In supervised classification, the mapping from a set of data vectors to a finite set of discrete class labels is modelled by a mathematical function with a vector of adjustable parameters. An inductive learning algorithm, also known as an inducer, determines the values of these adjustable parameters by minimizing an empirical risk function over a finite set of input data; when the inducer converges, an induced classifier is obtained. Unsupervised classification, also known as exploratory data analysis or clustering, uses no labeled data: it separates a finite set of unlabeled data into a finite, discrete set of "natural" hidden data structures that more accurately describe unobserved samples generated from the same probability distribution. Semi-supervised classification labels a portion of the data, or uses sparse feedback, to guide the process.

Non-predictive clustering is subjective in nature: it seeks to ensure that objects within a particular cluster are more similar to one another than to objects in other clusters. Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If the goal is to create meaningful groups, the clusters should capture the "natural" structure of the data; in other cases, cluster analysis is merely a useful starting point for other purposes, such as data summarization. This raises the question of what the natural structure of the data is, and how we can tell when a clustering is untrue. As discussed above, labels can be biased and may embody different truths, or a range of truths.


Regression and principal component analysis (PCA) are data analysis techniques with time or space complexity of O(m²) or higher (where m is the number of objects), and are therefore impractical for large data sets. Instead of applying the algorithm to the entire data set, it can be applied to a reduced data set containing only cluster prototypes. Depending on the type of analysis and the number of prototypes used, the results may be comparable to those that would have been obtained from all of the data. The remaining data can then be assigned to the clusters based on distance functions.
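The savings from the prototype trick can be quantified with a back-of-the-envelope count of pairwise computations (the sizes below are illustrative choices, not figures from the source):

```python
# Pairwise analyses on m objects cost O(m^2); running them on k << m
# cluster prototypes instead costs only O(k^2).  Illustrative sizes:
m, k = 100_000, 100
pairs_full = m * (m - 1) // 2    # pairwise computations on the full data set
pairs_proto = k * (k - 1) // 2   # pairwise computations on prototypes only
print(pairs_full, pairs_proto, pairs_full // pairs_proto)
```

With a thousandfold reduction in object count, the pairwise workload shrinks by roughly a factor of a million, which is why prototype-based analysis scales where direct O(m²) methods do not.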

