Manyam Mallela

San Francisco, California, United States
4K followers 500+ connections

About

Seasoned technical leader and entrepreneur with twenty plus years of hard earned…

Experience

  • Blueshift

    San Francisco Bay Area

  • -

    San Francisco Bay Area

  • -

    San Francisco Bay Area

  • -

    San Francisco Bay Area

Publications

  • Information Theoretic Co-Clustering

    International Conference on Knowledge Discovery and Data Mining

    Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high dimensionality.

    Other authors
    • Inderjit Dhillon
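
The objective described in the abstract above can be illustrated with a minimal sketch. It only evaluates the mutual information preserved by a given co-clustering; it is not the paper's algorithm. The count table, the cluster labels, and all function names are hypothetical, chosen for illustration.

```python
import math

def normalize(counts):
    """Turn a contingency table of counts into an empirical joint distribution."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as a list of rows."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi

def coclustered_joint(joint, row_labels, col_labels, k, l):
    """Aggregate probability mass into a k x l table of (row cluster, column cluster)."""
    agg = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += p
    return agg

# Hypothetical 4x4 word-document count table with two obvious blocks.
counts = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 4, 6],
    [0, 0, 6, 4],
]
p = normalize(counts)

# Two candidate co-clusterings with 2 row clusters and 2 column clusters.
good = coclustered_joint(p, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = coclustered_joint(p, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

mi_full = mutual_information(p)
mi_good = mutual_information(good)
mi_bad = mutual_information(bad)
print(f"original: {mi_full:.4f}  block-aligned: {mi_good:.4f}  misaligned: {mi_bad:.4f}")
```

Clustering can only discard information, so both candidates score at or below the original table; the block-aligned assignment preserves far more mutual information than the misaligned one, which is the quantity the abstract says the optimal co-clustering maximizes.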
  • Enhanced Word Clustering for Hierarchical Text Classification

    SIG KDD 2002

    In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering" of features has been found to improve classification accuracy over feature selection, especially with a small number of features. However, the existing clustering techniques are agglomerative in nature and result in (i) sub-optimal word clusters and (ii) high computational cost. To explicitly capture the optimality of word clusters in an information-theoretic framework, we first derive a global criterion for feature clustering. We then present a fast, divisive algorithm that monotonically decreases this objective function value, thus converging to a local minimum. We show that our algorithm minimizes the "within-cluster Jensen-Shannon divergence" while simultaneously maximizing the "between-cluster Jensen-Shannon divergence". Compared with the previously proposed agglomerative strategies, our divisive algorithm achieves higher classification accuracy, especially with a small number of features. We further show that feature clustering is an effective technique for building smaller class models in hierarchical classification. We present detailed experimental results using Naive Bayes and Support Vector Machines on the 20 Newsgroups data set and a 3-level hierarchy of HTML documents collected from the Dmoz Open Directory.
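
The "within-cluster Jensen-Shannon divergence" named in the abstract above can be sketched as follows. This is a minimal illustration of the weighted Jensen-Shannon divergence only, not the paper's clustering algorithm. The words, their class distributions, and the equal weights are hypothetical; each word is assumed to be represented by a distribution over two classes.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(dists, weights):
    """Weighted Jensen-Shannon divergence: sum_i w_i * KL(p_i || weighted mean)."""
    total = sum(weights)
    w = [x / total for x in weights]
    size = len(dists[0])
    mean = [sum(wi * d[k] for wi, d in zip(w, dists)) for k in range(size)]
    # Skip zero-weight members: they do not shape the mean and contribute nothing.
    return sum(wi * kl(d, mean) for wi, d in zip(w, dists) if wi > 0)

# Hypothetical class distributions p(class | word) over (sports, politics).
words = {
    "goal": [0.9, 0.1],
    "match": [0.8, 0.2],
    "senate": [0.1, 0.9],
}

tight = js_divergence([words["goal"], words["match"]], [1, 1])
mixed = js_divergence([words["goal"], words["senate"]], [1, 1])
print(f"goal+match: {tight:.4f} bits  goal+senate: {mixed:.4f} bits")
```

Words with similar class distributions yield a small divergence when grouped, while grouping "goal" with "senate" yields a large one, so a clustering that keeps this within-cluster quantity low groups words that carry similar class information.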

  • Using Memex to archive and mine community Web browsing experience

    Proceedings of the 9th International World Wide Web Conference (WWW)

Patents

  • Event-based personalized merchandising schemes and applications in messaging

    Issued US 9,779,443

    Systems and methods for designing personalized merchandising schemes that are responsive to events received in an event stream may provide, for example, one or more graphical user interfaces by which to receive parameters for a personalized merchandising scheme from a designer. Messages may be selected for delivery to a user responsively to an event according to a defined personalized merchandising scheme.

  • Delivering search results

    Issued US 8,176,041

    Delivering search results is disclosed. A search term is obtained and categories are determined. Results specific to each category are obtained and ranked based on a criterion that is specific to each category. The results are ranked based at least in part on a topic dependent score and may also be ranked in part on a topic independent score.

