
Research Paper Of Data Mining

Data Mining and Machine Learning Papers

Below are select papers on a variety of topics. The list is not meant to be exhaustive. The papers found on this page either relate to my research interests or are used when I teach courses on machine learning or data mining.

  • General (articles)
    • Data Mining and Statistics: What's the Connection?
    • Data Mining: Statistics and More?, D. Hand, American Statistician, 52(2):112-118.
    • Data Mining, G. Weiss and B. Davison, in Handbook of Technology Management, John Wiley and Sons, expected 2010.
    • From Data Mining to Knowledge Discovery in Databases, U. Fayyad, G. Piatetsky-Shapiro & P. Smyth, AI Magazine, 17(3):37-54, Fall 1996.
    • Mining Business Databases, Communications of the ACM, 39(11): 42-48.
    • 10 Challenging Problems in Data Mining Research, Q. Yang and X. Wu, International Journal of Information Technology & Decision Making, Vol. 5, No. 4, 2006, 597-604. (slides)
  • General (short news articles)
  • General Data Mining Methods and Algorithms
    • Top 10 Algorithms in Data Mining, X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. Zhou, M. Steinbach, D. J. Hand, D. Steinberg, Knowl Inf Syst (2008) 14:1-37.
    • Induction of Decision Trees, R. Quinlan, Machine Learning, 1(1):81-106, 1986.
  • Web and Link Mining
    • The Pagerank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani, T. Winograd, Technical Report, Stanford University, 1999.
    • The Structure and Function of Complex Networks, M. E. J. Newman, SIAM Review, 2003, 45, 167-256.
    • Link Mining: A New Data Mining Challenge, L. Getoor, SIGKDD Explorations, 2003, 5(1), 84-89.
    • Link Mining: A Survey, L. Getoor, SIGKDD Explorations, 2005, 7(2), 3-12.
  • Semi-supervised Learning
    • Semi-Supervised Learning Literature Survey, X. Zhu, Computer Sciences TR 1530, University of Wisconsin -- Madison.
    • Introduction to Semi-Supervised Learning, in Semi-Supervised Learning (Chapter 1) O. Chapelle, B. Scholkopf, A. Zien (eds.), MIT Press, 2006. (Fordham's library has online access to the entire text)
    • Learning with Labeled and Unlabeled Data, M. Seeger, University of Edinburgh (unpublished), 2002.
    • Person Identification in Webcam Images: An Application of Semi-Supervised Learning, M. Balcan, A. Blum, P. Choi, J. Lafferty, B. Pantano, M. Rwebangira, X. Zhu, Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data, 2005.
    • Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains, N. Chawla, G. Karakoulas, Journal of Artificial Intelligence Research, 23:331-366, 2005.
    • Text Classification from Labeled and Unlabeled Documents using EM, K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Machine Learning, 39, 103-134, 2000.
    • Self-taught Learning: Transfer Learning from Unlabeled Data, R. Raina, A. Battle, H. Lee, B. Packer, A. Ng, in Proceedings of the 24th International Conference on Machine Learning, 2007.
    • An iterative algorithm for extending learners to a semisupervised setting, M. Culp, G. Michailidis, 2007 Joint Statistical Meetings (JSM), 2007
  • Partially-Supervised Learning / Learning with Uncertain Class Labels
    • Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers, V. Sheng, F. Provost, P. Ipeirotis, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
    • Logistic Regression for Partial Labels, in 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Volume III, pp. 1935-1941, 2002.
    • Classification with Partial labels, N. Nguyen, R. Caruana, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
    • Imprecise and Uncertain Labelling: A Solution based on Mixture Model and Belief Functions, E. Come, 2008 (powerpoint slides).
    • Induction of Decision Trees from Partially Classified Data Using Belief Functions, M. Bjanger, Norwegian University of Science and Technology, 2000.
    • Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth, P. Smyth, M. Burl, U. Fayyad, P. Perona, KDD Workshop 1994, AAAI Technical Report WS-94-03, pp. 109-120, 1994.
  • Recommender Systems
  • Rarity and Class Imbalance

    General resources available on this topic:

    Papers

    • A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data, G. Batista, R. Prati, and M. Monard, SIGKDD Explorations, 6(1):20-29, 2004.
    • Class Imbalance versus Small Disjuncts, T. Jo and N. Japkowicz, SIGKDD Explorations, 6(1): 40-49, 2004.
    • Extreme Re-balancing for SVMs: a Case Study, B. Raskutti and A. Kowalczyk, SIGKDD Explorations, 6(1):60-69, 2004.
    • A Multiple Resampling Method for Learning from Imbalanced Data Sets, A. Estabrooks, T. Jo, and N. Japkowicz, in Computational Intelligence, 20(1), 2004.
    • SMOTE: Synthetic Minority Over-sampling Technique, N. Chawla, K. Boyer, L. Hall, and W. Kegelmeyer, Journal of Artificial Intelligence Research, 16:321-357.
    • Generative Oversampling for Mining Imbalanced Datasets, A. Liu, J. Ghosh, and C. Martin, Third International Conference on Data Mining (DMIN-07), 66-72.
    • Learning from Little: Comparison of Classifiers Given Little Training, G. Forman and I. Cohen, in 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 161-172, 2004.
    • Issues in Mining Imbalanced Data Sets - A Review Paper, S. Visa and A. Ralescu, in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005.
    • Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets, N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st International Workshop on Utility-based Data Mining, 24-33, 2005.
    • C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling, C. Drummond and R. Holte, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.
    • C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure, N. Chawla, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.
    • Class Imbalances: Are we Focusing on the Right Issue?, N. Japkowicz, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.
    • Learning when Data Sets are Imbalanced and When Costs are Unequal and Unknown, M. Maloof, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.
    • Uncertainty Sampling Methods for One-class Classifiers, P. Juszczak and R. Duin, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.
  • Active Learning
    • Improving Generalization with Active Learning, D. Cohn, L. Atlas, and R. Ladner, Machine Learning 15(2), 201-221, May 1994.
    • On Active Learning for Data Acquisition, Z. Zheng and B. Padmanabhan, In Proc. of IEEE Intl. Conf. on Data Mining, 2002.
    • Active Sampling for Class Probability Estimation and Ranking, M. Saar-Tsechansky and F. Provost, Machine Learning 54:2 2004, 153-178.
    • The Learning-Curve Sampling Method Applied to Model-Based Clustering, C. Meek, B. Thiesson, and D. Heckerman, Journal of Machine Learning Research 2:397-418, 2002.
    • Active Sampling for Feature Selection, S. Veeramachaneni and P. Avesani, Third IEEE Conference on Data Mining, 2003.
    • Heterogeneous Uncertainty Sampling for Supervised Learning, D. Lewis and J. Catlett, In Proceedings of the 11th International Conference on Machine Learning, 148-156, 1994.
    • Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction, G. Weiss and F. Provost, Journal of Artificial Intelligence Research, 19:315-354, 2003.
    • Active Learning using Adaptive Resampling, KDD 2000, 91-98.
  • Cost-Sensitive Learning

Research Papers

Following are PostScript files containing papers by the research group of Vipin Kumar organized by topics.

For a complete list of publications see vita.


Data Mining

  • Anomaly Detection
    • Anomaly Detection: A Survey (2009) Varun Chandola, Arindam Banerjee, and Vipin Kumar, ACM Computing Surveys, Vol. 41(3), Article 15, July 2009.
    • A Comparative Evaluation of Anomaly Detection Techniques for Sequence Data (2008) Varun Chandola, Varun Mithal, and Vipin Kumar, To appear in Proceedings of International Conference on Data Mining (ICDM), December 2008.

  • Principles
    • A Framework for Analyzing Categorical Data (2009). Varun Chandola, Shyam Boriah, and Vipin Kumar, In Proceedings of SIAM Data Mining Conference, April 2009, Sparks, NV.
    • Similarity Measures for Categorical Data: A Comparative Evaluation (2008). Shyam Boriah, Varun Chandola and Vipin Kumar, In Proceedings of SIAM Data Mining Conference, April 2008, Atlanta, GA.
    • Summarization - Compressing Data into an Informative Representation (2006). Varun Chandola and Vipin Kumar. Knowledge and Information Systems (KAIS), Vol. 12(3), 2007.
    • Summarization - Compressing Data into an Informative Representation (2005). Varun Chandola and Vipin Kumar. Proceedings of 5th International Conference on Data Mining (ICDM), 2005
    • Generalizing the Notion of Confidence (2006). Michael Steinbach and Vipin Kumar. To appear in Knowledge and Information Systems (KAIS).
    • Support Envelopes: A Technique for Exploring the Structure of Association Patterns (2004). Michael Steinbach, Pang-Ning Tan, and Vipin Kumar, in Proc of the Tenth ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (SIGKDD).
    • Generalizing the Notion of Support (2004). Michael Steinbach, Pang-Ning Tan, Hui Xiong, and Vipin Kumar, in Proc of the Tenth ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (SIGKDD).
    • Privacy Leakage in Multi-relational Databases via Pattern based Semi-supervised Learning (2004). Hui Xiong, Michael Steinbach, and Vipin Kumar, University of Minnesota Technical Report 04-23.
    • Selecting the Right Interestingness Measure for Association Patterns (2002). Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava, Proc of the Eighth ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (SIGKDD-2002).
    • A Universal Formulation of Sequential Patterns (2001). Mahesh Joshi, George Karypis, and Vipin Kumar, KDD 2001 workshop on Temporal Data Mining (Technical Report # 99-021).
    • Interestingness Measures for Association Patterns: A Perspective (2000). Pang-Ning Tan and Vipin Kumar, KDD 2000 Workshop on Postprocessing in Machine Learning and Data Mining (Technical Report # TR00-036).

  • Clustering
    • HICAP: Hierarchical Clustering with Pattern Preservation (2004). Hui Xiong, Michael Steinbach, Pang-Ning Tan, and Vipin Kumar, In Proc. of the Fourth SIAM International Conf. on Data Mining (SDM'04), Florida, USA, 2004.
    • Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data (2003). Levent Ertoz, Michael Steinbach, and Vipin Kumar, SIAM International Conference on Data Mining (SDM '03)
      (Technical Report version).
    • Finding Topics in Collections of Documents: A Shared Nearest Neighbor Approach (2003). Levent Ertoz, Michael Steinbach, and Vipin Kumar, Clustering and Information Retrieval, forthcoming 2003, Kluwer Academic Publishers.
    • Challenges of Clustering High Dimensional Data (2003). Michael Steinbach, Levent Ertoz, and Vipin Kumar, New Vistas in Statistical Physics -- Applications in Econophysics, Bioinformatics, and Pattern Recognition, forthcoming 2003, Springer-Verlag.
    • A New Shared Nearest Neighbor Clustering Algorithm and its Applications (2002). Levent Ertoz, Michael Steinbach, and Vipin Kumar, Workshop on Clustering High Dimensional Data and its Applications at 2nd SIAM International Conference on Data Mining (2002).
    • CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling (1999). George Karypis, Eui-Hong (Sam) Han, and Vipin Kumar, IEEE Computer: Special Issue on Data Analysis and Mining, vol. 32, no. 8, pp 68-75, August 1999.
    • Multilevel Refinement for Hierarchical Clustering (1999). George Karypis, Eui-Hong (Sam) Han, and Vipin Kumar, Technical Report # 99-020.
    • Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results (1998). Eui-Hong (Sam) Han, George Karypis, Vipin Kumar and B. Mobasher, Bulletin of the Technical Committee on Data Engineering, Vol. 21, No. 1, March 1998.
    • Clustering In A High-Dimensional Space Using Hypergraph Models (1997). Eui-Hong (Sam) Han, George Karypis, Vipin Kumar and Bamshad Mobasher, Technical Report # 97-019.
    • Clustering Based On Association Rule Hypergraphs (1997). Eui-Hong (Sam) Han, George Karypis, Vipin Kumar and Bamshad Mobasher, SIGMOD'97 Workshop on Research Issues on Data Mining and Knowledge Discovery.

  • Associations/Correlations/Co-locations
    • Hyperclique Pattern Discovery (2006). Hui Xiong, Pang-Ning Tan, and Vipin Kumar, Data Mining and Knowledge Discovery (DMKD), accepted for publication as a regular paper, 2006.
    • TAPER: A Two-Step Approach for All-strong-pairs Correlation Query in Large Databases (2006). Hui Xiong, Shashi Shekhar, Pang-Ning Tan, and Vipin Kumar, IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 18, No. 4, pp. 493-508, 2006.
    • Enhancing Data Analysis with Noise Removal (2006). Hui Xiong, Gaurav Pandey, Michael Steinbach, Vipin Kumar, IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 18, No. 3, pp. 304-319, March 2006.
    • A Framework for Discovering Co-location Patterns in Data Sets with Extended Spatial Objects (2004). Hui Xiong, Shashi Shekhar, Yan Huang, Vipin Kumar, X. Ma, J. Yoo, In Proc. 2004 SIAM International Conf. on Data Mining (SDM'04), Florida, USA, 2004.
    • Exploiting a Support-based Upper Bound of Pearson's Correlation Coefficient for Efficiently Identifying Strongly Correlated Pairs (2004), Hui Xiong, Shashi Shekhar, Pang-Ning Tan, Vipin Kumar, in Proc. of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, USA, 2004.
    • Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution, Hui Xiong, Pang-Ning Tan, and Vipin Kumar, In Proc. of the Third IEEE International Conference on Data Mining (ICDM'03), pp. 387-394, Melbourne, Florida, USA, 2003. (Also in Technical Report TR-03-006 Mining Hyperclique Patterns with Confidence Pruning, January 2003)
    • Mining Indirect Associations in Web Data (2001). Pang-Ning Tan, and Vipin Kumar, WebKDD 2001: Mining Log Data Across All Customer Touch Points.
    • Using SAS for Mining Indirect Associations in Data (2001). Pang-Ning Tan, Vipin Kumar, and Harumi Kuno, Western Users of SAS Software Conference.
    • Indirect Association: Mining Higher Order Dependencies in Data (2000). Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava, PKDD 2000 (Technical Report # TR00-037).
    • Min-Apriori: An Algorithm for Finding Association Rules in Data with Continuous Attributes (1997). Eui-Hong (Sam) Han, George Karypis and Vipin Kumar.

  • Classification and Predictive Models for Rare Classes
    • RBA: An Integrated Framework for Regression Based on Association Rules (2004). Aysel Ozgur, Pang-Ning Tan, and Vipin Kumar, 2004 SIAM International Conf. on Data Mining (SDM'04), Florida, USA, 2004.
    • Predicting Rare Classes: Comparing Two-Phase Rule Induction to Cost-Sensitive Boosting (2002) Mahesh V. Joshi, Ramesh C. Agrawal, and Vipin Kumar, Sixth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02).
    • Predicting Rare Classes: Can Boosting Make Any Weak Learner Strong? (2002) Mahesh V. Joshi, Ramesh C. Agrawal, and Vipin Kumar, Proc of the Eighth ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD-2002).
    • Evaluating Boosting Algorithms to Classify Rare Classes: Comparison and Improvements (2001) Mahesh V. Joshi, Vipin Kumar, and Ramesh C. Agrawal, First IEEE International Conference on Data Mining.
    • Mining Needles in a Haystack: Classifying Rare Classes via Two-Phase Rule Induction (2001) Mahesh V. Joshi, Ramesh C. Agrawal, and Vipin Kumar, SIGMOD'01 conference on Management of Data.
    • Automated Morphological Classification of Galaxies and the Morphology-Density Relation (1999). J.R. Kriessler, E.H. Han, S.C. Odewahn, and T.C. Beers, Abstract in the 193rd Meeting of the American Astronomical Society.

  • Parallel and Distributed Data Mining
    • Parallel and Distributed Computing for Cybersecurity (2005), Vipin Kumar. Invited Article: Security, IEEE Distributed Systems Online, Vol 6, No 10
    • High Performance Data Mining (2002). Vipin Kumar, Mahesh V. Joshi, Eui-Hong (Sam) Han, Pang-Ning Tan, and Michael Steinbach, "High Performance Computing for Computational Science - VECPAR 2002", Palma, J. M.L.M., Dongarra, J., Hernández, V., and Sousa, A. A. (Eds.), 5th International Conference, Porto, Portugal, June 26-28, 2002. Selected Papers and Invited Talks.
    • Scalable Parallel Data Mining for Association Rules (2000). Eui-Hong (Sam) Han, George Karypis and Vipin Kumar, IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, May/June 2000.
    • Parallel Algorithms for Data Mining (2000). Mahesh V. Joshi, Eui-Hong (Sam) Han, George Karypis and Vipin Kumar, Editors: J. Dongarra, I. Foster, G. Fox, K. Kennedy, L. Torczon, and A. White, CRPC Parallel Computing Handbook, Morgan Kaufmann, 2000.
    • Efficient Parallel Algorithms for Mining Associations (2000). Mahesh V. Joshi, Eui-Hong (Sam) Han, George Karypis and Vipin Kumar, Editors: M. Zaki and C.-T. Ho, Large-scale Parallel and Distributed Data Mining, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence (LNCS/LNAI), vol. 1759, Springer-Verlag, 2000.
    • Parallel Formulations of Decision-Tree Classification Algorithms (1999). Anurag Srivastava, Eui-Hong (Sam) Han, Vipin Kumar, and Vineet Singh, Data Mining and Knowledge Discovery: An International Journal, vol. 3, no. 3, pp 237-261, September 1999.
    • Parallel Formulations of Decision-Tree Classification Algorithms (1998). Anurag Srivastava, Eui-Hong (Sam) Han, Vipin Kumar, and Vineet Singh, Proc. of the 1998 International Conference on Parallel Processing
    • Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers (1998). Anurag Srivastava, Eui-Hong (Sam) Han, Vipin Kumar, and Vineet Singh, IPPS'98 Workshop on High Performance Data Mining.
    • ScalParC: A New Scalable and Efficient Parallel Classification Algorithm for Mining Large Datasets (1998). Mahesh V. Joshi, George Karypis and Vipin Kumar, Proc. of 1998 International Parallel Processing Symposium, April 1998.
    • Scalable Parallel Data Mining for Association Rules (1997). Eui-Hong (Sam) Han, George Karypis and Vipin Kumar, Proc. of 1997 ACM-SIGMOD International Conference on Management of Data, May 1997.
    • An Efficient, Scalable, Parallel Classifier for Data Mining (1996). Anurag Srivastava, Vineet Singh, Eui-Hong (Sam) Han and Vipin Kumar.

  • Spatio-Temporal patterns in Climate Data
    • Discovery of Climate Indices Using Clustering (2003). Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Steven Klooster, and Christopher Potter, to appear in KDD 2003.
    • Global Teleconnections of Ocean Climate to Terrestrial Carbon Flux (2003). Christopher Potter, Steven Klooster, Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Shashi Shekhar, Ranga Myneni, Ramakrishna Nemani, to appear in J. Geophysical Research-Atmospheres.
    • Exploiting Spatial Autocorrelation to Efficiently Process Correlation-Based Similarity Queries (2003). Pusheng Zhang, Yan Huang, Shashi Shekhar and Vipin Kumar, to appear in the Proc. of the 8th Int'l Symposium on Spatial and Temporal Databases (SSTD '03), July 25-27, 2003, Santorini Island, Greece.
    • Major Disturbance Events in Terrestrial Ecosystems Detected using Global Satellite Data Sets (2003). Christopher Potter, Pang-Ning Tan, Michael Steinbach, Steven Klooster, Vipin Kumar, Ranga Myneni, Vanessa Genovese accepted for Global Change Biology.
    • Continental scale comparisons of terrestrial carbon sinks estimated from satellite data and ecosystem modeling 1982-98 (2003). Christopher Potter, Steven Klooster, Ranga Myneni, Vanessa Genovese, Pang-Ning Tan, Vipin Kumar, Global and Planetary Change (in press).
    • Correlation Analysis of Spatial Time Series Datasets: A Filter-And-Refine Approach (2003). Pusheng Zhang, Yan Huang, Shashi Shekhar, and Vipin Kumar, Proc of the Seventh Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD03), Seoul, Korea.
    • Temporal Data Mining for the Discovery and Analysis of Ocean Climate Indices (2002). Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Steven Klooster, and Christopher Potter, accepted for KDD Workshop on Temporal Data Mining.
    • Data Mining for the Discovery of Ocean Climate Indices (2002). Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Steven Klooster, and Christopher Potter, Proc of the Fifth Workshop on Scientific Data Mining at 2nd SIAM International Conference on Data Mining.
    • Mining Scientific Data: Discovery of Patterns in the Global Climate System (2001). Vipin Kumar, Michael Steinbach, Pang-Ning Tan, Steven Klooster, Christopher Potter, Alicia Torregrosa, 2001 Joint Statistical Meeting.
    • Clustering Earth Science Data: Goals, Issues and Results (2001). Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Steven Klooster, Christopher Potter, Alicia Torregrosa, KDD 2001 Workshop on Mining Scientific Dataset.
    • Finding Spatio-Temporal Patterns in Earth Science Data (2001). Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Steven Klooster, Christopher Potter, Alicia Torregrosa, KDD 2001 Workshop on Temporal Data Mining.

  • Bioinformatics
  • Network Intrusion Detection
    • Data Mining for Cyber Security (2006). Varun Chandola, Eric Eilertson, Levent Ertoz, Gyorgy Simon and Vipin Kumar. Book Chapter, To Appear in Data Warehousing and Data Mining Techniques for Computer Security, editor Anoop Singhal, Springer
    • Scan Detection - A Data Mining Approach (2006). Gyorgy Simon, Hui Xiong, Eric Eilertson, and Vipin Kumar. SIAM International Conf. on Data Mining (SDM)
    • The MINDS - Minnesota Intrusion Detection System (2004). Ertoz, L., Eilertson, E., Lazarevic, A., Tan, P., Srivastava, J., Kumar, V., Dokas, P., in Next Generation Data Mining, MIT Press, 2004.
    • Protecting Against Cyber Threats in Network Centric Systems (2003). Aleksandar Lazarevic, Jaideep Srivastava, Vipin Kumar, SPIE Annual Symposium on AeroSense, Battlespace Digitization and Network Centric Systems III, Orlando, FL.
    • A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection (2003). Aleksandar Lazarevic, Levent Ertoz, Aysel Ozgur, Jaideep Srivastava, Vipin Kumar, to appear in the 3rd SIAM Conference on Data Mining, San Francisco, CA.
    • Data Mining for Network Intrusion Detection (2002). Paul Dokas, Levent Ertoz, Vipin Kumar, Aleksandar Lazarevic, Jaideep Srivastava, Pang-Ning Tan, Proc. NSF Workshop on Next Generation Data Mining, Baltimore, MD.
    • Cyber Threat Analysis - A Key Enabling Technology for the Objective Force (A Case Study in Network Intrusion Detection) (2002). Aleksandar Lazarevic, Paul Dokas, Levent Ertoz, Vipin Kumar, Jaideep Srivastava, Pang-Ning Tan, Proceedings 23rd Army Science Conference, Orlando, FL.

  • Web and Text Mining
    • Expert Agreement and Content Based Reranking in a Meta Search Engine Environment using Mearf (2002). B. Uygar Oztekin, George Karypis, and Vipin Kumar, WWW 2002.
    • Mining Association Patterns in Web Usage Data (2002). Pang-Ning Tan, and Vipin Kumar, International Conference on Advances in Infrastructure for e-Business, e-Education, e-Science, and e-Medicine on the Internet.
    • Discovery of Web Robot Sessions based on their Navigational Patterns (2002). Pang-Ning Tan, and Vipin Kumar, Data Mining and Knowledge Discovery, 6(1):9-35.
    • Finding Topics in Collections of Documents: A Shared Nearest Neighbor Approach (2001). Levent Ertoz, Michael Steinbach, and Vipin Kumar, Text Mine'01, Workshop on Text Mining (1st SIAM International Conference on Data Mining).
    • Efficient Algorithms for Creating Product Catalogs (2001). Michael Steinbach, George Karypis, and Vipin Kumar, Web Mining Workshop (1st SIAM International Conference on Data Mining).
    • Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification (2001). Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar, PAKDD'2001.
    • A Comparison of Document Clustering Techniques (2000). Michael Steinbach, George Karypis, and Vipin Kumar, TextMining Workshop, KDD 2000.
    • Fast Supervised Dimensionality Reduction Algorithm with Applications to Document Categorization & Retrieval (2000). George Karypis and Eui-Hong (Sam) Han, CIKM'2000.
    • Centroid-Based Document Classification: Analysis & Experimental Results (2000). Eui-Hong (Sam) Han and George Karypis, PKDD'2000.
    • Modeling of Web Robot Navigational Patterns (2000). Pang-Ning Tan and Vipin Kumar, WebKDD 2000: Web Mining for E-Commerce. (Technical Report # TR00-038).
    • Document Categorization and Query Generation on the World Wide Web Using WebACE (1999). Daniel Boley, Maria Gini, Robert Gross, Eui-Hong (Sam) Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, and Jerome Moore, AI Review, Vol. 13, No. 5-6, 1999.
    • Partitioning-Based Clustering for Web Document Categorization (1999). Daniel Boley, Maria Gini, Robert Gross, Eui-Hong (Sam) Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, and Jerome Moore, Decision Support Systems Journal, Vol 27, No. 3, pp 329-341, 1999.
    • WebACE: A Web Agent for Document Categorization and Exploration (1998). Eui-Hong (Sam) Han, Daniel Boley, Maria Gini, Robert Gross, Kyle Hastings, George Karypis, Vipin Kumar, B. Mobasher, and Jerry Moore, Proc. of the 2nd International Conference on Autonomous Agents (Agents'98)
    • Web Page Categorization and Feature Selection Using Association Rule and Principal Component Clustering (1997). Jerome Moore, Eui-Hong (Sam) Han, Daniel Boley, Maria Gini, Robert Gross, Kyle Hastings, George Karypis, Vipin Kumar, and Bamshad Mobasher, Workshop on Information Technologies and Systems, 1997. (HTML version)
    • Web Mining: Pattern Discovery from World Wide Web Transactions (1996). Bamshad Mobasher, Namit Jain, Eui-Hong (Sam) Han and Jaideep Srivastava.

Graph Partitioning

  • Graph Partitioning for High Performance Scientific Simulations (2000). Kirk Schloegel, George Karypis, and Vipin Kumar, Technical Report 00-018. Chapter in CRPC Parallel Computing Handbook, J. Dongarra, I. Foster, G. Fox, K. Kennedy, L. Torczon, and A. White, editors. Morgan Kaufmann, 2000.
  • Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning (1999). Kirk Schloegel, George Karypis, and Vipin Kumar, Technical Report 99-031.
  • A New Algorithm for Multi-objective Graph Partitioning (1999). Kirk Schloegel, George Karypis, and Vipin Kumar, Technical Report 99-003. Proceedings of Europar '99.
  • Wavefront Diffusion and LMSR: Algorithms for Dynamic Repartitioning of Adaptive Meshes (1998). Kirk Schloegel, George Karypis, and Vipin Kumar, Technical Report TR 98-034.
  • A Performance Study of Diffusive vs. Remapped Load-Balancing Schemes (1998). Kirk Schloegel, George Karypis, and Vipin Kumar, Proceedings of the 11th International Conference on Parallel and Distributed Computing Systems (PDCS-98).
  • Parallel Multilevel Diffusion Algorithms for Repartitioning of Adaptive Meshes (1997). Kirk Schloegel, George Karypis, and Vipin Kumar, Technical Report TR 97-014.
  • Multilevel Diffusion Schemes for Repartitioning of Adaptive Meshes (1997). Kirk Schloegel, George Karypis, and Vipin Kumar, Journal of Parallel and Distributed Computing, 47(2):109-124.
  • Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs (1996). George Karypis and Vipin Kumar, Proceedings of Supercomputing'96, Pittsburgh, November, 1996.
  • Multilevel k-way Partitioning Scheme for Irregular Graphs (1995). George Karypis and Vipin Kumar, to appear in Journal of Parallel and Distributed Computing. Also available as Tech Report 95-064, Department of Computer Science, University of Minnesota, 1995.
  • A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs (1995). George Karypis and Vipin Kumar, to appear in the SIAM Journal on Scientific Computing 1997. Also available as Tech Report 95-035, Department of Computer Science, University of Minnesota, 1995.
  • A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering. George Karypis and Vipin Kumar, to appear in Journal of Parallel and Distributed Computing. A short version appears in Proceedings of the International Parallel Processing Symposium, April 1996.
  • Analysis of Multilevel Graph Partitioning (1995). George Karypis and Vipin Kumar, Proceedings of Supercomputing'95, December 1995, San Diego. Also available as Tech Report 95-037, Department of Computer Science, University of Minnesota, 1995.
  • Scalability Analysis of Partitioning Strategies for Finite Element Graphs (1992). Ananth Grama and Vipin Kumar, Proceedings of Supercomputing'92, November 1992, Minneapolis. Extended version available as Tech Report 92-38, Department of Computer Science, University of Minnesota, 1992.
  • Multilevel Hypergraph Partitioning: Application in VLSI Domain (1997). G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, Proceedings ACM/IEEE Design Automation Conference, June 1997. Extended version available as technical report TR 97R-006 from Computer Science Department, University of Minnesota.

Parallel Solution of Sparse Linear Systems of Equations

  • A High Performance Two Dimensional Scalable Parallel Algorithm for Solving Sparse Triangular Systems (1997). Mahesh V. Joshi, Anshul Gupta, George Karypis, and Vipin Kumar, 4th International Conference on High Performance Computing, (HiPC'97).
  • Design and Implementation of a Scalable Parallel Direct Solver for Sparse Symmetric Positive Definite Systems (1997). Anshul Gupta, Fred Gustavson, Mahesh Joshi, George Karypis, Vipin Kumar, Proceedings of the Eighth SIAM Conference on Parallel Processing, March 1997.
  • Parallel Threshold-based ILU Factorization (1996). George Karypis and Vipin Kumar, Tech Report 96-061, Department of Computer Science, University of Minnesota. A short version appears in the Proceedings of Supercomputing '97.
  • Highly Scalable Parallel Algorithms for Sparse Matrix Factorization (1995). Anshul Gupta, George Karypis, and Vipin Kumar, IEEE Transactions on Parallel and Distributed Systems Volume 8, Number 5. A short version of this paper won the Outstanding Student Paper Award from the Supercomputing 94 conference.
  • A High Performance Sparse Cholesky Factorization Algorithm for Scalable Parallel Computers (1995). George Karypis and Vipin Kumar, Proceedings of Frontiers '95 Conference. Extended version available as Tech Report 94-41, Department of Computer Science, University of Minnesota, 1994.
  • Performance and Scalability of Preconditioned Conjugate Gradient Methods on the CM5. Anshul Gupta, Vipin Kumar and Ahmed Sameh, IEEE Transactions on Parallel and Distributed Systems Volume 6, Number 5, pp. 455-469, May 1995.
  • Parallel Algorithms for Forward Elimination and Backward Substitution in Direct Solution of Sparse Linear Systems (1995). Anshul Gupta and Vipin Kumar, Proceedings of Supercomputing'95, December 1995, San Diego.

N-Body Computation and Dense Linear System Solvers

  • Parallel Iterative Solvers and Preconditioners Using Approximate Hierarchical Methods (Extended Abstract). Ananth Grama, Vipin Kumar, and Ahmed Sameh, Proceedings of the Copper Mountain Conference on Iterative Methods, April 1996, Copper Mountain, CO.
  • Parallel Matrix-Vector Product Using Approximate Hierarchical Methods. Ananth Grama, Vipin Kumar, and Ahmed Sameh, Proceedings of Supercomputing'95, December 1995, San Diego.
  • On n-Body Simulations Using Message Passing Parallel Computers. Ananth Grama, Vipin Kumar, and Ahmed Sameh, Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, CA, 1995.
  • Scalable Parallel Formulations of the Barnes-Hut Algorithm for n-Body Simulations. Ananth Grama, Vipin Kumar, and Ahmed Sameh, Proceedings of Supercomputing'94, November 1994, Washington DC.
  • Scalable Parallel Formulations of the Barnes-Hut Algorithm (1994). Ananth Y. Grama and Vipin Kumar.
  • Parallel Hierarchical Solvers and Preconditioners for Boundary Element Methods (1996). A. Grama, V. Kumar, and A. Sameh, to appear in SIAM Journal on Scientific Computing. A short version appears in Proceedings of Supercomputing '96, Pittsburgh, November 1996. Selected as Best Student Paper Nominee for Supercomputing '96.

Scalability Analysis

  • Parallel Algorithm Scalability Issues in Petaflops Architectures (2000). Ananth Grama, Anshul Gupta, Eui-Hong (Sam) Han, and Vipin Kumar, Ultrascale Computing, 2000.
  • Isoefficiency Function: A Scalability Metric for Parallel Algorithms and Architectures (1993). Ananth Grama, Anshul Gupta, and Vipin Kumar, IEEE Parallel and Distributed Technology, Special Issue on Parallel and Distributed Systems: From Theory to Practice, August 1993, Volume 1, Number 3, pp 12-21.
  • Analyzing Scalability of Parallel Algorithms and Architectures (1993). Vipin Kumar and Anshul Gupta, Journal of Parallel and Distributed Computing (special issue on scalability), Volume 22, Number 3, September 1994, pp. 379-391. Also available as Tech Report TR 91-18, Department of Computer Science, University of Minnesota, 1991.
  • Performance Properties of Large Scale Parallel Systems (1993). Anshul Gupta and Vipin Kumar, Journal of Parallel and Distributed Computing, Volume 19, Number 3, November 1993.
  • The Scalability of FFT on Parallel Computers (1992). Anshul Gupta and Vipin Kumar, IEEE Transactions on Parallel and Distributed Systems, August 1993, Volume 4, Number 8, pp 922-932.
  • Scalability of Parallel Algorithms for Matrix Multiplication (1993). Anshul Gupta and Vipin Kumar.
  • A Highly Parallel Formulation of Backpropagation on Hypercubes (1994). Vipin Kumar, Shashi Shekhar, and Minesh B. Amin, IEEE Transactions on Parallel and Distributed Systems Volume 5, Number 10, pp. 1073-1091, October 1994.
  • Scalability of Parallel Sorting on Mesh Multicomputers (1991). V. Singh, V. Kumar, G. Agha, and C. Tomlinson, International Journal of Parallel Programming Volume 20(2), April 1991.
  • Scalability of Parallel Algorithms for the All-Pairs Shortest Path Problem. V. Kumar and V. Singh, Journal of Parallel and Distributed Computing (special issue on massively parallel computation), Vol 13, #2, 1991, 124-138.

Linear Programming


Parallel Tree Search and Load Balancing

  • State of the Art in Parallel Search Techniques for Discrete Optimization Problems (1999). Ananth Y. Grama and Vipin Kumar, IEEE Transactions on Knowledge and Data Engineering, Volume 11, Number 1, January/February 1999.
  • Scalable Load Balancing Techniques for Parallel Computers, Vipin Kumar, Ananth Y. Grama and Vempaty Nageshwara Rao, Journal of Parallel and Distributed Computing, Volume 22, Number 1, pp. 60-79, July 1994.
  • Parallel Processing of Discrete Optimization Problems (1993). Grama Y. Ananth, Vipin Kumar and Panos Pardalos in Encyclopedia of Microcomputers, John Wiley & Sons, 1993.
  • Efficient Parallel Formulations for Some Dynamic Programming Algorithms (1992). George Karypis and Vipin Kumar, Proceedings of the International Parallel Processing Symposium, April 1993. Extended version available as Tech Report 92-59, Department of Computer Science, University of Minnesota, 1992.
  • Unstructured Tree Search on SIMD Parallel Computers (1992). George Karypis and Vipin Kumar, IEEE Transactions on Parallel and Distributed Systems Volume 5, Number 10, pp. 1057-1072, October 1994. Extended version available as Tech Report TR 92-21, Department of Computer Science, University of Minnesota, 1992.
  • On the Efficiency of Parallel Backtracking (1992). V. Nageshwara Rao and Vipin Kumar, IEEE Transactions on Parallel and Distributed Systems, 4(4), pp. 427-437, April 1993.
  • Parallel Best-First Search of State-Space Graphs: A Summary of Results (1988). Vipin Kumar, V. Nageshwara Rao and K. Ramesh, Proceedings of the 1988 National Conf. on Artificial Intelligence (AAAI-88), August 1988.
  • Parallel Depth-First Search on Multiprocessors Part II: Analysis (1987). Vipin Kumar and V. Nageshwara Rao, International Journal of Parallel Programming, Volume 16, #6, 1987, 501-519.
  • Parallel Depth-First Search on Multiprocessors Part I: Implementation (1987). V. Nageshwara Rao and Vipin Kumar, International Journal of Parallel Programming, Volume 16, #6, 1987, 479-499.
  • Declustering and Load Balancing Methods for Parallelizing Geographical Information Systems. S. Shekhar, S. Ravada, V. Kumar, G. Turner and D. Chubb, IEEE Transactions on Knowledge and Data Engineering (to appear).
  • A Survey of Parallel Search Algorithms for Discrete Optimization Problems (1995). Ananth Y. Grama and Vipin Kumar, ORSA Journal of Computing vol. 7, no. 4, pp. 365-85, 1995.
  • Automatic Test Pattern Generation on Multiprocessors (1991). S. Arvindam, V. Kumar, V.N. Rao and V. Singh, Parallel Computing, Vol 17, 1991, pp. 1323-1342.
  • Concurrent Access of Priority Queues (1988). V.N. Rao and V. Kumar, IEEE Transactions on Computers Vol 37, Number 12, December 1988, pp. 1657-1665.

Parallel Natural Language Parsing


Constraint Satisfaction


Miscellaneous

  • Role of Message-Passing in Performance Oriented Parallel Programming (1997). V. Kumar, G. Karypis, and A. Grama, Proceedings of the Eighth SIAM Conference on Parallel Processing, March 1997.
  • The C3I Parallel Benchmark Suite - Introduction and Preliminary Results (1996). R. Metzger, B. Van Voorst, L. Pires, R. Jha, W. Au, M. Amin, D. Castanon, V. Kumar, Proceedings of Supercomputing '96, Pittsburgh, November 1996.
  • A3: A Simple and Asymptotically Accurate Model for Parallel Computation (1996). Ananth Grama, Vipin Kumar, Sanjay Ranka, and Vineet Singh, Proceedings of the Sixth Symposium on Frontiers of Massively Parallel Computing, Annapolis, MD, October 1996.



Last modified: Tue Nov 16 12:45:32 CST 1999