Data Mining and Data Warehousing
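Data mining and data warehousing are both powerful and popular techniques for analyzing data. Statistically inclined users tend to use data mining: they apply statistical models to look for hidden patterns in data. Data miners are interested in finding useful relationships between different data elements, which ultimately helps the business's profitability. Data specialists who can analyze the dimensions of a business directly, on the other hand, tend to use data warehousing.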
Data mining is also known as Knowledge Discovery in Databases (KDD). It is a field of computer science that deals with extracting previously unknown and interesting information from raw data. Because of the exponential growth of data, especially in areas such as business, data mining has become a very important tool for converting this wealth of data into business intelligence, since manual extraction of patterns has become practically impossible over the past few decades. For example, it is currently used for applications such as social network analysis, fraud detection, and marketing. Data mining usually deals with the following four tasks: clustering, classification, regression, and association. Clustering identifies groups of similar items in unstructured data. Classification learns rules that can be applied to new data, and typically includes the following steps: data preprocessing, model design, learning/feature selection, and evaluation/validation. Regression finds functions that model the data with minimal error. Association looks for relationships between variables. Data mining is usually used to answer questions such as: which products are most likely to bring Wal-Mart high profits next year?
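Two of the four tasks above, association and regression, are small enough to sketch in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the `baskets` data and the function names `pair_support`, `confidence`, and `fit_line` are hypothetical and do not come from any particular data mining tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy shopping baskets (illustrative data, not from a real store).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(baskets):
    """Association: fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Association: share of baskets with `antecedent` that also hold `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    print(pair_support(baskets))
    print(confidence(baskets, "bread", "milk"))
    print(fit_line([1, 2, 3], [2, 4, 6]))
```

Support and confidence are the usual starting measures for association rules, and the least squares line is the simplest case of "a function with minimal error". Real projects would normally rely on a dedicated library instead of hand-written loops.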
As mentioned above, data warehousing is also used for analyzing data, but by a different set of users and with a slightly different goal in mind. In the retail sector, for example, data warehousing users are more concerned with which kinds of purchases are popular among customers, so that the results of the analysis can be used to improve the customer experience. Data miners, in contrast, first form a hypothesis, such as which customers buy a certain type of product, and then analyze the data to test it. Data warehousing could be carried out by a major retailer that initially stocks all of its stores with the same mix of product sizes, only to find out later that its New York stores sell smaller sizes much faster than its Chicago stores do. Based on this result, the retailer can stock the New York stores with more of the smaller sizes than the Chicago stores.
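The retailer example above amounts to rolling up sales along two dimensions, city and size, and comparing the results. A minimal sketch in plain Python follows; the `sales` records and the function names `units_by_city_and_size` and `size_share` are hypothetical, and a real warehouse would run an equivalent query against its own tables.

```python
from collections import defaultdict

# Hypothetical sales records: (city, size, units_sold). Illustrative numbers only.
sales = [
    ("New York", "small", 120),
    ("New York", "large", 40),
    ("Chicago", "small", 50),
    ("Chicago", "large", 110),
    ("New York", "small", 80),
]


def units_by_city_and_size(records):
    """Roll up units sold along the (city, size) dimensions."""
    totals = defaultdict(int)
    for city, size, units in records:
        totals[(city, size)] += units
    return dict(totals)


def size_share(records, city):
    """Slice one city and return each size's share of that city's units."""
    totals = units_by_city_and_size(records)
    city_total = sum(units for (c, _), units in totals.items() if c == city)
    return {size: units / city_total
            for (c, size), units in totals.items() if c == city}


if __name__ == "__main__":
    for city in ("New York", "Chicago"):
        print(city, size_share(sales, city))
```

With these made-up numbers, small sizes account for most of the New York units and a minority of the Chicago units, which is the kind of result that would lead the retailer to stock the two cities differently.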
At first glance, these two types of analysis appear to be of the same nature. Both are concerned with increasing profits based on historical data. But there are key differences. In simple terms, data mining and data warehousing provide different types of analytics for different types of users. Data mining looks for correlations and patterns to support a statistical hypothesis. Data warehousing answers a comparatively broader question, then slices and dices the data from there to identify ways to improve in the future.