
What is Data Analysis? – Definition from WhatIs.com

Data analysis (DA) is the process of examining sets of data to find patterns and draw conclusions about the information they contain. Increasingly, data analysis is performed using specialized systems and software. Data analytics technologies and techniques are widely used in commercial industries to enable organizations to make more informed business decisions. Scientists and researchers also use analytical tools to verify or disprove scientific models, theories, and hypotheses.

As a term, data analytics primarily refers to an assortment of applications, ranging from basic business intelligence (BI), reporting and online analytical processing (OLAP) to various forms of advanced analytics. In this sense, it is similar in nature to business analytics, another umbrella term for data analytics approaches. The difference is that the latter is geared towards business uses, while data analytics has a broader scope. The broad view of the term is not universal, however: in some cases, people use data analytics specifically to refer to advanced analytics, treating BI as a separate category.

Data analytics initiatives can help businesses increase revenue, improve operational efficiency, optimize marketing campaigns, and strengthen customer service efforts. Analytics also enables organizations to respond quickly to emerging market trends and gain a competitive edge over rivals. The ultimate goal, though, is to improve business performance. Depending on the particular application, the data analyzed may consist of historical records or new information that has been processed for real-time analysis. It can also come from a mix of internal systems and external data sources.

Types of data analysis applications

At a high level, data analysis methodologies include exploratory data analysis (EDA) and confirmatory data analysis (CDA). EDA aims to find patterns and relationships in data, while CDA applies statistical techniques to determine whether assumptions about a data set are true or false. EDA is often compared to detective work, while CDA is akin to the work of a judge or jury in a trial – a distinction first drawn by statistician John W. Tukey in his 1977 book Exploratory Data Analysis.
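As a rough illustration of the two approaches, the sketch below first summarizes two samples (the exploratory step) and then runs a simple permutation test on the difference in their means (the confirmatory step). The order values, group names, and function names are invented for this example, and a permutation test is only one of many statistical techniques that could fill the CDA role.

```python
import random
import statistics

# Hypothetical order values for two versions of a web page (made-up data).
group_a = [20.0, 22.5, 19.0, 24.0, 21.5, 23.0]
group_b = [25.0, 27.5, 24.0, 29.0, 26.5, 28.0]


def explore(values):
    """EDA step: summarize a sample to see what it looks like."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def permutation_p_value(a, b, rounds=2000, seed=0):
    """CDA step: estimate how often a gap in means at least this large
    would appear if the group labels were assigned at random."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        gap = abs(statistics.mean(left) - statistics.mean(right))
        if gap >= observed:
            hits += 1
    return hits / rounds


summary_a = explore(group_a)
summary_b = explore(group_b)
p_value = permutation_p_value(group_a, group_b)
print(summary_a)
print(summary_b)
print(p_value)
```

A small p-value suggests that the gap seen in the summaries is unlikely to be a chance artifact of how the rows were grouped, which is the kind of yes-or-no verdict CDA is meant to deliver.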

Data analysis can also be separated into quantitative data analysis and qualitative data analysis. The first involves the analysis of numerical data with quantifiable variables that can be compared or measured statistically. The qualitative approach is more interpretative – it focuses on understanding the content of non-numerical data such as text, images, audio, and video, including common phrases, themes, and viewpoints.

At the application level, BI and reporting provide business leaders and employees with actionable insights into key performance indicators, business operations, customers and more. Previously, data queries and reports were typically created for end users by BI developers who worked in IT. Today, more and more organizations are using self-service BI tools that allow executives, business analysts, and operational staff to run their own ad-hoc queries and create reports themselves.

Advanced types of data analysis include data mining, which involves sorting through large sets of data to identify trends, patterns, and relationships. Another is predictive analytics, which seeks to predict customer behavior, equipment failures, and other future business scenarios and events. Machine learning can also be used for data analysis, running automated algorithms to iterate through datasets faster than data scientists can through conventional analytical modeling. Big data analytics applies data mining, predictive analytics, and machine learning tools to datasets that can include a mix of structured, unstructured, and semi-structured data. Text mining provides a way to analyze documents, emails, and other textual content.
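To make the text mining idea concrete, here is a minimal sketch that counts the most frequent terms across a few short documents. The sample emails, the tiny stop-word list, and the function name are assumptions made for illustration; real text mining projects typically use much richer tokenization and language models.

```python
import re
from collections import Counter

# Hypothetical support emails (made-up text).
documents = [
    "The delivery was late and the package was damaged.",
    "Late delivery again. Please refund the delivery fee.",
    "Great service, the package arrived early!",
]

# A tiny stop-word list for this example; real projects use longer ones.
STOP_WORDS = {"the", "and", "was", "a", "again", "please"}


def term_frequencies(texts):
    """Count how often each non-stop-word appears across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


frequencies = term_frequencies(documents)
top_terms = frequencies.most_common(3)
print(top_terms)
```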

Data analytics initiatives support a wide variety of business uses. For example, banks and credit card companies analyze withdrawal and spending habits to prevent fraud and identity theft. E-commerce companies and marketing service providers use clickstream analysis to identify website visitors who are likely to purchase a particular product or service, based on browsing and page viewing habits. Health organizations use patient data to assess the effectiveness of treatments for cancer and other diseases.

Mobile network operators examine customer data to predict churn, which enables them to take action to prevent customers from defecting to competing providers. To boost customer relationship management efforts, companies engage in CRM analytics to segment customers for marketing campaigns and equip call center employees with up-to-date caller information.

At the heart of the data analysis process

Data analytics applications involve more than just analyzing data, especially on advanced analytics projects. Much of the required work takes place upstream: collecting, integrating, and preparing the data, then developing, testing, and revising the analytical models to ensure that they produce accurate results. In addition to data scientists and other data analysts, analytics teams often include data engineers, who create data pipelines and help prepare datasets for analysis.

The analysis process begins with data collection. Data scientists identify the information they need for a particular analytics application, then work alone or with data engineers and IT staff to assemble it for use. Data from different source systems may need to be combined through data integration routines, transformed into a common format, and loaded into an analytics system, such as a Hadoop cluster, NoSQL database, or data warehouse.
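The combine, transform, and load steps described above can be sketched in a few lines. Everything here is hypothetical: the two source layouts, the field names, and the in-memory list standing in for the analytics system are assumptions chosen to keep the example self-contained.

```python
from datetime import datetime

# Hypothetical records from two source systems with different layouts.
crm_rows = [
    {"CustomerID": "17", "SignupDate": "03/15/2023", "Country": "us"},
    {"CustomerID": "42", "SignupDate": "11/02/2022", "Country": "DE"},
]
billing_rows = [
    {"cust_id": 17, "total_spend": "120.50"},
    {"cust_id": 42, "total_spend": "89.99"},
    {"cust_id": 99, "total_spend": "15.00"},  # no matching CRM record
]


def normalize_crm(row):
    """Map a CRM row onto the shared schema (ISO dates, upper-case country)."""
    signup = datetime.strptime(row["SignupDate"], "%m/%d/%Y").date()
    return {
        "customer_id": int(row["CustomerID"]),
        "signup_date": signup.isoformat(),
        "country": row["Country"].upper(),
    }


def integrate(crm, billing):
    """Join both sources on customer_id and return unified records."""
    spend_by_id = {int(r["cust_id"]): float(r["total_spend"]) for r in billing}
    unified = []
    for row in crm:
        record = normalize_crm(row)
        record["total_spend"] = spend_by_id.get(record["customer_id"], 0.0)
        unified.append(record)
    return unified


# The list below stands in for a table loaded into the analytics system.
warehouse_table = integrate(crm_rows, billing_rows)
for record in warehouse_table:
    print(record)
```

Joining on the CRM side means the billing row with no CRM match is left out; whether to keep such orphan rows is a design decision that depends on the analysis being planned.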

In other cases, the collection process may consist of extracting a relevant subset from a stream of data flowing into, for example, Hadoop. That subset is then moved to a separate partition in the system so that it can be analyzed without affecting the overall dataset.
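A minimal sketch of pulling a subset out of a passing stream might look like the following. The event fields and the function name are made up for illustration, and a plain Python list stands in for the separate partition.

```python
def relevant_events(stream, event_type):
    """Yield only events of the requested type as they pass by."""
    for event in stream:
        if event.get("type") == event_type:
            yield event


# Hypothetical event stream; in practice this would be read from a feed.
incoming = iter([
    {"type": "page_view", "user": "u1"},
    {"type": "purchase", "user": "u2", "amount": 30.0},
    {"type": "page_view", "user": "u3"},
    {"type": "purchase", "user": "u1", "amount": 12.5},
])

# Copy the matching subset into its own list, standing in for a partition.
analysis_partition = list(relevant_events(incoming, "purchase"))
print(analysis_partition)
```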

Once the necessary data is in place, the next step is to find and resolve data quality issues that could affect the accuracy of analytics applications. This includes performing data profiling and cleansing tasks to ensure that the information in a data set is consistent and that errors and duplicate entries are eliminated. Additional data preparation work is done to manipulate and organize the data for the intended analytical use. Data governance policies are then enforced to ensure the data adheres to corporate standards and is used appropriately.
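The profiling and cleansing tasks mentioned above can be illustrated with a short sketch. The records, the field names, and the specific rules (lower-casing emails, treating non-numeric ages as missing) are assumptions for this example, not a prescribed standard.

```python
# Hypothetical raw records with typical quality problems.
raw_records = [
    {"email": "Ann@Example.com ", "age": "34"},
    {"email": "ann@example.com", "age": "34"},     # duplicate once normalized
    {"email": "bob@example.com", "age": ""},       # missing age
    {"email": "carol@example.com", "age": "abc"},  # invalid age
    {"email": "dave@example.com", "age": "51"},
]


def profile(records):
    """Profiling: count missing and non-numeric ages before cleaning."""
    missing = sum(1 for r in records if not r["age"].strip())
    invalid = sum(
        1 for r in records
        if r["age"].strip() and not r["age"].strip().isdigit()
    )
    return {"rows": len(records), "missing_age": missing, "invalid_age": invalid}


def cleanse(records):
    """Cleansing: normalize emails, drop duplicates, null out bad ages."""
    seen = set()
    cleaned = []
    for r in records:
        email = r["email"].strip().lower()
        if email in seen:
            continue  # skip duplicate entries
        seen.add(email)
        age_text = r["age"].strip()
        age = int(age_text) if age_text.isdigit() else None
        cleaned.append({"email": email, "age": age})
    return cleaned


report = profile(raw_records)
clean_records = cleanse(raw_records)
print(report)
print(clean_records)
```

Running the profile first gives a picture of how widespread each problem is, which helps decide whether bad values should be dropped, nulled out as here, or repaired at the source.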

From there, a data scientist builds an analytical model using predictive modeling tools or other analytical software and programming languages such as Python, Scala, R, and SQL. Typically, the model is initially run on a partial dataset to test its accuracy; it is then revised and retested as needed. This process, known as “training” the model, continues until the model works as expected. Finally, the model is run in production mode on the entire dataset, which can be done once to meet a specific information need or continuously as the data is updated.
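The build, test, and run cycle can be sketched with a very small model. The example below assumes a made-up dataset with a roughly linear relationship and uses a hand-written least-squares line fit as a stand-in for whatever modeling tool a team would actually use; the 30/10 split and all names are illustrative choices.

```python
import random

# Hypothetical data: ad spend (x) and sales (y) with a roughly linear link.
rng = random.Random(42)
dataset = [(x, 3.0 * x + 10.0 + rng.uniform(-2.0, 2.0)) for x in range(1, 41)]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Average absolute gap between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


# Hold back a quarter of the rows so accuracy is checked on unseen data.
shuffled = dataset[:]
rng.shuffle(shuffled)
train, test = shuffled[:30], shuffled[30:]

model = fit_line(train)
test_error = mean_abs_error(model, test)

# Once the error is acceptable, score the entire dataset.
predictions = [(x, model[0] * x + model[1]) for x, _ in dataset]
print(model)
print(test_error)
```

If the held-out error were too high, the usual next move would be to revise the model or the features and repeat the fit-and-test step before scoring the full dataset.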

In some cases, analytics applications can be configured to automatically trigger business actions. An example is stock trading by a financial services company. Otherwise, the last step in the data analysis process is to communicate the results generated by the analytical models to business executives and other end users. Charts and other infographics can be designed to make the results easier to understand. Data visualizations are often built into BI dashboard applications that display data on a single screen and can be updated in real time as new information becomes available.
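As a simple sketch of an automatically triggered action, the code below applies a threshold rule to model scores and calls a stand-in business action for each match. It uses customer churn scores instead of stock trading to keep the example small; the scores, the threshold, and both function names are hypothetical.

```python
def decide_actions(scores, threshold, action):
    """Call `action` for every item whose model score reaches the threshold."""
    triggered = []
    for item_id, score in scores.items():
        if score >= threshold:
            triggered.append(action(item_id, score))
    return triggered


def open_retention_ticket(customer_id, score):
    """Stand-in for a real business action (here it only builds a message)."""
    return f"retention offer queued for {customer_id} (score {score:.2f})"


# Hypothetical churn scores produced by an analytical model.
churn_scores = {"c-101": 0.91, "c-102": 0.35, "c-103": 0.78}
actions = decide_actions(churn_scores, threshold=0.75, action=open_retention_ticket)
print(actions)
```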

Data Analytics vs Data Science

As automation grows, data scientists will focus more on business needs, strategic oversight, and deep learning. Data analysts who work in business intelligence will focus more on building models and other routine tasks. In general, data scientists focus their efforts on producing broad insights, while data analysts focus on answering specific questions. In terms of technical skills, future data scientists will need to focus more on machine learning operations, also known as MLOps.