Interview Questions: Data Analytics (KIT-601)

1. Which technical tools have you used for analysis and presentation purposes?
Some of the popular tools you should know are:
- MS SQL Server, MySQL: for working with data stored in relational databases
- MS Excel, Tableau: for creating reports and dashboards
- Python, R, SPSS: for statistical analysis, data modeling, and exploratory analysis
- MS PowerPoint: for presentations, displaying the final results and important conclusions

2. Where is time series analysis used?
Since time series analysis (TSA) has a wide scope of usage, it can be applied in multiple domains. Here are some of the places where TSA plays an important role:
- Statistics
- Signal processing
- Econometrics
- Weather forecasting
- Earthquake prediction
- Astronomy
- Applied science

3. What are the common problems that data analysts encounter during analysis?
The common problems encountered in any analytics project include:
- Handling duplicate data
- Collecting the meaningful, right data at the right time
- Handling data purging and storage problems
- Making data secure and dealing with compliance issues

4. What are your strengths and weaknesses as a data analyst?
Some general strengths of a data analyst include strong analytical skills, attention to detail, proficiency in data manipulation and visualization, and the ability to derive insights from complex datasets. Weaknesses could include limited domain knowledge, lack of experience with certain data analysis tools or techniques, or challenges in effectively communicating technical findings to non-technical stakeholders.

5. What are some common data visualization tools you have used?
A list of the commonly used data visualization tools in the industry:
- Tableau
- Microsoft Power BI
- QlikView
- Google Data Studio
- Plotly
- Matplotlib (Python library)
- Excel (with built-in charting capabilities)
- SAP Lumira
- IBM Cognos Analytics

6. How can you handle missing values in a dataset?
There are four methods to handle missing values in a dataset (a code sketch follows question 7):
- Listwise deletion: an entire record is excluded from analysis if any single value is missing.
- Average imputation: take the average value of the other participants' responses and fill in the missing value.
- Regression substitution: use multiple regression analysis to estimate a missing value.
- Multiple imputation: create plausible values based on the correlations for the missing data, then average the simulated datasets by incorporating random errors into the predictions.

7. Explain the KNN imputation method in brief.
KNN imputation is a method that requires selecting a number of nearest neighbors and a distance metric at the same time. It can predict both discrete and continuous attributes of a dataset. A distance function is used to find the similarity of two or more attributes, which helps in further analysis.
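The following is a minimal sketch, not from the original text, of the listwise-deletion and average-imputation strategies from question 6 plus the KNN imputation method from question 7. The DataFrame and its column names are invented, and scikit-learn's KNNImputer is just one common implementation:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numeric dataset with missing entries.
df = pd.DataFrame({
    "age":    [25.0, 30.0, np.nan, 40.0, 35.0],
    "income": [50.0, np.nan, 62.0, 58.0, 61.0],
})

# Listwise deletion (Q6): drop every record with any missing value.
listwise = df.dropna()

# Average imputation (Q6): fill each gap with the column mean.
mean_imputed = df.fillna(df.mean(numeric_only=True))

# KNN imputation (Q7): estimate each missing value from the k most
# similar records under a (Euclidean) distance metric.
imputer = KNNImputer(n_neighbors=2)
knn_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(knn_imputed)
```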
8. What is hierarchical clustering?
Hierarchical clustering, or hierarchical cluster analysis, is an algorithm that groups similar objects into common groups called clusters. The goal is to create a set of clusters where each cluster is distinct from the others and, individually, the entities within each cluster are similar (see the sketch below).
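As an illustration that is not part of the original answer, here is a small sketch using SciPy's hierarchical-clustering routines on invented 2-D points; "ward" linkage is only one of several possible merge criteria:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Three well-separated pairs of points (hypothetical data).
points = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2],
                   [5.1, 4.8], [9.0, 9.1], [8.8, 9.3]])

# Build the cluster tree bottom-up; 'ward' merges the pair of clusters
# that yields the smallest increase in within-cluster variance.
tree = linkage(points, method="ward")

# Cut the tree into 3 flat clusters: one label per pair of nearby points.
labels = fcluster(tree, t=3, criterion="maxclust")
print(labels)
```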
9. What are the steps involved when working on a data analysis project?
Many steps are involved when working end-to-end on a data analysis project. Some of the important ones are:
- Problem statement
- Data cleaning/preprocessing
- Data exploration
- Modeling
- Data validation
- Implementation
- Verification

10. What are the best methods for data cleaning? (A pandas sketch for this and the next question follows question 14.)
- Create a data cleaning plan by understanding where the common errors take place, and keep all communications open.
- Before working with the data, identify and remove the duplicates. This leads to an easy and effective data analysis process.
- Focus on the accuracy of the data. Set up cross-field validation, maintain the value types of the data, and provide mandatory constraints.
- Normalize the data at the entry point so that it is less chaotic. You will be able to ensure that all information is standardized, leading to fewer errors on entry.

11. What is the significance of exploratory data analysis (EDA)?
- EDA helps you understand the data better.
- It helps you obtain confidence in the data, to a point where you're ready to engage a machine learning algorithm.
- It allows you to refine the selection of the feature variables that will be used later for model building.
- It helps you discover hidden trends and insights in the data.

12. What are the differences between data mining and data profiling?
- Data mining is the process of discovering relevant information that has not been identified before; data profiling is done to evaluate a dataset for its uniqueness, logic, and consistency.
- In data mining, raw data is converted into valuable information; data profiling cannot identify inaccurate data values.

13. What do you mean by logistic regression?
Logistic regression is a mathematical model that can be used to study datasets with one or more independent variables that determine a particular outcome. By studying the relationship between the multiple independent variables, the model predicts a dependent data variable (see the sketch after question 14).

14. What is collaborative filtering?
Collaborative filtering is an algorithm used to create recommendation systems based mainly on the behavioral data of a customer or user. For example, when browsing e-commerce sites, a section called "Recommended for you" is often present. This is done using the browsing history, analyzing the previous purchases, and collaborative filtering.
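Returning to questions 10 and 11, here is a hedged pandas sketch of basic cleaning and EDA steps; the table, its column names, and the validation rule are all invented for illustration:

```python
import pandas as pd

# Hypothetical raw data with a duplicate row and an invalid value.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "quantity": [3, 5, 5, -1, 2],
    "price":    [9.99, 4.50, 4.50, 7.25, 3.10],
})

# Q10: identify and remove duplicates before analysis.
df = df.drop_duplicates()

# Q10: cross-field validation, e.g. flag rows violating a simple rule.
invalid = df[df["quantity"] < 0]
print("invalid rows:\n", invalid)

# Q11 (EDA): summary statistics, missing-value counts, correlations.
print(df.describe())
print(df.isna().sum())
print(df.corr(numeric_only=True))
```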
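For question 13, a minimal logistic-regression sketch with scikit-learn; the binary-outcome dataset is synthetically generated, not from the document:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic dataset: 4 independent variables, one binary outcome.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the model; it predicts the dependent variable from the features.
model = LogisticRegression()
model.fit(X_train, y_train)

print(model.predict(X_test[:5]))          # predicted outcomes
print("accuracy:", model.score(X_test, y_test))
```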
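And for question 14, a toy user-based collaborative-filtering sketch; the rating matrix is invented, and production systems use far more sophisticated models:

```python
import numpy as np

# Rows = users, columns = items; 0 means "not rated" (hypothetical data).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target = 0  # recommend for the first user
sims = np.array([cosine(ratings[target], ratings[u])
                 for u in range(len(ratings))])

# Predicted score per item = similarity-weighted average of the
# other users' ratings (the target user's own weight is zeroed out).
weights = sims.copy()
weights[target] = 0.0
scores = weights @ ratings / weights.sum()

# Suggest the unrated item with the highest predicted score.
unrated = ratings[target] == 0
print("recommend item:", int(np.argmax(np.where(unrated, scores, -np.inf))))
```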
15. How is overfitting different from underfitting?
- Overfitting: the model fits the training set well, but its performance drops considerably on the test set. Underfitting: the model neither fits the training data well nor generalizes to new data, and it performs poorly on both the training and test sets.
- Overfitting happens when the model learns the random fluctuations and noise in the training dataset in detail. Underfitting happens when there is too little data to build an accurate model, or when we try to fit a linear model to non-linear data.

16. What is the difference between data analysis and data mining?
Data analysis generally involves extracting, cleansing, transforming, modeling, and visualizing data to obtain useful and important information that may contribute towards determining conclusions and deciding what to do next. Analyzing data has been in use since the 1960s.
Data mining, also known as knowledge discovery in databases, explores and analyzes huge quantities of data to find patterns and rules. It has been a buzzword since the 1990s.
- Data analysis provides insight or tests hypotheses; data mining identifies and discovers hidden patterns in large datasets.
- Data analysis consists of collecting, preparing, and modeling data to extract meaning or insights, and data-driven decisions can be taken this way; in data mining, data usability is the main objective, and it is considered one of the activities within data analysis.
- Data analysis usually requires data visualization; in data mining, visualization is generally not necessary.
- Data analysis is an interdisciplinary field that requires knowledge of computer science, statistics, mathematics, and machine learning; data mining usually combines databases, machine learning, and statistics.
- In data analysis, the dataset can be large, medium, or small, and it can be structured, semi-structured, or unstructured; in data mining, datasets are typically large and structured.

17. Describe univariate, bivariate, and multivariate analysis.
- Univariate analysis is the simplest and easiest form of data analysis, where the data being analyzed contains only one variable. Example: studying the heights of players in the NBA. Univariate analysis can be described using central tendency, dispersion, quartiles, bar charts, histograms, pie charts, and frequency distribution tables.
- Bivariate analysis involves the analysis of two variables to find causes, relationships, and correlations between them. Example: analyzing the sale of ice cream based on the temperature outside. Bivariate analysis can be explained using correlation coefficients, linear regression, logistic regression, scatter plots, and box plots.
- Multivariate analysis involves the analysis of three or more variables to understand the relationship of each variable with the other variables. Example: analyzing revenue based on expenditure. Multivariate analysis can be performed using multiple regression, factor analysis, classification and regression trees, cluster analysis, principal component analysis, dual-axis charts, etc.

18. What are the ethical considerations of data analysis?
Some of the most important ethical considerations of data analysis include:
- Privacy: safeguarding the privacy and confidentiality of individuals' data, ensuring compliance with applicable privacy laws and regulations.
- Informed consent: obtaining informed consent from individuals whose data is being analyzed, explaining the purpose and potential implications of the analysis.
- Data security: implementing robust security measures to protect data from unauthorized access, breaches, or misuse.
- Data bias: being mindful of potential biases in data collection, processing, or interpretation that may lead to unfair or discriminatory outcomes.
- Transparency: being transparent about the data analysis methodologies, algorithms, and models used, enabling stakeholders to understand and assess the results.
- Data ownership and rights: respecting data ownership rights and intellectual property, using data only within the boundaries of legal permissions or agreements.
- Accountability: taking responsibility for the consequences of data analysis, ensuring that actions based on the analysis are fair, just, and beneficial to individuals and society.
- Data quality and integrity: ensuring the accuracy, completeness, and reliability of the data used in the analysis to avoid misleading or incorrect conclusions.
- Social impact: considering the potential social impact of data analysis results, including potential unintended consequences or negative effects on marginalized groups.
- Compliance: adhering to legal and regulatory requirements related to data analysis, such as data protection laws, industry standards, and ethical guidelines.

19. Explain the concept of outlier detection. How would you identify outliers in a dataset, and how do you treat them?
An outlier is a data point that is distant from other similar points. Outliers may be due to variability in the measurement or may indicate experimental errors. Outlier detection is the process of identifying observations or data points that significantly deviate from the expected or normal behavior of a dataset. Outliers can be valuable sources of information or indications of anomalies, errors, or rare events.
It is important to note that outlier detection is not a definitive process; the identified outliers should be further investigated to determine their validity and potential impact on the analysis or model. Outliers can be due to various reasons, including data entry errors, measurement errors, or genuinely anomalous observations, and each case requires careful consideration and interpretation.
[Figure: scatter plot of the dataset with three outliers visible]
To deal with outliers, one can use the following four methods (see the sketch below):
- Drop the outlier records
- Cap the outliers' values
- Assign a new value
- Try a new transformation
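A minimal sketch of one common identification technique, the 1.5 * IQR rule, which is not named in the original text; the sample is invented, and z-scores or isolation forests are alternative heuristics:

```python
import numpy as np

# Hypothetical sample with three extreme points.
data = np.array([10, 12, 11, 13, 12, 95, 11, 10, -40, 12, 13, 88])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print("bounds:", lower, upper)
print("outliers:", outliers)   # the three extreme points

# One treatment from the list above: cap (clip) values at the bounds;
# dropping, re-assigning, or transforming are the other options.
capped = np.clip(data, lower, upper)
```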
