**Data Analytics MCQ Questions and Answers**

**1. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?**

- Yes
- No
- Can’t say
- None of these

Yes

**2. Which of the following can act as possible termination conditions in K-Means?**

- For a fixed number of iterations.
- Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum.
- Centroids do not change between successive iterations.
- Terminate when RSS falls below a threshold.
- All of the above

All of the above
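
The four stopping rules above can be sketched in one loop. This is a minimal illustration, assuming 2-D points given as tuples and caller-supplied starting centroids; the names `kmeans`, `max_iter`, and `rss_threshold` are hypothetical and not taken from any library.

```python
import math

def kmeans(points, centroids, max_iter=100, rss_threshold=0.0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels, reason)."""
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            return centroids, labels, "assignments unchanged"
        labels = new_labels

        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            return centroids, labels, "centroids unchanged"
        centroids = new_centroids

        # Residual sum of squares: squared distance of each point to its centroid.
        rss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if rss < rss_threshold:
            return centroids, labels, "RSS below threshold"
    return centroids, labels, "iteration limit reached"

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels, reason = kmeans(points, [(0, 0), (10, 10)])
```

Whichever rule fires first ends the run, so in practice the iteration cap acts as a safety net behind the convergence checks.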

**3. Which of the following clustering algorithms suffers from the problem of convergence at local optima?**

- K-Means clustering algorithm
- Agglomerative clustering algorithm
- Expectation-Maximization clustering algorithm
- Diverse clustering algorithm
- both a and c

both a and c

**4. How can Clustering (Unsupervised Learning) be used to improve the accuracy of Linear Regression model (Supervised Learning):**

- Creating different models for different cluster groups.
- Creating an input feature for cluster ids as an ordinal variable.
- Creating an input feature for cluster centroids as a continuous variable.
- Creating an input feature for cluster size as a continuous variable.
- All of the above

All of the above

**5. What could be the possible reason(s) for producing two different dendrograms using an agglomerative clustering algorithm for the same dataset?**

- Proximity function used
- Number of data points used
- Number of variables used
- All of the above

All of the above

**6. In which of the following cases will K-Means clustering fail to give good results?**

- Data points with outliers
- Data points with different densities
- Data points with round shapes
- Data points with non-convex shapes
- a, b and d

a, b and d

**7. Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?**

- Imputation with mean
- Nearest Neighbor assignment
- Imputation with Expectation-Maximization algorithm
- All of the above

Imputation with Expectation-Maximization algorithm

**8. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?**

- In distance calculation it will give the same weights for all features
- You always get the same clusters, whether or not you use feature scaling
- In Manhattan distance it is an important step but in Euclidean it is not
- None of these

In distance calculation it will give the same weights for all features
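
A small sketch of why scaling matters for distance-based clustering, using hypothetical (age, income) records. The helper names `standardize` and `income_share` are made up for this illustration; z-score standardization is one common scaling choice among several.

```python
import statistics

# Hypothetical records: (age in years, income in dollars).
rows = [(25, 50_000), (60, 51_000), (26, 58_000)]

def standardize(data):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in data]

def income_share(a, b):
    """Fraction of the squared Euclidean distance between a and b due to income."""
    d_age, d_income = a[0] - b[0], a[1] - b[1]
    return d_income**2 / (d_age**2 + d_income**2)

scaled = standardize(rows)
raw_share = income_share(rows[0], rows[1])          # income dominates the raw distance
scaled_share = income_share(scaled[0], scaled[1])   # age now carries most of the distance
```

On the raw values, a 35-year age gap is almost invisible next to a 1,000-dollar income gap, simply because of the units. After standardization each feature contributes in proportion to how unusual the difference is within its own column.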

**9. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?**

- Elbow method
- Manhattan method
- Euclidean method
- All of the above

Elbow method
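
The elbow method plots the within-cluster sum of squares (WCSS) against k and looks for the point where the curve flattens. The sketch below uses a toy 1-D dataset and, instead of running K-Means itself, searches every contiguous split of the sorted values, which gives the exact minimum of the K-Means objective in one dimension. The function names are hypothetical.

```python
from itertools import combinations
from statistics import fmean

def sse(segment):
    """Sum of squared deviations of a segment from its own mean."""
    m = fmean(segment)
    return sum((x - m) ** 2 for x in segment)

def best_wcss(values, k):
    """Minimum within-cluster sum of squares for k clusters of 1-D data."""
    xs = sorted(values)
    best = float("inf")
    # Choose k-1 cut positions between consecutive sorted values.
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0, *cuts, len(xs))
        total = sum(sse(xs[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best

data = [1, 2, 3, 11, 12, 13, 21, 22, 23]
wcss = {k: best_wcss(data, k) for k in range(1, 6)}
# WCSS drops steeply up to k=3 (606 -> 156 -> 6) and barely moves afterwards,
# so the "elbow" for this toy data sits at k=3.
```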

**10. What is true about K-Means clustering?**

- K-means is extremely sensitive to cluster center initializations
- Bad initialization can lead to poor convergence speed
- Bad initialization can lead to bad overall clustering
- All of the above

All of the above

**11. Which of the following can be applied to get good results for K-means algorithm corresponding to global minima?**

- Try to run algorithm for different centroid initialization
- Adjust number of iterations
- Find out the optimal number of clusters
- All of the above

All of the above

**12. If you are using Multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into two clusters, which of the assumptions are important:**

- All the data points follow two Gaussian distributions
- All the data points follow n Gaussian distributions (n > 2)
- All the data points follow two multinomial distributions
- All the data points follow n multinomial distributions (n > 2)

All the data points follow two multinomial distributions

**13. Which of the following is/are not true about Centroid based K-Means clustering algorithm and Distribution based expectation-maximization clustering algorithm:**

- Both start with random initializations
- Both are iterative algorithms
- Both have strong assumptions that the data points must fulfill
- Expectation maximization algorithm is a special case of K-Means

Expectation maximization algorithm is a special case of K-Means

**14. Which of the following is/are not true about DBSCAN clustering algorithm:**

- For data points to be in a cluster, they must be within a distance threshold of a core point
- It has strong assumptions for the distribution of data points in dataspace
- It has substantially high time complexity of order O(n³)
- It does not require prior knowledge of the no. of desired clusters
- both b and c

both b and c

**15. Which of the following are the high and low bounds for the existence of F-Score?**

- [0,1]
- (0,1)
- [-1,1]
- None of the above

[0,1]
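
Both bounds are attainable, which a short computation shows. This sketch assumes the common convention that the score is 0 when there are no true positives (where precision and recall would otherwise be 0 or undefined); `f_score` is a hypothetical helper, not a library function.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from confusion-matrix counts; 0.0 by convention when tp == 0."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

perfect = f_score(tp=10, fp=0, fn=0)   # every prediction correct: upper bound
worst = f_score(tp=0, fp=5, fn=5)      # no true positives: lower bound
typical = f_score(tp=8, fp=2, fn=4)    # precision 0.8, recall 2/3
```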

**16. All of the following increase the width of a confidence interval except:**

- Increased confidence level
- Increased variability
- Increased sample size
- Decreased sample size

Increased sample size
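
The effect of each factor follows from the interval's width, which for a z-based interval on a mean is 2 · z · s / √n. A minimal sketch, assuming a known standard deviation and a normal critical value; `ci_width` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(std_dev, n, confidence=0.95):
    """Full width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * std_dev / sqrt(n)

base = ci_width(std_dev=10, n=100)
higher_confidence = ci_width(std_dev=10, n=100, confidence=0.99)  # wider
more_variability = ci_width(std_dev=20, n=100)                    # wider
larger_sample = ci_width(std_dev=10, n=400)                       # narrower
smaller_sample = ci_width(std_dev=10, n=25)                       # wider
```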

**17. The p-value in hypothesis testing represents which of the following: Please select the best answer of those provided below**

- The probability of failing to reject the null hypothesis, given the observed results
- The probability that the null hypothesis is true, given the observed results
- The probability that the observed results are statistically significant, given that the null hypothesis is true
- The probability of observing results as extreme or more extreme than currently observed, given that the null hypothesis is true

The probability of observing results as extreme or more extreme than currently observed, given that the null hypothesis is true
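
As a concrete instance of this definition, the sketch below computes a two-sided p-value for a z statistic, assuming the test statistic is standard normal under the null hypothesis. `two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. assuming the null hypothesis holds."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_sided_p_value(1.96)   # close to the conventional 0.05 cutoff
```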

**18. Assume that the difference between the observed, paired sample values is defined in the same manner and that the specified significance level is the same for both hypothesis tests. Using the same data, the statement that “a paired/dependent two sample t-test is equivalent to a one sample t-test on the paired differences, resulting in the same test statistic, same p-value, and same conclusion” is: Please select the best answer of those provided below.**

- Always True
- Never True
- Sometimes True
- Not Enough Information

Always True
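
The equivalence can be checked numerically. The sketch below uses made-up paired measurements and computes the test statistic two ways: as a one-sample t statistic on the differences, and from the two samples using Var(X − Y) = Var(X) + Var(Y) − 2·Cov(X, Y). Both share n − 1 degrees of freedom, so equal statistics imply equal p-values.

```python
from math import sqrt
from statistics import fmean, stdev, variance

before = [72.0, 68.5, 80.1, 75.3, 69.9]   # hypothetical paired measurements
after = [70.2, 67.9, 78.4, 74.8, 68.0]
n = len(before)

# Route 1: one-sample t statistic on the paired differences (H0: mean difference = 0).
diffs = [b - a for b, a in zip(before, after)]
t_one_sample = fmean(diffs) / (stdev(diffs) / sqrt(n))

# Route 2: paired t statistic written in terms of the two samples and their covariance.
mb, ma = fmean(before), fmean(after)
cov = sum((b - mb) * (a - ma) for b, a in zip(before, after)) / (n - 1)
se_paired = sqrt((variance(before) + variance(after) - 2 * cov) / n)
t_paired = (mb - ma) / se_paired
```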

**19. Green sea turtles have normally distributed weights, measured in kilograms, with a mean of 134.5 and a variance of 49.0. A particular green sea turtle’s weight has a z-score of -2.4. What is the weight of this green sea turtle? Round to the nearest whole number.**

- 17 kg
- 151 kg
- 118 kg
- 252 kg

118 kg
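
The arithmetic behind this answer, reading the variance as 49.0 so that the standard deviation is 7 kg, is x = μ + z·σ:

```python
from math import sqrt

mean_kg = 134.5
variance_kg2 = 49.0          # variance, so the standard deviation is sqrt(49.0) = 7.0
z = -2.4

weight = mean_kg + z * sqrt(variance_kg2)   # 134.5 - 16.8
rounded = round(weight)
```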

**20. What percentage of measurements in a dataset fall above the median?**

- 49%
- 50%
- 51%
- Cannot Be Determined

Cannot Be Determined

**21. The proportion of variation in 5k race times that can be explained by the variation in the age of competitive male runners was approximately 0.663. What is the value of the sample linear correlation coefficient? Round to 3 decimal places.**

- 0.663
- 0.814
- -0.814
- 0.440

-0.814
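
The magnitude comes from the coefficient of determination: in simple linear regression, |r| = √R². The sign of r matches the sign of the fitted slope, which is not reproduced in this question, so the slope passed below is a placeholder assumption chosen to match the stated answer.

```python
from math import copysign, sqrt

def correlation_from_r_squared(r_squared, slope):
    """|r| is sqrt(R^2); the sign of r matches the sign of the regression slope."""
    return copysign(sqrt(r_squared), slope)

# slope=-1.0 is a stand-in for "negative slope"; only its sign is used.
r = round(correlation_from_r_squared(0.663, slope=-1.0), 3)
```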

**22. Using all of the results provided, is it reasonable to predict the 5k race time (minutes) of a competitive male runner 73 years of age?**

- Yes; linear correlation between age and 5k race times is statistically significant
- Yes; both the sample linear regression equation and an age in years are provided
- No; linear correlation between age and 5k race times is not statistically significant
- No; the age provided is beyond the scope of our available sample data

No; the age provided is beyond the scope of our available sample data

**23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.**

- Apriori Property
- Downward Closure Property
- Either 1 or 2
- Both 1 and 2

Both 1 and 2
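
The property holds because every transaction containing an itemset also contains each of its subsets, so a subset's support can never be lower. A small check on a made-up set of market-basket transactions:

```python
from itertools import combinations

# Hypothetical transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

itemset = ("bread", "milk", "diapers")
subsets = [s for r in range(1, len(itemset)) for s in combinations(itemset, r)]

# Every proper subset is at least as frequent as the full itemset.
closure_holds = all(support(s) >= support(itemset) for s in subsets)
```

Apriori uses the contrapositive to prune candidates: if any subset is infrequent, the superset cannot be frequent and need not be counted.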

**24. Algorithm is**

- It uses machine-learning techniques. Here programs can learn from past experience and adapt themselves to new situations
- Computational procedure that takes some value as input and produces some value as output
- Science of making machines perform tasks that would require intelligence when performed by humans
- None of these

Computational procedure that takes some value as input and produces some value as output

**25. Bias is**

- A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
- Any mechanism employed by a learning system to constrain the search space of a hypothesis
- An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
- None of these

Any mechanism employed by a learning system to constrain the search space of a hypothesis

**26. Classification is**

- A subdivision of a set of examples into a number of classes
- A measure of the accuracy, of the classification of a concept that is given by a certain theory
- The task of assigning a classification to a set of examples
- None of these

A subdivision of a set of examples into a number of classes

**27. Binary attributes are**

- Attributes that take only two values. In general, these values will be 0 and 1, and they can be coded as one bit
- The natural environment of a certain species
- Systems that can be used without knowledge of internal operations
- None of these

Attributes that take only two values. In general, these values will be 0 and 1, and they can be coded as one bit

**28. Cluster is**

- Group of similar objects that differ significantly from other objects
- Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
- Symbolic representation of facts or ideas from which information can potentially be extracted
- None of these

Group of similar objects that differ significantly from other objects

**29. A definition of a concept is ______ if it recognizes all the instances of that concept**

- Complete
- Consistent
- Constant
- None of these

Complete

**30. A definition of a concept is _______ if it does not classify any negative examples as coming within the concept**

- Complete
- Consistent
- Constant
- None of these

Consistent

**31. Data selection is**

- The actual discovery phase of a knowledge discovery process
- The stage of selecting the right data for a KDD process
- A subject-oriented integrated time variant non-volatile collection of data in support of management
- None of these

The stage of selecting the right data for a KDD process

**32. Classification task refers to**

- A subdivision of a set of examples into a number of classes
- A measure of the accuracy, of the classification of a concept that is given by a certain theory
- The task of assigning a classification to a set of examples
- None of these

The task of assigning a classification to a set of examples
