Big Data Analytics Suggestion

Uploaded by

paulbossaniket

1. Let the membership functions of two fuzzy sets A and B be given by:
µA(x) = {(x1, 0.2), (x2, 0.3), (x3, 0.5), (x4, 0.6)}
µB(y) = {(y1, 0.8), (y2, 0.6), (y3, 0.3)}
Identify the membership function for A × B.

2. Explain the importance of Analysis of Variance (ANOVA) with examples.


3. Explain what is meant by seasonal fluctuations of a time series. A company manufactures
bicycles. Given the quarterly production figures of the company for the last 4 years, explain the
procedure to compute seasonal indices by the ‘link relatives’ method. Use link relatives method
to compute seasonal indices from the recorded production figures given below:

4. Neuron inhibition depends on activation function


5. Feed-forward neural network
6. Design a supervised machine learning pipeline, including data pre-processing, feature selection,
model selection, and performance evaluation. What criteria would you use to critically
evaluate the effectiveness and ethical implications of your model's predictions?
7. Suppose you are tasked with solving a real-world problem using supervised machine
learning. How would you determine the most appropriate algorithm for the task,
considering factors such as dataset size, complexity, and the nature of the problem?
Additionally, how would you assess the model's performance and make decisions about
its deployment while considering ethical considerations?
8. Describe how the algorithm for counting frequent item sets in a stream
maintains efficiency, considering the dynamic nature of streaming data.
9. K-Means Clustering Algorithm
10. Hierarchical Clustering Algorithm
11. Explain how frequent pattern-based clustering integrates the extraction of frequent item
sets into the clustering process.
12. Counting Distinct Elements in a Stream.
13. Real-time Analytics Platform (RTAP) applications in big data.
14. Imagine you are working with a continuous data stream from a sensor network, and you
need to implement a sampling strategy to reduce the data volume while maintaining the
representativeness of the original stream. How would you select an appropriate sampling
technique, considering factors like data frequency, storage constraints, and the need for
accurate trend analysis?
15. Bayesian Statistics and Bayes Theorem
16. Write and explain the initialization, activation, computation of actual response, adaptation of
weight vector, and continuation operations of the perceptron convergence theorem.
17. What kinds of operations can be implemented with a perceptron? Show that it cannot
implement the Exclusive OR (XOR) function.
18. Methane biofilters can be used to oxidize methane using biological activities. It has become
necessary to compare performance of two test columns, A and B. The methane outflow level at
the surface, in nondimensional units of X = {50, 100, 150, 200}, was detected and is tabulated
below against the respective methane inflow into each test column. The following fuzzy sets
represent the test columns:

Calculate the union, intersection, and the difference for the test columns.
19. McCulloch-Pitts neuron model
20. Suppose a student decides whether or not to go to campus on any given day based
on the weather, wake-up time, and whether there is a seminar talk he is interested in attending.
Data were collected over 13 days.
21.

i) Build a decision tree based on these observations, using information gain concept.
ii) Show your work and the resulting tree.
22. Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8). Compute
the principal component using the Principal Component Analysis (PCA) algorithm.
23. Limited pass Algorithm
24. A database has five transactions. Let min_sup = 50% and min_conf = 70%.
    TID   Items
    T100  Milk, Onion, Nuts, Kiwi, Egg, Yoghurt
    T200  Dhal, Onion, Nuts, Kiwi, Egg, Yoghurt
    T200  Milk, Apple, Kiwi, Egg
    T300  Milk, Curd, Kiwi, Yoghurt
    T400  Curd, Onion, Kiwi, Ice cream, Egg
    Find all frequent item sets using the Apriori method.
25. Explain the concept of streaming data and why traditional frequent itemset mining
techniques may not be directly applicable to streaming scenarios.
26. Mining frequent itemset
27. Linkage method in Hierarchical clustering algorithm
28. Discuss the challenges of handling concept drift in the context of counting frequent item
sets in a stream. Explore strategies for adapting to changes in item frequencies over time,
and analyze the impact of concept drift on the accuracy of frequency estimation. Provide
examples to illustrate how different algorithms cope with concept drift.
29. Discuss the impact of the choice of the initial centroids on the convergence and final
clustering results in K-means. Compare different strategies for initializing centroids, such as
random initialization, k-means++, and hierarchical initialization.
30. Discuss the impact of varying the minimum support threshold in frequent pattern-based clustering.
How does adjusting this threshold affect the granularity of the clusters and the discovery
of meaningful associations in the data?
31. Real-time Analytics Platform (RTAP) for a large e-commerce company
32. Compare and contrast the advantages and disadvantages of using rule-based sentiment
analysis techniques versus machine learning-based approaches.
33. You are analyzing the performance of two stock market prediction algorithms. Algorithm A has
an accuracy rate of 60% in predicting stock price movements, while Algorithm B has an accuracy
rate of 55%. However, Algorithm B consistently outperforms Algorithm A in terms of returns on
investment. Explain the factors that could contribute to this discrepancy and evaluate the
effectiveness of these algorithms for making trading decisions.
34. You are analyzing a dataset of stock prices over a year. Using a machine learning model,
you achieve an RMSE (Root Mean Square Error) of 0.010 for your stock price
predictions. Another model achieves an RMSE of 0.01. Compare and evaluate the
performance of these models.
35. Justify the applications of different statistical inferences in big data analytics.
36. Illustrate the various phases involved in Big Data Analytics with a neat diagram.
37. Distinguish between different formats of big data along with examples.
38. Explain the least squares method for linear regression.
39. What are the different vendor-specific distributions of Hadoop?
40. Why is Hadoop Distributed File System (HDFS) fault-tolerant?
41. Hadoop and Spark architecture with diagram
42. Explain principal component analysis with respect to big data.
43. Distinguish between supervised learning and unsupervised learning in artificial
neural networks.
44. Neuromorphic Computing
45. Fuzzy Set
46. Neural network
47. Data Analytics
48. Prediction Error
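
Worked sketch for question 1. This assumes the usual definition of the fuzzy Cartesian product, in which the membership grade of each pair (x, y) is the minimum of the two individual grades; the question itself does not state which definition is intended. The names mu_A, mu_B and cartesian_product are illustrative only.

```python
# Membership grades copied from question 1.
mu_A = {"x1": 0.2, "x2": 0.3, "x3": 0.5, "x4": 0.6}
mu_B = {"y1": 0.8, "y2": 0.6, "y3": 0.3}


def cartesian_product(mu_a, mu_b):
    """Return mu_{A x B}(x, y) = min(mu_A(x), mu_B(y)) for every pair (x, y).

    Assumption: the min operator defines the product; other t-norms exist.
    """
    return {(x, y): min(a, b) for x, a in mu_a.items() for y, b in mu_b.items()}


mu_AxB = cartesian_product(mu_A, mu_B)

# Print the result as a relation matrix: one row per x, one column per y.
for x in mu_A:
    row = "  ".join(f"{mu_AxB[(x, y)]:.1f}" for y in mu_B)
    print(f"{x}: {row}")
```

Under this assumption the result is a 4 x 3 relation matrix. Each entry is capped by the smaller of its row and column grades, so the y3 column never exceeds 0.3.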
