Big Data Analytics Suggestion
Let the membership functions of two fuzzy sets A and B be given by:
µA(x) = {(x1, 0.2), (x2, 0.3), (x3, 0.5), (x4, 0.6)}
µB(x) = {(y1, 0.8), (y2, 0.6), (y3, 0.3)}
Identify the membership function for A × B.
Calculate the union, intersection, and the difference for the test columns.
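One common definition of the Cartesian product of two fuzzy sets takes the minimum of the two membership grades, µA×B(x, y) = min(µA(x), µB(y)). The following is a minimal Python sketch under that assumption (the question does not say which rule is intended); the names mu_A, mu_B, and cartesian_product are illustrative only.

```python
# Membership grades copied from the question.
mu_A = {"x1": 0.2, "x2": 0.3, "x3": 0.5, "x4": 0.6}
mu_B = {"y1": 0.8, "y2": 0.6, "y3": 0.3}


def cartesian_product(mu_a, mu_b):
    """Return the fuzzy relation A x B, assuming the min rule."""
    return {
        (x, y): min(grade_a, grade_b)
        for x, grade_a in mu_a.items()
        for y, grade_b in mu_b.items()
    }


relation = cartesian_product(mu_A, mu_B)

# Print the relation as a matrix: one row per element of A,
# one column per element of B.
for x in mu_A:
    print(x, [relation[(x, y)] for y in mu_B])
```

Each row of the printed matrix corresponds to one element of A, so the relation has 4 × 3 = 12 entries.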
19. McCulloch-Pitts neuron model
20. Suppose a student decides whether or not to go to campus on any given day based
on the weather, wake-up time, and whether there is a seminar talk he is interested in attending.
Data were collected over 13 days.
21.
i) Build a decision tree based on these observations, using information gain concept.
ii) Show your work and the resulting tree.
22. Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8). Compute
the principal component using the Principal Component Analysis (PCA) algorithm.
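The PCA computation for these six patterns can be sketched in plain Python as follows. This is one possible working, assuming the sample covariance (division by n - 1) and the closed-form eigenvalue formula for a symmetric 2 × 2 matrix. All variable names are illustrative.

```python
import math

# The six two-dimensional patterns from the question.
points = [(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)]
n = len(points)

# Step 1: centre the data on its mean.
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Step 2: covariance matrix [[sxx, sxy], [sxy, syy]].
# Assumption: sample covariance, i.e. division by n - 1.
sxx = sum(dx * dx for dx, _ in centered) / (n - 1)
syy = sum(dy * dy for _, dy in centered) / (n - 1)
sxy = sum(dx * dy for dx, dy in centered) / (n - 1)

# Step 3: largest eigenvalue of the symmetric 2 x 2 matrix.
trace = sxx + syy
det = sxx * syy - sxy ** 2
lambda_1 = (trace + math.sqrt(trace ** 2 - 4 * det)) / 2

# Step 4: matching unit eigenvector (valid here because sxy != 0).
vx, vy = sxy, lambda_1 - sxx
norm = math.hypot(vx, vy)
first_pc = (vx / norm, vy / norm)

# Step 5: project each centred pattern onto the first principal component.
projections = [dx * first_pc[0] + dy * first_pc[1] for dx, dy in centered]

print("mean:", (mean_x, mean_y))
print("covariance:", [[sxx, sxy], [sxy, syy]])
print("largest eigenvalue:", lambda_1)
print("first principal component:", first_pc)
```

Dividing by n instead of n - 1 rescales the eigenvalues but leaves the direction of the principal component unchanged.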
23. Limited-pass algorithms
24. A database has five transactions. Let min sup = 50% and min conf = 70%.
TID    ITEMS
T100   Milk, Onion, Nuts, Kiwi, Egg, Yoghurt
T200   Dhal, Onion, Nuts, Kiwi, Egg, Yoghurt
T200   Milk, Apple, Kiwi, Egg
T300   Milk, Curd, Kiwi, Yoghurt
T400   Curd, Onion, Kiwi, Ice cream, Egg
Find all frequent itemsets using the Apriori method.
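The level-wise search that Apriori performs on this database can be sketched as below. This is a minimal illustration rather than a reference implementation. It assumes that the two rows labelled T200 are distinct transactions, and that min sup = 50% of five transactions means an itemset must appear in at least three of them. The function name apriori and its helper names are illustrative.

```python
from itertools import combinations

# The five transactions from the question (TIDs omitted).
transactions = [
    {"Milk", "Onion", "Nuts", "Kiwi", "Egg", "Yoghurt"},
    {"Dhal", "Onion", "Nuts", "Kiwi", "Egg", "Yoghurt"},
    {"Milk", "Apple", "Kiwi", "Egg"},
    {"Milk", "Curd", "Kiwi", "Yoghurt"},
    {"Curd", "Onion", "Kiwi", "Ice cream", "Egg"},
]
min_sup = 0.5


def apriori(transactions, min_sup):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        current_set = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current_set for s in combinations(c, k - 1))
        }
        current = [c for c in candidates if support(c) >= min_sup]
    return frequent


frequent = apriori(transactions, min_sup)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step relies on the Apriori property: any superset of an infrequent itemset is itself infrequent, so such candidates are discarded before their support is counted.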
25. Explain the concept of streaming data and why traditional frequent itemset mining
techniques may not be directly applicable to streaming scenarios.
26. Mining frequent itemsets
27. Linkage method in Hierarchical clustering algorithm
28. Discuss the challenges of handling concept drift in the context of counting frequent item
sets in a stream. Explore strategies for adapting to changes in item frequencies over time,
and analyze the impact of concept drift on the accuracy of frequency estimation. Provide
examples to illustrate how different algorithms cope with concept drift.
29. Discuss the impact of the choice of initial centroids on the convergence and final
clustering results in K-means. Compare different strategies for initializing centroids,
such as random initialization, k-means++, and hierarchical initialization.
30. Impact of varying the minimum support threshold in frequent pattern-based clustering.
How does adjusting this threshold affect the granularity of the clusters and the discovery
of meaningful associations in the data?
31. Real-time Analytics Platform (RTAP) for a large e-commerce company
32. Compare the advantages and disadvantages of rule-based sentiment
analysis techniques and machine learning-based approaches.
33. You are analyzing the performance of two stock market prediction algorithms. Algorithm A has
an accuracy rate of 60% in predicting stock price movements, while Algorithm B has an accuracy
rate of 55%. However, Algorithm B consistently outperforms Algorithm A in terms of returns on
investment. Explain the factors that could contribute to this discrepancy and evaluate the
effectiveness of these algorithms for making trading decisions.
34. You are analyzing a dataset of stock prices over a year. Using a machine learning model,
you achieve an RMSE (Root Mean Square Error) of 0.010 for your stock price
predictions. Another model achieves an RMSE of 0.01. Compare and evaluate the
performance of these models.
35. Justify the applications of different statistical inferences in big data analytics.
36. Illustrate the various phases involved in Big Data Analytics with a neat diagram.
37. Distinguish between different formats of big data along with examples.
38. Explain the "least squares" method for linear regression.
39. What are the different vendor-specific distributions of Hadoop?
40. Why is Hadoop Distributed File System (HDFS) fault-tolerant?
41. Hadoop and Spark architecture with diagrams
42. Explain principal component analysis with respect to big data.
43. Distinguish between supervised learning and unsupervised learning in artificial
neural networks.
44. Neuromorphic Computing
45. Fuzzy Set
46. Neural network
47. Data Analytics
48. Prediction Error