Big Data Online Practice Test - 2
This test covers Big Data end to end, with important questions ranging from basic to advanced level.
Q. In a quiz, the following code snippet was given:
object MyClass {
  def main(args: Array[String]) {
    var m1 = Map("Zootopia" -> 4, "Toy Story" -> 3);
    print(m1.apply("Zootopia"));
    print(m1.apply("Kung Fu Panda"))
  }
}
Mark the valid option; a short Map-lookup sketch follows below.
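A minimal sketch, assuming Scala 2.x, contrasting Map.apply with the safer Map.get and getOrElse lookups on the same keys; MapLookupSketch is just an illustrative object name:

object MapLookupSketch {
  def main(args: Array[String]): Unit = {
    val ratings = Map("Zootopia" -> 4, "Toy Story" -> 3)

    // apply returns the value for a key that exists...
    println(ratings.apply("Zootopia"))             // 4

    // ...but throws NoSuchElementException for a missing key,
    // so get and getOrElse are the safe alternatives.
    println(ratings.get("Kung Fu Panda"))          // None
    println(ratings.getOrElse("Kung Fu Panda", 0)) // 0
  }
}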
Q. A developer was told to write the following MongoDB query:
Display PortName and NumberOfShips from the Ports collection where _id is 4, but _id should not be displayed in the result.
Mark the correct syntax for the same; one possible form is sketched below.
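A minimal sketch, assuming the MongoDB Scala driver (org.mongodb.scala) is on the classpath; the connection string, the shipping database name, and the PortsProjectionSketch object are hypothetical, and the equivalent mongo-shell projection appears in a comment:

import org.mongodb.scala.MongoClient
import org.mongodb.scala.model.Filters.equal
import org.mongodb.scala.model.Projections.{excludeId, fields, include}

import scala.concurrent.Await
import scala.concurrent.duration._

object PortsProjectionSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection string and database name.
    val client = MongoClient("mongodb://localhost:27017")
    val ports  = client.getDatabase("shipping").getCollection("Ports")

    // Shell form: db.Ports.find({ _id: 4 }, { PortName: 1, NumberOfShips: 1, _id: 0 })
    val docs = Await.result(
      ports.find(equal("_id", 4))
        .projection(fields(include("PortName", "NumberOfShips"), excludeId()))
        .toFuture(),
      10.seconds)

    docs.foreach(doc => println(doc.toJson()))
    client.close()
  }
}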
Q. In an interview, the candidate was asked to mark whether the following statements are true or false:
1) Every Hadoop cluster has only one JobTracker daemon.
2) One slave node has one TaskTracker.
3) One Hadoop cluster can have only one NameNode.
4) For a 314 MB file, where the default block size is 64 MB and the replication factor is 3, the total number of blocks created is 5 (see the arithmetic sketch after this question).
Choose the correct option.
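A small arithmetic sketch for statement 4; it simply computes the per-replica block count and, for comparison, the number of block copies once replication is applied:

object BlockCountSketch {
  def main(args: Array[String]): Unit = {
    val fileSizeMb  = 314
    val blockSizeMb = 64
    val replication = 3

    // ceil(314 / 64) = 5 blocks: four full 64 MB blocks plus one 58 MB block.
    val blocks = math.ceil(fileSizeMb.toDouble / blockSizeMb).toInt
    println(s"Blocks per replica: $blocks")                            // 5
    println(s"Block copies with replication: ${blocks * replication}") // 15
  }
}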
Q. Below are a few features of Flume channels:
1) Data is lost if there is a power failure.
2) It is better suited for web server logs.
3) Events are stored in persistent storage.
4) Events can be stored in a Kafka cluster.
Match the channel names with their descriptions.
Q. Following are descriptions of the building blocks of Kafka. Match them with their names:
1) Data is stored in them. They are a stream of messages.
2) They are publishers of messages.
3) They help in maintaining published data.
4) They handle all reads and writes for a given partition.
Mark the appropriate option; a minimal producer sketch follows below.
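A minimal producer sketch, assuming the kafka-clients library, a broker at localhost:9092, and a hypothetical topic named movie-ratings; it touches the topics that store messages, the producers that publish them, and the brokers that serve reads and writes:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Brokers maintain the published data and serve reads/writes for the
    // partitions they lead; the producer only needs a bootstrap address.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Topics hold the data as a stream of messages; producers publish to them.
    producer.send(new ProducerRecord[String, String]("movie-ratings", "Zootopia", "4"))
    producer.close()
  }
}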
Q. A developer was told to change the replication factor of a directory from 3 to 4. He set the dfs.replication property in hdfs-site.xml to 4 and then created some new files in the directory. Consider the following points and mark the correct statement (a hedged API sketch follows below):
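A hedged sketch, assuming the Hadoop FileSystem API is available; dfs.replication in hdfs-site.xml only sets the default for files created afterwards, so files that already exist are changed explicitly here. The /data/reports path is hypothetical:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object SetReplicationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()      // picks up core-site.xml / hdfs-site.xml
    val fs   = FileSystem.get(conf)

    // The hdfs dfs -setrep command is the usual shell way to do the same.
    val dir = new Path("/data/reports") // hypothetical directory
    fs.listStatus(dir).filter(_.isFile).foreach { status =>
      fs.setReplication(status.getPath, 4.toShort)
    }
    fs.close()
  }
}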
Q. A group of leads were involved in a discussion about data locality in Hadoop:
Lead A - It is not always possible to move algorithms close to the data, so the data should be brought closer to them. This minimizes network congestion.
Lead B - It is much more efficient if the algorithms are brought closer to the data. Though this decreases the throughput, it minimizes the network congestion problem. Inter-rack locality is the most preferred scenario.
Lead C - The inter-rack scenario is the least preferred. Data-local data locality is most preferred, as the data is on the same node as the mapper.
Mark the correct option.
Q. I present developers with a ready-to-use framework which allows them to perform data mining on massive amounts of data:
My algorithms are written on top of Hadoop, so I work well in a distributed environment.
I am mainly used for creating many machine learning algorithms.
I consist of multiple matrix and vector libraries.
Mark the framework.
Q. In a quiz, freshers were given the following multiple-choice statements related to Hive:
1) Bucketed tables are not stored as a file.
2) Sub-queries are supported in Hive only in the _____ clause.
3) Custom types and functions can be defined in Hive.
4) HQL allows downloading the contents of a table to a local directory (see the sketch after this question).
Mark the correct option.
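A hedged sketch for statement 4, assuming a HiveServer2 endpoint reachable over JDBC with the Hive JDBC driver on the classpath; the URL, credentials, export path, and the ports table are all hypothetical:

import java.sql.DriverManager

object HiveExportSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical HiveServer2 URL, user and password.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "")
    val stmt = conn.createStatement()

    // HQL can write a table's contents out to a local directory on the node
    // running the job; the target path here is illustrative only.
    stmt.execute(
      "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ports_export' " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
      "SELECT * FROM ports")

    stmt.close()
    conn.close()
  }
}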
Q. In an interview, a fresher was asked to mention the features of Spark transformations and actions.
She mixed them up in her answer:
1) They are evaluated on demand.
2) From existing RDDs, new RDDs are created.
3) To load data into the original RDD, they trigger a lineage graph.
4) They return the final values of the RDD computation.
Mark which of them belong to Transformations (T) or Actions (A); a minimal RDD sketch follows below.
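A minimal sketch, assuming a local SparkContext; map and filter below are transformations (lazy, building up the lineage), while count and collect are the actions that trigger execution and return final values:

import org.apache.spark.{SparkConf, SparkContext}

object RddLazinessSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-laziness").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 10)

    // Transformations: build new RDDs from existing ones and are evaluated on demand.
    val evens   = numbers.filter(_ % 2 == 0)
    val squares = evens.map(n => n * n)

    // Actions: walk the lineage graph back to the original data and return final values.
    println(squares.count())                    // 5
    println(squares.collect().mkString(", "))   // 4, 16, 36, 64, 100
    sc.stop()
  }
}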