
Big Data Online Practice Test - 6

This test covers Big Data through important questions, ranging from basic to advanced level.
Q. A developer was working on a project involving many Pig relations. At one point, the contents of two relations had to be merged. Which is the best way to do this?
The relations given were as follows:
C=load '/demotravel/travel.tsv' as (id,name,currency);
D=load '/demotravel/currency.tsv' as (id,name,currencyconv);
Mark the correct option.
Q. Mark the syntax that corresponds to each of the following problem statements:
1) Stop the MR job history server.
2) Start a datanode individually.
3) Start the resource manager.
Mark the correct option.
Q. Which of the following points are true regarding the comparison of the Hive framework and Impala?
1) Impala is ideal for interactive computing. Hive is not.
2) Impala requires more time to run complex queries, while Hive takes less time.
3) Impala supports complex types, while Hive does not.
4) In Impala, the output of a query is produced even if a datanode goes down. This is not the case in Hive.
Mark the correct option.
Q. In a quiz, the following question was asked:
Find the documents in the Movies collection where Genre is set neither to 'Comedy' nor to 'Drama'.
Mark the syntax that corresponds to the problem statement.
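
As an illustration of this kind of filter, here is a minimal sketch that assumes the exclusion is expressed with MongoDB's `$nin` operator. The `matches_nin` helper and the sample `movies` list are hypothetical and only emulate the operator locally for scalar fields. With a real driver such as pymongo, the same `query` dictionary would presumably be passed to `find` on the Movies collection.

```python
# Filter document in MongoDB query syntax; $nin excludes the listed values.
query = {"Genre": {"$nin": ["Comedy", "Drama"]}}


def matches_nin(doc, query):
    """Simplified local emulation of $nin for scalar fields only (hypothetical helper)."""
    for field, condition in query.items():
        excluded = condition["$nin"]
        # As in MongoDB, a document that lacks the field also matches $nin.
        if doc.get(field) in excluded:
            return False
    return True


# Hypothetical sample documents standing in for the Movies collection.
movies = [
    {"Title": "A", "Genre": "Comedy"},
    {"Title": "B", "Genre": "Drama"},
    {"Title": "C", "Genre": "Thriller"},
    {"Title": "D"},
]
selected = [m["Title"] for m in movies if matches_nin(m, query)]
```

Note that the document without a Genre field is also selected, which may or may not be what the question intends.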

Q. A fresher was learning the concept of the distributed cache from a peer. Which of the following statements made by the peer are correct?
1) If some files are needed by all the datanodes, they are placed in the distributed cache.
2) Files that can be cached are archives, read-only text files and jar files.
3) The default size of the Hadoop distributed cache is 8 GB, but the cache size can be set in mapred-site.xml.
4) The cached files cannot be changed while the job is executing.
Choose the appropriate option.
Q. Arrange the following MapReduce phases in the order of their execution. The phases are described as follows:
1) The data is pre-processed so that its volume is reduced.
2) The Partitioner decides, based on the key-value pair, which data goes to which Reducer.
3) The Record Reader reads records from the split.
4) The data is sorted based on the values rather than the keys.
Mark the correct order.
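
The partitioning step described above can be sketched as follows. This is a simplified illustration of hash partitioning, not Hadoop's actual implementation: the `partition` and `assign` functions and the sample pairs are hypothetical, and `zlib.crc32` is used only to get a hash that is stable across runs.

```python
import zlib


def partition(key, num_reducers):
    """Return the reducer index for a key (hash-partitioning sketch)."""
    # zlib.crc32 is used instead of hash() so results are stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def assign(pairs, num_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


# Hypothetical map output: one (key, value) pair per record.
pairs = [("INR", 1), ("USD", 1), ("INR", 1), ("EUR", 1), ("USD", 1)]
buckets = assign(pairs, num_reducers=3)
```

The point of the sketch is that every pair with the same key lands in the same bucket, so a single reducer sees all values for that key.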
Q. A group of leads were discussing the Spark framework. Mark each of the following statements as valid or invalid:
1) Spark runs in a cloud system, but not as a standalone.
2) Spark can process data from other file systems as well, apart from HDFS.
3) Spark does have a storage layer, but it has extra advantages when it runs on Hadoop.
4) Since Spark is open source, just like Hadoop, it integrates well with Hadoop.
Choose the correct option.
Q. In an interview, freshers were asked to guess the terms related to Cassandra architecture, based on their descriptions:
1) It is an algorithm to test if a number is a member of a set.
2) It is a process of freeing up space by merging large collected data files.
3) A memory-resident data structure.
4) A recovery mechanism in Cassandra.
Choose the appropriate terms.
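
The first description, a probabilistic set-membership test, can be illustrated with a minimal Bloom filter sketch. This is an assumption about which structure the description refers to, and the `BloomFilter` class, its parameters and the sample row keys are hypothetical. It is not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    bloom.add(row_key)
```

A `False` result lets a lookup skip a data file entirely, while a `True` result still requires checking the file, since false positives are possible.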
Q. A developer was teaching a peer about Sqoop export. Which of the following statements are true or false?
1) In Sqoop export, each record is converted into an INSERT statement, which adds a row to the target RDBMS table.
2) If we insert a duplicate primary key, Sqoop export will fail.
3) Sqoop export is an atomic process.
4) Insufficient disk space can cause Sqoop export to fail.
Mark the correct option.

Q. A developer was told to add security to the Hadoop setup of her project. She made the following changes:
1) She configured the Hadoop web consoles to use HTTP SPNEGO authentication.
2) She enabled SASL by setting the hadoop.sasl.protection property in core-site.xml to true.
3) She configured Access Control Lists for the Hadoop file permissions.
Mark the correct option.
